Adventures in data cleaning: Did the New York Times undercount risky San Francisco skyscrapers?

It looks like the New York Times may have undercounted the number of risky skyscrapers in downtown San Francisco: my count from the USGS report is 48, not the 39 in the story. It’s a seemingly small difference – roughly 20 percent if you do the math – but it’s significant if you consider how many people work in these large buildings. The Times’ June 15 story focused on steel moment-frame buildings cited in a USGS report.

I made a quick map using the addresses from the NYT story, then wanted to make one that included photos of the buildings. This time I went directly to the report. When I noticed that the first address wasn’t listed in the story, it seemed like a good idea to see if there were any more discrepancies.

TL;DR

To check how I got these numbers:

  • The USGS report, starting on page 360, in .PDF
  • The .KML file I made, for more fact-checking and map making (pretty please send links to your maps or put them in the comments – I’m a casual mapper, using new tools and working quickly!)
  • Here are the additional nine addresses from the report that weren’t in the NYT story:
  1. The Mills Building, 221 Montgomery Street
  2. 225 Bush Street
  3. 140 Montgomery Street
  4. 120 Montgomery Street
  5. 45 Fremont Street
  6. 55 2nd Street
  7. 555 Mission Street
  8. 611 Folsom Street
  9. 680 Folsom Street

The clumsy adventure

To start, I downloaded the 454-page .PDF, then extracted the five pages listing the buildings using the Print > Pages > Save as .PDF function in Preview for Mac. Then I converted the .PDF to .CSV with Sejda. After that, it was time for Terminal to merge the extracted data from those pages into one file with the command:

cat *.csv >merged.csv
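For anyone who would rather do the merge in a script, here is a minimal Python sketch of the same idea. The function name `merge_csv_files` and the file names are my own placeholders, not anything from the report or from Sejda. It differs from the one-liner in two small ways: it skips blank rows, and it leaves the output file out of its own input (on a second run, the `*.csv` glob would otherwise match merged.csv too).

```python
import csv
import glob
import os
import tempfile


def merge_csv_files(pattern, output_path):
    """Concatenate the rows of every CSV matching pattern into output_path.

    Returns the number of rows written. The output file itself is excluded
    from the inputs, and rows with no content are dropped.
    """
    output_abs = os.path.abspath(output_path)
    # Build the input list before opening the output for writing.
    inputs = sorted(
        path for path in glob.glob(pattern)
        if os.path.abspath(path) != output_abs
    )
    row_count = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in inputs:
            with open(path, newline="") as src:
                for row in csv.reader(src):
                    if any(cell.strip() for cell in row):
                        writer.writerow(row)
                        row_count += 1
    return row_count


# Tiny demonstration with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("page1.csv", "a,b\n"), ("page2.csv", "c,d\n\n")]:
        with open(os.path.join(folder, name), "w", newline="") as handle:
            handle.write(text)
    merged = os.path.join(folder, "merged.csv")
    first_run = merge_csv_files(os.path.join(folder, "*.csv"), merged)
    second_run = merge_csv_files(os.path.join(folder, "*.csv"), merged)
    print(first_run, second_run)
```

This only glues the files together; it does nothing about the messiness inside them.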

The result was still too messy to be useful without a lot of tedious cleanup.

So I tried the quickest and dirtiest way I know: copy the table from the .PDF into Word, then from Word (where it’s recognized as a table) copy it into Excel.

There are a couple hundred buildings listed, but the ones cited in the story are steel moment frames. Erected before a 1994 building code outlawed a flawed welding technique, they harbor particular risk in a quake of magnitude seven or higher.

From the USGS report: Steel moment frame listed as “Steel MF,” “Steel moment frame” and “MF.”

From there it was a question of sorting the buildings listed as “Steel MF,” noting that a couple are listed alternatively as “Steel moment frame” and one simply as “MF.” Messy messy messy: also, totally typical. (There were also about 15 more listed as Steel MF in combination with some other reinforcement. Since it would require more reporting to figure out if they’re as risky, these were left out.)
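I did this sorting by hand in Excel, but the same rule is easy to sketch in Python. Everything below is an assumption for illustration: the column name `structural_system`, the function names and the sample rows are made up, and only the three labels come from the report. The rule keeps a row when its label is exactly one of the three spellings, so combinations with other reinforcement are left out, as described above.

```python
import csv
import io

# The three spellings the report uses for a plain steel moment frame.
STEEL_MF_LABELS = {"steel mf", "steel moment frame", "mf"}


def is_plain_steel_mf(label):
    """True when the label names a steel moment frame and nothing else."""
    normalized = " ".join(label.lower().split())  # fold case and extra spaces
    return normalized in STEEL_MF_LABELS


def filter_steel_mf(rows, column="structural_system"):
    """Keep only the rows whose structural label is a plain steel moment frame."""
    return [row for row in rows if is_plain_steel_mf(row.get(column, ""))]


# Made-up sample rows; these are not taken from the USGS report.
SAMPLE = """address,structural_system
1 Example Street,Steel MF
2 Example Street,Steel moment frame
3 Example Street,MF
4 Example Street,Steel MF + concrete shear wall
5 Example Street,Concrete shear wall
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
steel_mf_rows = filter_steel_mf(rows)
print([row["address"] for row in steel_mf_rows])
```

Matching the whole label, rather than searching for “MF” inside it, is what keeps the combination systems out of the count.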

Then I checked the addresses against the story, added polygons for the nine new addresses to the previous uMap, downloaded it as a .KML file and started playing around in Google Maps.
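Checking one address list against another is the kind of step a few lines of Python can handle. This is a rough sketch, not what I actually did (that was eyeballing). The function names and the abbreviation table are assumptions, and “100 Example Street” is a placeholder rather than a real building from either list.

```python
import re

# Common suffix abbreviations, expanded so "St." and "Street" compare equal.
SUFFIXES = {"st": "street", "ave": "avenue", "blvd": "boulevard"}


def normalize_address(address):
    """Lowercase, drop periods and commas, and expand suffix abbreviations."""
    words = re.sub(r"[.,]", " ", address.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


def missing_from_story(report_addresses, story_addresses):
    """Return report addresses with no match in the story list, in report order."""
    story_keys = {normalize_address(a) for a in story_addresses}
    return [a for a in report_addresses if normalize_address(a) not in story_keys]


# Two addresses from my list of nine, plus one made-up placeholder.
report_sample = ["225 Bush Street", "45 Fremont Street", "100 Example Street"]
story_sample = ["100 Example St."]
print(missing_from_story(report_sample, story_sample))
```

Normalizing first matters because the two sources don’t have to spell an address the same way, and a naive comparison would flag every “St.” versus “Street” as a discrepancy.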

The resulting map is a little disappointing. For starters, the polygons from uMap (which uses OpenStreetMap) don’t jibe that well with Google. As for the images – since the real a-ha if you live or work in San Francisco is how many of these buildings you’re in or around – I always forget how bad these are in the noob version of Google Maps. When you’re editing in the map, they are Polaroid-style pop-ups that resize whatever pic you throw in. The published version looks nothing like that and the overall effect with these building shots (all vertical) is horrific. Ugh. There’s no way to resize the window from this version of Google Maps – the alternatives are Google Fusion Tables (which wouldn’t solve the problem here since AFAIK it works with points, not polygons) or programming via the Google Maps API.

Why this happened

So how did the New York Times undercount the number of especially shaky high rises? Going on my experience with newsrooms (long) and with data (short but painful) my first guess is that the USGS mistakenly gave the Times an Excel or .CSV file that was different from what ended up in the final report.

The reporter knew there were enough buildings to warrant a story, somewhere around 40. The graphics person had the file and made the map, and those numbers were plugged into the story and fact-checked without going back to the published report.

Or there was some glitch between the formats – given how annoying it is to get information out of a .PDF and into anything else, that’s easy enough to imagine. Data cleaning is the least interesting, most tedious part of any project. In this case, if I’m right, there are roughly 20 percent more risky buildings than originally reported.
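About that 20 percent: it’s a round number, and the exact figure depends on which count you treat as the baseline. A quick sketch of the arithmetic, using the 39 buildings in the story and the 48 I counted in the report (the variable names are just mine):

```python
story_count = 39   # buildings listed in the NYT story
report_count = 48  # plain steel moment-frame buildings I counted in the report

extra = report_count - story_count

# Share relative to the story's count: how many more buildings than reported.
pct_more_than_story = extra / story_count * 100

# Share relative to the report's count: how much of the list the story missed.
pct_of_report_missing = extra / report_count * 100

print(f"{extra} extra buildings")
print(f"{pct_more_than_story:.1f}% more than the story's count")
print(f"{pct_of_report_missing:.1f}% of the report's count missing from the story")
```

Measured against the story it comes to about 23 percent more buildings; measured against the report, a bit under 19 percent of the list went missing. Either way, it rounds to 20.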

A quickie map of San Francisco’s earthquake prone skyscrapers


See full screen – search for San Francisco if you see a world map.

The New York Times recently ran a story about San Francisco high rises – mostly downtown and South of Market – with steel frames that harbor particular risk in a quake of magnitude seven or higher. About 40 of these skyscrapers, erected before a 1994 building code outlawed a flawed welding technique, were cited in an April USGS report.

It’s one of those stories that could’ve used an interactive map at its core, but instead (it’s the news business, kid!) the map was a small, static graphic (see below) and the story ended with a list of the addresses.

Image courtesy NYT.

So here’s a simple map of those 39 steel moment-frame buildings. A few necessary caveats: this is the handiwork of a casual mapper trying out a new tool. I’ve been looking for a way to use OpenStreetMap to make personalized maps and spotted some earthquake maps from the Japanese OSM community with uMap, so it seemed worth a try. It was heavy going for a map made on the fly – the polygon tool was clunky and importing the list as a cleaned up .CSV wasn’t happening.

Still, a few things pop out: A few of these risky buildings are also near construction sites. In OSM, these are shown in sage green. (The light green represents parks.)

The struggle to use the uMap polygon tool is real. This is a closeup of 550 California Street, with a 19-story office building under construction nearby.

The Folsom Bay Tower will be a 39-story, 422-foot (129 m) residential skyscraper.

Park Tower at Transbay will have 43 stories. First & Mission’s Oceanwide Center features a 636-foot-tall tower on Mission at First Street and a 910-foot-tall tower on the opposite corner on First Street.

And much like the reporter, who was shocked to discover the NYT offices are in one of these buildings, I had a few a-ha moments. A family member works in one and I’ve been inside at least a handful recently – an event at Autodesk, a movie at Embarcadero Center, a Wikimedia meetup, a visit with a friend staying at the Marriott, three or four exits from the Montgomery Street Station right in front of one, etc.

It’s an unscientific sample size of one (well, two if you count the reporter) but I would wager that most people who live or work in San Francisco are around, if not inside, these buildings frequently.

Pompeian red? It’s actually ochre, researchers say

Pompeii: all about ochre?

Those rich reds adorning paintings in Pompeii were originally ochre – Italian researchers say they now think that sensuous Pompeian red is the result of an accident.

Researchers at the national science council (CNR) say the original signature color at the ill-fated city of Pompeii was probably yellow – ochre to be specific.

Before Mount Vesuvius blew its top in 79 A.D. and buried the city, it emitted high-temperature gas which turned the original yellow color that dark red. It’s not an entirely new discovery – ochre was also the main color at Herculaneum, a sister city also buried by Vesuvius.

“Thanks to the investigations we have ascertained that the symbolic color of the archaeological sites in Campania is the result of the action of high temperature gas leakage which preceded the eruption of Vesuvius in 79 A.D.,” says Sergio Omarini of CNR.
“Experts already knew about the color alteration, but this research finally makes it possible to quantify the extent of it.”

Researchers went back to texts by Pliny and Vitruvius to see how their contemporaries made red: with cinnabar (a mercury compound) and red lead (a lead compound), the rarest and most expensive pigments, mainly used in the paintings.

To check out the composition in the paintings, scientists used a non-invasive X-ray fluorescence (XRF) spectrometer that reveals the presence of chemical elements that exclude red lead and cinnabar – leading them to believe ochre was the original color.

Somehow Pompeian ochre just doesn’t lend the same tone.

Got a buck? Help out real-life Da Vinci Code quest for lost Leonardo

This is just about as cheap a thrill as they get: by pledging even just a dollar, you can help fund a project to find a lost Leonardo Da Vinci fresco in Florence, Italy.

Photographer Dave Yoder has been working for a number of years on a quest funded by the National Geographic Society to uncover The Battle of Anghiari in Palazzo Vecchio.

Because of the complications of doing just about anything in Italy – this involves going between ancient palazzo walls, after all – it requires expensive expertise.

He’s put up a Kickstarter page to fund the sci-fi movie-worthy gamma camera needed to locate the painting which probably lies between the walls. (It seems Vasari couldn’t bring himself to cover Leonardo’s masterpiece when commissioned to paint over it in 1563).

Dave is a friend and it’s a fascinating project – one I also enjoyed reporting on – so I hope you’ll consider kicking in what you might spend on a cappuccino. Higher pledges, $35 and up, will earn you an e-book or prints of the project.

You can check out his pics on the project so far here.

To donate or for more information, see Kickstarter

Check out Soundtracker, like Pandora for Italian music


As someone who has a hard time remembering what it was like to listen to music before you could hit “shuffle” or curate a digital playlist, I’m a big fan of automated music recommendation and Internet radio service Pandora.

But that streaming service offers almost no Italian music, whether you want classic folk, pop power ballads or moody dubs in dialect.

Enter Soundtracker, launched in 2010 by two Italian entrepreneurs. Best part: it offers a lot more than just Italian music and the interface is in English.

Register for the site (it’s free) and start listening to artists you know before stone-stepping to those you don’t.

Start with Pino Daniele and you’ll soon be listening to Quintorigo, Almamegretta, 99 Posse and Bandabardo’.

Not sure how the algorithm works, but it seems a little more freewheeling than Pandora: starting with a station for Fabrizio De André, the ’70s melodic rocker with a social conscience, got me to an aggro hip-hop number from Caparezza in under four tracks.

You can also download it as an app for your iPhone or Windows Phone 7 and, if you’re so inclined, share your location and tracks with your friends.

Buon ascolto!

Social Media? Italians Prefer Chatting in Cafes

If you’ve spent any time in Italy, the results of a new survey won’t surprise you: Italians still prefer socializing in person, usually at the neighborhood cafe, to social media.

A poll of some 1,200 people by apéritif maker Sanbitter – via Facebook – found that most Italians still prefer to discuss the matters of the day in person at a cafe before heading online to update their far-flung friends and relatives about it.

What are Italians hashing out over a caffè macchiato or a glass of Prosecco before tweeting about it?

Nearly half (48%) are talking politics, 42% discuss sports (read: soccer), while work, gossip and shopping are about the same (37%, 35%, 33% respectively). Last but not least, movies 25%.

Social media will get a strong foothold in the boot country, probably sooner rather than later. There are already more cell phones than Italians and the national penchant for updating via SMS messages has produced everything from poetry contests to price checks and charity efforts.

And, let’s not forget, the Italian fascination with social media led to the first movie ever about Facebook, a 2009 romantic comedy of errors called “Feisbum.”

Caravaggio’s Bacchus Seduces in Hi-Res Imagery

By Nicole Martinelli

Tourists have long crowded in museums to admire Caravaggio’s Bacchus, but a new 3.4 billion-pixel image of the painting allows for an amazingly detailed look at an old master’s work from your computer screen.

It’s the first in a series of super-high-resolution digital versions of masterpieces from Italy’s Uffizi Gallery, including Sandro Botticelli’s The Birth of Venus and the Annunciation by Leonardo da Vinci.

This image of Bacchus makes Michelangelo Merisi da Caravaggio’s revolutionary realism — as seen in the gritty fingernails of his reclining model in the sensual painting nicknamed “drunk Bacchus” — easy to zoom in on and linger over.

Minute details usually mulled over by art historians, such as the rumored self-portrait of the artist reflected in the wine decanter, are just a few clicks away. The Tuesday launch is a kind of love letter to the Baroque bad boy, believed born on this day in 1571.

It’s the latest project from HAL9000, a company specializing in art photography that captured a high-res version of Leonardo Da Vinci’s The Last Supper three years ago.

Italy’s New Driving Laws: Go Faster, Just Don’t Drink

The Italian government recently passed a series of strict new driving laws that will affect locals and tourists on the roads in the Bel Paese.

A few of the new rules to keep in mind:

  • DUIs. No more jail time for drivers with a blood alcohol content (BAC) of 0.05 to 0.08 percent (the lower limit is already stricter than many places, including the US), but fines are a lot heftier, ranging from 500 to 2,000 euros. (In lieu of jail time, there are plans to institute community service and driver’s ed courses.) Those fines double if you cause an accident, and your car can also be impounded for up to 180 days. If you cause an accident with a BAC of 1.5 grams per liter (0.15 percent), your license will be suspended for two years. If your driver’s license is suspended for drunk driving, forget about driving anything for a while: you can no longer drive a scooter or mini car (like an Ape), either. Drivers under age 21 or anyone who hasn’t had a license for more than three years cannot drink alcohol and drive – period. Fines for these drivers with a BAC of “zero to 0.5” grams per liter range from 155 to 624 euros, double if they cause an accident and increase sharply along with BAC levels.
  • Drugs. Jail time has been doubled for drivers found under the influence of drugs, from three to six months. Convicted drug users will have their licenses revoked — instead of suspended as previously — if they are found at fault in an accident. Police officers will also have drug-test kits with them instead of taking suspected drug users in for hospital tests.
  • Speed limits. The speed limit remains 130 km/h (80 mph) on most Italian autostrade, but shoots up to 150 km/h (about 93 mph) on autostrade with “tutor” speed cameras installed.
  • Scooters. Riders are now required to wear goggles or eye protection “where necessary.” Scooter licenses will also require a practical driving test.
  • Bicycles. Cyclists are now required to wear reflective vests at night.

As far as I know, the complete law hasn’t been published in English yet. The Transport Ministry has a complete list of all the articles in the law; you could do worse than run it through Google Translate in the meantime.

Photo used with a Creative Commons license, thanks to cruelgargle on flickr.

High-Tech Referee Help for Soccer

It won’t be able to change the contested calls in the World Cup, but scientists at Italy’s National Research Council are working on a host of non-invasive solutions that would help referees judge games.

In Bari, at the Institute of Intelligent Systems for Automation (Issia), researchers are perfecting a prototype system that has already been tested on the field for games of the Udine team.

It’s basically about 10 high-speed cameras in what are typically referee blind spots. There are four cameras aimed at catching “phantom” goals and either six or eight to judge those ever-shifting offsides violations.

The high-speed cameras capture about 200 images per second and are fully automatic. They can record, process and transmit video sequences in just a few seconds and send results wirelessly to the linesman.

It’s about time to end these hair-pulling, damning-the-ref moments, right?

But until now there has been a lot of resistance to implementing these systems.
Back in 2004, I wrote for Newsweek about a similar computer-based system that Italian researchers were hoping the national teams and FIFA would adopt.

I had no idea that it would be such a controversial story — with Italian league officials refusing to speak about it and the FIFA flak brushing off the idea of tech referee help by saying, “Football is a game played by humans that should be judged by humans.”

Another concern – one that didn’t make it into the article – was that these systems would be so expensive that some countries wouldn’t be able to afford them, resulting in a de facto major league based on economics. It would ruin the global aspect of the sport if the games were judged more precisely and differently in just a handful of countries where the game is played.

After forcefully resisting technology, FIFA president Sepp Blatter is now at least willing to consider it, telling AP that “we have to open again this file, definitely.”

Video: Italian Hand Speak

Inspired by Sara Rosso’s video of Italians dancing with their hands, I took my Flip HD out to Milan’s Piazza Duomo to capture a bit of hand jive for practice.

A couple of random observations: most of the pairs, for as much as they vary in age, sex, etc., have one person doing the talking and the gesticulating. Non-Italians often think everyone here flails with their arms as they speak, but as you can see, the movements are more like punctuation: concise, controlled, specific.

My favorite is probably the guy near the metro stairs who “draws” elaborate figures while entertaining his friend. This guy really did quite a dance around with his arms and was hypnotizing to watch.

This was a lot harder to shoot than I would’ve thought: even in Milan, where you can easily stumble out for a cappuccino and find yourself in a fashion shoot, a movie set or someone’s holiday snaps, people are aware you’re filming them. (Sara turned her camera on some relatives for those great close-ups.)

I’d like to shoot a companion version in Southern Italy for contrast — next time I’m closer to the Boot heel I will — but I expect that there it’ll be even more challenging to get close enough with such a small camera.