Painful lessons in data journalism: scraping with Python


Lost in the woods. CC-licensed, Chris-Håvard Berge on Flickr.

Lost and found ads can be a good way to sniff out a story.

Take the ones on Craigslist about iPhones. There’s a woman who gained a husband in a quickie wedding at city hall but left her iPhone behind. Or a drunk college kid who dropped his phone on the passenger seat of the Good Samaritan who drove him home.

Is there a bigger story about lost and stolen iPhones? To find out, I scraped all 50 states of Craigslist lost and found ads using Python and BeautifulSoup. If you want to check out or improve that code, it’s on GitHub. The full story (with charts and things!) is over at Cult of Mac.
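
For readers who have never seen a scraper, here is a minimal sketch of the core step: pull ad titles out of a page and keep the ones that mention iPhones. It is not the code from the GitHub repo. It swaps BeautifulSoup for the standard library’s html.parser so it runs without installing anything, and the sample markup, the "ad-title" class and the function names are all made up for illustration; real Craigslist pages are structured differently.

```python
from html.parser import HTMLParser

# Made-up sample markup (an assumption for this sketch);
# real Craigslist listing pages use different tags and classes.
SAMPLE_HTML = """
<ul>
  <li><a class="ad-title" href="/laf/1.html">Lost iPhone 4 at city hall</a></li>
  <li><a class="ad-title" href="/laf/2.html">Found: set of keys</a></li>
  <li><a class="ad-title" href="/laf/3.html">FOUND iphone on passenger seat</a></li>
</ul>
"""


class AdTitleParser(HTMLParser):
    """Collect the text of every <a class="ad-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a" and dict(attrs).get("class") == "ad-title":
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Text can arrive in several chunks, so append to the current title.
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False


def iphone_ads(html):
    """Return the ad titles that mention 'iphone', ignoring case."""
    parser = AdTitleParser()
    parser.feed(html)
    return [t.strip() for t in parser.titles if "iphone" in t.lower()]


if __name__ == "__main__":
    for title in iphone_ads(SAMPLE_HTML):
        print(title)
```

A real run would download each state’s listing pages first and feed the HTML into the same kind of filter.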

The project required more fist clenching and eye straining than anticipated – even though writing a basic scraper for Craigslist is considered an easy-peasy programming project.

Let me just say it: as a novice Pythonista, I am challenged by nearly everything. I mean, command line interface, seriously? But I can get past that. I slogged through (and recommend) Learn Python the Hard Way, and finished some of the examples in Scraping for Journalists.

Check out Soundtracker, like Pandora for Italian music


As someone who has a hard time remembering what it was like to listen to music before you could hit “shuffle” or curate a digital playlist, I’m a big fan of Pandora, the automated music recommendation and Internet radio service.

But that streaming service offers almost no Italian music, whether you want classic folk, pop power ballads or moody dubs in dialect.

Enter Soundtracker, launched in 2010 by two Italian entrepreneurs. Best part: it offers a lot more than just Italian music and the interface is in English.

Register for the site (it’s free) and start listening to artists you know before stone-stepping to those you don’t.

Start with Pino Daniele and you’ll soon be listening to Quintorigo, Almamegretta, 99 Posse and Bandabardò.

I’m not sure how the algorithm works, but it seems a little more freewheeling than Pandora’s: a station seeded with Fabrizio De André, the melodic ’70s rocker with a social conscience, got me to an aggro hip-hop number from Caparezza in under four tracks.

You can also download it as an app for your iPhone or Windows Phone 7 and, if you’re so inclined, share your location and tracks with your friends.

Buon ascolto!