Painful lessons in data journalism: scraping with Python


Lost in the woods. CC-licensed, Chris-Håvard Berge on Flickr.

Lost and found ads can be a good way to sniff out a story.

Take the ones on Craigslist about iPhones. There’s a woman who gained a husband in a quickie wedding at city hall but left her iPhone behind. Or a drunk college kid who dropped his phone on the passenger seat of the good Samaritan who drove him home.

Is there a bigger story about lost and stolen iPhones? To find out, I scraped Craigslist lost and found ads from all 50 states using Python and BeautifulSoup. If you want to check out or improve that code, it’s on GitHub. Here’s the full story, with charts and things!

The project required more fist clenching and eye straining than I anticipated, even though writing a basic scraper for Craigslist is considered an easy-peasy programming project.
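For a sense of what a "basic" scraper involves, here is a minimal sketch that pulls ad titles out of a listing page and keeps the ones that mention an iPhone. It is not the code from the GitHub repo. The HTML snippet and its `result-row`/`result-title` class names are hypothetical stand-ins for Craigslist's real (and frequently changing) markup, and to stay runnable without installing anything it uses Python's built-in `html.parser` instead of BeautifulSoup, which the project actually used. Fetching pages from each state's site is left out.

```python
from html.parser import HTMLParser

# Hypothetical listing markup: real Craigslist HTML differs and changes over
# time, so these tag and class names are assumptions for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="result-row"><a class="result-title" href="/lost/1.html">Lost iPhone at city hall</a></li>
  <li class="result-row"><a class="result-title" href="/lost/2.html">Found: black cat near park</a></li>
  <li class="result-row"><a class="result-title" href="/lost/3.html">FOUND IPHONE in my car</a></li>
</ul>
"""


class AdTitleParser(HTMLParser):
    """Collect (title, href) pairs from <a class="result-title"> links."""

    def __init__(self):
        super().__init__()
        self.ads = []      # list of (title, href) tuples
        self._href = None  # href of the link we are currently inside, if any
        self._text = []    # text chunks seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result-title" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.ads.append(("".join(self._text).strip(), self._href))
            self._href = None


def iphone_ads(html):
    """Return the ads whose title mentions an iPhone (case-insensitive)."""
    parser = AdTitleParser()
    parser.feed(html)
    parser.close()
    return [(title, href) for title, href in parser.ads if "iphone" in title.lower()]


if __name__ == "__main__":
    for title, href in iphone_ads(SAMPLE_HTML):
        print(title, href)
```

With BeautifulSoup, the whole parser class shrinks to roughly one `soup.find_all("a", class_="result-title")` call, which is a big part of why the library is the usual choice for this kind of job.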

Let me just say it: as a novice Pythonista, I am challenged by nearly everything. I mean, command line interface, seriously? But I can get past that. I slogged through (and recommend) Learning Python the Hard Way, and worked through some of the examples in Scraping for Journalists.