This is part 1 of a multi-part series. Part 2 is here.
Bike theft sucks. For quite some time I’ve wanted to compile some stats on bike theft to understand how and where it happens and how it can be prevented. I archived San Francisco craigslist listings in the “bikes” section with “stolen” in the title, from July 2nd, 2011 until October 17th, 2011. This includes listings from all over the Bay Area: San Jose, Mountain View, Oakland, Berkeley, San Leandro, and so on. I was curious about how bikes get stolen and what patterns show up in the posts and the incidents they describe. I’ve compiled a little graph that shows the frequency of words in the listings. I used a messy combination of Google Refine, Google Spreadsheets, OpenOffice Calc and a word frequency counter that was a bit slow but quite useful. The data consists of about 633 entries.
We’ll see what other data I can squeeze out of the set in the future.
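The selection step is simple enough to sketch in code. Below is a minimal Python sketch, assuming the archived posts have already been saved as records with a posting date, a title and a body; the `Listing` type and its fields are hypothetical and not how the listings were actually stored. It keeps posts from the July 2nd to October 17th, 2011 window whose title contains “stolen”, ignoring case.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Listing:
    # Hypothetical record shape for one archived craigslist post.
    posted: date
    title: str
    body: str


# Archive window described in the post (both ends inclusive).
START = date(2011, 7, 2)
END = date(2011, 10, 17)


def select_stolen(listings):
    """Keep listings posted in [START, END] whose title mentions 'stolen' (case-insensitive)."""
    return [
        item
        for item in listings
        if START <= item.posted <= END and "stolen" in item.title.lower()
    ]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        Listing(date(2011, 7, 5), "STOLEN road bike", ""),
        Listing(date(2011, 6, 30), "Stolen bike", ""),        # before the window
        Listing(date(2011, 8, 1), "Road bike for sale", ""),  # no "stolen" in title
    ]
    for item in select_stolen(sample):
        print(item.posted, item.title)
```

A plain substring check is the simplest stand-in for craigslist’s own title search; it will also match words that merely contain “stolen”, which is rarely a problem in practice.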
I had to do some manual cleaning of the data. I purposely omitted words shorter than four letters, like “the” and “a”, since they probably wouldn’t have been very informative. I also removed various HTML-related words like “http”, “href” and “nofollow”.
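The cleaning and counting can be sketched the same way. This is a minimal Python sketch of a word frequency counter, not the tool actually used for this post: it lowercases each listing, splits it into runs of letters, skips words shorter than four letters, and drops a small stop list of HTML debris. `HTML_NOISE` and `MIN_LENGTH` are hypothetical names; the stop list only contains the three words mentioned above.

```python
import re
from collections import Counter

# Hypothetical stop list of HTML debris; only the words named in the post.
HTML_NOISE = {"http", "href", "nofollow"}
# Words shorter than this many letters are skipped.
MIN_LENGTH = 4


def word_counts(texts):
    """Count lowercase words across all listing texts, skipping short words and HTML debris."""
    counts = Counter()
    for text in texts:
        # Assumption: a "word" is a run of ASCII letters; digits and punctuation split words.
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= MIN_LENGTH and word not in HTML_NOISE:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample posts, for illustration only.
    posts = [
        'Stolen black bike, please help! <a href="http://example.com/pic" rel="nofollow">photo</a>',
        "Reward offered for the return of my stolen bike. Thanks!",
    ]
    for word, n in word_counts(posts).most_common(5):
        print(f"{word}\t{n}")
```

Splitting on runs of letters is a simplifying assumption: it breaks up contractions, ignores numbers, and still lets fragments of URLs (domain names, for example) through, so some manual cleaning is likely needed either way.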
Here’s a graph of the word counts:
and here’s the corresponding table:
Some interesting things:
A lot of the terms are fairly predictable, but some words stick out. Most of the posts are requests for help, and the words reflect this: “please”, “reward”, “thanks”, “return”. As far as bike companies go, “shimano” is the most frequently mentioned component maker in these listings, and “specialized” the most frequently mentioned bicycle manufacturer. “Black” is the most common color, but “white”, “blue” and “silver” also show up. Another word that stands out is “photobucket”, the popular image-sharing site.
I’m going to keep poring over this data to see if anything else interesting emerges.