"We have 458,832 pages of documents. 23,654 of you have reviewed 206,594 of them. Only 252,238 to go..."This is a great example of crowdsourcing in journalism. Guardian posted online half a million MPs' receipts and claim forms and ask readers to review and report fishy ones for investigation. As you can see by the numbers, the public has responded and the discoveries are fascinating to read unless you are an MP. This kind of journalism deserves to be called "journalism by the people and for the people."
[Aug 26 update] I heard this morning on NPR that the New York Times and the Washington Post are also crowdsourcing: they are asking readers to help dig through hundreds of pages of the CIA inspector general's reports on the agency's interrogation program, which the Department of Justice released on Monday this week.
#2: Interactive-MP-Media-Appearance-Timeline: mashup from BBC and Guardian's websites
#3: Mashup screencasts
"By crawling this data using the MP's name as the search key we were able to extract information about the TV and radio programmes in which a given MP had appeared. ... This project shows how powerful the linked-data concept is when used in conjunction with other data that has been exposed in a similar way. As more media organisations expose their domains in this manner, more interesting and wide-reaching visualisations and web-applications can be built."
This is another good example of data mining journalism.
Interesting features can emerge from digging into a media organization's own archive and, better yet, from linking it to other media organizations' data. Public broadcasting has a vast archive that could be a gold mine for data mining journalism. But to turn it into gold, we have to overcome several big obstacles.
- Digitization. Most of the archive is still on tape or in other "ancient" formats. Digitizing it will be costly and time-consuming, but the work has begun under the American Archive Project.
- Metadata standardization. For effective data mining journalism, digital content scattered across hundreds of stations has to be stored and catalogued using the same metadata scheme so that computers can quickly search, assemble and present it in a meaningful way.
- Technical infrastructure and training. When the gold mine is ready, we will still need machinery and skilled workers to dig it. Ideally, the technology will be so easy to use that journalists and the public can handle it without much training.
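To make the metadata point concrete, here is a minimal sketch of what "using the same metadata scheme" could look like in practice. Everything in it is an assumption for illustration: the `ArchiveRecord` fields, the station names, and the per-station field names are hypothetical and are not taken from the American Archive Project or any real station catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical shared schema; the field names are illustrative only."""
    title: str
    station: str
    air_date: str      # assumed to already be an ISO 8601 date string
    subjects: tuple    # lower-cased subject keywords


# Hypothetical mapping from each station's own field names to the shared ones.
FIELD_MAPS = {
    "station_a": {"title": "ProgramTitle", "air_date": "Aired", "subjects": "Topics"},
    "station_b": {"title": "name", "air_date": "broadcast_date", "subjects": "keywords"},
}


def normalize(station, raw):
    """Convert one station-specific catalogue entry into the shared schema."""
    fields = FIELD_MAPS[station]
    subjects = tuple(s.strip().lower() for s in raw[fields["subjects"]].split(","))
    return ArchiveRecord(
        title=raw[fields["title"]].strip(),
        station=station,
        air_date=raw[fields["air_date"]],
        subjects=subjects,
    )


def search(records, subject):
    """Return every record tagged with the given subject, from any station."""
    wanted = subject.lower()
    return [r for r in records if wanted in r.subjects]


records = [
    normalize("station_a", {"ProgramTitle": "City Budget Hearing",
                            "Aired": "1994-03-15",
                            "Topics": "Budget, Local Government"}),
    normalize("station_b", {"name": "budget roundtable ",
                            "broadcast_date": "2001-11-02",
                            "keywords": "budget,taxes"}),
]
matches = search(records, "budget")
```

Once both entries are in the same shape, a single `search` call can find related programmes across stations, which is the kind of cross-archive query the obstacle list is about.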
It's a charming and intriguing profile of the left (Jim Buckmaster, CEO) and right (Craig Newmark, founder) halves of the brain of Craigslist. Newmark calls himself "the Forrest Gump of the Internet," and Buckmaster writes haiku to fight spammers. They are nothing like typical business executives, yet they are very successful at what they do. One key principle they stubbornly adhere to is listening to their users, even when what they hear runs against prevailing norms and trends. They never go against what their users tell them.
- "With more than 47 million unique users every month in the US alone—nearly a fifth of the nation's adult population—it is the most important community site going and yet the most underdeveloped. Think of any Web feature that has become popular in the past 10 years: Chances are craigslist has considered it and rejected it. If you try to build a third-party application designed to make craigslist work better, the management will almost certainly throw up technical roadblocks to shut you down."
- "Craigslist gets more traffic than either eBay or Amazon .com. eBay has more than 16,000 employees. Amazon has more than 20,000. Craigslist has 30."
- "Jim Buckmaster is tall and thin, Newmark is short and round, and when they stand together they look like a binary number."
- "Only programmers, customer service reps, and accounting staff work at craigslist. There is no business development, no human resources, no sales. As a result, there are no meetings."