After years of struggling to beat Netflix's Cinematch recommendation algorithm by the required 10%, two groups have emerged. While both teams produced qualifying systems, BellKor's Pragmatic Chaos submitted its entry 24 minutes earlier than second-place team The Ensemble. Earlier this year ReadWriteWeb covered the Netflix Prize and asked the question, "Will the $1 million dollars be won in 2009?" While the answer is a resounding "yes", it was not January forerunner BellKor that took the prize, but rather an amalgamation of 4 teams that triumphed.
As reported a month ago, a group made up of researchers from AT&T, Yahoo! Research Israel, Commendo Research and Consulting in Austria, and Montreal's Pragmatic Theory announced having beaten Cinematch by 10%. As per the Netflix Prize rules, other teams were given 30 days to submit their entries before a winner was declared. With only 24 hours before the contest deadline, two teams jockeyed for position on the Netflix Prize leaderboard. BellKor posted both an additional Netflix submission and a blog post documenting those last excitement-filled hours of the competition.
Of the thousands of entries, Gavin Potter, a retired management consultant with no formal machine-learning training, managed to rise to number 17 on the Netflix Prize Leaderboard. Potter writes, "The competition has trained several hundred, if not more, people how to properly implement machine learning algorithms on a real world, large scale dataset...This is, almost undoubtedly, the world's largest set of data on repeated decision making and it's ripe for analysis. The analysis may not win the competition, but it sure should provide some insights into the way that humans make decisions."
The public knowledge acquired from the process of producing these algorithms will not only affect Netflix's ability to suggest customer desires across its movie titles, but it will also form a baseline for other business systems. In addition to streaming entertainment providers, companies like Amazon and Pandora have worked hard to produce the best possible predictive technologies. If these companies can tap into our unique tastes, they can suggest products and services we didn't even know we wanted. So a 10% improvement on recommendations can equate to a lucrative sales increase.
A second, shorter-term Netflix Prize is expected in the near future. According to the New York Times' Steve Lohr, the Netflix Prize 2 will be concerned with "taste profiles" based on demographic and behavioral data.
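For readers wondering what "beating Cinematch by 10%" means in practice: the Netflix Prize scored entries by root mean squared error (RMSE) on held-out ratings, a detail the coverage above does not spell out. The sketch below shows how such a percentage could be computed. The ratings and both sets of predictions are made up for illustration and are not contest data; the function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which the candidate lowers the baseline's RMSE."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up 1-5 star ratings and predictions, for illustration only.
actual = [4, 3, 5, 2, 1]
baseline = [3.0, 3.0, 4.0, 3.0, 2.0]    # hypothetical baseline system
candidate = [3.5, 3.0, 4.5, 2.5, 1.5]   # hypothetical challenger

gain = improvement_pct(rmse(baseline, actual), rmse(candidate, actual))
qualifies = gain >= 10.0  # the threshold quoted in the article
print(f"improvement: {gain:.1f}%, qualifies: {qualifies}")
```

Because the comparison is relative to the baseline's own error, a 10% gain gets harder to find as the baseline gets better, which helps explain why the contest ran for years.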
Are you pissed off about potholes, graffiti or broken street lights? Similar to the Federal government's efforts with Data.gov and Google's recent Public Sector release, CitySourced is offering users a chance to take government matters into their own hands. This year's TC50 third-place runner-up, CitySourced is a crowd pleaser on a number of levels. If you're the type of person who writes letters to congressmen, editors and councillors, you're likely to help power this app.
CitySourced offers citizens a chance to photograph their local pet peeves directly from their iPhones. Users send their pictures and complaints to their local municipalities with a couple of clicks. From here, governments can recognize the needs of their constituencies and are forced to take action.
While programs like Apps for America and Apps for Democracy work to crowd source programmer-driven applications, CitySourced can be utilized by a non-technical user. In addition to the decision-making data being generated from this service, cities also offer users an active outlet for their frustrations. Instead of sending out arbitrary rants and suggestions to their Twitter accounts, users still get a chance to complain while receiving a direct line to their municipal reps. If cities have the courage to make these complaints public, the site could become as entertaining as Craigslist's Best Of page while still maintaining its usefulness.
If you're a lobbyist / advocate, conspiracy theorist or Freakonomics fan, then you'll love DataMasher. The map-based mash up site just took the Sunlight Foundation's $10,000 grand prize in the Apps for America 2: The Data.gov Challenge. DataMasher offers users with no programming experience a chance to compare government data sets on a state-by-state basis. The tool is just one of the 3rd party mash ups using Data.gov's federal government information.
While the last Apps for America challenge focused on Congressional tracking, this new challenge encouraged participants to use Data.gov's raw machine-readable data. Developers pulled stats from a slew of Federal agencies including the Bureau of Justice, the Bureau of Transportation and the Agency for Healthcare Research and Quality. Although this may seem like an easy feat, a number of government and semi-public agencies have been criticized for refusing to standardize public data. This recent Apps for America challenge is meant to encourage government transparency on all levels for the purpose of creating new citizen-driven solutions. Below are the winners of the challenge:
First Place: DataMasher: This site offers an easy-to-use interface that allows regular citizens to combine and mix data sets without any programming knowledge. From here, data is displayed on a State-by-State basis in map format. Compare cancer hot spots to CO2 emissions, SAT scores to crime rates and even political contributions to State spending.
Second Place: Gov Pulse: This application allows users to browse the Federal Register and create feeds on the most important proposals and information. Users can browse the latest government-related notices, respond to regulatory amendments and comment on everything from endangered species to homeland security.
Third Place: This We Know: This application gives you government-related info based on your zip code. It offers information on the number of factories within a 7 mile radius, the number of pounds of pollutants released, violent crime rates, cancer rates and related bills in Congress. This would actually be a great tool for environmental health advocates looking to make the connection between cancer hot spots and chemical pollutants.
Best Data Visualization: Quakespotter: This site creates a 3D visualization of earthquakes and matches it to data taken from those areas on Twitter.
According to this year's Comscore stats, consumer publishing platform Wikia has surpassed DIY social network competitor Ning for monthly unique visitors. Since July 2008 the company's traffic has more than doubled from 2.8 million to 6.5 million unique US visitors per month. Despite abandoning Wikia search in early March, it seems Wikipedia co-founder Jimmy Wales has built another great company. As of this evening, Wikia's CEO Gil Penchina is announcing the company's profitability due to its custom sponsorships program.
Says Penchina, "I'm sick and tired of hearing about these dead pooled companies. In this type of economy we're excited to announce our growth and profitability. I think we're about to see a bunch of success stories. Silicon Valley is finally getting its mojo back."
While Wikia hosts nearly 3 million pages of content with a number of niche community sites, it's the fan pages that drive the majority of advertising and marketing revenue. Wikia's small team of fewer than 10 sales staff creates packages that consist of everything from branded banner ads to embedded shows and contests. In addition to sponsors like World of Warcraft, a number of television studios are also in partnership talks.
Says Penchina, "In many cases, these sites are like small franchises and the editors are really dedicated. The input we've had from editors regarding advertising are suggestions I generally agree with." In the World of Warcraft Wiki the community has asked that no advertisements be permitted that might negatively affect game play. For this reason, Penchina's team does not allow advertisements for WoW gold.
By providing an environment where die-hard fans and premium brands can coexist, Wikia is doing a great job maintaining its authenticity while also turning a profit. While the service has struggled to establish itself as a separate brand from its Wikipedia origins, it appears that the fan communities have done everything they can to make it a success from the ground up.
Prior to 2001, gilded hard cover encyclopedias were cracked to fact check everything from raptor names to State capitals. Today the world's most popular English encyclopedia is more often used to identify pop culture icons and social media companies. A recent Telegraph article listed the 50 most-viewed Wikipedia articles of 2008 and 2009 and while the results are slightly inaccurate, they're pretty interesting. Below are this year's most visited Wikipedia pages measured in hits per day.
1. Wiki (131,383 page hits per day): For both 2008 and 2009 the "Wiki" page and the Wikipedia page have maintained a spot in the top 10 visited pages. It's fairly safe to say that the majority of visitors to these articles are looking for definitions, community information and editing tips.
2. The Beatles (111,896): In the Telegraph's list for 2008, two different Beatles pages are listed as numbers 14 and 18; however, according to the original Wikistics source statistics the "Beatles" page is ranked at number 20. In 2009, the page became the second most visited page on Wikipedia due to automated requests. The fact that the Fab Four's catalogue is due to be re-released in digitally remastered format within the year also can't hurt page traffic.
3. Michael Jackson (79,734): Not surprisingly, Michael Jackson's page is among the most viewed pages on Wikipedia. The day after Jackson's death the page received 5.9 million views. Of the top 10 most-viewed Wikipedia pages of 2009, Jackson's name is also mentioned on the Deaths in 2009 page and briefly in the Beatles page due to his controversial purchase of most of the Lennon-McCartney Beatles catalog in 1985.
*Favicon.ico (78,077): While the Telegraph article lists this as number 4, it's irrelevant, as the Wikistics stat source notes that the Favicon.ico ranking includes browser-based requests for the Wikipedia icon.
4. YouTube (72,318): Whether looking to cite corporate info or simply interested in finding out what the fuss is all about, Wikipedians have flocked to both YouTube and Facebook pages for the last two years.
6. Barack Obama (49,401): In 2008 the Barack Obama page was the 3rd most visited page on Wikipedia and not surprisingly, interest has dwindled post-election. Sarah Palin's page (64,465) was the 4th most visited page in 2008 and John McCain's page (34,486) was the 13th most visited page.
7. Deaths in 2009 (48,758): Apparently the public is clamoring to remember those they've lost in 2009. Both the Deaths in 2008 page and the 2009 page have made the top 10 list of most visited Wikipedia pages. It looks like memorial sites like My Death Space and Respectance aren't such a strange idea after all.
8. United States (46,545): This page offers basic information on politics, economics, demographics and customs of the United States. With a large population and a large number of Wikipedians hailing from the US, the page is a popular one. Surprisingly it is not listed on the community's most vandalized pages. Meanwhile both the US Democratic Party and Republican Party pages are listed.
10. Wikipedia Current Events Portal (40,962): This page lists daily news topics and the latest Wikinews articles. It is a great source for breaking news stories. The page also links to recent deaths and ongoing events such as the automotive industry crisis.
For the Telegraph's entire list, visit the article. You can also check the rankings against Wikistics' list of yearly page hits for 2008 and 2009.
Another interesting resource is Wikipedia's most popular articles within the last hour. While recently deceased celebrities appeared on this list at the time this article was written, there were definitely some interesting anomalies. For instance, the Ernie Davis Wikipedia page saw a dramatic increase in hits. When cross-referenced against real time search engine Collecta it appears HBO was airing the Ernie Davis biography "The Express". Audience members were simultaneously watching television while searching for Davis' biography.
The Wikimedia Foundation just emailed ReadWriteWeb to announce receipt of $500,000 in grant funding from The William and Flora Hewlett Foundation. The grant is a part of a $100 million program to fund open education resources, and given Wikimedia's mission to encourage the growth, development and distribution of free, multilingual content, the Hewlett Foundation couldn't have chosen a better org.
Wikimedia has contributed to open education in a number of ways, including providing full courses and textbooks through Wikiversity and Wikibooks, as well as learning resources and commons material through Wikisource and Wikicommons.
Three days ago the organization celebrated Wikipedia's 3 millionth English article and 2 days ago it launched its official iPhone app. In a week of landmark announcements, the foundation has managed to charm the Hewlett Foundation and kick start its strategic planning process.
"The Hewlett Foundation's support comes at a critical time," said Wikimedia Foundation's Executive Director Sue Gardner, "We've just begun the planning that will help us identify how to maximize our impact around the world. This support will help us to execute our priorities for the current year, and enable us to plan for the future."
In today's blog post by Chief Strategy Officer Mike Maser, Digg announced that it will be rolling out its beta ad program later this week. In addition to the community's existing banner ads, the company is launching an initial set of ads to appear in rotation with regular content. From here, users will interact with the ads in the same way they interact with articles - by digging, burying and commenting on them. Ads with a high number of Diggs will cost advertisers less, while ads that get buried will cost them more.
Says Maser to the community, "The success of this system depends on your participation and feedback, as it will help advertisers to create the best possible experience for the Digg community. Our goal with Digg Ads is to encourage advertisers to create content as compelling as organic Digg stories, and to give you more control over which ads you see on Digg."
It will be interesting to see which advertisers attempt to game the system by digging their own ads, and how fast these ads will be buried. The official June announcement of the Digg ad program received more than 400 comments within the community, and surprisingly many of them are very positive. While critics argue that the ads will simply be buried and advertisers will stop paying for placement, others called this "marketing democracy." A few commenters pointed to the fact that they already use Adblock - a Firefox extension that allows users to filter out advertising content. Nevertheless, others chastise Adblock users for not supporting the community they enjoy. In a community as opinionated as Digg's, it will be interesting to see how the first users react to this new play for revenue.
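The pricing idea behind Digg Ads (popular ads cost less, buried ads cost more) can be sketched in a few lines. Digg has not published a formula, so everything below is an assumption made for illustration: the function name, the ratio used as a multiplier, and the floor and ceiling values are all hypothetical.

```python
def cost_per_click(base_cpc, diggs, buries, floor=0.25, ceiling=4.0):
    """Hypothetical pricing rule, not Digg's actual formula.

    The base price is scaled down as Diggs accumulate and up as buries
    accumulate; the multiplier is clamped to [floor, ceiling].
    """
    if diggs < 0 or buries < 0:
        raise ValueError("vote counts cannot be negative")
    multiplier = (1 + buries) / (1 + diggs)
    multiplier = max(floor, min(ceiling, multiplier))
    return round(base_cpc * multiplier, 4)


# An ad with no votes pays the base price; votes move it either way.
for diggs, buries in [(0, 0), (3, 1), (9, 0), (0, 9)]:
    print(diggs, buries, cost_per_click(1.00, diggs, buries))
```

The clamp is one possible guard against the gaming scenario raised above: an advertiser digging its own ad can only push the price down to the floor, not to zero.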
Wikipedia is aflutter with angry psychologists demanding that the community take down reproductions of 10 original Rorschach inkblot plates and their statistically common responses. The Rorschach tests have been used since the 1920s to determine psychological disorders through the analysis of images. Twenty-five percent of all forensic cases utilize the Rorschach test in assessing defendant competency and criminal responsibility. According to the New York Times, Dr. James Heilman of Moose Jaw, Saskatchewan originally uploaded the files, and discussion has exploded ever since with doctors on both sides of the argument.
Although Swiss psychologist Hermann Rorschach (the creator of the test) died in 1922, the inkblots are still widely used in personality and psychological assessment today. However, once an image's copyright owner passes away, that image is automatically released into the public domain 70 years after his/her death unless an extension is filed. While many argue that Wikipedia's release of the inkblots invalidates testing and causes potential harm to patients, others argue that the images are already widely accessible and too relevant to the article to omit.
For now, the Wikipedia discussion page states, "Prior discussion has determined that Rorschach inkblots images shall be displayed in this article, and removal of pictures without consensus at Talk:Rorschach test/images [the discussion page] will be reverted."
Times reporter Noam Cohen writes about those against the posted images saying, "For them [the psychologists], the Wikipedia page is the equivalent of posting an answer sheet to next year's SAT."
The fact that both of these tests are based on normative results adds another dimension to the Wikipedia debate - whether or not the inkblot test is a valid metric in the first place. In the late nineties, based on reviewing the demographics of students with the lowest averages in the country, critics called the SAT racist, urban-centric and classist. With the test determining college placement, scholarship eligibility and in some cases, job placement, it remains an important one. For this reason, it was redrafted in 2005 to be more tolerant of diversity and more reflective of classroom curriculum.
With the Rorschach inkblots having been established since the 1920s, what are the chances that each of us isn't already showing signs of major psychosis? If there's a doctor in the house, by all means, let us know if and how the psychological indicators of the test have changed over time.
There's no doubt that a number of those awaiting SATs and psychometric testing might choose to game the system. While higher SAT scores improve college eligibility, average Rorschach inkblot results might alleviate the fear of being estranged from friends and family. Unless the person being psychologically profiled wants to shirk criminal responsibility or can see themselves as a danger to themselves or others, it makes sense to want to establish "normality".
But why is Wikipedia more responsible for protecting Rorschach testing than scientific journals or medical websites? Admittedly, I am not an expert in medicine, psychology or the forensic sciences and I have no idea how these Wikipedia images will affect the patient community. However, as a tech blogger, I understand this issue to be about Wikipedia's dedication to free and educational content - even when that education is widely debated. It will be interesting to see if those against the inkblot posting will be able to reach a consensus to have them removed.
While there are some pretty nifty machine-based language tools out there, no machine will ever trump human translation. Machine-based tools are fine for simple greetings and pleasantries. However, only human translators can help us understand the political and cultural nuances inherent in foreign texts. This is important on two counts. Firstly, rather than bouncing ideas off a culturally insular echo-chamber, we have a chance to learn from others with distinctly different viewpoints. And secondly, for the first time ever, world history moves from being a confined regional fact to an evolving and diverse discussion.
Human translation lets us address collective global issues while also seeing the negative and positive impact of our choices. For this reason a number of groups have come forward to produce open translation (or crowd sourced translation) projects. Here are just a few of those efforts:
1. Project Lingua: This service aims to reduce language barriers on the web. With Project Lingua, volunteers translate alternative media sources from citizen journalists on the Global Voices network.
2. Worldwide Lexicon: This project first parses information with machine translators, then real humans review the translations to ensure they are accurate. From here, the group republishes the sites in a number of languages in order to encourage cross-cultural dialogue. The group also built Der Mundo - what WWL describes as a "general purpose translation community for blog and RSS feeds."
3. WikiProject Echo: WikiProject Echo is a program where volunteer translators contribute their efforts to expanding the scope of Wikipedia. Volunteers will certainly have their hands full translating this amount of data as the site advertises 2.9 million English articles alone.
4. TED Open Translation Project: For polyglots, the TED Open Translation Project is a great way to practice superior language skills while contributing to a cause-worthy project. We're big fans of this educational series. All translators and reviewers are credited on the web page for a talk they've translated.
5. Cucumis: Cucumis also employs volunteer translators and all translation is thoroughly peer-reviewed. Once a translator's work is accepted, they receive points. The points are redeemable for translations from others within the community.
Have you got a few minutes to spare to help make government activities more transparent? Watchdog organization The Sunlight Foundation launched a new project called TransparencyCorps today. Modeled after Amazon's Mechanical Turk, the project asks visitors to perform small tasks that a human can do better than a machine. The first two tasks include summarizing congressional earmark requests in a form and uploading a photo of yourself calling for increased openness in government.
The innovative system is a pleasure to use and is being open sourced for other organizations interested in crowdsourcing similar tasks. You can honestly do something useful and important in 5 minutes or less on this site.
The earmark summary task starts by running earmark request documents through an automated system to fill out a few key data fields, then asks multiple TransparencyCorps users to verify and complete the summaries. Once those fields, like money requested and address of recipient, are filled out, the data will be available in a structured format. That means it will be easier to search, analyze, visualize and mash-up. That's right - your spare minutes could be turned into structured government data for watchdogs and developers to work their magic with. Structured government data enables all kinds of research to be done, including discovery of patterns of official activity that need scrutiny and change.
TransparencyCorps participants get points for every small task they do and can get themselves on a charming leaderboard of "transparency leaders." It's all very cute, but this really is important work.
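Asking several volunteers to fill out the same fields only helps if their answers are reconciled somehow. The article does not say how TransparencyCorps does this, so the following is just one plausible approach, assuming a simple agreement rule: a field is accepted once at least two users give the same answer after light normalization. The field names and sample answers are invented.

```python
from collections import Counter


def reconcile_field(answers, min_agreement=2):
    """Return the most common normalized answer, or None if too few agree."""
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    value, count = Counter(normalized).most_common(1)[0]
    return value if count >= min_agreement else None


def reconcile_summary(user_summaries, min_agreement=2):
    """Merge per-user field dictionaries into one record, field by field."""
    by_field = {}
    for summary in user_summaries:
        for name, answer in summary.items():
            by_field.setdefault(name, []).append(answer)
    return {
        name: reconcile_field(answers, min_agreement)
        for name, answers in by_field.items()
    }


# Invented submissions from three volunteers for the same earmark request.
submissions = [
    {"amount_requested": "$500,000", "recipient_address": "12 Main St"},
    {"amount_requested": "$500,000 ", "recipient_address": "12 Main Street"},
    {"amount_requested": "$500,000", "recipient_address": "12 main st"},
]
print(reconcile_summary(submissions))
```

Fields that come back as `None` did not reach agreement and would need another volunteer's pass. A real system would likely want smarter normalization, since this sketch treats "St" and "Street" as different answers.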
We'd love to see an iPhone app to do this kind of work while waiting for the bus or in the line at the grocery store. How about a Facebook app that pushes out notifications to our friends' newsfeeds: "I just took 2 minutes and summarized a congressional earmark request to fund an environmental study of a proposed industrial park!"
Unlike Mechanical Turk, where there are scads of workers because they get paid small sums, TransparencyCorps volunteers are unpaid. Promotion will no doubt be the site's biggest challenge. If ease of use can be maximized and some effective promotion done, we think this could be a really great project.