An Interview With TweetDeck Founder Iain Dodsworth
A small startup called InfoChimps yesterday put up for sale three very large data sets extracted from 500 million Twitter messages. Included in the offering are the senders and recipients of 1 billion @ messages, Retweets and Favorites. We wrote in depth about the release late last night. This morning we interviewed Iain Dodsworth, creator of the most popular Twitter client, TweetDeck, about the value he might find in that data and the direction he's aiming to take TweetDeck in the future.
Dodsworth: Straight off the bat - an archive of tweets could form the basis of a profiler and that's very interesting. Sentiment analysis (which I am ALL over) requires that kind of base corpus.
RWW: InfoChimps isn't releasing full text yet, but they would do a custom slice if you wanted it.
Dodsworth: It's the historical element that a large number of services are missing and where they will fall flat - analysis based on the last few hundred tweets is almost pointless.
RWW: I'm curious what "a profiler" might mean to you and what this data could help make possible in those terms.
Dodsworth: For me a true profiler would be akin to the holy grail - we would analyse who a person converses with, who RTs them the most, essentially all interactions. Then we would track activity metrics (how many tweets sent, replies) and then we would analyse language patterns (usage of certain words) to ascertain how they express themselves and pinpoint sentiment. Off the top of my head this could lead to elements of intention prediction and I'm steering TweetDeck to have this kind of very very basic Artificial Intelligence at its heart.
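Dodsworth's "profiler" is a vision rather than a specification, but its first layers (who a person interacts with, activity metrics, and word usage as a rough sentiment signal) are easy to sketch. The following is a minimal, hypothetical illustration: the `Tweet` record, its fields and the tiny positive/negative word lists are assumptions standing in for a real data source and a real sentiment lexicon, not anything TweetDeck has described.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical tweet record; the field names are illustrative assumptions.
@dataclass
class Tweet:
    author: str
    text: str
    reply_to: Optional[str] = None    # user being replied to, if any
    retweet_of: Optional[str] = None  # user being retweeted, if any

# Toy word lists standing in for a real sentiment lexicon (assumption).
POSITIVE = {"great", "love", "thanks", "awesome"}
NEGATIVE = {"awful", "hate", "broken", "worst"}

def profile(user: str, tweets: list) -> dict:
    """Build a crude interaction/activity/language profile for `user`."""
    sent = [t for t in tweets if t.author == user]
    # Interactions: who retweets this user most, and whom the user replies to.
    retweeters = Counter(t.author for t in tweets if t.retweet_of == user)
    partners = Counter(t.reply_to for t in sent if t.reply_to)
    # Language patterns: word frequencies across the user's own tweets.
    words = Counter(w.strip(".,!?").lower() for t in sent for w in t.text.split())
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    return {
        "tweets_sent": len(sent),
        "replies_sent": sum(partners.values()),
        "top_retweeters": retweeters.most_common(3),
        "top_partners": partners.most_common(3),
        # Net sentiment in [-1, 1]; 0 when no lexicon words appear.
        "sentiment": (pos - neg) / max(pos + neg, 1),
    }

tweets = [
    Tweet("alice", "Love the new build, thanks!", reply_to="bob"),
    Tweet("alice", "This bug is the worst"),
    Tweet("bob", "RT great thread", retweet_of="alice"),
    Tweet("carol", "RT", retweet_of="alice"),
    Tweet("bob", "RT again", retweet_of="alice"),
]
print(profile("alice", tweets))
```

This also shows why Dodsworth stresses the historical archive: with only a handful of tweets, counts like these are far too sparse to say anything reliable about a person.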
I'm currently researching intent prediction inside high frequency trading systems and it's fascinating and could directly relate to TweetDeck and social media systems/services in general.
[Dodsworth's background is in developing for financial services, at places like Prudential Financial and PricewaterhouseCoopers.]
RWW: What would intention prediction look like in this context? On Twitter?
Dodsworth: At its most basic if TweetDeck could predict what the user was probably about to require next, based on current activity, then it could start to collate that data in the background - cross twitter/facebook/linkedin data for example. I'm looking at it right now from a cross-service data gathering perspective where our servers do the gathering and hopefully get around the issues of API limits for example.
This is based on future functionality we're mapping out now which is a lot more complex than looking at someone's profile or seeing how many RTs one of your tweets has.
I'm thinking the scope is full social graph rather than just twitter/facebook.
RWW: I guess I'm having a hard time imagining what "what the user was probably about to require next, based on current activity, then TweetDeck could start to collate that data in the background - cross twitter/facebook/linkedin data for example" might look like. Like, if I'm looking at a person's profile, I'd probably like to see their LinkedIn data?
Dodsworth: Good example...or see how a certain person you're tweeting with right now stacks up against "similar" people you've spoken to - a box could pop up mid-conversation and give you a tonne of metrics on this person. How full of [crap] are they? Are they a social media guru? Would you be wise to tell this person anything sensitive? Based on previous language patterns, is the person you're tweeting with right now probably lying? A bit out there but possible in theory.
It's been a long and winding road for serial volunteer and social media philanthropist Sloane Berrent.
Since her unplanned departure from an L.A.-based startup in 2008, Berrent has traveled through eight countries, documenting and publicizing the struggles of those in developing areas through her blog posts, tweets, images, videos, and her own presence at events at home and abroad. From post-Katrina New Orleans to a trash dump in Manila to a monastery in Burma, read on for her story of trying to achieve social good through social media.
RWW: "Social media for social good" has become the catchphrase du jour, it seems. What does it actually mean? How much can social media users affect social change, and how?
I am a strong believer in the idea that the things you do online are meant to facilitate your offline interactions. People are so fast to click a button, and that can be great. Retweeting, forwarding, and Facebook walls are great engagements. But what's more difficult is the donate button. That's the big hurdle and disconnect. I'm trying to provide these inspirational opportunities in timeboxed campaigns. Social media is slowly catching on, but there's a lot of noise. Standing out is hard; it's important to have an offline component.
Berrent was visibly disturbed by what she witnessed at this Manila trash dump, where she saw shoeless children running through piles of debris.
RWW: Tell me about your experiences with Kiva borrowers. What kinds of people and enterprises have you seen? In your opinion, does microlending have a measurable impact on struggling local economies?
Kiva is really unique. It has a lot of power users - more than any nonprofit I've ever seen. One man has made a thousand loans. It's individual stories, and people really connect. You get updates on that person, and people say it's their favorite email of the month. As a microlending company, Kiva is one spoke in the larger wheel of microfinance. On a global scale, it has a very big impact.
Typically, when you go to a village or province, certain industries are prevalent. In a fishing community, maybe the borrower bought a fishnet or a fishing boat. In an area with a lot of bamboo, it's going to be crafts. I worked in eleven branch offices. I met over 40 different female borrowers individually and over 250 in my time there.
I can see that the money Kiva provides makes a difference. Microfinance is a very slow process, and there are gems and sparks of people who break through the poverty cycle. When you see villages changing, it's really something. It's like watching grass grow, but it's really beautiful grass.
RWW: Now you're working on a seven-day, seven-city tour to raise awareness and funds for malaria prevention through bed nets. Where did this idea come from?
It's a city-by-city competition to see which city can raise the most money for malaria nets, but also an opportunity for anyone who wants to get involved to donate. The tour starts this Saturday night in New York City and continues through Miami, New Orleans, Chicago, Seattle, and San Francisco before ending in Los Angeles on Friday...
I'd just finished Kiva training, and I was going to the Philippines for three months. And all I could think was, "When I come back, I'm going to be thirty." I've honed in a lot on my direction - using the Internet to help people. And what if I could use this opportunity to give back, involving people in different parts of the country - something really ambitious?
I wanted it to be about saving lives. I wanted to say, "I saved this many lives on my birthday." I've done a lot of work in HIV and AIDS; I looked into that and polio and malaria, and that's what stuck with me. The campaign has no administrative fees. One hundred percent of the funds go to malaria... in rural northern Ghana. Providing malaria nets will really be a part of saving lives there.
Berrent met this monk in Burma and spent the afternoon pagoda-hopping with him.
RWW: What needs or gaps do you see in philanthropic efforts online?
I think it's not having a strategy to begin with, not knowing the tools in your toolbox before you start. There's a lot to be said for jumping in and having fun, but nonprofits don't have the resources to play around online. They think it's about getting interns and getting followers and fans without figuring out why a medium is important and how to make it successful for them.
RWW: What's one surprise - good or bad - that you've come across since you started working with Kiva? What did you not expect from this experience, and what did you learn?
I learned that it's much more complicated than the website makes it seem. There's an entire division devoted to foreign exchange currency. The operational cost analysis, the challenges of technology in the developing world, the processes of remittance - it's incredibly complex. There are regional specialists. On the site, you can make a loan in five clicks, but a lot of machinery comes together to make it that way.
RWW: What's next for you? Is there more globe-trotting in your immediate future? How do you think the web will continue to be part of your life and career?
One of the best parts of this past year has been that I've gone through long periods where I didn't have Internet access. That's brought me a heightened and renewed sense of my purpose in the world and my authentic desire to make the world a better place. I'd like to be able to continue to support campaigns - even for-profit ventures - that I believe in, and I think social business is a wonderful intersection of the two.
I want to explore avenues with online and offline components, while continuing to blog and tell stories I'm passionate about.
And all this is just the tip of the iceberg that is Sloane Berrent's fascinating story. For a fuller look at her travels and timeline, check out this list of her nine favorite posts on her blog, The Causemopolitan, covering humanitarianism, her work in New Orleans, the phenomenon of serendipity in international travel, and much more.
Many thanks to Sloane Berrent for the use of her videos and images as well as for sharing her story with us and our readers.
Some people go to Washington to try to make the government more honest; others try to make it smaller. Technologist Tim O'Reilly is spending time in Washington, and bringing Washington officials to San Francisco, to do something different - perhaps something more realistic. O'Reilly is trying to help government become a platform for innovation. A "government as platform" would supply raw digital data and other forms of support for private sector innovators to build on top of.
Tim O'Reilly is a publisher of technical books, the organizer of a series of conferences on diverse topics, and an investor in web startup companies and smart electrical grid technologies. He's credited with shepherding the term Web 2.0 into public consciousness, and he regularly uses his extensive influence to call on technologists to "do something worthy," especially in the face of ecological and political crisis. Now he's brokering meetings between Obama administration officials and bleeding-edge geeks.
"What I learned when I went to Washington," O'Reilly told me by phone as he drove down a California highway earlier this month, "was how much the dialogue is determined by the companies that go there." O'Reilly is in the habit of helping to shape the dialogue around important issues, and the opportunity the Obama administration offers to change government is no exception. "[Google CEO] Eric Schmidt told me - 'tell a big story - talk to people and then share what you've learned.'"
O'Reilly is talking to people, but he's helping people talk to each other as well. He's introducing officials like Vivek Kundra, the new CIO of the Federal government, and Federal CTO Aneesh Chopra to ground-breaking hackers like geek renaissance man Chris Messina and Y Combinator founder Paul Graham. He's bringing together geospatial visionaries and the government officials who provide them with the GPS data they work with.
"What I've learned from all these conversations," O'Reilly says, "is about government as a platform. It's not just social media use by government, or government using wikis. No, it's something more profound. How do you think like a platform provider? Our government started as a lean vehicle for collective action, and over the last 200 years it has grown so large that it's now 40% of GDP. I want to go back to the original vision of the role of government: a convener of things that we as individuals and companies can't do alone. Standard setting, pilot programs; government providing enabling technologies for citizens to serve themselves.
"This morning we did a call with the White House and some geohackers, talking about what's wrong with government geodata now and how it could be fixed. The government people said we need to translate this into real projects that will appeal to politicians. 'If you fix this kind of geodata then we'll be able to provide this service - street safety, education attainment, public policy objectives,' was what they wanted to hear from the hackers. It's really about social innovation, building better tools for us as a nation to use technology to focus on real problems."
Healthcare, education and innovation policy are the three sectors O'Reilly says have the most momentum when it comes to government as platform.
"The old model," O'Reilly argues, "said we'll build services ourselves or we'll make deals with a few preferred providers that we'll then offer to our customers. This is very similar to what we saw recently in the cell phone market. Rather than providing all the apps themselves, Apple provided a platform and said to developers 'go build on it.' That's where I think the government is trying to go. Instead of offering a website, here's an API [application programming interface]. Can we spark innovation against what we're doing? It's not about picking a provider or partner and then your conduit to the private sector is them; instead it's about evangelizing your platform so far more people develop on top of it."
"There are absolutely other companies coming to Washington and saying otherwise, to stick with the old model," he says, "but there's an opportunity for government to say if people want to build services on this, then we need the data we make public to be granular and timely. We should not be publishing updates once a month. Real time, local, responsive to users - that's new thinking for government. It's just like the '90s, when government was discovering websites; now they are discovering web services and we're saying this is what they need to look like."
That conversation will become very public when O'Reilly hosts the Gov 2.0 conference next month. The lineup of geeks and people from the government is already intriguing, and O'Reilly has said that some of the holes in the schedule are placeholders for very high-profile speakers who haven't yet sent final confirmation.
"Vivek [Kundra, US CIO] says he wants to make working for government sexy," O'Reilly says. "It's a huge part of our economy and there's a lot of opportunity for entrepreneurs. Why are we letting beltway bandits get away with overcharging government to do work? We're missing opportunities to get our best thinking into government planning."
Making work for the government sexy is going to be a very big challenge. If any person and paradigm can pull it off, though, it might be Tim O'Reilly and this vision of "government as platform."
In Part 2 of my one-on-one interview with Tim Berners-Lee, we explore a variety of topics relating to Linked Data and the Semantic Web. If you missed it, in Part 1 of the interview we covered the emergence of Linked Data and how it is being used now even by governments.
In Part 2 we discuss: how previously reticent search engines like Google and Yahoo have begun to participate in the Semantic Web in 2009, user interfaces for browsing and using data, what Tim Berners-Lee thinks of new computational engine Wolfram Alpha, how e-commerce vendors are moving into the Linked Data world, and finally how the Internet of Things intersects with the Semantic Web.
Semantic Web and Search Engines Like Google, Yahoo
TBL: Not really, but the takeup by the search engines is interesting. In a way I was happy to see that, it was a milestone for those things to come out of the search engines. The search engines had typically not been keen on the Semantic Web - maybe you could argue that their business is making order out of chaos, and they're actually happy with the chaos. And if you provide them with the order, they don't immediately see the use of it.
Also, I think there was a misunderstanding in the search engine industry that the Semantic Web meant metadata, that metadata meant keywords, and that keywords don't work because people lie. Traditionally, in information retrieval systems, keywords haven't proven up to the task of finding stuff on the Web. One of the reasons is that people lie; the other is that they can't be bothered to enter keywords. So keywords have gotten a bad reputation, and then metadata in general was tarred with the same 'keywords don't work' brush. Because a lot of Semantic Web data included metadata, people assumed the same of Semantic Web data: that people would lie and wouldn't take the time to produce it.
Google rich snippets example; image credit: Matt Cutts
Now I think there's a realization that when you're putting data online, you're motivated NOT to lie. For example, when your band is going to produce its next album, or when your band is next going to play downtown, you're motivated to put that information up there on the Semantic Web. There are an awful lot of cases where data is actually really important to people, and it's on the web anyway. So I think it's great that some of the search engine companies are starting to read RDFa.
Does this mean that they [search engines] will start to absorb the whole RDF data model? If they do, then they will be able to start pulling all of the linked data cloud in.
Will they know what to do with it, when it's data in a very organized form? I think some people have misunderstood the Semantic Web as something that tries to make a better search engine, i.e. something you type into a little box. But of course the great thing about the Semantic Web is that you can query it: you can ask the Semantic Web a complicated query, like a SQL query (we call it a SPARQL query), and that's such a different thing to be able to do. It really doesn't compare to a search engine.
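The difference between typing into a box and querying data is easier to see in miniature. The sketch below is a toy, not a SPARQL engine: it evaluates the core of a SPARQL basic graph pattern (several triple patterns that share variables) by naive nested loops over an in-memory list. The example.org URIs, the band/gig data and the helper names `match` and `query` are invented for illustration; the comment shows a roughly equivalent SPARQL query.

```python
# Hypothetical data: example.org URIs and the band/gig vocabulary are invented.
EX = "http://example.org/"
TRIPLES = [
    (EX + "band/1", EX + "name", "The Examples"),
    (EX + "gig/7", EX + "performer", EX + "band/1"),
    (EX + "gig/7", EX + "city", "Boston"),
    (EX + "gig/8", EX + "performer", EX + "band/1"),
    (EX + "gig/8", EX + "city", "Cambridge"),
]

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None      # constant term does not match
    return new

def query(triples, patterns):
    """Naive nested-loop evaluation of a conjunction of triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                nb = match(pattern, t, b)
                if nb is not None:
                    extended.append(nb)
        bindings = extended
    return bindings

# Roughly equivalent SPARQL:
#   PREFIX ex: <http://example.org/>
#   SELECT ?city WHERE {
#     ?band ex:name "The Examples" .
#     ?gig  ex:performer ?band .
#     ?gig  ex:city ?city .
#   }
results = query(TRIPLES, [
    ("?band", EX + "name", "The Examples"),
    ("?gig", EX + "performer", "?band"),
    ("?gig", EX + "city", "?city"),
])
print([r["?city"] for r in results])  # ['Boston', 'Cambridge']
```

A keyword search could only return pages mentioning "The Examples"; the query instead joins facts across separate statements to answer "in which cities is this band playing?"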
You've got search for text phrases on one side (which is a useful tool) and querying of the data on the other. I think that those things will connect together a lot.
So I think people will search using a text search engine and find a webpage. On the front of the webpage they'll find a link to some data, then they'll browse it with a data browser, then they'll find a pattern which is really interesting, then they'll make their data system go and find all the things which are like that pattern (which is actually doing a query, but they won't realize it), then they'll be in data mode with tables, doing statistical analysis, and in that statistical analysis they'll find an interesting object which has a home page, and they'll click on that, go to the home page and be back on the Web again.
So the web of linked data and the web of documents actually connect in both directions, with links.
User Interfaces for Semantic Content
RWW: At the recent SemTech conference, Tom Tague of Thomson Reuters' Calais project suggested that user interfaces for semantic content are key in getting more take-up. With that in mind, I wonder if you've seen some great interfaces or designs for semantic applications in recent months - if so which ones and why did they impress you?
TBL: I think that whole area is very exciting at the moment. The only piece of hacking I've done over the past few years has been on a thing called the Tabulator [a data browser and editor], which is addressing exactly that. Partly because I wanted to be able to look at this data. And now there are lots of different ways that people need to be able to look at data. You need to be able to browse through it piece by piece, exploring the world of data. You need to be able to look for patterns of particular things that have happened. Because this is data, we need to be able to use all of the power that traditionally we've used for data. When I've pulled in my chosen data set, using a query, I want to be able to do [things like] maps, graphs, analysis, and statistical stuff.
So when you talk about user interfaces for this, it's really very very broad. Yes I think it's important. There's also the distinction we can make between the generic interfaces and the specific interfaces.
There will always be specific interfaces; for example if you're looking at calendar data, there's nothing else like a calendar that understands weeks, months and years. If you're looking at a genome, it's good to have a genetics-specific user interface.
However you also need to be able to connect that data, through generic interfaces. So if my genome data was taken during an experiment which happened over a particular period, I need to be able to look at that in the calendar - so I can connect the genetics to the calendar.
So one of the things I hope to see is domain-specific things for various different domains, and the generic user interfaces. And hopefully the generic interfaces will be able to tie together all of the domains.
During my recent trip to Boston, I had the opportunity to visit MIT. At the end of a long day of meetings with various MIT tech masterminds, I made my way to the funny-shaped building where the World Wide Web Consortium (W3C) and its director Tim Berners-Lee work. Berners-Lee is of course the man who invented the World Wide Web 20 years ago.
This was my first meeting with the Web's creator, whose work and philosophy were a direct inspiration for me when I launched ReadWriteWeb back in 2003.[1]
After shaking hands, I told Tim Berners-Lee that this blog's name was in part inspired by the first browser, which he developed, called "WorldWideWeb". That was a read/write browser, meaning you could not only browse and read content, but create and edit content too. It was a shame, then, when Mosaic, a read-only browser, became the first mainstream web browser in the mid-90s. It wasn't until the rise of Web 2.0 that the read/write philosophy gained widespread acceptance.[2] On that note, we launched into the interview...
Note: the interview will be published in two parts, with Part 1 today on the topic of Linked Data. Part 2 will explore other topics and will run tomorrow.
How Linked Data Relates to The Semantic Web
RWW: Earlier this year you gave an inspiring talk at TED about Linked Data. You described Linked Data as a sea change akin to the invention of the WWW itself - i.e. we've gone from a Web of documents to a Web of data. Can you explain, though, how Linked Data relates to the Semantic Web? Is it a subset of it?
TBL: They fit in completely, in that the linked data actually uses a small slice of all the various technologies that people have put together and standardized for the Semantic Web.
We started off with the Semantic Web roadmap, which had lots of languages that we wanted to create. [However] the community as a whole got a bit distracted from the idea that actually the most important piece is the interoperability of the data. The fact that things are identified with URIs is the key thing.
The Semantic Web and Linked Data connect because when we've got this web of linked data, there are already lots of technologies which exist to do fancy things with it. But it's time now to concentrate on getting the web of linked data out there.
Web inventor Tim Berners-Lee and ReadWriteWeb founder Richard MacManus
How Linked Data Has Evolved via Grassroots
RWW: Linked Data has had a lot of grassroots support, which you mentioned in your TED speech. This is something Semantic Web technologies, such as RDF, have struggled to get over the years. Has the W3C been pushing the more bottom-up Linked Data world, because of the frustration over lack of take-up of top-down Semantic Web?
TBL: A lot of the initial RDF and OWL projects came out of the academic world; and some of them were projects to show what you could do in a closed world. And the files were zipped up and left on a disc. While they were interesting projects, and while the systems were useful systems, the semantic web community maybe missed the point of the 'web' bit and focused too much on the 'semantic'. However the work that's been done in the Semantic Web, the standards, was really valuable. It's only relatively recently, for example, that SPARQL [an RDF query language] was developed.
Somebody drew an analogy the other day: can you imagine trying to promote a world of databases without SQL? Even though it's not an interoperable protocol, it's just a query language. So similarly, all that's been put into RDF, RDFS and OWL is very valuable to the linked data community.
The Linked Data community tend to use a subset of that [Semantic Web technologies], of OWL for example. But they certainly use SPARQL. So you could argue that really it wasn't ready to be deployed widely.
Linked Data started as a very informal Design Issues note that I put in; it was a grassroots movement from very early on. So yes, W3C has been emphasizing the importance of Linked Data. It's been the Semantic Web Interest Group, of course, and various [other Semantic Web] activities, which have been pushing it. But Linked Data has also been seized on by others - a group of people, for example, put together DBpedia.[3] That wasn't commissioned; they just thought it would be a really cool idea.
Graph of Linked Data sets on the Web, as at March 2009
Linked Data and Governments
RWW: In a recent Design Issues note, you urge governments to put their data online as Linked Data (although you'd also be happy for governments to just make available the raw data - presumably so that others can then structure it). What do you realistically expect, for example, the U.S. or U.K. governments to do over the next year? And in the near future, do you foresee different governments interconnecting their Linked Data sets?
TBL: One can't generalize: governments are (like most big organizations) fascinatingly diverse inside. So you'll find that there are places inside governments where you get a champion who gets linked data and who's just written a script and produced some linked data. In the UK government, for example, you'll find there's RDFa [in the code of its website] for civil service jobs. So if somebody wants to make a database of all the jobs, they can do that very easily.
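The civil service jobs example works because RDFa attributes tag values inside ordinary HTML, so a program can pull them out without fragile screen-scraping. The sketch below uses Python's standard `html.parser` on a hypothetical snippet of markup; the `job:` property names and values are invented, and the parser is deliberately simplified (it reads only `property` and `content`, ignores `about`, `typeof` and prefix resolution, and assumes each property element contains plain text). A real consumer would use a conforming RDFa processor.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from elements with a `property`
    attribute. Simplified: no subject (`about`/`typeof`) or prefix handling,
    and property elements are assumed to contain only text."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property whose text content is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        if a.get("content") is not None:
            # Machine-readable value supplied directly in @content.
            self.pairs.append((prop, a["content"]))
        else:
            self._current, self._text = prop, []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._text).strip()))
            self._current = None

# Hypothetical job-listing markup; not taken from any real government site.
HTML = """
<div class="vacancy">
  <span property="job:title">Policy Analyst</span>
  <span property="job:location">London</span>
  <meta property="job:closingDate" content="2009-08-14" />
</div>
"""

parser = PropertyExtractor()
parser.feed(HTML)
parser.close()
print(parser.pairs)
```

Run over every vacancy page, a loop like this is the "database of all the jobs" Berners-Lee mentions, built without any cooperation beyond the markup itself.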
There are other cases where the easiest thing for somebody to do is to just put data up in whatever form it's available. Comma-separated values (CSV) files are remarkably popular. They're sometimes exported from spreadsheets; it's remarkable how much information is in spreadsheets. Or the data is sometimes pulled out of a database and then put up on the web. It's not as good, not as useful to the community, as if Linked Data had been put up there and linked. But the first step of actually putting the data out there is the one that nobody else can do.
The way to go is for government departments to go the extra step and convert [their data] into Linked Data. One of the nice things about Linked Data, when they have a pile of it, is that they could run a SPARQL server on it. SPARQL servers are a commodity product, a solution for all of the people who say 'but actually I wanted to have XML.' A SPARQL server will generate an XML file [and] allow somebody to write out, effectively, a URL for the XML file.
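Going "the extra step" from a CSV dump to Linked Data mostly means giving each thing a URI and stating facts about it as triples. A minimal sketch, under stated assumptions: the example.org namespace, the `kind` path segment and the column-to-predicate mapping are hypothetical; every value is emitted as a plain string literal in N-Triples; and, unlike real Linked Data, the output does not yet link out to anyone else's URIs.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for minted URIs

def nt_literal(value: str) -> str:
    """Serialize a plain string as an N-Triples literal, escaping as required."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def csv_to_ntriples(text: str, key: str, kind: str) -> list:
    """Give each row a URI built from its `key` column and emit one triple
    per remaining non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{kind}/{quote(row[key])}>"
        for column, value in row.items():
            if column == key or not value:
                continue  # the key becomes the URI; empty cells are skipped
            predicate = f"<{BASE}def/{quote(column)}>"
            lines.append(f"{subject} {predicate} {nt_literal(value)} .")
    return lines

# Hypothetical CSV export, as it might come out of a spreadsheet.
CSV_TEXT = "school_id,name,town\n101,Hillside Primary,Leeds\n102,Riverside Academy,York\n"
for line in csv_to_ntriples(CSV_TEXT, key="school_id", kind="school"):
    print(line)
```

Once data is in a triple format like this it can be loaded into a store behind a SPARQL server, which is where the commodity tooling Berners-Lee describes takes over.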
In fact, I don't see why SPARQL servers shouldn't provide CSV files, something which as far as I know isn't in the standards. But I'd recommend it, certainly in government context, because CSV files are what people have and what people want.
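What Berners-Lee asks for is a small transformation, because a SPARQL SELECT result is already a table: one column per variable, one row per solution. The sketch below flattens a result set in the W3C SPARQL JSON results layout (`head.vars` and `results.bindings`) into CSV using only the Python standard library; the sample result is invented. (A CSV results format was later standardized as part of SPARQL 1.1.)

```python
import csv
import io
import json

def sparql_json_to_csv(doc: str) -> str:
    """Flatten a SPARQL SELECT result (JSON results layout) into CSV text."""
    data = json.loads(doc)
    variables = data["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variables)  # header row: one column per variable
    for binding in data["results"]["bindings"]:
        # Variables left unbound (e.g. by OPTIONAL) become empty cells;
        # only the lexical value is kept, dropping type/datatype/language.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()

# Invented sample result with one unbound variable in the second row.
RESULT = json.dumps({
    "head": {"vars": ["job", "title"]},
    "results": {"bindings": [
        {"job": {"type": "uri", "value": "http://example.org/job/1"},
         "title": {"type": "literal", "value": "Analyst, Policy"}},
        {"job": {"type": "uri", "value": "http://example.org/job/2"}},
    ]},
})
print(sparql_json_to_csv(RESULT), end="")
```

Dropping the type information is the design trade-off: the CSV is what "people have and what people want," while anyone who needs URIs distinguished from literals can still ask the server for the richer format.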
So the message [for government] is to use RDF. Linked Data is the backplane, it's the thing that you connect to in both directions. As a [web] producer your job is to make sure that you produce Linked Data one way or another. And as a consumer, there are lots of ways to consume that data once it's out there as Linked Data.
Part 2 of ReadWriteWeb's interview with Tim Berners-Lee will be published tomorrow...
Footnotes:
1. The very first sentence written on this blog, on 20 April, 2003, was: "The World Wide Web in 2003 is beginning to fulfill the hopes that Tim Berners-Lee had for it over 10 years ago when he created it."