Entries Tagged 'search' ↓

Is Schema.org Really a Google Land Grab?

Last week the Web's three leading search companies - Google, Microsoft and Yahoo! - announced a new structured data collaboration called Schema.org. It includes more than 100 new types of website markup for content like movies, music, organizations, TV shows, products, places and more. The stated aim of Schema.org is to "improve the display of search results, making it easier for people to find the right web pages."

However, is this collaboration routing around existing web standards, as promoted by the governing web body the World Wide Web Consortium (W3C)? Since the news was announced, we've discovered that the W3C was not consulted about Schema.org. And given that Google dominates the search market, should we be worried that Google will control a substantial part of the markup used on webpages if - as expected - Schema.org gets significant take-up? Here's why the alarm bell should be rung...

Firstly, for big picture context, this situation is somewhat reminiscent of the Microsoft land grab in the dot com era of the Web. Remember when Microsoft controlled the browser market and was able to dictate how webpages were marked up? Webmasters and developers were forced to use markup that catered to Microsoft's Internet Explorer browser. Schema.org may well be leading down the same path, with webmasters and developers having to use Schema.org markup in order to get their webpages ranked highly in the major search engines.

Specifically, here are the two main issues with Schema.org that lead us to suspect this is a land grab:

1) The 3 companies - Google, Microsoft and Yahoo! - write the schemas and host them centrally. These schemas sometimes directly compete with existing open standards - such as the e-commerce markup standard GoodRelations, which has been receiving solid take-up from the likes of Best Buy.

2) Whereas open standards like GoodRelations use RDFa (a simpler version of RDF, the main markup of the W3C-sponsored Semantic Web), the Schema.org markup will use Microdata - which is a spec written by Google.

RDFa Adoption Will Suffer

Schema.org will certainly lead to a decrease in RDFa usage, which ultimately hurts the W3C's long-running push towards the Semantic Web - that is, a Web with added meaning and structure.

Over the past year, RDFa received significant take-up from large companies like Facebook and Best Buy. It's particularly notable that Facebook used RDFa in its Open Graph protocol. Facebook is Google's main competitor in the social Web, so Schema.org could also be viewed as a competitive move by Google against Facebook.

Simply put, the argument here is that Schema.org is a strong push by Google (and less so Microsoft and Yahoo!) to be in centralized control of key aspects of Web markup - at the expense of W3C open standards. As Web data becomes more and more structured, we have to question any moves by a large, influential company that may put it in a position of control over that data.

Indeed, last year we raised the same questions about Facebook's Open Graph. Because although Facebook used RDFa, they used their own custom version of it. Despite this, both Facebook and the W3C argued that the Open Graph would actually help the adoption of RDFa.

Why Did Schema.org Choose Microdata Over RDFa?

ReadWriteWeb has learned of rumors that Yahoo! wanted RDFa to be a core component of Schema.org, but that Google and Microsoft insisted on Microdata. Why is that?

Microdata is the markup specification written by Google on which Schema.org is based. It's similar to RDFa, in that it adds semantics to HTML in order to provide more structure to Web markup.

Google explained the Schema.org decision to use Microdata over RDFa on a Google Webmaster Central help page:

"Historically, we've supported three different standards for structured data markup: microdata, microformats, and RDFa. Instead of having webmasters decide between competing formats, we've decided to focus on just one format for schema.org. In addition, a single format will improve consistency across search engines relying on the data. There are arguments to be made for preferring any of the existing standards, but we've found that microdata strikes a balance between the extensibility of RDFa and the simplicity of microformats, so this is the format that we've gone with."

That explanation makes logical and business sense, but even so we have to ask why Google, Microsoft and Yahoo! chose to route around the W3C supported standard of RDFa.

There is some politics happening here, because Microdata is sponsored by a non-W3C work group called Web Hypertext Application Technology Working Group (WHATWG), which was formed in 2004 in response to the perceived slow development of web standards at W3C.

Is This a Land Grab by Google? You Tell Us...

Regardless of the politics, there is a real danger that Google in particular will come to control a significant part of Web markup through Schema.org.

While it is a positive sign that the major search companies are pushing for more structured data, the big question is about control. Why isn't Schema.org using RDFa, the W3C open standard, as the base for its schemas? Does Google now have too much influence over the future of structured data? We'd love to hear your thoughts about these important issues regarding the future of the Web.


Google, Bing & Yahoo’s New Schema.org Creates New Standards for Web Content Markup

The web's three leading search companies are announcing today a new collaboration called Schema.org, where more than 100 new types of website markup for content like movies, music, organizations, TV shows, products, places and more will allow search engines to better understand and present what they find on the pages that show up in search results. Yahoo announced the project first today on its Yahoo Search Blog and said it was reminiscent of all three search companies collaborating to create the sitemap concept.

This will change the way people design websites, it will change the way people do search marketing, it will change a lot of things. It should be very, very interesting.

The work is related to Yahoo's years-old Search Monkey project, where website owners were given guidance about how to mark up websites so that their appearances in Yahoo search results were vastly improved. Gone are the days of a blue link and a few lines of text in each and every case. Some types of discovered content are better displayed in other ways, with charts, graphs or images, for example. Now that Google and Bing are teaming up with Yahoo to create a standard format, I expect that just about every site on the web will be stopping to take a look and see how they can incorporate the new structure advocated on Schema.org.

[Screenshot: a recipe search result displayed with Rich Snippets]

Above, a screenshot of the kind of search results Google has displayed since offering its Rich Snippets markup documentation. In this case, it will be easy to know what the cooking time for this recipe is because "cookTime" is one of 10 standardized fields in the recipe schema under schema.org, so there's one standardized way to communicate cook time for a recipe and every 3rd party indexing a recipe web page will know what the cooking time is immediately.
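To make the "cookTime" point concrete, here is a minimal sketch of how a third party might read standardized fields out of a page. The sample markup, the recipe itself and the helper names (ItemPropParser, extract_props) are assumptions made up for illustration; only the Recipe type and its "cookTime" field come from schema.org. A real crawler would also need to handle nested items and other edge cases that this sketch ignores.

```python
from html.parser import HTMLParser

# Hypothetical recipe page fragment marked up with Microdata attributes.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Banana Bread</h1>
  <meta itemprop="cookTime" content="PT1H">
  <span itemprop="recipeYield">1 loaf</span>
</div>
"""


class ItemPropParser(HTMLParser):
    """Collects itemprop values from flat (non-nested) Microdata markup."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._pending = None  # itemprop name still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # Value is carried in the attribute, e.g. <meta content="PT1H">.
            self.props[name] = attrs["content"]
        else:
            # Value is the element's text, delivered later to handle_data.
            self._pending = name

    def handle_data(self, data):
        if self._pending and data.strip():
            self.props[self._pending] = data.strip()
            self._pending = None


def extract_props(html):
    parser = ItemPropParser()
    parser.feed(html)
    parser.close()
    return parser.props


if __name__ == "__main__":
    print(extract_props(SAMPLE)["cookTime"])
```

Because every publisher would use the same field name, the indexer never has to guess which number on the page is the cooking time; it simply looks up "cookTime".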

Bing says of the project:

"We've made great progress on the technical front to begin to model the real world from the messy bits of data scattered across the web. Things like movies have benefitted from this work. We're now able to understand 'Casablanca' is a movie and literally mine the web to re-assemble information about that movie from millions of sites.

But we think we can do better. We want to enable publishers to give us hints about what things they are describing on their sites. Rather than rely solely on machine learning and other AI techniques, we asked 'what if we could enable publishers to have a single schema they could use to describe their sites that all search engines could understand?'... We at Bing see this as a major step forward for the web, simplification for webmasters and richer more informative search results for consumers. As search continues to evolve from finding links to taking action, we're excited about the potential this new system provides."

Google's take on the announcement is the most detailed and can be found here.

Here's how I understand such work: technical standards like standardized markup for content types allow search engines and other sites to skip the time and work of figuring out what kind of content is on a page and move directly to doing something interesting with that content.

It's not easy for a web service to know that a page is about food, or wine or a movie - but if all pages communicate that in a standard fashion, then 3rd parties like search engines can proceed directly to building beautiful food, wine and movie search results pages or other services that present the content in a more compelling fashion. That could make searching more pleasurable and useful and ultimately drive more traffic to the most useful and best formatted sites in the search results.


Mendeley Throws Open the Doors to Academic Data

Innovations in communications software and websites can be quite exciting. After the dust dies down, however, it's really not clear how much more information has been made available, how much more people can communicate, how much more thinking has been enabled.

London-based Mendeley's offering up an Open API and making a vast catalog of academic publications searchable, well, that might make the cut.

Mendeley is a popular cross-platform tool for managing academic and scientific research, usable both on the desktop and online. The tool automatically extracts bibliographic data from a user's document library and stores that information, and the papers and studies it helps build, on their computer or in the cloud. It is that information that is now available through its API and its search.

Prior to making the API catch-as-catch-can, Mendeley is asking researchers and developers to send in proposals by May 14 and will announce who among them will have immediate access on May 21.

The quantity of data in aggregate (the API also offers you your own data) and the variety of research, from cancer studies to the influence of the Iroquois Confederacy on early American democracy, beggar the imagination. On a more practical level, Jason Hoyt, the company's Research Director, writing on the Mendeley blog, pointed out that the company gets more feature requests than it can handle. Perhaps developers, both general and niche, will invent some interesting tools.

The catalog has basic search starting today, but Hoyt said the company already has an advanced search function ready that will be implemented in two weeks.

Hoyt promises the relational elements of Mendeley will be expressed in its search results.


"An algorithm called 'ReaderRank'...will adjust results based on the level of readership for an article. This doesn't mean the most read articles will always appear at the top, only that it is an additional measure in ranking your results. We have also taken care to prevent artificial enhancement of the results, i.e. gaming the readership. Over time, we hope to refine this algorithm by taking into account other measures of quality such as the reputation of who you trust and follow on Mendeley."

Although the combination of Open API and search should in itself be a powerful source of information, inside academe and out, it might also inspire academic publishers, who are notoriously stingy with information, to loosen their stranglehold on knowledge.
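For the curious, Hoyt's description of 'ReaderRank' (readership as one additional ranking signal, not the only one) can be pictured with a toy re-ranker. This is purely an illustrative sketch and not Mendeley's algorithm, which the post does not spell out: the scoring formula, the weight value, the function name and the sample numbers are all assumptions.

```python
import math


def rerank(results, weight=0.1):
    """Re-order search results using readership as a secondary signal.

    results: list of (title, relevance, readers) tuples.
    The formula is hypothetical: relevance is boosted by a damped
    (logarithmic) readership term so huge reader counts cannot dominate.
    """
    def score(item):
        _, relevance, readers = item
        return relevance * (1.0 + weight * math.log1p(readers))

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    sample = [
        ("Paper A", 0.90, 10),      # best text match, few readers
        ("Paper B", 0.50, 10000),   # weak match, most read
        ("Paper C", 0.85, 500),     # good match, solid readership
    ]
    for title, _, _ in rerank(sample):
        print(title)
```

In this made-up sample the most-read paper still lands last because its text match is weak, which is the behavior the quote describes: readership nudges the ranking without deciding it.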


Get Random!

We spend a lot of time at Lifehack talking about getting organized – making up lists, labeling files, simplifying your workspace, and so on. Everything in its place, and a place for everything, right?

There’s nothing wrong with this view of organization, so long as you’re getting more work done than the time you’re spending on staying organized. But a lot of times, our brains don’t work quite so neatly. For that matter, our lives don’t work quite so neatly. As it happens, we live in worlds that are as much defined by randomness and chaos as by neatness and order.

This isn’t a “left-brain/right-brain” thing. It’s about how we engage with the world. Because the world isn’t always as neat and orderly as the systems we create to interact with it, we can fall “out of sync” at times. We feel this all the time – overwhelmed, creatively blocked, or just plain stuck. At those times it’s a good idea to inject a little randomness into our otherwise predictable system.

Randomness isn’t just a way to “break out of the ordinary” – it is the ordinary! And as much as we try to control things, we need that little seed of randomness now and again to close the gap between our attempts to organize our lives and the mixed-up world that is our lives. It’s what we’re designed for – humans didn’t evolve in a GTD world, we evolved in a messy and chaotic world, and we’re pretty well adapted to it.

Bring on the Crazy

Here are a handful of ways to add a dash of randomness to your life. Try them all or just one or two, and see if you aren’t quite surprised at the results.

The Noguchi Filing System: Designed by Japanese economist Noguchi Yukio, the Noguchi filing system relies on the vagaries of use habits, rather than the alphabet, to sort your files. The idea is simple: instead of filing material in traditional folders and drawers, you put every document (or bundle of related documents) into a 9×12 (or larger) envelope, label it, and file it upright on a shelf. New folders go on the left-hand end of the shelf, and every file you remove goes back not where it came from, but again, on the left-hand end of the shelf. As you use the system, the left side will fill up with material you use the most often, while material you use less often will move to the right. Every so often, you can box up the right half of the shelf and archive it, or shift those files into long-term reference sections by subject (Noguchi color codes his reference files, and moves them to their own shelves to be ordered by use once again).

Though it seems crazy, in testing Noguchi says that access time is almost always faster in shelves sorted by the Noguchi system. That’s because material you’re most likely to need is going to be material you’re most likely to have used recently, and that material is all on the left. The rarely-used files to the right might take longer to find, but since you rarely need to find them, on average you’ll save time – not to mention the time you save by not filing in any particular order in the first place.
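If you think in code, the shelf behaves like what programmers call a move-to-front list. Here is a minimal sketch of that idea; the class name, method names and file labels are invented for illustration and are not anything Noguchi himself describes.

```python
class NoguchiShelf:
    """A shelf where index 0 is the left-hand end."""

    def __init__(self):
        self.files = []

    def add(self, label):
        # New envelopes always go on the left-hand end.
        self.files.insert(0, label)

    def use(self, label):
        # Scan from the left; the returned position is how many
        # envelopes were skipped before the right one was found.
        position = self.files.index(label)
        # After use, the envelope goes back on the left, not where it was.
        self.files.insert(0, self.files.pop(position))
        return position

    def archive_right_half(self):
        # Box up the rarely used right half and return it for storage.
        keep = (len(self.files) + 1) // 2
        archived = self.files[keep:]
        self.files = self.files[:keep]
        return archived


if __name__ == "__main__":
    shelf = NoguchiShelf()
    for label in ["taxes", "lease", "invoices", "manual"]:
        shelf.add(label)
    print(shelf.use("lease"))   # first lookup scans past two envelopes
    print(shelf.use("lease"))   # second lookup finds it at the left end
    print(shelf.archive_right_half())
```

The second lookup of the same file is immediate, which is the whole trick: whatever you touched recently is always within arm's reach.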

Bananaslug Fever: Searching on Google is pretty straight-forward – if you know what you’re looking for. But it’s easy to get stumped, trying search after search around a topic and coming up with a bunch of not-so-inspiring pages. Enter Bananaslug. The brain-child of my fellow UCSC alum (Go Fightin’ Bananaslugs!) Steve Nelson, Bananaslug works like Google – in fact, it is a front-end to Google – but adds a random keyword from one of a dozen or so categories to your search, creating some interesting – and maybe even inspiring – results.

For example, a search for project planning on Google turns up the usual assortment of Wikipedia and blog pages, plus a book or two. Useful, if you’re looking for basic info, but what if you already know all that, and you want to learn something new? When I enter “project planning” in Bananaslug and ask for a random keyword from the category “great ideas” (it chose “reasoning”), I’m introduced to whole fields of project planning I didn’t even know about: quantitative reasoning, semi-quantitative reasoning, geometric-based reasoning, temporal knowledge representation, and so on. I could get the same results from Google, except I’d never, ever have known to add “reasoning” to my search terms.
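The trick is simple enough to sketch in a few lines. This is only a rough imitation of the idea, not Bananaslug's actual code: the word lists are made up (apart from “reasoning” in “great ideas”), and the function names are hypothetical.

```python
import random
from urllib.parse import urlencode

# Hypothetical keyword lists; the real site's categories and words differ.
CATEGORIES = {
    "great ideas": ["reasoning", "justice", "infinity", "memory"],
    "colors": ["blue", "crimson", "ochre"],
}


def random_augmented_query(query, category, rng=random):
    """Append one randomly chosen keyword from the category to the query."""
    keyword = rng.choice(CATEGORIES[category])
    return f"{query} {keyword}", keyword


def google_url(query):
    """Build an ordinary Google search URL for the augmented query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query, keyword = random_augmented_query("project planning", "great ideas")
    print(f"Random keyword: {keyword}")
    print(google_url(query))
```

Run it a few times and you get a different detour each time, which is the point: the randomness supplies the search term you would never have thought to type.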

Change something: Ever try to change a habit? Man, is that hard. Experts say if you keep it up for 21 days (or 30, or 28, or 45, or…) it becomes a habit, but that’s clearly BS – the time it takes for something to become a habit varies by the habit itself, the personality of the person trying to instill it, the motivation, and so on. Some things never become habits, and some habits are born in a minute.

A lot of psychologists, coaches, and other counselors don’t advise their clients to adopt new habits, because habit-creation is rarely under conscious control. Instead, they advise their clients to just change one little thing, anything – move your computer, talk to someone new, try something that’s off your regular routine. It doesn’t necessarily have to be the same thing every day, either – the idea is to create enough chaos that your regular habits become indistinguishable from the new non-habits. Try one new thing every day, and see what happens.

Brainstorm: Stuck for an idea? Try “blue”. Or “propeller”. How about “traction ankle”? Throwing a random word or idea or phrase into the mix and forcing yourself to seriously consider it, no matter how far off-topic it might seem, can create a cascade of associations that finally circle back to something useful. For example, according to Eric Abrahamson in A Perfect Mess, the word “blue” was the key that led an advertising firm to develop a safety-focused campaign to reach out to the previously-untapped market of female auto insurance buyers. How? Who knows, and who cares? The important thing is that it works.

Unschedule: Arnold Schwarzenegger doesn’t have a schedule. If you want to see him, you call his secretary, and if he’s available right now, you come on over. If not, try again later. How crazy is that?

Of course, your life is a lot more complicated than his, I’m sure – he only has a state to run and movies to make. For you, maybe instead of a “non-schedule” you could try an “Unschedule”. Popularized by Neil Fiore in his book The Now Habit, in an Unschedule you schedule only the things you want to do. In the gaps in between, you work on projects, writing them into your schedule after you’ve worked a solid half-hour on a single project. At the end of the day or week, you can see how many hours of productive time you’ve racked up – surprisingly, it’s often much greater than people manage with a much more orderly, less random schedule. (You can see an example of an Unschedule at Fiore’s site.)

When the Going Gets Tough, the Tough Get Weird

Like anything, randomness is best in moderation. Try adding a dash to your otherwise orderly day-to-day and see what happens. One thing about randomness, it’s flexible – that little bit of weirdness might be just helpful today, but one day, when the going gets really weird, you’ll be ready to go with it. You may even go pro*!

(*With apologies to Hunter S. Thompson)


Dustin M. Wax is the project manager at Stepcase Lifehack. He is also the creator of The Writer's Technology Companion, a site devoted to the tools of the writing trade. When he's not writing, he teaches anthropology and gender studies in Las Vegas, NV. He is the author of Don't Be Stupid: A Guide to Learning, Studying, and Succeeding at College.

Follow him on Twitter: @dwax.