IT Conversations: Clay Shirky – Ontology is Overrated
Clay Shirky gave a presentation at ETech titled Ontology is Overrated. You can listen to the presentation at the link above (ITConversations).
Highlights [with my expansions in square brackets]:
(1) Ontologies are left over from times when we had to file objects on shelves. This is no longer true with data on the web [or in an enterprise].
(2) The ontological goal of finding the perfect categorization scheme for the “essence” of the objects you are categorizing is a false goal in this era.
(3) The Library of Congress categorization scheme (hierarchical buckets without overlap between buckets) is optimized for the number of books on the shelves, not for concepts or ideas. Books need to be in one place but ideas can be all over the place. We have confused the container with the things within the container.
(4) There is no shelf. There is no physical constraint that we have to enforce upon the web.
(5) Yahoo! created its 14 top-level categories when the web started to grow. The Books and Literature link under Entertainment is really a link to Art/Humanities. The constraints of their ontology were stronger than the users’ expectations of where an object should appear. Yahoo! also put an upper limit (3) on the number of links that you could have.
(6) You can ask Google for “Obstreperous” and “Minnesota” and get a list of pages back. You cannot reasonably ask an ontologist to predict that there needs to be a category for Obstreperous and Minnesota. This is the fundamental difference between “search” and “browse”. “Browse” says that the ontologists have the power and they get to override the user’s need. If they haven’t categorized objects the way that the users need, the users are out of luck. “Search” says the reverse. Nobody gets to tell you, in advance, what you need. We will do our best at the time of your request to find what you want based on the link structure.
(7) Ontology works when you don’t have a lot of stuff, the categories are clearly separated, and the domain is stable. It works when the domain is very restricted and all of the users are participants (e.g. the Diagnostic and Statistical Manual).
(8) In a system where there is no expert who can exert force on the system (the way the US Government can declare that SUVs are light trucks, not cars), the users are not experts, and the information is fluid, ontologies and categorization systems fail. This describes the web: a large, ill-defined corpus.
(9) Signal loss – the claim is that we need to enforce a thesaurus of terms, so that we all use the same tags to discuss the same objects. Take Movie, Cinema and Film: without a thesaurus you won’t be connecting the Movie people with the Cinema people. But when you collapse the terms into one, you assume there is no difference between them (single difference).
(10) Predicting the future is hard: (A) This is a book about Dresden. This is a statement about the essence of the item. (B) This is a book about Dresden and it belongs in the category East Germany. This is a prediction about the future. The Former Soviet Union is another example.
(11) Problems when you merge Ontologies. Library of Congress taking in Shirky’s books as an example. They merge the books and ignore his categories. The interesting part comes from looking at how he organizes his links not the categories themselves (e.g. he files X under Category Y).
(12) The long tail in del.icio.us. He showed a 2-hour sample of tags entered into del.icio.us and discussed the tag “To Read”. The cataloger looks at this in horror: “this is context dependent and temporary”. Well, so was East Germany! Once you expand your time scale to include the lifetime of a categorization scheme, you see that categories are also temporary.
(13) As we get used to the fact that there is no limit of “shelf” or “space”, we will gain from this roll-up of user based
Merge from the content (URL) then move up. Merges create overlap (Mac seems like OSX).
Merges are probabilistic, not binary.
You can do interesting roll-ups based on time, users, group of interests.
The signal loss comes from expression (by users) rather than compression (of items into a select few categories).
The filtering is post hoc – after the publication not before (no editing before publishing).
One-off categories (unused or not useful) will get lost in the wash.
Semantics are in the users – not in the system. The system suggests tags that match what other users have used; it does not base its suggestions on an understanding of the tag (e.g., that Mac OS X is an operating system that runs on Macs). This is not a way to get computers to understand things.
(14) It comes down to a question of Philosophy: Does the World Make Sense or Do We Make Sense of the World? If the World Makes Sense, then you believe that your understanding of the world is the correct view of the world. If We Make Sense of the World, then the understanding is all context dependent and based on user experience.
We are looking at a radical break where we rebuild starting with the URL and we will get entirely different systems.
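To make point (13) a little more concrete, here is a minimal sketch of what “merge from the URL, then move up” and “merges are probabilistic, not binary” could look like. This is my own illustration, not anything shown in the talk: the bookmark data, the function names and the simple overlap ratio are all assumptions, and a real system such as del.icio.us may well do something quite different.

```python
from collections import defaultdict

# Hypothetical bookmarks: (user, url, tags). All values are invented for illustration.
BOOKMARKS = [
    ("alice", "http://example.com/a", {"mac", "osx"}),
    ("bob",   "http://example.com/a", {"osx", "apple"}),
    ("carol", "http://example.com/b", {"mac", "osx", "toread"}),
    ("dave",  "http://example.com/c", {"mac", "hardware"}),
    ("erin",  "http://example.com/d", {"film"}),
]


def urls_by_tag(bookmarks):
    """Start from the URL: collect the set of URLs each tag was applied to."""
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return index


def overlap(index, a, b):
    """Fraction of the URLs tagged `a` that are also tagged `b` (0.0 to 1.0).

    This is a degree of overlap, not a yes/no statement that a and b are
    the same category.
    """
    urls_a = index.get(a, set())
    if not urls_a:
        return 0.0
    return len(urls_a & index.get(b, set())) / len(urls_a)


index = urls_by_tag(BOOKMARKS)
print(round(overlap(index, "mac", "osx"), 2))  # two of the three "mac" URLs are also "osx"
print(overlap(index, "osx", "mac"))            # every "osx" URL is also "mac"
print(overlap(index, "mac", "film"))           # no shared URLs
```

Note that the overlap is not symmetric: in this toy data everything tagged “osx” is also tagged “mac”, but not the other way around. Nobody had to declare that relationship in advance; it falls out of what users happened to tag.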
Clay wrote his presentation up as an article.
My question would be: is folksonomy/social tagging the first answer to the idea written down by Mr. Berners-Lee?
I don’t think so. As one person commented to me recently, “you can do really cool stuff with the Semantic Web [they were talking about the Semile project]. You need a 747 full of Librarians to implement it though.”
The semantic web work is interesting because it derives classifications and taxonomies from metadata in the HTML. You use informative “div” tags, for instance, to provide context and categorization for the content on a web page. For this to work, you need one or more taxonomies and a controlled vocabulary. You also need the rules for how you use both. All of this comes together to build an Ontology for web data.
I see the semantic web fitting into a communication and collaboration environment as one layer – a fairly high layer for structured information. The semantic web layer also has a high overhead for creation of the layer and for maintenance of information. This is why you need a 747 full of Librarians to implement it.
Folksonomies and Social Software operate at a lower, easier, less formal layer. No ontology. No data dictionary. No controlled vocabulary. What you gain is very easy sharing of information, the chance for social discovery (“he is interested in this too?”) and the ability to quickly find and sort based on a few key ideas. Clay Shirky did an interesting analysis of the tags for the Ontology is Overrated Summer Remix posting. He found that it only took 10 people to establish what the top 3 tags are for the post. People are very good at interpreting information and drawing out the key themes. We should be, we have been asked to do this since 4th grade 😉 Folksonomies and Social Software rely on the wisdom of the masses to build their structure – not the expertise of the few – or the 747 full of librarians.
Thanks for the comment.
It is very hard to work with ontologies in the semantic web, because the problem is that we haven’t got any engineer