Clay Shirky gave a presentation at ETech titled Ontology is Overrated. You can listen to the presentation at the link above (ITConversations).
Highlights [with my expansions in square brackets]:
(1) Ontologies are left over from times when we had to file objects on shelves. That constraint no longer applies to data on the web [or in an enterprise].
(2) The ontological goal of finding the perfect categorization scheme for the “essence” of the objects you are categorizing is a false goal in this era.
(3) The Library of Congress categorization scheme (hierarchical buckets without overlap between buckets) is optimized for the number of books on the shelves, not for conceptual ideas or intellectual aspects. Books need to be in one place, but ideas can be all over the place. We have confused the container with the things within the container.
(4) There is no shelf. There is no physical constraint that we have to enforce upon the web.
(5) Yahoo! created the 14 top-level categories when the web started to grow. The Books and Literature link under Entertainment is really a link to Art/Humanities. The constraints of their ontology were stronger than the users’ expectations of where an object should appear. Yahoo! also put an upper limit on the number of links that you could have (3).
(6) You can ask Google for “Obstreperous” and “Minnesota” and get a list of pages back. You cannot reasonably ask an ontologist to predict that there needs to be a category for Obstreperous and Minnesota. This is the fundamental difference between “search” and “browse”. “Browse” says that the ontologists have the power and they get to override the user’s need. If they haven’t categorized objects the way that the users need, the users are out of luck. “Search” says the reverse. Nobody gets to tell you, in advance, what you need. We will do our best, at the time you make the request, to find what you want based on the link structure.
(7) Ontology works when you don’t have a lot of stuff, the categories are clearly separated, and the domain is stable. It works when the domain is very restricted and all of the users are participants (e.g., the Diagnostic and Statistical Manual).
(8) In a system where there is no expert who can exert force on the system (the way the US Government can declare that SUVs are light trucks, not cars), the users are not experts, and the information is fluid, ontologies and categorization systems fail. This describes the web: a large, ill-defined corpus.
(9) Signal loss – we need to enforce a thesaurus of terms – we all need to use the same tags to discuss the same objects. Movie, Cinema and Film: you won’t be connecting the Movie people with the Cinema people. When you collapse the difference between the terms, you assume there is no difference between the terms (signal loss).
(10) Predicting the future is hard: (A) This is a book about Dresden. This is a statement about the essence of the item. (B) This is a book about Dresden and it belongs in the category East Germany. This is a prediction about the future. The Former Soviet Union is another example.
(11) Problems when you merge Ontologies. Library of Congress taking in Shirky’s books as an example. They merge the books and ignore his categories. The interesting part comes from looking at how he organizes his links not the categories themselves (e.g. he files X under Category Y).
(12) The long tail in del.icio.us. Showing a 2-hour sample of tags entered into del.icio.us. Discussion about the tag “To Read”. The cataloger looks at this in horror: “this is context dependent and temporary”. Well, so was East Germany! Once you expand your time scale to include the lifetime of a categorization scheme, you see that categories are also temporary.
(13) As we get used to the fact that there is no limit of “shelf” or “space”, we will gain from this roll-up of user based
Merge from the content (URL), then move up. Merges create overlap (Mac seems like OSX).
Merges are probabilistic, not binary.
You can do interesting roll-ups based on time, users, group of interests.
The signal loss comes from expression (by users) rather than compression (of items into a select few categories).
The filtering is post hoc – after the publication not before (no editing before publishing).
One-off categories (unused or not useful) will get lost in the wash.
Semantics are in the users, not in the system. The system suggests tags that match what other users have used; it does not derive suggestions from an understanding of the tag (e.g., that Mac OS X is an operating system that runs on Macs). This is not a way to get computers to understand things.
(14) It comes down to a question of Philosophy: Does the World Make Sense or Do We Make Sense of the World? If the World Makes Sense, then you believe that your understanding of the world is the correct view of the world. If We Make Sense of the World, then the understanding is all context dependent and based on user experience.
We are looking at a radical break where we rebuild starting with the URL and we will get entirely different systems.
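To make point (13) a little more concrete, here is a minimal Python sketch of a roll-up that starts from the URL and moves up to tags. It is my own illustration, not something shown in the talk: the sample bookmarks, the function names `urls_by_tag` and `overlap`, and the `min_urls` cutoff are all assumptions. It treats a merge as a score between 0 and 1 (the share of one tag's URLs that also carry another tag) rather than a yes/no decision, and it drops one-off tags after the fact.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample data: (user, url, tag) triples invented for illustration.
BOOKMARKS = [
    ("alice", "http://example.com/a", "mac"),
    ("alice", "http://example.com/a", "osx"),
    ("bob", "http://example.com/a", "osx"),
    ("bob", "http://example.com/b", "mac"),
    ("bob", "http://example.com/b", "osx"),
    ("carol", "http://example.com/c", "mac"),
    ("carol", "http://example.com/c", "toread"),
    ("dave", "http://example.com/d", "osx"),
    ("erin", "http://example.com/e", "osx"),
]


def urls_by_tag(bookmarks):
    """Start from the content: map each tag to the set of URLs carrying it."""
    index = defaultdict(set)
    for _user, url, tag in bookmarks:
        index[tag].add(url)
    return index


def overlap(index, min_urls=2):
    """Return {(a, b): share of a's URLs that are also tagged b}.

    Tags seen on fewer than min_urls URLs are skipped, so one-off tags
    fall out post hoc instead of being edited out before publication.
    The min_urls cutoff is an assumed, arbitrary threshold.
    """
    kept = {tag: urls for tag, urls in index.items() if len(urls) >= min_urls}
    return {
        (a, b): len(kept[a] & kept[b]) / len(kept[a])
        for a, b in permutations(kept, 2)
    }


index = urls_by_tag(BOOKMARKS)
scores = overlap(index)

for (a, b), score in sorted(scores.items()):
    print(f"{a} -> {b}: {score:.2f}")
```

In this sample, two of the three URLs tagged "mac" are also tagged "osx", while only two of the four "osx" URLs are tagged "mac", so the overlap is a matter of degree and is not symmetric. The code has no idea what a Mac is; the scores come entirely from what users did. The same index could be built per user or per time window to get the other roll-ups mentioned above.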