The best and most relevant (to me) web dissertation I’ve ever read was Clay Shirky’s Ontology is Overrated. I do not hope to come even close to the clarity and relevance of that manifesto, but I hope to add to the discussion in three ways. First, I want to take a narrower (commerce) rather than wider (information retrieval) focus, from the perspective of a specific application. Secondly, I want to take a historical & BROADER view of the topic within the context of e-commerce. Lastly, while there is no ultimate winner (the game is not over) nor one right way to architect an “e-commerce information retrieval system,” to date there has been a winning methodology as proven by revenue, profits, and even market cap. How the pendulum will swing in the future, I don’t know, but recent technology improvements certainly have allowed the various architectures to compensate for each other’s shortcomings.

(BTW, I’m using ontology/taxonomy/attributes as a generalization of any structured content, not technically correct but useful in this case)

At the two ends of the spectrum of e-commerce implementations of a product retrieval system are:

1. Search Engine + Unstructured Content - Product information is created by the product owner (seller, distributor, manufacturer, etc.) in an ad hoc manner with minimal regard for standardization or formatting. A search engine is used to find relevant products for buyers based on various algorithms (keywords, PageRank, etc.)

2. Query + Structured Content - The best way to think about this is an attributed query field and an attributed catalog. Essentially a SQL database with a structured query interface.
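
To make the two ends concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the listings, the catalog rows, and the function names keyword_search and attribute_query are my own assumptions, and the scoring is a toy word count, nothing like what a real search engine does.

```python
import re

# Hypothetical free-text listings (the unstructured end of the spectrum).
LISTINGS = [
    "Apple iPod mini 4GB silver, barely used",
    "ipod mini 4 gb SILVER w/ charger",
    "Sony Walkman cassette player, vintage",
]

# Hypothetical attributed catalog rows (the structured end of the spectrum).
CATALOG = [
    {"brand": "Apple", "model": "iPod mini", "capacity_gb": 4, "color": "silver"},
    {"brand": "Sony", "model": "Walkman", "capacity_gb": None, "color": "black"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(query, listings):
    """Rank listings by how many query words they contain (toy scoring)."""
    terms = tokenize(query)
    scored = []
    for text in listings:
        words = set(tokenize(text))
        score = sum(1 for term in terms if term in words)
        if score > 0:
            scored.append((score, text))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


def attribute_query(catalog, **criteria):
    """Return rows whose attributes exactly match every criterion."""
    return [
        row for row in catalog
        if all(row.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # Messy listings still match a loose keyword query.
    print(keyword_search("ipod silver", LISTINGS))
    # The structured query only works if the buyer uses the catalog's exact values.
    print(attribute_query(CATALOG, brand="Apple", color="silver"))
    print(attribute_query(CATALOG, brand="apple"))
```

Notice that the keyword search tolerates the sloppy second listing, while the attribute query returns nothing for brand="apple" because the catalog stores "Apple". That is the whole spectrum in miniature.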

There are several examples along the spectrum.

Google - In the purest sense, Google (not Froogle) is the perfect implementation of the first kind of system: completely unstructured data plus a search engine.

eBay - In the SKU-less world of eBay (circa 2000), sellers enter product information in a semi-structured manner. Furthermore, there is no effort to consolidate listings with the same SKU into one giant listing. As far as eBay or any machine is concerned, each product listed for sale is completely unique. A search engine is implemented to search listing titles and sometimes descriptions.

Delicious - There is really no “tag” implementation of an e-commerce search, so I’m just gonna let Delicious be my straw man. Some might argue that Delicious and eBay should switch places; I, however, would argue that the act of tagging a product with a set of specific tags is more restricting and thus more structured than eBay’s “Listing Title.” Furthermore, as you’ll see later, eBay and Delicious make a Recall/Precision trade-off consistent with the rest of the spectrum. (BTW, eBay does have a categorization scheme, but not in the context of its search engine. The scheme essentially offers an alternative method of navigation. But if you want to, you can switch eBay & Delicious on the spectrum because of this issue.)

Amazon - Amazon has a catalog that is SKU-centric, in that the product title and description are standardized for each unique product. Sellers of that product have to list their items under that SKU.

Chemdex – A long-dead but very relevant example. In many ways it represents all B2B e-commerce companies back in 2000. Like a lot of B2B implementations of an e-commerce info retrieval system, aka catalog, Chemdex had very sophisticated, highly attributed, highly structured product content. It was the very definition of an Ontology or Taxonomy (depending on your own interpretation of the word).

As we all know, companies that made the critical product strategy decision to sit on the LEFT (unstructured content) side of the spectrum have become the dominant players in the e-commerce world. For various reasons I will go into, Google and eBay have garnered a disproportionate amount of the e-commerce spend. Especially in the case of eBay vs. Amazon, the power of unstructured content has won over rigid standards. While many would argue that eBay has a much better business model (no inventory) than Amazon and thus is the leading player, I would argue that because Amazon adopted this virtual model back in 2000 and has yet to narrow the gap, it is actually the superior product architecture that is the driving force of eBay’s growth. Fundamentally, it is also this unstructured product content architecture that has allowed eBay to maximize its virtual model, and thus it is the true source of eBay’s competitive advantage.

There are several key differences in the spectrum:

1. Sophistication/Effort – On one end, the critical product and differentiation factor is a better search algorithm; on the other end, the critical factor is content creation. Essentially, players on the left side of the spectrum decided to spend money on “understanding the mess” while those on the right side spend it on “cleaning up the mess.”

2. SKU – Due precisely to the decision above, adding & creating content for players on the left is so easy that an unlimited # of products can be sold and managed, leading to breadth. On the right, because the bottleneck of the commerce system is the creation of the catalog, companies are forced to focus on the products they can sell and drive inventory turnover for those SKUs (i.e. depth).

3. Investment – Equally important, Google and eBay live and die by the “power” of their search engines and thus spend significant money on creating the best-of-breed algorithm or user experience. Chemdex and Amazon, on the other hand, must invest in content creation throughput, usually in the form of manpower. (Chemdex spent a disproportionate amount of its money on this task and eventually went out of business because it took so long, the quality was so bad, and it was so expensive.)

There are also some key trade-offs:

1. Speed – Search Engines are by definition faster than Query Engines. A SQL query over 100X less data is still slower than a Google search. This has serious effects on the user experience, especially in B2B.
2. Precision – A key search engine concept. It refers to the “relevancy” of the individual returned results, i.e. what share of the results returned are actually relevant. Structured content typically returns more precise results because more attributes and parameters can be specified by the “buyer.”
3. Recall - Another key search engine concept. It refers to the “coverage” of the returned results. I.e., regardless of the # of results returned, as long as all relevant results are included, it has good recall. Unstructured content typically has high recall due to the “fuzziness” & flexibility of its algorithm. Structured content, on the other hand, has serious issues here, as mentioned by Clay Shirky.
4. Flexibility - This is THE key reason that unstructured content won over structured. The flexibility to sell ANYTHING (kidney on eBay!) allowed eBay to evolve without management interference while Amazon required the creation of new content and new categories.
5. Data Mining - On the other hand, the ability to understand data through structured content is the key competitive differentiator that Amazon has over eBay or Google. It can mine data extensively to create sophisticated cross-selling, up-selling, recommendation, and personalization features that Google will be hard pressed to implement because its data is “dumb.” While this had always been Amazon’s strategy, it was still not enough to overcome the rigidity of its product catalog architecture.
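
Since Precision and Recall carry a lot of the argument, here is a small sketch of how the two numbers are usually computed. The product IDs and result sets are invented for illustration, and precision_recall is just a name I picked; the formulas themselves are the standard ones (precision = relevant results returned / all results returned, recall = relevant results returned / all relevant results).

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for one query from sets of item IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant items that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: the four products the buyer would actually want.
RELEVANT = {"a", "b", "c", "d"}

# A fuzzy keyword search returns everything relevant plus some noise.
FUZZY_RESULTS = {"a", "b", "c", "d", "w", "x", "y", "z"}

# A strict attribute query returns only exact matches and misses some items.
STRICT_RESULTS = {"a", "b"}

if __name__ == "__main__":
    print(precision_recall(FUZZY_RESULTS, RELEVANT))   # (0.5, 1.0)
    print(precision_recall(STRICT_RESULTS, RELEVANT))  # (1.0, 0.5)
```

In this made-up case the fuzzy search finds everything but half of what it returns is noise, while the strict query is all signal but misses half of what the buyer wanted.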

These trade-offs were made by the various players in the industry. To date, buyers have shown that a good search engine on top of an unstructured product information source is the superior architecture for creating an e-commerce focused information retrieval system. Thus intelligence has won over brute force. Oh yeah, I too think ontology/taxonomy/attributes is overrated, not just philosophically but for business.

I believe the history of e-commerce search will have serious implications for the so-called SEMANTIC web, but I’ll save that for the next post when I can think more clearly. (Hint: I’m in Clay’s camp.)

Just some of the things I read on tagging recently, there is a lot btw so this is not comprehensive:

Unfolding Ontology from Alex
The Yin and Yan of Tagging
More Clay
More Clay on Tim Bray’s Q
A blog on tagging: You’re It
Fred’s Tags