Not sure what this whole Business 2.0 VC contest has to do with the “Thesis Investing” meme that was so popular a few months back. Idea-based investing? Better or worse? Not sure, but I do know that whatever this is, it has been around for a long time. A lot of times entrepreneurs mistakenly think that a VC stole their business plan when all along the VC has been hunting for a startup to fit into their own idea. Of course, VCs will incorporate ideas/data from various entrepreneurs and re-position/improve their own ideas, and that’s not exactly kosher. Anyways, that’s not a topic for this post. Mainly I wanted to talk about the upsell engine Greg Martin of Redpoint mentioned in the article. Here is the excerpt:

$5M-THE ULTIMATE ONLINE UPSELL
WHO: Greg Martin, Redpoint Ventures, Los Angeles
WHO HE IS: Martin handles communications and digital media investments for the venture firm, which has recently scored with portfolio companies like MySpace and Topspin.
WHAT HE WANTS: Software that makes better product recommendations to online shoppers.
WHY IT’S SMART: Amazon may have been the pioneer in so-called collaborative filtering — matching online customers with products they’d be likely to buy — but by no means has it mastered the discipline. The percentage of buyers who make recommended purchases online is abysmal. “It’s about 3 percent for major Web retailers, and for most other merchants it’s lower than that,” Martin says. Software that can better sort and sift customer data and increase the conversion rates by just a percentage point or two, he says, would generate a healthy business. Beyond Amazon, after all, thousands of online merchants still don’t have access to such tools. “There’s a lot of information out there that’s being ignored,” Martin says.
WHAT HE WANTS FROM YOU: A group of no more than 10 people to tinker with and refine the algorithms to make online purchase suggestions more efficient. Says Martin, “I’d want to see the technology working, with a few customers onboard.” The next phase if all goes well? Developing algorithms for websites to serve up more relevant ads.
SEND YOUR PLAN TO: gmartin@redpoint.com

The reason Amazon has built a somewhat successful recommendation engine is that they have a database of purchasing data that is both CROSS-CATEGORY and LONGITUDINAL, with a sample big enough to be statistically significant, on which to run collaborative filtering algorithms (either association rules or clustering algorithms) that mine the data to discover bundling opportunities. Most e-tailers are vertically focused and do not have a big enough sample of customers (not to mention the technical and knowledge limitations) to be able to build this engine. (BTW, the urban legend in KDD circles is that beer and diapers is the most commonly bundled supermarket product pair . . . thus the need for cross-category data.) The basic algorithms are not complicated, see here; the Amazon implementation is actually even simpler and technically not considered clustering from a data mining perspective. Given those limitations, and the explosion of SEM/SEO-driven e-tailers, lots of money can be made in creating not just an upsell engine but an “upsell network.”
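To show just how uncomplicated the basic idea is, here is a minimal sketch of pairwise association rules in Python. It is not Amazon's implementation, just the textbook version: count how often two items land in the same basket, then turn the counts into a confidence score for "bought A, so suggest B". The function name `pair_rules`, the `min_support` threshold, and the toy baskets are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=2):
    """Return {(a, b): confidence} for rules "bought a -> suggest b".

    A pair is kept only if it appears together in at least
    `min_support` baskets. Confidence is the share of baskets
    containing `a` that also contain `b`.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        if together >= min_support:
            rules[(a, b)] = together / item_counts[a]
            rules[(b, a)] = together / item_counts[b]
    return rules


# Toy purchase history (invented data, echoing the beer and diapers legend).
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["diapers", "wipes"],
    ["beer", "chips"],
]
rules = pair_rules(baskets)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} -> {b}: {confidence:.2f}")
```

Notice that the diapers/wipes pair gets dropped because it only shows up once. That is the whole problem for a small vertical e-tailer in miniature: without enough baskets, and without baskets that span categories, most pairs never clear the support threshold.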

Now, I think Amazon should create a “product recommendation” web service (thus still keeping their data proprietary). But I’m not sure it’s going to happen, given that they would rather sell the complete bundle of merchant/website services. The bigger question, then, is how Greg can find a suitable “replacement” for that database of purchasing history.

This is where the blogosphere comes in. Remember Tom Foremski’s post that created a storm over monetization of the blogosphere? This is another method for monetizing the blogosphere, focusing on product-based blogs (gizmodo, engadget, apple secrets, etc.).

The idea to use linguistic frame structures to mine texts and infer relationships between products actually came from Alan Abraham. I met Alan at Wharton and we spent some time just kicking around this particular idea as a basis/case study for me to learn more about text mining algorithms. Alan is one of the leaders in the field and has created an application which uses this technology to infer contractual relationships between users and “electronic communities.” The software, called CAMpace, “incorporates coverage-checking components for contract monitoring and contract/policy conflict detection.” The paper can be downloaded here. Instead of writing my typically convoluted explanations, I’m just going to quote Alan from one of our email threads.

The paper that I mentioned to you concentrates on interpreting natural language business contracts, but the tools are very generalizable to the applications you’re interested in (inferring up-sell and cross-sell opportunities). In particular, the paper mentions some linguistic databases (see Section 3 on page 2), which can be used to identify words/tokens (like the word “prefer”) that function as comparatives in English and the linguistic slot frames that these words possess. You can then pull down web pages (e.g. from magazines, newspapers, and other sites), parse sentences (like “I prefer the Compaq US252 to the IBM ThinkPad T60”) looking for the above-mentioned comparatives (e.g. “prefer … to …”), and then pull out role-players (to populate the above-mentioned slots) from those sentences to dynamically populate the list of individual complementary and competitive products. E.g. the slots/linguistic-frame structure for “prefer” is as follows: “[person] prefers [productA] to [productB]” (where the items in square brackets [] are the slots/frames). You can, with some accuracy (e.g. 60%), pull out productA and productB from this, and associate them as competitor products.
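To make Alan's “prefer … to …” example concrete, here is a rough Python sketch of the slot-filling step. Big caveat: a single regular expression is standing in for the linguistic databases and parser he describes, so this is a toy and not his method. The pattern `PREFER_FRAME`, the function `extract_competitors`, and the second sample sentence are my own inventions for illustration.

```python
import re

# Stand-in for the frame "[person] prefers [productA] to [productB]".
# Assumption: product names contain no sentence punctuation and no " to ".
PREFER_FRAME = re.compile(
    r"\bprefers?\s+(?:the\s+)?(?P<a>[^.,;!?]+?)"
    r"\s+to\s+(?:the\s+)?(?P<b>[^.,;!?]+?)\s*(?=[.,;!?]|$)",
    re.IGNORECASE,
)


def extract_competitors(text):
    """Return (productA, productB) pairs found by the "prefer ... to ..." frame."""
    return [(m.group("a"), m.group("b")) for m in PREFER_FRAME.finditer(text)]


# The first sentence is Alan's example; the second is invented test data.
sample = (
    "I prefer the Compaq US252 to the IBM ThinkPad T60. "
    "She prefers Nikon D50 to the Canon Rebel XT!"
)
pairs = extract_competitors(sample)
for product_a, product_b in pairs:
    print(f"{product_a} competes with {product_b}")
```

A regex like this falls over quickly (“I prefer to go to the store” would produce garbage), which is exactly why the real thing needs proper frames and a parser, and why Alan only promises something like 60% accuracy.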

My rambling stream-of-consciousness response back is here:

I thought about this all night . . . and came up with something I think we can commercialize . . . are you familiar with the UCCnet initiative? It’s a database of UPC codes, product descriptions, manufacturers, etc. . . . it’s a retail industry initiative. Anyways, if we marry the UCCnet database + a web crawler + a natural language database + some language processing engine and use it to create a database of “relationships,” we can essentially map out all the relationships between all retail products (replacement, upgrade, complementary, accessory, upsell, cross-sell, etc.). We can then create a web service that allows large e-tailers to dynamically do recommendations (a la Amazon) . . . PLUS . . . even cooler . . . we can use a Google AdWords business model . . . smaller e-tailers (Yahoo stores) can participate in the “platform/marketplace” and cross-link their products with each other . . . using contextual ad serving technology, we can actually serve products rather than text ads based on products viewed on the page . . . (we also make this totally self-serve like AdWords). Example . . . Zappos.com sells shoes . . . if they join, they can cross-sell products from eBags . . . and vice versa . . . we take a cut of any successful cross-sell and share it with either eBags or Zappos . . . tell me if I’m nuts . . . do you think the technology exists to do this?
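For the curious, here is roughly what the lookup side of that “relationships” database could look like, as a small Python sketch. Everything in it is hypothetical: the class `RelationshipGraph`, its methods, and the placeholder product ids (a real version would presumably key on UPC codes and sit behind a web service). The crawler and language processing that would fill in the relationships are left out entirely.

```python
from collections import defaultdict


class RelationshipGraph:
    """Toy in-memory store of product relationships and merchant listings."""

    def __init__(self):
        self._edges = defaultdict(list)   # product id -> [(relation, other id)]
        self._sellers = defaultdict(set)  # product id -> merchants stocking it

    def add_relation(self, source, relation, target):
        """Record that `target` is, e.g., an accessory of `source`."""
        self._edges[source].append((relation, target))

    def add_listing(self, merchant, product):
        """Record that `merchant` sells `product`."""
        self._sellers[product].add(merchant)

    def recommend(self, product, relations=("complementary", "accessory")):
        """Return (related product, relation, merchant) for the product being viewed."""
        results = []
        for relation, target in self._edges.get(product, []):
            if relation in relations:
                for merchant in sorted(self._sellers.get(target, ())):
                    results.append((target, relation, merchant))
        return results


# Placeholder ids instead of real UPC codes; the data is invented.
graph = RelationshipGraph()
graph.add_relation("hiking-boots", "complementary", "daypack")
graph.add_relation("hiking-boots", "replacement", "trail-runners")
graph.add_listing("Zappos", "hiking-boots")
graph.add_listing("Zappos", "trail-runners")
graph.add_listing("eBags", "daypack")

recs = graph.recommend("hiking-boots")
print(recs)
```

In the Zappos/eBags example, a shopper looking at boots on one site gets served the bag from the other, and the merchant attached to each result is what you would need in order to track the cross-sell and split the cut.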

Anyways, the thread goes on for much, much longer. So why am I sharing this? Well, call it a pitch to Greg if you will (but I’m not naive enough to believe Greg will openly share his best ideas :) ). But more so, I think this is a social experiment in changing the VC-entrepreneur dynamic (a la Matt Marshall’s thoughts on transformation in VC-land). If a VC can openly solicit ideas, why can’t entrepreneurs respond openly as well? If open source can free society from the shackles of “intellectual property,” why can’t an “open business” do the same? If the spirit of co-production and peer participation can create software, communities, and websites, why can’t companies be built without delineations of foundership, employeeship, customership, or even ownership? Who knows? . . . It’s getting late and I’m talking as if I’m high, so I’d better stop before I hurt myself.