Content, Content Everywhere and not an Item to Consume
Tuesday, May 17, 2011 at 22:25
Eric Wilson in big data, discovery, machine learning, recommender system, recsys, search

I Feel Bloated

You have spent a great deal of time building up the catalog of things that you want consumers to purchase or consume.  In fact, you have done such a great job that now you have (gasp) too much stuff, and folks are not getting the value of the deep and rich service you provide.  You might even offer your service or catalog through multiple, complementary platforms, which makes the question of what to select even more complicated.

Let’s face it: users are confounded by the myriad choices they have today.  “I have 999 channels to surf.”  “There are 350,000 apps to look at.”  “What other kinds of music tracks or artists might I like?”  Users are frustrated because search isn’t the answer.

 

The Wilson Confounded Search Conjecture:  You don’t know what to search for if you don’t know what you can search for.  

 

Search doesn’t solve the challenge of Discovery.  For Discovery to be a part of the user experience, the stuff users might like needs to find them!  It is precisely this challenge that we in the recommender system (recsys) space are aiming to solve.
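
To make "the stuff finds the user" a little more concrete, here is a minimal sketch of one very simple approach: recommend items that tend to show up alongside the things a user has already consumed.  It is an illustration only, not how any particular product or library does it.  The user names, item names, and the helper functions build_cooccurrence and recommend are all made up for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    co = defaultdict(Counter)
    for items in histories.values():
        # Each ordered pair (a, b) records that b was consumed alongside a.
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(user, histories, co, top_n=3):
    """Rank items the user has not seen by co-occurrence with items they have."""
    seen = set(histories[user])
    scores = Counter()
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical consumption histories, invented for illustration.
histories = {
    "ann": ["jazz_radio", "blues_hour", "soul_classics"],
    "bob": ["jazz_radio", "blues_hour"],
    "cat": ["jazz_radio", "soul_classics", "funk_mix"],
    "dan": ["news_daily"],
}

co = build_cooccurrence(histories)
print(recommend("bob", histories, co))
```

Notice that bob never typed a query; the suggestions come entirely from what other users with overlapping tastes consumed.  Notice also that dan, who shares nothing with anyone, gets no suggestions at all, which hints at how much a toy like this leaves unsolved.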

I Think I Need A Recsys

The first step to solving an issue is to recognize that one exists.  If you are reading this post then you may have come to the realization that your stuff just isn’t performing and your users aren’t engaging. Don’t worry; there is a way through.

Some recsys questions to consider:

Build vs. Buy?

This is an interesting and often challenging question to answer honestly.  Sure, you have some sharp engineers who are invigorated by the proposition of building this kind of tech.  It’s cool.  Super geeky, but cool.  The tough question is: do they really have the experience and horsepower needed to get it done?

Even a little research into the recsys area will turn up several Open Source Software (OSS) projects that aim to bring general-purpose solutions for this heady problem down to earth.  Here are a few of the more promising projects:

Mahout: http://mahout.apache.org/

Apache Mahout is a scalable machine learning library that supports large data sets.

Lucene: http://lucene.apache.org/java/docs/index.html

Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Solr: http://lucene.apache.org/solr/

Apache Solr is an open source search platform built on Lucene.  Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

The availability of these kinds of OSS options can mask how serious and complicated the engineering behind a successful recsys solution really is.  Be aware that there is much work to be done before the above projects become practical for your purposes.

Also, make certain to answer the recsys questions listed above before embarking on your recsys project.  The answers will shape the success criteria for a buy option or the product definition for a build decision.

Article originally appeared on peripatetic.mobi (http://blog.peripatetic.mobi/).
See website for complete article licensing information.