Pannu's Pontifications
    November 29

    Does Personalization Work for Search?

    Raul Valdes-Perez of Vivisimo says that personalization of search (i.e. figuring out what I want from a search that is different from what a random bozo wants) is not a fruitful avenue for exploitation (academic-speak for "can't get no satisfaction"). He wrote a one-page paper about it - here's (PDF) how it goes!
     
    Raul was at Carnegie Mellon University when I was there - he was working on how scientific discoveries are made, or something like that, and wrote some AI programs that tried to simulate the process of scientific discovery. His other claim to fame was that his adviser was Herb Simon, CMU's resident Nobel laureate, who introduced the concept of bounded rationality in his Political Science dissertation at the University of Chicago. Herb was one of the founding faculty of the Graduate School of Industrial Administration. CMU was so quant - they couldn't bring themselves to call their business school the Business School!
     
    Herb was one half of the Simon-Newell duo, who are considered "Fathers of AI", though there are enough claims about being the father(s) of AI that I am afraid to be seen with the mother! Herb and Allen Newell built GPS (General Problem Solver) - the first "means-ends", or goal-driven, forward and backward reasoning solver. I was on one of Allen Newell's projects when I was a graduate student researcher.
     
    As with many AI folks, Raul ended up in search; Vivisimo is a CMU spin-off. One of my good buddies, Liren Chen, who is among the best natural language and search programmers I know, was with them before he moved to Google.
     
    Raul basically says (i) interests change, (ii) profiles of users are not reliable and (iii) the data gathered is unreliable as an indicator of preferences.
     
    I disagree that personalization is a dead end - today we have only the two-word queries (I think the average search is on the order of 2 words) to infer what a user wants. From aggregating a lot of those two-word queries we can even make decent guesses - but having extra information that drives inference is not always bad.
     
    Techniques exist to address each one of Raul's objections. I think Raul is reacting to the hype that search results based JUST on our click stream would be better than those we have now.
     
    Where personalization scores is in segmentation - identifying you as being different from other users or the same as other users. The segmentation biases the aggregate search; it doesn't act as a basis for the search itself. To take one of Raul's examples - if you are identified as being in a "doctor" segment, typing anthrax means that you are most likely looking for the anthrax disease entries that the aggregate search is aware of. If you are identified as being in a "rock fan" segment, anthrax means nothing else but the rocking heavy metal group. The words you use as a doctor (in other activities you do) have a correlation with the words that will be in the documents describing anthrax the disease (for example "golf game", "nurses", "&** insurance companies"), and these can be used to make sure that the documents get biased by the query AND by the words in your profile. Whereas the words you use in other activities as a rocker are probably "sex", "drugs", "rock" and "roll"!
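
    To make the biasing idea concrete, here is a minimal sketch, not anyone's actual system. It assumes the aggregate search hands back (doc_id, base_score, text) tuples and that a user profile is just a bag of words; the function names, the `bias` weight and the toy documents are all hypothetical.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(results, profile_words, bias=0.5):
    """Re-order aggregate results, nudging each base score by profile similarity.

    results: list of (doc_id, base_score, text) from the aggregate search.
    profile_words: words the user tends to use in other activities.
    bias: hypothetical weight of the profile relative to the base score.
    """
    profile = Counter(w.lower() for w in profile_words)
    scored = []
    for doc_id, base_score, text in results:
        doc = Counter(text.lower().split())
        # The query-driven base score stays the basis; the profile only scales it.
        scored.append((base_score * (1.0 + bias * cosine(profile, doc)), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy data: both documents match the query "anthrax" equally well.
results = [
    ("band", 1.0, "anthrax heavy metal band tour album rock"),
    ("disease", 1.0, "anthrax disease infection symptoms treatment patients"),
]
doctor = ["patients", "treatment", "nurses", "insurance"]
rocker = ["rock", "album", "tour", "drugs"]

print(rerank(results, doctor))  # disease entry ranked first
print(rerank(results, rocker))  # band entry ranked first
```

    Note that the profile only multiplies the score the aggregate search already produced, so a document that does not match the query at all never gets pulled in by the profile alone.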
     
    The challenge is figuring out which words you use consistently and which are coupled with certain activities - you may be part of both the "doctor" segment and the "rock star" segment. Another challenge is mapping the words that describe a particular segment to that segment. Yet another is diagnosing whether you are doing a one-off activity, like searching for a new newspaper story, or acting on an unchanging preference. Amazon already does a great job (in a constrained domain) of mapping the words you use, the books you buy and the metadata associated with a book to bias the search for those books. I disagree that buying the books and spending time reading them makes this data more valuable than the data gathered from web page visits. I am committed enough to some websites that I am sure statistical inference can pick it up!
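
    One rough way to picture these challenges, as a sketch under stated assumptions: treat a word as "consistent" only if it shows up in several separate sessions, and let a user belong fractionally to more than one segment. The segment vocabularies, the `min_sessions` threshold and the sample sessions below are made up for illustration; a real system would have to learn them from data.

```python
from collections import Counter

# Hypothetical segment vocabularies; a real system would learn these from data.
SEGMENT_WORDS = {
    "doctor": {"patients", "nurses", "insurance", "clinic"},
    "rock fan": {"rock", "album", "tour", "guitar"},
}


def consistent_words(sessions, min_sessions=2):
    """Keep only words seen in at least min_sessions separate sessions,
    so a one-off activity (a single session) does not shape the profile."""
    seen = Counter()
    for words in sessions:
        seen.update({w.lower() for w in words})
    return {w for w, n in seen.items() if n >= min_sessions}


def segment_weights(sessions, min_sessions=2):
    """Return fractional membership per segment; a user can be in several."""
    words = consistent_words(sessions, min_sessions)
    hits = {seg: len(words & vocab) for seg, vocab in SEGMENT_WORDS.items()}
    total = sum(hits.values())
    return {seg: (n / total if total else 0.0) for seg, n in hits.items()}


sessions = [
    ["patients", "clinic", "guitar"],
    ["patients", "nurses", "guitar", "election"],
    ["clinic", "tour"],  # "tour" appears in one session only: treated as one-off
]
print(segment_weights(sessions))
```

    With this toy data the user comes out as roughly two-thirds "doctor" and one-third "rock fan", and the single "election" and "tour" mentions are ignored as one-off activity.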
     
    All this may be hard but I believe it is solvable, and I would go as far as to say, solvable using the same techniques used for making search more relevant.
