June 20, 2006

Pandora Music Recommendations

Posted in Music, Personalization, Reviews at 8:15 pm by mj

For the last week or two, I’ve been listening to Pandora almost non-stop at work. (And, when working from home, I’m putting my 7-speaker computer sound system to good use.)

I was skeptical at first, but it is surprisingly easy to use, and generates good recommendations.

At the heart of Pandora’s service are “stations,” which are much like individually personalized radio stations. You create a station by entering the name of a song or band you like. Pandora will then find music similar to the song or band you entered. You refine the music played on that station by either entering more songs or bands that you like, or giving thumbs up/thumbs down to songs as Pandora plays them.

Their revenue model seems to be a combination of (a) advertising, (b) offering a premium service, and (c) providing referrals to Amazon and iTunes. I can’t imagine many people look at their ads (I keep mine playing on my Windows machine while I work on Linux), but I can see the direct links to Amazon and iTunes generating quite a bit of traffic.

The killer is that Pandora’s free service not only allows unlimited listening, but up to 100 stations. Of course, like real radio, you can’t rewind or select a song to play. They also limit the number of songs you can skip in one session. (Consequently, after the limit is reached, when you give a thumbs down to a song, you still have to finish listening to it.)

I started my first (and, so far, only) station by entering Marillion, and soon supplemented that with Van der Graaf Generator and Nick Cave, mostly to see if I could fool their algorithm. Between those and the recommended songs that I have thumbed up, my station has probably 4 (maybe 5) strong-ish clusters.

One annoyance with Pandora is that their algorithm tends to stick to one cluster at a time, lasting maybe a dozen songs before heading into a new direction–and those durations are increasing. (Their algorithm may be designed to converge on a single cluster per station.) In fact, the best way to get them to start playing music from a different genre is to enter the name of a different kind of song that you like. Otherwise, you could be waiting quite a while.

The second annoyance–and maybe related to the first–is that they tend to wear out an album. For example, for several days, I could have sworn the only Marillion album they had was Real to Reel, since all of the Marillion songs they played were from that album. Similarly for VDGG and H to He, and Nick Cave and From Her to Eternity. But, eventually, they do expand the songs they play from an artist.

The third annoyance is that they weight songs that you’ve thumbed up heavily in selecting which song to play next. This often results in hearing the same song twice an hour, and produces a ton of “repeats.” Not so bad if you’ve thumbed up 500 songs, but as I was just starting out, I really wanted to hear a wider variety of recommendations. And, though you can ask them to not play a song for 30 days because you’re tired of it, that just seems like using a sledgehammer for a thumbtack: no nuance.

All in all, it’s a good service, and one that is still being actively developed. Yesterday, I logged on and found that they’d improved their interface a bit, and added song bookmarks. Previously–and still–I cannot find a way to get a list of all the songs I’ve given a thumbs up to, so I’ve started using bookmarks in addition. You can view my profile at my Pandora profile page. I do not bookmark songs I already own, and I’ve only been bookmarking since Monday.

The feature I really want is an “I Own This” button, which would weight the song highly when computing similar songs, but prevent the song from getting played often (after all, if I own it, why would I need to listen to it on Pandora?).
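To make the idea concrete, here is a rough sketch of how an “I Own This” flag might work. It is purely hypothetical: I have no idea how Pandora actually scores songs, and every name and number below is made up for illustration. The point is only that a song could carry two separate weights, one for seeding similarity and one for playback.

```python
import random
from dataclasses import dataclass

# Hypothetical weights for illustration only; Pandora's real scoring is not public.
SIMILARITY_WEIGHT = {"owned": 1.0, "thumbs_up": 1.0, "unrated": 0.0}
PLAY_WEIGHT = {"owned": 0.1, "thumbs_up": 1.0, "unrated": 0.5}


@dataclass(frozen=True)
class Song:
    title: str
    status: str  # "owned", "thumbs_up", or "unrated"


def similarity_weight(song):
    """How strongly the song counts as a seed when finding similar music."""
    return SIMILARITY_WEIGHT[song.status]


def pick_next(songs, rng=random):
    """Pick the next song at random, in proportion to its play weight."""
    weights = [PLAY_WEIGHT[s.status] for s in songs]
    return rng.choices(songs, weights=weights, k=1)[0]
```

With weights like these, an owned song seeds recommendations exactly as strongly as a thumbed-up one, but comes up for playback only about a tenth as often.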

In addition, there should be a way of submitting corrected title information, as I’ve discovered a number of songs with the wrong titles (usually because a song was given the title of another song on the same album).

Anyhow, when I make my next Amazon purchase, I’ll be sure to do so in a way so Pandora gets the referral for the artists they’ve turned me onto.

June 18, 2006

Corporate In-Store Safety: Responsibility and Liability

Posted in Business, Me at 6:50 pm by mj

This weekend, my mom slipped on a wet surface inside a Wal-Mart store and broke her hip. She’s currently at the University hospital, where she will undergo surgery.

For its part, Wal-Mart (apparently) has a policy in place to cover its ass: its own employees voluntarily served as witnesses, and took photographic evidence of the scene before cleaning up the floor. Doubtlessly, Wal-Mart has run the numbers and determined that they’ll average smaller payouts by being helpful and considerate, than by trying to deny everything.

This brings up the question, to which I’m trying to find answers, of whether there are any strict legal standards for safety at a store, restaurant, etc., and whether systemic uncleanliness is a factor.

For example, I’m not a big fan of Wal-Mart stores. Notice I didn’t say I’m not a big fan of Wal-Mart, the successful corporation that has innovated in many areas of business, driven down prices, and tends to serve lower income families better than their competitors. I just can’t stand their physical stores, which tend to be cramped, crowded, smelly, ill-organized and, well, just plain messy.

What’s always struck me about Wal-Mart is how many hazards I’ve seen. It seems every time I enter their store, there is either a (metal) shelf coming loose, with a sharp end sticking out (I’ve been cut on those twice); or a puddle of laundry detergent on the floor; or oddly-shaped boxes sticking out into the middle of the aisle; etc.

Given this, is Wal-Mart (legally, not ethically) more responsible when a 60-year-old woman slips and breaks her hip, because their store policies tend to discourage putting cleanliness and safety above employee convenience? Or does the law run the other way, and put the responsibility in the hands of customers to know that Wal-Mart stores are usually not paragons of safety? Or does Wal-Mart’s pattern of behavior have absolutely no legal impact on an individual incident of a customer getting injured?

On the personal side of this, her injury is apparently not as bad as it could have been, and I’ve been told not to travel home for her surgery, but I do worry about how it will affect her health and mobility as she ages. This could restrict her to a wheelchair a decade sooner, or cause her blood pressure to rise, or more subtly affect her medical well-being. I guess it’s just wait-and-see at this point.

June 17, 2006

Four Years at Webshots

Posted in Me, Work at 5:24 pm by mj

I just realized this as I was waking up from my nap: today marks my fourth anniversary at Webshots. (My official hire date was June 16–a Sunday in 2002, with my first day being Monday.)

Technically, I’ve been doing work for Webshots for almost five years. As an engineer at Excite, I was, among other things, responsible for Webshots’ photo search starting in the summer of 2001. I have fond memories of complaining to my manager, “Who the frell are these people who can’t give us a valid XML feed?!?” Over the years, I’m sure I’ve caused more than one partner to exclaim, “Who the frell is this bozo who isn’t using our API correctly?!?” And the world turns.

We’ve only made two indisputably bad hiring decisions in the past four years. There was the numbskull who tried to bring proprietary code from a former (competing) employer into our site (he was fired on the spot, and all of his code removed before we launched the product he was working on), and the guy who wrote more Daily WTF candidates in his short time than I’d believed humanly possible.

Today, the Webshots engineering and webdev team is better than it’s ever been in the over ten years of Webshots’ existence. Being part of CNET also means there are other engineering teams across the organization whom we can leverage for core infrastructure, and even, on occasion, bells and whistles. (One of those cool groups is responsible for Solr. I envy the people on that team.)

Four years in one organization is about the time when most people–even happy people–start re-evaluating their careers, and I admit I’ve been doing some soul searching myself about how to take my career to the next level. The things that make a job worthwhile for any engineer are being surrounded by smart people, learning new things, finding new challenges, and getting things done. I don’t have the perfect job (who does?), but I’ve found a lot of all four lately.

But the really nice thing about this anniversary? I start accruing an extra week of vacation every year! Yeah, baby!

June 13, 2006

The Economics of (Oil) Prices, and Long-term Oil Strategy

Posted in Business, Politics at 7:56 pm by mj

There’s a short entry on the economics of prices by Walter E Williams. Williams responds to the argument that charging today’s prices for oil that was bought cheaper a week ago is “price gouging,” and offers up the following scenario:

If you were really enthusiastic about not being a “price-gouger,” I’d have another proposition. You might own a house that you purchased for $55,000 in 1960 that you put on the market for a half-million dollars. I’d simply accuse you of price-gouging and demand that you sell me the house for what you paid for it, maybe adding on a bit for inflation since 1960. I’m betting you’d say, “Williams, if I sold you my house for what I paid for it in 1960, how will I be able to pay today’s prices for a house to live in?”

Williams is correct on that point. Such accusations are usually made by people who don’t understand basic economics.

But Williams continues and asserts, like so many do, that the real problem with high gas prices is the U.S. Congress:

Opening a tiny portion of the coastal plain of the Arctic National Wildlife Refuge in Alaska to oil and gas production, according to the U.S. Geological Survey’s mean estimate, would increase our proven domestic oil reserves by approximately 50 percent. The Pacific, Atlantic and eastern Gulf of Mexico offshore areas have enormous reserves of oil and natural gas, but like the Alaska reserves, they have been put off limits by Congress. Plus, the U.S. Office of Naval Petroleum and Oil Shale Reserves estimates the world supply of oil shale at 1.6 trillion barrels, of which 1.2 trillion barrels are in the United States.

If I may put on my astute politician hat for a bit, I think arguments like that miss the bigger point.

The untapped oil under U.S. jurisdiction can be seen as a bargaining tool against OPEC. Knowing that the U.S. could commit itself to using its own domestic oil supplies–and, if that were to happen, we’d really commit to it–means the U.S. can bargain for cheaper prices (if not exactly cheap prices) now. It’s a bit of a threat: if we extract more oil, we can ruin the economies of several nations and make life miserable for some of the sheikh-kings in OPEC.

But what would happen if the U.S. committed itself to this route tomorrow? Well, after a ramp-up period which would probably involve higher taxes to subsidize the endeavour, we’d have cheaper oil prices. Much cheaper. But for how long? 90 years? (ANWR is 15 years, and the others?) And then? Then the U.S. would be backed against a wall.

In the long-term battle for freedom, we’d better have some tricks up our sleeves to maintain our independence when the going gets really tough. Using up our oil at the first sign of a little trouble means those opposed to liberty–which most of the OPEC nations represent–have a leg up in the long-term game.

My prediction is as soon as we establish a viable, scalable, long-term unmonopolized alternative to oil, we will tap our domestic reserves to get us through the turmoil that would undoubtedly follow, knowing that even if we tap it all, there’s an out (i.e., the proven alternative that is being rolled out). I don’t expect that to happen in my lifetime.

At least, if I were an astute politician–or, for that matter, even particularly politically astute at all–that would be my strategy.

(Link via Catallarchy.)

June 11, 2006

SIGIR 2006

Posted in Conferences, Search at 8:39 pm by mj

The ACM SIGIR conference is back in the States this year, hosted at the University of Washington in Seattle the second week of August. Registration was just posted last week, after months of a “coming soon” page.

Conference registration is $550 for ACM members.

The panels I’m planning on attending, from the program:

  • User behavior and modelling (day 1, session 1, room 1)
  • Exploiting Graph Structure (day 1, session 2, room 1)
  • Formal Models (day 1, session 3, room 2)
  • Machine Learning (day 2, session 1, room 2)
  • Efficiency (day 2, session 2, room 3)
  • Clustering (day 3, session 1, room 2)
  • Recommendations: Use and Abuse (day 3, session 2, room 2)
  • Web IR: Current Topics (day 3, session 3, room 2)
  • Faceted Search Workshop (day 4) ($110 more, plus an extra night stay; Andrei Broder, whom I wrote about earlier, is the co-leader)

I’m not much of a conference person, and even less of a socialite, but I’m looking forward to learning a lot from the people there–both the panel members, and the audience.

Who knows, I may even come away having recruited a couple of smart young researchers to work on my team... (though, with Yahoo, Microsoft, Google, and several start-ups well represented on the panels, it will be tough).

I’d love to hear if any of my six readers (including the four spammers) is planning on attending.

Ebay Enters “Contextual Advertising” Business

Posted in Advertising, Personalization, Search at 5:33 am by mj

As reported seemingly everywhere, Ebay is entering the contextual advertising business, where ads on affiliates’ sites will link directly to active auctions on Ebay whose items match the content on the current page. This is most likely a good thing for Ebay sellers. The value to small-time content publishers remains to be seen, since I believe the TOS on the GYM team offerings forbids forays into multiple advertisers.

This marks the fourth major player to enter this arena, which means it’s time for somebody else to come along and change the nature of the game. Once everybody has the know-how and the infrastructure, the market becomes ripe for a superior differentiated product.

Contextual advertising is a bit of a misnomer, since the actual context of the user’s session really doesn’t come into play. Rather, it refers to an advertisement appearing in the context of the content on the page.

For example, let’s say I (as the content publisher) know that you came to a page by searching for “aluminum siding” (yeah, I know). Although the page itself probably has at least one of those words, my advertising partner of choice has no real way of distinguishing your interest in aluminum siding from an interest in vinyl siding (which is also mentioned on the page). And they certainly have no clue that you’ve skipped over 12 other search results because they didn’t contain exactly what you wanted.

But intent through explicit search is only a small piece of the puzzle. What if I knew you came to a page through a recommendation my system offered you, and I (of course) know the criteria that were used to make that recommendation?

Most advertisers are equipped to take “hints” from the publisher, in the form of additional keywords, but they’re not equipped to (a) accept a lot of additional keywords, or (b) accept keywords that we’d like to negate, or (c) consider the real context of the user’s session, or (d) learn from a user’s behavior, to further refine their model of the user’s context (intent).

Maybe by considering, say, the last 8 pageviews within the last 30 minutes (those with contextual ads, anyway), they’d get closer in some circumstances, but they’d flub it in many situations. This is even more true when only certain pages contain calls to the advertiser, and those pages probably are not the ones providing the meat of the context.
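For illustration, here is a minimal sketch of that kind of session window. It is not how any advertiser actually does it (as far as I know); the `PageView` record, the `session_context` function, and the idea of counting publisher-supplied keywords are all assumptions of mine.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageView:
    timestamp: float  # seconds since the epoch
    keywords: list    # keywords associated with the page (hypothetical field)


def session_context(pageviews, now, max_views=8, window_seconds=30 * 60):
    """Count keywords across the most recent pageviews inside the time window."""
    # Keep only pageviews that fall within the window ending at `now`.
    recent = [pv for pv in pageviews if 0 <= now - pv.timestamp <= window_seconds]
    # Newest first, then cap at the last `max_views` pageviews.
    recent.sort(key=lambda pv: pv.timestamp, reverse=True)
    counts = Counter()
    for pv in recent[:max_views]:
        counts.update(pv.keywords)
    return counts
```

Note that a function like this only ever sees the pageviews it is handed. If the pages that carry the real context never call the advertiser, they never show up in `pageviews`, which is exactly the weakness described above.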

Further down in the report, the reporter also mentions that Ebay is studying the possibility of opening up their user feedback system in some way. That seems like more of a trial balloon being floated to gauge interest and, more importantly, to take suggestions on how to do so in a way that provides value, but still keeps the most important part proprietary. Hence the “it could take several years” comment from their director of developer relations.

Still, tying reputation systems into advertising–and, maybe going even further, establishing seller reputation on a publisher-by-publisher or user-by-user basis–seems like the next logical step.

June 8, 2006

The Next Generation of Search

Posted in Personalization, Search, Usability at 9:45 pm by mj

More and more I find myself thinking, “Those guys at Yahoo get it.”

The latest example is Andrei Broder from Yahoo! Research, who, at last month’s Future of Web Search Workshop, gave the keynote talk titled, From query based Information Retrieval to Context Driven Information Supply [link is a PDF].

While this is a CTO-level presentation (i.e., high level, few details), it was well illustrated and to the point (and quite funny in spots).

According to Broder, Classic Information Retrieval makes all the wrong assumptions for the web context: classic IR ignores context, ignores individuals, and ignores dynamism (which is to say, it assumes the corpus is static). This is one reason I’ve never put much faith in academic search criteria (such as the TREC corpora).

He goes on to outline the first three generations of web search: from keyword matching and word frequency analysis (1st generation); to link analysis, clickthrough feedback, and social linktext (2nd generation); and, currently, in the midst of “answering the need behind the query” (3rd generation), which is mostly about supplementing core search with tools (spell checking, shortcuts, dynamic result filtering, …), or with high-ranking, high-certainty results from verticals (maps, local searches, phonebooks, …).

And what of the newborn 4th generation? It’s about going “from information retrieval to information supply” (emphasis mine), which is all about implicit searches: personalization, recommendations, …and, of course, advertising.

If you know me, it’s this 4th (and possibly 5th, see my notes below) generation that I’m always harping about. I wrote a bit about the future of search last year. I make a bit of an ass of myself about it sometimes, but it turns me on.

And advertising, of course, is the big payoff from a corporate POV.

His final slide (slide 49) lists the challenges with this 4th generation of search: it involves a lot more data collection, a lot more data modelling, a lot more math, and a lot more understanding of the significance of the relationships between users and content.

What I find even more interesting, though, is what Broder left out:

  • Search is well on its way to being integrated into normal navigation (faceted search is just one step)
  • Social networks can, and should, affect relevance (social search is just one step)
  • Search is being used as a platform, and soon, partners will be able to affect relevance for their users–providing yet more information to the core search system
  • Search will soon be but a mediator between users and content–whether integrated into normal navigation or not–which provides the missing context for advertising, which can not be merely gleaned from content matching, or third-party user profiling

Did he stop short because much of the future of search makes Yahoo’s current search business positioning irrelevant?

Or because he has his team secretly working on the 5th generation of search and doesn’t want to give away his edge?

Or maybe he just wants to underpromise and overdeliver…

(Link via Greg Linden, who is one of the few bloggers I’ve made time to read in the last two weeks.)