October 22, 2006

Search privacy through #privacy: somewhat feasible, or dreaming too small?

Russ Jones (no relation), CTO of SEO marketing shop Virante, has generated quite a bit of buzz around “pound privacy.” In his open letter to G/Y/M/A, he says:

A number of indexing technologies and standards – robots.txt, nocache, noindex – have been adopted by all major search engines to protect the authorship rights of websites across the internet. Yet, to date, the search engines have not created a standard of privacy for their users.

His solution? #privacy as a sort of pre-processing directive in search queries:

The standard is simple: if a user includes #privacy in a search query, the search engine should not associate that IP (or other tracking mechanism such as cookies) with the query, nor should that query be made available via public or private keyword tools
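To make that concrete, here is a rough sketch in Python of how a query front end might honor the directive under the strictest reading, where a tagged search is simply never written to the log. The function names, the token-matching rule, and the shape of the log record are all my own assumptions for illustration; nothing here comes from Russ's letter or from any real search engine.

```python
import re

# "#privacy" as a standalone token; this matching rule is an assumption.
PRIVACY_TOKEN = re.compile(r"(?<!\S)#privacy(?!\S)", re.IGNORECASE)


def parse_query(raw_query):
    """Return (query_without_directive, is_private)."""
    is_private = PRIVACY_TOKEN.search(raw_query) is not None
    # Remove the directive and collapse any leftover whitespace.
    cleaned = " ".join(PRIVACY_TOKEN.sub(" ", raw_query).split())
    return cleaned, is_private


def build_log_record(raw_query, ip, cookie_id):
    """Hypothetical log record; returns None when nothing should be stored."""
    query, is_private = parse_query(raw_query)
    if is_private:
        # Strictest reading: no IP, no cookie, and no entry for keyword tools.
        return None
    return {"query": query, "ip": ip, "cookie_id": cookie_id}
```

A looser reading could keep the query text for anonymous aggregate counts and drop only the IP and cookie, which is exactly the ambiguity in the proposal.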

I think what Russ and his co-workers are going for (aside from a bit of nice publicity for their company) is much simpler than what many “privacy rights” advocates are seeking, and more feasible too.

For example, Michael Zimmer says #privacy doesn’t go far enough, and I think he speaks for a lot of people:

Forcing users to append their searches with a tag in order to protect their privacy accepts the premise that search engines should be allowed to collect personal information by default. And that is what must change.

The argument for 100% email encryption is valid here as well: if you only protect yourself when you have “something to hide,” it becomes much easier to determine who is doing things they are not supposed to do, and to show intent in a legal proceeding.

It’s an interesting dilemma. There is obvious scaremongering going on, of the kind that first stigmatized the use of cookies 10 years ago. Still, there is real risk here, as your searches can be (and maybe already have been) subpoenaed.

I’m not clear how far the original proposal wants to go. The straightforward reading would imply that such searches do not contribute to popularity lists, relatedness of queries, relevancy feedback (including click-through tracking), etc. That’s a lot of search-related innovation to rule out, should such a standard become the default setting. I’m not sure what harm it does to know that 10,000 people searched for stuffed bears in microwaves if none of those queries is attributable to a specific individual or ISP.

It also probably rules out keyword-based advertising, and especially keyword-based advertising targeted to your interests, or from which your profile might be gleaned (for example, clicking on most ad links gives the advertiser information about you and the context of your click; it has to, or advertisers could not track success rates).

It gets worse. Even if the search engine respects my privacy, any links I click on will, by default, send my search query to the destination site (through the HTTP “referer” header). Should search engines somehow mangle the URLs, or push every click through a redirect that has no correlation with the original search? (A conspiracy theorist will say that is the goal, as it will make SEO marketing firms much more valuable. ;-))
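Here is a small Python sketch of that leak, and of what a click-through link with no trace of the search might look like. The `search.example` host, the `q` and `u` parameter names, and both function names are made up for illustration. The second function only builds the link; whether the destination ends up seeing a Referer at all depends on how the intermediate hop is made, which this sketch does not cover.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_from_referer(referer, param="q"):
    """What a destination site can recover from a results-page Referer."""
    values = parse_qs(urlsplit(referer).query).get(param, [])
    return values[0] if values else None


def click_through_link(target_url, base="https://search.example/out"):
    """Hypothetical outbound link that names the destination but not the search."""
    return base + "?" + urlencode({"u": target_url})


# The results-page URL carries the whole query, #privacy tag included.
leaked = query_from_referer("https://search.example/results?q=stuffed+bears+%23privacy")
# The click-through link carries only the destination.
safe_link = click_through_link("https://example.org/page")
```

In this example `leaked` is the full search string, while `query_from_referer(safe_link)` finds nothing to recover.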

There’s something in me that likes #privacy as a manual, special-circumstance directive. A naive implementation, though, will lull people into a false sense of privacy, because the directive cuts across several areas of the business and underlying infrastructure.

Beyond that, search engines can go a long way toward alleviating fear, uncertainty and doubt by simply being totally clear about how their users’ personal information is being used. For example, to establish the relatedness of one query to another, you need to associate each search with a unique user and then correlate the searches made by the same or similar users. However, that data does not need to be queryable on a per-user basis, nor does it need to survive a long time (maybe 30 days). Be clear about that, and most people won’t care most of the time.
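As a sketch of what that could look like, the Python below keeps a short-lived per-user history only long enough to count which queries occur together, and exposes nothing but the aggregate pair counts. The class, its method names, and the salted-hash key are my own assumptions; a salted hash is not real anonymization, and the 30-day window is just the figure floated above.

```python
import hashlib
from collections import Counter, defaultdict

RETENTION_SECONDS = 30 * 24 * 3600  # the "maybe 30 days" retention window


class RelatedQueries:
    def __init__(self, salt):
        self._salt = salt
        self._recent = defaultdict(list)  # hashed user key -> [(time, query)]
        self.pair_counts = Counter()      # (query_a, query_b) -> count, no user key

    def _key(self, user_id):
        return hashlib.sha256((self._salt + user_id).encode("utf-8")).hexdigest()

    def record(self, user_id, query, now):
        key = self._key(user_id)
        # Only searches inside the retention window count as "recent".
        history = [(t, q) for t, q in self._recent[key] if now - t < RETENTION_SECONDS]
        for _, earlier in history:
            if earlier != query:
                self.pair_counts[tuple(sorted((earlier, query)))] += 1
        history.append((now, query))
        self._recent[key] = history

    def purge(self, now):
        """Drop per-user history older than the retention window."""
        for key in list(self._recent):
            kept = [(t, q) for t, q in self._recent[key] if now - t < RETENTION_SECONDS]
            if kept:
                self._recent[key] = kept
            else:
                del self._recent[key]

    def related(self, query, top=3):
        """Answer from aggregate counts only; there is no per-user lookup."""
        scores = Counter()
        for (a, b), n in self.pair_counts.items():
            if a == query:
                scores[b] += n
            elif b == query:
                scores[a] += n
        return [q for q, _ in scores.most_common(top)]
```

Once `purge` has run past the window, the per-user history is gone, but `related` still works because it reads only the pair counts.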




  2. Russ Jones said,

    Thank you for taking the time to look over #privacy. I have taken the time to look over your comments and those of a handful of other prominent bloggers who have voiced their concerns over the standard.

    I have discussed them in detail on thegooglecache.com. I would be pleased if you could take a chance to look them over.

    Thanks again,

    Russ Jones

    PS: my brother’s initials are MTJ!

