Friday, November 30, 2007

News sites want more control over search engines' access

AP via Findlaw:

"The desire for greater control over how search engines index and display Web sites is driving an effort launched Thursday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access.

Currently, Google Inc., Yahoo Inc. and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as "robots.txt," which a search engine's indexing software, called a crawler, knows to look for on a site.

The formal rules allow a site to block indexing of individual Web pages, specific directories or the entire site, though some search engines have added their own commands.

The proposal, unveiled by a consortium of publishers at the global headquarters of The Associated Press, seeks to have those extra commands - and more - apply across the board. Sites, for instance, could try to limit how long search engines may retain copies in their indexes, or tell the crawler not to follow any of the links that appear within a Web page."
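For anyone who hasn't looked inside a robots.txt file, here is a rough sketch of the three kinds of blocking the AP piece mentions (a single page, a directory, the whole site), checked with Python's standard urllib.robotparser module. The crawler names "ExampleCrawler" and "BadBot", the example.com URLs and the paths are all made up for illustration. The extra commands ACAP proposes, such as retention limits, are not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every name and path below is invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/story.html

User-agent: BadBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An ordinary crawler falls under the "*" rules.
print(parser.can_fetch("ExampleCrawler", "http://example.com/index.html"))
print(parser.can_fetch("ExampleCrawler", "http://example.com/private/memo.html"))
print(parser.can_fetch("ExampleCrawler", "http://example.com/drafts/story.html"))

# A crawler named in its own section is shut out of the entire site.
print(parser.can_fetch("BadBot", "http://example.com/index.html"))
```

Note that nothing here enforces anything: the file only states the site's wishes, and it is up to the crawler to check them, which is why the article says compliance is voluntary.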

The proposed controls are known as the Automated Content Access Protocol (ACAP). Google, Yahoo et al. are never going to go along with this, and sure enough:

"Google spokeswoman Jessica Powell said the company supports all efforts to bring Web sites and search engines together but needed to evaluate ACAP to ensure it can meet the needs of millions of Web sites - not just those of a single community."

As Nicholas Carr, David Weinberger and others have said, the more 'free' content there is, the happier Google will be.
