The Google Purchase of the 1995-2001 Usenet Archive and the Online Community
A Usenet user in Seattle, Washington was using the Usenet archive at Deja.com on February 12, 2001. He went out to get some coffee. When he returned, www.deja.com had changed to groups.google.com. This is symbolic of how the online community learned that Google, Inc. had purchased the Usenet archive from Deja.
A number of users expressed their dismay that the purchase resulted in Google taking the Usenet archive offline and substituting an archive of Usenet posts that Google had been collecting since August 2000. Google's beta version of a user interface for the archive was, many felt, quite inferior to what Deja had online. One of the noted problems was that the Google interface didn't have a means to view the discussions that a post was part of (known as discussion threads). Instead the posts were presented individually, in a manner similar to how one might present the results of a web search.
An article published in The Register on Feb. 13, 2001 expressed the frustration of users with the fact that Google had not maintained access to Deja's user interface and online archive until its own software was developed. Subsequent articles in The Register on Feb. 14, 2001 and Feb. 15, 2001 included comments by Google's CEO Larry Page about why Google had not maintained the Deja archives online. He promised some would be back online in a month and the rest in ninety days.
Others in the Usenet community expressed their relief on hearing of the purchase of the 1995-2001 Usenet archive by Google. They felt Google had developed a good web search engine. This apparently gave them confidence that Google would be able to create a good user interface for a Usenet archive as well. They urged giving Google time to show what it would do.
Research Origins of Google Web Search Engine
A report at the National Science Foundation in 1999 explains that "the 'Google' search engine was developed by Hector Garcia-Molina's group at Stanford as an outgrowth of the DLI project." The development of the Google search engine was carried out as part of a Digital Libraries Initiative (DLI) research project at Stanford University in California. Several of those connected with this project are now working at Google either as technical advisors or as employees.
In a paper presented in 1998, Sergey Brin and Lawrence Page, at the time Stanford graduate students in the DLI project, describe the rationale for design decisions for the Google web search engine. Their paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" describes the recent commercialization of the Internet and the harmful effect this has had on the quality of web search engines (some of which were originally developed with NSF funding). "Up until now most search engine development," they write, "has gone on at companies with little publication of technical details. This causes search engine technology to remain largely a black art and to be advertising oriented....With Google, we have a strong goal to push more development and understanding into the academic realm." Later in the paper they describe another objective of their research. They write:
Another goal we have is to set up a Spacelab-like environment where researchers or even students can propose and do interesting experiments on our large-scale web data.
One design goal for Google was that, as a public research web search engine, it would provide a laboratory for pursuing web search engine research. This 1998 paper also discusses how the proprietary activities of commercial enterprises do not facilitate the research and sharing needed to develop web search engine technology. The paper includes an acknowledgment of the funding of the Stanford Integrated Digital Library Project by the NSF, DARPA, NASA, and Interval Research, and the industrial partners of the Stanford Digital Libraries Project.
What has happened to the goals expressed in this 1998 paper describing the design rationale for Google?
Instead of a publicly developed search engine for research into web search engine design, the authors of the paper have formed a start-up company. They are now the President and the CEO of Google. Several of those in the Stanford University digital libraries research community are involved in the company. Stanford University is among the investors providing the funding for the company.
Describing such developments in testimony before a House Appropriations Subcommittee, the director of the NSF, Dr. Rita Colwell, explains that the "transfer to the private sector of 'people' - first supported by NSF at universities - should be viewed as the ultimate success of technology transfer." She cites Google as the company which "is an excellent example of knowledge transfer from NSF investments in people." Formerly, US law required that research done at government expense remain in the public domain. Has this requirement been changed? How is it that a publicly funded research project is the basis for a private corporate start-up venture by the researchers, their professors and their university? What is the effect on the nature of basic research funding when the fruits of its development are privatized by the researchers and their university along with the corporate partners to the venture?
How does this transfer of researchers and their research from the academic sector to the private sector affect the goal of Google research to provide an open process to support research development of web search engines? Examining what has happened in the acquisition by Google of the Usenet archives and software from Deja will perhaps provide some insight.
Responding to a question about why Google bought a Usenet archive, Craig Silverstein, a former Stanford graduate student and now director of technology at Google, explained that the mission statement of the company is "to organize the world's information, making it universally accessible and useful." He describes how Google had planned for a number of months to add Usenet data to its search engine databases, and how over the past six months this goal became more and more a topic of conversation at Google.
According to Silverstein, Google started a conversation with Deja about the archives. However, after Deja sold off part of its company, the opportunity became available for Google to acquire the Usenet archive data rather than license it. No one at Google has revealed what Google paid Deja for the Usenet archive. Given the goal of encouraging the sharing of research about web development that marked Google's early development, the fact that the decision to purchase a Usenet archive was made internally, without any evident discussion with the online community, suggests that the company's foray into the private sector has drawn it into the same kind of black art that its founders identified as a problem in previous search engine development.
Online Petition to Deja about the Usenet Archives
Silverstein raised the question of why there was only one such Usenet archive. He also said he wasn't aware of the online petition, signed by more than 3850 users, urging Deja to maintain the Usenet archive or to transfer it to a reliable organization, preferably a public or nonprofit organization, if Deja could no longer maintain it. Many of those who signed the petition included comments with their names. This public online petition contrasts with the internal discussion and negotiations that Google carried out to acquire the archive from Deja. That those involved in the acquisition at Google were unaware of the concerns of the online community suggests there is a communication problem between Google and the online Usenet community.
Whether other archives of Usenet posts from the 1995-2001 period exist is not known at the moment. Steve Bacher is one of those who signed the petition to Deja. Comments from users like Steve Bacher are included in the petition. These provide an understanding of why more people didn't archive Usenet during this period. Bacher describes how he used to maintain an archive of Usenet at his site but came to rely on the Deja archive and discontinued his own, telling those who had used his archive to use Deja.
Another comment in the petition, by Ofer Even-Tour, notes that Alta Vista had an archive that was discontinued. Ofer writes: "I wish Alta Vista would bring their Usenet Archive back."
Recognizing the problem of relying on one entity to archive Usenet, Paul Shaffer writes: "Who was sleeping when Dajanews became the chokepoint of usenet history???"
Reading the comments in the petition helps to provide an understanding of how important access to an archive of Usenet posts is to users. Several of those commenting also propose the conditions they feel will be necessary to continue such access.
Among those signing the petition is Theodor Holm Nelson, author of the book Computer Lib. He writes: "This archive is a public resource which has slipped into private hands. It must be kept available for the public benefit."
In a similar tone, Kay Marquardt explains that "the content of the Usenet archive is public content."
Such concerns lead Lee Randolph to write: "where is the Andrew Carnegie who will endow 'free public search engines' for the new century?"
Considering the problem of how to maintain such an archive responsibly, Calfin Ostrum writes, "If it had been known that you would remove forever access to the Usenet archives, some other more public-minded organization would have come into being to preserve them. Like it or not, you have implicitly assumed a responsibility to provide these archives and you are going back on it. If you don't want to continue to provide them, you should 'fess up to it and then arrange to transfer them (for free) to whatever organization offers to take them. The Usenet archives are a major respository of a non-trivial part of contemporary culture."
Others point out that since an archive provides a public benefit it needs government support and funding. Ray Normandeau writes, "Maybe Government grants should be requested for upkeep." Echoing this sentiment, Brian McNeil explains that the "USENET archive... should *never* have been in private/corporate hands...give it to an appropriate educational establishment."
David McRitchie writes that if Deja could no longer continue the archive, it should be turned over to the U.S. Library of Congress as a working system.
Robert L. Collins explains, "The usenet power search power tool is invaluable to me. I use it more than any other link. If you can't find a way to make it financially viable, then perhaps you should spin it off as a non-profit and seek grants. It is a public good...government funding is appropriate."
Describing the value of the archive, Kalle Valo comments: "Deja's news archive is essential part of Internet. Whenever there is a problem, news archive almost always has a solution. And even in many languages."
Thinking of the future online community, Lee Coursey writes, "Future generations of Netizens will need this."
Feeds of Usenet posts are sent to news servers at participating sites, with new posts being added by users at those sites and older posts expired by the sites. A Usenet archive can therefore be compared to an ongoing, accumulated global conversation. How to archive such a conversation is a research problem that some in the Usenet community feel requires a community approach. A post on the website slashdot.org generated a heated discussion about whether it was desirable to have the code for the user interface to the Usenet archive released as open source. There was also discussion about whether Google should make copies of the archive available to those who desired a copy. The slashdot.org discussion was a response to an article that appeared on the Wired website on Wednesday, February 21, proposing that Google provide a copy of the archive's data to be maintained as a distributed system on the computers of a number of different universities. The article also proposed that Google open source its user interface so those in the online community could explore how to improve it. This proposal echoed a proposal made in The Register on February 13, 2001. Andrew Orlowski wrote:
But perhaps something as valuable as Usenet - the words of ordinary Internet users - is never going to be safe in private hands. Why not return it to its roots? The Library of Congress could administer the archive, and ensure it was a properly distributed system farmed out to the best Universities, who could produce ever more cunning hackish search tools? That's not as much fun as shooting lasers at rockets, of course, but a lot cheaper.
Users on Mailing Lists and Newsgroups Discuss the Problem
There has also been discussion on several mailing lists of what would be an appropriate way to maintain the Usenet archives. One such discussion took place on the Community Memory mailing list. Some on that list volunteered to try to find an appropriate academic or nonprofit institution to maintain the archives. One possibility proposed was the Metalab ibiblio.org project at the University of North Carolina in Chapel Hill, which was formerly known by the name Sunsite. Sunsite was the name of the site, they explain, because they were originally funded by Sun Microsystems and still are, along with other corporate partners. But they wanted a vendor-neutral name to reflect the general nature of the information they archive. Another possible site proposed was the Computer Museum in California. A subscriber to the mailing list reported that he tried to contact Deja to inquire about the possibility of a copy of the archive going to the Computer Museum, but his inquiries did not get any response.
The newsgroup alt.fan.dejanews provides a forum on Usenet for discussion of what is happening with the Usenet archive. Several users discussed the difference in culture between a corporation which has an obligation to view a Usenet archive as a way to earn revenue and the needs of Internet users for whom Usenet and the Internet are an important means of communication unrivaled elsewhere in the world.
In a post, William S. Kossack describes his experience participating on Usenet and the implications of this experience toward understanding the nature of Usenet and the Internet. He writes:
If all we did was read archives then the internet would die tomorrow. The internet is about communication. Its about the guy in the outback that knows something about the software your using that nobody else does. Its about the guy with a different native language that needs help on a research problem. Its about the guy down the street that needs help finding someone to really fix his car. Its about people and communication between people that don't know each other and will probably never meet each other.
I've worked on problems where the best expert or at least the one willing to help lived in the outback. I've solved research problems where everyone working with me either lived in a non-english speaking country or at least in Chicago. I've gotten answers to car problems, camera problems, computer problems, health problems, and even met my wife via the net. The internet is not about archives it's about communication. Its about communication on a scale not possible by any other means.
Kossack's post poignantly characterizes the nature of the discussion and human-to-human computer-facilitated interactions which are possible because of Usenet and the Internet.
Will this Culture Clash Affect Usenet?
What will be the effect of putting a Usenet archive again under the constraints of the income producing requirements of a corporation?
Will this affect the precious human-to-human communication that Usenet and the Internet make possible?
If Google is willing to provide copies of the archive to university sites or other noncommercial institutions, would this help make it possible to establish a form of user interface and archive access that supports the continued growth and spread of such human communication?
While there has been broad-ranging discussion in the online community about what should happen if Deja could not maintain a Usenet archive, and much sentiment in favor of giving the archive a home at an academic or noncommercial institution, the decision to buy the Usenet archive was made internally at Google without any apparent input from the Usenet community. This lack of communication between the online community and Google about the considerations that should determine the future of a Usenet archive is one example of the culture clash that Google's purchase of this Usenet archive suggests.
Another aspect of this culture clash between the online community and Google relates to any claim by Google to own the content of a Usenet archive. The postings on Usenet are different from much of the content of the web. While Google is indexing and providing means of searching the web, it does not claim to own the web pages or information it is indexing. With regard to a Usenet archive, however, an offer to license or purchase Usenet posts for a fee, or a claim to rights of ownership of the posts, is contrary to the understanding of users and their intention with regard to their Usenet posts. In general, those who post on Usenet consider their posts to be contributed to facilitate communication in the online community. Any company's claim that it has a right to buy or sell a compilation of Usenet posts presents a serious challenge to this understanding, which has made it possible for Usenet to function over the years.
In his article "Net Cultural Assumptions," first posted on Usenet in 1992, Gregory Woodbury stresses that people who post on Usenet are doing so recognizing that "folks on different machines *desire* to share information in an easy and timely manner, despite the spatial separation between them and the machines they are using. That is the persons using the Net to communicate *want to communicate* and are willing to cooperate in effecting that communication." That is the unwritten agreement. How Woodbury would feel about a company putting a copyright on those communications and calling them its property to be bought and sold is not the subject of his article. But what effect will it have on Usenet when posts the online community has contributed for the purpose of communication are claimed as the property of commercial entities?
As Woodbury argues, those posting on Usenet in general consider that their posts are contributed to facilitate communication among Usenet users. Any company declaring that it has the right to the ownership of these posts, or to buy or sell a compilation of such posts, presents a serious problem for Usenet users and for Usenet's continued development. Such actions can have a chilling effect on those who make the contributions.
In general, posts are covered by the Berne Convention, which many countries have agreed to and which the US joined on March 1, 1989. It protects the right of the creators of the posts to their copyright. The Berne Convention provides that once a work or idea is fixed in a tangible form, the creator holds the copyright to the form. No (c) or other notice is required for the copyright status. Users do not need this protection when they are contributing to communicate. Nevertheless, this copyright is a protection against any other entity gathering their posts and claiming ownership or the right to financially benefit from the copyrighted work of others, without the explicit permission of the contributors.
Whether Google paid money for the Usenet archives is not known, since they have not made the details of the transfer from Deja to them public. However, a spokesperson for Google has said that the company will consider the request to make a copy of the archive available to a nonprofit or public entity and that proposals can be sent to firstname.lastname@example.org
Tom Truscott, one of the co-originators of Usenet, provides a bit of a different perspective to understand the challenge the transfer of the Usenet archive presents to the community. He points out that those at Deja who developed the archives and the code for the user interface spent a long time thinking and working on them, and for most it must have been a labor of love. He suggests that creating a new user interface or search software for the archives will require that technical decisions be made which will require an understanding of Usenet and its nature.
For example, he writes:
1) citing a Usenet article - When I reference a Usenet article, I use the magic URL that Deja supplies for it. I have found them to be valid indefinitely. At least, until about a week ago. Will Google continue to supply permanent URLs? I sure hope so.
2) Ranking Usenet articles - I haven't tried the new google/deja search yet, but I've heard it doesn't track "threads" any more. Technically, this is quite important, as Steve Bellovin pointed out in The Register. A thread represents an interactive discussion, and so presenting the thread together and in order is good. But there is another way that usenet searches can exploit threads. Usenet articles are more transitory than web pages. But "followups" to articles which create Subject threads, permit a limited variant of PageRank [Google's ranking scheme for web pages - ed.]
Describing the differences between web technology and Usenet technology that are relevant toward how one will do a search, he writes:
3) Searching Usenet articles - When doing a text search, google considers matches in the web page title to be more important than elsewhere, and text in a large font is more important than text in a smaller font. A usenet article does not have a title, but it does have a Subject: field. Usenet articles often contain "included text" which should be considered less important than original text.
He summarizes, "So, there are significant differences in the ways that pages/articles should be cited, ranked, and searched," and he asks: "Does Google plan any improvements, for usenet articles, in any of these areas?"
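Truscott's three points can be made a little more concrete with a short sketch. The Python below is only an illustration under stated assumptions: it is not Deja's or Google's code, the dictionary field names (message_id, references, subject, body) and the three weights are hypothetical, and a simple follow-up count stands in for the "limited variant of PageRank" he mentions rather than implementing PageRank itself. It assumes that the last entry in an article's References header names the article it follows up, and that quoted ("included") lines begin with ">".

```python
from collections import defaultdict

# Hypothetical weights, chosen only for illustration.
SUBJECT_WEIGHT = 3.0   # a match in the Subject: field counts most
BODY_WEIGHT = 1.0      # a match in the poster's own text
QUOTED_WEIGHT = 0.25   # a match in included (quoted) text counts least


def build_threads(articles):
    """Map each parent Message-ID to the IDs of its direct follow-ups.

    Assumes the last entry in References is the article being answered.
    """
    children = defaultdict(list)
    for art in articles:
        refs = art.get("references", "").split()
        if refs:
            children[refs[-1]].append(art["message_id"])
    return children


def followup_count(message_id, children):
    """Count all direct and indirect follow-ups to an article."""
    total = 0
    seen = set()
    stack = list(children.get(message_id, []))
    while stack:
        current = stack.pop()
        if current in seen:  # guard against malformed, cyclic headers
            continue
        seen.add(current)
        total += 1
        stack.extend(children.get(current, []))
    return total


def text_score(article, term):
    """Score a search term, favouring the Subject and discounting quotes."""
    term = term.lower()
    score = SUBJECT_WEIGHT * article["subject"].lower().count(term)
    for line in article["body"].splitlines():
        quoted = line.lstrip().startswith(">")
        weight = QUOTED_WEIGHT if quoted else BODY_WEIGHT
        score += weight * line.lower().count(term)
    return score
```

A search tool built along these lines might combine text_score with followup_count when ordering results, and use the Message-ID (which is fixed when an article is posted) as the basis for a citation that does not change. How the two scores should be combined is left open here, since the article does not say.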
Truscott's comments are helpful in conveying how the level of understanding of Usenet will affect the design decisions made by Google or anyone else who designs software for a Usenet archive.
There is another question, however, raised by the transfer to Google of the Usenet archive. This is the question of how important is it to maintain and develop the collaborative online community? How important is it to encourage cooperative contributions to a common pool of technical knowledge, software code, tools and other social forms that the new online community has developed?
JCR Licklider is recognized as the visionary who inspired the development of the worldwide network of networks. In articles he wrote in the 1960s and after, he explains why it is crucial to foster a collaborative online environment and contributions by users to a common pool of technical knowledge. The research on time-sharing that Licklider supported when he first went to ARPA in 1962 set the foundation for such a cooperative community. The early collaboration between the different Centers of Excellence that Licklider set up at universities in the US was the basis for the research to create the ARPANET. The creation of the ARPANET continued the development of this cooperative community.
The ARPANET mailing lists begun in the 1970s supported the cooperative communication that continued to develop. Usenet grew up in the early 1980s by building on the experience gained by those who had participated in the ARPANET mailing lists and by linking up with the ARPANET mailing list community. Together Usenet and ARPANET technical pioneers formed a vibrant online cooperative community and created a common pool of technical knowledge. They have given the world contributions as varied as the Requests for Comment (RFCs) and Unix tools. Even more important perhaps has been the ability of the online community to work together to solve the difficult problems of scaling computer technology and computer networking. Usenet and the Internet are crucial supports in making it possible for researchers to collaborate to understand and then solve the problems these developments present.
The problem that the online community is faced with is how to continue its collaborative communication and contributions. Does it need some broader support from academic institutions and governments toward this end? Isn't it a loss if research objectives are ended and the resources used to develop commercial enterprises, as happened with the 1998 design objectives for the Google web search engine? Isn't there a need to find a way to support and encourage the integrity of the research community so that its members can resist efforts to turn them and their endeavors into products for investor speculation? Those who are technical employees of private corporations will especially need a vibrant online collaborative community to help them overcome the difficulties that functioning in a proprietary environment brings.
Vibrant and functioning Usenet newsgroups and Internet mailing lists can help with these challenges. But what will it mean to the online community if these essential communication processes are curtailed or declared the private property of someone? This is one of the challenges now facing the online community.
This is one of the questions raised by Deja's sale of the contributed posts of the Usenet community, and by Google's buying these posts and suggesting that it has a property right to own them and to trade them.
How this dilemma will be resolved will be determined by how seriously the online community treats it. The petition to Deja and the various discussions both on Usenet and on mailing lists suggest that there are those in the Usenet community who recognize the importance of the situation.