Directories and shared memory
Randomly ran across this discussion of online directories like Google today, and something caught my eye:
"So why does Google work so well? Because it's open and fair and seems smart (it's just a computer of course). Its algorithms figure out what the Web has to say on a given subject. Collectively we believe in Google, it's our memory, it's the way we share."
I don't believe in a "genetic memory", or any of the other theories about shared memories I've seen (not that I've done an exhaustive review, you understand). But it seems to me that as the human race moves forward into a more and more interconnected world, one where vast volumes of information are nearly instantly accessible, we will develop one.
Not one based on genetics, or on cultural memes, but one grounded in databases and hyperlinks, grown out of the tangles and cycles of references we all make to each other, every day. There are already efforts to create historical archives of the Net, in the realization that information is lost with every revision of a web page. Myself, I think these archivists -- these contemporary archaeologists, digging frantically through our trash piles even as we toss out today's informational leavings -- are doomed to failure; the Net grows and changes far beyond even Google's ability to archive it. Since the last time Google cached my web page, it has changed perhaps several hundred times. Doubtless no really valuable information was lost in the gap, but it serves as an example.
Instead, I think the Net will more or less [have to] archive itself; the important bits, anyway. Before written records, human culture survived for centuries without any conscientious archivists creating indelible records. The important information -- the stories, the toolmaking methods, the survival tactics -- was communicated through daily use, and refreshed through daily use as well. While the invention -- and spread -- of writing certainly served to protect human works, we are now fast approaching an age in which we produce more information than we could possibly preserve. In fact, we may well have entered that age already. Obviously much of this information will be trivial; perhaps most of it will be, except to net historians. The question is how to identify and preserve the valuable information without attempting the impossible task of either sifting through all of it or archiving everything (and every version of everything!).
The answer is already visible in two places. The first I've already mentioned: Google, and similar services. By "similar services" I mean automated agents which try to identify, index, and provide useful information on demand. What makes Google so very interesting (at least to me), and so illustrative of the second place, is this: Google isn't actually automating the identification of useful information. Google's PageRank and indexing mechanisms use, as one measure of a piece of information's value, how many times that piece of information (a web page) has been referenced (linked to) by other pieces of information (other web pages) -- each link inevitably placed there by a human!
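The link-counting idea can be sketched in a few lines. This is only an illustration of the general technique in the spirit of PageRank -- a simplified power-iteration over a toy link graph, not Google's actual implementation (the page names, damping value, and dangling-page handling here are all assumptions for the sake of the example):

```python
# A minimal sketch of link-based ranking: each page's score is fed by
# the pages that link to it, so a link acts as a human "vote" for the
# linked page's usefulness.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal scores
    for _ in range(iterations):
        # every page gets a small baseline score...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # ...plus a share of the score of every page that links to it
        for page, outlinks in links.items():
            if not outlinks:
                continue  # dangling page; handling simplified here
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: A and C both link to B, so B ends up ranked highest.
ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["B"]})
```

The point of the sketch is that the algorithm itself contains no judgment of quality; all the judgment was exercised by the humans who chose to place the links.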
The Net, much like preliterate human culture, archives itself, preserves itself, through some process of evolution -- some process of winnowing which selects information for reproduction based on its usefulness or value. Google's success is built, very frankly, on realizing that this dynamic exists, and figuring out a way to capitalize (quite literally) on it. I think this is really the future of information science, at least the foreseeable future; not on trying to duplicate the abilities of the human agent, but in complementing them and in mining them for metadata.
I think the rise of weblogs has pointed up this phenomenon very sharply; the major drawback of Google's method is that it takes bloody forever (very precisely measured) for most things to get updated in Google's index. You're never going to find up-to-the-minute news and discussion on Google. It's fantastic for static, older information; it's hopelessly inept at current, dynamic information. People, on the other hand -- human agents -- have several million years of design invested in them, design aimed at producing something quite expert at dealing with changing, current, dynamic information. Nothing hits the air without ten or twenty weblogs (and that's just counting the major ones, leaving aside minor ones like my own) weighing in on it, digging for more info, centralizing the vast, flux-filled database that is human knowledge. The medium itself practically guarantees that the fringe can speak as loudly as the mighty, that the skeptics and the whistleblowers can raise their voices alongside the establishment's and its defenders'. That is true so long -- and only so long -- as services like DMOZ and Google are open and free. So long as they are, the Net can archive itself, can rank its own contents and its own relevance, and we can all gain by it.
Addendum: Obviously I was wrong about never finding news on Google. Clever fellows.
The huge, glaring, painful exceptions are, of course, the only things in cyberspace that aren't free: servers and bandwidth. It's hard to get the message out if your server is drowning in requests and your ISP is banging on the door demanding payment for the 4.3TB of data that just gummed up their pipes.