The web is a failed information management system.
What is odd about that statement is not that the attempt has failed – I don’t think I have ever heard of any other fate for an information management system – but that the fact of the attempt has been so completely forgotten.
Information is everywhere, of course. The public web, or at least some parts of it, is densely populated with links. Following a chain of them long beyond the answer to any question you might have started with is the road trip of the internet age. But beyond a still-thin surface layer, many end points of links remain resolute no through roads.
The idea of hyperlinks long predates the web. The hypothetical Memex engine dates back to 1945 and a recent article in The Atlantic takes the story back to the nineteenth century. More recently, everybody knows that Tim Berners-Lee invented the world wide web,[1] but there is much less understanding of what it was he thought he was inventing.
Berners-Lee described the problem he was trying to solve in his famous paper proposing a new information management system for CERN:
CERN is a wonderful organisation. It involves several thousand people, many of them very creative, all working toward common goals. Although they are nominally organised into a hierarchical management structure,this does not constrain the way people will communicate, and share information, equipment and software across groups.
The actual observed working structure of the organisation is a multiply connected “web” whose interconnections evolve with time. In this environment, a new person arriving, or someone taking on a new task, is normally given a few hints as to who would be useful people to talk to. Information about what facilities exist and how to find out about them travels in the corridor gossip and occasional newsletters, and the details about what is required to be done spread in a similar way. All things considered, the result is remarkably successful, despite occasional misunderstandings and duplicated effort.
A problem, however, is the high turnover of people. When two years is a typical length of stay, information is constantly being lost. The introduction of the new people demands a fair amount of their time and that of others before they have any idea of what goes on. The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found.
The solution he described combined technology and usability, recognising from the outset that people would use something which was attractive and useful:
The aim would be to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards. The result should be sufficiently attractive to use that it the information contained would grow past a critical threshold, so that the usefulness the scheme would in turn encourage its increased use.
That’s a fine ambition which became an information revolution and it’s pretty clear that the ‘critical threshold’ was passed quite a while back. But the initial problem Berners-Lee described still sounds uncannily familiar today, and is still a long way from being solved. As I wrote a while back:
One of the purposes of this blog is to help me find things I half remember thinking five years ago. I have no equivalent tool at work for finding my thoughts, let alone anybody else’s. That’s an important reason why so much energy is devoted to the reinvention of wheels.
There has been a flurry of recent coverage for the brave study by the World Bank which shows that a third of their policy reports are never downloaded and almost 90% are never cited (though if I have understood their methodology correctly, my citing their paper on the citation of papers would not be counted as a citation, so the precise numbers should not be taken too seriously). But although the coverage has included wry comments about the fact that a report about how pdf documents are little read is itself a pdf document, I haven’t seen any recognition of a more fundamental problem. The introduction to the World Bank report is a statement of why knowledge and the sharing of knowledge matter, including (with the emphasis in the original):
Internal knowledge sharing is essential for a large and complex institution such as the Bank to provide effective policy advice. Bottlenecks to information flows create inefficiencies, either through duplication of efforts and diverting resources from knowledge creation itself.
With that thought in mind, it turns out that the report does not link to any of the published material it refers to. It has a long list of references, many of them to other papers by the World Bank itself, but in virtually all cases they are textual descriptions, not links.[2] It’s a dead end not because it’s a pdf, though that doesn’t help, but because it is constructed as an end point, not as a node in a network.
I have laboured that point a bit not because I care greatly about the information management practices of the World Bank, but because I suspect they are distinctive more in the visibility of what they do, than in the doing of it.
Most of the material I see in my working life is self-contained and very little of it makes explicit connections to other information.[3] There are two big reasons for that (as well, no doubt, as a host of smaller ones).
The first is technical. You can only link to something if you know where it is now. There is only any point in linking to anything if you can be confident that it will still be there next week and next year (and in some cases, next decade). That requires information to have a permanent, canonical location at an appropriate level of granularity and for the arrangement of information to be more durable than the arrangement of work.
The second is cultural. You will only link to something if doing so is seen as valuable (and if it both is, and is perceived to be, easy to do). Links are most likely to be seen as valuable by people who might choose to follow them. Following links is easy for somebody reading on a screen, but impossible for somebody reading on paper. Reading on a screen is easier if the material is designed to be read that way, not just in layout but in information richness. So there is little chance that links will flourish in an environment where most information is designed for presentation on paper (even if it is actually sometimes consumed on screen).
Any solution to the information management problems of organisations needs to address both the technical and the cultural issues. The technical solution is necessary but, wholly unsurprisingly, falls very far short of being sufficient. Even with the network in place to support a much more web-like approach, we cannot hope to consume information that way until we start producing it differently.
But if we succeed, there are prizes well worth having here, which go far beyond better information retrieval. As Tim Berners-Lee speculated a quarter of a century ago:
In providing a system for manipulating this sort of information, the hope would be to allow a pool of information to develop which could grow and evolve with the organisation and the projects it describes. For this to be possible, the method of storage must not place its own restraints on the information. This is why a “web” of notes with links (like references) between them is far more useful than a fixed hierarchical system.
The need hasn’t changed in the last twenty-five years. Perhaps we should try the solution.
[1] Apart from the people who persist in believing that he invented the internet.
[2] There are precisely two clickable links, both to posts in the same blog – but bizarrely the links are to the blog’s homepage rather than the specific posts being cited, so even those don’t help as much as they should.
[3] The one big exception to that is emails which contain long chains of their predecessors, but the less said about that the better.