An interesting BBC news report on a press conference by the UK National Archives. Sooner or later we have to take the data mountain seriously.
According to the National Archives, "If you put paper on shelves, it's pretty certain it is going to be there in a hundred years... If you stored something on a floppy disc just three or four years ago, you'd have a hard time finding a modern computer capable of opening it... Digital information is in fact inherently far more ephemeral than paper." The report adds: "The National Archives, which holds 900 years of written material, has more than 580 terabytes of data - the equivalent of 580,000 encyclopaedias - in older file formats that are no longer commercially available."
I'm puzzled by this: according to this reference the print collections of the US Library of Congress only amount to 10 terabytes. Possibly the Archives keep their pages in some image format rather than text?
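For what it's worth, here is the back-of-envelope arithmetic behind my puzzlement, as a small Python sketch. It uses only the figures quoted above; the decimal conversion (1 TB = 1000 GB) and the variable names are my own assumptions.

```python
# Figures quoted in the post. Decimal units (1 TB = 1000 GB) are an assumption.
ARCHIVES_TB = 580           # National Archives data in older file formats
ENCYCLOPAEDIAS = 580_000    # the equivalent stated in the BBC report
LOC_PRINT_TB = 10           # US Library of Congress print collections, per the reference

GB_PER_TB = 1000

# Size the report implicitly assigns to one encyclopaedia.
gb_per_encyclopaedia = ARCHIVES_TB * GB_PER_TB / ENCYCLOPAEDIAS

# How many times larger the Archives figure is than the Library of Congress one.
archives_to_loc_ratio = ARCHIVES_TB / LOC_PRINT_TB

print(f"{gb_per_encyclopaedia:.1f} GB per encyclopaedia")
print(f"Archives figure is {archives_to_loc_ratio:.0f}x the Library of Congress figure")
```

So the report's "encyclopaedia" works out at about 1 GB, and the Archives figure is roughly 58 times the Library of Congress one, a gap that storing pages as images rather than text might plausibly account for.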
Microsoft, the main cause of the problem, have now re-positioned themselves as saviours, although the article points out that instead of releasing their own OpenXML they could have just gone with the existing Open Document Format. Still, it's progress of a sort.
The other point, of course, is that we generate so much data these days, a recurrent theme on this blog. I think the next big problem will not be how to preserve it, but what to throw away. We're living in an age equivalent to that time in the USA when petrol was cheap, the supply seemed endless, and the gas-guzzler was everywhere. At the moment, everyone who can generate data does so. (Consider yesterday's post about the SWS.)
Sooner or later we will have to get 'green' about data.