Website: http://www.scripting.com/
Original page: http://www.scripting.com/stories/2007/12/10/futuresafeArchives.html
What I ended up doing was creating a USB stick for my wife that includes my Flickr/Blog/Hosting/Server/Facebook passwords and billing information. Along with that I included a list of close technical friends who will hopefully help her keep it all up and going in the near term.
This buys me a bit of time in case I get hit by a bus tomorrow. Right now I'm working on a solution to pull my content out of Facebook and the like and stream it back to my blog cloud, etc., and hopefully back it up to S3. Definitely not something for the average consumer.
*sigh*
I'd like to see them do two things to start.
1. Provide us a way to define an index page: the page it displays when someone goes to http://somesite.com/. Amazingly, right now there is no way to do it. So I couldn't use S3 to store scripting.com, which is a totally static site.
2. I'd like to be able to pay them, say, $10K and get a persistent SLA for a site like scripting.com. I don't think the bandwidth bills for such a site come close to the interest generated on $10K, so it could be perpetual.
I would be concerned about Amazon's longevity though. They appear to be a solid company, but how long will they be around? Will they survive Bezos??
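The arithmetic behind point 2 can be sketched roughly like this. The interest rate and the yearly hosting bill below are made-up figures for illustration, not numbers from Amazon or from anyone in this thread, and the function name is hypothetical.

```python
def endowment_covers_cost(principal, annual_rate, annual_cost):
    """Return True if one year of interest on the principal covers one year of cost.

    Simple model: interest is paid once a year and the principal is never
    touched, so in principle the arrangement can run indefinitely.
    """
    yearly_interest = principal * annual_rate
    return yearly_interest >= annual_cost


# Assumed figures for illustration only: 4% interest, $120/year hosting bill.
print(endowment_covers_cost(10_000, 0.04, 120))
```

This ignores inflation, rising bandwidth, and the question of whether the host outlives the deposit, which is the harder part of the problem.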
Obviously this is not a problem just in the consumer space. In many industries, there are requirements to preserve information for 7 or more years. While it may be possible to preserve things at the byte level via tape backups, it is often impossible to retrieve because the software and hardware are obsolete and not available.
1. A sufficient map of each of the entry points of your stuff.
2. A crawler which can scan each entry point deeply, without missing any assets.
3. A proxy which caches the responses obtained by the crawler, and saves them in a resource structure which is accessible by the originating requests. The storage format should be standard, and portable.
4. A perpetual domain registrar which maps storage instances of the proxy cache format to web domains via the mapping from item 1.
5. A perpetual storage service.
6. A financial trust to keep 4 and 5 running (archive.org?)
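Items 1 to 3 of the list above could look something like the following. This is only a minimal sketch using the Python standard library: the function names, the `max_pages` limit, and the choice of a plain directory tree as the "standard, portable" storage format are all assumptions made for illustration, not an existing tool.

```python
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href and src attribute values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch_url(url):
    """Default fetcher: return (content type, body bytes) for a URL."""
    with urlopen(url) as response:
        return response.headers.get_content_type(), response.read()


def crawl(entry_points, fetch=fetch_url, max_pages=1000):
    """Breadth-first crawl from the entry points, staying on their hosts.

    Returns a dict mapping each fetched URL to its body bytes.
    """
    allowed_hosts = {urlparse(url).netloc for url in entry_points}
    queue = deque(entry_points)
    seen = set(entry_points)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            content_type, body = fetch(url)
        except OSError:
            continue  # unreachable or missing resource: skip it
        pages[url] = body
        if "html" not in content_type:
            continue  # images, scripts, etc. are stored but not parsed
        parser = LinkParser()
        parser.feed(body.decode("utf-8", errors="replace"))
        for link in parser.links:
            target, _fragment = urldefrag(urljoin(url, link))
            if urlparse(target).netloc in allowed_hosts and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


def save_snapshot(pages, root):
    """Write each body to root/<host>/<path> as an ordinary file tree."""
    for url, body in pages.items():
        parts = urlparse(url)
        path = parts.path or "/"
        if path.endswith("/"):
            path += "index.html"  # directory URLs need a file name on disk
        target = os.path.join(root, parts.netloc, *path.strip("/").split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(body)
```

Something like `save_snapshot(crawl(["http://somesite.com/"]), "snapshot")` would then leave a folder that any static web server can serve. The sketch ignores query strings, pages generated by scripts, and URLs that are both a file and a directory, so it does not meet the "without missing any assets" requirement on its own. Items 4 to 6 are institutional problems that no script solves.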
With this I can take my posted items from Facebook, my items in my Flickr feed, etc., and publish them to my blog store. After that I'll look at pushing to S3 or maybe an MSN Space too. ;-)
Anyone know of a tool or a service I can use to do this?
What happens when someone goes to one of Marc's sites -- how would they know to go to archive.org?
Have you tested your assumption? Have you looked to see how complete an archive they have of your sites? (I have, it's not complete.)
What I want is better than what archive.org provides. It could be part of the solution, but if it were the complete solution, we'd just host our sites there, and that would be the end of the problem. But we don't -- for good reason. They don't have the technical answer, they don't host your domains (and is there a way to pay a fee for a domain that lasts in perpetuity?) and they don't provide any guarantee that part or all of your archive will be maintained for any period of time. Even if they did, you'd have to ask what's the likelihood they'll be around in 50 years, or 100 years, to fulfill the terms of the agreement? That's why the longevity of the institution doing the archive matters too. Harvard partnering with archive.org might be a good solution. Add an SLA in there, and we might be getting close to the answer.
What if those who cared could communicate a policy to archive.org? Maybe they could say "archive this, but don't make it publicly available for 5 years from the date you archived it." Or, "archive this, but hide it from public view until 10 years after it disappears from the web."
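The two example policies above could be written down in a form an archive could check mechanically. This is a hypothetical sketch: archive.org offers nothing like `ArchivePolicy` as far as this thread knows, and the field and function names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArchivePolicy:
    """Hypothetical per-site policy an author might hand to an archive."""
    years_after_archived: Optional[int] = None     # embargo counted from the archive date
    years_after_disappeared: Optional[int] = None  # embargo counted from when the live site vanished


def add_years(day, years):
    """Return the same calendar day some years later (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def is_public(policy, today, archived_on, disappeared_on=None):
    """Decide whether an archived copy may be shown on a given day."""
    if policy.years_after_archived is not None:
        if today < add_years(archived_on, policy.years_after_archived):
            return False
    if policy.years_after_disappeared is not None:
        if disappeared_on is None:
            return False  # the site is still live, so the copy stays hidden
        if today < add_years(disappeared_on, policy.years_after_disappeared):
            return False
    return True
```

For instance, `is_public(ArchivePolicy(years_after_archived=5), date.today(), date(2007, 12, 10))` would express the first policy for a copy archived on the day of this post.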
In all seriousness, I appreciate you talking about this topic. It's made me think about some things I'd otherwise have not considered.
Why not host your content on big companies' free services? They're likely around for a while and incented to keep your stuff up.
I don’t get the $ argument. Why do publishers still print Shakespeare books? Because people still buy them.
Speaking of which I gotta write to Lawrence and see if he has a copy of all my earlier blogs on some hard drive somewhere.
As a photographer, I am thinking about preserving images that I take. Hard-copy works well for this. But as we all have encountered with attempting to digitize every bit of hard-copy information these days and become "paperless" in our lives, hard-copy information degrades with time. It's a catch-22 either way you go, for my situation.
On the topic of archiving content, such as scripting.com, I think you could digitally archive textual content relatively easily. But as Dave has already pointed out, who do you trust with that information? How long will a company/service/whatever be around? There's no guarantee.
You've definitely provoked thought in my mind; we all (those of us who wish to maintain some type of legacy after we've passed) have a problem that needs to be solved, and relatively quickly. There's no telling when our day will inevitably come.
It begins like this:
"(To make things clear from the beginning, this piece is not about Dave Winer)
It's time to add "death" to our thinking about the Internet."
http://info.org.il/english/mail_to_the_future_d...
So automated dead man's switch type scenarios don't work, because they take too long. Some form of escrow service that you can keep updated and give a few select friends or family members access to would be good, along with some basic instructions, such as "take down flickr immediately" or "keep my blog going as a memorial" or something to that effect.
But even that doesn't solve the problem of very long-term storage of things of historical significance. The National Library of Australia has some websites they are archiving as significant resources (see http://pandora.nla.gov.au/about.html), but even that would be minuscule.
So I suppose at the moment it's up to each one of us individually to have a personal plan kept in place with a few trusted people that know about it. I like Jeff Sand's idea of the Memory Stick but I would like to see something available on the Web.
Looks like there are some folks doing something like that already (blurb.com, I don't know anything about them.)
I have 5 years' worth of posts and scraped them into a nice big file that's a bit of a hairball... now what?
Excellent topic
"The dead web - Google + Archive.org?"
http://tinyurl.com/2jdnmu
Books require no attention, except a decent fire-safe environment. Your work would be much more likely to survive if you had it printed in book form and donated copies to a wide spread of major libraries: Congress, British Library, etc.
I doubt Amazon, Google etc. will be around in 100 years, but there's a very strong chance the libraries will.
What keeps the great writers alive and in print, of course, is public demand. Without that, anyone's work is doomed. The best we can do is to give it a chance by putting it in a place of assured safety -- books in libraries.
These formats should not be moving targets. They need to be stable and easy to deal with.
We will need to update the list - certainly the current PDF format can't hold future holographic data (or even modern video). But this process shouldn't happen too quickly. Maybe a 10 or 20 year time frame for adding a new format would be appropriate.
And obviously we need open formats with multiple open source readers available.
These latter issues are just as necessary to deal with as the technology side of things. For instance, even if a web archival system is entirely voluntary, should there be restrictions on the use of content? Should a formula of standard practices be developed for the preservation of blogs and other websites for people who do not specify how their legacy should be maintained once they pass? Should family members have a say in what is or is not preserved?
On the technology and organization side of things, as people have begun to suggest, the issue of how web material should be preserved must be addressed. Should websites be maintained in a static or dynamic form as-is, i.e., maintaining the same URIs and preserving everything intact, without necessarily indicating that the site one is viewing is 'historical'? Is a particular degree of centralization or distribution of repositories desirable?
It is also worth considering drawbacks to what might become a de facto automatic archival system that follows an author's death. Should we take any measures to avoid extreme data clutter that could develop over the next century, or less, as hundreds of millions of new creators take to the web? What do people involved in the search engine business have to say? (Do they think that many fiscal quarters in advance?) Should we even care, or do we just let them adapt?
--Steve
Interestingly enough, the social networks may end up being a solution for this.
Really good to see you hammering on this problem.
Writing may be kinda solitary, but preservation has to be collaborative, and there are a lot of disparate efforts underway now to find solutions. There are a lot of subtle aspects to this, ranging from file formats, emulating applications, broken links to things outside the personal collection, and domain registration, to sustaining institutions over the long term.
Alfred de Grazia (who did computer based social network analysis in the early 1950s using punch cards) came up with an economic model he described at http://www.grazian-archive.com/projects/archvpt....
Cathy Marshall at Microsoft has done a lot of excellent work looking at user behavior, i.e. the human dimension, and how things really get lost. See Evaluating Personal Archiving Strategies for Internet-based Information at arxiv.org/pdf/0704.3647.pdf