Scripting News: Once again, future-safe archives (Scripting News)

  • papyromancer · 2 months ago
    I'm totally down for formally archiving all materials I've ever produced and even merely recorded. I've been uploading my AVCHD footage to EBS on Amazon, and I've been considering using Kaltura to provide a way to attach metadata to the 'stuff.'

    Currently, Kaltura charges quite a bit for Akamai-level storage of data. I propose leveraging S3/CloudFront and archive.org to provide levels of access based upon public interest in the archived data. Some sort of co-op should be formed to provide access to data while respecting non-commercial licensing (really any desired licensing) for the duration of copyright to gracefully preserve the value of works.

    The easiest way I see of achieving this is by creating some sort of redirection engine (I'd rather not say URL shortener) that drives multiple content delivery networks. Redirect to archive.org when the data is accessed sporadically, redirect to ad-supported(?) CloudFront for more popular data, redirect to Akamai for data that's too hot to handle. Any revenue generated goes toward supporting the network, with net income distributed to the copyright owners or their descendants. The system would of course garbage collect itself to keep the cost of S3/Akamai distribution as low as possible.
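
A minimal sketch of the tiered redirect idea above, in Python. The thresholds and the example.org-style host names are assumptions made up for illustration; the comment only names the three kinds of host, not any numbers or URLs.

```python
# Hypothetical tiers, hottest first: (minimum hits per day, base URL).
# Neither the thresholds nor the URLs come from the discussion.
TIERS = [
    (1000, "https://akamai.example/"),      # too hot to handle
    (50, "https://cloudfront.example/"),    # popular, ad-supported
    (0, "https://archive.example/"),        # sporadic access
]


def pick_backend(hits_per_day: int) -> str:
    """Return the base URL of the first tier whose threshold is met."""
    for threshold, base_url in TIERS:
        if hits_per_day >= threshold:
            return base_url
    # Negative counts should not happen; fall back to the coldest tier.
    return TIERS[-1][1]


def redirect_url(item_id: str, hits_per_day: int) -> str:
    """Build the redirect target for one archived item."""
    return pick_backend(hits_per_day) + item_id
```

The "garbage collect itself" part would then amount to recomputing `hits_per_day` on a schedule, so items that cool off fall back to the cheapest tier.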
  • jpdefillippo · 2 months ago
    This is one of those nagging issues that's been driving me mad too. I've got a few websites of friends who've died that I am keeping in trust until I can find a place to store them that's long term. It's an expensive proposition and I thought that archive.org was going to be doing something about this but the issues with domains switching hands and their foolish handling of robots.txt files make it almost useless for long term archiving which is quite ironic.
  • dave · 2 months ago
    It could even be a mission for Yahoo, assuming they had some way to
    assure us they'd be around 20 or 50 years from now. Google or
    Microsoft -- I would never trust them. They throw their weight around
    too much. It would have to be an organization with a strong public
    service component. Not that Yahoo has that, but what choice do they
    have but to find a new way to be.

    What's the story on archive.org's "foolish handling of robots.txt?"
  • jpdefillippo · 2 months ago
    Yahoo's the reason I'm building DeathVault.com! Their TOS won't let families of the deceased even have access to their email. I don't trust them for an instant.
  • Ohrid_accommodation · 2 months ago
    totally agree. yahoo sucks.. :/
  • dave · 2 months ago
    Okay scratch them off the list.

    What's the story on deathvault.com?
  • jpdefillippo · 2 months ago
    DV's a secure, time-released password/data escrow service I've been playing with launching for a few years. There is another competitor in the space but they're doing it all wrong and it's something I need so I'm building it. Just trying to find good, rock solid hosting out of the jurisdiction of as many governments as possible.
  • Ravi Pinjala · 2 months ago
    Out of curiosity, is there any reason that data needs to be kept out of specific jurisdictions, or is it just that it needs to not be in only one jurisdiction? If it's the latter, it'd be interesting to try to mirror across several hosts that are in completely separate jurisdictions. For static data, file synchronization problems should be just about nonexistent, since there can't be conflicting updates.
  • jpdefillippo · 2 months ago
    On the robots.txt thing... if a domain currently has a restrictive robots.txt file, then old versions of the site that didn't have one or had looser restrictions are inaccessible. So sites I built from the old days that have domain squatters on them now are dead in the archive. I wrote them several times about it but they stood firm on it, hence my declaring it a foolish policy.
  • dave · 2 months ago
    Oh that is super foolish!

    What are they thinking.

    Oy.
  • evanwolf · 2 months ago
    Yahoo!'s choosing to shut down Geocities to save money.
  • mhaeberli · 2 months ago
    Dave,

    Thought-provoking. Arguably the gang at Long Now are also thinking about these things...

    Best,

    Martin Haeberli
  • hardaway · 2 months ago
    I think about this all the time. I'm at the beginning of the digital age, and all my photos are on Flickr, SmugMug, whatever, and on various drives. Many of those are pieces of history, some of more interest than others. But the job of curating everyone's archives is overwhelming, and perhaps cost-prohibitive. An issue like this has to be managed by a team of historians, philosophers, librarians, and technologists, and perhaps professional curators. Will there be a career one day called "life curator," a person you can hire to work with you on what's worth saving from your life? Like the co-author of your memoir?
  • Derik · 2 months ago
    Inasmuch as I usually leave things I do not understand to others, here is a site http://bit.ly/16GOpl that was probably onto a similar idea (albeit with different requirements). As an architect, we could define the requirements, outcomes and use cases. For actors, we have both those gone and those left. I think the biggest challenge is the legal side. The technology is probably a no-brainer. Interesting project anyhow. Keep me posted since I could provide 2 cents.
  • James · 2 months ago
    I have little idea how to pull this off but assuming the content is OK for public domain then a distributed "vault" seems a brilliant solution.

    I suppose I've got a BitTorrent type of idea in mind. Trackers run which you can point personal domains to, e.g. vault.scripting.com. The tracker has a list of clients, each with a bandwidth rating, which hold the content of vault.scripting.com; it requests the content from one of those clients' IPs, and this is delivered to the user. The great thing here is that your content can be stored by many hundreds of different sources, big or small, all over the world.

    It's a rough thought but it negates the need for any specific individual / company. I think there is a major flaw in my idea because you could potentially compromise content if you're a client, but I'm sure there are CRC checks or some levels of encryption you could employ.
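
One way to sketch the integrity check mentioned above, in Python. This swaps the CRC for a SHA-256 digest (a CRC catches accidental corruption but is easy to forge deliberately), and it assumes the content owner publishes the expected digest somewhere the client trusts. The `peers` callables are hypothetical stand-ins for whatever a tracker would hand back.

```python
import hashlib
from typing import Callable, Iterable


def digest(data: bytes) -> str:
    """SHA-256 hex digest the content owner would publish."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """True when the bytes a peer served match the published digest."""
    return digest(data) == expected


def fetch_verified(peers: Iterable[Callable[[], bytes]], expected: str) -> bytes:
    """Try each peer in turn and return the first copy that verifies.

    Each peer is modelled as a zero-argument callable returning bytes;
    a real client would make a network request here instead.
    """
    for fetch in peers:
        data = fetch()
        if verify(data, expected):
            return data
    raise ValueError("no peer returned an unmodified copy")
```

With this shape, a client that tampers with its copy only gets skipped; it cannot pass altered content off as the original unless it can also change the published digest.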
  • Zacqary Adam Green · 2 months ago
    You might want to check out what the Archive Team guys are doing. Right now they're all frantically downloading Geocities as fast as they can before Yahoo closes it forever, but that's just the current project.

    They're essentially a hobbyist group, so hardly the well-established institution you're looking for (and I'm looking for too, frankly), but I'm sure you'd find their work interesting.
  • AndrewBurton · 2 months ago
    I wonder if you could use DNS as a distributed storage system. I hadn't known about the TXT type of record until someone mentioned it here, but...and this may be a bit out there...if you gave a subdomain to every image (pic1.domain.com) you could then store base64 encoded binary data in sequential sub-subdomains of the image (part1.pic1.domain.com, part2.pic1.domain.com, ...). Since DNS files are replicated all over the 'net, your data would be stored in servers, routers, and firewalls all over so long as your domain stayed active.
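
A rough sketch of the chunking step described above, in Python. It only builds the name-to-text mapping and reassembles it; actually publishing the records is left out. The `partN.<name>` naming follows the comment, and the 255-character chunk size assumes one character-string per TXT record, which is the per-string limit in DNS.

```python
import base64

# A single character-string inside a TXT record holds at most 255 bytes.
MAX_TXT_STRING = 255


def to_txt_records(name: str, data: bytes, size: int = MAX_TXT_STRING) -> dict:
    """Base64-encode data and split it into sequential partN.<name> entries."""
    encoded = base64.b64encode(data).decode("ascii")
    chunks = [encoded[i:i + size] for i in range(0, len(encoded), size)]
    return {f"part{n}.{name}": chunk for n, chunk in enumerate(chunks, start=1)}


def from_txt_records(name: str, records: dict) -> bytes:
    """Walk part1, part2, ... until one is missing, then decode the result."""
    parts = []
    n = 1
    while f"part{n}.{name}" in records:
        parts.append(records[f"part{n}.{name}"])
        n += 1
    return base64.b64decode("".join(parts))
```

Base64 adds about a third to the size, so even a small image turns into a long run of records, which gives a feel for how far this would stretch DNS.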
  • adean · 2 months ago
    You aren't the only one thinking about it, for sure. About a year ago, I worked on a proof-of-concept app for http://emortal.com/ (I don't know the current status of the venture).
  • Stanley_Krute · 2 months ago
    Yes, this is important stuff to work on. I'm currently putting ~5k photos on the 'net each year. I'd love to find a way to endow their continued existence.
  • dave · 2 months ago
    That's the most concise way to put it. "Endow their continued existence."

    It's more of a financial thing than a technical thing, isn't it.
  • Stanley_Krute · 2 months ago
    Yes. Financial and legal. I will run the question by a friend who's a good wills & trusts attorney, see what ideas pop into his head.
  • Joe Moreno · 2 months ago
    A business model similar to a cemetery's might be one possible solution to support the costs indefinitely.
  • elasticthreads · 2 months ago
    Having just read about Microsoft/Danger's massive loss of Sidekick data (http://tr.im/BsU8), and considering Murphy's law and Twitter/GMail/everyone else's occasional outages, the idea of placing such priceless data into centralized, corporate hands is scary. (The information age has now lasted long enough that data is as much of an heirloom as jewelry, art, and furniture, and soon to be a much more commonplace one.) And scary not primarily for privacy concerns, but for archival concerns.
    Dave, your post has touched on a huge issue. One that's only just emerging. The answer must be distributed, open source, and secure. Until such a solution exists, printing on archival paper is still the safest bet.
  • jeremy toeman · 2 months ago
    When we were planning Legacy Locker the idea of building a feature wherein users could invest in their digital archives came up numerous times. Our fundamental flaw with the vision is the inability to guarantee anything into "forever". We figured out all sorts of models around annuity structures that were "theoretically forever", but then there's a terms of service issue. How can you guarantee it? Bottom line is you can't... Even if you start with a ridiculous amount of money, there's no "automatic forever" solution - human intervention is completely required for it to work. And if you are going to involve people in the management/processing, we decided it wasn't a "tech" problem to solve...
  • Tim · 2 months ago
    This may not be the service Dave imagines, although I could misunderstand him:

    I'm astonished nobody proposed national libraries. Most national libraries get their contents by legal deposit: two copies of every book or other publication (map, “something”) published must be given to the library. There's no reason not to extend the concept of a (hopefully voluntary) legal deposit to web publications.

    In fact the National Library of Germany is bound by law to do exactly that: to collect and archive online publications. It's obviously not perfect (nothing could be), and they are still developing procedures and won't collect for some time, but what I see looks promising: static content, strong metadata, persistent identifiers and, in the future, automatic harvesting using the open protocols of the Open Archives Initiative. If you just want to get archived and don't particularly care for continuous service (like Dave seems to wish), this should be quite good enough.
  • Martin Petersen · 2 months ago
    Dave, what do you think will be the volume of data you want to archive and what timeframe do you expect?
    1 TB for the next 50 years, or something like 100 TB for 300 years? I think it's the timeframe that's the real problem. Technology and financing set aside, there are very few private projects that have such stamina.
    Perhaps it's my European thinking, but I cannot imagine any private institution that could assure an archive over such a long period.

    Martin
  • SassyPoppet · 2 months ago
    I would be more than willing to help with this. I'm always amazed at the timing of things I come across and how the big picture becomes fairly attainable by coincidence... I just registered on Ancestry.com.. and was thinking the same thing.. how will those after me know that I have a picture site, or a blog? Granted, I know that my daughter will know, but will she be interested in maintaining it, or having access to it etc? It's a very interesting paradox - I almost think it will be much easier to lose our collective histories over time if we're not careful... and how do these electronic archives get maintained? I believe in some ways it's even more critical to figure this problem out as we move away from paper in an almost permanent way.

    I like the idea of academics being a part of it.. but what about an organization such as the Smithsonian, or something along those lines? Heck, dare I say an org like Google? While it has been mentioned that legal and financial are huge considerations - when it's all said and done, the underlying architecture, security, DR/COOP and maintenance should be focal. And, while I kinda like the thought of my "knowledge" and life experience being immortal, it will only remain this way as long as someone is interested in having access to my stuff :)

    So yes, I would be happy to volunteer - and while I'm not a tech-uber-geek (which I fondly call "TUGS"), I've got a knack for planning needs based on user functionality and liaising with the TUGS, and I'm a phenomenal organizer / scribe, albeit a modest one ;)
  • jimmason · 2 months ago
    How about NYPL? They have a digital collection of pictures of NYC from the 1900s. Or the Smithsonian, who also have the tech knowledge to achieve your goal. By the way, what happened to the AP pictures? They do not appear any longer. My window to the rest of the world.
  • AAfter Search · 2 months ago
    Interesting discussion. Here are some 'crazy' ideas...
    1. How about storing them on a p2p network of computers dedicated to this purpose? Legal issues and abuse need to be controlled by someone. A Wikipedia-like non-profit structure may last longer.

    2. A few time capsules with the information around the world, and maybe on the moon as well :-) [if everything fails]
  • rob ocallaghan · 2 months ago
    Interesting we had a discussion at #twuttle about what happens to your online data and id after you die.

    @fellowcreative has a project called deathbook which is aimed at asking questions in the UK about how people can ask for their online data and id to be handled after death - i.e. deleted/archived/split up and passed on etc etc.

    ATM an email address may just get reused if you pass away, perhaps passing on some data (think the twitter hack). What happens to your facebook account - should it stay around so people can still post to your wall, or be removed asap as it is causing distress....
  • la5rocks · 2 months ago
    Very much an interesting problem! I've run most of my online life from my own server, so I'm not tied to the whims of others.

    I'm more of the opinion that the online "me" would be archived in some offline manner, since at some point I will be very much offline as well! The problem is of course how to best store it all? I agree the process very much matches what libraries, etc. have to deal with, and I think some key ideas might be:

    1. Pick the best. This is very subjective, but what is Ansel Adams known for? Thousands of good images? Nope, just a few outstandingly great ones! What are the stand-out milestones of your life's work? With all our ways of getting metrics, this shouldn't be too hard to figure out.

    2. Diversify. Not just putting content in various online locations, but also in various media. Figure out what you want to spend, and how you expect your loved ones to be able to get to it. Your guess is as good as mine, but hopefully one or more will work.

    3. Share. There is such a thing as too much centralization. So often throughout history we find gems of historical documents squirreled away in someone's attic, even though the originals are gone. Get your results from step #2 into as many hands as are interested in having it.

    I'll be watching with great interest to see what comes of an online solution!
  • davidrindc · 2 months ago
    Have you looked through what the Library of Congress is up to? www.digitalpreservation.gov My guess is they would need a lot of support and funding to ramp the system up to the level you are talking about.
  • mediadancer · 2 months ago