Website: http://www.scripting.com/
Original page: http://www.scripting.com/stories/2009/08/19/howToFixUrlshorteners.html
That's because the person who's most interested in keeping a short URL alive and functional is the owner of the domain it points to. This is how my solution http://urlborg.com/a/urlborg_xml/about/ works. And it's nice, because "regular" users don't have to bother with such things: when the user uses urlborg.com, the generated short URL will use a custom short domain, if the owner of the destination URL has registered one.
my $.02
Regular Joe user wants to create a short link. Site YYY.com, seeing the light of the short URL future, has shortlinks for all of their stories via ZZ.com/xxx. They control YYY.com and ZZ.com, so can keep ZZ.com up as long as they want. This is great and something Company YYY is providing for all their users. I suspect this is actually the use case Dave envisions, but _every single site has to implement it_ -- that won't happen.
Tech savvy Jane user wants to create a short link. Site RRR.com is old fashioned and only has one set of URLs: extraordinarily long ones. Jane creates JJ.com (using the above method) and creates links to RRR.com. These are Jane's links and these links will work so long as Jane wants. This helps Jane as she can grow her own brand, track how her own links spread, etc, but hurts RRR.com when Jane closes down JJ.com. Jane doesn't care, though. This is not so great, overall, but Jane would prefer this.
Real world users are stuck with link rot at some point. But as Louis Gray pointed out in a recent Google Reader comment, very, very few old links are clicked on.
1) This is switching out one point of failure (bit.ly) for another (S3). Actually it isn't even S3, it's the person or entity who controls that S3 bucket.
2) There's a finite amount of available domains that are short without being nonsensical.
3) Setting this up (registering + DNS + finding an S3 provider) is outside the scope of what 99% of Twitter's users either can or wish to deal with.
The only way to solve these three problems in a manner that's likely to gain traction is to provide them as part of a service, but then we're back to the situation where if that service fails then links are dead.
I think an alternative is to recognize bit.ly and peers for what they are: a DNS-like service that requires persistence. These services should be encouraged to provide static dumps of their mappings that can periodically be retrieved and used by third parties wherever resolution is required, e.g. Google calculating PageRank, or a site handling a 404 error.
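To make the "static dump" idea concrete, here is a minimal sketch of how a third party might resolve a short URL from a downloaded dump. The dump format (one tab-separated `token<TAB>destination` pair per line, `#` for comments) and the function names are assumptions made for illustration; no shortener in this thread is known to publish such a file.

```python
from urllib.parse import urlsplit


def parse_dump(text):
    """Parse a hypothetical dump: one 'token<TAB>destination' pair per line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        token, _, destination = line.partition("\t")
        if destination:  # ignore malformed lines that have no tab
            mapping[token] = destination.strip()
    return mapping


def resolve(short_url, dumps):
    """Look up a short URL in per-host dumps; return None when unknown.

    `dumps` maps a shortener host name to the mapping parsed from its dump.
    """
    parts = urlsplit(short_url)
    host = parts.netloc.lower()
    token = parts.path.lstrip("/")
    return dumps.get(host, {}).get(token)
```

A 404 handler or a crawler could call `resolve()` as a fallback when the shortener itself no longer answers.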
your Apache server and remap the CNAME. The data moves easily.
Don't know about your #2 and #3 except come on -- getting an S3 account is
as easy as getting an Amazon account, and that's not a huge barrier. If
you're worried about the people who can't get on Amazon, okay -- got it.
And you just need a sub-domain.
Nonsense names? That has not been a problem so far in URL-shorteners. :-)
Dave
Any solution that requires people to perform local backups to preserve integrity of shortlinks is a non-starter. Any solution that requires people to do more than type in a url and hit a button to get started is a non-starter.
There are hundreds of thousands, if not millions, of people out there generating short urls every day. Most of them don't give a fig whether tr.im or bit.ly ceases to exist in six months, nor will they spend 15-30 minutes fooling with DNS/CNAME/S3. To have any hope of being worthwhile, any solution that is developed needs to recognize this.
The DNS records here don't seem to include a CNAME record for each short URL: http://network-tools.com/default.asp?prog=dnsre...
Could you clarify? Thanks!
Ex.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title></title>
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=http://www.flickr.com/photos/scriptingnews/3782292158/">
<meta name="robots" content="noindex"/>
<link rel="canonical" href="http://www.flickr.com/photos/scriptingnews/3782292158/"/>
</head>
<body>
{Google analytics code}
</body>
</html>
The links are stored in your S3 bucket as a static HTML page with a meta-refresh. Winer wrote about it here:
http://www.scripting.com/stories/2009/04/27/adj...
One idea behind dropping the URLs in your bucket is that you could now make the URLs portable between URL shorteners that provide tracking stats.
For example, if you shortened a link with tr.im and they implemented this technique, you could then move to Adjix in the future, and Adjix would pull your shortened URL out of the HTML meta-refresh page stored in your bucket, track the click, and then perform a 301 redirect.
In other words, not only could your bucket perform the live redirect if you wanted, it could also be a data store for your links.
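As a rough illustration of the bucket-as-data-store idea, the sketch below writes a meta-refresh page like the example above and then reads the destination back out of it, which is what a second shortener would have to do before issuing its own 301. The function names are made up for this example and are not Adjix's or tr.im's actual code; it only uses Python's standard library.

```python
from html import escape
from html.parser import HTMLParser


def build_redirect_page(destination):
    """Return a static HTML page that meta-refreshes to `destination`."""
    url = escape(destination, quote=True)
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        "<title></title>\n"
        f'<meta http-equiv="Refresh" content="0;URL={url}">\n'
        '<meta name="robots" content="noindex"/>\n'
        f'<link rel="canonical" href="{url}"/>\n'
        "</head>\n<body>\n</body>\n</html>\n"
    )


class _RefreshFinder(HTMLParser):
    """Collects the URL from a <meta http-equiv="Refresh"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.destination = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("http-equiv") or "").lower() == "refresh":
            # content looks like "0;URL=http://example.com/"
            _, _, target = (attr.get("content") or "").partition(";")
            key, _, value = target.strip().partition("=")
            if key.strip().lower() == "url":
                self.destination = value.strip()


def extract_destination(page):
    """Return the meta-refresh target of a stored page, or None."""
    finder = _RefreshFinder()
    finder.feed(page)
    return finder.destination
```

Because the page is plain HTML, the same file serves both purposes: S3 can hand it to a browser directly, and a tracking service can parse it and redirect on its own terms.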
Despite many code examples and articles over the years on how to actually build a short trackable url redirect, it's definitely a hot topic now that short urls are so pervasive. And I agree that it should not be this way.... at least the reliance on centralized commercial endeavors. You can of course have your own short domains for your own content. Wordpress has several plugins and also the wp.me service. That is all moot to the concerns of link rot that is at the core of this topic.
So, using Amazon s3 is a great idea. Using CNAME is certainly an option. CNAME is known to be a bit costly and can cause a little lag but no doubt a common and useful way to map (and mask) domains to each other. In the early days of videoblogging/podcasting, Feedburner got very popular (2004/2005ish). I know, Dave, you have mentioned Feedburner as a reference to this topic several times. I too was quite vocal about people using their own domains and not publishing the feedburner domains (for obvious reasons).
What should also be mentioned here is the use of A Records instead of CNAME so you are not reliant on creating subdomains.
This is common for services that let you use your own domain to map to your url on their service. I have done this with Tumblr in the past.
Thanks for the update. And btw, I love your new ou.rs domain :)
Amazon S3 static IPs change from minute to minute and you can't set up a CNAME for a domain's root name. :^(
As they say across the pond, "Brill"!
The Web was fantastic because it was easy to edit; without a service like bit.ly to accommodate less savvy users looking at less savvy writers, you are left with them being less quoted on Twitter. I completely agree that massive content platforms should have had that yesterday (if not because you have a creative friend, then because they want to control their own traffic).
http://adjix.com/tt3b
Sharing Amazon S3 buckets with third parties opens up a lot of possibilities. For example, you could have Twitter store every Tweet you've ever sent, as an HTML page, in your bucket. Or, Blogger could store every post you've blogged in your bucket. Etcetera, etcetera.
I'm not talking about mini-posts that were multicasted to Twitter and Blogspot at the same time. I'm talking 500-1,000 word posts that have bit.ly URLs in the middle of them. What the hell for? All it does is obscure where you're going, which makes me *much* more likely to not click them.
Maybe they enable better tracking? I'm not sure. Because I already track all my outbound links. I just don't understand why someone would take the extra effort to go out and obfuscate their links, when length is clearly not an issue.
[pant ... pant ... ]
Okay, sorry. That's been building up for a while now.
You map the CNAME to the URL-shortener. Later, if they go away or you decide
to leave them, you point the CNAME to the Amazon bucket.
Then you don't need to change the API. And I think we need to see some
convergence there. Since I've built on the tr.im API, and since it's going
open source, that seems the most logical to me. I imagine that bit.ly, the
800-pound gorilla won't see it that way.
Dave
As for URL shorteners, this idea could greatly help the current situation. But basically I think a more meaningful solution is to loosen the character limit on microblogging.
One key reason that people want to shorten URLs in blog posts is so they can see how many clicks each link receives, where it came from, by whom, etc. This is easiest for the layperson to do when using a URL shortener that tracks stats, as you mentioned in your post.
It's not only the fact that the link is shortened but rather the fact that it's trackable and you can brand your link by using your own domain name.
If you shortened a link with Adjix and you have your CNAME point to Adjix, then Adjix will track click stats while saving your links in your S3 bucket. If Adjix goes away, you can then just point your CNAME back to your S3 bucket and the link lives on.
http://www.scripting.com/stories/2009/04/27/adj...
One idea behind dropping the URLs in your bucket is that you could now make the URLs portable between URL shorteners that provide tracking stats.
For example, if you shortened a link with tr.im and they implemented this technique, you could then move to Adjix in the future, and Adjix would pull your shortened URL out of the HTML meta-refresh page, track the click, and then perform a 301 redirect.
In other words, not only could your bucket perform the live redirect if you wanted, it could also be a data store for your links.
1. http://adjix.com/ty35
Basic, shortened, Adjix link.
Traffic stats collected by Adjix.
2. http://go.usna93.com/ty35
Shortened link with CNAME for go.joemoreno.com pointing to partner.adjix.com.
Traffic stats collected by Adjix.
3. http://links.joemoreno.com/ty35
Shortened link with CNAME for links.joemoreno.com pointing to S3 bucket: links.joemoreno.com.s3.amazonaws.com.
Traffic stats logged by S3 bucket logging.
4. http://urlpuppy.com/ty35
Shortened link with registrar domain name forwarding.
Traffic stats logged by S3 bucket logging.
One extra step is required to get #4 to work since you can't set a domain name's root to a CNAME. You must have your registrar forward your domain name to http://s3.amazonaws.com/YourS3BucketName.
Don't forget that you'll need to share your S3 bucket with Adjix to get this to work: http://adjix.com/tt3b
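For readers trying to follow which host name goes where in #3 and #4, here is a small sketch that derives the S3 addresses from a bucket name. It only restates the naming pattern quoted above (`bucket.s3.amazonaws.com` as the CNAME target, `s3.amazonaws.com/bucket` for registrar forwarding); the function names are invented for this example.

```python
def cname_target(bucket):
    """Host name the CNAME should point to for a bucket (as in #3)."""
    return f"{bucket}.s3.amazonaws.com"


def cname_style_url(bucket, token):
    """Address of a stored link when the bucket is reached through a CNAME."""
    return f"http://{cname_target(bucket)}/{token}"


def forwarding_url(bucket, token=""):
    """Path-style address used for registrar forwarding of a root domain (as in #4)."""
    return f"http://s3.amazonaws.com/{bucket}/{token}".rstrip("/")
```

For example, `cname_target("links.joemoreno.com")` gives the CNAME value used in #3, and `forwarding_url("YourS3BucketName")` gives the forwarding address described for #4.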
http://www.codinghorror.com/blog/archives/00127...
http://en.wikipedia.org/wiki/URL_shortening#Cri...
http://userscripts.org/scripts/show/52584
http://www.google.com/support/webmasters/bin/an...
On the YouTube video around 12:20 the presenter begins talking about how the site reputation does not transfer between domains.
So having canonical tags pointing to domains anywhere other than the current domain or sub-domain causes Google to ignore it.
ex. of good use of canonical element and bad.
a.blah.com/1137 -> blog.blah.com/2009/08/19/some-post-about-stuff
ad.vu/1137 -> blog.blah.com/2009/08/19/some-post-about-stuff
I think we all know that the actual code involved to make a short trackable url is simple.
There are various methods but none are complicated. Some won't work as well as others.
But we don't need adjix or any 3rd party involved (besides your web host and domain registrar) unless you find value in that, and some will.
tr.im is releasing their code and data for this reason while also providing services/support.
Wordpress users can use Plugins listed here:
http://wordpress.org/extend/plugins/search.php?...
Here is a stand-alone script that comes with a WP plugin and an API:
http://yourls.org
Wordpress.com users can leverage wp.me - http://en.blog.wordpress.com/2009/08/14/shorten/
RR/W just posted about a few other solutions:
http://www.readwriteweb.com/archives/you_dont_n...
The good thing is that there are so many options for those who truly need to care about this stuff (until we can move on from this dark time of cryptic url usage ;)
I believe that we need *all* 3rd party link shrinkers involved, and using the technique outlined here, in order to make your links portable for when link shrinkers like tr.im or zi.ma close up shop.
i'm all for all service providers to do everything they can to make sure that the data is open and accessible. whatever approach works, works.
i think it's easy to mesh the discussion because there are those who:
a) have their own website/domain and want short trackable urls for their own content (ie. blog posts).
b) have their own website/domain and want short trackable urls for ANY content (any url on the web).
c) just want to quickly share links on social services like twitter and don't care about tracking, link rot etc.
d) run a service as a data store, stats collector and as a result, a traffic handler and do not want the trend of DIY to gain much momentum.
e) community-owned services like rp.ly (who first conjured up the idea) and soon to be tr.im (unless they change their mind again ;)
and any other variations. it's easy to misstep with comments :)
From what I've seen playing around with your solution, you would need everyone to work around the same name space, base32,64, 1-12? characters, otherwise there would be overlap when moving your short link files between services when they "close up shop" and you will have to create more sub-domains each time. {a,b,c,d}.ho.st
Ideally, you'd want all URL shorteners to use the same name space to prevent a token collision.
Otherwise, each URL shortener will need to check your bucket to see if you've already created a link with the same token before shortening your link. If so, then the URL shortener will need to choose a new token which isn't really a big deal - most provide that service, now, by letting you pick your own token: http://tinyurl.com/ReallyCoolLink
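The collision check described here could look something like the sketch below. It assumes the shortener has already listed the keys in the user's bucket into a set (how it lists them is left out), and the alphabet and token length are arbitrary choices for illustration, not any service's real scheme.

```python
import secrets
import string

# Assumed token alphabet: lowercase letters and digits.
ALPHABET = string.ascii_lowercase + string.digits


def pick_token(existing, preferred=None, length=4, max_tries=1000):
    """Return a token that is not already a key in the user's bucket.

    `existing` is the set of tokens already stored in the bucket.
    `preferred` is honored when it is free, like a user-chosen token.
    """
    if preferred is not None and preferred not in existing:
        return preferred
    for _ in range(max_tries):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            return candidate
    raise RuntimeError("could not find a free token; try a longer length")
```

The check is per bucket, so two shorteners never need to coordinate with each other, only with what is already stored for that one user.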
Here's what you'll need to do:
1. Create an S3 bucket named sho.rtu.rl.
2. You'll need to share your bucket with Adjix so it can save a copy of each link to your bucket. Details here: http://adjix.com/tt3b
3. Log into your Adjix account and click Edit Profile. At the bottom, just enter the name of your bucket.
That's it - each link that you now shorten with Adjix will be copied to your bucket.
If you want Adjix to track click stats then you'll need to point the CNAME for sho.rtu.rl to partner.adjix.com.
If Adjix goes down, for any reason, then you can simply change the CNAME for sho.rtu.rl to sho.rtu.rl.s3.amazonaws.com and the link will continue to work.
Please don't hesitate to let me know if this is confusing.
Thanks,
Joe
Using all the same token name space, while everyone has their own FQDN would severely handicap the ability for any one URL shortener to act alone in provisioning tokens without a clearing house. I sure as hell don't want to see an ICANN of shortURL.
It's too late for collision prevention, no one does it, and it's too time constraining. Also, to manually choose new tokens and then update all the accessible HTML elements for links already distributed sounds like a nightmare.
All the manual DNS changes can be a hassle should your provider not allow you to adjust SOA or individual TTL refresh, expiry entries.
Here's a tip: if we point our CNAME to your partner.adjix.com, you should be the one to update DNS records when you fail, not us.
BTW, I'll probably use the method anyway.
One other thing. You will need a record of the shorturls. They'll never break, but when you switch providers, the next system that generates your shorturls will need to know whether the shorturl they are creating is in fact unique, right?
In other words, think of your bucket as the link backup data store, with the host CNAME pointing to, say, partner.adjix.com. You'd only have to deal with the non-301 redirect if something happened to your link shrinking partner.
That's what I mean about having a record of the shorturls. In case you do switch providers, they need to know that any new shorturl they create is a unique one.
Yes, the new provider would have to do a quick check to avoid a namespace collision of the token. The only way around this is if one authority handed out the tokens. Perhaps that's where 301Works.org will enter the game.
seo related:
http://www.seobook.com/archives/000297.shtml
a system i built takes advantage of static files (just txt files) and optional mysql data recording/backup. the static files, a unique one for each url, can be easily sent to s3 or any other server as a backup. they can even be served via rss feed so that anyone can subscribe and generate more backups ( i thought dave would like that idea ;). if your site crashes, your backup nameserver could direct people to your mirror server(s). won't get into redundancy here as there are various approaches... like just using Amazon EC2 or Linode etc. Point is, i like the static file approach. though i don't create static html pages with the meta refresh like adjix, i am still able to track clicks, referrers and other data with no problem during the 301 redirect.
sull
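A static-file approach along these lines might be sketched as follows. This is not sull's actual system: the one-file-per-token layout (`<token>.txt` holding the destination URL), the in-memory click log, and the function names are all assumptions for illustration.

```python
import time
from pathlib import Path


def lookup(link_dir, token):
    """Read the destination for `token` from '<token>.txt', or return None."""
    if not token.isalnum():
        return None  # rejects empty tokens and path tricks such as '../x'
    path = Path(link_dir) / f"{token}.txt"
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8").strip()


def handle_click(link_dir, token, referrer, click_log):
    """Return (status, location) for a request and record the click.

    A web server would send a 301 with a Location header for the first
    case and a 404 page for the second.
    """
    destination = lookup(link_dir, token)
    if destination is None:
        return 404, None
    click_log.append({"token": token, "referrer": referrer, "time": time.time()})
    return 301, destination
```

Since each link is just a small text file, backing up to S3 or a mirror server amounts to copying the directory.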
My strong preference is to stick with the same domain and compress the path and query string as much as possible. For example:
http://samj.net/2009/08/twitters-tweet-trademar... -> http://samj.net/1a
That way your user agents are saved at least one recursive DNS query (with CNAMEs it's two), a TCP 3-way handshake and a HTTP transaction and can get everything done in one go with keep-alive. In that case the performance impact would be negligible, and you can still do your tracking.
Whatever the mechanism we should definitely advertise the result using the rel=shortlink standard.
Sam
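One way to get paths as short as the /1a example is to number posts sequentially and write the number in base 36. That encoding is an assumption for illustration (the comment does not say how samj.net derives its tokens), as are the function names; the last helper shows how the result could be advertised with rel=shortlink.

```python
import string

# Digits first, then lowercase letters, matching Python's int(token, 36).
ALPHABET = string.digits + string.ascii_lowercase


def encode_id(n):
    """Encode a non-negative post number as a base-36 token."""
    if n < 0:
        raise ValueError("post number must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_token(token):
    """Turn a base-36 token back into the post number."""
    return int(token, 36)


def shortlink_tag(base_url, post_id):
    """Build a <link rel="shortlink"> element for a post's HTML head."""
    return f'<link rel="shortlink" href="{base_url}/{encode_id(post_id)}"/>'
```

With this scheme, four characters cover more than 1.6 million posts (36 to the 4th power is 1,679,616), so the path stays short for a long time.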
For example, let's say that your link was created by tr.im and stored in a bucket called go.samj.net. If tr.im closed up shop, you could then adjust your CNAME to point to Adjix and Adjix would pull your link from your bucket, track the stats, and do a 301 redirect.
I think more to the point, before we go about declaring "$solution has solved all the problems with URL-shorteners" we need to assess the proposal thoroughly (look how far rev=canonical got before being reeled back in).
Sam
The ideal solution should be part of the Internet. Perhaps an RFC needs to be sent out? I'm sure there's a PhD student working on something like this.
Sam
We'll be losing the "tracking" aspect of the current 3rd party short URLs, but this tracking is already quite unreliable given the non-unique nature of the short URLs. Perhaps it would be in the interest of Twitter etc to provide this click tracking capability. It'd still be unreliable, but would be far less damaging to the net infrastructure.
Regarding 3rd party tracking, what is the use case really (beyond feeding the egos of people whose egos don't need feeding)? I realise that there are marketing benefits for the few of us who are in that industry, but the costs of creating millions of links to each resource (rather than one or two, the canonical and short links) are many and not always obvious. I found today, for example, that browsing a twitter search for new content was almost impossible because my browser's colouring of visited links was broken.
Incidentally Wordpress blog software will be including shortening services (perhaps using wp.me) in its stats module soon... these will presumably capture stats as well, thus solving the problem.
Sam
Re 3rd party tracking, yes, the value is the ego boost, and also the ability to understand how something spreads across the net and who are the real influencers. For companies like http://tra.cx this is invaluable information. It could be provided by the conversational media (e.g. Twitter) or by the publishing infrastructure (e.g. Wordpress).
And of course, once we have multiple stat-providers, there will be companies that will create services that will aggregate all these stats in one place, creating a new industry segment, and so the cycle continues...
Suppose I'm using the shortener miniu.rl and one day they go out of business so I point the CNAME at my bucket on Amazon instead of on their host.
Then I go create an account on newmi.ni and all my new URLs go there.
Just like when I start a new blog. I leave the old one where it is. No need to merge the two (though people often get hung up on doing exactly that).
BTW, as I said in the post, worth repeating because people are saying that it's a flaw, just trading one weak spot for another. If Jeff Bezos changes his mind and decides to shut down S3, I was smart, I backed up my bucket on my laptop hard disk. Then I got an account at Rackspace and moved the folder there, and pointed the CNAME at my server over there. So it's not trading one weakness for another, it is giving me control of my data. Big diff.
unclear what happens if I stop paying my domain registration fee (the one that has the cnames in it). feels like dead links will still be out there.
Let's say there's a new httpr:// and httpsr:// (for http(s) redirection) URI. The browser would take that domain, do a NAPTR query that would return one (or multiple) new URLs that it would follow.
It's just like sip, ENUM, etc... but takes a url and turns it into another url.
So httpr://3tjf.me.com would get the browser to do a DNS NAPTR call to 3tjf.me.com which would return the new url, and it would follow that.
Anyway, this is all extremely hypothetical for many obvious reasons... :) (including the lack of tracking offsetting the gain in speed and reliability)
[Does anyone else find it ironic that we're using a 3RD PARTY COMMENTING SERVICE to discuss how to reduce reliance on 3rd party link shorteners?]
- URL Shorteners should provide either a feed or batch backup solution to export data to another link redirection service.
- Accept CNAME redirections as Google does with their ghs.google.com service. Users could then register domains that they control and own (and can point them elsewhere if the service expires).
It would be far better to standardize on an export format and data/domain portability, than it would be for individual users owning and managing these kludgey S3 buckets.
See earlier comment addressing this exact subject in this thread.
In brief - I'd say the primary motivation is tracking/analytics when you are sharing links to sites that are not your own. Another reason I didn't list there is that you are adding "features" to your link:
Adding Real-time Discussions - http://go2.me
Adding an element of fun and personalization to your link - http://www.quip-art.com
The next step is to make the URL data portable, which isn't hard - we'd just need to standardize it between URL shorteners.
So if one of the mainstream (or other) shorteners would allow me to use a private CNAME record when creating links, and would allow me to export my data - problem solved (in the "right" way).
That would be a really good idea. Then each of us could have our own software that shortens our URLs and those we send along, reduce the unnecessary centralization, and provide smaller URLS that can fit within the Twitter parameters. There are URL shorteners out there--mostly in the internet marketing arena--and I think they have much broader use than they're currently getting. Of course, we ARE a lazy species...