Scripting News: How to fix URL-shorteners, part II (Scripting News)

  • thepartycow · 3 months ago
    Shouldn't adjix return a 301 (permanently moved) and not a 200?

    $ curl -I http://c.oy.ly/kb2t
    HTTP/1.1 200 Apple
    Date: Wed, 26 Aug 2009 00:09:37 GMT
    Server: Apache/1.3.41 (Darwin) mod_ssl/2.8.31 OpenSSL/0.9.7l
    Cache-Control: max-age=60
    Expires: Wed, 26 Aug 2009 00:10:37 GMT
    x-adjix-location: http://www.tuaw.com/2009/08/17/mac-201-preparin...
    location: http://www.tuaw.com/2009/08/17/mac-201-preparin...
    connection: close
    content-length: 870
    Content-Type: text/html
  • dave · 3 months ago
    I don't think there's any "should" to it. It works. I'm happy. :-)
  • Robert Lowe · 3 months ago
    I assume it works that way to ensure the Google Analytics script, which is embedded in the page, runs.
  • sull · 3 months ago
    that was the original reasoning adjix had for doing this... a request from clients to utilize analytics code on the url.

    however, would this not be a better solution?

    How do I tag (and track) my links?

    http://www.google.com/support/googleanalytics/b...
  • nickhalstead · 3 months ago
    Dave,

    Interesting plan, but http://www.adjix.com/ is not doing proper redirects, it currently builds a page with a META REFRESH tag in to do the redirect, which only a browser can follow.
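
The difference between a proper HTTP redirect and a META REFRESH page, as raised in the comments above, can be sketched in Python. This is only an illustration, not Adjix's code: the helper name `classify_response` and the sample inputs are assumptions made for the example.

```python
import re

# Hypothetical helper, not part of any service discussed in this thread.
# Matches a <meta http-equiv="refresh" ...> tag anywhere in an HTML body.
META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.IGNORECASE)

REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_response(status, headers, body=""):
    """Return 'http-redirect', 'meta-refresh', or 'none' for one response.

    status  -- numeric HTTP status code
    headers -- dict of header name to value (names compared case-insensitively)
    body    -- response body as text, if any
    """
    names = {name.lower() for name in headers}
    # A redirect that any HTTP client can follow needs both a 3xx status
    # and a Location header. A Location header on a 200 does not count.
    if status in REDIRECT_CODES and "location" in names:
        return "http-redirect"
    # Otherwise only something that parses the HTML (a browser) will move on.
    if META_REFRESH.search(body):
        return "meta-refresh"
    return "none"
```

A 200 response that carries a `location` header, like the one in the curl output earlier in the thread, is classified by its body rather than by that header, which is why non-browser clients stop at the short URL.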
  • abrahamvegh · 3 months ago
    Every few minutes you call the Twitter API? And you want to know why it keeps going down...
  • dave · 3 months ago
    Everyone's grouchy here today!

    Abraham, that's how apps that work with the Twitter API work.

    Ask anyone if you don't believe me.
  • abrahamvegh · 3 months ago
    Nobody's grouchy; I am quite familiar with the Twitter API, and hitting it every few minutes makes the API and by extension the entire service less usable for everyone else. At a minimum, you should be waiting at least ten minutes between calls for something as trivial as that.
  • dai_vernon · 3 months ago
    I would argue that running a URL shortening service is a pretty non-trivial function so long as twitter is limited to 140 chars
  • myquealer · 3 months ago
    Abraham, what do you think every twitter client (Tweetie, Tweetdeck, Nambe, Seesmic, etc.) does? They are hitting the twitter API every minute or so to show you the latest tweets (whether you are actively reading them or not). And they, on an individual basis, are less important than Dave archiving his tweets. There are millions of those clients running and you're worried about Dave's one archiving script?

    Twitter allows everyone 150 API requests per hour. If you ask twitter nicely, chances are they will whitelist your account and allow you to make 20,000 API requests per hour. I think they can handle Dave's 20 or so per hour just fine.
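
The arithmetic behind those figures is easy to check. The sketch below only converts an hourly quota into an even spacing between calls; the function name is made up for the example, and the quotas are the numbers quoted in the comment above rather than anything verified against Twitter's documentation.

```python
def seconds_between_calls(requests_per_hour):
    """Smallest even spacing, in seconds, that stays within an hourly quota."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600 / requests_per_hour


# Quotas as quoted in the comment above (assumed, not verified).
print(seconds_between_calls(150))  # default limit: 24.0 seconds between calls
print(seconds_between_calls(20))   # about Dave's rate: 180.0 seconds between calls
```

At roughly 20 calls per hour, one call every three minutes uses about 13% of the default 150-per-hour allowance.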
  • abrahamvegh · 3 months ago
    Allow me to reiterate: I am completely cognizant of the way all Twitter clients work. All I'm saying is, if everyone backs off just a little from hitting the API then it's better for everyone, and doing what Dave is doing, he can afford to back off without losing functionality.
  • dave · 3 months ago
    Poor guy has no clue how the API works.
  • dave · 3 months ago
    Also, I may not be on the SUL or be a verified user, but I am on the
    whitelist and am allowed an unlimited number of API calls.
  • Jesse Stay · 3 months ago
    Why do they require you to enter your domain? Can't you just CNAME it? Are they doing something to stop just any domain from CNAME'ing their domain? I'm very curious about this because I'd like to do something similar for some features we offer on SocialToo (unrelated to Adjix).
  • dave · 3 months ago
    It's tracking again (it seems to be the answer to all the questions people are asking here).

    Think about it this way. If I pointed the sub-domain to their server, they would know which long URL to redirect to, but they wouldn't know who to give the credit to. The end result of all this linking is the Top 40 list; it's kind of a link-blog, fascinating to see what my 24K followers on Twitter and the retweeters and FriendFeed folk find interesting.

    http://dave.40twits.com/
  • Jesse Stay · 3 months ago
    Thanks Dave. Thinking back on it, that makes total sense and I was about to
    conclude that myself. That brings some great ideas though for the option
    I'm considering.
  • christian · 3 months ago
    I'm going to call that high productivity. Especially since you are in the middle of that crazy cold.
  • dave · 3 months ago
    Happy to report the cold is in remission. Otherwise I never would have gotten this much done. :-)
  • brianjesse · 3 months ago
    Dave, thanks for sharing how you've worked out your vanity url shortening.

    I'm curious about the Amazon bucket solution, it doesn't look like it's possible to use it with a root-domain such as oy.ly because a CNAME won't work for that. True?

    Thanks again

    - Brian
  • sull · 3 months ago
    this is true - A records don't work because s3 does not have a static IP address to point to.
    also, it has been noted that the best and proper way to handle redirects is with a 301, not a meta-refresh.

    i think this adjix experiment is fine but it is not how i personally would choose to do this.
    it's a solution and it's one that dave chose to try. nothing wrong with that.

    though i still do not grok why running your own software to do redirects and click tracking etc is not taking precedence over using adjix, tr.im or any other service. i'd like clarification on that. maybe i am missing something. but an ec2 server running your own code seems to be the holy grail here. in previous article, i pointed to some options - http://www.scripting.com/stories/2009/08/19/how...
  • Joe Moreno · 3 months ago
    Cost is the big reason. Running an EC2 server will cost at least $70/month. Using this S3 option will probably only cost about $0.25/month.

    It's possible to use a root-domain. See option 4 under How to do it:
    http://blog.adjix.com/2009/08/own-your-links-wi...
  • sull · 3 months ago
    good point, joe.

    however, for a public facing url redirection/tracking service (a startup), the cost is inconsequential.
    for ordinary folk, you can run a private service for your own needs or even let others use it too and you could get away with using a shared hosting service for $5-$20 per month... running side by side with your website/blog.

    as for option 4 - domain redirect.... oof. a redirect to a redirect is one redirect too many for me. but i suppose it is a workaround that wouldn't bother some. i have to imagine that search engines would not look upon this whole setup very well :/ thoughts?
  • dave · 3 months ago
    If you run your own software it's yet another web app you have to run
    forever.
  • dave · 3 months ago
    Correct.
  • Joe Moreno · 3 months ago
    You can do it with a root-domain. See option 4 under How to do it:
    http://blog.adjix.com/2009/08/own-your-links-wi...
  • crabasa · 3 months ago
    Dave, for those of us who don't know, can you please explain precisely what you would lose if Adjix went away? Or another way of putting it, what value do they deliver when they're up?
  • dave · 3 months ago
    Tracking. Tracking. Tracking.
  • crabasa · 3 months ago
    But... isn't that something that a halfway decent web analytics tool could provide for retrieval of static files?

    I don't mean to be a trouble-maker, but I just wonder if the "middleware" of a URL shortener is really adding that much value, given the trouble.
  • sull · 3 months ago
    i have cloned the core bit.ly (including tracking) without even using a database (uses flat static files).
    and it's done in about 100 lines of code.

    and click tracking is click tracking. date/time, ip, referrer url. standard stuff. i only spent a few moments on adjix.com but i did not even see these basic click stats. but maybe i missed where more detailed stats are or maybe there is a partner/api dashboard that is more thorough.

    anyway, glad you are happy with a solution. at the same time, it's good that your readers/commenters are engaging the conversation with other ideas.

    cheers!
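
The "standard stuff" listed above (date/time, IP, referrer) can be pulled out of a web server access log. The sketch below assumes Apache's combined log format and uses hypothetical names (`parse_click`, `clicks_per_link`); it is not the code of the bit.ly clone mentioned in the comment or of any service in this thread.

```python
import re
from collections import Counter

# Assumes Apache "combined" log lines, e.g.
# 1.2.3.4 - - [26/Aug/2009:00:09:37 +0000] "GET /kb2t HTTP/1.1" 200 870 "http://twitter.com/" "Mozilla/5.0"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def parse_click(line):
    """Return a dict with ip, time, path and referrer, or None if no match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def clicks_per_link(lines):
    """Count GET/HEAD requests per short path across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        click = parse_click(line)
        if click is not None:
            counts[click["path"]] += 1
    return counts
```

Passing an open log file to `clicks_per_link` yields the per-link totals; keeping the dicts from `parse_click` instead gives the per-click detail.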
  • Joe Moreno · 3 months ago
    Sull, Adjix tracks every single click, by IP address and referrer. Additionally, this is archived into a CSV about once/month. I'm not aware of any other service that does this.
  • sull · 3 months ago
    good to know. i just did not see that tracking data when i logged in. i only saw the # of clicks.
    great regarding CSV export.
  • dave · 3 months ago
    Me too. I've written a number of URL-shorteners. It's not the writing of
    them that matters, it's as I said -- the keeping them running. It's a pain
    in the ass, and I'd rather let someone else do it.
  • sull · 3 months ago
    fair enough. it's a good system that lets you leverage any service. i get it :)
    thanks.
  • hjmler · 3 months ago
    looks a lot like a solution to a problem almost no one has...
  • Zacqary Adam Green · 3 months ago
    As good an idea as this is, as are all the other things you come up with (generally), we can't forget that 99% of the planet isn't tech-savvy enough to be both interested in this and able to set it up easily. That's why big, private servers that could disappear at any time are so popular: someone else does the dirty work for you.

    I think what the Internet needs most is a widespread public interest in owning your own data, and the ability to do so just as easily as signing up for Twitter.
  • dave · 3 months ago
    Everything worth doing starts out complicated then you hack at it to simplify it as much as possible. The first car was a bitch to drive. As was the first blog.
  • cshotton · 3 months ago
    Is it really just as simple as pointing your DNS at the S3 site if Adjix fails? If so, why not just do the shortening yourself?

    I forget the name of the service, but there's someone out there who is simply writing static HTML docs with a redirect meta tag, saving the file with a "short" name, and redirecting to the long one. Seems WAY simpler to let a dumb Apache server offer up a simple text file from the file system (and use the httpd.log file for your stats) than to jump through all of the API ca-ca using someone else's service.

    What am I missing?
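
The static-file approach described in that comment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the unnamed service: the function names and directory layout are hypothetical, and the web server would still need to be configured to serve extensionless files as text/html, which is not shown here.

```python
import html
from pathlib import Path


def redirect_page(long_url):
    """Build a tiny HTML page that sends a browser on to long_url."""
    safe = html.escape(long_url, quote=True)  # keep & and quotes valid in attributes
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="0;url={safe}">'
        "</head><body>"
        f'<a href="{safe}">{safe}</a>'
        "</body></html>\n"
    )


def write_short_link(doc_root, short_name, long_url):
    """Save the redirect page as doc_root/short_name and return its path."""
    path = Path(doc_root) / short_name
    path.write_text(redirect_page(long_url), encoding="utf-8")
    return path
```

Calling `write_short_link("/var/www/short", "5", "http://example.com/some/long/path")` (both arguments are placeholders) would create a file named `5` in that directory. As noted earlier in the thread, only browsers follow a meta refresh, so this trades proper 301 behavior for simplicity.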
  • dave · 3 months ago
    Because I want tracking.

    http://dave.40twits.com/
  • brianjesse · 3 months ago
    An actual text file on the web server named "5" works well for this use case, but was trying to figure out if the S3 bucket could be used as well.

    http://bh.ly/5
  • Mason Lee · 3 months ago
    An improvement for use in Twitter for sure. You're no longer dependent on a third-party, and for the rest of us, your redirects are simple files that can be mirrored by archivists for when c.oy.ly goes the way of the dodo. (Does the bucket have a full public index?)

    If I may ask: You've suggested that Twitter.com should include full links in their feeds, rather than pack everything into the 140 chars. If they did that, would you stop using c.oy.ly, or will you find too much value in this new click stats architecture?

    I like the idea of public click stats. I'm glad you make your Top-40 report public.
  • Ravi Pinjala · 3 months ago
    Serious question: If a company as large and established as Amazon or Google created a URL shortener, would you use it? It seems that it would be exactly as stable as a S3-based solution, minus the hassle of redirecting through another layer of DNS.
  • dave · 3 months ago
    It's a good question. Before implementing this solution with Adjix, yes I would have preferred to use something from Amazon or Google. Now, because of the safeguards we've implemented, I'd prefer to use Adjix because Joe Moreno is a good guy, and he returns my calls, and when I ask him for a feature he usually says yes. I like Amazon and I trust Google, but neither of them considers me that important a customer or user of theirs.
  • Dean Michael Berris · 3 months ago
    Nice concept. I know that CNAME's and hosting on third-party storage providers would work -- heck you can even host it on your own servers if you really want to. But then you run into the problem (again) of your distributed storage of shortened links being the single point of failure for all your links.

    Thinking about it a little more, if we all went distributed and each one had a separate storage for each of our shortened URL links, then the likelihood of failure of an individual node (in aggregate) is higher than if you just had a company that has the economies of scale to do it for you -- like Amazon, Google, Yahoo, Microsoft, [insert favorite internet megacompany here]. For instance, what's stopping any distributed link from acting bad and doing the wrong thing and redirecting to spam or porn? Don't you think we'll be in the same conundrum?