Eric Wong’s mostly pure-Ruby HTTP backend, Unicorn, is an inspiration. I’ve studied this file for a couple of days now and it’s undoubtedly one of the best, most densely packed examples of Unix programming in Ruby I’ve come across.
Unicorn is basically Mongrel (including the fast Ragel/C HTTP parser), minus the threads, and with the Unix turned up to 11. That means processes. And all the tricks and idioms required to use them reliably.
We’re going to get into how Unicorn uses the OS kernel to balance
connections between backend processes using a shared socket,
fork(2), and accept(2) — the basic Unix prefork model in
100% pure Ruby.
But first …
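For a rough feel of the prefork model described above, here is a minimal sketch. It is not Unicorn's code: the `spawn_workers` and `fetch` helpers, the worker count, and the response body are all made up for illustration. The parent opens one listening socket, forks a few workers that inherit it, and every worker blocks in accept(2); the kernel decides which worker gets each connection.

```ruby
require "socket"

# Fork `count` workers that all block in accept(2) on the same inherited socket.
# (Hypothetical helper; Unicorn's real worker loop does far more than this.)
def spawn_workers(listener, count)
  Array.new(count) do
    fork do
      loop do
        client = listener.accept                 # kernel hands each connection to one worker
        while (line = client.gets) && line != "\r\n"
          # read and discard the request head; this sketch ignores its contents
        end
        body = "handled by pid #{Process.pid}\n"
        client.write("HTTP/1.1 200 OK\r\n" \
                     "Content-Length: #{body.bytesize}\r\n" \
                     "Connection: close\r\n\r\n#{body}")
        client.close
      end
    end
  end
end

# Tiny client used only to exercise the workers.
def fetch(port)
  TCPSocket.open("127.0.0.1", port) do |sock|
    sock.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
    sock.read
  end
end

listener    = TCPServer.new("127.0.0.1", 0)      # port 0: let the OS pick a free port
port        = listener.addr[1]
worker_pids = spawn_workers(listener, 3)

responses = Array.new(5) { fetch(port) }
puts responses.map { |r| r.lines.last }

# Shut the workers down so the sketch exits cleanly.
worker_pids.each { |pid| Process.kill("TERM", pid) }
Process.waitall
listener.close
```

There is no load balancer anywhere in that code. Sharing the socket across fork(2) is the whole trick, and which pid answers a given request is up to the kernel.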
This will be my first talk at a major conference.
An illustrated re-introduction to HTTP caching with a focus on gateway caches and their potential benefits within the context of modern, dynamic web applications.
Real HTTP caching for Ruby web apps.
Today it occurred to me that, after a little over ten years of basic fluency in HTML, I have absolutely no idea why the href attribute is named “href”. Why not “url”, “link”, or even just “ref”?
And why we need more three-legged stools.
It’s not a robot thing.
Use this to get kicked out of the party.
We should be doing more of this:
If you’re building a modern website then you’ll be needing some javascript libraries and css.
Rather than hosting these common libraries on your own server, you should Use a Content Delivery Network. Lucky for you Google, Microsoft and Yahoo host a range of popular javascript and css which you can directly link to for free. This saves your bandwidth and speeds up your website load time.
The great thing about the shared CDN approach is that the resources are cached once and reused across all sites, often without even making a validation request.
I’d seen the JavaScript libraries before but I’d never considered using this approach with CSS. The YUI CSS reset is a perfect example of where a shared CDN provides the most benefit. If every site that employed the basic CSS reset used this URL, it would effectively be baked into the browser with no overhead after the first request.
Interesting use of node.js as a sort of HTTP reverse proxy. It uses redis based queues to communicate with backends instead of establishing a direct socket connection and doing HTTP:
This spike uses node to put messages into a (redis) queue. Ruby background workers read from the queue, process the requests, and respond on a different queue. When node receives the response from the background worker, it sends the response back to the waiting user.
I assume this adds a not insignificant amount of latency to each request but would also make possible a bunch of long-running connection features. For example, the response (or portions of the response) could be delivered from separate worker processes. This style of architecture, where the client connection isn’t tied to backend web process, looks promising. The nginx_http_push_module is another example that gives the same types of benefits.
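The shape of that architecture can be sketched without node or redis at all. In the sketch below, Ruby's in-process `Queue` stands in for the two redis queues, a thread stands in for the background worker, and the `dispatch` helper and message format are invented for illustration. The point is only that the front end never talks to the backend directly.

```ruby
require "thread"

# Stand-ins for the request and response queues (redis in the original spike).
requests  = Queue.new
responses = Queue.new

# Hypothetical background worker: pop a request, "process" it, reply on the other queue.
worker = Thread.new do
  while (job = requests.pop)                     # nil is the shutdown signal in this sketch
    responses.push({ id: job[:id], body: "Hello, #{job[:path]}" })
  end
end

# Hypothetical front end: enqueue the request, then wait for the reply.
# A real front end would match replies to waiting connections by id.
def dispatch(requests, responses, id, path)
  requests.push({ id: id, path: path })
  responses.pop
end

reply = dispatch(requests, responses, 1, "/feed")
puts reply[:body]

requests.push(nil)                               # stop the worker
worker.join
```

Because the reply is just a message tagged with an id, nothing stops a different worker (or several) from producing it, which is where the long-running connection features would come from.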
“The facebook server responds with a permanent redirect”
Maintenance release that fixes a bunch of issues under Ruby 1.9, some multipart form problems, and various other minor bugs.
Warning: PDF. This is probably the best high-level, everything about HTTP caching all in one place resource on the web at this point. Good stuff. I’m kicking myself for not being a part of his track at JAOO now.
It’s a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions. A better mental model of bandwidth should include:
- packets-per-second
- packet latency
- upstream vs downstream
Densely informational piece. Don’t miss the part where they generate packet loss using a microwave and a cup of tea :)
This is how I am using Rack::Cache, Sinatra, and CouchDB … Sweet ascii diagram there. I’ve seen this ETag chaining technique twice just this week. The other one is gemcutter. They store gems in S3 and pass the S3 provided ETag along in their responses, so it’s like the web app is more of an intermediary sometimes. Weird and cool and interesting.
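A minimal sketch of the ETag chaining idea, assuming a backend that hands back a validator with every document. The `ChainedETagApp` class and the Hash-backed store are hypothetical; the app is Rack-shaped (a `call(env)` returning status, headers, body) but needs no gems. The app never computes an ETag itself, it just relays the one the backend supplied and answers 304 when the client already has it.

```ruby
# Rack-style app that passes the backend's ETag straight through.
# `store` is a plain Hash standing in for CouchDB, S3, or similar (an assumption).
class ChainedETagApp
  def initialize(store)
    @store = store
  end

  def call(env)
    doc     = @store.fetch(env["PATH_INFO"])     # => { etag: "...", body: "..." }
    headers = { "ETag" => doc[:etag] }

    # Simplified comparison: ignores weak validators and comma-separated lists.
    if env["HTTP_IF_NONE_MATCH"] == doc[:etag]
      [304, headers, []]
    else
      [200, headers.merge("Content-Type" => "text/plain"), [doc[:body]]]
    end
  end
end

store = { "/posts/1" => { etag: '"abc123"', body: "hello" } }
app   = ChainedETagApp.new(store)

first  = app.call({ "PATH_INFO" => "/posts/1" })
second = app.call({ "PATH_INFO" => "/posts/1", "HTTP_IF_NONE_MATCH" => first[1]["ETag"] })
puts first[0], second[0]
```

In a real setup the win comes from asking the backend for just the validator before fetching or rendering the full document; this sketch fetches the whole thing either way.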
Whoa. There’s some serious shit poppin' off on the rest-discuss mailing list lately. Here’s Roy Fielding (completely out of context):
Quite frankly, this is the single dumbest attempt at one-sided “standardization” of anti-REST architecture that I have ever seen. It even manages to one-up the previous all-time-idiocy of IBM when they renamed their CORBA toolkit “Web Services” in a deliberate attempt to confuse customers into thinking they had something to do with the Web.
It doesn’t get any better from there :) I saw the REST-* site a few weeks ago but I (literally) thought it was a joke site. The sad thing is that, if the past is a predictor of the future, Jboss/Redhat will probably be able to convince a large chunk of enterprise IT managers that they are REST.
Chris Wanstrath and Leah Culver’s submission for Rails Rumble ‘09 finally has its own permanent hostname. Hurl makes HTTP requests and then shows you stuff about the response, like headers and a syntax highlighted body. Hurls have permalinks, too, so you can link to them from email threads, IRC, technical documentation, etc. See the about page for more info and a screencast.
Dustin Sallings proofs out an implementation of the recently released Tornado web framework but builds on top of Twisted. The result is -1,297 fewer lines and all the benefits of having the Twisted framework underneath. I’ve been waiting for someone from the Ruby community to announce a port — we’re good at stealing. Using Dustin’s fork as a reference and basing a Ruby implementation on EventMachine might be the way to go.
Interesting W3C Note from January 2003 that I don’t remember ever seeing:
HTTP and URIs are the basis of the World Wide Web, yet they are often misunderstood, and their implementations and uses are sometimes incomplete or incorrect. This document tries to improve this situation by providing a set of good practices to improve implementations of HTTP and related standards (Web servers, server-side Web engines), as well as their use.
The information here is relevant to people who build web apps, not HTTP server implementors — the title is a bit misleading (not actually but practically). I especially like this bit about why short, less meaningful URLs are better than verbose, descriptive URLs. Shortness has become the most important characteristic of URL design in most apps I’ve built recently; SEO be damned.
Unicorn is a newish Rack-based HTTP server that’s kinda sorta like Mongrel but comes packed with some insane process management features. The main link is to the SIGNALS file, which documents the master/worker process model, supported signals, process replacement, failover, etc. See the README for a high level description of features.
This link brought to you by @defunkt, who explained Unicorn’s unique approach (repeatedly) over the course of a week.
Yahoo!’s proposal to open source their “fast, scalable and extensible HTTP/1.1 compliant caching proxy server” as an Apache project:
Traffic Server fills a need for a fast, extensible and scalable HTTP proxy and caching. We have a production proven piece of software that can deliver HTTP traffic at high rates, and can scale well on modern SMP hardware. We have benchmarked Traffic Server to handle in excess of 35,000 RPS on a single box. Traffic Server has a rich feature set, implementing most of HTTP/1.1 to the RFC specifications.
Rad. I know Yahoo! runs a custom build of Squid as well so I’m curious to understand where this thing came from. The proposal states that it was originally acquired from Inktomi and has been in use for some time.
Adam takes a look at how long requests and backlog interact. The sleep example runs concurrently under Mongrel but Thin and WEBrick will backlog.
Tony’s simple HTTP interface to RabbitMQ. Somebody get this running as a service on EC2 so we can hook Heroku apps up to it on the private network.
mnot on how to evaluate different proxy cache options for your needs.
Whoa. How do I get my hands on an English copy?
We made it.
Protocols are hard. Nobody understands this.
Get it while it’s hot.
Why browser UI for HTTP auth is so horrible has always baffled me. This could be improved significantly without any changes to HTTP whatsoever.
John Adams posted a bunch of details of the Varnish configuration they use in front of search.twitter.com to the varnish ML. Great stuff and nice to see the Twitter devs continuing to share their experiences with the community.
Geoffrey Grosenbach interviewed me yesterday for the Ruby on Rails podcast. We had a nice chat about Python/WSGI, Rack, Sinatra, Rack::Cache, Heroku, and other random stuff.
This is one of the amazing benefits of having an insanely simple but well defined SPEC (Rack) around the edges of your library. It makes it trivial to hook things up in new and interesting ways.
I’ve written this same exact blog post a dozen times. For some reason, each hop along what should be a pure HTTP pipeline wants to invent their own pseudo-protocol for transferring HTTP messages. Why?! Your reimplementation of HTTP is not going to be any less complex (by definition, it must be at least as complex), and your reimplementation is definitely not going to be less buggy than the real HTTP implementations that have been around for a decade or more.
This is why we can’t have nice things …
We gave the Sinatra website a major face lift. Check it out. Don’t leave without subscribing to the feed.
Magnus Holm dissects a couple of implementations for parsing nested form parameters (e.g., “person[name]=Joe&person[zip]=55555”) in Ruby. _why’s is the most interesting (as always). We just added this to Sinatra and I’m fairly confident we’ll see something like it land in Rack before 1.0.
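To make the nested parameter idea concrete, here is a small sketch. It is not _why’s implementation, nor the one that went into Sinatra; `parse_nested_query` is a made-up helper that only handles hash-style keys like `person[name]` and ignores array-style `[]` keys entirely.

```ruby
require "uri"

# Turn "person[name]=Joe&person[zip]=55555" into a nested Hash.
# Illustrative only: no support for "key[]" arrays or conflicting key shapes.
def parse_nested_query(query)
  params = {}
  URI.decode_www_form(query).each do |key, value|
    path = key.scan(/[^\[\]]+/)                  # "person[name]" -> ["person", "name"]
    *parents, leaf = path
    node = parents.inject(params) { |hash, part| hash[part] ||= {} }
    node[leaf] = value
  end
  params
end

result = parse_nested_query("person[name]=Joe&person[zip]=55555")
p result
```

Each bracketed segment becomes one level of nesting, so `result["person"]["zip"]` is the string "55555". Values stay strings, just as they arrive on the wire.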
I put a lot of work into this release. Really happy to see it out :)
Quick presentation on Rack by Dan Webb. Covers a lot in eight minutes.
Matt Todd did a nice presentation on Rack to the Atlanta Ruby Group (ATLRUG) and they were nice enough to put video of the slides + audio of Matt’s narration online.
I’ve annotated RFC 2616 Section 13 with details on where Rack::Cache is and isn’t compliant. Anything not highlighted should work as described in the RFC. I think I’ll be using SharedCopy more in the future.
Ian McKeller shows how easy it is to find web API “secret keys” when the user has access to the (network) client code. It’s actually a nice little crash course in how to write cracking software (here “crack” means warez scene type “crack”). That crazy shit like this is possible is why I got into software in the first place. Completely
Interesting looking HTTP client library for Ruby with support for HTTP caching (with pluggable backends), basic and digest auth, intelligent redirect handling. It’s been around for a while and looks like it could eventually become similar in feature set to Python’s httplib2.
Nick Kallen has started a project to implement an HTTP cache in Scala. Seems like an excellent idea given Java’s extensive collection of stable HTTP server libraries and Scala’s strengths in concurrency and performance.
Nice look at caching idioms in Django and why you need to generate HTTP cache validators up-front and efficiently.
An Nginx module that acts as a gateway cache. I haven’t tried it yet but it’s a really good idea.
Pratik continues his series on Rack with a deep dive into Rack::Builder.
I’ve linked to this before and I’ll link to it again.
Pratik’s first in a series of pieces on Rack: how it came to be, why you need to understand it, along with some simple examples. Future installments will cover Rack::Builder and Middleware.
Sebastien Auvray covers Rack::Cache at InfoQ. Thanks!
Interesting approach to setting cache related headers using a Rack middleware component.
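Something along these lines, presumably. This is a guess at the general shape, not the linked component: `CacheControlDefaults` and its `max_age` option are hypothetical, and the mixed-case header name is an assumption (newer Rack versions expect lowercase header names). The middleware wraps any Rack-style app and fills in a default `Cache-Control` header when the app didn't set one.

```ruby
# Rack-style middleware that adds a default Cache-Control header to 200 responses.
class CacheControlDefaults
  def initialize(app, max_age: 300)
    @app     = app
    @max_age = max_age
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Only fill in a policy when the app has not chosen one itself.
    if status == 200 && !headers.key?("Cache-Control")
      headers = headers.merge("Cache-Control" => "public, max-age=#{@max_age}")
    end
    [status, headers, body]
  end
end

plain_app = ->(env) { [200, { "Content-Type" => "text/plain" }, ["hi"]] }
stack     = CacheControlDefaults.new(plain_app, max_age: 60)

status, headers, _body = stack.call({})
puts headers["Cache-Control"]
```

Keeping the policy in middleware means the app code stays free of caching concerns, and an app that does set its own header still wins.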
Allows a server to turn the tables and make HTTP requests to the client. I’ve been trying to come up with some use for this for 45 minutes and I’m totally baffled but it’s kind of interesting anyways.
Ryan King nails it.
So, I got an email yesterday disagreeing with my remark about HTTP caching being wildly under-appreciated in the Ruby web community. I felt bad, a little. Then I read this article (posted the day after my remark), which talks about Scribd moving to a Squid reverse proxy setup to front their Rails deployments:
“But there was a problem – no one uses caching proxies in 2008 :–) So, we’ve got an idea – why can’t we place such a server in front of our application and make it cache content for all users in the world?”
The fact that Scribd had to “have this idea” on their own and had not previously been exposed to a ton of literature/tools on reverse proxy / gateway caching is completely fucking unacceptable. I’m back to agreeing with myself.
Pretty good introduction to building pieces of Rack middleware and using Rack::Builder.
Much nicer, IMO. I’m interested to see if someone can get Rails + Rack::Cache working together so that you can maximize the benefits of generating these validators.
Paul Downey translates Dr. Fielding’s REST APIs Must be Hypertext Driven into lay-hacker speak.
Huh? In a sane world, “Ajax” would have been called “HTTP” (or, more elaborately: “JavaScript gets a mostly-standard asynchronous HTTP client library”).
At first I thought this was going to be one of those articles that confuses animated JavaScript effects for Ajax but it goes on to talk about how Ajax is bad because it breaks “Save Page to File” … or something. Save Page to File?!
“Varnish implementes a subset of the ESI Language 1.0 defined by W3C, this document lays out some of the thoughts and rationale for choices made and advice for usage of these features.”
This lets you perform includes at the cache layer so that each included resource can have its own caching policy. Akamai edge proxies have supported this for some time, apparently.
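A toy version of what the cache layer does with an include, assuming a self-closing `<esi:include src="..."/>` tag. The `expand_esi` helper and the `fragments` Hash are invented for illustration; in a real cache each fragment would be fetched or served from its own cache entry with its own policy.

```ruby
# Matches a self-closing <esi:include src="..."/> tag and captures the src value.
ESI_INCLUDE = %r{<esi:include\s+src="([^"]+)"\s*/>}

# Replace each include tag with the fragment stored under its src.
# `fragments` stands in for separately cached responses (an assumption).
def expand_esi(html, fragments)
  html.gsub(ESI_INCLUDE) { fragments.fetch(Regexp.last_match(1)) }
end

page      = '<html><body><esi:include src="/header"/><p>article</p></body></html>'
fragments = { "/header" => "<h1>Site</h1>" }

expanded = expand_esi(page, fragments)
puts expanded
```

The outer page and the header can now expire on completely different schedules, since only the assembly happens per request.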
Looks like a really solid improvement on 1.0. I haven’t had a chance to play with any of the betas but I’m anxious to see whether If-Modified-Since/If-None-Match validation made it in. There’s a note on “serving expired objects until we have a fresh one” but that sounds more like stale-while-revalidate.
Joe Gregorio’s 14 minute video introduction to REST and HTTP.
… is a Ruby library suitable for use as a drop-in Net::HTTP replacement or with event frameworks like EventMachine and Rev.
Oh, nice. Here’s a high-level design document that describes the new cross-site XmlHttpRequest (they’re calling it, “XXX”) functionality and ties the other documents floating around out there together. It seems that servers will be able to signal that certain resources are accessible from other domains using HTTP headers or (gasp!) XML processing instructions (PIs). Weird.
Just landed on mozilla trunk a few days ago. See the draft spec for specifics.
Dan Kegel: “You can buy a 1000MHz machine with 2 gigabytes of RAM and an 1000Mbit/sec Ethernet card for $1200 or so. Let’s see – at 20000 clients, that’s 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn’t take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.08 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck. ”
Looks like this is from 2003 but is still pretty accurate as far as I can tell.
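The per-client arithmetic in the quote is easy to check. The sketch below only restates the quoted numbers, using decimal units throughout (an assumption; the quote doesn't say).

```ruby
clients       = 20_000
cpu_hz        = 1_000 * 1_000_000      # 1000 MHz
ram_bytes     = 2 * 1_000_000_000      # 2 gigabytes, decimal
net_bits      = 1_000 * 1_000_000      # 1000 Mbit/sec
price_dollars = 1_200.0

per_client = {
  cpu_khz:   cpu_hz / clients / 1_000,     # KHz of CPU per client
  ram_kb:    ram_bytes / clients / 1_000,  # Kbytes of RAM per client
  net_kbits: net_bits / clients / 1_000,   # Kbits/sec of network per client
  dollars:   price_dollars / clients       # hardware cost per client
}

per_client.each { |name, value| puts "#{name}: #{value}" }
```

The CPU, RAM, and network shares match the quote (50, 100, and 50). The cost works out to $0.06 per client at exactly $1200; the quoted $0.08 would correspond to a machine costing about $1600.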
All frameworks should approach caching the way Django does. The core app/origin framework does no real caching but provides utility/helper methods for setting standard RFC 2616 cache related headers on the response easily and correctly. A completely separate set of caching goo (“middleware”) sits between your app and performs the actual caching based purely on the headers set by the origin. The benefit to this approach is that caching is totally independent from the app framework and can be swapped out for a true gateway (“reverse proxy”) cache at any time.
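Here is roughly what "caching based purely on the headers set by the origin" looks like, as a sketch in Rack-style Ruby rather than Django. `TinyHeaderCache` is hypothetical and deliberately incomplete: it only understands `max-age`, `private`, and `no-store`, keys on the path alone, and assumes array bodies. The injectable `clock` is there so expiry can be shown without sleeping.

```ruby
# Middleware that caches GET responses using only the origin's Cache-Control header.
class TinyHeaderCache
  def initialize(app, clock: -> { Time.now.to_f })
    @app     = app
    @clock   = clock
    @entries = {}
  end

  def call(env)
    return @app.call(env) unless env["REQUEST_METHOD"] == "GET"

    key   = env["PATH_INFO"]
    entry = @entries[key]
    return entry[:response] if entry && @clock.call < entry[:expires_at]

    status, headers, body = @app.call(env)
    cache_control = headers["Cache-Control"].to_s
    max_age       = cache_control[/\bmax-age=(\d+)/, 1]
    if status == 200 && max_age && cache_control !~ /\b(private|no-store)\b/
      @entries[key] = { response: [status, headers, body],
                        expires_at: @clock.call + max_age.to_i }
    end
    [status, headers, body]
  end
end

hits   = 0
origin = lambda do |env|
  hits += 1
  [200, { "Cache-Control" => "public, max-age=60" }, ["render #{hits}"]]
end

now   = 0.0
cache = TinyHeaderCache.new(origin, clock: -> { now })
env   = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" }

first  = cache.call(env)
second = cache.call(env)   # still fresh: served from the cache, origin untouched
now    = 61.0
third  = cache.call(env)   # past max-age: origin is asked again
puts hits
```

The origin knows nothing about the cache, which is exactly why the middleware could be swapped for a real gateway cache without touching the app.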
Dare Obasanjo is a machine.
Assaf Arkin: “There’s also some back-end processing going on, and I think that part is using DRb for now. But maybe the next update it will switch over to RMI or UNIX pipes or whatever. I don’t much care because the library does the talking, and besides, it’s only distributed in the sense that we have two pieces of code running with different PIDs. Not particularly important what’s happening on the wire, as long as it’s fast.”
Bill de hÓra knocks one out of the park: “I think sometimes that the problem people have with REST is that it’s so well-defined; it’s not witchcraft, it’s not a cargo cult. You can’t argue with it on a relativistic basis or apply clever rhetoric or continuously redefine what it means. An architectural style isn’t ‘good’ or ‘bad’ – you have to decide if it’s the right fit for your problem space and if not, you have to come up with a more appropriate one.”
Adam Wiggins on Sinatra’s blasphemous approach to controllers and routing. AKA: the thing that makes Sinatra my web layer of choice (well, that and throw :halt).
Still too much work but it’s nice to see some support for conditional GET making its way into the framework.
joshua schachter on Rabble/Kellan’s “Beyond REST?” presentation, with an interestingly simple HTTP-based callback system.
Great look at varnish and concerns around putting a front-end reverse proxy cache in place.
Nice ApacheCon EU ‘08 presentation (warning: video + slides, no transcript) covering various blue sky stuff on Roy’s brain for Apache and HTTP.
“… the ‘new reality’ is the realization that Dynamic Scripting Languages are ready for prime-time and that REST is a simple, yet scalable architecture to build a servers on.” – I’d say that’s definitely a new reality for the enterprise, Bill.
Brad Neuberg (Google Gears): “Our historical closeness to the web creates a kind of myopia, where we can’t see how amazing it is. It’s a billion Library of Alexandria’s dropped into our laps.”
Roy Fielding on the difference between architecture, architectural styles, patterns, implementations, and applications.
Stefan Tilkov addresses some of the most common doubts people have when they first deprogram and come up to speed on REST. Short and well done, IMO. I think I’ll be handing this out quite a bit in the future.
I need to give jQuery a serious look. Prototype’s Ajax.Request stuff is crippled (no PUT or DELETE) to the point of being worthless; the jQuery selector magic looks a lot more intriguing than what you get with Prototype, too.
“The ngx_http_empty_gif_module keeps a 1x1 transparent GIF in memory that can be served very quickly.” — That’s so amazingly awesome; spacer.gif for life.
I repackaged mongrel_proctitle as a GemPlugin so that all mongrels use it automatically. This is the first chance I’ve had to play with GitHub, too. Lovin' it.
Constantly updates the process title ($0) with something like: “mongrel_rails [10010/2/358]: handling 127.0.0.1: HEAD /feed/calendar/global/91/6de4”. Lets you monitor backends with ps and top.
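The mechanism is just an assignment to `$0`. A minimal sketch follows; the `proc_title` helper is made up, and reading the three bracketed numbers as port, queued requests, and requests handled is my assumption rather than something the plugin's docs are quoted on here.

```ruby
# Build a status string for the process title.
# The meaning of the three bracketed fields is an assumption for this sketch.
def proc_title(base, port, queued, handled, current = nil)
  title = "#{base} [#{port}/#{queued}/#{handled}]"
  current ? "#{title}: handling #{current}" : title
end

title = proc_title("mongrel_rails", 10010, 2, 358,
                   "127.0.0.1: HEAD /feed/calendar/global/91/6de4")

# Assigning to $0 changes what ps and top display for this process.
$0 = title
puts title
```

Update the title at the start and end of each request and you get a live view of every backend from plain `ps`, no extra monitoring endpoint required.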
An epiphany everyone needs to experience.
Peter Cooper scratches the deployment problem itch.
“… even if you have a single server, a proxy in front can help performance significantly. Through the simple expedient of buffering, your heavyweight processes don’t waste time serving every request for the entire length of time the client is connected”
“I have spent many years working on the FreeBSD kernel, and only rarely did I venture into userland programming, but when I had occation to do so, I invariably found that people programmed like it was still 1975.”
Bob Ippolito wrote up some pros and cons to reverse proxy implementations in different servers a few months back. I don’t think much of it is out of date at this point but nginx isn’t represented.
That’s much nicer. Amazon should adopt it immediately.
“… if all you can think of is reasons why the web is stupid and awkward, and you think it’s some giant step backward (from what?), then you haven’t thought very deeply about what’s happened in the world of technology and why.”
“… Rails has picked a side in the SOAP vs REST debate. Unless you absolutely have to use SOAP for integration purposes, we strongly discourage you from doing so. As a naturally extension of that, we’ve pulled ActionWebService from the default bundle.”
Stefan Tilkov with a poster-size illustration of HTTP client errors (4xx series only).
How long has this been floating around? Roy Fielding on building the web… (via Aristotle Pagaltzis on rest-discuss)
RESOLVED FIXED
“… on Java, too many web frameworks – think JSF, or Struts 1.x – consider the Web something you work around using software patterns. The goal is get off the web, and back into middleware…”
“413 Requested Entity Too Large”
I saw this same note on rest-discuss the other day and thought it struck a chord. :) Jon Hanna on SOAP, Web 2.0, other stuff…
A site for sore eyes :)
Bingo!
“How I explained REST to my wife” in French!
How did we ever get anything done without superfluous quadrants and models? Bring ‘em on. The trick is making something every developer would know is a joke but that could make it past a manager or architect.
exec 3<> /dev/tcp/$HOST/80 What?! How cool is that.
Nice activity diagram describing the resolution of response status codes given various request methods and headers. Full res GIF, JPEG, PNG, and SVG.
Wow. Much worse than I thought.
“All you have to do is change the internal processing, add 200 more methods to the HTTP parser, serve Bittorrent over Ethernet, and have it save Korean orphans while eating a Mango in the back seat of an El Camino driven by twenty midget clowns.”
Sam with a very simple, step by step tutorial on using your site as an OpenID identity provider.
“Should machine-to-machine, multi-hop, RESTful communications expose a need for additional functionality, then, and only then, will the need be addressed. This is opposed to the WS style of standards creation where solutions are created that go in search
“Each resource demarcates a subset of an application’s state, and becomes a handle by which other applications can interact with that state.”
Aristotle just destroys that recent reg article that suggests we need to shit-can 20 years of engineering masterpiece for distributed objects. Nice piece!
“Why would my sister want to borrow someone else’s broom, you sexist ass? My sister is a lawyer for the friggin' ACLU! before tossing her Napa Valley cab in the poor guy’s face.”
Great read…
Tim Berners-Lee’s blog. Finally!
Yep :)
You’ll have to excuse my ego linking but having Udell point to you is like having Carson ask you onto the Tonight Show.