This will be my first talk at a major conference.
An illustrated re-introduction to HTTP caching with a focus on gateway caches and their potential benefits within the context of modern, dynamic web applications.
Real HTTP caching for Ruby web apps.
We should be doing more of this:
If you’re building a modern website then you’ll be needing some JavaScript libraries and CSS.
Rather than hosting these common libraries on your own server, you should use a Content Delivery Network. Lucky for you, Google, Microsoft and Yahoo host a range of popular JavaScript and CSS libraries which you can link to directly for free. This saves your bandwidth and speeds up your website’s load time.
The great thing about the shared CDN approach is that the resources are cached once and reused across all sites, often without even making a validation request.
I’d seen the JavaScript libraries before but I’d never considered using this approach with CSS. The YUI CSS reset is a perfect example of where a shared CDN provides the most benefit. If every site that employed the basic CSS reset used this URL, it would effectively be baked into the browser with no overhead after the first request.
Warning: PDF. This is probably the best high-level, everything-about-HTTP-caching-in-one-place resource on the web at this point. Good stuff. I’m kicking myself for not being a part of his track at JAOO now.
It’s a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions. A better mental model of bandwidth should include:
- packets-per-second
- packet latency
- upstream vs downstream
Densely informational piece. Don’t miss the part where they generate packet loss using a microwave and a cup of tea :)
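To make the latency and upstream/downstream points concrete, here is a back-of-the-envelope sketch. It is not from the linked piece: the function name, the default of two round trips, and the link speeds are all assumptions, and it ignores packets-per-second limits and packet loss entirely.

```python
def estimate_fetch_seconds(request_bytes, response_bytes, rtt_s,
                           up_bytes_per_s, down_bytes_per_s, round_trips=2):
    """Rough fetch time: latency cost plus transfer time in each direction.

    round_trips=2 is an assumed cost (one for connection setup, one for the
    request/response exchange); real connections vary.
    """
    latency = round_trips * rtt_s                  # waiting on packets in flight
    upload = request_bytes / up_bytes_per_s        # request uses the upstream link
    download = response_bytes / down_bytes_per_s   # response uses the downstream link
    return latency + upload + download


# Hypothetical numbers: 1 KB request, 50 KB response, 100 ms round trip,
# 64 KB/s upstream, 1 MB/s downstream.
seconds = estimate_fetch_seconds(1_000, 50_000, 0.1, 64_000, 1_000_000)
print(f"{seconds:.3f}s total, of which latency is {2 * 0.1:.3f}s")
```

With numbers like these the round trips cost more than moving the bytes, which is why a cache hit that avoids the request altogether matters more than a fatter pipe.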
This is how I am using Rack::Cache, Sinatra, and CouchDB … Sweet ASCII diagram there. I’ve seen this ETag chaining technique twice just this week. The other one is gemcutter. They store gems in S3 and pass the S3-provided ETag along in their responses, so it’s like the web app is more of an intermediary sometimes. Weird and cool and interesting.
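A minimal sketch of the ETag chaining idea, assuming the app can learn the backing store's ETag cheaply before fetching the body. The function name and arguments are hypothetical, not taken from either project, and it compares a single `If-None-Match` value where a real header may carry a list of tags or `*`.

```python
def relay_with_upstream_etag(upstream_etag, if_none_match, fetch_body):
    """Answer a conditional GET using the ETag the upstream store already provides.

    upstream_etag: the validator reported by the backing store (assumed cheap to get).
    if_none_match: the client's If-None-Match value, or None.
    fetch_body:    zero-argument callable that loads the full body.
    """
    headers = {"ETag": upstream_etag}
    if if_none_match == upstream_etag:
        # The client's copy is current, so the body is never fetched.
        return 304, headers, b""
    return 200, headers, fetch_body()
```

The app never computes a validator of its own; it only relays the one the store hands it, which is what makes it feel like an intermediary.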
Interesting W3C Note from January 2003 that I don’t remember ever seeing:
HTTP and URIs are the basis of the World Wide Web, yet they are often misunderstood, and their implementations and uses are sometimes incomplete or incorrect. This document tries to improve this situation by providing a set of good practices to improve implementations of HTTP and related standards (Web servers, server-side Web engines), as well as their use.
The information here is relevant to people who build web apps, not HTTP server implementors — the title is a bit misleading (not actually but practically). I especially like this bit about why short, less meaningful URLs are better than verbose, descriptive URLs. Shortness has become the most important characteristic of URL design in most apps I’ve built recently; SEO be damned.
Yahoo!’s proposal to open source their “fast, scalable and extensible HTTP/1.1 compliant caching proxy server” as an Apache project:
Traffic Server fills a need for a fast, extensible and scalable HTTP proxy and caching. We have a production proven piece of software that can deliver HTTP traffic at high rates, and can scale well on modern SMP hardware. We have benchmarked Traffic Server to handle in excess of 35,000 RPS on a single box. Traffic Server has a rich feature set, implementing most of HTTP/1.1 to the RFC specifications.
Rad. I know Yahoo! runs a custom build of Squid as well so I’m curious to understand where this thing came from. The proposal states that it was originally acquired from Inktomi and has been in use for some time.
mnot on how to evaluate different proxy cache options for your needs.
Another classic on latency vs. throughput. This one gets into the limitations of the speed of light fairly quickly :)
Here are the slides from my RailsConf 2009 presentation on HTTP caching. I doubt the general info will make much sense without me talking over it but the diagrams should be fairly useful.
Get it while it’s hot.
John Adams posted a bunch of details of the Varnish configuration they use in front of search.twitter.com to the Varnish mailing list. Great stuff and nice to see the Twitter devs continuing to share their experiences with the community.
I haven’t actually had a chance to watch this yet but I’m sure it’s great if it builds on the talk Gregg gave at acts_as_conference 2009. Also, I love this slide: “Reverse Proxy Caches – WTF?” :)
This is one of the amazing benefits of having an insanely simple but well-defined SPEC (Rack) around the edges of your library. It makes it trivial to hook things up in new and interesting ways.
Nice overview of caching from 1000 feet. Lays down some useful terminology, like “Cache Hit”, “Cache Miss”, “Storage Cost”, “Retrieval Cost”, “Invalidation”, “Replacement Policy”, etc.
I’ve annotated RFC 2616 Section 13 with details on where Rack::Cache is and isn’t compliant. Anything not highlighted should work as described in the RFC. I think I’ll be using SharedCopy more in the future.
Nick Kallen has started a project to implement an HTTP cache in Scala. Seems like an excellent idea given Java’s extensive collection of stable HTTP server libraries and Scala’s strengths in concurrency and performance.
Mailing list for Rack::Cache users and hackers. Come on in, the water’s warm.
It’s really starting to come together, isn’t it?
Bad-ass ActiveRecord extension that does read-through and write-through caching to memcached in a way that’s fairly transparent. This is one of the strategies the Twitter folks put in place recently to improve their response time and availability.
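For anyone who hasn't met the terms, here is a rough sketch of read-through and write-through caching. This is not the extension's API: the class and method names are made up, and plain dicts stand in for both memcached and the database.

```python
class ReadWriteThroughStore:
    """Toy store: reads fill the cache on a miss, writes update both layers."""

    def __init__(self, database, cache):
        self.database = database  # stand-in for the real database
        self.cache = cache        # stand-in for memcached

    def find(self, key):
        # Read-through: serve from the cache, falling back to the database
        # and remembering the result for next time.
        if key in self.cache:
            return self.cache[key]
        record = self.database[key]  # raises KeyError if the record is missing
        self.cache[key] = record
        return record

    def save(self, key, record):
        # Write-through: every write goes to the database and the cache,
        # so the cache never needs a separate expiry step for this key.
        self.database[key] = record
        self.cache[key] = record


store = ReadWriteThroughStore(database={1: "alice"}, cache={})
store.find(1)          # miss: loads from the database and caches it
store.save(1, "bob")   # updates the database and the cache together
```

The catch, visible even in the toy version, is that anything writing to the database behind the store's back leaves the cache stale.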
Nice look at caching idioms in Django and why you need to generate HTTP cache validators up-front and efficiently.
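One way to read "up-front and efficiently" is to derive validators from cheap metadata instead of from the rendered page. The sketch below is my own illustration, not code from the linked article; `validators_for` is a hypothetical helper, and it assumes `updated_at` is a timezone-aware UTC datetime that changes whenever the output would.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def validators_for(record_id, updated_at):
    """Build ETag and Last-Modified from metadata, without rendering anything."""
    token = f"{record_id}:{updated_at.isoformat()}".encode("utf-8")
    etag = '"' + hashlib.sha256(token).hexdigest()[:16] + '"'
    return {
        "ETag": etag,
        "Last-Modified": format_datetime(updated_at, usegmt=True),
    }


headers = validators_for(42, datetime(2009, 5, 4, 12, 0, 0, tzinfo=timezone.utc))
```

Compute these first, compare them against the request's conditional headers, and only pay for rendering when they don't match.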
An Nginx module that acts as a gateway cache. I haven’t tried it yet but it’s a really good idea.
Sebastien Auvray covers Rack::Cache at InfoQ. Thanks!
Interesting approach to setting cache-related headers using a Rack middleware component.
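The linked component is Rack; the same shape in Python's WSGI looks roughly like this. It's a sketch under my own assumptions: the class name and the default header value are invented, and it only adds a `Cache-Control` header when the app hasn't set one.

```python
class DefaultCacheControl:
    """WSGI middleware that adds a Cache-Control header if the app set none."""

    def __init__(self, app, value="public, max-age=60"):
        self.app = app
        self.value = value  # assumed default policy; pick your own

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Header names are case-insensitive, so compare in lowercase.
            has_policy = any(name.lower() == "cache-control" for name, _ in headers)
            if not has_policy:
                headers = list(headers) + [("Cache-Control", self.value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = DefaultCacheControl(hello_app)
```

Because the policy lives in the wrapper, the app itself stays free of caching concerns and an explicit header from the app still wins.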
Ryan King nails it.
So, I got an email yesterday disagreeing with my remark about HTTP caching being wildly under-appreciated in the Ruby web community. I felt bad, a little. Then I read this article (posted the day after my remark), which talks about Scribd moving to a Squid reverse proxy setup to front their Rails deployments:
“But there was a problem – no one uses caching proxies in 2008 :–) So, we’ve got an idea – why can’t we place such a server in front of our application and make it cache content for all users in the world?”
The fact that Scribd had to “have this idea” on their own and had not previously been exposed to a ton of literature/tools on reverse proxy / gateway caching is completely fucking unacceptable. I’m back to agreeing with myself.
Much nicer, IMO. I’m interested to see if someone can get Rails + Rack::Cache working together so that you can maximize the benefits of generating these validators.
“Varnish implementes a subset of the ESI Language 1.0 defined by W3C, this document lays out some of the thoughts and rationale for choices made and advice for usage of these features.”
This lets you perform includes at the cache layer so that each included resource can have its own caching policy. Akamai edge proxies have supported this for some time, apparently.
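A toy version of the include step, to show the idea rather than how Varnish does it. It only understands self-closing `<esi:include src="..."/>` tags, and `fetch_fragment` is a hypothetical callable that returns the markup for a given URL.

```python
import re

# Matches self-closing include tags such as <esi:include src="/nav"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fetch_fragment):
    """Replace each include tag with the fragment returned by fetch_fragment(src)."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), page)


fragments = {"/nav": "<nav>menu</nav>", "/news": "<p>latest</p>"}
page = '<h1>Hi</h1><esi:include src="/nav"/><esi:include src="/news"/>'
html = assemble(page, fragments.__getitem__)
```

Since every fragment is fetched as its own resource, `/nav` could be cached for a day while `/news` is cached for a minute, and the outer page could carry a third policy.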
Looks like a really solid improvement on 1.0. I haven’t had a chance to play with any of the betas but I’m anxious to see whether If-Modified-Since/If-None-Match validation made it in. There’s a note on “serving expired objects until we have a fresh one” but that sounds more like stale-while-revalidate.
Lots of good stuff coming in Varnish 2.0. GC, regexp-based purge, custom hash funcs, backend load balancing based on health or other metrics, and the thing I’m personally most interested in: what looks like support for validation-based caching.
All frameworks should approach caching the way Django does. The core app/origin framework does no real caching but provides utility/helper methods for setting standard RFC 2616 cache-related headers on the response easily and correctly. A completely separate set of caching goo (“middleware”) sits in front of your app and performs the actual caching based purely on the headers set by the origin. The benefit to this approach is that caching is totally independent from the app framework and can be swapped out for a true gateway (“reverse proxy”) cache at any time.
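To illustrate the split, here is a heavily simplified cache that decides everything from the origin's `Cache-Control` header. It is not Django's middleware: the class is invented, `origin` is a hypothetical callable returning `(headers, body)`, and it ignores `Vary`, `Expires`, validators, and request methods.

```python
import re
import time

MAX_AGE = re.compile(r"max-age=(\d+)")


class HeaderDrivenCache:
    """Caches origin responses based only on the Cache-Control header they carry."""

    def __init__(self, origin, clock=time.monotonic):
        self.origin = origin  # origin(path) -> (headers dict, body)
        self.clock = clock
        self.store = {}       # path -> (expires_at, headers, body)

    def get(self, path):
        now = self.clock()
        hit = self.store.get(path)
        if hit is not None and now < hit[0]:
            return hit[1], hit[2]  # still fresh: the origin is not consulted

        headers, body = self.origin(path)
        policy = headers.get("Cache-Control", "")
        max_age = MAX_AGE.search(policy)
        cacheable = max_age and "private" not in policy and "no-store" not in policy
        if cacheable:
            self.store[path] = (now + int(max_age.group(1)), headers, body)
        return headers, body
```

The origin here knows nothing about the cache, so the same headers would drive Varnish or Squid just as well, which is the whole point of the design.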
Still too much work but it’s nice to see some support for conditional GET making its way into the framework.
Great look at Varnish and concerns around putting a front-end reverse proxy cache in place.
Interesting. I’ve been using the jquery-1.2.3.js hosted on Google Code for a few months now. Maybe I should have read the TOS…
Superbly explained and with extremely useful circly diagrams. Bravo.