[Ed: My apologies if the following seems incoherent. I accidentally published the current draft half-written.]
DreamHost had something of a hissy fit because they can’t figure out a solid model for deploying simple Rails apps quickly and reliably with minimal configuration (i.e., like PHP). That’s because no such solution exists. There are many deployment configurations and every one of them has trade-offs in simplicity, correctness, reliability, performance, and widespread support. The simple solutions either perform horribly (CGI) or are extremely unreliable (web server managed FastCGI).
DreamHost has over 10 years of experience running applications in most of the most popular web programming frameworks and Rails has and continues to be one of the most frustrating.
I’m not sure why DreamHost chose to nail Rails specifically. This problem is not new and is definitely not limited to Rails. These same issues plagued the Python web app community for the years I was active there as well.
The B-List does a good job of explaining the history and current situation from a Python perspective:
The fact that the deployment model for the new frameworks centers around long-running processes — rather than launching a clean copy of the application on each request — is one that’s hard on hosting providers; unlike traditional CGI or PHP setups, a framework like Rails or Django or TurboGears simply can’t be run on a “launch a new copy every time” basis; the overhead of loading and initializing the framework makes it unusable in that situation. Having a persistent process which loads the code once and keeps it in memory is the only way this can work.
That’s the root of the problem but I fear we’ve only made things worse over the years with failed attempts at solving it. FastCGI and then SCGI tackled the performance issue but neither solution simplified configuration. Managed FastCGI should have eased configuration but instead creates a bunch of additional reliability issues (orphaned processes, app processes running with web server privileges, etc).
If you want to solve this problem right, I’d say there are really two problems you need to address:
Stop proliferation of web / application gateway protocols (SCGI, FastCGI, HTTP).
Improve external application process management.
I’d personally like to see FastCGI and SCGI dumped as deployment options and for everyone to move hard on reverse proxy configurations. Most web servers have solid reverse proxy support (lighttpd 1.4 being the notable exception). Using HTTP all the way down has a nice architectural consistency, opens up additional options for caching between web and app, and performs well. If the world moved to this kind of layered HTTP approach, it would open up a lot of possibilities for simplifying configuration and maybe even application discovery.
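As a rough illustration of "HTTP all the way down", the application process below speaks plain HTTP on a loopback port using only Python's standard library, so any front-end web server with reverse proxy support could forward requests to it. The port number 8001 and the names `app` and `make_backend` are assumptions made for this sketch, not part of any existing deployment.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A tiny WSGI application standing in for a real framework app."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from the app process, path=%s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def make_backend(host="127.0.0.1", port=8001):
    """Build (but do not start) an HTTP server for the app.

    Binding to loopback keeps the backend reachable only through the
    front-end proxy on the same machine. Port 8001 is an arbitrary choice.
    """
    return make_server(host, port, app)

# To run the backend: make_backend().serve_forever()
```

The front-end server would then be configured to proxy the relevant URL space to `http://127.0.0.1:8001/`; the exact directive depends on the web server in use.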
Managing application processes could be simplified a great deal as well. There’s currently quite a bit of fiddling and planning required to get ports allocated properly and to make sure application processes stay up. The whole problem area is begging for innovation.
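To make the fiddling concrete, here is a minimal sketch of the two chores mentioned above: picking a free port and keeping an application process up. The function names and the restart limit are hypothetical, and a real process manager would also need logging, backoff, and resource limits.

```python
import socket
import subprocess
import sys


def allocate_port(host="127.0.0.1"):
    """Ask the OS for a free TCP port by binding to port 0.

    The port is released before returning, so another process could
    claim it first; a real manager would have to handle that race.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def run_with_restarts(argv, max_restarts=3):
    """Run a child process, restarting it each time it exits non-zero.

    Returns the number of times the process was started.
    """
    starts = 0
    while starts <= max_restarts:
        starts += 1
        if subprocess.call(argv) == 0:
            break  # clean exit, do not restart
    return starts
```

For example, `run_with_restarts([sys.executable, "app.py", str(allocate_port())])` would start a hypothetical `app.py` on a free port and bring it back up a few times if it crashes.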
Comments
One thing that’s a problem with HTTP is a lack of good request metadata. There’s X-Forwarded-For, which is the one consistently supported bit of metadata, but also the most useless (who really cares about the remote IP address?). There’s X-Forwarded-Server which is a bit of a help, but not added consistently. There’s no good convention for the base path of an application (X-Forwarded-Path?), or the scheme (X-Forwarded-Scheme?). Sending trusted information is also tricky (X-Remote-User?) as it’s all too easy to have an insecure deployment where you let information through that you think is trusted. FastCGI and SCGI basically solve all these problems by sending a richer request object than just HTTP. These conventions could be added to HTTP, though trusted information is particularly tricky.
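The header conventions floated above could be consumed on the application side with a small piece of WSGI middleware. This is only a sketch: `X-Forwarded-Scheme` and `X-Forwarded-Path` are the hypothetical names suggested in this comment, not established standards, and the `trusted` flag is an assumed stand-in for however a deployment decides that the proxy can be believed.

```python
class ForwardedMetadata:
    """WSGI middleware that applies proxy-supplied metadata to environ."""

    def __init__(self, app, trusted=False):
        self.app = app
        # Only honour the headers when the deployment says the proxy is
        # trusted; otherwise any client could forge them.
        self.trusted = trusted

    def __call__(self, environ, start_response):
        if self.trusted:
            scheme = environ.get("HTTP_X_FORWARDED_SCHEME")
            if scheme in ("http", "https"):
                environ["wsgi.url_scheme"] = scheme

            base = environ.get("HTTP_X_FORWARDED_PATH", "").rstrip("/")
            if base:
                environ["SCRIPT_NAME"] = base
                path = environ.get("PATH_INFO", "")
                # Strip the base path only on a whole-segment match.
                if path == base or path.startswith(base + "/"):
                    environ["PATH_INFO"] = path[len(base):]
        return self.app(environ, start_response)
```

Leaving `trusted` off by default reflects the concern about insecure deployments: the headers are ignored unless someone explicitly opts in.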
Port allocation is of course another problem. The easy solution is named sockets. Most HTTP servers don’t really consider serving HTTP on named sockets, but there’s no reason they couldn’t do this. But again, this is already normal for FastCGI and SCGI.
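Serving HTTP on a named socket takes little code with Python's standard library, as the sketch below suggests. It assumes a Unix-like platform (where `socketserver.UnixStreamServer` is available); the handler, helper names, and socket path are made up for illustration.

```python
import http.server
import os
import socket
import socketserver
import tempfile
import threading


class SocketFileHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello over a named socket\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # The default logger expects a (host, port) client address,
        # which a Unix socket peer does not have, so logging is skipped.
        pass


def temp_socket_path():
    """Return a fresh filesystem path to use for the socket."""
    return os.path.join(tempfile.mkdtemp(), "app.sock")


def serve_on_socket(path):
    """Start an HTTP server on a Unix domain socket in a background thread."""
    server = socketserver.UnixStreamServer(path, SocketFileHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(path, target="/"):
    """Send one HTTP/1.0 GET over the socket and return the raw response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(("GET %s HTTP/1.0\r\n\r\n" % target).encode("ascii"))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

With a path instead of a port there is nothing to allocate, and ordinary file permissions on the socket can control which processes may connect.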
The last problem is process management. FastCGI tries to do this, but seems to do it rather poorly. Lots of other systems do it properly. I think FastCGI is too complex to reform, but doing what FastCGI is supposed to do isn’t incredibly difficult (or, at least, other systems have done the equivalent successfully).
I feel like the whole thing isn’t begging for innovation as much as a champion. Just a couple people could take this problem on and solve it, I think. But there’s no money to be made, I think, only fame; and, as Zed has pointed out, this is not a money-making sort of fame ;)
— Ian Bicking on Thursday, January 10, 2008 at 05:15 AM #
Will mod_python and mod_wsgi help?
— john saponara on Thursday, January 10, 2008 at 08:50 AM #
Using HTTP proxying is not a solution that will work for large scale web hosting outfits. Using such an approach means they have to build a separate infrastructure for controlling and managing a user’s backend processes. There is no way they will let a user manage this themselves as then they will have no easy ability to stop users going rampant and running too many processes or doing other stupid things. What web hosting companies want in this space is control and lots of it. They also want it to be simple and preferably managed through their Apache installations.
Using mod_python as suggested by John is also not an option because it isn’t practical for various reasons which I have outlined previously in http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html and http://blog.dscpl.com.au/2007/07/commodity-shared-hosting-and-modwsgi.html.
What is effectively needed is a better implementation of an Apache module with FASTCGI-like qualities but which is tailored to Python and WSGI in particular. Being customised for WSGI means that deploying applications should be a matter of simply dropping in the appropriate WSGI script file which bridges to the actual application.
Coming up with a better way of deploying Python WSGI applications using Apache is exactly what I have been trying to push for a year now through the mod_wsgi project. I thus really quite understand Ian’s comment of ‘I feel like the whole thing isn’t begging for innovation as much as a champion’. This is actually representative of the uphill battle I have been facing even getting the main figures in the Python web community to even accept mod_wsgi as a viable option. I really don’t understand why there is so much ambivalence and/or push back over it.
One would think that trying to come up with a better option would be greeted with open arms, but in practice it isn’t. It is getting so ridiculous that there are even some individuals going around saying mod_wsgi is evil and taking personal pot shots at my integrity. This sort of smear campaign is not what the Python community needs if we want to come up with a solution to this problem. :–(
— Graham Dumpleton on Thursday, January 10, 2008 at 10:53 AM #
Whoops. That should have read “I thus really don’t quite understand Ian’s comment”. This new Mac keyboard ate my input. ;–)
— Graham Dumpleton on Thursday, January 10, 2008 at 01:29 PM #
Graham wrote:
Well, the whole point of my post was that we need a standard process management tool that’s built around HTTP proxying instead of non-standard web –> app protocols like FastCGI or SCGI. Your assumption that users would manage these processes is wrong – there could be a single daemon running on the machine that was capable of bringing external app processes up with whatever configuration the shared hosting provider deemed necessary.
FastCGI is basically a non-standard HTTP proxy so I’m not sure how using an actual HTTP proxy would change anything. The same goes for SCGI, and I’m assuming mod_wsgi is going down the same wrong path and recreating HTTP yet again between web and app.
This seems like a perfect example of how people tend to misattribute the real problems in this space. The real issue is configuring, managing, and controlling external app processes and then letting the web server know about them but everyone seems to concentrate on the communications protocol used between web and app. We already have a perfectly good inter-process communication protocol for transmitting web requests – HTTP.
— Ryan Tomayko on Thursday, January 10, 2008 at 08:53 PM #
Ryan, what is your response to Ian’s concern about request metadata?
— Martin on Thursday, January 10, 2008 at 09:54 PM #
Martin: Ian’s concerns are clearly legitimate issues but they don’t strike me as anything that couldn’t be overcome. I felt Ian addressed this:
I mean, we’re basically talking about figuring out what the headers ought to be named, their syntax, and semantics. I’m planning on doing a bit more research on exactly what request metadata reverse proxies currently provide (or can be configured to provide) to downstream servers. That should help the discussion along a bit.
But HTTP has facilities for extending the request with additional metadata so it seems like a no-brainer to just use HTTP as a base and extend it as necessary instead of rolling a completely new protocol.
— Ryan Tomayko on Friday, January 11, 2008 at 12:43 AM #
The whole point of mod_wsgi is to simplify the configuration and management.
What protocol is used to communicate between the Apache child process and the daemon process in mod_wsgi is totally irrelevant as mod_wsgi provides the code for both sides of what is really an internal private communications channel. This is because the daemon process in mod_wsgi is just a fork of the Apache parent process and is not a fork and then an exec of some external application. Thus it doesn’t strictly need to use a public protocol as no one ever has to write daemon side code which talks that protocol. So, whether mod_wsgi uses HTTP, FASTCGI, SCGI or something else is totally irrelevant. It can therefore use what is the most efficient mechanism which is able to correctly transfer across the information required without having to try and force something else into doing the job which isn’t designed for the purpose.
It seems though I have a lot of trouble getting people to understand this, probably because they just don’t understand that how mod_wsgi works is different to how traditional FASTCGI, SCGI, AJP and proxy solutions work. In mod_wsgi it isn’t doing an exec of some separate process and there is no need for special daemon side toolkits like flup to talk the protocol, or even user side code which talks HTTP. It is a complete solution for WSGI in one package, the protocol it uses internally is totally irrelevant as at that point it is a black box and no one needs to know how it works. I could use HTTP as the protocol and it would still not affect one bit what the user has to do for their code as the entry point is a WSGI application object and nothing more complicated.
So maybe have a look at mod_wsgi rather than assuming that it is just like existing FASTCGI or SCGI modules for Apache, as it isn’t.
— Graham Dumpleton on Friday, January 11, 2008 at 10:20 AM #
Graham: I read up on mod_wsgi a bit and I think I have a better feel for the general argument. Unifying the functionality of mod_python and FastCGI/SCGI into a single module makes sense to me. It means that hosting providers can deal with a single module for WSGI based apps. That’s cool. I get it. I also see how this is a nice configuration win since you just point apache to the WSGI app and things Just Work. That’s great too.
I disagree with some of your assertions here, though, and this may help explain some of the push back you’ve been receiving.
I disagree. This protocol and the code that implements it is extremely relevant. It has to be rock solid, well used, and thoroughly tested.
You’re assuming the only value of using HTTP is that it’s a “public protocol” that people can “write to”. Another benefit of using HTTP is that you can leverage all of the existing client and server code that’s evolved and stabilized over the past fifteen years. You could potentially benefit from HTTP’s support for intermediaries for caching, security, etc.
Glyph Lefkowitz goes further on this topic as well:
All in all, I can see how mod_wsgi might be a better solution than mod_python / FastCGI/SCGI (especially in shared hosting environments) once it becomes stable. However, I’d much prefer a reverse proxy based solution that worked with different language environments and web servers. mod_wsgi is a good solution for Apache, Python, and WSGI setups but I think the issue could be generalized further. I just think the right way to solve this problem on wide scale is to start with the reverse proxy support built into existing web servers, the existing HTTP server libraries built into Python/Ruby/Perl/Java/etc, and figure out how to make configuration a bit easier.
Lastly, a bit of constructive criticism: try to be a little less emotional about this stuff. Your comments here and on this post are a big turn off, IMO. One gets the feeling that you’re not interested in other perspectives or that you feel that others are too stupid to understand your great project. That’s a recipe for getting your project ignored regardless of the technical benefits.
— Ryan Tomayko on Saturday, January 12, 2008 at 03:28 AM #
The emotion you see is just my passion for trying to come up with a solution for this problem. :–)
As to the protocol, I had no intention of implementing something complicated, or of reinventing code. As such, it is pretty well the same as SCGI with the exception that the WSGI environment is packaged up in an even simpler format than SCGI. In SCGI, because it was a public protocol that had to potentially work across architectures, they had to encode the CGI environment back into a parsable text format. Because in mod_wsgi the communication is only within the same host, it is as simple as a binary byte value giving the number of key/value pairs in the WSGI environment, followed by the key/value strings, where each string is preceded in turn by a binary byte value giving its length. Simple, efficient, hard to get wrong and easy to unpack.
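A count-then-length-prefixed layout of this kind can be sketched in a few lines of Python. This is an illustration of the general technique, not mod_wsgi's actual wire format: the width of the binary values is not stated above, so a 4-byte unsigned integer in native byte order is assumed here, and the function names are invented for the example.

```python
import struct

# Assumed width: 4-byte unsigned integer, native byte order (same host).
_UINT = struct.Struct("=I")


def pack_environ(environ):
    """Encode a dict of strings as: count, then length-prefixed strings."""
    parts = [_UINT.pack(len(environ))]
    for key, value in environ.items():
        for text in (key, value):
            raw = text.encode("latin-1")
            parts.append(_UINT.pack(len(raw)))
            parts.append(raw)
    return b"".join(parts)


def unpack_environ(data):
    """Decode pack_environ output; return (environ, remaining bytes)."""
    offset = 0

    def read_uint():
        nonlocal offset
        (number,) = _UINT.unpack_from(data, offset)
        offset += _UINT.size
        return number

    def read_string():
        nonlocal offset
        length = read_uint()
        raw = data[offset:offset + length]
        offset += length
        return raw.decode("latin-1")

    environ = {}
    for _ in range(read_uint()):
        key = read_string()
        environ[key] = read_string()
    # Whatever follows the environment is left for the caller.
    return environ, data[offset:]
```

Because every string carries its own length, the decoder never has to scan for delimiters or unescape anything, which is where the "hard to get wrong" property comes from.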
After the headers goes the request content. As for the response back from the daemon process, it is a standard HTTP response exactly like a normal CGI script would generate. As a result, the existing Apache code for decoding CGI responses is used and so I am using tried and tested code. On the daemon side, once having unpacked the WSGI environment, it uses existing Apache code to construct an internal Apache request object associated with Apache data structures for setting up the internal input and output filter chains that Apache uses. Because the Apache code is used to set this up and it looks the same as it would if it were executing within the main Apache child process, the exact same code is then able to be used to dispatch to the WSGI application as if it were running in embedded mode. Because the standard Apache HTTP output filter is used in the output filter chain, any response should be conformant to the standards.
Back in the Apache child process side of the connection, the response is fed back straight into the existing output filter chain for the original request. This means that it passes through all the registered Apache output filters, including any caching, compression etc. If you were really sadistic, you could even configure Apache to send the response through its server side include engine.
So, wherever possible existing Apache code is used to do everything, especially where stuff relates to standards. The last thing I want to be doing is running around trying to work out if something adheres to the standards, so I leave that all up to Apache and its existing code which has been in use for years and is well tested.
— Graham Dumpleton on Saturday, January 12, 2008 at 09:42 AM #
Ryan,
regarding the commentary in your bookmark of Bob Ippolito’s Reverse Proxy Roundup, there’s another one missing from his line-up: Varnish. (See the Notes from the Architect for what makes it especially interesting.)
— Aristotle Pagaltzis on Saturday, January 12, 2008 at 09:00 PM #
A. Pagaltzis: duly bookmarked :) That’s some of the best shit I’ve read in a long time.
I remember seeing Varnish announced and it definitely looked interesting but I haven’t had a chance to look into it.
— Ryan Tomayko on Saturday, January 12, 2008 at 11:42 PM #