IBM says LAMP users need to grow up.

Let’s do it:

According to Daniel Sabbah, general manager of IBM’s Rational division, LAMP — the popular Web development stack — works well for basic applications but lacks the ability to scale.

Nope. We call bullshit. After wasting years of our lives trying to implement physical three tier architectures that “scale” and failing miserably time after time, we’re going with something that actually works.

If you look at the history of LAMP development, they’re really primitive tools … the so-called good enough model. The type of businesses being created around those particular business models are essentially going to have to grow up at some point.

No. The LAMP stack is a properly constructed piece of software. Features are added when an actual person has an actual need that arises in the actual field, not when some group of highly qualified architecture astronauts and marketing splash-seekers get together to compete for who can come up with the most grown-up piece of useless new crap to throw in the product.

The LAMP model works because it was built to work for and by people building real stuff. The big vendor / big tools model failed because it was built to work for Gartner, Forrester, and Upper Management whose idea of “work” turned out to be completely wrong.

Now you’re saying that the primitive yet successful LAMP model should adopt the traits of the sophisticated yet failing big vendor model.

I believe that in the same way that some of those simple solutions are good enough to start with, eventually, they are going to have to come up against scalability, Sabbah said during a press conference at the IBM Rational User Conference in Las Vegas.

We can’t scale? Really? Are you insane?

Alright, that last jab may have been a bit unfair. I think what Sabbah is really talking about is PHP. I can’t be sure but none of Yahoo!, Amazon, Ebay, or Google seem to be using PHP widely on their public sites. But then again, they aren’t using Websphere/J2EE, .NET, or other scalable physical three tier architectures either.

UPDATE: See comments for interesting notes on PHP usage at Yahoo!.

While we’re talking about architectures, I’d like to jump into a brief commentary on what’s really at the root of the debate here.

There were two widely accepted but competing general web systems architectures: the Physical Three Tier Architecture and the Logical Three Tier Architecture. IBM (and all the other big tool vendors) have been championing one of them and LAMP is a good framework for the other (although you’ll rarely hear anyone admit that LAMP provides an overall architecture).

The Physical Three Tier Architecture

Many large enterprise web applications tried really hard to implement a Physical Three Tier Architecture, or they did in the beginning. The idea is that you have a physical presentation tier (usually JSP, ASP, or some other *SP) that talks to a physical app tier via some form of remote method invocation (usually EJB/RMI, CORBA, DCOM) that talks to a physical database tier (usually Oracle, DB2, MS-SQL Server). The proposed benefit of this approach is that you can scale out (i.e. add more boxes) to any of the physical tiers as needed.

Great, right? Well, no. It turns out this is a horrible, horrible, horrible way of building large applications and no one has ever actually implemented it successfully. If anyone has implemented it successfully, they immediately shat their pants when they realized how much surface area and how many moving parts they would then be keeping an eye on.

The main problem with this architecture is the physical app box in the middle. We call it the remote object circle of hell. This is where the tool vendors solve all kinds of interesting “what if” type problems using extremely sophisticated techniques, which introduce one thousand actual real world problems, which the tool vendors happily solve, which introduces one thousand more real problems, ad infinitum…

It’s hard to develop, deploy, test, maintain, evolve; it eats souls, kills kittens, and hates freedom and democracy.

Over the past two years, every enterprise developer on the planet has been scurrying to move away from this architecture. This can be witnessed most clearly in the Java community by observing the absolute failure of EJB and the rise of lightweight frameworks like Hibernate, Spring, Webwork, Struts, etc. This has been a bottom up movement by pissed off developers in retaliation to the crap that was pushed on them by the sophisticated tool vendors in the early century.

Which brings us nicely to an architecture that actually works sometimes and loves freedom.

The Logical Three Tier Architecture

More specifically, the Shared Nothing variant of the Logical Three Tier Architecture says that the simplest and best way to build large web based systems that scale (and this includes enterprise systems goddamit) is to first bring the presentation and app code together into a single physical tier. This avoids remote object hell because the presentation code and the business logic / domain code are close to each other.

But the most important aspect of this approach is that you want to push all state down to the database. Each request that comes into the presentation + app tier results in loading state for a set of objects from the database, operating on them, pushing their state back down into the database (if needed), writing the response, and then getting the hell out of there (i.e. releasing all references to objects loaded for this request, leaving them for gc).

That’s the rule.
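As a rough sketch of that rule, here is what a single request might look like in Python. Everything in it is made up for illustration: the `counters` table, the `open_db` and `handle_request` names, and the use of an in-memory SQLite database as a stand-in for the physical database tier. The point is only the shape: load, operate, store, respond, forget.

```python
import sqlite3


def open_db(path=":memory:"):
    """Stand-in for the database tier (hypothetical schema, for illustration only)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS counters "
        "(name TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    return db


def handle_request(db, name):
    """Handle one request without keeping any state in the app tier."""
    # 1. Load the state this request needs from the database.
    row = db.execute(
        "SELECT hits FROM counters WHERE name = ?", (name,)
    ).fetchone()
    hits = row[0] if row else 0

    # 2. Operate on it.
    hits += 1

    # 3. Push the changed state back down into the database.
    db.execute(
        "INSERT OR REPLACE INTO counters (name, hits) VALUES (?, ?)",
        (name, hits),
    )
    db.commit()

    # 4. Write the response. Every local goes out of scope on return,
    #    so nothing from this request lingers in the process.
    return f"{name} has been viewed {hits} times"
```

Because the handler keeps nothing between calls, any process with a connection to the same database can serve the next request, which is what lets you add boxes to the presentation + app tier without coordinating them.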

So the physical database tier and the physical presentation + app tier make up our logical three tier architecture but I’d like to talk about one other latch-on piece of this setup because it’s interesting to contrast it with how the Physical Three Tier purists deal with the same problem.

Fine Grained Caching

Some mechanism for caching becomes really important when you decide that you are spending too much money on hardware (note that both of these architectures will scale up and out, on each physical tier independently, for as far and wide as you can pay for hardware). Adding some form of caching reduces the amount of hardware needed dramatically because you’ve reduced utilization somewhere.

In the physical three tier architecture, there are generally a lot of sophisticated mechanisms for caching and sharing object state at a very granular level in the app tier to reduce utilization on the database and improve response time. This is cool and all but it increases utilization on the app tier dramatically because so much time is now spent managing this new state.

The introduction of state (even just a little state for caching objects) forces the app tier to take on a lot of the traits of the database. You have to worry about object consistency and be fairly aware of transactions. When that’s not fast enough what ends up happening is that more fine grained caching is added at the presentation tier to reduce round trips with the app tier.

Now you have three places that are maintaining pretty much the same state and that means you have three manageability problems. But this is, you know, cool because it’s really complex and sophisticated and the whiteboard looks interesting and lots of arm waving now.

Screw Fine Grained Caching

Shared Nothing says, screw that – the database is the only thing managing fine grained state because that’s its job, and then throws up caching HTTP proxy server(s) in a separate (and optional) top physical tier. Cached state is maintained on a much simpler, coarse grained level with relatively simple rules for invalidation and other issues.

When the Shared Nothing cache hits, it provides unmatched performance because the response is ready to go immediately without having to consult the lower tiers at all. When it misses, it misses worse than the fine grained approach because chances are good you’ll be going all the way to the database and back. But it turns out that it usually doesn’t matter. My experience says that you get as good or better performance with the coarse grained approach as you do with the fine grained approach for much less cost, although it’s hard to measure because the savings are distributed in very different ways.
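To make the coarse grained idea concrete, here is a toy sketch in Python. It is not how any real caching proxy is implemented; `CachingProxy`, `make_origin`, and the one invalidation rule (any write to a path drops the cached copy of that path) are assumptions chosen to keep the example small. The cache stores whole responses keyed by path and knows nothing about the objects behind them.

```python
class CachingProxy:
    """Toy coarse grained cache: whole responses, keyed by request path."""

    def __init__(self, origin):
        self.origin = origin  # callable(method, path, body) -> response text
        self.cache = {}       # path -> complete response
        self.hits = 0
        self.misses = 0

    def request(self, method, path, body=None):
        if method == "GET":
            if path in self.cache:
                # Hit: answer immediately, lower tiers are never consulted.
                self.hits += 1
                return self.cache[path]
            # Miss: go all the way down and back, then remember the result.
            self.misses += 1
            response = self.origin(method, path, body)
            self.cache[path] = response
            return response
        # Simple invalidation rule: any write drops the cached copy.
        self.cache.pop(path, None)
        return self.origin(method, path, body)


def make_origin():
    """Hypothetical stand-in for the app + database tiers behind the proxy."""
    pages = {}

    def origin(method, path, body=None):
        if method == "PUT":
            pages[path] = body
            return "stored"
        return pages.get(path, "not found")

    return origin
```

A quick run, say `proxy = CachingProxy(make_origin())` followed by a `PUT` and two `GET`s of the same path, shows the second `GET` served from the cache without touching the origin. A real deployment would lean on HTTP's own caching headers rather than a hand-rolled dictionary, but the bookkeeping stays about this simple.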

The Shared Nothing + Caching Proxy setup scales like mad and I don’t just mean that it scales to really massive user populations. It scales low too. It’s easy to work with when you’re developing and testing on a single machine. It’s easy to have a simple prototype app done in a day. It’s easy to teach someone enough that they can go play and figure stuff out as they go. It’s easy to write tests because the entire system is bounded by the request and there’s no weird magic going on in the background.

The big vendor / big tool architectures sacrificed simplicity and the ability to scale low because they decided that every application was going to have one million users and require five 9’s from the first day of development.

As I write this, Bill de hÓra postulates: All successful large systems were successful small systems. I believe him and what that means to us right now in this article is that it is exceedingly hard to build large systems with the big vendor / big tool approach because it is exceedingly hard to build small systems with the same.

Let’s get back to the woodshed

While Sabbah was critical of LAMP’s capabilities, he said IBM is going to ensure companies which started with that model will be able to “grow and change with the rest of the world”.

He believes most businesses want technology that is stable, evolutionary, historical and has support.

L A M P = (S)table (E)volutionary (H)istorical (S)upport

“What we are trying to do is make sure businesses who start there [with LAMP] have a model, to not only start there but evolve into more complex situations in order to actually be able to grow,” he said.

This is where I really wanted to jump in because I think this mentality is holding back adoption of very simple yet extremely capable technology based purely on poor reasoning. This view of systems design says that complexity is required if growth is desirable and that complex situations can only be the result of complex systems.

There’s a guy who just spent 50 years or something locked in a room writing a 1200 page book proving that this is just wrong. It would appear that there is very little relationship between the complexity of a program and the complexity of the situation it produces.

The complexity for complexity mindset is the bane of a few potentially great technologies right now:

  • Static vs. Dynamic Languages
  • J2EE vs. LAMP
  • WS-* vs HTTP

I like to complain when someone calls Python a scripting language because the connotation is that it is simple. But it is simple, right? So there shouldn’t be any complaining. I’m not objecting to someone calling Python simple, I’m objecting to then saying that because it is simple, it must only be capable of simple things.

The Need For Complex Systems

“You’ve seen us do a lot with PHP and Zend and you’ll see us do more. I can’t say more. It [PHP] needs to integrate with enterprise assets but it needs to remember where it came from and where its value was. It can’t start getting too complex as it will lose its audience,” Sabbah said.

The need for complex systems in the enterprise was and still is greatly overestimated. The trick isn’t to make PHP more complex, it’s to make the enterprise less complex. You need to equate complex requirements with complex systems less and start asking “do we really need this?” more.

The funny thing about all this is that my opinion on this matter has formed largely based on concepts that you guys told me, so I’m sure you’ll pull through on this one.

Comments

  1. Please don’t mistake me for a PHP enthusiast, but you said:

    “I can’t be sure but none of Yahoo!, Amazon, Ebay, or Google seem to be using PHP widely on their public sites.”

    Be sure.

    Also, fine-grained caching does work, if it’s on the Web; check out my XTech preso.

    Cheers,

    Mark Nottingham on Saturday, September 09, 2006 at 01:35 PM #

  2. Hi Mark. Ryan King pointed that out to me a while back. I have an email from him here that I could have sworn I footnoted somewhere on this article. Here’s what Ryan had to say in full:

    Just FYI, but Yahoo uses PHP a lot on their public site. They even hired the creator of PHP, Rasmus Lerdorf. Here’s an article from when they switched: http://www.internetnews.com/dev-news/article.php/1491221

    Just thought I’d point that out, since you said “I can’t be sure but none of Yahoo!, Amazon, eBay, or Google seem to be using PHP widely on their public sites.”

    And later he sent a few more links:

    For more info on Yahoo + PHP, here are some links:

    Whatever corporate talking heads may say, you can’t argue with the success of companies like Yahoo. They’re the largest destination on the Internet and built without the technology that is supposedly necessary for scalability.

    Thanks!

    Ryan Tomayko on Saturday, September 09, 2006 at 02:16 PM #

  3. Great post, but I have to disagree with you on the finely grained caching part. If you look at big LAMP deployments such as Flickr, LiveJournal and Facebook the common technology component that enables them to scale is memcached – a tool for finely grained caching. That’s not to say that they aren’t doing shared-nothing, it’s just that memcached is critical for helping the database layer scale. LiveJournal serves around 50% of its page views “permission controlled” (friends only) so an HTTP proxy on the front end isn’t the right solution – but memcached reduces their database hits by 90%.

    Simon Willison on Monday, September 11, 2006 at 02:37 AM #

  4. Still brilliant.

    Bill de hOra on Friday, November 17, 2006 at 09:52 AM #

  5. Many years ago at another company we drank the physical 3-tier distributed kool-aid from Microsoft and built out a very complex system using DCOM, Microsoft Transaction Server, etc. Getting things running at the small scale was a nightmare. We generally resorted to shipping around pre-built VMs so that people could avoid the configuration hassles. Debugging was very complex. Explaining it, which ended up being a nearly full time job for me, took days.

    Finally we paid some consultants at Microsoft to review our architecture and give us some advice. They were impressed by how completely we used their technology (probably a bad sign), but at the same time they were surprised that we were using DCOM for (gasp) distributed objects.

    One of the problems we kept running into was that the MTS kept killing the server side objects after seemingly random times of being idle. We got Microsoft to fix several MTS bugs that reduced the problem, but the end result was that they told us that we couldn’t rely on MTS to not kill our server side objects at will. Despite many of the documented benefits of MTS, they told us to rewrite everything as completely stateless, using DCOM effectively only as a binary wire protocol.

    People complain about the complexity of J2EE app servers a lot, but MTS was at least as bad, if not worse. Perhaps that’s why today almost nobody who wasn’t doing Microsoft server side development ten years ago has heard of it.

    Robert Stewart on Monday, December 24, 2007 at 11:39 AM #

  6. Hmm, LAMP systems can’t scale?

    Perhaps these IBM guys should go and tell the folks at LiveJournal and Facebook that so they can check their servers – they must be crashing incessantly.

    David F on Monday, December 24, 2007 at 03:59 PM #

  7. “….The idea is that you have a physical presentation tier (usually JSP, ASP, or some other *SP) that talks to a physical app tier via some form of remote method invocation (usually EJB/RMI, CORBA, DCOM) that talks to a physical database tier (usually Oracle, DB2, MS-SQL Server). The proposed benefits of this approach is that you can scale out (i.e. add more boxes) to any of the physical tiers as needed……”

    I’m a C# developer, and at my shop, we’ve developed several scalable apps without using this Physical Three Tier architecture. Instead, we’ve used a Logical Three Tier Architecture, and it works fine. Actually, it works great. And I would say that we are not unique or in the minority amongst c# shops, judging from other .NET developers that I’ve spoken to. I would imagine that I could say the same for Java shops, but I can’t speak for that world. My point: don’t confuse the words and marketing speak from a given vendor with the actual actions of the developers in the field using that vendors technology. And that goes for whether that vendor is IBM, Microsoft, or a Benevolent Dictator such as Rasmus, Guido or DHH.

    Kamau Malone on Monday, December 24, 2007 at 05:15 PM #

  8. “Despite many of the documented benefits of MTS, they told us to rewrite everything as completely stateless”

    Doh… This was always how you were supposed to write distributed object systems using DCOM/MTS according to MS (and everyone else).

    Anonymous Coward on Monday, December 24, 2007 at 06:33 PM #

  9. I noted an interesting point when reading your blog.

    My experience with LAMP is that it can be pretty slow when I do something more complex, and it gets so slow that I have to optimize my code to run efficiently.

    For applications like J2EE, most of the default stuff is there, ready to run your complex application easily (with lots of built-in stuff like EJB, and a large CPU as a minimum to even run the basic stuff).

    My argument is that hand optimization is exponentially better than some generic solution with built in facilities. Thus, LAMP easily works compared to complex solution like J2EE as the developer hand optimized the application to run efficiently.

    Henry Tan on Monday, December 24, 2007 at 07:13 PM #

  10. How many layers you have may matter less than how you go about things.

    Around 2000, I was involved in a project to create a “next-generation” e-commerce system for a very transaction-intense retailer. We had good hardware, caches, SSL accelerators, Oracle on the back-end (MySQL wasn’t quite as universally accepted then as it is now, but I still used it internally)… and a big, expensive ($5 million for a CD!), piece of software that used a bastardized Tcl for coding. (Those who should know what I mean do, at this point.)

    We were trying to do things no one had ever done before with that package. We coded it all. It was beautiful. It was elegant. And under load testing, it died screaming. Rather conveniently, there was a merger with another outfit that had their own team working on a next-generation e-commerce system, and between the two teams, they wound up creating something that actually worked. It took them an extra 6-9 months, but they did it.

    I think the thing that actually worked may have still had a physical 3-tier architecture. I know they had truckloads of cheap 1U Suns on the front end serving up the content; I’m pretty sure they had midrange ones in the middle handling all the transactional stuff, and I know they would have needed at least 4500-class gear for the database.

    But the software packages they used? Oh, Apache and Tomcat and stuff like that. ;)

    The codebase in question wound up powering (and may still power) about half of the top third-party flight booking websites in the US.

    To sum up, those “simple” and “primitive” tools have “fewer parts to break” than the big toolsets, which actually does matter when you’re trying to minimize the number of things that could possibly go wrong.

    Personally, I swear by LAMP, for various values of P (or *AMP where * is sufficiently Unix)

    DJ on Monday, December 24, 2007 at 07:55 PM #

  11. Hahahahhaha.

    So what you’re basically saying is that the only way to build large scale applications is to turn them into finite state machines. About god damn time everyone figured that out.

    Mike Seth on Monday, December 24, 2007 at 08:03 PM #

  12. Great article! I was totally with you till you mentioned Wolfram’s book. Great, dare I say amazing, guy, but that book is riddled with flaws and been publicly denounced by many academics.

    Peter Cooper on Monday, December 24, 2007 at 09:12 PM #

  13. Dude, if you’d proofread your article a little, there’d be a chance in hell that my manager would take it seriously.

    seperate —> separate

    alright —> all right

    “…and no one has ever actually implemented it successfully.”

    Not that I disagree with a single thing you say.

    Grammar and Spelling Nazi on Monday, December 24, 2007 at 09:58 PM #

  14. Ditto, great article but you missed…

    “there is generally a lot of sophisticated mechanisms”

    justinhj on Monday, December 24, 2007 at 11:01 PM #

  15. I stopped reading at “kills kittens”. If you’re making a serious point, ditch the cliche'ed hyperbole. It got tiresome the fiftieth time I read it.

    Antoine Valot on Monday, December 24, 2007 at 11:37 PM #

  16. Google does not use php. The three languages authorized for use on public Google web services are C++ (e.g. search) Java (e.g. Gmail) and Python (support and various ancillary functions).

    Owen on Tuesday, December 25, 2007 at 03:18 AM #

  17. To Owen: Where do you get your Google “authorized languages” info? Orkut is built on ASP.NET.

    TS on Tuesday, December 25, 2007 at 12:47 PM #

  18. It’s really up to the development community to figure this out. Well put that the big iron has not solved a single problem that they didn’t create for themselves and the rest of us and really these “technologies” are primarily designed to foster and necessitate a consulting relationship with the big iron. Big Big $$$ there.

    For the rest of us, who aren’t trying to convince corporate America that we can do things in a really obtuse fashion and throw 150 developers for 2 years at every problem to try and “prove” that something which is crap is NOT crap and charge millions of dollars, we are increasing our productivity way beyond what the big iron could imagine by not using their tools and coming up with approaches which make the big iron more or less irrelevant in the “it works” and “it keeps working” space.

    When you are the big iron, every problem is a consulting relationship, millions of dollars and an outsourcing partnership. That’s why failed technology like EJB keeps going, because it’s the ivory tower, “I’m complex only to impress you on the whiteboard” school of engineering.

    If only there were an ANTI technology to big iron technology besides the human brain. Something we could point to which would be obvious and usable by everyone that would bring things back into perspective and put the cost of IT into a reasonable space.

    Big iron profits by making corporations feel like IT and software is too unmanageable for them so they turn it over to big iron and their outsourcing partners.

    This costs jobs and future careers IT pros and engineers. When you think of big iron technology and its complexity remember that projects built with it are intended to be implemented in overseas sweatshops where complaints and thoughtful contemplation of the manageability of such technology are nil. And where labor is cheap or free for the length of the project.

    Sorry for being so cynical but when the cognitive dissonance reaches the level of obvious you have to realize the real purpose of this technology is not to make things easier for US. It’s to make it harder so we give it up to THEM.

    Jabber HQ on Tuesday, December 25, 2007 at 02:09 PM #

  19. Actually Amazon, Ebay, Yahoo, Google, and Facebook heavily employ separated physical tiers, and have been for many years.

    Yahoo & Ebay are both on BSD, not Linux. Google runs on highly customized Linux builds, and from what I’ve heard Yahoo mainly uses PHP as a glorified templating/front-end tier doing very little serious heavy lifting.

    I’d highly suggest that anyone reading this take it with a pinch of salt – it’s mostly personal rhetoric with no evidence supporting claims made. Not that I don’t agree that Linux is a powerful platform, but it’s only one in a room of many.

    Tom Dean on Tuesday, December 25, 2007 at 06:16 PM #

  20. This article is suggesting php is being used in ways which it is not in rather large organizations. Most of the big names listed use php as a front-end tool. A templating engine if you will. They do not use it to do any heavy lifting.

    I’d take this article with a grain of salt. Use php for what it is. Not what it isn’t. Use the best tool for the job, right?

    Charles C on Tuesday, December 25, 2007 at 06:39 PM #

  21. Grammar and Spelling Nazi:

    alright —> all right

    actually, ‘alright’ is a viable word. I love when self-proclaimed grammar nazis get it wrong.

    Better Grammar and Spelling Nazi on Tuesday, December 25, 2007 at 08:44 PM #

  22. Where do you get your Google “authorized languages” info? Orkut is built on ASP.NET.

    But isn’t Orkut something that Google acquired? I doubt that they would rewrite existing code, even if it’s not an authorized language. But for new projects, it makes sense to have a list of languages that you may use.

    Anonymous Coward on Tuesday, December 25, 2007 at 09:05 PM #

  23. How could you forget wikipedia?

    Cainus on Wednesday, December 26, 2007 at 12:19 AM #

  24. I don’t get it. Why are you all commenting on a 2 year + old post?

    Anonymous Coward on Wednesday, December 26, 2007 at 02:25 AM #

  25. Why are you all commenting on a 2 year + old post?

    Because most of us are still fighting against the physical architecture nonsense on a daily basis. In fact some of us are locked in hand-to-hand combat with “consultants” and “architecture experts” who keep trying to put all kinds of development-halting obstacles in our way while our bosses ask why we’re not listening to the nice new expert they’ve wheeled in :(

    Another Anonymous Coward on Wednesday, December 26, 2007 at 10:29 AM #

  26. @AC “Doh… This was always how you were supposed to write distributed object systems using DCOM/MTS according to MS (and everyone else)”

    Wrong.

    It’s pretty hard to find now, but if you’re bored, check out the old MTS documentation. For comparison on the J2EE side, read up on stateful session beans.

    Hindsight is easy.

    Robert Stewart on Wednesday, December 26, 2007 at 11:37 AM #

  27. It doesn’t matter much if your pages are static or dynamic. Even entirely dynamic pages, if they are media-rich, are mostly composed of static content, by byte count and by reference count, and will benefit in scalability from the Web’s REST caching architecture.

    Aminorex on Thursday, December 27, 2007 at 03:40 PM #

  28. I agree 100% with nearly everything you say; if I were to re-read your post again, perhaps I’d agree with everything.

    Except this: why do you think he’s actually making an argument based on reality? He’s neither stupid, nor ignorant. You don’t get to his position at a big software vendor, if you’re stupid or ignorant.

    No, he’s just spreading FUD.

    So if it were me, I’d stop at “dude, all the biggest sites use LAMP; wake up and smell the coffee, before it burns your house down, dude”.

    All your other points were very good, too. But fundamentally, they don’t matter, just as it doesn’t matter that GCC is a much more bug-free C compiler than most of the ones provided by vendors; the vendors will never admit it, the vendors' terrified customers (remember FUD) will never know it, and the rest of world, well, they’re happy with GCC, so what do they -care- what the vendors say?

    Mortimer Snerd on Monday, December 31, 2007 at 12:30 PM #

  29. “More specifically, the Shared Nothing variant of the Logical Three Tier Architecture says that the simplest and best way to build large web based systems that scale (and this includes enterprise systems goddamit) is to first bring the presentation and app code together into a single physical tier. This avoids remote object hell because the presentation code and the business logic / domain code are close to each other.”

    That’s okay so long as you maintain discipline in your codebase and ensure that the presentation code doesn’t get too intimate with the business logic.

    I would like to see a proper definition for what you are calling “remote object hell”. Perhaps you are referring to EJB’s and fine grained method calls?

    “But the most important aspect of this approach is that you want to push all state down to the database. Each request that comes into the presentation + app tier results in loading state for a set of objects from the database, operating on them, pushing their state back down into the database (if needed), writing the response, and then getting the hell out of there (i.e. releasing all references to objects loaded for this request, leaving them for gc).

    That’s the rule."

    That’s a rule – pushing all state into the database can create a scaling problem of its own, especially for write intensive applications where caching only helps so much. Further, you’d better have a partitioning or sharding strategy in place, or at least a nice way to get one, otherwise you’ll be in pain come the day you can’t scale the Db (perhaps because you’ve become successful and got a lot of customers). Lastly, databases can bring with them some nasty availability problems.

    There’s nothing wrong with what you’re saying but it needs balancing against comments from the likes of Werner Vogels in respect of databases e.g. http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

    Dan Creswell on Tuesday, January 01, 2008 at 11:48 PM #

  30. Anyone who talks about “heavy lifting” needs to understand that Wikipedia, the 9th-most visited website according to Alexa, does all lifting with PHP to the best of my knowledge. The only way I could be wrong would be if the Wikimedia Foundation customized their MediaWiki installation, fitting it with C components for heavily used routines.

    You on Friday, January 11, 2008 at 05:18 PM #

  31. We also do heavy scaling (170 mb/sec) on top of LAMP stack – though we also use things like squid, LVS / linux-ha / ldirectord, and a bunch of custom mailing software, but all on top of Linux / OSS tools. LAMP stack dead? we do more traffic than most ‘enterprise’ websites ;)

    -mandrake

    Mandrake (Geoff Harrison) on Tuesday, January 15, 2008 at 01:23 AM #