Scaling Web applications with Scala, Clojure and Groovy
By Cameron McKenzie
For Web applications, it's important to be able to scale up by adding processors as the user base grows and the volume of work increases. For Java applications, this can be more complicated than simply purchasing and installing 20 new processors. Even so, the Java platform can and does support scaling Web applications through peripheral languages such as Scala, Clojure and Groovy.
Developers working in the Java programming language have difficulty making their applications scale linearly. Beyond the first handful of processors, each additional processor yields far less than a proportional improvement in overall performance. The problem is that Java programs typically implement concurrency with a combination of threads and locks, and as processors are added, a Java program spends more and more of its time acquiring and waiting on locks, even when there is no real contention over the underlying data. This is what creates a bottleneck that becomes harder and harder to avoid as programs are scaled to use more processors.
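To make the threads-and-locks model concrete, here is a minimal Java sketch. The LockedCounter class is hypothetical, written only for illustration. Every thread must acquire the same monitor lock before it can update the count, so the final total is always correct, but the updates run one at a time: adding processors adds more threads waiting at the lock rather than more useful work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: one counter shared by many threads, guarded by one lock.
class LockedCounter {
    private long count = 0;

    // Each call must acquire this object's monitor lock, so concurrent
    // callers are serialized here regardless of how many processors exist.
    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    // Starts `threads` worker threads that each call increment() `perThread`
    // times, waits for all of them to finish, and returns the final count.
    static long run(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }
}
```

For example, LockedCounter.run(4, 100_000) returns 400000 however the threads interleave. What does not improve as threads are added is throughput, because the synchronized methods admit only one thread at a time.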
The Java scalability paradox
Here's the paradox. The Java platform can scale pretty much infinitely, or at least to the limits of how many processors a modern machine can cram into a server. The scalability limitation is confined to the Java programming language itself; it is not a limitation of the Java platform as a whole. In fact, when programs are written to take proper advantage of the underlying platform, near-linear scalability becomes achievable. Plenty of Web applications being written right now take full advantage of the scalability of the Java platform. How do you do it? Well, it's just a matter of using a language other than Java.
Peripheral programming languages to the rescue
The scalability issues with Java aren't a new revelation. In fact, plenty of work has been done to address them, and two of the most successful results are the Scala and Clojure programming languages.
Using concepts such as actors and message passing, in a manner reminiscent of the long-established, concurrency-oriented language Erlang, Scala works around the problematic threads-and-locks paradigm of the Java language. Furthermore, both Scala and Clojure take an immutable approach to managing state. In Java applications, the properties an object contains can typically be changed, and because they can be changed, access to them from multiple threads must be guarded by locks so that the data never falls into an inconsistent state.
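Scala's actors are one way to express message passing. The sketch below imitates the idea in plain Java so it runs without extra libraries; it is not Scala's actor API, and the AccountActor class and its message types are hypothetical. One thread owns the balance and processes messages from a mailbox in arrival order. Other threads never touch the balance directly, so the state itself needs no lock (the mailbox, a LinkedBlockingQueue, still synchronizes the hand-off internally). Records and pattern-matching instanceof assume Java 16 or later.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical actor-style account (Java 16+). Only the actor's own thread
// ever reads or writes `balance`.
class AccountActor {
    // The messages this actor understands.
    interface Message {}
    record Deposit(long amount) implements Message {}
    record GetBalance(CompletableFuture<Long> reply) implements Message {}
    record Stop() implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private long balance = 0; // confined to the actor thread, so it needs no lock
    private final Thread worker = new Thread(this::loop);

    AccountActor() {
        worker.setDaemon(true); // do not keep the JVM alive if never stopped
        worker.start();
    }

    // Any thread may send; sending only enqueues the message.
    void send(Message message) {
        mailbox.add(message);
    }

    // Convenience: ask for the balance and block until the actor replies.
    long currentBalance() {
        CompletableFuture<Long> reply = new CompletableFuture<>();
        send(new GetBalance(reply));
        return reply.join();
    }

    void stop() {
        send(new Stop());
    }

    // Processes one message at a time, in the order they arrived.
    private void loop() {
        try {
            while (true) {
                Message message = mailbox.take();
                if (message instanceof Deposit d) {
                    balance += d.amount();
                } else if (message instanceof GetBalance g) {
                    g.reply().complete(balance);
                } else if (message instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller deposits with actor.send(new AccountActor.Deposit(5)) and reads with actor.currentBalance(), which waits for the actor's answer. Because the mailbox is first in, first out, a balance request sent after a series of deposits from the same thread sees all of them.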
With Scala and Clojure, on the other hand, data tends to be immutable. Because it can't be changed, there is never a need to lock it. If a change does need to occur, the properties of the existing object are not edited; instead, an entirely new immutable object is created with the new values, along with a timestamp or equivalent indicating that the new object represents a newer snapshot in time.
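A Java record offers a compact way to sketch this style of immutable data. The AccountSnapshot record below is hypothetical, and it uses a simple version counter as the "timestamp or equivalent": each operation returns a new snapshot with the version incremented, while the original object is never modified, so any thread holding it can read it without a lock. Records assume Java 16 or later.

```java
// Hypothetical immutable bank-account record (Java 16+). The version number
// stands in for the "timestamp or equivalent" that marks each snapshot.
record AccountSnapshot(String id, long balance, long version) {

    // Returns a new snapshot with the deposit applied; `this` is unchanged.
    AccountSnapshot deposit(long amount) {
        return new AccountSnapshot(id, balance + amount, version + 1);
    }

    // Returns a new snapshot with the withdrawal applied; `this` is unchanged.
    AccountSnapshot withdraw(long amount) {
        if (amount > balance) {
            throw new IllegalArgumentException("insufficient funds");
        }
        return new AccountSnapshot(id, balance - amount, version + 1);
    }
}
```

Starting from new AccountSnapshot("acct-1", 100, 1), calling deposit(50) yields a snapshot with balance 150 and version 2, while the original still reports balance 100 and version 1.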
"What immutability means is that rather than changing the state of an object, such as a bank account record, you instead create a new bank account record and you copy the state across. If you have a collection and you modify it, you get a new collection object with the modification in it; but the old one is still there," says James Strachan, a Fellow at FuseSource and the inventor of the Groovy programming language.
With an immutable list shared among many concurrent threads, you can point to the list at one particular stage of its progression, and the object you're pointing to won't change; it will always be that one static object. Further modifications are still possible, because every modification produces a newer version of the list with a new timestamp. You could point to either version, and each would reflect the state of the list at that time. Strachan likens each version to a snapshot in time.
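One common way to get cheap versions of a collection is a persistent data structure, in which each "modification" builds a new version that shares most of its structure with the old one. The sketch below is a deliberately minimal, hypothetical singly linked list in Java, a much-simplified illustration of the idea behind the immutable collections Scala and Clojure provide, not their actual implementation. Prepending creates one new node that points at the previous version, so both versions stay valid and neither ever changes.

```java
import java.util.StringJoiner;

// Hypothetical persistent (immutable) singly linked list. Adding an element
// returns a new version that shares, rather than copies, the older version.
final class PersistentList<T> {
    private static final PersistentList<?> EMPTY = new PersistentList<Object>(null, null, 0);

    private final T head;
    private final PersistentList<T> tail;
    private final int size;

    private PersistentList(T head, PersistentList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> PersistentList<T> empty() {
        return (PersistentList<T>) EMPTY;
    }

    // "Modifying" the list creates one new node whose tail is this list.
    PersistentList<T> prepend(T value) {
        return new PersistentList<>(value, this, size + 1);
    }

    // The previous version this list was built from (null for the empty list).
    PersistentList<T> tail() {
        return tail;
    }

    int size() {
        return size;
    }

    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (PersistentList<T> node = this; node.size > 0; node = node.tail) {
            joiner.add(String.valueOf(node.head));
        }
        return joiner.toString();
    }
}
```

After v1 = PersistentList.<String>empty().prepend("a") and v2 = v1.prepend("b"), v1 still prints [a], v2 prints [b, a], and v2.tail() is the very same object as v1: the new version reuses the old one instead of copying it.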
"As soon as you do it this way, many of the worries about locks go away," says Strachan. "The readers can keep that collection forever with no locks, no semaphores, no actors and no weird stuff. We should all try to be immutable because then you do not have to use locks on readers, you just need to do locks on writers."
Treating data as immutable may place a bit more stress on memory and data management, but because readers no longer need locks, performance can scale much closer to linearly simply by adding processors to the system.
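Strachan's point about locking writers but not readers can be sketched in Java with a hypothetical SnapshotList class. Readers call snapshot() without taking any lock and get back an immutable list they can keep as long as they like; writers serialize on a lock, copy the current list, add to the copy, and publish the result through a volatile field. The JDK's java.util.concurrent.CopyOnWriteArrayList applies a similar copy-on-write idea. The extra copy on every write is the memory cost described above, traded for reads that never wait. List.copyOf assumes Java 10 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder illustrating "no locks on readers, locks on writers".
final class SnapshotList<T> {
    // volatile: once a writer publishes a new list, readers see it.
    private volatile List<T> current = List.of();

    // Readers take no lock; the list returned is immutable and can be kept
    // indefinitely as a snapshot of the state at this moment.
    List<T> snapshot() {
        return current;
    }

    // Writers serialize on this object's lock: copy, modify the copy, then
    // publish an unmodifiable version. (List.copyOf rejects null elements.)
    synchronized void add(T value) {
        List<T> next = new ArrayList<>(current);
        next.add(value);
        current = List.copyOf(next);
    }
}
```

A snapshot taken before a write keeps its old contents: after add("a"), a saved snapshot still holds one element even once add("b") has published a two-element version. Because List.copyOf rejects nulls, this sketch does not accept null values.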
Scala, Clojure and Groovy interoperability
Another great thing about languages like Scala, Groovy and Clojure is that they not only run on the Java platform's Java Virtual Machine (JVM), but can also run alongside, and interact with, normal Java applications. They can link to the standard Java libraries, and because they compile to ordinary JVM bytecode, programs written in Java can in turn call code and libraries written in Scala or Clojure. In fact, many libraries currently used by standard Java applications have been written in a peripheral JVM language, allowing those applications to leverage the benefits of Scala or Groovy code where it makes the most sense.
The Java programming language itself might not be primed to take advantage of the large multi-core, multi-processor systems that are becoming increasingly affordable, but that doesn't mean the Java platform can't serve all of your programming and scalability needs. If massive scalability is what you need, you can always take advantage of peripheral JVM languages like Scala and Clojure, and it doesn't have to be a headfirst dive, either. Using functional languages for selected components, or linking to existing libraries written in Groovy or Clojure, lets you test the waters gradually and discover what kinds of performance gains these languages can deliver to your programs.
01 Jun 2012