Why does "Elastic" nature of cloud impose "statelessness" constraint on App servers?



  1. The cloud's pay-per-use model succeeds in software because of elasticity. The most important point about elasticity is this: 9 machines that serve one company during the day will automatically shrink to 1 machine at night, based on load. The 8 machines that are released will then start serving another company, along with 1 other machine. This property of the cloud, which allows 2 (or more) tenants to share processing capacity, is called shared-process multitenancy.

    So why not a stateful architecture?
    Let's say 1000 users are distributed across these 10 machines during the day, i.e. 100 users on each machine. In a stateful architecture, each group of 100 users is served only by the server that created their sessions at login, because the session state is stored in that server's memory. The HTTP load balancer enforces this routing, which is called session stickiness.
    When night falls, let's assume that 900 users log out of the system while the remaining 100 users stay logged in. Ideally, 1 machine should be enough to serve all of them. However, these users could be spread across all 10 servers, with 10 users on each. So shrinking back to 1 machine is not possible, i.e. stickiness breaks elasticity.
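
    The night-time scenario above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function names are made up for this example, and it assumes the users who stay logged in are a uniform random subset of the 1000, which the post does not state.

```python
import random


def servers_pinned_by_sessions(num_servers=10, users_per_server=100,
                               users_left=100, seed=0):
    """Count servers that still hold at least one in-memory session at night."""
    rng = random.Random(seed)
    # With sticky sessions, every user is pinned to the server that logged them in.
    pinned_server = [server
                     for server in range(num_servers)
                     for _ in range(users_per_server)]
    # Assumption: the users who stay logged in are a uniform random subset.
    still_logged_in = rng.sample(pinned_server, users_left)
    return len(set(still_logged_in))


def servers_needed_if_stateless(users_left=100, capacity_per_server=100):
    """With no server-side session state, only total load matters (ceiling division)."""
    return -(-users_left // capacity_per_server)


if __name__ == "__main__":
    print("stateful, servers that cannot be released:", servers_pinned_by_sessions())
    print("stateless, servers needed:", servers_needed_if_stateless())
```

    In the stateless case the count depends only on load, while in the sticky case a single surviving session is enough to keep a server alive.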



    Threaded Messages (16)

  2. While I am a big proponent of statelessness, I'm not sure this is completely true.  A lot of smart people have put a lot of time into building solutions for distributing state.  I think they will always impose some limitations, but at a high level, I believe it's possible to have elastic statefulness.

    Having said that, if you can make a solution stateless, that's the best option.  There's no reason to impose unnecessary statefulness.  And for those times where statefulness is truly a better choice, non-persisted state should only exist for fairly short time periods.

  3. James,

    I did write about elastic statefulness in the original blog: "One way to solve this problem is to replicate session state across all the 10 servers. This way the user can be served by any of the 10 servers. But each server will occupy 10 times the memory it would need for just its own users' session state. This reduces the usefulness of each server as more servers are added to the cluster when more users are added to the system. So this doesn't work for an exploding number of tenants who sign up for your application in shared-process multitenancy."
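
    The memory arithmetic behind that quote can be sketched as follows. The 50 KB session size and the function name are assumptions made purely for illustration; only the 10 servers and 100 users per server come from the thread.

```python
def session_memory_per_server_kb(servers, users_per_server, session_kb,
                                 replicate_to_all):
    """Session memory held by ONE server, in KB."""
    own_sessions_kb = users_per_server * session_kb
    if replicate_to_all:
        # Full n-way replication: every server holds every server's sessions.
        return own_sessions_kb * servers
    return own_sessions_kb


# Assumed figure: 50 KB of state per session.
sticky = session_memory_per_server_kb(10, 100, 50, replicate_to_all=False)
replicated = session_memory_per_server_kb(10, 100, 50, replicate_to_all=True)
print(sticky, replicated, replicated // sticky)
```

    Under full replication the per-server footprint grows linearly with cluster size, so adding servers adds capacity for requests but not for session memory.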

    Also, the moment you start replicating state synchronously (for consistency), you lose performance. Vice versa, if you replicate asynchronously, you lose consistency in distributed systems: the CAP theorem.

    If you find a better way, please do share it.


  4. After posting the reply, I realized that the consistency issue is not relevant to user sessions. Session state is not shared across users, so there are no concurrency control issues. But the other point on replication is still relevant, IMO.

  5. Why does


    As I tried to note in my original reply, stateless is better.  Why add all the complexity of session replication if you don't need it?  Simple is better, for sure.

    I'm not ready to say that statefulness must be banned entirely.  There's always the option of having the client manage the state.  Maybe statefulness can be eliminated, maybe not.  I'm not ready to make a stand.

    What I'm saying is that there are options for elastic session management.  I don't believe that it's necessary for each session to be replicated to every other server.  You can do that, of course.  If the hosting solution provides a hook that lets your application know that the host is about to be removed, then the sessions can be replicated to survivors at that time.  That's not necessarily the only option but it's a viable one.
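
    A minimal sketch of that "replicate to survivors just before removal" idea is shown below. The hook name `on_before_remove` and the `Cluster` class are hypothetical; no particular hosting platform is known to expose exactly this API, and the routing table stands in for whatever the load balancer uses for stickiness.

```python
class Cluster:
    """Toy model of sticky servers plus the load balancer's routing table."""

    def __init__(self, server_names):
        # server name -> {session_id: state}
        self.sessions = {name: {} for name in server_names}
        # session_id -> server name (what the sticky load balancer consults)
        self.routes = {}

    def login(self, session_id, server, state):
        self.sessions[server][session_id] = state
        self.routes[session_id] = server

    def on_before_remove(self, server):
        """Hypothetical platform hook, called just before `server` is shut down."""
        survivors = [name for name in self.sessions if name != server]
        if not survivors:
            raise RuntimeError("cannot drain the last server")
        for session_id, state in self.sessions.pop(server).items():
            # Hand each session to the survivor currently holding the fewest sessions.
            target = min(survivors, key=lambda name: len(self.sessions[name]))
            self.sessions[target][session_id] = state
            # The load balancer must follow the move, or stickiness breaks.
            self.routes[session_id] = target
```

    Sessions are copied once, at removal time, rather than continuously to every server, which is the difference from n-way replication.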

    Again, there are always going to be challenges and limitations with statefulness so I think your main message is right on.  Stateless is to be (greatly) preferred.  But you might want to stick to the advantages of statelessness and avoid questionable assertions that may distract the reader.


  6. Why does


    Both of us agree that statelessness is the best practice for the cloud. However, my interest goes beyond that. I am trying to reason out the decision of PaaS providers (GAE, Azure and possibly vmforce?) to constrain the platform to a stateless-only architecture instead of treating statelessness as a best practice. They must have decided to do this for one of these 3 reasons:

    • Technical infeasibility - which is what I was trying to communicate in the blog, although I see your point that it can be made feasible.
    • "Best practice" as the only way - Maybe it is too much complexity, as you mentioned. But that would mean that a significant number of stateful frameworks and applications cannot be ported to these PaaS platforms. So why would they make such a marketing blunder?
    • Inability to translate into costing/pricing - Maybe it is difficult to price stateful apps (unlike request-based pricing for stateless apps) because of the memory overhead (due to replication) and the session duration.

    My interest as a cross-cloud platform provider is to understand the reason, to see whether a common pattern is emerging across these architectures and whether it could be standardized.


  7. Try Coherence*Web

    Coherence*Web is the session management module that comes with Oracle Coherence. It provides a virtual store of sessions for grid- and cloud-based applications, so that you can expand or contract the number of servers without losing any sessions, and without having to replicate the sessions to all of the servers. Furthermore, while Coherence may be the best solution ;-), it's certainly not the only solution, and this is a problem that has had good solutions (including Coherence) for at least five years now.


    Cameron Purdy | Oracle Coherence


  8. Try Coherence*Web


    In the context of elasticity, the load (i.e. the number of sessions) should expand or contract the number of VMs/servers, instead of the other way around. Otherwise it would be a grid, not a cloud.

    But I see your point that with some hooks to VMs, you should be able to deliver elastic statefulness with Coherence*Web (and the like). And billing should happen based on the sessions created by that tenant in shared-process multitenancy. Or does it exist already?


  9. Try Coherence*Web

    Generally, one would not add or remove instances based on the number of sessions, but rather on the load experienced by the application servers and/or the response times (time to last byte) delivered by those servers. A solution such as Coherence*Web simply makes it possible to add and remove those instances without negatively impacting the end user in any way, i.e. by ensuring that their session is not lost and is available to any server, with or without sticky load balancing.
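
    The general idea of keeping sessions in a store outside the application servers can be sketched as below. This is a plain in-memory stand-in written for illustration, with hypothetical class and method names; it is not the Coherence*Web API and says nothing about how Coherence partitions or backs up the data.

```python
import copy


class SessionStore:
    """In-memory stand-in for an external session store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so servers never share mutable state by accident.
        return copy.deepcopy(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = copy.deepcopy(state)


class AppServer:
    """Keeps no session state of its own between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.load(session_id)
        state.setdefault("cart", []).append(item)
        self.store.save(session_id, state)
        return state["cart"]


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

server_a.add_to_cart("session-1", "book")
del server_a  # scale in: server "a" goes away
cart = server_b.add_to_cart("session-1", "pen")  # any remaining server can continue
```

    Because each request loads and saves the session through the store, the load balancer is free to route with or without stickiness.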


    Cameron Purdy | Oracle Coherence


  10. Session replication

    Has anyone tried a publish/subscribe approach for session replication?

    It might allow asynchronous propagation (publish) and on-demand retrieval (subscribe) of a session when an application server does not have a session and wants to check if another application server managed it before.

    I have never heard of it; I am simply asking whether anyone has experimented with a similar approach.



  11. Session replication

    "Has anyone tried a publish/subscribe approach for session replication?

    It might allow asynchronous propagation (publish) and on-demand retrieval (subscribe) of a session when an application server does not have a session and wants to check if another application server managed it before."

    Yes, I have seen at least a few implementations of this approach. In the naive form, it has several serious weaknesses, e.g. related to the async publish (and ensuring that the publish is not n-way, i.e. someone has to be responsible for partitioning the session management).

    Coherence*Web used to use something like this as an option to find and move ownership of a session, but that approach has been deprecated.


    Cameron Purdy | Oracle Coherence


  12. So-called elastic cloud providers force containers to be stateless because to do otherwise is difficult.  Simple as that.  With stateless servers, you can simply start or kill off VMs as needed.  You can't do that with stateful servers (imagine your shopping cart suddenly disappearing for no apparent reason).

    Pushing state to the client across the Internet is a non-starter if you have a lot of state.  Mindless n-way replication is not the solution either.  Centralizing state via pub-sub or a database creates a single bottleneck (and failure point).

    It would be nice if these cloud platform vendors would develop something, maybe a specially modified servlet container, that can move sessions to another container in another VM when needed.  A better solution might be a distributed cache like Memcache or Coherence that does not do mindless n-way replication.  In any case, the platform needs to involve the load balancers as well.
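
    One way to avoid n-way replication is to give each session a small, fixed number of owners chosen by hashing. The sketch below uses rendezvous (highest-random-weight) hashing as an example technique; it is an assumption chosen for illustration, not a description of how Memcache or Coherence actually place data, and the function names are made up.

```python
import hashlib


def _score(server, session_id):
    """Deterministic pseudo-random weight for a (server, session) pair."""
    digest = hashlib.sha256(f"{server}:{session_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owners(session_id, servers, copies=2):
    """Pick `copies` servers (primary first, then backups) for one session."""
    ranked = sorted(servers, key=lambda s: _score(s, session_id), reverse=True)
    return ranked[:copies]


servers = ["vm1", "vm2", "vm3", "vm4", "vm5"]
before = owners("session-42", servers)
after = owners("session-42", [s for s in servers if s != "vm3"])
print(before, after)
```

    With this scheme each session lives on 2 servers instead of all of them, and removing a VM only affects the sessions that VM owned; every other session keeps the same owners.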

    The big public cloud providers, instead of tackling the problem, have so far decided to take the easy way out by requiring statelessness.

  13. Haam,

    Just to clarify:

    Google App Engine does have Memcache. But it is not a transactional cache like Coherence or JBoss Tree Cache. So there is no guarantee that the sessions will not be lost.

    So PaaS providers have to come up with a transactional cache. It is important for

    - elastic stateful applications

    - singletons

    - read consistency for shared data, for implementing aggregate function queries. (Currently, it is difficult to do this with the NoSQL DBs that come with PaaS.)

    Maybe companies like Oracle or Red Hat/JBoss can come up with such a PaaS. But then, Oracle seems to hate the cloud ;-) I don't know what JBoss is thinking: they were hell-bent on bringing conversations to Seam/JEE, but they have left Spring/vmforce to claim the "Java cloud". That's the stateless world again.


  14. Elasticity with state

    After reading the original blog, it seems the author is suggesting that stateless applications are the way forward. Though I agree that's a best-practice approach, there are numerous occasions that require a stateful application. There are techniques and software available to solve this problem, as Cameron pointed out. There are distributed network caches available similar to Coherence. In fact, we are putting together a solution for elastic stateful apps on Amazon EC2; I will share the details once it is actually deployed. If the elastic nature of the cloud is fully exploited, the next challenge will be scaling the database. And NoSQL is not the answer, as NoSQL stores don't support SQL the way developers write it or ORM frameworks use it. Now that's something to think about.

  15. Elasticity with state

    C'mon. Don't hate NoSQL ;-)

    Coherence with CohQL is also NoSQL in that sense, except that it uses a distributed network cache. You will have to flush data to a DB if you want to use SQL queries.

  16. Elasticity with state

    In fact, I am not against NoSQL. We have built a search engine on Hadoop & HBase, so we are leveraging NoSQL to its fullest. My point is that NoSQL is not a straight replacement for an RDBMS. ACID cannot be traded for CAP; they satisfy different use cases.

  17. Coherence

    Thank you for this informative post. Waiting for some more info.