News: Cloud Computing Demystified: Part-I

  1. Cloud Computing Demystified: Part-I (10 messages)

    I recently took part in a panel discussion on Cloud Computing and how it relates to software architecture. The event used an online event organizer for signing up, and the website wouldn’t let me register. I later found out that I could not sign up because the event was sold out. This says something about the hype surrounding cloud computing, which has become the in-thing to be ‘seen with’. The panel discussion was interesting and animated, and one takeaway for me was that everybody had their own idea of what cloud computing is; to some it was barely more than a buzzword.

    So, is cloud computing merely hype? While there truly is value in what is provided by Amazon (EC2 and other services), Google’s App Engine, Microsoft’s Azure and a few others, the providers (or at least their marketing teams), if you ask me, have not helped a whole lot. It seems like every company that used to provide virtual private servers is now a cloud computing provider. So are companies that used to provide hosting solutions, and every company that used to be a destination for an outsourced data center. Soon, we will have clouds of every other conceivable kind! When a term can be applied to pretty much anything, it starts losing its meaning. Cloud computing, which I would put in the first stage of Gartner’s Hype Cycle (hype followed by disillusionment followed by realization), is the latest buzzword that marketing teams love to apply to anything they can. I asked representatives from several companies that claim to provide cloud solutions what they really meant when they talked about a “cloud”. When pushed, pretty much all of them ended up saying: “it’s just a buzzword”.

    Getting back to the panel discussion, there were several strands of discussion which I would have liked to address in detail but couldn’t because of time constraints. So I decided to use this blog to move the discussion online (and also to share it with a wider audience).
    In this and subsequent posts, I will try to summarize the discussion as well as share my view of this technology, starting with some of the points from the panel.

    Cloud computing is nothing new: Cloud computing, like most other technologies, is an evolution, not a revolution. If it does end up changing the way we develop, deploy and manage software (and other components), we might start calling it a revolution, but at this time it represents incremental but very useful changes in several existing technologies and how they all come together. Most offerings use virtualization, deliver services over the Internet and expose HTTP-based APIs for accessing and controlling the services. But what is truly important for me is not whether it’s new or old, but whether it solves my technical and business problems. There are several classes of applications and organizations for which EC2, Azure or Google App Engine is the right choice. Moreover, I cannot think of a single service in the late 90s or early 2000s where I could provision Linux boxes in a matter of minutes just by signing up on a website with a credit card. If nothing else, the sheer ease of use and affordability (at least in the beginning) is an innovation worth taking note of (and making use of, if it makes sense). My take on it: it works great when you use it to solve the problem it was designed to solve.

    There is no such thing as infinite scalability: The use of the term “Infinitely Scalable” by the cloud computing service providers, in my opinion, is misguided. A better term to use would be “Sufficiently Scalable”, but it does not have the same ring to it. Amazon’s or Google’s infrastructure can effectively provide “sufficient” scalability for small applications. If you are mostly running on a couple of Xeons with 12 GB of RAM or so, a typical spike in demand might need you to scale up by 4 or 5 times, which can easily be handled by the Amazons of this world.
    I suspect, however, that you cannot run another Amazon on Amazon’s infrastructure (especially during Christmas time!). There is a slightly different but related discussion about differences of scale and statistical models which I will not get into.

    What about the security of my data? It’s no less secure than using a hosting solution of the past. While people like to complain about hypervisor breaches and how most cloud computing providers (especially those who provide infrastructure as a service) use virtualization, the reality is that hypervisors are far more secure than the operating systems they run. The two main and mostly related reasons for this are that the people who build hypervisors are aware of this concern (and they are a pretty smart bunch), and that hypervisors are kept as small as possible. While I am not an expert in this area, the last time I was talking to one of the experts in hypervisor and virtual machine security, he took pains to point out that, even though it is bad for his business, hypervisor breaches are extremely rare and most hypervisors are quite secure. I would, however, like to point out that what happens in case of a security breach is still not clear. We simply do not know enough about this stuff. And if you have contractual or legal requirements that prohibit you from moving your data into the cloud, you will simply have to wait until the lawyers catch up with the technology and the providers address these concerns directly through their SLAs.

    The clouds are not interoperable. Because everybody’s offering of “cloud” services is different, the question of interoperability, while very important, does not make sense at this time. The really compelling offerings today are from Amazon (EC2 being the big one, but there is also SQS and S3 and a few others), Google (App Engine) and Microsoft (Azure). EC2 gives you virtual Linux servers (and now Windows too) and the option of launching pre-configured servers.
    If all you are truly using is the Linux platform, nothing stops you from moving over to someone else who will also provide you Linux boxes (either virtualized or real). While you may have to make a lot of changes to migrate your Java application over to Google App Engine, taking your application out of App Engine and deploying it somewhere else should not be a problem (assuming you have appropriately abstracted out the areas that use Google-specific APIs). I have heard talk of Azure supporting Java (and a few other platforms) in the future, and it will be interesting to see how portable Java applications will be between Google App Engine and Azure.

    What is the real cost of using these services? This is a tough one and merits at least a white paper of its own. I do not have a clear answer, but I will say this: the two main advantages of the cloud, the ability to scale up when you need to and paying only for what you use, go a long way toward making cloud computing solutions cost effective. If you know your usage and it does not vary much, you do not really need to use a cloud computing provider. Moving an existing application, either completely or partially, into the cloud takes work, and you should not, as a matter of basic engineering principle, use a cloud simply because it’s the in-thing. It will not make sense and it will not be cost effective. You may, however, have other reasons for using Amazon or Google (for example, cloud computing payment models let you convert your capital expenses into operational expenses).

    In my next post on cloud computing, I will list some of the prevalent definitions of cloud computing, try to factor out the characteristics that make the most sense to me (especially for the providers we are most interested in), and try to come up with a working definition of the term that is most relevant to us as software architects/designers/engineers.
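
To make the earlier portability point a little more concrete, here is a minimal sketch of what "abstracting out" provider-specific APIs might look like in Java. Every name in it (`ProfileStore`, `InMemoryProfileStore`, `GreetingService`, `PortabilityDemo`) is hypothetical and chosen only for illustration; it does not use or represent any real App Engine, Azure or EC2 API. The assumption is simply that application code talks to an interface, and that each provider-specific storage call would live in its own implementation of that interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage abstraction: the only storage type application code sees.
interface ProfileStore {
    void save(String userId, String displayName);
    Optional<String> find(String userId);
}

// Portable implementation backed by a plain in-memory map. A provider-specific
// implementation (not shown) would sit beside it and wrap that provider's API.
class InMemoryProfileStore implements ProfileStore {
    private final Map<String, String> profiles = new HashMap<>();

    @Override
    public void save(String userId, String displayName) {
        profiles.put(userId, displayName);
    }

    @Override
    public Optional<String> find(String userId) {
        return Optional.ofNullable(profiles.get(userId));
    }
}

// Application logic depends on the interface only, so it does not change
// when the storage implementation is swapped.
class GreetingService {
    private final ProfileStore store;

    GreetingService(ProfileStore store) {
        this.store = store;
    }

    String greet(String userId) {
        return store.find(userId)
                .map(name -> "Hello, " + name + "!")
                .orElse("Hello, guest!");
    }
}

class PortabilityDemo {
    public static void main(String[] args) {
        ProfileStore store = new InMemoryProfileStore();
        store.save("u1", "Ada");
        GreetingService service = new GreetingService(store);
        System.out.println(service.greet("u1")); // known user
        System.out.println(service.greet("u2")); // unknown user falls back to "guest"
    }
}
```

With this shape, moving between providers would mean writing one new `ProfileStore` implementation rather than touching `GreetingService`. How well that holds up in practice depends on how much of a provider's data model leaks through the interface.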

    Threaded Messages (10)

  2. Re: Cloud Computing Demystified: Part-I

    And what exactly has been demystified by providing this collection of fluff?
  3. Too long...

    It is a nice write-up but, like almost anything cloud-related today, it's too long and long-winded. I think the unsuspecting reader will get readily confused after reading this piece. Clouds are data centers that use virtualization for their hardware management and expose it via a developer-friendly API (e.g. REST). Cloud computing is your traditional grid computing (a combination of compute and data grids) running on the cloud. That about sums it up for me. Yes, there are plenty of known technical and business challenges which we can all list ad nauseam, and many of them already have practical solutions today. My 2 cents, Nikita Ivanov. GridGain - Cloud Development Platform
  4. No virtualization needed

    Sorry Nikita, clouds have nothing to do with virtualization. That's just a technology most, but not all, clouds employ. Otherwise I agree with your definitions.
  5. Re: No virtualization needed

    clouds have nothing to do with virtualization
    Scott, maybe you are right in theory, but in my opinion you are 100% wrong in practice. My take on these things, by the way, is much more practical due to our work on the GridGain project. Virtualization is at the heart of how all clouds are built today (mostly by existing internal and external data centers). I may simply not be aware of virtualization-less clouds; if you know of any, please share, as I'm interested to learn about them... Best, Nikita Ivanov. GridGain - Cloud Development Platform
  6. Virtualization-less clouds

    A really great cloud service is NewServers. You can even buy dedicated storage by the hour. Their servers are non-shared physical servers billed hourly that are deployed instantly (within minutes) using default or custom images just like you would with EC2.
  7. Re: Virtualization-less clouds

    Got it. Thanks for the link. Nikita Ivanov. GridGain - Cloud Development Platform
  8. Good Analysis

    This is a good analysis of cloud computing. I strongly feel standards need to evolve across these cloud service providers; otherwise users could soon be put in jeopardy.
  9. A better term to use would be “Sufficiently Scalable” but it does not have the same ring to it. Amazon’s or Google’s infrastructure can effectively provide “sufficient” scalability for small application
    The whole objective of cloud computing is to build "massively scalable" applications. That's why even the regular database (which is the usual scalability killer) is replaced in the cloud with distributed storage systems such as Bigtable (on GAE) and Azure storage (on Microsoft Azure). http://manidoraisamy.blogspot.com/2009/01/who-are-competitors-for-gae.html Both GAE's and Azure's architectures (stateless, distributed storage) are based on this fundamental need to scale. So missing that point would only lead to wrong assumptions such as:
    Cloud computing is nothing new:
    The article gives the impression that cloud computing is no different from hosting. It would be good if the author could do more research and correct the factual inaccuracies. thanks, mani
  10. cloud computing is no different from hosting...
    In fact, I think the term "cloud" is really not much different from hosting. I view clouds as nothing more than modernized data centers (private or public) that are built with some sort of virtualization and exposed to developers via something like WS-*, REST, etc. That's different, however, from "cloud computing"... The interesting point is that we've had distributed storage and processing for many years. Traditional MapReduce and distributed in-memory caching have been widely available for at least 3-4 years by now and have nothing to do with clouds per se. Running these types of applications on the clouds is a different story, and I believe that is what defines cloud computing as a term. Best, Nikita Ivanov. GridGain - Cloud Development Platform
  11. Nikita, yes. Distributed storage/processing has been around for some years (distributed caching uses a memory model, while Google uses disk writes on commodity hardware with redundant storage). But in hosting, the hardware was not available as a service that can be automatically provisioned for "burst compute" needs. That, along with the architecture to "hot deploy" across hardware, is the key to scalability on the cloud. BTW, how many people used distributed storage in hosting? Now that the storage architecture is different, the usual SQL stuff (and table joins) is gone. Isn't that a big enough shift? mani