January 2008
Nati Shalom: Hello, my name is Nati, Nati Shalom, and I am from GigaSpaces. With me is Shay Banon. He is our leading architect, and what we will do today, the main theme of this presentation would be to talk about how we move from tier-based approach to a service approach, and basically I would say that the service approach is the important part of that, but how do we get our existing tier-based application and move that to a point where we could really scale it out even if it is stateful. Stateful meaning that the data sits in the business tier, and we will talk about how we combined things like data grids and also things like scaling out approach, and we will utilize Spring as the way to simplify all that, and so the nice thing that you will see is that you can really get to the point where you could write once as I call it and then scale it anywhere without changing your code. So, that would be another path with that theme.
In terms of the schedule, we will do that in several parts. The first one, we will be talking about the challenges that actually move us to think in those terms and then we will talk about an approach that is generic, and we will see what are the patterns that we utilized and we experienced that worked for us to get to that point. Later on we will actually go and take a more deep dive, and Shay Banon will actually show a real demo, will actually run an application and show you how it can scale across multiple machines. In this case, we will run it on a single machine, but it will simulate the scaling of multiple machines, and we will leave 10 minutes towards the end for questions. So, we have a lot of things to cover, and I think that putting those questions along the end of the presentation should be sufficient enough. In any case those that would not get their answer covered, we have a booth there and you could always stop by our booth and ask questions. So, these are basically the goals of this presentation and what we want to do and what we want to achieve, and if we will skip to the next slide.
A few words about GigaSpaces. How many people here are familiar with GigaSpaces? Heard the name, raise your hands. How is that, quite a few? GigaSpaces has actually been around for quite some time, I think starting from 2000. Many people know us by the JavaSpaces background, hence the name. We grew a lot from that point and as you will see today, we are dealing not just with the JavaSpaces part of our product, which we utilize extensively, but also with how we deal with the problem that we are actually presenting today. So, obviously, there is a close proximity between what we are saying and what we are actually doing. The core part of our product is built on two layers. One of them is the data grid layer, which is built on top of a space, the space model, and on top of that we have an Application Service Grid, which deals with how we build the application on top of that data grid so that we will be able to address the scaling out of the entire application. These are the two parts of our product, and we will utilize them as part of the demo.
We have an interesting set of customers already. Hundreds of customers are using the technology already, mostly in the financial industry, the most high-end type of application. Interestingly enough, we are in the process of actually running some of the casinos here on top of that model. As obviously in the next conference, you will see some new machines running here instead of the actual machine that you are seeing right now, much more sophisticated with video streaming and things of that line, and that would be running on top of GigaSpaces. So, that is something that is actually being worked out these days. So, that is pretty much the customer base in the spread of application that we are dealing with. Again, the type of problems that you will be seeing that we are addressing is something that is growing pretty rapidly today beyond the financial market.
If we look at the classical tier-based application, one that exists in the financial industry as well as in any other industry, we see several parts that are very common in most of those applications. We have the feeds. The feeds are the data source, and the data source could be a user interacting with a system, but today we could have a lot of cases in which we have a system interacting with a system, whether it is by loading data from a file or by actually interacting through a JMS feed, which can write a lot of the events from that other system to my system. So, a feed could be really any data source, and we have a continuous flow of data or bursts of data coming in, depending on the data source itself, and usually in the business tier what we have is some sort of a workflow to deal with those events and data. The workflow usually includes several steps. The generic one would be parsing that data: getting that raw data, making it something that is more useful for my business, enriching it. Then, after we pass that validation, we are doing something with that data. In a trading application, normally that process would be the actual match between the trade and the existing bids in the system, and then we are doing the actual routing or execution, meaning that we are executing the request and completing it.
If you think about some of the interesting terminology, the end-to-end scaling or end-to-end latency as we call it, from a latency perspective what we normally see is that the event flows from the messaging system, through the business logic steps, to the database. So, eventually what we have is a message coming in and going through step A, then it goes to the database to maintain the state related to that step A, which will be the parsing, for example. Then we trigger another event through the messaging system to start another step, which will do the matching, for example, and which will update the state of that event in the database as well, and then go on to step C, and obviously in a real-life application we could have multiple steps.
The impact of that in terms of latency is that the end-to-end latency, the time it takes for us to complete the transaction, will be very much dependent on the number of those steps, and the amount of interaction between the messaging, the business logic and the data in each step. So, the end-to-end latency would be the summation of the time it takes to do step A, step B and step C, and again it will be very much dependent on all the tiers that are involved in the execution of that transaction.
Now, that is one problem, because if we really want to scale and add more steps, we linearly increase the latency. If you think of all those steps in these systems and we add scalability to that, what we really want in terms of scalability is the ability to take that system and get to the point where, if one installation deals with 1000 transactions per second, we could plug in another machine and it would be able to handle 2000 transactions per second, without changing my architecture, without impacting the actual business or the correctness of the application, and without impacting the latency. That is really what scaling is. That is what we want to deal with when we deal with scalability. Obviously in real life, in many cases, we make a lot of compromises when we actually deal with those problems, but that is really the end goal. The end goal is that we will be able to take that application and, just by adding another instance of that application, get linear scalability in the number of transactions that we can handle.
If you look at the bottlenecks in that architecture, obviously there are several bottlenecks in each tier. The messaging is built as a central server in many cases. Obviously there has been some progress in that regard. There is again the multiple-steps overhead and also the database overhead, which is pretty obvious. It is a centralized place that everyone goes through, even if they would scale, and I think maybe some of you have heard of Amdahl's law. Amdahl's law basically talks about the point where, even though we could add more instances, if we have even 10% of our code synchronized on a single contention point, then if we increase the number of instances in our application or the scale of our application by a factor of 100, we will achieve only about 10 of that number. I will scale only by a factor of about 10 and not by a factor of 100, and all of that is because of this synchronization, the fact that we have a single contention point that all those multiple units start to synchronize with, and obviously the database would be a large part of that contention. So, we will have to address that as well, but the bigger problem, if we think about scaling and if we really think about the end goal, how we get to the point where we could really add another machine and scale my entire application, is within the architecture itself, the fact that we are thinking in tiers.
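The arithmetic behind that remark can be checked with a few lines of Java. This is only an illustrative sketch of the standard Amdahl's law formula, speedup(n) = 1 / (s + (1 - s) / n), where s is the serialized fraction of the work; the class and method names are made up for this example and are not part of the talk.

```java
// Illustrative sketch of Amdahl's law; names are hypothetical.
class AmdahlSketch {

    // serialFraction: share of the work that must go through one contention point (0..1).
    // instances: number of parallel instances the rest of the work is spread over.
    static double speedup(double serialFraction, int instances) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || instances < 1) {
            throw new IllegalArgumentException("need 0 <= serialFraction <= 1 and instances >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / instances);
    }

    public static void main(String[] args) {
        // 10% serialized code, 100 instances: the speedup stays below 10.
        System.out.println(speedup(0.10, 100));
    }
}
```

With 10% of the work serialized, 100 instances give a speedup of roughly 9.2, and no number of instances can push it past 10, which matches the "factor of 10, not 100" figure quoted above.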
When we think in tiers, we actually build each tier in a way that it will handle its own high availability and its own scalability without relying on the other tiers, and therefore we have a lot of moving parts. Moving parts means that the messaging system has its own availability system and scaling system, and therefore, if it needs to maintain redundancy, every message that goes through the system needs to go through a backup system as well. Then we have the same thing for the sessions within the business tier as well as for the database. So, for every transaction we are sending a message, going to the backup of that messaging system, going to the business tier, going to the backup of that business tier, going to the database, and going to the backup of the database.
Now, if you start to add more instances like that, obviously you start to see a huge mess, and that is really the main problem. The interesting thing about scalability, I would not say nice, is the fact that you are only as strong as your weakest link, and it is very hard to get to the point where you could really scale linearly without addressing all those issues. That is really the challenge. You can solve several issues, but you cannot really scale linearly, and the goal, and what I am going to present, is how we scale linearly. How can we get to the point where we can scale linearly? What are the principles to get to that point?
So, obviously one of the patterns that we all heard is that we need to move from that application centric tier-based approach to a service-oriented world. Here you could see, in the picture itself you could see that the picture looks cleaner when you move to that world, but there is a big question mark here. How do we do that? So, everyone heard those buzz words and usually when we heard about SOA, we think about web services, but if you even try to put web services in that world, obviously it would not be solving any problem, and I would say in the optimistic case. In the reality, it would even add more problems. So, how do we move from that world to that world in the right way? How do we really utilize the SOA concept, in a correct way that would get me to the point where I can scale linearly? That is really the first question that we want to answer. Let us see a typical example of how we deal with that today and how we could do that better.
So, basically what we have today is usually the term ESB, and what we are normally educated to do is have loosely coupled services instead of an application-centric design. So, we could have those same services hanging on that kind of virtual bus, which becomes the messaging system, and they do not communicate directly with the feed or with the end users, they communicate through that bus. That allows me to add another unit very easily, if you will click on the next one, and spread the load between those different units, again without changing my code. There are two interesting challenges in that solution. One of them is the messaging itself. It needs to be scalable obviously, meaning that it needs to be built in a way that it will be virtualized rather than centralized. The other challenge, and that is the even bigger challenge, is that the application is not really stateless. We use the word stateless in many cases when we think about scalability, but what we really mean by that is that the state does not sit in the business logic tier, it actually sits in the database. So, there is a state somewhere, we just ignore it. So, there is no such thing as a stateless application, even in the SOA world, or I would even say especially in the SOA world, and therefore I am limited by my weakest link.
So, how do we solve that? This is where the space model comes into place, JavaSpaces. How many people here are familiar with JavaSpaces, heard the name? Okay, actually quite a few. So, the nice thing, without going too much into the detail, and I could spend this entire session just talking about JavaSpaces and what it really means and the philosophy behind it, the main principle that I want to focus on here is the fact that it does not separate between the state, i.e., the data, and the messaging. There are four verbs, write, read, take and notify, that combine the messaging and the data into one runtime technology, one set of APIs. That is really the value that the space model can bring into that world. So, if we look at that picture, we could use those four APIs, the write, read, take and notify, the same way that we would use an ESB. Instead of using "send" and "receive", we will use "take" and "write": a "take" to receive the object and a "write" to send the message. The nice thing is that they could be blocking, and therefore by doing a "take" with a blocking timeout, we are actually doing a subscribe. It is a very simple way to do that. The take itself also has some query semantics, which enable me to do associative routing, content-based routing, again in a very simple way, with only four APIs.
Now, the other part of that is that I can use those same four APIs to share state, in the same runtime. The only difference is that I will use the "write" and the "read" to access the objects and I would not really delete them once I consume them. That is really the only difference. So, the only difference would be the lifecycle of the object between the messaging scenario and the data-sharing scenario, but the runtime itself would remain the same. That is the main principle there.
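To make the four verbs concrete, here is a minimal, single-JVM toy space in Java. It is a sketch under stated assumptions, not the JavaSpaces or GigaSpaces API: the class `ToySpace`, its method signatures, and the use of a `Predicate` as the matching template are all invented for illustration, and the real specification additionally covers leases, transactions and entry templates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Toy in-memory space (hypothetical API) showing write, read, take and notify.
class ToySpace<T> {
    private final List<T> entries = new ArrayList<>();
    private final List<Consumer<T>> listeners = new ArrayList<>();

    // write: publish an entry; it serves both as a message and as shared state.
    synchronized void write(T entry) {
        entries.add(entry);
        for (Consumer<T> listener : listeners) {
            listener.accept(entry);
        }
        notifyAll(); // wake up any blocked take()
    }

    // read: return a matching entry but leave it in the space (data sharing).
    synchronized T read(Predicate<T> template) {
        for (T entry : entries) {
            if (template.test(entry)) {
                return entry;
            }
        }
        return null;
    }

    // take: remove and return a matching entry, blocking up to timeoutMillis (messaging).
    synchronized T take(Predicate<T> template, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Iterator<T> it = entries.iterator();
            while (it.hasNext()) {
                T entry = it.next();
                if (template.test(entry)) {
                    it.remove();
                    return entry;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // timed out without a match
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    // notify verb: register interest in future writes (named to avoid clashing with Object.notify).
    synchronized void addNotifyListener(Consumer<T> listener) {
        listeners.add(listener);
    }
}
```

In this sketch the only difference between the two scenarios is the lifecycle of the entry: a consumer that calls `take` removes it, as a message receiver would, while a consumer that calls `read` leaves it in place as shared state.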
So, by utilizing the space in that world I can address those two things without adding more moving parts into my application and that is why we have chosen the space as the main theme in my department that I am going to introduce in a second. So obviously, if we implement this space model as a centralized server then it will become the bottleneck. So, let us see what we are doing with that.
So, coming back to that same scaling picture of the ESB, we can do the same thing with the space. The only difference is that we will be utilizing the "take", the "notify" and the "write" instead of "send" and "receive", and the fact that the space implementation can be virtualized, i.e., it can span across multiple machines virtually, meaning that my application code does not know which physical instance it actually interacts with. There is a separation between the actual API that I am utilizing and the actual physical instances over the network. I can scale even at that level linearly without creating more contention. The other part of that is that I can do the same thing with the data. So, the scaling becomes consistent. I do not add other moving parts. They all reference that same kind of cloud, and therefore I do not introduce bottlenecks once I add more instances. They grow together.
From a deployment perspective, and that is one of the challenges in any service architecture, once I build services there is one contradiction that will lead me to build my application for the sake of separation of concerns in a very fine grain level. I want every component in my application to become a service. When it comes to deployment, that actually does not make a lot of sense because if I will take every service and start to communicate through the bus for that specific service, that will create a lot of overhead and that is part of the reason why a lot of projects have failed when they started to move to SOA, that same realization. So, the question is, how do I build those types of services in a way that would be fine grained to get to that level of separation of concerns, but without creating the overhead of the interaction between those services, and that is something that we call a processing unit. So, we are separating between the business logic unit, which is fine-grained and the scaling unit and the failover unit. So, the scaling unit would be a bigger unit and will include the data and the business logic and the messaging all in a single VM. The reason why we are doing that is we are doing that because we want to build the system in a way that would be self-sufficient and that is the main principle to achieve linear scalability. I would say it differently. We can achieve linear scalability only if we could get to that self-sufficient unit, only if we will have units that would be independent of the other and therefore while adding another one, I would really get the extra capabilities, in this case in terms of throughput without affecting the other units of my application. So, I have to get to the point where they are self-sufficient. In order that they would be self-sufficient I need to have everything that is required to fulfill the request of the end user within that same process or same VM.
So, what we have done is we have bundled them together, condensed them in memory the data and the messaging and then what we do is we partition them and in that case we spread the load between the machines, between the processing units themselves. So, that is pretty much the basic principle of the processing unit. We separate between the fine-grain approach of services and the scaling unit, which is more looking at the scaling from a different dimension, the dimension of the actual transaction of the request, what part of the messaging needs to sit in the same VM and part of the data needs to be there in order to fulfill that request.
Okay, so that is a very important principle. We will see an example of that in a second. The nice thing about that before we move to the example, coming back to the tier-based approach, if you look at that from a different dimension, what we have done here, we still have those tiers existing somewhere in my application from at least an architecture perspective. The only difference is they become a virtual entity. That is really the main difference. In the classic tier-based approach, the implementation itself enforced an implementation of the tiers as real physical units that I have to bundle. In this case, we actually virtualized the tiers and therefore we could actually build the application almost in a similar way to the way we have built them today, in a more event-driven way, as we will see, but the concept would remain pretty much close to what we are used to. It is the implementation that is going to change dramatically. It is the middleware stack that is going to change dramatically.
So, if you look at an application, a typical application with that workflow, how would we build it? The first thing that we will do is build the service components, in our case it would be a set of POJOs. We will utilize the space to craft the workflow between them. In this case we will use the "write" and "take" APIs to communicate between those different services, and this would be the means by which they remain loosely coupled. They would not know about the existence of each other and would not really rely on the existence of each other. How we bundle them together will be more of a runtime thing. The second thing we will do is actually bundle them together in the same VM, and that will become our processing unit. Interestingly enough, if you think about the latency that I mentioned, because all the workflow is in memory, we are getting very, very low latency, even if we add steps to the workflow. In the previous approach, if you remember, every step that we added to the workflow introduced another overhead of messaging and data. In this case it is only memory, and therefore we do not introduce that overhead, and the latency remains pretty consistent even if you scale the system. From a scalability perspective it is very easy to scale. All I need to do, because each unit is self-sufficient, is just add more units, and that is all.
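The workflow described above can be sketched in plain Java as services that only ever "take" an item in one stage and "write" it back in the next, all inside one VM. Everything here is an assumption made for illustration: the `Stage` enum, the `WorkItem` class, the `ProcessingUnitSketch` class and the string transformations are invented, and a real processing unit would use the space API and its own threading rather than the simple `drain` loop shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical workflow stages, loosely following parse, match, execute.
enum Stage { RAW, PARSED, MATCHED, EXECUTED }

// An item that is both the message and the state of the workflow.
class WorkItem {
    final Stage stage;
    final String payload;

    WorkItem(Stage stage, String payload) {
        this.stage = stage;
        this.payload = payload;
    }
}

// One in-memory "space" plus the services that are co-located with it.
class ProcessingUnitSketch {
    private final Map<Stage, Deque<WorkItem>> space = new EnumMap<>(Stage.class);
    private final Map<Stage, UnaryOperator<WorkItem>> services = new EnumMap<>(Stage.class);

    void write(WorkItem item) {
        space.computeIfAbsent(item.stage, s -> new ArrayDeque<>()).add(item);
    }

    // Non-blocking take: removes the next item in the given stage, or returns null.
    WorkItem take(Stage stage) {
        Deque<WorkItem> queue = space.get(stage);
        return queue == null ? null : queue.poll();
    }

    // A service only declares which stage it consumes; it never references other services.
    void register(Stage input, UnaryOperator<WorkItem> service) {
        services.put(input, service);
    }

    // Keep handing items to services until no service has any input left.
    void drain() {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Map.Entry<Stage, UnaryOperator<WorkItem>> entry : services.entrySet()) {
                WorkItem input = take(entry.getKey());
                if (input != null) {
                    write(entry.getValue().apply(input));
                    progressed = true;
                }
            }
        }
    }

    static ProcessingUnitSketch tradingExample() {
        ProcessingUnitSketch unit = new ProcessingUnitSketch();
        unit.register(Stage.RAW,
                in -> new WorkItem(Stage.PARSED, in.payload.trim().toUpperCase(Locale.ROOT)));
        unit.register(Stage.PARSED,
                in -> new WorkItem(Stage.MATCHED, in.payload + "|matched"));
        unit.register(Stage.MATCHED,
                in -> new WorkItem(Stage.EXECUTED, in.payload + "|executed"));
        return unit;
    }
}
```

Adding a step in this sketch means registering one more service against one more stage; no step involves a network hop or a database round trip, which is the point made above about latency staying flat as steps are added.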
So, if we sum it up kind of in a nutshell, there are several interesting aspects that we are getting by moving to that approach, they are laid out in that slide. The main one is the fact that we could really build the application in a single unit, in a single VM and then scale it without worrying on changing my code, if I am running in the low scale application because I do not really know at the beginning if I really need that level of scalability on day one. So, in that case I do not need to be worried about all that, and also from the deployment perspective, from a deployment perspective that we see, we could really deploy those different units in a single operation across multiple machines without knowing which machine I am interacting with. It is all going to be virtualized.
So, how do we do that? We introduce a lightweight container approach, and basically the principle is pretty simple. Instead of having the user interact with the actual machines, you interact with some component that knows about the machines on the user's behalf. So, the user does not know anything about, or interact with, any instance beyond that one, and that instance would actually pass the request on. In this case the request would be: deploy my application, and the SLA of my application is that if there is a failover event, make sure that there is at least a primary and a backup running in my application at each point in time, as long as there are enough resources available to it. The resources that the application needs in order to run, say that it needs to have Oracle or whatever, or Version 1.6 of the JVM running on that machine, all that becomes kind of a declarative SLA that I apply to my system, and then I click on "Deploy", and once I do the deployment, the matchmaking between the actual resources and the application that needs to run on those resources is done in the middleware stack, and that is how I get the virtualization.
I can very easily use that model to say, "Deploy my application on the QA units." That is a very common pattern or a common use case, or I can deploy it in my production system without really referencing the actual machines themselves or knowing where they are, which means for the operations guys that they could change the machines without affecting any of the end users utilizing those machines. That is the SLA-driven container model, the lightweight container model.
So, how do we deal with failover, for example? If one of the instances fails, obviously the manager knows that the SLA defines that there needs to be at least a primary and a backup running. It knows that there is another machine available there in the network, and what it will do is relocation. It will move that instance onto that other machine, and what I get as a result of that is continuous high availability of my application. It will continuously and proactively handle the failure.
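The relocation behaviour can be sketched as a small state machine. This is a hypothetical illustration only: the class `SlaManagerSketch`, its methods, and the choice to promote the backup when the primary's machine fails are assumptions made for the example, not a description of how the GigaSpaces manager is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical manager enforcing an SLA of "one primary and one backup".
class SlaManagerSketch {
    private final List<String> freeMachines = new ArrayList<>();
    private String primaryHost;
    private String backupHost;

    SlaManagerSketch(List<String> machines) {
        freeMachines.addAll(machines);
    }

    // Initial deployment: the user never names a machine, the manager picks them.
    void deploy() {
        primaryHost = allocate();
        backupHost = allocate();
    }

    // Called when a machine is detected as failed.
    void machineFailed(String host) {
        freeMachines.remove(host);
        if (host.equals(primaryHost)) {
            primaryHost = backupHost; // assumed policy: promote the surviving backup
            backupHost = allocate();  // relocate a fresh backup onto a spare machine
        } else if (host.equals(backupHost)) {
            backupHost = allocate();
        }
    }

    // Returns a spare machine, or null when no resources are left.
    private String allocate() {
        return freeMachines.isEmpty() ? null : freeMachines.remove(0);
    }

    boolean slaSatisfied() {
        return primaryHost != null && backupHost != null;
    }

    String primaryHost() { return primaryHost; }

    String backupHost() { return backupHost; }
}
```

The SLA holds only "as long as there are enough resources available": once the pool of spare machines is exhausted, `slaSatisfied()` returns false in this sketch rather than pretending a backup exists.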
So, these are really the components of the solution in very generic terms. We are now going to take a deeper dive into the actual code and implementation of how that application looks, and that is going to be led by Shay Banon, but I think that the principles are pretty straightforward. The thing is that we still maintain the loose coupling of my application and get the reliability, and the biggest value that we get out of that is that it becomes significantly simpler than what we have today, and the simplicity is part of the main value of that. The other value that we get is the consistent scalability, the true linear scalability, as well as very low latency, because we are utilizing the in-memory capacity that we have today in our systems. In most systems that utilize databases, for example, we cannot really utilize the memory in an efficient way. Because in this approach we can utilize the memory in a very efficient way, we could improve the latency and the performance significantly and reduce the overhead, meaning that a single machine could do what most machines today would do in 10 instances or even on a big machine. That, I would say, is the immediate benefit that we get out of that. At this point I think I will let Shay Banon take the deep dive and show you the actual code implementation. So, if there is any question, a small one actually, we can take it now before we move to Shay. Okay, Shay.
Shay Banon: So, the question is how do we take these ideas, these really cool ideas and make them a reality and actually simplify the way that we try to develop this processing unit concept and the ability to cluster or scale out them. So, the first thing that we try to do is, we try to find a way for us to be able to define the different business logics or space instances and other aspects and the first thing that we thought about was Spring because how many people here use Spring? I already know the answer, yeah, okay. So, Spring makes your life very simple in terms of assembling beans and in terms of defining the relationship between them. It also makes your life very simple when you want to add transactional support, declarative transactions, and Aspect-Oriented Programming, and we at GigaSpaces really like Spring and we know that most of the users also like Spring. So, we wanted to combine the two together.
So, if we take this processing unit example, which Nati talked about, we have all these different parts of the processing unit. We have the space itself, which is actually the top part and the lower part. We are actually using the same space. We just give it different natures by the way that we interact with it, and we have our business logic. Now, one of the reasons we all like Spring is that Spring allows us to take our business logic and just write it, just focus on our business logic, and all the different aspects, the fact that it is running under a transaction, the fact that we need monitoring on top of it, all these different aspects are actually done by Spring in a declarative manner, either using annotations or using different Spring configuration. So, we wanted the users to have the same thing. So, in terms of GigaSpaces, the space is actually something that can exist within the processing unit. The fact that you are using the space, we can try and hide it from you, not because we do not believe that you need to know about the fact that you are running with a space, but because this makes it more testable. You can mock test it and so on, and this is what we try to do. We also identified the fact that there are a lot of common interactions that you do with the space. The common ones are the two operations that you see here. So, one of them is how do I get notified when something happened in the space, a new message arrived or something like that. So, there are two ways in the space to do that: either I do a blocking take or I get notified, meaning I register an interest in certain data and I am notified when something changes. So, we know users use this all the time and we wanted to provide components that already do that. They already provide the ability to perform all these operations. The same goes for "write". So, I am writing my business logic. I want to get all the features of the space, but with as little interaction as possible with the space, just because we like it.
So, this is the first thing that we did. We actually built components within Spring. We extended Spring. Does anybody here know the JMS support in Spring 2.0? So, it is very similar to that. So, we have containers and they notify your business logic, which is usually unaware of the space. You perform an operation and the result is usually written back to the space. We will see how we do it in a minute. We will see some good examples. By the way, all this stuff is under a project that we call Openspaces. Now, one of the reasons why we call it Openspaces is because the source is probably going to be included with this project. The main reason behind it is that these are the components that we found to be the most common ones. We know that our users use them and will probably start to enhance our components. They will start to build their own components, and we hope to get a community around common usage patterns when using GigaSpaces.
So, what do we gain by doing that? We talked about it. So, testability, this is a very important aspect. So, when I try to test my application, the first thing that I want to do is mock testing or unit testing. So, I want to take my business logic and see that I can process an order or a trade correctly without any interaction with the space. The next level of testing is integration testing. So, I want to be able to take this processing unit, if you want, and run it within my IDE, using a JUnit test or TestNG with Spring Mock, and just run my Spring application context and see that things are working. The next level is that I want to run the actual processing unit within my IDE as if it were running in my production system, so I will be able to debug it and test it and see that things are working, using different topologies, primary partitions, backups, and the last one is that I will then be able to deploy it either locally on my machine or within my production or pre-production environment.
So, how does this work? So, usually when we try to design an application, and that is not necessarily a GigaSpaces or space-based application, the first thing that we usually do is define our domain model. This is the core of our application. This is what we are going to talk about with our analysts and our business users, and this is the language that we use to interact with them. So, we build our POJO domain model, and these are actually the POJOs that are going to be written to the space. So, GigaSpaces supports the POJO model in terms of writing objects, pure Java objects, to the space. This would be our data. This can also act as messages. So, the data is the message as well.
The next thing that we usually do is we define our business services. Pure business services that only know or are aware of how I am going to process my business logic. I am getting two orders, for example. I am getting orders to my business logic and the result of it is a trade, so I want to write a business logic that does that, and then we want to start worrying about how we are going to wire all these things together. So, how are we going to wire my domain model, my interaction with different space aspects, the in-memory data grid, the messaging, how am I going to add transaction support for it, things that are going to have to work under transactions sometimes, and the last step is packaging. So, I want to take this thing that I did, assemble it together into a processing unit, archive if you want, and then basically bundle it together and be able to deploy this thing either within my IDE, if I just want to run within my IDE or take it and deploy it into our Application Service Grid or something like that without changing the code because we are aware of the fact that the steps that are required from the processing unit, leaving your IDE and getting to productions are a lot of steps and we allow for things to be changed externally. So, using the same package, the same processing unit and you just move it from your IDE to mock testing, to pre-production and then to production just by changing the external parameters.
So, let us have a look at our POJO. This is a very abstract example. It does not have any business nature to it because we are trying to focus on our features rather than any business reasoning behind it. So, we have a very simple class called Data. We mark it as a space class, which says that this is going to be an object that is saved in the space. We give it an ID. You do not have to give an ID to the space, but if you want to perform updates, for example, then we need to know what the ID is in order to perform the updates themselves. Another interesting thing is the space routing. Now, the first thing that a lot of people ask is, "Okay, I have got 50 GB of data." You will not be able to handle that within a single JVM, so the way to solve it is by partitioning the data. So, we take the data and we partition it. Usually our business logic is encapsulated within the processing unit and handles only the partition that it works with, and the way that we decide how to partition the data is usually by the user giving us a hint about how to partition it. So, this example uses a very simple long type that tells us how we are going to partition the data and which partition the data object is going to be written to.
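The POJO described above might look roughly like the following. This is a minimal sketch, not the slide's actual code: the three annotations are stand-ins declared inside the example so that it compiles on its own, the field names are assumptions, and `partitionFor` is a hypothetical helper that assumes a simple hash-modulo scheme for turning the routing hint into a partition number.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in annotations declared here only so the sketch compiles on its own.
// They are assumptions for illustration, not the product's own annotation classes.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface SpaceClass {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface SpaceId {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface SpaceRouting {}

@SpaceClass
class Data {
    private String id;          // needed if the object is to be updated later
    private Long type;          // routing hint: decides the target partition
    private Boolean processed;  // flag set by the business logic

    Data() {}

    Data(String id, Long type) {
        this.id = id;
        this.type = type;
        this.processed = Boolean.FALSE;
    }

    @SpaceId
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    @SpaceRouting
    public Long getType() { return type; }
    public void setType(Long type) { this.type = type; }

    public Boolean getProcessed() { return processed; }
    public void setProcessed(Boolean processed) { this.processed = processed; }
}

class RoutingSketch {
    // Hypothetical routing rule: hash of the routing value modulo the partition count.
    static int partitionFor(Data data, int partitions) {
        return Math.floorMod(data.getType().hashCode(), partitions);
    }

    public static void main(String[] args) {
        Data d = new Data("order-1", 5L);
        System.out.println("partition " + partitionFor(d, 2));
    }
}
```

The point of the sketch is that the object itself carries the partitioning hint, so whoever writes it does not need to know how many partitions exist.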
Let us have a look at our business logic, if you want. Our business logic is very simple. You just process the data. So, it gets the data object, sets a flag that says that it was processed and returns the result. Of course, the business logic here can be much more complex. This is just an example. You can see here that there is no notion of a space, so this can be very easily tested with mock testing or something like that. Of course, if you do need the space API, we provide the ability to interact with the space interface and work with it.
So, what is this processData? It sets the processed flag. The question is how do we hook it up: how do we tell the different components, which you will see in a minute, that this will act as some kind of a listener to a data event that is initiated by the space? In this case, we are using annotations. So, we use annotations in order to mark it as an event. Of course, we also support XML-based declarations, where we define what the method names are going to be, and we also support a concrete interface that you can implement. So, it is up to the user, and this is one of the things that we are trying to achieve. The user decides; the power is in the user's hands to decide what to use and how to use it.
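To make the annotation approach concrete, here is a small sketch of how an adapter could find and invoke an annotated listener method by reflection. Everything in it is an assumption for illustration: `DataEvent` is a stand-in annotation, `Item` and `ItemProcessor` are hypothetical names, and `dispatch` is only one plausible way such an adapter might work, not the product's implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Stand-in marker annotation (an assumption, not the product's annotation).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface DataEvent {}

class Item {
    boolean processed;
}

// Pure business logic: no space API anywhere, so it can be unit tested directly.
class ItemProcessor {
    @DataEvent
    public Item processData(Item item) {
        item.processed = true;
        return item;
    }
}

class AnnotationAdapterSketch {
    // Finds the first method marked with @DataEvent and invokes it with the event.
    static Object dispatch(Object listener, Object event) {
        for (Method m : listener.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(DataEvent.class)) {
                try {
                    m.setAccessible(true);
                    return m.invoke(listener, event);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("listener invocation failed", e);
                }
            }
        }
        throw new IllegalArgumentException(
                "no @DataEvent method on " + listener.getClass().getName());
    }

    public static void main(String[] args) {
        Item item = new Item();
        dispatch(new ItemProcessor(), item);
        System.out.println("processed = " + item.processed);
    }
}
```

Because `ItemProcessor` is a plain class, the same method can be called directly from a unit test or reached through the adapter, which is the separation the talk is describing.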
So, this is an example of Spring XML-based configuration, and the question is how do I hook my business logic into the fact that some change has happened. What I am actually trying to do is be notified. I want my listener to be invoked whenever a data object that is not processed gets into the system. So, first of all I define my data processor. The second thing, I am using one of the two different containers that we provide as common components. The name of this one is the polling container, which actually uses a blocking take in order to listen to changes that are happening within the space. We give it a template. The template in this case is a data object whose flag is set to false, which means that this container will notify our listener whenever a new data object is written into the space with its flag set to false, and we define our listener over there. In our case it is an annotation adapter. We are using an annotation adapter because we annotated our class with the fact that it is going to be invoked by this component.
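The blocking take with a template can be illustrated with a toy in-memory space. This is a sketch under stated assumptions, not the product's code: `ToySpace`, `Entry` and `PollingSketch` are hypothetical names, null fields in a template are treated as wildcards in the JavaSpaces style, and writing the listener's result back into the space is an assumption about what the container does with the return value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

class Entry {
    final Long id;
    final Boolean processed;  // null in a template means "match anything"

    Entry(Long id, Boolean processed) {
        this.id = id;
        this.processed = processed;
    }
}

class ToySpace {
    private final List<Entry> entries = new ArrayList<>();

    synchronized void write(Entry e) {
        entries.add(e);
        notifyAll();  // wake up any blocked take
    }

    // Blocking take: removes and returns the first match, or null once the timeout expires.
    synchronized Entry take(Entry template, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (Iterator<Entry> it = entries.iterator(); it.hasNext();) {
                Entry e = it.next();
                if (matches(template, e)) {
                    it.remove();
                    return e;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;
            }
            try {
                wait(remaining);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    private static boolean matches(Entry t, Entry e) {
        return (t.id == null || t.id.equals(e.id))
                && (t.processed == null || t.processed.equals(e.processed));
    }
}

class PollingSketch {
    // Takes every entry matching the template, hands it to the listener,
    // and writes the listener's result back (assumed behaviour). Returns the count handled.
    static int drain(ToySpace space, Entry template, UnaryOperator<Entry> listener) {
        int handled = 0;
        Entry e;
        while ((e = space.take(template, 50)) != null) {
            Entry result = listener.apply(e);
            if (result != null) {
                space.write(result);
            }
            handled++;
        }
        return handled;
    }
}
```

With a template of `new Entry(null, false)`, only unprocessed entries are taken. Once the listener returns them with the flag set to true they no longer match, so the loop ends when the take times out.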
Another important thing here is that we give it a reference, the GigaSpace part on the right-hand side. We give it a reference to the space instance that it is going to work on. So, let us have a look at a real example. Is the font okay, can you read it? Is that okay at the back? It is time to show off the IDE, okay. So, we took a typical data example. Usually when you work with processing units, you have got two aspects. The first one is the processor, the one that processes the data. We hook up the events. We listen to them. The other one is the feeder, the one that writes the data into the space. It is important to understand that the processor usually works with its own local space, so if it is a partitioned space and we have 10 of them, for example, each processor will work against its own local space. That way we get all these nice things like high performance and low latency, because everything is in memory. The feeder itself does not know what it is going to work with. It does not know if it is going to work with a partitioned space, with primaries and backups, or how many instances there are, so it is writing to our clustered view of the space.
Let us have a look at how this thing happens. I just want to go over this XML, so we are using the Spring XML support. The first thing that we define is a space. So, we define what the space is and what the characteristics of this space are. Some examples are whether it is going to work in FIFO mode or not, whether it is going to be persistent, and a lot of the other aspects that you can configure our space with. The second thing that is defined is our own interface that is written on top of the JavaSpaces spec. The JavaSpaces spec was written in 1999 or something like that, and now we have Generics. We have POJOs. We have declarative transactions, thanks to the fact that we are working within a Spring environment. So, we provide our own simpler interface on top of the JavaSpaces interface. Of course, you can always get the underlying interface and work with it, but we provide a simpler interface that works with POJOs, does not have explicit transactions, integrates seamlessly with Spring transactions and uses Generics. For example, from the template that you give, with Generics you get the results back typed, so you do not have to do castings and so on.
And this is the polling event container that we talked about. Now, I am a developer. Let us say I am past the unit tests and integration tests. I just want to run this system in my environment. So, what I am going to do is start a primary and a backup. Now, notice I did not define anything. I did not define within my processing unit the fact that it is going to run in a primary-backup mode or something like that. Everything is external and it is up to you to decide how it is going to run. I have here many examples, such as running it in a partitioned mode; I will try to run it in a primary-backup scenario. So, I will start with two spaces, one of them running in primary mode, the other one running in backup mode. While it is loading I want to show you the feeder part. We wrote the feeder as another processing unit just for this example, but basically the feeder can be any Spring application context. So, you can run our own components and everything within Spring, very simply. Usually this will happen when you interact with a remote space. When you interact with a remote space, like a feeder does, that is the case where you use Spring natively. So, you just deploy it in Tomcat or something like that. There is a web application that processes the trade and you just write the trades or the orders into the system.
So, here the Spring XML is very, very similar. I define a space, but in this case it looks up the space in Jini mode, in terms of looking up the actual cluster of the space, and it will get all the different clusters together. We define the GigaSpace interface, which is the simplified interface that we have, and then we interact with it. So, in our case, we have a simple feeder that starts up and every minute writes a data object to the space. Now, you can see Data is a simple POJO, and we are using our interface, which just uses "write", and that is it. So, transactions, POJO support, everything is done automatically.
So, let us start the backup itself. So, we are running both a primary and a backup, and let us start the feeder. Of course, running it within the IDE uses our own integrated processing unit container, which basically means that you can take it and run it within Tomcat. You can use that in order to run the processing unit, so it can integrate with other hosting environments. Actually, our own Application Service Grid uses this container in order to start the processing unit and deploy it. So, if we start the feeder, the feeder simply prints out that it is writing data objects into the space. It has a lot of other features. If we have time we will go into them. We can see here that the primary itself is processing the data, and the backup is not doing anything. If we have a look at the space itself, you can see here that we have operations running against this space: write operations, which are the feeder writing things to the space, and take operations, which are basically our polling container performing all the takes in order to take objects from the space.
A nice example to show is if we kill this one. We will see that now we do not have a space here, so this space basically went down, and then this will happen, okay, and you can see that basically the backup started to process everything. Immediately it started up the Spring container and started the different containers, the polling event containers and so on, and it is now processing all the data. The data feeder itself does not lose anything. It keeps on writing to the space. So, this is an example of our primary-backup scenario.
Now, let us do an example of taking this processing unit, which I have not changed, and actually deploying it into our Application Service Grid, if you want. This is the service that Nati talked about, which knows how to read different SLA requirements and knows how to perform relocation based on failure or based on other watchable things like CPU or memory. Since we tried to keep everything as a self-sufficient unit, we actually defined the SLA within the processing unit. Now of course, this can be overridden from the outside when you actually run the deploy command, but just for simplicity you can define it here as well. So, when you run the "deploy", you can specify all these parameters.
So, here, for example, we are running two instances. Each instance will have a single backup, and we are running with a schema that is our partitioned-sync2backup. This means that we have two partitions, and each partition has its own backup. They are synchronized between them. And the nice example here: the number of instances per VM is one, and this is a really cool flag because it basically says, I do not want to run my primary and my backup within the same JVM, because it does not make sense. If the JVM fails then both of them would fail. So, this flag basically says when you deploy the system, make sure that you are running a primary and a backup in a different JVM or within a different container.
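The effect of that flag can be sketched as a placement rule. The code below is a hypothetical illustration and not the grid's actual algorithm: `PlacementSketch.place` and the instance labels are invented names, and the round-robin offset is just one simple way to guarantee that a partition's primary and its backups never land in the same container.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PlacementSketch {
    // Returns container name -> instance labels such as "p1-primary" or "p1-backup1".
    // Copies of the same partition are spread over distinct containers.
    static Map<String, List<String>> place(int partitions, int backupsPerPartition,
                                           List<String> containers) {
        if (backupsPerPartition + 1 > containers.size()) {
            throw new IllegalArgumentException(
                    "not enough containers to keep a primary apart from its backups");
        }
        Map<String, List<String>> layout = new LinkedHashMap<>();
        for (String c : containers) {
            layout.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            for (int copy = 0; copy <= backupsPerPartition; copy++) {
                // Offsetting by the copy index keeps copies of one partition apart.
                String container = containers.get((p + copy) % containers.size());
                String role = (copy == 0) ? "primary" : "backup" + copy;
                layout.get(container).add("p" + (p + 1) + "-" + role);
            }
        }
        return layout;
    }

    public static void main(String[] args) {
        List<String> containers = new ArrayList<>();
        containers.add("gsc1");
        containers.add("gsc2");
        System.out.println(place(2, 1, containers));
    }
}
```

With two partitions, one backup each and two containers, each container ends up hosting the primary of one partition and the backup of the other, so losing a single container never takes out both copies of the same data.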
So, let us start our own container. In this case, we start two containers and then we will start playing with it. Sorry. Let us start our UI again. I closed it by mistake. Okay, so here you can see that we have our own monitoring system, and you can see here on the bottom that we have two GSCs, two containers, running. You can see their utilization, CPU and memory. You can see all of them here as well. We have two containers and one manager, and what we want to do now is take the processing units that we defined and deploy them into our system. So, we have a simple Ant command that basically deploys. We will deploy the first one. We will deploy the processor. See if I got it correctly. So, we are now deploying the processor into this Application Service Grid. You can see here how many spaces it starts up. It is going to start two instances of a space, if you want, within each GSC: one is the primary for the first partition, the other one is the backup for the other partition, and the other way around in the second GSC.
So, here you can see that we have two instances, PU1 and PU2, of the data processor, one of them is the backup, the other one is the primary. This resolution sucks. And you can see here that we have four spaces running within our system. All of them act as a single virtualized space. This is the important part, and you can see that within the processing unit we do not know about it. I ran it within my IDE as a primary-backup, and now I am moving into a more real-life scenario of partitioned-sync2backup. So, we have these two, and let us see what happens when we deploy our... I am connected to the Wi-Fi, that is the problem. Okay, so basically what we see is that we deployed our feeder into the system, and this thing now starts feeding data into the system and it starts to get processed. The problem here is that I do not see it very nicely, so what I can do is start another GSC. Now you can see that we have three GSCs, and I will say, "Okay, I am seeing the processing itself and the feeder itself. I am seeing both outputs." So, basically what I can do is take this one and just move it to the other GSC, and now it is going to automatically relocate this processing unit, in this case the feeder. Now you can see that the output of the feeder moved to this GSC, and the other GSCs are just saying that they process the data.
And now one of the final examples. Suddenly I have Wi-Fi on, I forgot to turn it off, so it might break, but let us see if it works. So, the other important thing is what happens if we kill one of them. This is what I am doing now, I am killing one of the GSCs; let us see what happens. So, basically here you can see the Service Grid automatically detecting the fact that something failed. It is relocating all the other services, because we have another GSC open. It is only running the feeder, so we can basically relocate all the processing units that failed on the first container and move them to the other one, and so this is... and you can see that we continue to process the data. This one died, and that is it. Spaces are still coming up.
Now, here in terms of the UI itself, you can see here that we have two processing units and you can see that we have two instances of the processing unit, each actually representing a primary and a backup and we have the feeder itself.
Nati Shalom: Shay we need to move to questions so let us wrap it up.
Shay Banon: Yeah, do we have time?
Nati Shalom: That is it.
Shay Banon: Okay. So, just in terms of the other components that we have within the system... let us go back to the presentation. So, you saw the... sorry, you saw the polling event container. The other container that we have, of course, is a notify event container, which basically uses notifications in order to notify a bean. An important part of it is that I am still writing my business logic in the same way, and whether I am using a polling event container or notifications is really up to my business case, but I am still using the same business logic. So, I am basically using configuration. I can change it from a state where I am using a polling event container to a notify event container.
Another cool feature, which we do not have time to show, but it is really nice, is that we took the same mechanism, the fact that we can plug in our own polling containers or notify containers, and we actually implemented Spring Remoting on top of the space. So, how many here know about Spring Remoting? Okay, so basically with Spring Remoting you can take a service and expose it, without it knowing that it is going to be exposed, declaratively over JAX-RPC or RMI or Web Services or XFire and so on. So, what we did is, we know the space, we like the space, we know it is fault tolerant. If we use partitioning, we get automatic load balancing. So we very easily implemented remoting, and you are using the same remoting services. You are using interfaces in order to invoke your methods. The middleware itself is the space now, and it is not HTTP or TCP or something like that. The invocation itself goes through the space, gets to the correct processing unit, and we can process it easily. So this is really nice as well.
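The idea of remoting through a space can be sketched with a dynamic proxy: the client calls an interface, the call is turned into an object and handed to a shared queue, and a worker on the service side takes it, invokes the real service and sends the result back. This is only an illustration under assumptions. A `BlockingQueue` stands in for the space, `Greeter`, `Invocation` and `SpaceRemotingSketch` are hypothetical names, methods are matched by name only, and routing to the correct partition is not shown.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

interface Greeter {
    String greet(String name);
}

// The invocation is just data, so it can travel through the "space" like any other object.
class Invocation {
    final String methodName;
    final Object[] args;
    // Result is wrapped in a one-element array so that null results can be returned too.
    final BlockingQueue<Object[]> reply = new LinkedBlockingQueue<>();

    Invocation(String methodName, Object[] args) {
        this.methodName = methodName;
        this.args = args;
    }
}

class SpaceRemotingSketch {
    // Client side: every interface call becomes an Invocation written to the queue.
    @SuppressWarnings("unchecked")
    static <T> T proxy(Class<T> api, BlockingQueue<Invocation> space) {
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
                (self, method, args) -> {
                    Invocation call = new Invocation(method.getName(), args);
                    space.put(call);
                    return call.reply.take()[0];  // block until the service answers
                });
    }

    // Service side: a daemon worker takes invocations and calls the real service.
    static <T> Thread serve(Class<T> api, T service, BlockingQueue<Invocation> space) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Invocation call = space.take();
                    for (Method m : api.getMethods()) {
                        if (m.getName().equals(call.methodName)) {
                            m.setAccessible(true);
                            call.reply.put(new Object[] {m.invoke(service, call.args)});
                            break;
                        }
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("service invocation failed", e);
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

The client only ever sees the `Greeter` interface, and the service is an ordinary implementation of it. Neither side knows that the call travelled as an object through a shared store, which is the property the talk attributes to remoting over the space.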
Nati Shalom: Okay, so I want to wrap up right now, and I just want to show the last slide, about what we are doing with Interface 21. All that work is done collaboratively with several customers as well as Interface 21, which is deeply involved in that, and again, it will be provided as an open source kind of project, an open source community. It is not open source in the same way that you are used to thinking about open source. It is more open source as a way for us to reach out to the community and give people the ability to contribute to that project and add their own add-ons, their own capabilities, into that framework. That is pretty much the idea behind it. I think that we have about seven minutes to get questions and hear from you, so while you think about it, if you have any questions, this is the time.