Owen Taylor on 'Space Based Architecture'

  1. Owen Taylor on 'Space Based Architecture' (28 messages)

    Owen Taylor has written 'Space Based Architecture' - an implementation of 'TPC,' where 'TPC' stands for transparent partitioning and co-location of events, work, and data. It's a rationale for the use of a JavaSpaces architecture. Owen works for GigaSpaces, so it's not quite a neutral treatment, but it covers many of the core issues. Owen says that many applications use databases to represent intermediate state, which adds runtime overhead and extra code to maintain in order to map the state back and forth to the database. Referring to instances where databases are used to store checkpoints for data that has no long-term use in the system, he says, "This use of a database as a state-management solution 'guarantees' reliability of state during a business transaction but at the cost of extra code and extra connections to perhaps the wrong kind of resource." He discusses the benefits of using a space-based architecture - GigaSpaces in particular - and says:
    As enabling technologies, our implementation of the TPC-based architecture utilizes:
    • Mobile code through a codebase-enabled master-worker pattern where tasks containing work can change over time and be routed to the optimal location based on their state (allowing transparent co-location and partitioning) [a minimal sketch of this pattern appears after this excerpt]
    • Intelligent, adaptive service deployment and runtime management (allowing transparent yet deliberate co-location through our declarable clustering/co-location of managed services [not to be confused with our space clustering])
    • An easy-to-configure and highly abstracted clustered JavaSpaces implementation (allowing transparent yet deliberate partitioning and co-location of state, events, and work)
    Note that the use of "Transparent" seems to focus on the developer role and on what is visible to the author of the compilable code artifacts, while "Deliberate" refers to the author of the non-compiled code, such as the various XML descriptors used to wire services together. Orthogonal to the TPC-based architecture, we have additional features and benefits, including:
    • Integration with Spring (allows transparency of implementation)
    • JDBC, Map, JMS, HttpSession, etc. APIs (allow increasing transparency of implementation)
    • Space API (allows an extremely powerful and simple programming model)
    • C++ and .NET support (allows increasing transparency of language)
    • Policy-driven dynamic service management (allows additional scaling and fault tolerance for the entire system)
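    For readers unfamiliar with the pattern, here is a minimal sketch of a classic JavaSpaces master-worker exchange against the standard net.jini.space.JavaSpace interface. The Task and Result entry types, IDs, and payloads are invented for illustration, and this is plain JavaSpaces rather than anything GigaSpaces-specific; the product's clustering configuration is what would add the partitioning and co-location described above.

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    public class MasterWorkerSketch {
        // Entries are plain classes with public fields and a public no-arg constructor;
        // null fields in a template act as wildcards when matching.
        public static class Task implements Entry {
            public String taskId;   // could also serve as a routing/partitioning key
            public String payload;
            public Task() {}
            public Task(String taskId, String payload) { this.taskId = taskId; this.payload = payload; }
        }

        public static class Result implements Entry {
            public String taskId;
            public String outcome;
            public Result() {}
            public Result(String taskId, String outcome) { this.taskId = taskId; this.outcome = outcome; }
        }

        // Master: write work into the space, then block until the matching result appears.
        static void master(JavaSpace space) throws Exception {
            space.write(new Task("42", "price-this-order"), null, Lease.FOREVER);
            Result done = (Result) space.take(new Result("42", null), null, 10000L);
            System.out.println("result: " + (done == null ? "timed out" : done.outcome));
        }

        // Worker: repeatedly take any Task, process it, and write back a Result.
        static void worker(JavaSpace space) throws Exception {
            while (true) {
                Task t = (Task) space.take(new Task(), null, Long.MAX_VALUE);
                space.write(new Result(t.taskId, "processed:" + t.payload), null, Lease.FOREVER);
            }
        }
    }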

    Threaded Messages (28)

  2. Good points, although you don't need a JavaSpaces API to do it. Many applications are currently faced with two options for intermediate state which has no long-term future from a persistence point of view. This state is application state with a short lifetime; it's not business data like a bank account or a transaction. A great example would be HTTP session state, or state associated with a session but not stored there. 1) Store it in memory and lose it when they die 2) Store it in a database where it's expensive to persist. What they need are choices or alternatives to these for this intermediate state. Products like GigaSpaces or IBM's ObjectGrid offer options above and beyond the two default options. ObjectGrid allows you to trade performance for durability, i.e., you can have no remote replication, synchronous replication, or asynchronous replication. This allows you to choose the trade-off for performance that best suits you. The application simply stores the intermediate state in the grid, from where it can be retrieved later or shared with other applications. Great points in the article, but it's the pattern that's interesting. Download ObjectGrid here. This is a version that works in a J2SE environment.
  3. Products like ..
    .. Tangosol Coherence, the original clustered cache, information fabric and object grid ;-) Peace, Cameron Purdy Tangosol Coherence: Clustered Caching and Data Grid for Java and .NET
  4. 1) Store it in memory and lose it when they die
    2) Store it in a database where it's expensive to persist.
    3) Store it in memory and let a J2EE Application Server replicate it to the other servers in a cluster (or a small Replication Group to avoid communication overhead). http://www.enterpriseware.eu
  5. Products like...
    GemFire. Anyway, the combination of a data grid and a computing (services) grid is highly powerful, capable, and refreshingly simple. Too bad that GigaSpaces does not provide a development version (a 2- or 3-node limit, without rights to deploy to production) of the Enterprise Edition of the GigaSpaces product for developers to play with. The Community Edition is not appealing because it does not allow playing with the most interesting and powerful features of the product. GigaSpaces, if you're listening, please provide a node-count-bound license for developers, not a time-limited one.
  6. I second the motion; the Community Edition was interesting to play with for only a few days. A limited-node development version will keep GigaSpaces on my hot plate. I spoke with a GigaSpaces salesperson about this, and the response was: after 30 days, show me an invoice and then we can talk. This reminds me of BEA, when they played the same game before they offered the developer version of WebLogic. Ask BEA's advice on this; you may learn something for free... So, GigaSpaces, I think you need help to break the pervasive stateless database mindset. Relying on sales forces with useless white papers and PowerPoints doesn't sound like developers are your target. Developers are everywhere; let's help each other to get JavaSpaces out there!
  7. A limited-node development version will keep GigaSpaces on my hot plate.
    Not a bad idea. We'll look into doing this. In the meantime, though, you should know that people ask us for and receive extensions to the 30-day eval period all the time. If you want an extension, write to info-at-gigaspaces-dot-com. We also try to keep things developer-friendly with a Community Edition, wiki, discussion forums, etc.
    let’s help each other to get JavaSpaces out there!
    Our biggest contribution to the Jini/JavaSpaces community, IMHO, is that we provide a commercially successful, enterprise-grade product. Developers who work in top Wall Street banks now use JavaSpaces (via the GigaSpaces product) in their firms' mission-critical applications. In several of these firms our product now has the status of "core technology." I'd say that's a pretty big way of helping to get JavaSpaces out there. Geva Perry GigaSpaces http://gevaperry.typepad.com
  8. 1) Store it in memory and lose it when they die 2) Store it in a database where it's expensive to persist.
    Billy -- You captured well one of the pains that a space, and potentially other caching technology, can address today. I believe, however, that Owen is trying to extend the discussion beyond that point, i.e., how does maintaining reliable and virtualized in-memory data and messaging allow you to change the current architecture? Is it just a complementary technology? Does it replace existing database technology? Owen is trying to make the point that introducing a space-based model in the context of transaction processing applications can significantly change the typical tier-based approach used with these applications. In other words, we're not just talking about replacing one tier with another, but replacing the tier approach with a more SOA-like approach, which is geared towards high-performance transactional applications. With a space-based approach you could potentially get rid of a major part of the current middleware stack in both messaging and data, and replace them with a "virtualized" middleware stack. The space addresses the "how", i.e., it provides an alternative approach by introducing a new middleware stack in which messaging and data are built on top of the same technology, API, and cluster. Space-Based Architecture (SBA) defines the pattern of that new middleware stack, and how to build the entire application in a way that will provide linear scalability. Nati S CTO GigaSpaces Write Once Scale Anywhere
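    As a rough illustration of that "same technology, API, and cluster" point, here is a minimal sketch using only the standard JavaSpaces operations (the Quote entry type is made up for the example): a non-destructive read() treats the space as shared data, while take() removes the entry and so behaves like consuming a message from a queue.

    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    public class OneApiTwoRoles {
        public static class Quote implements Entry {
            public String symbol;
            public Double price;
            public Quote() {}
            public Quote(String symbol, Double price) { this.symbol = symbol; this.price = price; }
        }

        static void demo(JavaSpace space) throws Exception {
            space.write(new Quote("IBM", 98.5), null, Lease.FOREVER);

            // Data sharing: any number of clients can read the same entry; it stays in the space.
            Quote shared = (Quote) space.read(new Quote("IBM", null), null, JavaSpace.NO_WAIT);

            // Messaging: exactly one consumer takes the entry; it is removed from the space.
            Quote consumed = (Quote) space.take(new Quote("IBM", null), null, JavaSpace.NO_WAIT);

            System.out.println("read: " + (shared != null) + ", taken: " + (consumed != null));
        }
    }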
  9. Re: Owen Taylor on 'Space Based Architecture'

    I believe, however, that Owen is trying to extend the discussion beyond that point, i.e., how does maintaining reliable and virtualized in-memory data and messaging allow you to change the current architecture? Is it just a complementary technology? Does it replace existing database technology?
    Interesting positioning. I ask what the 'break-down point' of this approach is. E.g., with a traditional RDBMS tier, disk data transfer performance (up to 3 orders of magnitude slower than memory data transfer performance) keeps that particular tier's break-down point very low - and hence it is considered inappropriate/expensive for transient-state data (I do note the other reason being language impedance). At what point does Memory Cache + Messaging break? How much data can be handled? What is the infrastructure required (Gigabit or better network)? How far apart can the co-located tiers be - in the same rack, same room, same city, across continents? How many concurrent transactions at various isolation levels can be handled? What are the costs of all of these? When should I rely on GigaSpaces' messaging rather than RV, or something grown over JGroups? If my objects are a few MB in size, and the cache is replicated to a box in another city for business continuity, what happens to the transaction performance of GigaSpaces? Without this kind of information, it is difficult to identify the right solution. I have suggested this before - what if Oracle RAC were available for USD 500 per CPU rather than USD 40K? Finally, where are the open-source implementations of JavaSpaces? Today JavaSpaces means GigaSpaces to me. But why can't Arjuna's transaction handler and JGroups be combined to deliver transactional spaces - ones that are 'good enough' for certain situations where open source appeals?
  10. Re: Owen Taylor on 'Space Based Architecture'

    At what point does Memory Cache + Messaging break? How much data can be handled? What is the infrastructure required (Gigabit or better network)? How far apart can the co-located tiers be - in the same rack, same room, same city, across continents? How many concurrent transactions at various isolation levels can be handled? What are the costs of all of these?
    If we follow the logic that old technology (e.g., RDBMS) will never be replaced by new technology because so much has been invested in the old, we would never see any innovative technology being adopted, which is clearly not the case. When the pain reaches a certain degree and the new technology can produce significant enough value, the forces of change overcome the forces of resistance. In our case, it's not a matter of cost, nor a matter of having an in-memory approach vs. a disk-based approach, that drives customers to use space-based technology. It is the architecture that matters. With traditional approaches, we used to think in either messaging terms or database terms. In reality, almost every application requires a combination of both: we use messaging to synchronize state, and we use a database to share state. Developers need to coordinate the two. That's fine as long as we can use a centralized approach. Once you start thinking of scaling out, i.e., scaling by adding more boxes, this model breaks. It breaks because each tier is built as a stand-alone component with its own clustering and high-availability models, and in many cases comes from a different product. Bundling different messaging and database solutions within the same application context and trying to make that linearly scalable will not work, because of that inherited limitation and complexity. Early adopters of space-based architecture are those who need linear scalability and couldn't find alternatives that meet their requirements. In the past few years, we've broken the barrier of how far an application can scale out by repeatedly showing that we can store terabytes of data, scale linearly, and do so with higher reliability than traditional approaches. Think about hot failover, for example, where you need to switch to the alternate node instantly upon failure. In a stateful environment, that means that the data needs to be there as soon as the "switch" happens, so that you can continue from the exact point of failure. The fact that all the data is stored on disk doesn't help much in such cases, because if it's not already in memory you will not be able to serve the request. I could easily continue and address the different issues that you would face with the traditional approaches; however, I think that the best evidence that we're hitting a major problem, and that SBA is the solution, is the adoption rate and the level of investment organizations are putting into this technology. We're increasingly seeing 7-figure deals happen because our customers view this as a strategic investment in a better approach. Nati S.
  11. If we follow the logic that old technology (e.g., RDBMS) will never be replaced by new technology because so much has been invested in the old, we would never see any innovative technology being adopted, which is clearly not the case. When the pain reaches a certain degree and the new technology can produce significant enough value, the forces of change overcome the forces of resistance.
    The Zener diode effect. :)
  12. If we follow the logic that old technology (e.g., RDBMS) will never be replaced by new technology because so much has been invested in the old, we would never see any innovative technology being adopted, which is clearly not the case.
    Yes, but if the logic was that every new technology that came along and got backing from some major company would eventually replace the preceding technology, then we would have already forgotten about RDBMS, or maybe we would remember it in the same dim and foggy way that we remember the hierarchical database, or VSAM. There were a lot of big-money deals signed for OODBMSs as well. Someday, someone will invent something that will replace the RDBMS; that is probably something everyone can agree on. Whether that has already happened, and what it is, is up for debate.
  13. Yes, but if the logic was that every new technology that came along and got backing from some major company would eventually replace the preceding technology, then we would have already forgotten about RDBMS [..] Someday, someone will invent something that will replace the RDBMS; that is probably something everyone can agree on. Whether that has already happened, and what it is, is up for debate.
    The RDBMS will undoubtedly still be around and in common use in ten, twenty, and probably a hundred years. IMHO, there is no technology today or on the horizon that is in a position to challenge that status. Within that context, continuously available, highly scalable partitioned in-memory data management solutions like Coherence do not replace a database, unless the database should _obviously_ never have been used in the first place. What we do is process transactions, analytics, and events in memory, using some portion of the large amounts of memory and network bandwidth that are available to a dynamically scaled-out server cluster, and with resiliency provided by explicit synchronous redundancy. The end result, in most cases, is that the information is managed for the long term in the safest and most accessible place, which is often an HA RDBMS with an HA SAN for its underlying storage. And while it may be a "niche" market, that "niche" (our customer base) is a big chunk of the Fortune 500. ;-) Peace, Cameron Purdy Tangosol Coherence: Continuous Information Fabric
  14. I think Cameron's hit it on the head. Applications are sometimes forced to use a database because it's the only choice. ObjectGrid and its competitors are providing alternatives, so that if a database isn't appropriate there are choices for the container to store that state that may scale better, be more cost-effective, fail over more quickly, be easier to deploy/implement, etc.
  15. The end result, in most cases, is that the information is managed for the long term in the safest and most accessible place, which is often an HA RDBMS with an HA SAN for its underlying storage. And while it may be a "niche" market, that "niche" (our customer base) is a big chunk of the Fortune 500. ;-)
    Well, I would say that this setup fits a whole lot of companies that are not even close to the Fortune 500. It's not a niche; it's what everybody uses once they can afford it. Now, what Cameron says, i.e. a best-of-breed cache on top of an RDBMS for persistence, is a much more believable story, and much easier to sell to most customers that I know something about, compared to the GigaSpaces idea of totally replacing the RDBMS. Again, just my 2 cents.
  16. Re: Owen Taylor on 'Space Based Architecture'

    Now, what Cameron says, i.e. a best-of-breed cache on top of an RDBMS for persistence, is a much more believable story, and much easier to sell to most customers that I know something about, compared to the GigaSpaces idea of totally replacing the RDBMS
    It would seem they are trying to put it in its proper place rather than totally replacing it, though it might seem like they are saying that. I've been trying for years to get people to see that, while the db is very important, it is not: 1. The core of the application. 2. A separate system. 3. An integration point. etc. People have a very tough time separating the concept of data from the concept of a database. Try to get someone to do a Use Case without a line that says "save to database". :)
  17. Try to get someone to do a Use Case without a line that says "save to database". :)
    I wake up every morning and face this exact challenge. In fact, not long ago, the best I could "politically" achieve was to convince a DBA to cache Oracle sequences in blocks of 10k. At least this stopped the round trips to get the next key on every new record. Now, the next leap is to perform "complete" use cases interacting with objects in memory. This is the divide between object- and database-oriented practitioners. I like the "Space Based Architecture" nomenclature, because every time I start talking about users participating in distributed and stateful use cases, someone calls me a "Space Cadet".
  18. Re: Owen Taylor on 'Space Based Architecture'

    Now, what Cameron says, i.e. a best-of-breed cache on top of an RDBMS for persistence, is a much more believable story, and much easier to sell to most customers that I know something about, compared to the GigaSpaces idea of totally replacing the RDBMS


    It would seem they are trying to put it in its proper place rather than totally replacing it, though it might seem like they are saying that.

    I've been trying for years to get people to see that, while the db is very important, it is not:

    1. The core of the application.
    2. A separate system.
    3. An integration point.

    etc.

    People have a very tough time separating the concept of data from the concept of a database. Try to get someone to do a Use Case without a line that says "save to database". :)
    Yes. Raymond once wrote, "inspired" by Brooks, that: "Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious." For many types of applications (most of the business variety I would argue) data is everything, and the most commonly understood way to describe it in detail is using DDL or models of DDL. But you are right. Data as a concept can, and perhaps should, be independent of the actual mechanism with which it is persisted. But the RDBMS has been so pervasive for years that some confuse them.
  19. Re: Owen Taylor on 'Space Based Architecture'

    Raymond once wrote, "inspired" by Brooks, that:

    "Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious."

    For many types of applications (most of the business variety I would argue) data is everything,
    I've done plenty of trying to figure out what is supposed to happen from a data structure. In fact, I am finishing one such exercise now. Sadly, it only paints part of the picture. It shows the what, but not the how.
    and the most commonly understood way to describe it in detail is using DDL or models of DDL.
    True. But still a lot is missing. That is why DFDs and such are needed too.
    .. that some confuse them.
    I would say the great majority do, sadly. But that is just my experience.
  20. Re: Owen Taylor on 'Space Based Architecture'

    Sadly, it only paints part of the picture. It shows the what, but not the how.
    I think that depends on how well you know the particular business domain. If you know the domain well enough, then the *how* is self-evident from the *what*. But I can't really vouch for whether this is generally applicable, or if it is only true for the particular domain that I happen to know best. But I guess it is also a matter of the level of detail you need when understanding the *how*.
  21. GigaSpaces idea of totally replacing the RDBMS
    John, I think that we're all saying the same thing. I wanted to emphasize that the RDBMS is not a solution for everything and that there will be a technology shift in that area. I do believe, however, that the RDBMS will continue to play a major role in the areas it is best suited for, i.e., managing large sets of persistent storage, and that is why we invested quite significantly in integration with the RDBMS. The interesting thing is that you can make the two work together seamlessly and combine the benefits of the two worlds. You could have the performance-sensitive part of your application working with an IMDG (In-Memory Data Grid); the IMDG will synchronize its state with the RDBMS. Other applications could still see the data through the RDBMS as if it were written to it directly. The fact that the memory can be a reliable store enables asynchronous writes to the database while ensuring no data loss; in addition, the actual writes can be done on the same machine as the database through a Mirror service. Combining the two together can reduce the synchronization overhead significantly. I hope that this clears up a bit of the potential confusion from my previous post. Nati S.
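    A rough, single-JVM sketch of that write-behind idea, assuming a hypothetical AccountUpdate change record and account table (the actual Mirror service is configured as part of the product rather than hand-coded like this): the application path updates the in-memory state and enqueues the change, while a background mirror drains the queue into the RDBMS over JDBC.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class WriteBehindSketch {
        static class AccountUpdate {
            final String accountId;
            final double balance;
            AccountUpdate(String accountId, double balance) { this.accountId = accountId; this.balance = balance; }
        }

        private final BlockingQueue<AccountUpdate> pending = new LinkedBlockingQueue<AccountUpdate>();

        // Application path: update the in-memory state, enqueue the change, return immediately.
        public void update(AccountUpdate u) {
            // grid.write(u);  // the reliable in-memory write would happen here (product-specific API)
            pending.add(u);
        }

        // Mirror path: runs close to the database and drains queued changes in the background.
        public void runMirror(String jdbcUrl) throws Exception {
            Connection c = DriverManager.getConnection(jdbcUrl);
            PreparedStatement ps = c.prepareStatement("UPDATE account SET balance = ? WHERE id = ?");
            while (true) {
                AccountUpdate u = pending.take();   // blocks until an update arrives
                ps.setDouble(1, u.balance);
                ps.setString(2, u.accountId);
                ps.executeUpdate();                 // the database write happens off the request path
            }
        }
    }

    In a real deployment the pending queue would itself live in the replicated in-memory grid rather than in a single JVM; that replication is what makes the asynchronous database write safe against node failure.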
  22. John, I think that we're all saying the same thing.
    I wanted to emphasize that the RDBMS is not a solution for everything and that there will be a technology shift in that area. I do believe, however, that the RDBMS will continue to play a major role in the areas it is best suited for, i.e., managing large sets of persistent storage[...]
    OK, then I agree with you.
    You could have the performance-sensitive part of your application working with an IMDG (In-Memory Data Grid); the IMDG will synchronize its state with the RDBMS.
    Well, there are different kinds of performance-sensitive actions you could take, some of which would benefit from in-memory caching outside of the RDBMS and some of which would not. Let's say that you have a long-running batch that needs to see and transform 70% of the rows in a 500-million-row table. Running that through a cache will probably be a lot slower, potentially hog a lot of memory, and negatively affect all other clients of the cache with a different access pattern. At least that's what I would guess, but I am not a cache expert.
  23. Well, there are different kinds of performance-sensitive actions you could take, some of which would benefit from in-memory caching outside of the RDBMS and some of which would not. Let's say that you have a long-running batch that needs to see and transform 70% of the rows in a 500-million-row table. Running that through a cache will probably be a lot slower, potentially hog a lot of memory, and negatively affect all other clients of the cache with a different access pattern.
    Well, if you look at the bottom of our benchmark page you will notice a specific test that is pretty much along the lines of what you described. It measures the time to load a large amount of data into a partitioned space. The graph below represents benchmark results conducted with 192 and 1,022 partitions holding 150 GB and 1 TB of data. I believe that the numbers speak for themselves. We have actually been involved in many scenarios in which the requirement was to speed up real-time analytic applications. We did that by parallelizing the data and the processing, running the processing co-located with the data, and that proved to be linearly scalable in both the data capacity dimension (memory) and the processing dimension (CPU). Nati S.
  24. charts

    Nati - Are you sure those charts are correct? With 192 nodes, you were loading a total of 1GB/s? Peace, Cameron Purdy Tangosol Coherence: Replicated and Partitioned Caching
  25. The graph below represents benchmark results conducted with 192 and 1,022 partitions holding 150 GB and 1 TB of data. I believe that the numbers speak for themselves. We have actually been involved in many scenarios in which the requirement was to speed up real-time analytic applications.
    Well, those are impressive numbers. I am impressed. But 26 minutes is still longer than 0 ms, which is the time it takes not to load any data from the RDBMS, which is what you would do (or rather, not do) if you ran the batch in a stored procedure. Processing itself is a different matter, and if it is possible to parallelize it, then it might be interesting to move it out of the RDBMS, or even to a different machine, especially if you happen to pay a per-CPU license to the database vendor. Anyhow, I see your point, and I think that we pretty much agree. Thanks for taking the time.
  26. Is it just a complementary technology? Does it replace existing database technology?
    No, it is a niche solution, and it will remain so. For some types of apps the database was always too slow, and advanced cache solutions have always been used. The biggest problem for any technology replacing the RDBMS is replacing the amount of generally available knowledge invested in that technology. There's a bunch of DBAs, developers, and business users out there who know enough about RDBMSs to make most things work. Few companies will invest *everything* in a technology that prevents them from drawing from this pool of existing knowledge and training.
  27. Re: Owen Taylor on 'Space Based Architecture'

    There's a bunch of DBAs, developers, and business users out there who know enough about RDBMSs to make most things work. Few companies will invest *everything* in a technology that prevents them from drawing from this pool of existing knowledge and training.
    I am currently facing this issue. It is a sad situation. On the other hand, it presents an opportunity for those willing and able to think outside the box.
  28. Good points, Nati. All the products on the market do this today to some degree. We all virtualize data onto a grid and allow customers to specify the quality of service for how the data is rendered. Some of us built JMS or JDBC on top of that infrastructure; some of us are adding those capabilities as we speak. Any transparent partitioning middleware does this virtualization; JavaSpaces is one API for that approach. You call it SBA; I call it partitioning. Arguably, WebSphere 6.0 messaging also provides this using its partitioned queues. This allows applications to offer messages to the distributed queue, and then consumers can take messages out based on selector expressions; it's very, very close to take/offer in JavaSpaces without the API. Anyway, we need to figure out how to educate the masses or figure out ways to do this without introducing new APIs, i.e., co-opt old ones somehow.
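    To make that comparison concrete, here is a hedged side-by-side sketch: a JMS consumer using an SQL-like message selector versus a JavaSpaces take() with a template entry. The Trade entry, the queue, and the session plumbing are invented for the example; the calls themselves are the standard JMS and JavaSpaces operations.

    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import net.jini.core.entry.Entry;
    import net.jini.space.JavaSpace;

    public class SelectorVsTemplate {
        public static class Trade implements Entry {
            public String symbol;
            public Double price;
            public Trade() {}
            public Trade(String symbol, Double price) { this.symbol = symbol; this.price = price; }
        }

        // JMS style: the provider filters messages with an SQL-like selector expression.
        static MessageConsumer jmsStyle(Session session, Queue tradeQueue) throws Exception {
            return session.createConsumer(tradeQueue, "symbol = 'IBM'");
        }

        // JavaSpaces style: the consumer supplies a template; null fields are wildcards,
        // non-null fields must match, and take() removes the matched entry.
        static Trade spaceStyle(JavaSpace space) throws Exception {
            return (Trade) space.take(new Trade("IBM", null), null, 5000L);
        }
    }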
  29. Any transparent partitioning middleware does this virtualization; JavaSpaces is one API for that approach. You call it SBA; I call it partitioning.
    To be clear, SBA is not about the API; it is about the architecture. SBA can be implemented with different APIs. In fact, we expose different APIs as part of our product, including JavaSpaces, JDBC, and JMS. With our Spring support, we also provide a more declarative approach in which the API becomes different annotations over POJOs.
    Anyway, we need to figure out how to educate the masses or figure out ways to do this without introducing new APIs, i.e., co-opt old ones somehow.
    Agreed. And as I said above, that's exactly what we're trying to do. The fact that IBM is also behind these types of technologies definitely helps the cause!