Discussions

News: Using Event-Driven Architectures inside a JVM

  1. Using Event-Driven Architectures inside a JVM (12 messages)

    Gregor Hohpe has written about a project he was recently on that moved a "rather lengthy stored procedure" to an event-driven architecture. He discusses the solution and some of the patterns that emerged from it.
    Our client had a rather lengthy stored procedure to aggregate values across a series of data records and compute "scores" from these aggregates using various look-up tables. The records in the database represent essentially a collection of user input over time. Our client was facing two main issues: first, the stored procedure was becoming unwieldy due to new requirements for the support of new types of products and scores. Second, the batch-orientation of the stored procedure caused response times to be uncomfortably slow. While the data access part of the procedure was naturally efficient, more and more business logic inside the procedure started to bog down the database server.

    We addressed the first problem by converting from PL/SQL to Java code, which allowed us to structure the solution using object-oriented constructs. We targeted the second problem by processing user events as they occur instead of waiting for all events to be collected first before starting to compute aggregated values. In order to allow for future flexibility we developed a set of self-contained components, called "calculators", that we could connect to one another via event channels. Each calculator would subscribe to a set of relevant event types and in response publish events of other types. This loosely coupled architecture enabled us to compose new solutions from existing calculators quite easily, sorta like Lego.
    Read more: Look Ma -- No Middleware! Using Event-Driven Architectures inside a JVM.
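    As a rough sketch of what such a design can look like in code (this is not Gregor's implementation; the EventChannel, AnswerEvent and ScoreCalculator names below are hypothetical), a calculator subscribes to the event types it cares about and publishes new events in response:

    import java.util.*;

    // A minimal in-JVM event channel: calculators subscribe per event type
    // and publish events that other calculators may consume in turn.
    interface Calculator {
        void onEvent(Object event);
    }

    class EventChannel {
        private final Map<Class<?>, List<Calculator>> subscribers =
                new HashMap<Class<?>, List<Calculator>>();

        void subscribe(Class<?> eventType, Calculator calculator) {
            List<Calculator> list = subscribers.get(eventType);
            if (list == null) {
                list = new ArrayList<Calculator>();
                subscribers.put(eventType, list);
            }
            list.add(calculator);
        }

        void publish(Object event) {
            List<Calculator> list = subscribers.get(event.getClass());
            if (list == null) return;
            for (Calculator c : list) {
                c.onEvent(event);
            }
        }
    }

    // Hypothetical event types.
    class AnswerEvent { final int points; AnswerEvent(int points) { this.points = points; } }
    class ScoreEvent  { final int total;  ScoreEvent(int total)  { this.total  = total;  } }

    // A calculator that aggregates AnswerEvents and publishes a running ScoreEvent.
    class ScoreCalculator implements Calculator {
        private final EventChannel channel;
        private int total;

        ScoreCalculator(EventChannel channel) {
            this.channel = channel;
            channel.subscribe(AnswerEvent.class, this);
        }

        public void onEvent(Object event) {
            total += ((AnswerEvent) event).points;
            channel.publish(new ScoreEvent(total));   // downstream calculators react to this
        }
    }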

    Threaded Messages (12)

  2. Interesting approach

    I've used a similar approach in the past and it can work well. The primary limitation is when a calculation requires a large chunk of data, say 30 million rows. Obviously it's not practical to load that much data and aggregate it in memory. In those cases I find a combination of event-driven processing and summary tables/OLAP provides a nice solution. In the more extreme cases, where a system has to aggregate a ton of data in semi-realtime (sub-minute) and the data stream is constant, the system has to recalculate incrementally (see the sketch at the end of this post). Obviously there are cases where that's not feasible: for calculations like duration and median that require the entire dataset, a distributed approach like JavaSpaces might be better. Ultimately, the size of the dataset determines whether the work should be done in the database, though running a stored procedure to calculate the median constantly will likely overwhelm the poor database.

    It's interesting that some people are choosing to bypass the OLAP route for aggregating data.
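    For illustration only (a hypothetical class, not from the article): an aggregate like a running average can be updated in constant time per event, whereas a median needs the full dataset, which is why the incremental approach breaks down there:

    // Incrementally maintained aggregate: updated as each event arrives,
    // so no batch pass over millions of rows is needed to get the average.
    class RunningAverage {
        private long count;
        private double sum;

        void onValue(double value) {      // called for every incoming event
            count++;
            sum += value;
        }

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // A median, by contrast, needs all values retained and sorted, which is
    // the kind of calculation better left to the database or OLAP layer.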
  3. Event-Driven & HiveMind

    Can't help but plug HiveMind (http://jakarta.apache.org/hivemind/) here.

    In HiveMind, connecting services together via event notifications is built in. When building a service, you may specify another service that produces the events to be listened to, for example:

    <service-point id="EventSender" interface="...">
     ...
    </service-point>

    <service-point id="EventConsumer" interface="...">
      <invoke-factory>
        <construct class="...">
          <event-listener service-id="EventSender"/>
        </construct>
      </invoke-factory>
    </service-point>

    Details here

    Only the service implementation has to implement the event listener interface; the service interface does not have to extend the event listener interface.
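    In plain Java terms (hypothetical interfaces for illustration, not HiveMind's own API), that contract looks roughly like this: the producer exposes listener-registration methods, and only the consumer's implementation class implements the listener interface:

    // Hypothetical event and listener types.
    interface StockEvent { String getSymbol(); }
    interface StockEventListener { void stockChanged(StockEvent event); }

    // The producer's service interface only needs the registration methods;
    // the container can call them when it wires the consumer to the producer.
    interface EventSender {
        void addStockEventListener(StockEventListener listener);
        void removeStockEventListener(StockEventListener listener);
    }

    // The consumer's service interface says nothing about events...
    interface EventConsumer {
        void doWork();
    }

    // ...only its implementation class implements the listener interface.
    class EventConsumerImpl implements EventConsumer, StockEventListener {
        public void doWork() { /* ... */ }
        public void stockChanged(StockEvent event) { /* react to the event */ }
    }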

    Many other messaging patterns can be easily accomplished in HiveMind by combining configuration data with services; that is, by contributing services into configuration points. Part of the contribution can be a description of when the service can be used, for example:

    <handler event-type="auction-ended" service-id="AuctionEnded"/>

    You can imagine a service that receives events of different types, including "auction-ended", looks in its configuration for a <handler> element with the matching type, and then passes the event to that handler's service.
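    A minimal sketch of that dispatch idea in plain Java (hypothetical names; in HiveMind the map would be populated from the <handler> contributions):

    import java.util.HashMap;
    import java.util.Map;

    // A generic event carrying a type string such as "auction-ended".
    class Event {
        private final String type;
        Event(String type) { this.type = type; }
        String getType() { return type; }
    }

    interface EventHandler {
        void handle(Event event);
    }

    // Dispatcher service: its configuration maps event types to handler services,
    // mirroring the <handler event-type="..." service-id="..."/> contributions.
    class EventDispatcher {
        private final Map<String, EventHandler> handlersByType =
                new HashMap<String, EventHandler>();

        void registerHandler(String eventType, EventHandler handler) {
            handlersByType.put(eventType, handler);
        }

        void dispatch(Event event) {
            EventHandler handler = handlersByType.get(event.getType());
            if (handler != null) {
                handler.handle(event);    // pass the event to the matching service
            }
        }
    }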

    Gregor mentions "composition ... sorta like Lego" and that's one of the hallmarks of HiveMind development. Being freed from concerns about when and how service objects are instantiated and configured (especially in a multi-threaded environment) is very liberating, making approaches like Gregor's completely practical in a wide range of situations.
  4. There are lots of tricks you can pull to call a long-running stored procedure asynchronously, both in .NET and Java, but is this the right approach? I think the more elegant approach is to calculate everything in batch-oriented mode if the calculation is so demanding. Of course, it requires some redesigning of the application and of how the user interacts with the system.
  5. There are lots of tricks you can pull to call a long-running stored procedure asynchronously, both in .NET and Java, but is this the right approach? I think the more elegant approach is to calculate everything in batch-oriented mode if the calculation is so demanding. Of course, it requires some redesigning of the application and of how the user interacts with the system.
    From first-hand experience, batch mode fails if the granularity needs to be at the transaction level. I'll use pre-trade compliance as an example again. Say I have a constant stream of transactions to buy/sell securities.

    If the system has to make sure trades do not result in violations of SEC regulations, performing the analytics in batch mode would mean the calculations are neither accurate nor timely. Processing analytics in this fashion would result in hefty fines from the SEC. Many of the current compliance systems I know of do not perform pre-trade compliance for that reason. In fact, some of the biggest firms still only do overnight batch processes for complete compliance validation. Very few shops actually do partial pre-trade compliance, due to the nature of regulatory rules and the analytics required.

    One tricky part of running fairly complex analytics with stored procedures is the impact on the database server. Say you have a rule that says, "an account cannot exceed 5% of the account's total value for any given issuer." An individual account may hold 20-60 individual securities, with a mix of stocks, funds, bonds, etc.

    Performing even this simple calculation means the database has to do what is called a "look-through" for all funds and all fixed-income holdings that have multiple issuers. These kinds of analytics can be done in stored procedures in a batch fashion, but they often take a long time: 30 minutes to several hours, depending on how many accounts a database has and the number of securities in each account.
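    As a rough illustration of what such a check involves (hypothetical, simplified domain classes; only direct holdings are shown, whereas a real look-through would also expand funds into their underlying issuers):

    import java.util.*;

    // Hypothetical, simplified position: issuer plus market value.
    class Position {
        final String issuer;
        final double marketValue;
        Position(String issuer, double marketValue) {
            this.issuer = issuer;
            this.marketValue = marketValue;
        }
    }

    // Checks "no issuer may exceed 5% of the account's total value".
    class IssuerExposureRule {
        boolean violates(List<Position> account, double maxWeight) {
            double total = 0.0;
            Map<String, Double> byIssuer = new HashMap<String, Double>();
            for (Position p : account) {
                total += p.marketValue;
                Double current = byIssuer.get(p.issuer);
                byIssuer.put(p.issuer, (current == null ? 0.0 : current) + p.marketValue);
            }
            if (total == 0.0) return false;
            for (double issuerValue : byIssuer.values()) {
                if (issuerValue / total > maxWeight) {
                    return true;
                }
            }
            return false;
        }
    }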

    I know of one company that claims to do pre-trade compliance using just stored procedures, but from what I know of business process rules and business process systems using SQL, it doesn't scale well. There are many cases where it is appropriate, but for the more challenging cases, it fails miserably.
  6. From first-hand experience, batch mode fails if the granularity needs to be at the transaction level. I'll use pre-trade compliance as an example again. Say I have a constant stream of transactions to buy/sell securities. If the system has to make sure trades do not result in violations of SEC regulations, performing the analytics in batch mode would mean the calculations are neither accurate nor timely. Processing analytics in this fashion would result in hefty fines from the SEC. Many of the current compliance systems I know of do not perform pre-trade compliance for that reason. ............There are many cases where it is appropriate, but for the more challenging cases, it fails miserably.
    Peter, our firm is facing a similar problem, but we have to handle multiple trading markets (though not the US market at this moment) as well. Our analytics usually take under a minute to complete for an account with 10-20 securities, so we can afford to recalculate every time. I am curious to know how many orders your system is designed to handle and what sort of analytics are required. Our analytics are quite simple for the moment, using a simple discounted-rate formula to calculate risk and exposure.
  7. analytics

    Peter, our firm is facing a similar problem, but we have to handle multiple trading markets (though not the US market at this moment) as well. Our analytics usually take under a minute to complete for an account with 10-20 securities, so we can afford to recalculate every time. I am curious to know how many orders your system is designed to handle and what sort of analytics are required. Our analytics are quite simple for the moment, using a simple discounted-rate formula to calculate risk and exposure.
    Generally, account-level analytics like total market value take less than 50ms using OLAP, an analytics package like Tibco, or something home-grown. The harder analytics are related to "look-through", duration calculation, and historical analytics. I've seen some crazy analytics for risk/compliance that compare relative deltas to historical deltas. Typically 2A7 and the 1940 Act are easy until they are applied to an entire firm. I can't really say what the real target is, but the original wish list was for 10K tps with 4-CPU servers. That was wishful thinking, and we had to run several series of benchmarks with COM+ and OLEDB to show the max throughput on a 4-CPU server.

    The normal "weight" of a given issuer, GISC taxonomy, or country is straightforward if the positions aren't constantly changing. The harder case is when large batches of orders come in and the system has to consider regulatory, firm-wide, and account rules within the same validation process. Some of the older systems process compliance procedurally and suffer from poor implementations. An example from 2A7:

    3/4 of the account cannot exceed 10% exposure to any issuer.

    What some of the older systems did was calculate the exposure for every issuer in an account, then sort the weights and check which ones exceeded 10%. Obviously, there are faster ways of checking this particular reg rule. When the rule is applied to a firm as a firm-wide exposure rule, there might be 5K issuers and 2 million rows of positions; running this for every single transaction would be costly, to say the least :)
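    For instance (again hypothetical, not taken from any of the systems described above), keeping per-issuer exposure as a running total means an incoming buy order only has to be checked against the one issuer it affects, since every other issuer's weight can only shrink when the total grows:

    import java.util.HashMap;
    import java.util.Map;

    // Incrementally maintained firm-wide exposure: only the traded issuer is
    // checked when a buy order arrives, instead of re-sorting all ~5K issuers.
    class FirmExposureTracker {
        private final Map<String, Double> exposureByIssuer =
                new HashMap<String, Double>();
        private double totalValue;

        // Would applying this buy order push the issuer above the limit?
        boolean wouldViolate(String issuer, double orderValue, double maxWeight) {
            double issuerValue = currentExposure(issuer) + orderValue;
            double newTotal = totalValue + orderValue;
            return newTotal > 0 && issuerValue / newTotal > maxWeight;
        }

        void apply(String issuer, double orderValue) {
            exposureByIssuer.put(issuer, currentExposure(issuer) + orderValue);
            totalValue += orderValue;
        }

        private double currentExposure(String issuer) {
            Double current = exposureByIssuer.get(issuer);
            return current == null ? 0.0 : current;
        }
    }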
  8. There are lots of tricks you can pull to call a long-running stored procedure asynchronously, both in .NET and Java, but is this the right approach? I think the more elegant approach is to calculate everything in batch-oriented mode if the calculation is so demanding. Of course, it requires some redesigning of the application and of how the user interacts with the system.
    This was how the app was working when ThoughtWorks was hired. In this particular case the user required direct feedback as new data became available and the calculation progressed. Once all the data was available, the calculation had to return almost immediately.

    The design you're outlining is a valid one, but alas! it could not be used in this case.
  9. I really appreciate event-driven design -- I've been playing around with some handheld development in C#, which uses combinations of 'delegates' and 'events' to basically accomplish template-based method reflection:
    // delegate type and multicast field holding the registered listeners
    public delegate void EventListener(Object source, Event evt);
    private EventListener EventListeners;

    public void ListenForEvent0(Object source, Event evt) { /* do something */ }
    public void ListenForEvent1(Object source, Event evt) { /* do something */ }
    public void ListenForEvent2(Object source, Event evt) { /* do something */ }

    this.EventListeners += this.ListenForEvent0;
    this.EventListeners += this.ListenForEvent1;
    this.EventListeners += this.ListenForEvent2;

    // notify all listeners
    this.EventListeners(source, evt);
    I'm trying to find the best way to accomplish this in Java while injecting the Filter/Chain of Responsibility Pattern:
    // action method signature:
    // public (void|String) name (Event, EventContext?)

    void OrderController.validate(Event event, EventContext context)
    {
      if (!context.hasRole("order")) context.finish();
      else context.proceed();   // "continue" is a reserved word in Java
    }

    void OrderController.exception(Event event, EventContext context)
    {
       try { context.proceed(); } catch (Exception e) { /* ... */ }
    }

    void OrderController.store(StoreOrderEvent event, EventContext context)
    {
       context.finish(success ? "pass" : "fail");
    }
    I'm still trying to figure out the details of EventContext -- whether I should make something that's more controller-specific (view behavior) or take more of an AOP approach, treating each method as an interceptor and visiting or filtering based on method signatures....
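    One way to get the C#-style multicast behavior plus a chain-of-responsibility context in Java (hypothetical names, just a sketch of the idea being discussed, using proceed() in place of the reserved word "continue"):

    import java.util.Iterator;
    import java.util.List;

    class Event { /* payload omitted */ }

    // Each filter either does its work and calls context.proceed() to pass the
    // event down the chain, or calls context.finish() to stop processing.
    interface EventFilter {
        void handle(Event event, EventContext context);
    }

    class EventContext {
        private final Iterator<EventFilter> chain;
        private final Event event;
        private String result;

        EventContext(List<EventFilter> filters, Event event) {
            this.chain = filters.iterator();
            this.event = event;
        }

        void proceed() {                       // hand the event to the next filter
            if (chain.hasNext()) {
                chain.next().handle(event, this);
            }
        }

        void finish(String result) { this.result = result; }
        void finish()              { this.result = "done"; }
        String getResult()         { return result; }
    }

    // Usage: filters registered in order, like the C# "+=" multicast delegate.
    //   List<EventFilter> filters = new ArrayList<EventFilter>();
    //   filters.add(validateFilter);
    //   filters.add(exceptionFilter);
    //   filters.add(storeFilter);
    //   new EventContext(filters, incomingEvent).proceed();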
  10. Using the architecture in the article, how would you introduce calculator inter-dependencies? A simple example would be validating event state before persisting it.

    In the article's architecture, it would seem that you would determine processing order through finely-grained event types -- which could lead to unnecessary bloat? (A sketch of what I mean is below.)

    Suggestions? Or have I missed something?
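    To make the dependency concrete (hypothetical calculators, not from the article): one way to order validation before persistence is to have the validator republish the event under a distinct "validated" type that the persister subscribes to, which is exactly the fine-grained-event-type approach that risks multiplying event types:

    // Hypothetical events expressing the dependency through distinct types.
    class OrderEvent { /* raw event state */ }

    class ValidatedOrderEvent {
        final OrderEvent original;
        ValidatedOrderEvent(OrderEvent original) { this.original = original; }
    }

    interface Publisher {
        void publish(Object event);
    }

    // The validator consumes raw events and republishes them as "validated",
    // so the persister never sees unvalidated state...
    class ValidationCalculator {
        private final Publisher channel;
        ValidationCalculator(Publisher channel) { this.channel = channel; }

        void onEvent(OrderEvent event) {
            if (isValid(event)) {
                channel.publish(new ValidatedOrderEvent(event));
            }
        }

        private boolean isValid(OrderEvent event) { return true; /* real checks here */ }
    }

    // ...and the persister subscribes only to ValidatedOrderEvent.
    class PersistenceCalculator {
        void onEvent(ValidatedOrderEvent event) {
            // persist event.original
        }
    }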
  11. No Middleware...

    Hm, while the architecture itself works, I wonder about the possible consequences. In a real-world example you might have various JVMs all doing the same job, so some consolidation needs to take place, surely? Meaning concurrent access to persistent storage? XA transactions? Or will data that can't be stored just get discarded? If not, where are the safeguards? At what cost? How does it scale?
  12. No Middleware...

    Hm, while the architecture itself works, I wonder about the possible consequences. In a real-world example you might have various JVMs all doing the same job, so some consolidation needs to take place, surely? Meaning concurrent access to persistent storage? XA transactions? Or will data that can't be stored just get discarded? If not, where are the safeguards? At what cost? How does it scale?
    Let the database deal with the transaction. Isn't that the best place? Besides, we are talking about a fire-and-forget kind of scenario; injecting transactions at the middleware level doesn't make a lot of sense in this case.
  13. The middle tier is by definition a layer between data source(s) and several clients. Analytical tasks (tons of DB data in, a small result out) and batch tasks (tons of select/update inside the DB) do not require a middle tier. Shovelling tons of data outside the data layer is not right in most cases, both for performance (network cost, even on the same machine) and for integrity. The urge to use OO makes it even worse -- you need O/R mapping while the data sits in an RDBMS.

    At the enterprise level the middle tier is not the ONLY access path to the data, so even the greatest Java/.NET "calculators" will become inconsistent when a flat-file batch update arrives. Triggers and views with scheduled updates are the best way to handle such tasks.

    Of course, if you have a single Java application as the only data access path and you do not like your database, use Java solutions...

    Alex V.