An Introduction to the Pervasive DataRush Framework

News: An Introduction to the Pervasive DataRush Framework

  1. This article explores how Java developers tasked with crunching gigabytes of data can quickly harness the full power of new multicore platforms using the Pervasive DataRush framework. It starts with a very simple business problem, walks through the developer's thought process during the design phase, and then provides code snippets showing, step by step, how the DataRush framework is used to rapidly build one of the hyper-parallel, auto-scaling components of the application.

    The Business Problem: Surveillance, Search and Compliance

    Our sample scenario assumes a medium-sized financial institution has tasked its compliance and risk division with building a high-performance information surveillance framework that can be repurposed for many surveillance applications. The first application of this framework will be the immediate task of detecting individuals on FBI watch-lists and/or individuals known to be associated with money laundering activities (let’s call this the “hit-list”). Bank officials need to be notified within 15 minutes of any hit-list individual conducting electronic transactions with the bank, and they want to know whether the activity is clustering in any one geographic area.

    Data Avalanche

    In our scenario, fifty thousand audit records are generated every minute by back-end legacy systems that aggregate credit card, ATM and bank teller transactions. Every ten minutes, the data is to be fed to the surveillance application in a delimited text format. The volume of transactions is expected to almost double every year.

    The hit-list of suspected felons is constantly changing, but averages 1,000 names and aliases. The hit-list is in a database which is updated in near real-time by either the FBI or the bank’s internal fraud department, so you have to pull from the hit-list every time you scan the transaction data (or apply changes to in-memory lookup tables). Further address information about each individual on the hit-list is stored in yet another delimited text file. Access to this file is logged and audited by the FBI on a monthly basis.

    It's worth considering how you'd approach the problem before reading about a potential solution, if only for comparison's sake.
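
For comparison's sake, here is a minimal single-threaded baseline for the scan described above. It is not DataRush code and does not use any DataRush API. It is a sketch under stated assumptions: the record layout `name|region|amount`, the class name `HitListScan`, and the methods `buildHitList`, `normalize` and `hitsByRegion` are all hypothetical and chosen only for illustration. The idea is to load the roughly 1,000 names and aliases into an in-memory set, then make one pass over a batch of delimited records and count matches per geographic region.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Naive single-threaded baseline (not DataRush).
 * Assumed record layout, for illustration only: name|region|amount
 */
final class HitListScan {

    private HitListScan() {}

    /** Normalizes a name so lookups ignore case and surrounding whitespace. */
    static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    /** Builds the in-memory lookup table from the names and aliases on the hit-list. */
    static Set<String> buildHitList(Collection<String> namesAndAliases) {
        Set<String> hitList = new HashSet<>();
        for (String name : namesAndAliases) {
            hitList.add(normalize(name));
        }
        return hitList;
    }

    /** Scans one batch of delimited records and counts hit-list matches per region. */
    static Map<String, Integer> hitsByRegion(List<String> records, Set<String> hitList) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split("\\|", -1);
            if (fields.length < 2) {
                continue; // skip malformed records in this sketch
            }
            if (hitList.contains(normalize(fields[0]))) {
                counts.merge(fields[1].trim(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Because the hit-list changes in near real-time, a caller would rebuild the set (or apply changes to it) before each ten-minute scan. Exact string matching on a normalized name is the simplest possible rule; real watch-list matching would likely need fuzzier comparison, which this sketch does not attempt.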

    Threaded Messages (4)

  2. advert

    This is just an advert. There are many interesting technologies and developments in this area; this describes or illuminates none of them. Cheers, Robert
  3. Re: advert

    I believe the technology being discussed is a Java-based implementation of dataflow networks, unless I misunderstand your comment?
  4. Re: advert

    You are right about one thing -- the other solutions are only "interesting" and not optimal. Find a pure-Java engine SDK that does the same breadth of processing at the same speed/scale as DataRush and post the URL. Post the URLs... apples-to-apples comparisons please.
  5. Re: advert

    You are right about one thing -- the other solutions are only "interesting" and not optimal. Find a pure-Java engine SDK that does the same breadth of processing at the same speed/scale as DataRush and post the URL. Post the URLs... apples-to-apples comparisons please.
    I have "some" experience with real-time compliance systems, which perform diversification and restriction compliance rules. This comment may be off, but using a better algorithm and/or approach should dramatically improve the throughput by at least 3-4x without having to use something like DataRush. Back in 2004 I worked on a compliance system capable of running rigorous compliance (aka all the rules for a given account) on transaction sets ranging from 5-20K transactions at a rate of 200-300/second per engine instance. With 8 engines that would roughly translate to 96K per minute. One wouldn't need to batch the transactions every ten minutes if you use a good pattern matching algorithm and break the process into stages. What did the actual compliance rule look like? The kind of compliance rules I dealt with are government regulations and diversification rules that calculate exposure based on aggregates. peter
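
The 96K figure in the comment above corresponds to the low end of the quoted rate (200 transactions per second per engine). A small sketch of the arithmetic follows, assuming throughput scales linearly with the number of engines, which the comment implies but does not demonstrate. The class and method names are hypothetical.

```java
/** Back-of-the-envelope throughput estimate; assumes linear scaling across engines. */
final class ThroughputCheck {

    private ThroughputCheck() {}

    /** Transactions per minute for a given per-engine rate and engine count. */
    static long perMinute(long perSecondPerEngine, int engines) {
        return perSecondPerEngine * 60L * engines;
    }
}
```

With 8 engines, `perMinute(200, 8)` gives 96,000 and `perMinute(300, 8)` gives 144,000, against the 50,000 audit records per minute in the article's scenario.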