Stored Procedures or Java?

Discussions

Blogs: Stored Procedures or Java?

  1. Stored Procedures or Java? (20 messages)

    Though not strictly a blog entry, this very short but very interesting question has been posed by Frank Sommers on the Artima forums. In the desire to reach the altruistic goal of platform and product independence are we throwing away technology that can help us to deliver highly scalable enterprise applications?

    Frank questions the wisdom of a Java only solution by pointing out the advantages of keeping data and processing close together.
    "Since data is often stored in a database management system, pushing as much data-related computation into the database as possible is not only efficient but is also very convenient."
    Frank uses comments from Ming Bjärevall and Gavin King to support his assertion that we (Java developers) should consider using stored procedures. Gavin is seen by Frank to lament the fact that so few Java developers even stop to think about taking advantage of the many features that each database vendor offers. On the other hand Ming does see that some groups are taking advantage of stored procedures.

    Of course when one starts relying on stored procedures then one is necessarily bound to a specific product. Though this may not be a good choice for product vendors, it seems less harmful for an in-house project team to make that choice.

    The question Frank leaves us with is,
    "how much data processing would you entrust to stored procedures, and how much would you perform in Java application logic code?"

    Threaded Messages (20)

  2. Stored Procedures or Java?[ Go to top ]

    Frank uses comments from Ming Bjärevall and Gavin King to support his assertion that we (Java developers) should consider using stored procedures. Gavin is seen by Frank to lament the fact that so few Java developers even stop to think about taking advantage of the many features that each database vendor offers. On the other hand Ming does see that some groups are taking advantage of stored procedures.

    This is old news, there's a seminar on this site from last years Java One that points out the same thing. This is the line that Sun sold us. Don't put logic in the DB, put it in the app server. It's a policy where I work. We are not allowed to use stored procedures.
    Of course when one starts relying on stored procedures then one is necessarily bound to a specific product. Though this may not be a good choice for product vendors, it seems less harmful for an in-house project team to make that choice.

    I think you can take it to far. You shouldn't put non-trivial logic in the DB. But stored procedures are a good place to create a data abstraction layer. When (if) you move to a new vendor, the DBAs re-create the SPs in the new database just like they will be re-creating all the tables and constraints.
  3. Stored Procedures or Java?[ Go to top ]

    Frank uses comments from Ming Bjärevall and Gavin King to support his assertion that we (Java developers) should consider using stored procedures. Gavin is seen by Frank to lament the fact that so few Java developers even stop to think about taking advantage of the many features that each database vendor offers.

    Didn't Fowler already beat this issue to death? about a couple or so years ago? Depending on the usage patterns, there is a fine balance between how much logic should you embed in the app server versus the database (stored procs) - and there is a case to be made for both - if you are going to be writing small applications, and you do not care about scalability (e.g. I/O latency doesn't concern you), nor do you care about portability of your app to different databases - stored procedures are probably fine, and they might even improve your apps performance.

    On the flipside, if you need scalability, and you want to minimize i/o latency, as well as bind yourself as little as possible to a particular DB, you should leave all your app logic to be performed by the app server.

    E.g. if you take the SP route too far, and your site grows in traffic, and you use your database for a whole lot of things - you will eventually make your application I/O bound - at which point you have to add more iron in order to scale. This could mean adding expensive db clusters (e.g. Oracle's RACs - or build your own I suppose).

    If you don't thrash your DB, then the worst that could happen is you'll run your app server really hot, at which point you could consider adding extra app server hardware.

    Based on your usage scenario, you may or may not encounter one or both of these bottlenecks.

    There's no hard/fast rule to this one.
  4. Stored Procedures or Java?[ Go to top ]

    Another thing Fowler always accentuates is importance of "domain model" - a proper OO design that involves persistence.
    Putting a lot of business logic into stored procedures results in a functional decomposition style with all its drawbacks.

    http://www.enterpriseware.eu
  5. Stored Procedures or Java?[ Go to top ]

    Another thing Fowler always accentuates is importance of "domain model" - a proper OO design that involves persistence.Putting a lot of business logic into stored procedures results in a functional decomposition style with all its drawbacks.http://www.enterpriseware.eu

    Is there any way to use stored procedures in OO pattern?
  6. Stored Procedures or Java?[ Go to top ]

    Is there any way to use stored procedures in OO pattern?

    IMO stored procedures belong to the category of "SQL intrusive" persistence but as you already know, OR mapping frameworks let you map stored procedures as special cases.

    Kuassi, http://db360.blogspot.com/
  7. Stored Procedures or Java?[ Go to top ]

    Is there any way to use stored procedures in OO pattern?

    Yes. Even when you prefer pure OO aproach to persistence, you can use stored procedures that return cursors. In that case stored procedure doesn't perform any business logic but can hide some complex code for data selection (eg. for legacy systems where relational model is very different from your desired OO model). In some cases this may improve performance.
  8. Stored Procedures and Java![ Go to top ]

    As discussed in my book http://www.amazon.com/gp/product/1555583296/, Java stored procedures give you the best of both worlds -- i.e., portability across RDBMSes (Oracle Db2 UDB, Db2/400, Sybase, Derby, etc) and across tiers (client, middle, and database). A preview of the first chapter is available @
    http://www.oracle.com/technology/books/pdfs/mensah_ch1.pdf.

    Kuassi, http://db360.blogspot.com/
  9. Modularity and Data Independence[ Go to top ]

    A lot of these discussions leave out two key issues: modularity and data independence.

    Regardless of what paradigms you may use (OO, Structured, etc.), a key factor for handling complexity and change is separating the system into modules with defined interfaces between client and implementation; this allows you to hide information about the actual internal implementation details from the clients. This provides a (hopefully) simpler abstraction of the capabilities and services of the module to the client, and insulates it from changes to the implementation. This insulation also frees the implementor to change the implementation. In an ideal situation, you could switch the implementation between Java, .NET, stored procedures or mainframe CICS/COBOL depending on what makes sense.

    "What makes sense" is a matter of overall efficiency and reliability. This means human resources for developing, maintaining and fixing the code, as well as storage capacity, network bandwidth and processor "horsepower" to make the system run. Scalability should be considered here for all resources to avoid being bitten in the future. (If you only have one person who knows SQL and stored procedures, the dependent capabilities have limited scalability.)

    But modularity still requires that the systems responsibilities be partitioned and assigned to modules in an effective manner, as well as making the technology selection choices for implementing the modules. These two type of decisions are interelated and cannot be made separately.

    One of the important benefits of a database management system is to provide data independence. If you make a change to the database schema for one part of an application, other parts of the application are only affected if they are using the schema elements that were actually changed. Traditional single-interface OO languages couldn't provide this type of protection, so we now have component based systems and interface inheritance in languages.

    In many RDBMSs, the query optimizer is part of the data independence, and it applies to stored procedures. When the physical or logical structure of the database changes, the DBMS automatically adjusts the code to use the most appropriate (and hopefully most efficient) access paths. This may also take into account the distribution of data values, so the execution plan varies depending on the data requested. This is virtually impossible to do in client code.

    Also, the DBMS will probably do various forms of global optimizaton such as caching, read ahead and coordinated scans. These improve performance across table or object boundaries.

    If this is not a concern, or you can achieve the same result in some other manner, fine. If there are other concerns that are more significant, let them be the primary drivers for your design. But don't discount the benefits of that (often expensive) DBMS just because of ignorance.

    Einstein said "make it as simple as possible, but not simpler." I've seen many cases of oversimplification that either caused unacceptable performance, or simply shifted the complexity somewhere else and typically made it even harder to handle. Doing both stored procedures and Java code may additional complexity or avoiding oversimplification.

    I like to think of myself as an architect and designer because I understand all these issues, and can make decisions on a case-by-case basis. An answer that is appropriate in one project and organization is not neccessarily best in another project or organization.
  10. Modularity and Data Independence[ Go to top ]

    Agree almost everything you sum up here.

    Knowing why and how to use stored procedures between Java apps and RDBMSs is still a minority among Java developers. These developers usually have deep knowledge and plenty experience in RDBMSs. They have insights on both worlds (OO and RDBMS) and can make more optimal decisions.

    I have an analogy, in the enterprise sector, most systems handle information. Data are more in the center than functionalities. Comparing to the real world, RDBMS is like wholesale, and OO app is like retail. RDBMS handles large volume of products, OO app handles one product at a time. RDBMS, as wholesaler, only deals with OO app. OO app is trusted by RDBMS. OO app deals with end users. There's a lot of details going on in dealing with each and every individual customer.

    Back to technology. In the center of relational model lies the core concept: normalization. When building a relational data structure (database), how much denormalization you want to use for which entities, is usually the most often asked questions, and is the question that demands a slow answer. The intention of O/R mapping is a direct violation of this core concept: in the database, most of the tables are normalized and never meant to be used individually and directly. Retrieving and storing data are done on a group of tables at a time. Not on just one table, not one table at a time. The lack of understanding of this concept is basicly the root cause of the failure of Entity Bean in the past, and any "technology" based on O/R mapping, including Hibernate, in the future.

    Sometimes I was scratching my head on why there are so many directly contradictions exist in the Java "technologies". For example, if, when building a class, people want to have most of the fields "private" and only export functionalities and data through "public" methods, it's hard to imagine, or impossible, just put it this way, that you can tell somebody to write a class and have all fields "public", for you to use. But the same OO people do exactly that to a database. IoC is another hot topic nowadays and some people have gone so far that they introduce as many of that as possible (not as needed) and forget IoC is just one more type of indirection. But at the same time, it's perfectly fine for them to hard bind fields in their classes directly to columns in some tables of a database, which is not even on the same physical machine, or sometimes even not on the same network (with the apps where their classes belong to).

    I would admit though, sometimes, from an OO point of view, it's hard seeing the dearly "object" gets decomposed right in front of you and "persisted" (saved) into database as individual data pieces. But as long as object-oriented database is not efficient enough for enterprise use and OO apps want to use RDBMSs, persisting object as individual pieces remains the only choice, it's just a matter of do it in a hard way or do it in a nicer way.
  11. Stored Procedures or Java?[ Go to top ]

    Though not strictly a blog entry, this very short but very interesting question has been posed by Frank Sommers on the Artima forums. In the desire to reach the altruistic goal of platform and product independence are we throwing away technology that can help us to deliver highly scalable enterprise applications?

    I doubt it, as many Java ORM products have features that greatly assist scalability.
    Gavin is seen by Frank to lament the fact that so few Java developers even stop to think about taking advantage of the many features that each database vendor offers.

    I see that as possibly the fault of the database vendors. If they provided features that conformed to standards, perhaps more of use would use them.
    Of course when one starts relying on stored procedures then one is necessarily bound to a specific product. Though this may not be a good choice for product vendors, it seems less harmful for an in-house project team to make that choice.

    Less, but in my experience, not much less harmful. Having your code tied in to a particular database vendor is not the best idea even for purely in-house work.
    The question Frank leaves us with is,
    "how much data processing would you entrust to stored procedures, and how much would you perform in Java application logic code?"

    Personally, as little as possible. My view is that putting my code into stored procedures is a form of premature optimisation - I would use it as a last resort.
  12. I guess we already have them now. As applications use middleware which moves data closer to the application server, then we basically already have this. Products like ObjectGrid and Tangosol can effectively move data into the application JVM or sometimes, the application code moves to the cache server which has the data. Thus, the application becomes a stored procedure in effect.

    ObjectGrid allows the application to register the equivalent of triggers etc using "ObjectGridEventListener" callbacks.

    The problem with current Java stored procedure support in database is that it's just Java and lots of restrictions. Products like ObjectGrid etc provide the ability to live close to the data and have the full power of J2SE, or J2EE or a cache server/app server combo means the stored procedures aren't really stored procs, they are full blown applications with the facilities to integrate with external pieces and colocate with the data.

    Moving the data to where the logic resides is a key theme that vendors are pursuing right now and for applications that can exploit this, stored procedures are kind of mute given the application basically becomes a giant stored procedure.

    Billy
    My Blog
  13. I humbly disagree with Billy. Data and process co-location is a fundamental factor, IMHO. I do realize that the data/code mobility is as old a debate as data/code duality of representation. Mobile code has served me well on many occasions. In particular, I have used, and enjoyed, the invocation service in Coherence: it gets the job done cleanly and efficiently. Its elegance is partly due to an inherent cache partitioning.

    An invocation service is not unlike a Java stored procedure, yet the concept of closure may offer a more accurate description.
  14. I think closures are different[ Go to top ]

    A closure is different though no? A closure is usually an anonymous code block with associated data. Typically languages with closures don't allow arbitrary code to run against that data, just the code block associated with the data.

    I guess you could post references to the closure in a directory and say thats what ObjectGrid partitioning or Coherence is but it feels different to me. I view the data in the partition as logically static (yes it may move but it doesn't normally). You ship the RPC to where the data is which is pretty much exactly what stored procs are.

    Billy
  15. I have been a java programmer for several years and always treated the database as the "non-intelligent" piece of the design. For the past year, I have been assigned to my first data warehouse project (denormalized star schemas, etc.). I read some of Tom Kyte's stuff on PL/SQL and Oracle and have had a change of heart, at least for data warehouse apps. Besides stored procedures, Oracle has analytical functions that allow you to handle lots of programming chores directly in the SQL. I have re-written lots of Java methods into the SQL and PL/SQL and dramatically simplified the code and sped up performance.

    Since the 1980's, I've had apps that were C with Tuxedo, then C++ with CORBA, then Java applets with AWT and Swing, then J2EE, then some Perl, PHP and .NET, with a little Ruby thrown in for good measure. The entire time, my Oracle RDBMS has been humming along in the background. I know my experience is anecdotal, but seems to me the concern for moving DBMS platforms is a bit overblown, compared to the chances of changing the application platform. If my business logic had been in PL/SQL, it would have been accessible from any of these platforms, but instead we just keep rewriting it.
  16. Stored procedures are great for processing large amounts of data plus you save on the overhead of transferring data out of the database to process it and store it back. If your organization/product has already chosen a particular database vendor and there is no need to support multiple databases you should consider using of this powerful tool.
  17. Stored Procedures or Java?[ Go to top ]

    I have collected some link about this, see if anyone interested to take a look: http://www.carfield.com.hk/weblog/database/Arguments+for+and+against+using+stored+procedures.txt?wrapped=true&
  18. I wouldn't dismiss stored procs. I believe that they should provide the data abstraction layer which will hide the murky details of where exactly the data is stored or how the different statistics are computed (do we query the OLAP tables directly or we denormalize and query the denorm tables, are we precomputing the counts or we calculate the counts on the fly?). We were traditionally avoiding stored procs but in this new product that we're working on (with requirements like to handle a TB of new data coming in each month) in order to achieve performance we had to use stored procs. Yes, we sacrificed prtability, but for the kinds of data volumes that we're dealing with there are only a handful of DB vendors that we need to support anyway.

    Maybe I'm missing something and someone can propose a better architecture - our application processes tons of EDI files in real-time, loads in DB and needs to provide real-time user-defined statistics. Also "near real-time" reporting is crucial since the app is used for business intelligence and the sooner the data is ready for reporting - the better. By using stored procs we were able to separate the DB from Java (they act as the interface to the DB) and let the Java developers optimize and deal with Java side while on the DB side the DB developers can go to great lengths optimizing the DB side of things without affecting the Java side.
  19. This might not be true in your case but if your Java/JDBC code has more than 1 SQL statement, the same code might perform faster as Java stored procedure, right in the database (with minor changes to connect string). See a basic but complete example @ http://www.oracle.com/technology/books/pdfs/mensah_ch1.pdf

    Kuassi, http://db360.blogspot.com/
  20. What about horizontal scalability?[ Go to top ]

    Using stored procedures may sometimes improve performance but what about scalability?

    Business logic layer is quite easy to scale horizontaly (app servers clusters).
    Scaling db layer horizontaly is not that easy in general. And scaling it verticaly is expensive.

    Oracle RAC 9i didn't performed well when used as a scalability mechanism (it was rather a high-availability mechanism).
    Oracle RAC 10g is definitely better but in general horizontal scalability of app servers is better than db servers.

    http://www.enterpriseware.eu
  21. What about horizontal scalability?[ Go to top ]

    Hi,

    Interesting comment. As discussed in my blog, http://db360.blogspot.com/2006/03/stored-procedure-as-database.html#links, stored procedure (data logic) and middle-tier business logic serve different purposes and have different scalability requirements.

    Having said that and before discussing discussing RAC scalability, let's say that even with a singe instance, stored procedures (Java and/or PL/SQL) scale well as far as the system permits. I quoted a slashdot.org user in chapter two of my book about how they were able to sustain 150,000 Java sessions in the database while a JDK can only sustain 3000 sessions.

    Regarding RAC, it's scalability is orthogonal to stored procedure meaning even with middle-tier business logic, if the partionning of your applications and your database design incur too much block pinging/exchange between instances, performance and scalability will suffer. A little anaysis is required.
     
    Kuassi