Beyond Persistence: How to exploit databases for better performance

Discussions


  1. Databases are optimized for fast read/write access, real-time data replication and clustering. Modern databases can serve hundreds of thousands of concurrent requests. The Java EE architecture helped build more maintainable systems by encapsulating data-access logic, and modern persistence frameworks now support a clean separation between business logic and data. From a performance perspective, however, we need to rethink this design approach.

    Cleanly designed database-driven applications often suffer from performance problems that originate in the database layer. Read the full article on how changing the way we use databases helps to improve performance.

  2. Pot to the Fire?

    Let me guess, their product will help me figure it out. 

    From what I can tell, these are DB people trying to play the role of architect without the skills or knowledge. The problem is not ORM. The problem, in part, is people assuming ORM is magic (just as people assume the database is magic, etc.).

  3. Pot to the Fire?

    They're not DB people, they just have a product, and they've highlighted one small part of it that can show what's happening in the database. An article a little further down from this suggests they're being bought for $256 million; that would be a lot of money for a company that only knew about database tuning.

    Their products cover all areas of application systems, so I'd guess they know quite a bit about architecture at all levels, not just data architecture, which is still a specialised area of architecture, just as software, infrastructure etc. can be.

    It's a common problem with ORM that code fetches raw data and processes it in the business logic tier. Sometimes that's the best way; often it misses a trick, because the logic only needs aggregated data or people don't consider using denormalisation in the database.

    Like you said, it's not a problem with ORM, just with how it's used. I don't think ORM should be your only tool in the toolbox when it comes to databases. Knowing what a database can do is a necessity before you can decide what the architecture above it should be.
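    The aggregated-data point above can be sketched in a few lines of Java. This is an illustrative comparison, not code from the article: the table, column and method names are made up, and the in-memory list stands in for rows shipped back over JDBC.

```java
import java.util.List;

public class AggregateSketch {

    // Anti-pattern: ship every row to the business tier and sum there.
    // In a real app this list would come from "SELECT amount FROM orders" via JDBC.
    static long sumInBusinessTier(List<Long> allRowsFromDb) {
        long total = 0;
        for (long amount : allRowsFromDb) {
            total += amount;               // N rows crossed the network for one number
        }
        return total;
    }

    // Better: let the database aggregate and transfer a single row back.
    static String aggregateQuery() {
        return "SELECT SUM(amount) FROM orders";
    }

    public static void main(String[] args) {
        List<Long> rows = List.of(10L, 20L, 30L);
        System.out.println(sumInBusinessTier(rows)); // 60, but N rows were moved
        System.out.println(aggregateQuery());        // one row moved instead
    }
}
```

    The first method forces every row across the network just to produce one number; the second asks the database for that number directly, which is what "the logic only needs aggregated data" boils down to.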

  4. If you use an ORM in anything but a trivial application, you have to know your database, including its concurrency model (locking, transaction isolation levels, etc.). You also have to know your ORM tool: its mapping framework, fetch plan and fetch strategy.

    In my consulting career I have walked into many applications built on an ORM framework whose developers did not know how to use it. Typical problems are excessive association modeling, N+1 queries, fetching too much data and fetching too often. For high-volume applications, ORM tools can surely do more harm than good. In those cases, JDBC is actually a much better fit.

    But used properly, ORM can improve code maintainability and testability. Quite often in unit and integration environments I use a different database than the production one: Derby for DB2, and PostgreSQL for Oracle.

    The other strengths of ORM are the ability to fine-tune the fetch strategy and tighter integration with caching frameworks.

    ORM is a way to deal with object persistence; what really matters is the process. Application profiling (SQL profiling) is the key to making sure you use the ORM properly.
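    The N+1 problem mentioned above is easiest to see by counting statements. A minimal sketch with hypothetical orders/customers tables (no real ORM or driver involved, just the statement texts a naive association fetch would generate):

```java
import java.util.ArrayList;
import java.util.List;

public class NPlusOneSketch {

    // Naive association fetch: one query for the parents, then one per parent.
    static List<String> naiveFetch(int orderCount) {
        List<String> statements = new ArrayList<>();
        statements.add("SELECT id, customer_id FROM orders");        // 1 query
        for (int i = 0; i < orderCount; i++) {
            statements.add("SELECT * FROM customers WHERE id = ?");  // + N queries
        }
        return statements;
    }

    // Join fetch: the association is resolved inside the database.
    static List<String> joinFetch() {
        return List.of(
            "SELECT o.id, c.* FROM orders o JOIN customers c ON c.id = o.customer_id");
    }

    public static void main(String[] args) {
        System.out.println(naiveFetch(3000).size()); // 3001 statements
        System.out.println(joinFetch().size());      // 1 statement
    }
}
```

    With lazy loading left at its defaults, touching the association inside a loop produces the first shape; a fetch join (e.g. JOIN FETCH in JPQL) produces the second, which is exactly the kind of thing SQL profiling makes visible.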

  5. Normally, when you need better performance, it's better to go for JDBC than to tune ORM tools, since they always add a mapping layer on top of JDBC. For online transaction processing applications, ORM reduces development and configuration time. But when there is a need to invoke complex queries, such as stored procedures or bulk inserts (for example, INSERT INTO ... SELECT), it's always better to use JDBC, as the extra query-translation layer adds to execution time. So online report processing applications might use JDBC directly rather than tools like Hibernate. Even in those applications, configuration could still be done through ORM tools, as configuration tables need better normalization and transaction handling. As far as high-volume data processing is concerned, JDBC is highly recommended over ORM tools.
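    To make the bulk-insert point concrete, here is a sketch (with invented table names) of why a set-based INSERT INTO ... SELECT beats row-by-row inserts: the difference is database round trips, because with the set-based statement the data never leaves the server.

```java
public class BulkCopySketch {

    // Row-by-row: SELECT the source rows out, then one INSERT back per row.
    static int roundTripsPerRow(int rows) {
        return 1 + rows;
    }

    // Set-based: "INSERT INTO archive_orders SELECT * FROM orders WHERE status = 'DONE'"
    // runs entirely inside the database: one round trip, no data on the wire.
    static int roundTripsSetBased() {
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsPerRow(100_000)); // 100001
        System.out.println(roundTripsSetBased());      // 1
    }
}
```

    Even when rows must originate in the application, JDBC batching (PreparedStatement.addBatch()/executeBatch()) cuts the round trips well below one per row; the fully set-based form above is the extreme case.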

  6. Normally, when you need better performance, it's better to go for JDBC

    Normally it's better to use the right tool for the job. If you're handling flat data, then sure, go for JDBC. If you're handling lots of interrelated objects, then JDBC is not better at all, since it becomes a trade-off between the developer time needed to write the persistence solution and any possible performance benefit (which isn't guaranteed, since complexity goes up).

  7. bind variables

    The example in your main article shows lots of similar statements because no bind variables are used (and no PreparedStatements). But your article says nothing about that.
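    A sketch of why that matters, using an invented customers-by-id query: concatenating literals produces a distinct SQL text per value, which is exactly what shows up as "lots of similar statements" in a profiler and forces the database to parse and plan each one, while a bind variable keeps one shared text (and one cached plan).

```java
import java.util.HashSet;
import java.util.Set;

public class BindVariableSketch {

    // Concatenated literals: each id yields a brand-new SQL text.
    static Set<String> literalStatements(int[] ids) {
        Set<String> distinct = new HashSet<>();
        for (int id : ids) {
            distinct.add("SELECT * FROM customers WHERE id = " + id);
        }
        return distinct;
    }

    // Bind variable: one shared text regardless of the value.
    static Set<String> boundStatements(int[] ids) {
        Set<String> distinct = new HashSet<>();
        for (int id : ids) {
            distinct.add("SELECT * FROM customers WHERE id = ?");
        }
        return distinct;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5};
        System.out.println(literalStatements(ids).size()); // 5 distinct statements
        System.out.println(boundStatements(ids).size());   // 1 shared statement
    }
}
```

    In real JDBC the second shape is conn.prepareStatement("SELECT * FROM customers WHERE id = ?") followed by ps.setInt(1, id) per execution.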

  8. Hi,

    I think the article is flawed.

    1. In response to an earlier comment “They're not DB people, they just have a product and they've highlighted one small part of their product that can show what's happening in the database.”

    I can obtain very similar information from the database server query log or other tools. There is nothing special about the highlighted feature and the example is biased.


    2. In the article the diagram displays '3,510 rows found'. The next sentence reads:-

    "As shown above, to process one web request this application is making over 3000 database calls."

    When the statement is read in isolation, given that the context of the article is "How to Exploit Your Database for Better Performance", an application that makes 3,000 database calls to service one HTTP request would trigger one very large alarm bell!

    3. Then, when I did correlate the statement to the diagram, maybe the statement made by the author was just incorrect?

    If 3,510 rows were returned from the database, surely that does not mean an equivalent 3,000-odd queries were executed?

    A strategy of treating the database as 'just a data store' is OK. I think the article is flawed, as it assumes that this strategy inherently produces large volumes of database queries per web request. The author also implies the architect's strategy is at fault.

    It is also not clear whether the example provided was for a typical application request or for a reporting request. If the example is for a reporting request, the author ends the article by suggesting complex reporting queries can be handled more efficiently in the database tier. So the most complex request may have been used as the example, and architects' strategies scrutinised, but the author concludes by saying it doesn't matter anyway because it should be a stored procedure.

    I suggest the performance problems stem not from the storage strategy or the products, but from developers' lack of knowledge of O/R and other tools when writing code that accesses the database.

    I have witnessed this many times, most recently in a third-party-supplied web application highlighting this exact issue. It had nothing to do with the database, architect or server.

  9. Hi,

    I think the article is flawed.

    1. In response to an earlier comment “They're not DB people, they just have a product and they've highlighted one small part of their product that can show what's happening in the database.”

    I can obtain very similar information from the database server query log or other tools. There is nothing special about the highlighted feature and the example is biased.

    Of course you can, nobody is suggesting the product is magical.

    You can look at code in vi or Notepad, it doesn't make an IDE redundant. 

    But such products bring the information together so you can collate it and drill down through it. And as I said, looking at database queries is only a small part of the product.

    2. In the article the diagram displays '3,510 rows found'. The next sentence reads:-

    "As shown above, to process one web request this application is making over 3000 database calls."

    When the statement is read in isolation, given that the context of the article is "How to Exploit Your Database for Better Performance", an application that makes 3,000 database calls to service one HTTP request would trigger one very large alarm bell!

    Very true, but how would you find out that one HTTP request had triggered 3,000 database calls? That's why there's a market for products like DynaTrace.

    3. Then, when I did correlate the statement to the diagram, maybe the statement made by the author was just incorrect?

    If 3,510 rows were returned from the database, surely that does not mean an equivalent 3,000-odd queries were executed?

    The rows refer to the queries/updates executed; since most databases store this information in tables anyway, the 3,510 rows will be the database's audit of SQL statements, nothing related to the business data involved.

    So in this case, 3000 odd queries were indeed executed.

    A strategy of treating the database as 'just a data store' is OK. I think the article is flawed, as it assumes that this strategy inherently produces large volumes of database queries per web request. The author also implies the architect's strategy is at fault.

    It is also not clear whether the example provided was for a typical application request or for a reporting request. If the example is for a reporting request, the author ends the article by suggesting complex reporting queries can be handled more efficiently in the database tier. So the most complex request may have been used as the example, and architects' strategies scrutinised, but the author concludes by saying it doesn't matter anyway because it should be a stored procedure.

    But the example perhaps showed that the architecture in this particular use case was wrong. And if that's the case, it doesn't mean the architecture is wrong, just that it doesn't apply to 100% of cases.

    I'd never design reporting solutions using pure ORM unless the queries were very simple and low volume; it's much more likely that optimised SQL or stored procedures would be needed for heavy reports.

    I suggest the performance problems stem not from the storage strategy or the products, but from developers' lack of knowledge of O/R and other tools when writing code that accesses the database.

    Completely agree. My concern with ORM is that it is used without sufficient knowledge of databases. 

    I have witnessed this many times, most recently in a third-party-supplied web application highlighting this exact issue. It had nothing to do with the database, architect or server.

    And if it was a third-party app, imagine how difficult it would be to find the problem without monitoring. You might not have the source code, and the supplier might not have a database similar to your production system with which to replicate the problem.

  10. I agree with most of your comments Mike.

    But I'd just like to pick you up on one of them. (IMO) It doesn't matter whether a business method using an O/R layer is invoked from an HTTP request or some other client/protocol. When the services are written and deployed, the generated SQL should be tested.

    In the past I have enabled database query logging/slow-query logging and analysed the output when a business method was invoked with the possible sets of parameters.

    Any suspicious-looking queries, whether in their construction or in volumes that contradict what was expected, can indicate that inefficiency has been configured into the O/R mappings, and this can be revised.

    I have no opinion on the DynaTrace products as I've never seen or used them. I'm sure they are useful.

    This article takes a long way around to say (IMO): choose O/R tooling for the right situation and know what you are doing.

    I just didn't like the way the author implied that keeping business logic out of the database inherently means thousands of database calls; this simply isn't true.

  11. Fair enough Rob, your comments are well written. I've probably become a bit obsessed after seeing ORM code hammer the database when a simple bit of well-thought-out SQL would be better.

    And I'm talking about SQL, not stored procedures. I'm not completely against stored procedures but prefer them for simplifying persistence, not holding business logic, so I agree with you on that too.

    "choose O/R tooling for the right situation and know what you are doing"

    I think that's exactly what's required, and I wish this message were pushed more in the ORM world. No tool, whether DynaTrace or Hibernate, can turn bad code into good.

    Cheers

    Mike