Embedded Databases: A Performance Comparison

Home

News: Embedded Databases: A Performance Comparison

  1. I've recently conducted a little performance comparison of 3 pure-Java, embedded databases (HSQLDB, Derby and One$DB) and MySQL. To make this information available to a wider audience I have written an article which presents my benchmarking method, the results, graphs and conclusions. The benchmarking code I used is also available for download from within the article.
    Choosing a database is not just about performance—but that is usually one of the criteria. It seems fairly clear that any of the embedded databases will have a significant performance advantage over MySQL.

    For me, the winner is HSQLDB, particularly due to its wonderfully fast INSERTs performance, which will be the bulk of the work required from the database. The SELECT performance would also appear to be excellant when the SHUTDOWN times are excluded. Since MySQL possesses useful GIS support, and better integration with external reporting tools, I won’t be completely leaving it behind. My intention is to use HSQLDB during the simulation runs for input and output data, but then load the data into MySQL when performing the analysis of the experiments.

    Read Performance Benchmarking of Embedded Databases.

    Threaded Messages (35)

  2. interesting[ Go to top ]

    although the results are not conclusive, they are still interesting and informative. thanks for posting it.
  3. HSQLDB and OpenOffice.org[ Go to top ]

    OOo developers contacted the HSQLDB folks in early 2004 with a view to embed HSQLDB into OpenOffice.org. A lot of work has been done, resulting in the latest pre-release versions of OOo and HSQLDB working together as a powerful database development and query environment that has the potential to rival any commercial alternatives.

    While OOo development is funded by its sponsors, HSQLDB is an independent project run by individuals and receives no external funding. The integration with OpenOffice.org has so far required several unpaid man-months of extra work by the maintainer and lead developer of the HSQLDB project. There are still features that remain to be done.

    Unless immediate funding becomes available, the core engine work will be postponed, resulting in less-than-wholehearted acceptance and not-so-glowing reviews when OOo 2.00 is released.

    You can help now! Please see the HSQLDB-OOo page to learn how.
  4. Interesting read. I was hoping there for a moment that the comparison would have a heavier emphasis on embedded criteria, where navigational ISAM style engines could be reviewed. Oh well, I'll have to keep holding my breath.

    I wonder if I'm the only person who'se been looking for an embedded DB engine with the following criteria: (1) transactional, (2) navigational/ISAM with single data file, (3) SQL support for when necessary, (4) update thousands of rows per second, (5) navigate/traverse hundreds of thousands of rows per second.

    I can't be the only person who misses the performance of the xBase days, but wants modern features like single-file footprint, transaction isolation (real ACID, not just Atomic like Prevayler, but C-I-D as well), a nagivational API like Borland BDE, and a SQL interface as a convenience only.

    Products like VistaDB (a .NET engine) play in this space, but I was hoping for something done in Java, and I am also hoping for more options.
  5. Berkley DB should provide all of the ISAM bits you want, but doesn't have an SQL interface.

    I'll bet if you poke around in something like Derby, you'll find a block level ISAM model buried inside of it that wouldn't take much uplifting to expose properly, though you may not be able to easily seperate it from its transactional bits. The old Informix Standard Engine SQL database was a layer on top of its C-ISAM routines, and you could use those C-ISAM routines directly against the SQL database files if you were so motivated.

    But I certainly hear your plea. Particularly for embedded applications, an ISAM DB can be very performant.

    My biggest gripe with them Back In The Day was the lack of transactional support (particularly since with the platform I was using, the coder was resonsible not just for the rows in the tables, but maintaining the indexes as well). Any minor crash of an app or the DB core typically meant an index rebuild. Blah!

    But Ye Olde XBase is basically purty durn fast.
  6. I can't be the only person who misses the performance of the xBase days, but wants modern features like ...

    +1

    Yes, sometimes I miss FoxPro Rushmore amazing speed. Complex SQL queries (a kind of limited but usually enought SQL syntax) and hundreds of thousands of rows processed in few seconds second.

    Sometimes I wonder that a simple storage + index engine like this could easily break any other DB performance mark. And the (distributed) transaction processing could be delegated to a server side Java tier like, hummm... Coherence ?
    Products like VistaDB (a .NET engine) play in this space, but I was hoping for something done in Java, and I am also hoping for more options.

    There also http://www.codebase.com. But its does not support FoxPro CDX index formats (they have their own query / index optimized engine). They have both simple navigational API and a JDBC interface. The core engine is in C, off course.

    BTW, http://www.sqlite.org should be in the benchmark test above.
  7. CodeBase, BerkeleyDB[ Go to top ]

    Thanks for the suggestions.

    I'm a CodeBase customer, and I don't see Sequiter innovating at all. So much potential, same old stuff as 6(,8,10?) years ago. None of the modern necessities I spoke of and others alluded to, such as single file footprint (consistency problems & XA log implied), language binding without dragging in a whole middle-tier, etc. But maybe nice if you have an English C app that needs bitmapped indexes without ACID.

    I remember looking at BerkeleyDB for embedding, but I seem to recall the license being the issue. Maybe unicode was an issue as well, I can't remember.

    And BTW, VistaDB does not support unicode either, which is strange for a newcomer to the market. But VistaDB performance is just about right... I especially like how performance degrades less until you add concurrent operations... good performance profile for embedding I think, especially for a single-user desktop app.

    I remember the day when InterBase was the big hope for embedding. They were going to ship an Express ISAM BDE driver that supported direct navigational operation over the wire, concurrent with other SQL users. But the developer quit right around the time Msft recruited what's-his-name the Delphi architect, so the ISAM driver never got built. Apparently the engine was in good shape already though, being multi-generational which is a good lock model for mixed ISAM/SQL work IMHO.

    I have high hopes a new player will come onto the scene and create the next great embedded DB for Java and/or mono. Too many though are doing what's already been done... creating yet another relational engine, then targeting the embedded market with single-user performance about 2-3 orders of magnitude below ISAM engines.
  8. Take a look on Firbird now.[ Go to top ]

    Yes, Interbase was a hope...
    Not it's better watch for a free flavor - Firebird. They ready to release v. 2.0.
    Also take a look on Vulcan subproject. It looks very interesting and it will be merged in main codebase at v.3
  9. See JDBM (http://jdbm.sourceforge.net/)
  10. http://www.db4o.com/

    Yup - java, and yup - embedded.
  11. Berkeley DB[ Go to top ]

    Berkley DB should provide all of the ISAM bits you want, but doesn't have an SQL interface.

    I was going to suggest the same thing. u can find it at www.sleepycat.com
  12. Axion[ Go to top ]

    Very interesting article. It gives an idea for databases related to cca. 20k rows. I agree some apps are related this amount of rows. Will be fine to see another tests with said to 100k, 1M, 10M and 100M rows. ;-)

    If posible please also add there PostgreSQL and Axion currently in Apache incubation (http://incubator.apache.org/projects/axion.html).

    Thanks again for the post.
  13. Scalability[ Go to top ]

    Very interesting article. It gives an idea for databases related to cca. 20k rows. I agree some apps are related this amount of rows. Will be fine to see another tests with said to 100k, 1M, 10M and 100M rows. ;-)

    :nod: :nod: :nod:

    Sclability is a very important factor for an RDBMS. Forgive me, if wrong, but my impression, from a quick look, is that you tested on data volumes of the order of mangitude of tens of thousand records.

    You do, indeed, evaluate the dependency nature and conclude that all of them seem linear, but this to give a clue what the picture will be on millions, billions and zillions of records, you need to prove that rules of Mathematical Induction do apply to the databases' scalability. That is not a fact. The performance on the embeded databases may degrade on large data, or they may crash, altogether.

    So, while this is very interesting, indeed, it can not be used as a proof that MySQL performs worse than the embeded databases.

    From my limited experience, and what I've heard from others - MySQL performance is really superb, on large-volume data, too.

    Thank you for the article.
  14. Scalability[ Go to top ]

    ...clue what the picture will be on millions, billions and zillions of records...

    For that you don't need benchmarks :-). You use something like
    Oracle, DB2 or whatever. Installed in server mode. Not embedded, not java database. Or at least I suppose so.
  15. Scalability[ Go to top ]

    That seems quite reasonable and it's definitely what I and most people would do.
    On the other hand and if I'm not mistaken, Google has managed to build it's indexing and searching engine on Berkeley DB - quite an impressive architectural feat.

    Regarding embedded databases, I suggest SQLite at http://sqlite.org/ ; it's C, _very_ fast, ACID, SQL-only and talks to C, C++, Ada, ODBC, Delphi, Lua, Objective-C, .NET, Java, Perl, PHP, Python, Ruby, OCaml and functional and non-functional languages. It also embeds (pun intented) some interesting concepts like a virtual machine that executes queries and you can see the "programs" that the parser produces for your queries. With a bit of work you can even access it from different programs simultaneously. It's actively developed and enhanced and also very mature.
  16. Scalability[ Go to top ]

    You do, indeed, evaluate the dependency nature and conclude that all of them seem linear, but this to give a clue what the picture will be on millions, billions and zillions of records, you need to prove that rules of Mathematical Induction do apply to the databases' scalability.

    You're quite right. What I actually said in the article was (emphasis added):
    All of the databases appear to scale linearly although it is not possible to be sure with the experiments performed here
    .

    Also, regarding MySQL performance. I have nothing against MySQL and, if it wasn't for my deployment scenario, I'd happily continue using this database. I think we all know that MySQL is scalable and performant but I required an embedded database, which MySQL is not. Under the very naive and blantently unoptimised tests I performed, MySQL seemed to come out badly. However, in a real world application, the MySQL performance could be easily optimised.

    Thanks for the feedback!
  17. One thing that does seem apparent (from these number, anyway) is that Derby is surprisingly fast.
  18. Derby i found ok[ Go to top ]

    I just started using derby in my project i found its good in performance (means its ok for my requirement went around 1400 batch inserts/sec) and java stored procedures helped me a lot
    from migrating my java code which used postgres in prior..

    Derby's project documentation is better but can be made more better by giving some detailed user's document

    -GK.Sezhian
  19. Choosing a database is not just about performance—but that is usually one of the criteria. It seems fairly clear that any of the embedded databases will have a significant performance advantage over MySQL.For me, the winner is HSQLDB, particularly due to its wonderfully fast INSERTs performance, which will be the bulk of the work required from the database. The SELECT performance would also appear to be excellent when the SHUTDOWN times are excluded.
    I don't think that create/drop tables is a very useful criterion for comparing DBs. Multiple insert time is an ambiguous benchmark too, because a good RDBMS may need to perform additional work in order to minimize select times or update a transaction log. Moreover only Derby may be considered satisfactory choice from the abovementioned list for use with J2EE applications, cause HSQLDB and MySQL transactions/concurrency support is insufficient.

    Regards,
    Theodore Kupolov
  20. J2EE applications, cause HSQLDB and MySQL transactions/concurrency support is insufficient. Regards,Theodore Kupolov
    I am lost here. What do you mean? Why is "transactions/concurrency" support insufficient in MySQL?

    Also, if we are talking about Java/J2EE, it's not necessarily the database who has to support transaction isolation levels (if that's what you meant), Hibernate provides that out-of-the-box or you can use your App. Server container capabilities, etc. etc. etc.

    Though, it is still obscure, to me, what do you find insufficient in MySQL.
  21. How pathetic is it to use something like Hibernate to provide the required transaction isolation for your data ?
    And even if in this fantasy world someone authorises a project to run on a database relying on 3rd party for proper transactional support, what do you do when you need to view or god forbid alter data from a session not protected by every java script kiddie's favourite pet (aka hibernate) ? Who's going to provide required isolation then ? Crystal Reports ? /usr/bin/mysql ? The ODBC driver ?

    How pathetic is a database how's been around for at least a decade and has failed to produce convincing results in supporting transactions ? Read the comments down at http://dev.mysql.com/doc/mysql/en/ansi-diff-transactions.html - really worrying. Does mysql not use a transaction log ?

    How pathetic is a database that has had transaction, trigger, view, subquery support for about 1/100th of its lifetime ?

    How pathetic is a database that has no stored procedures ?
    That has horrible bugs like re-issuing the auto-increment value after rollback/shutdown/restart ?
    That simply crashes on querys and leaves you no choice apart from restoring previous backup on another machine and upgrade ?

    How pathetic is to look up to a pet database, to drool over the features and stability of something probably done as R&D by summer interns at SAP (namely SAP-DB) ?

    These are only a part of the flagrant, in your face flaws. Do not even get me started on what's nice and cool and productive about other databases, while mysql does not even know when it's going to go to sleep and begin dreaming about going to sleep and begin dreaming about them. Async IO ? Tablespaces ? PITR ? Analytics ? Materialised views ? Partitioning ? Unicode is available starting with 4.1.1 - the very latest version.

    By the way, did you know that mysql is GPL ? That means that if you write a piece of software that uses it and it connects through either their JDBC driver, .NET or the native C interface, you're linking to GPL code and your as* is GPL as well:
    By downloading the software from this page, you acknowledge that the software available from here is licensed under the GPL. We advise that you review the GPL before downloading.
    If you need commercial, non-GPL, licenses, you can order them online.

    Why would anyone in the right mind choose to pay for mysql when they can use Postgresql for free, which is a decade ahead of mysql in terms of features and maturity thereof ?

    Granted, MySQL AB guys know their shit, they're a lot better at doing business, creating hype and buy-in than say people involved with PostgreSQL or Firebird. Granted, it's hard to find any ISP that does not provide mysql. But by god that doesn't make it right. It's like saying right, fast foods are everywhere so they must be just right for me, cause like, you know, they're like easy and cheap and shit. Well, guess again.

    The world does not start and end with your hibernate. Nor with your uber potent jboss j2ee server. Also it seems that you're not aware that application servers mediates 2PC between parties that have to have proper transactional support at a minimum (real databases do) and moreover they need to be acknowledge the fact that they're part of a distributed transaction.

    Remember, data is here to stay.
    Careful where you choose to store it, it's definitely the only thing that's going to survive from the project, if anything at all.
  22. How pathetic is it to use something like Hibernate to provide the required transaction isolation for your data ?And even if in this fantasy world someone authorises a project to run on a database relying on 3rd party for proper transactional support
    Hibernate does *not* implement transaction isolation. It *delegates* to either the JDBC API or the JTA implementation.
  23. Hibernate does *not* implement transaction isolation. It *delegates* to either the JDBC API or the JTA implementation.
    ...which sounds entirely reasonable and how it's supposed to be.
    My experience with Hibernate is shall we say less than extensive, so I assumed he was referring to some optimistic locking scheme implemented inside Hibernate.
  24. MySQL software does not need necessarily to be GPL. You distribute your own software (no MySQL, no MySQL drivers etc.) and user can obtain the license. In fact MySQL has dual licensing scheme and you can rely on second one.

    So you can (if you want to be completely legal) obtain a single license for your own development.
  25. SAP DB[ Go to top ]

    How pathetic is to look up to a pet database, to drool over the features and stability of something probably done as R&D by summer interns at SAP (namely SAP-DB) ?

    Actually SAP DB is quite an old DBMS, formmerly known as ADABAS. Check out this presentation for the history of SAP DB

    Not that it matters much, especially when the only recent changes in this database are related to its name... (adabas --> sap db --> max db)
  26. I wonder if anyone has done studies on embedded OODBMS performance and compare it to RDBMS? Many people would probably think an RDBMS would smoke the crap out of an OODBMS, but maybe not so for all DB operations. Any ideas?
  27. As soon as the schema gets complex, OODBMS will outperform any relational DB when navigating/querying and focusing on insert operations is just one side of the coin.

    If you are interested in testing an embedded pur java OODBMS , take a look at FastObjects j2 MicrodDatabase and its free community version "FastObjects JDO community edition".
    Both can be downloaded from http://www.versant.com/downloads

    Thank you for the article.


    Ulf Schenk
    Versant Corp.
  28. As soon as the schema gets complex, OODBMS will outperform any relational DB when navigating/querying and focusing on insert operations is just one side of the coin.

    That statement has absolutely no relevance unless you clearly define the scenarios where it might be true.

    I would have sincerely expected more from an employee working at such a big name company and I do hope you're not implying that a OODBMS would be faster in any scenario over a RDBMS.
  29. Just to add to the frey - I have been truly impressed by PERST beautifully written code and incredible performance. Check it out.

    Gad Barnea
    GigaSpaces, Inc.
  30. I still remember the sound of my jaw hitting the keyboard while reading Konstantin's CV about a year ago.
  31. memory usage with hypersql[ Go to top ]

    i was doing a project that required me to import all known DoD PKI certificate revocation data into a database. this means very large tables of very simple data. it used hypersql as the default embedded database (zero-config setup) with the option to configure it to use Oracle if you were so inclined.

    - oracle was much slower on inserts and updates. but i could put in ANY amount of data into the database while getting constant memory usage no matter how much data I put into it (5 million rows of data for instance). remember that Oracle has all of these binary files to page data out of memory.

    - hypersql was *much* faster until i started getting into millions of rows. the VM would run out of memory when there were millions of rows in the database - even after maxing out the VM size to 1024m. this version of hypersql did not page anything out to the database. what it stored on disk was just every SQL statement applied to it since it was initially started. note that hypersql became infeasible for my project as i approached getting all the data in the database...because in-memory usage is proportional to all the data in the database... not just what you might be using in your queries.

    So clearly, you *must* use a database that will page data out of memory when the amount of data in the database will exceed the memory on your machine. but it's slower.

    So, if you are using a database that doesn't page out to disk, then you essentially have an IN MEMORY database. Note that this does not mean that when the process goes down, you have lost your data... when data is changed, the transaction is logged on the disk.

    *if* you use hypersql with hibernate, then you have in-memory-persisistence that is trying as hard as possible to look like persistent objects. in that case, you begin to wonder if you can actually *increase* the capacity of your database and reduce overhead dramatically by using something simple like Prevayler (whose main limitation is that it's in-memory object persistence...implemented as a POJO change log to disk). you would not have all the indirection and overhead of the RDBMS since joining would be done by following references, and there would be no OR translation going on. essentially, hibernate+hypersql (or any non-paged SQL database) makes the case for just using a simple in-memory POJO persistence mechanism.
      
    it seems that hypersql with hibernate's primary advantage comes into play if you are actually taking advantage of the fact that the database is relational in some places. the second advantage of hibernate and hypersql is that your implementation would not be forced to change if your database had to support much more persistent data than you have available memory....you could easily switch to another underlying database while still providing hypersql as the embedded default.
  32. Did you really mean hypersql or hypersonic sql / hsqldb ?
  33. yes.. i mean hsqldb[ Go to top ]

    i think that the hypersonic name is just the name of the project before they started calling hsqldb... the version i was using was the latest as of some time last summer. hsqldb's greatest strength is that it's SO EASY to set up (hats off to them for that!)...especially if you want to make an embedded setup that can be reconfigured to point to another database only in the event that it becomes a REAL requirement. (not just a percieved, or projected requirement.)

    unless hsqldb has changed fundamentally in the last few months... if you go digging around in hsqldb's disk files, it becomes obvious that you have a database whose total size is constrained by the amount of RAM on your machine. i mean... it's a properties file, and a file full of SQL statements that start off making the tables, then doing all the inserts. that's why there's no directory full of obscure binary files; hsqldb starts up by iterating all of the SQL statements in its startup file to get it into its initial state...which can be a lengthy startup, btw. if you have more data than you have RAM, then you can't start up the database because there's nowhere to swap out unused data to.

    if you are doing this in conjunction with an OR layer, you really start wondering if you should have just used something direct and dead simple like Prevalyer to do the task. ... of course, if you do go with something like that, then you are REALLY committed to staying within RAM imposed size constraints now. but think about how much memory and CPU all that relational to OR mapping uses versus POJOs that are just writing change logs and snapshots to maintain persistence.
  34. Hi Robert,

    Agreed, this kind of in-memory application is different from most transactional business application requirements.

    Just to let you know however, that O/R and JDO in particular are actually rather efficient - data transfer with jdo is fully compiled, so it takes very excess cycles to transfer data from the object to the JDBC driver.

    Memory overhead also is minimal, and is mainly libraries rather than per-instance overhead.

    By comparison, simpler POJO persisters, Prevayler, HB etc are all running thru property/ reflection layers. Properties are less than orthogonal to the model, and change detection can become expensive if snapshots & field comparisons are required.

    And as we saw with Jamie's article, even such a trivial performance test exposed differences & mapping issues between the different databases.


    Cheers,
    Thomas Whitmore
    www.powermapjdo.com
  35. yes.. i mean hsqldb[ Go to top ]

    i think that the hypersonic name is just the name of the project before they started calling hsqldb... the version i was using was the latest as of some time last summer. hsqldb's greatest strength is that it's SO EASY to set up (hats off to them for that!)...especially if you want to make an embedded setup that can be reconfigured to point to another database only in the event that it becomes a REAL requirement. (not just a percieved, or projected requirement.)unless hsqldb has changed fundamentally in the last few months... if you go digging around in hsqldb's disk files, it becomes obvious that you have a database whose total size is constrained by the amount of RAM on your machine. i mean... it's a properties file, and a file full of SQL statements that start off making the tables, then doing all the inserts. that's why there's no directory full of obscure binary files; hsqldb starts up by iterating all of the SQL statements in its startup file to get it into its initial state...which can be a lengthy startup, btw. if you have more data than you have RAM, then you can't start up the database because there's nowhere to swap out unused data to. if you are doing this in conjunction with an OR layer, you really start wondering if you should have just used something direct and dead simple like Prevalyer to do the task. ... of course, if you do go with something like that, then you are REALLY committed to staying within RAM imposed size constraints now. but think about how much memory and CPU all that relational to OR mapping uses versus POJOs that are just writing change logs and snapshots to maintain persistence.

    If you are really considering using Prevalyer you should read Mike Spille's comments on Prevayler http://www.pyrasun.com/mike/mt/archives/2004/12/25/15.02.00/ plus personaly i think Persisted Object Tree is better. pot.forgeahead.hu
  36. Embedded Performance[ Go to top ]

    For performance, Raima has RDM Embedded which has full multi-core support, in-memory capabilities and b-trees & hash indexing with a small footprint. 

     

    Although it is based on performance, you can test the benchmarks through the evaluation download offered on the website.