Another day, another benchmark

  1. Another day, another benchmark (21 messages)

    Benchmarks were made to be broken, or: Another day, another benchmark. [Courtesy of Tim Fox]

    HornetQ 2.0.GA obtained scores up to 307% higher than previously published SPECjms2007 benchmark results, on the same server hardware and operating system set-up. The peer-reviewed results are available on the spec.org web-site: http://www.spec.org/jms2007/results/jms2007.html

    The results were obtained by Kai Sachs and Stefan Appel from an independent research group at the TU Darmstadt, Germany. Their release announcement can be found here: http://www.dvs.tu-darmstadt.de/news/specjms2007Results_HornetQ.html

    Work is currently under way on HornetQ 2.1, which includes another round of enhancements to take performance to yet another level. For more information on HornetQ, please see the web site http://hornetq.org

    SPEC® and the benchmark name SPECjms2007® are registered trademarks of the Standard Performance Evaluation Corporation. The results used in the above comparison refer to submissions made on 17 Sep 2009 and 20 Jan 2010 by TU Darmstadt.

    Threaded Messages (21)

  2. Benchmarks were made to be broken?

    Anyone can look at SpecJMS and see what the benchmark is about. Unlike other vendors, who simply say they are the fastest without providing any facts such as guarantees of persistence and delivery, besides showing fancy graphs from amateur benchmarks run in the cloud, SpecJMS is peer reviewed by the industry. Our competitors evaluated the results before they were submitted. No surprises and no arguments here. The results are real. It's easy to write a benchmark that ignores every rule and says you are the fastest. This is very different with SpecJMS. So, in regard to the "Benchmarks were made to be broken"... please don't confuse amateur benchmarks with serious ones in which the whole industry participates. SpecJMS is a serious benchmark, and the results actually show how fast HornetQ is. If you doubt it, try it yourself.
  3. It's easy to write a benchmark that ignores every rule and says you are the fastest. This is very different with SpecJMS.
    So, in regard to the "Benchmarks were made to be broken"... please don't confuse amateur benchmarks with serious ones in which the whole industry participates.
    Exactly, the whole industry. And where's the comparison against the rest of the industry? Only one carefully picked contestant (ActiveMQ)?
  4. Exactly, the whole industry. And where's the comparison against the rest of the industry? Only one carefully picked contestant (ActiveMQ)?
    The benchmark is open. Anyone can publish. You should ask them why they didn't publish their results.


  5. The benchmark is open. Anyone can publish.

    You should ask them why they didn't publish their results.
    Why don't you do that instead? Comparing your HornetQ to only a single product hardly seems valid proof of your statement.
  6. Benchmark comparisons



    The benchmark is open. Anyone can publish.

    You should ask them why they didn't publish their results.

    Why don't you do that instead? Comparing your HornetQ to only a single product hardly seems valid proof of your statement.
    The results were submitted by an independent third party, a research group at TU Darmstadt, not by JBoss / Red Hat - so it's not us who chose to do the comparing. However, I do agree that what we really need is a comparison between all the major players in the messaging space. Hopefully we'll get more results out soon, although we are limited by license restrictions with most proprietary messaging systems - usually they prohibit benchmark results from being attributed to their systems, so this can be a little tricky.
  7. besides showing fancy graphs from amateur benchmarks run in the cloud.
    At least you should pass these "amateur benchmarks", right? I gave it a try and tested HornetQ 2.0.0 GA with a very simple P2P test: 1 sender, 1 receiver, 1 queue, 1 KB messages, non-persistent, non-transacted, auto-ack. I've tested it on my local box, an iMac, with Sonic's TestHarness.

    Here is what I got at the receiver side (msgs/sec): 12719,44 13297,59 6661,73 4167,77 2127,30 1343,78 1035,23 664,44 572,83 890,77

    I was connected with jconsole. HQ's heap bumped up to 1 GB, CPU to 50%, and I got an OOM:

    [New I/O server worker #1-1] 11:50:13,571 WARNING [org.jboss.netty.channel.socket.nio.NioWorker] Unexpected exception in the selector loop. java.lang.OutOfMemoryError: GC overhead limit exceeded

    I ran the same test on the same box with SwiftMQ. Here is the result (msgs/sec): 29791,93 32654,25 33764,00 33024,23 33487,53 33094,20 32294,93 33471,34 33531,55 32480,37

    jconsole stats: heap between 30 and 50 MB (!), CPU 38%.
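
    For reference, a 1:1 point-to-point test of the kind described above can be sketched in plain JMS roughly as follows. The JNDI names ("ConnectionFactory", "queue/testQueue") are placeholders rather than anything taken from this thread, and any JMS 1.1 provider could sit behind them:

        import javax.jms.*;
        import javax.naming.InitialContext;

        // Minimal 1:1 point-to-point throughput sketch: one producer, one consumer, one queue,
        // 1 KB non-persistent messages, non-transacted sessions, AUTO_ACKNOWLEDGE.
        public class SimpleP2PTest {
            private static final int MESSAGES = 100_000;

            public static void main(String[] args) throws Exception {
                InitialContext ctx = new InitialContext();       // provider-specific jndi.properties assumed
                ConnectionFactory cf = (ConnectionFactory) ctx.lookup("ConnectionFactory"); // placeholder name
                Queue queue = (Queue) ctx.lookup("queue/testQueue");                        // placeholder name

                Connection connection = cf.createConnection();
                connection.start();

                // Consumer thread: drains the queue and reports the receive rate.
                Thread consumer = new Thread(() -> {
                    try {
                        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                        MessageConsumer messageConsumer = session.createConsumer(queue);
                        long start = System.nanoTime();
                        for (int i = 0; i < MESSAGES; i++) {
                            messageConsumer.receive();
                        }
                        double seconds = (System.nanoTime() - start) / 1e9;
                        System.out.printf("received %.2f msgs/sec%n", MESSAGES / seconds);
                    } catch (JMSException e) {
                        e.printStackTrace();
                    }
                });
                consumer.start();

                // Producer: sends 1 KB non-persistent BytesMessages as fast as it can.
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
                byte[] payload = new byte[1024];
                for (int i = 0; i < MESSAGES; i++) {
                    BytesMessage message = session.createBytesMessage();
                    message.writeBytes(payload);
                    producer.send(message);
                }

                consumer.join();
                connection.close();
            }
        }
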
  8. Re: Benchmarks were made to be broken?

    besides showing fancy graphs from amateur benchmarks run in the cloud.


    At least you should pass these "amateur benchmarks", right?
    That's interesting, Andreas. We've actually tested against SwiftMQ internally, and our results are *very* different to yours. However, our tests were done on real servers, with a real network in between, and they use a more "real world" workload - not just one producer / one consumer. We'll be publishing these in due course.

    Creating a good benchmark takes a lot of thought. Off-the-cuff microbenchmarks ("amateur benchmarks") like the one you report rarely show anything useful. Common mistakes are:

    1) Doing the test on a laptop/desktop - this does not simulate the "real world", where people run messaging servers on real servers. Real servers have very different performance characteristics to laptops/desktops, and real messaging systems are optimised for servers, not laptops ;)
    2) Not turning off the disk write cache, if appropriate.
    3) Using loopback instead of a real network.
    4) Testing with just one producer or consumer. System A may seem faster with just one producer - but System B might be able to sustain an overall higher throughput in the "real world" when there are many producers/consumers.
    5) Not allowing the system to warm up. JIT and other optimisations might take significant time to kick in. Any benchmark needs significant time to warm up before taking results. SPECjms, for example, has a full 10-minute warmup period before results are taken.
    6) Not taking results over a large enough time. Results should be taken while the system is in a steady state and over a long enough period to minimise indeterminacy due to garbage collection, thread scheduling, etc.

    If the above basic criteria aren't met, it's really not worth commenting on the results.

    (BTW the reason you got the OOM was that you didn't configure a blocking max size for the topic - if you'd consulted the user manual or asked in our forums we could have easily shown you how to do that.)
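
    As a rough illustration of points 5) and 6) above, a measurement loop with an explicit warm-up phase and a fixed steady-state window might be sketched like this; sendAndReceive() is a hypothetical stand-in for whatever workload the benchmark actually drives, and the durations are only illustrative:

        // Sketch of a warm-up + steady-state measurement loop; durations are illustrative only.
        public final class SteadyStateMeasurement {
            private static final long WARMUP_MILLIS  = 10 * 60 * 1000;  // SPECjms2007, for example, warms up for 10 minutes
            private static final long MEASURE_MILLIS = 30 * 60 * 1000;  // measure long enough to smooth out GC/scheduling noise

            public static void main(String[] args) throws Exception {
                // Warm-up: run the workload but discard the numbers, so JIT compilation
                // and other optimisations have time to kick in.
                runFor(WARMUP_MILLIS, false);

                // Measurement: only count operations while the system is in a steady state.
                long operations = runFor(MEASURE_MILLIS, true);
                System.out.printf("steady-state throughput: %.2f ops/sec%n",
                        operations / (MEASURE_MILLIS / 1000.0));
            }

            private static long runFor(long durationMillis, boolean record) throws Exception {
                long deadline = System.currentTimeMillis() + durationMillis;
                long count = 0;
                while (System.currentTimeMillis() < deadline) {
                    sendAndReceive();   // hypothetical stand-in for the real producer/consumer workload
                    if (record) {
                        count++;
                    }
                }
                return count;
            }

            private static void sendAndReceive() {
                // hypothetical workload stub
            }
        }
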
  9. Re: Benchmarks were made to be broken?

    Well, setting a max-block-size of 10 MB per queue prevents the OOM and leads to a 15K/sec message rate, though still only half of SwiftMQ's rate. The same problem (slowdown to 1K/sec, 800 MB heap, but no OOM) occurs when testing with 25 pairs. I know a bit about JMS benchmarks. I have written a few over the last 10 years. I don't think that I need private lessons here.
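
    For reference, the blocking limit mentioned above maps to a HornetQ address-setting; a sketch along these lines (element names as in the HornetQ 2.x documentation, queue match string hypothetical) corresponds to a 10 MB cap with producers blocked when it is reached:

        <!-- Sketch of a HornetQ 2.x address-setting (hornetq-configuration.xml or hornetq-queues.xml):
             cap the queue at 10 MB and block producers when the limit is reached.
             The match string is a hypothetical queue name. -->
        <address-settings>
           <address-setting match="jms.queue.testQueue">
              <max-size-bytes>10485760</max-size-bytes>
              <address-full-policy>BLOCK</address-full-policy>
           </address-setting>
        </address-settings>
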
  10. I know a bit about JMS benchmarks. I have written a few over the last 10 years. I don't think that I need private lessons here.
    Then, since you are such an expert, you would certainly agree that the benchmark you just posted is more or less worthless, right? Which makes me wonder why you are wasting our time with it.
  11. Then, since you are such an expert, you would certainly agree that the benchmark you just posted is more or less worthless, right?
    This was a test of how HornetQ performs in a 1:1 non-persistent scenario, and it says just this: it doesn't perform very well, but certainly well enough for an open-source JMS.
  12. This was a test of how HornetQ performs in a 1:1 non-persistent scenario, and it says just this: it doesn't perform very well, but certainly well enough for an open-source JMS.
    Translation: "it doesn't perform very well when benchmarked by a vendor of a competing proprietary closed-source solution on his laptop, but when subjected to the rigors of a true enterprise benchmark designed by the well-respected Spec organization, it displays excellent performance and scalability".
  13. Translation: "it doesn't perform very well when benchmarked by a vendor of a competing proprietary closed-source solution on his laptop, but when subjected to the rigors of a true enterprise benchmark designed by the well-respected Spec organization, it displays excellent performance and scalability".
    Typical JBoss speech. It just means that HornetQ fails on the simplest PTP test I could do. And not only that - I have done many other tests in the meantime. The point is that you are not able to show your blazing performance with simple tests. Instead you hide behind a spec benchmark which took you weeks of sweat and tears to set up and get through. And only if we used the very same hardware and the very same setup would we be comparable; otherwise it only says which throughput is reachable on a specific platform setup. I remember the benchmark battles of Oracle vs. DB2. They end up as a battle of hardware, and sorry, that is too expensive, so we prefer these simple tests, which show a lot.

    We have a performance suite which tests every single aspect of the JMS spec, incl. rollback and recovery, and consists of 14'000 single tests. This suite runs for a few days, tests the provider under max load (on "real" hardware and a "real" network!) and results in a complete profile of the tested JMS server. We use it to verify SwiftMQ's performance between releases when we have made performance improvements. We tried to run it against other providers, but most of them fail, e.g. ActiveMQ. So before I run this suite, I just test a few things like that 1:1 or 25:25 PTP test. If that fails, I don't need to set up the perf suite.
  14. The results were prepared and submitted by us (TU Darmstadt), not by JBoss. As you can see in the publicly available configuration files, we did not need much tuning for the HornetQ results (we only adjusted one or two parameters, which were well documented). The HornetQ results were (like all SPEC results) reviewed by several SPEC member organizations. Therefore a SPECjms2007 result has more value than a self-generated test.
  15. Andreas, you *do* realize you're making a fool of yourself in public, right?
  16. You try to make a fool of me, yes. Nevertheless, I did some persistent-message tests with full disk sync enabled. First of all, I measured the number of syncs per second I can get from the disk with java.nio. It wasn't much, only about 64. So if I run a JMS broker which has disk sync enabled and does not cheat, using a single non-transacted producer, you would certainly agree that the maximum theoretical rate the broker can reach is 64, because it must sync every single message to disk, right? I was surprised that HornetQ was reaching up to 190 msgs/sec! Can you explain? (I've used this tool and also DiskPerf from Sonic's TestHarness, same result.)
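
    A raw measurement of the disk's sync rate, similar to the one described above, can be sketched with java.nio as follows; the file name and record size are arbitrary:

        import java.io.File;
        import java.io.RandomAccessFile;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;

        // Rough measurement of how many force() calls per second the disk sustains when
        // appending small records - roughly what a broker must do if it syncs every message.
        public class FsyncRateTest {
            public static void main(String[] args) throws Exception {
                File file = new File("fsync-test.dat");                   // arbitrary test file
                try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                    FileChannel channel = raf.getChannel();
                    ByteBuffer record = ByteBuffer.allocateDirect(1024);  // 1 KB record, direct buffer

                    int writes = 1000;
                    long start = System.nanoTime();
                    for (int i = 0; i < writes; i++) {
                        record.clear();
                        channel.write(record);   // append at the current position
                        channel.force(false);    // force the data to disk
                    }
                    double seconds = (System.nanoTime() - start) / 1e9;
                    System.out.printf("%.1f syncs/sec%n", writes / seconds);
                } finally {
                    file.delete();
                }
            }
        }
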
  17. Disk synching

    If you're using Java NIO, the speed of syncing is dependent on many things, e.g. the amount of data in the OS buffers to sync, where that data is located on disk, etc. It's really not a constant.

    If the data is located all over the disk, the disk needs to both spin and seek to reach all of it. Seeking is slow. That's the worst possible scenario. If it just has to spin, then for a 7200 rpm disk, to write data located all around the disk, the theoretical max speed = 7200 / 60 = 120 syncs/sec.

    Part of the design of the optimised journal is to write data append-only, thus minimising disk seeks. If there is not much data to sync, the disk may be able to write it all by spinning, on average, only a *half* revolution to reach a random point on the disk, with no seeking; in this case you could get up to a max of 240 syncs/sec.

    So 190 sounds reasonable for a "normal" disk and a reasonably sized message. In fact it's almost exactly what I would expect.

    Also bear in mind that if you're using the AIO journal then it doesn't use syncing anyway, it uses Linux asynchronous IO - with AIO we get callbacks from the OS when the data has been persisted (look it up). In this case you would get closer to the 240 figure, since the disk spins on average a half revolution to reach the place on the sector where the data needs to be written.
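
    To make the arithmetic above concrete, a back-of-the-envelope calculation for a 7200 rpm disk, ignoring seek time and any controller caching, looks like this:

        // Back-of-the-envelope rotational limits for the sync rate of a 7200 rpm disk,
        // ignoring seek time and controller/cache effects.
        public class RotationalLimits {
            public static void main(String[] args) {
                double rpm = 7200;
                double revolutionsPerSecond = rpm / 60.0;                // 120 revolutions/sec

                // Worst case without seeks: wait a full revolution per sync.
                double fullRevolution = revolutionsPerSecond;            // ~120 syncs/sec

                // Append-friendly case: on average half a revolution per sync.
                double halfRevolution = revolutionsPerSecond * 2;        // ~240 syncs/sec

                System.out.printf("full revolution: %.0f syncs/sec%n", fullRevolution);
                System.out.printf("half revolution: %.0f syncs/sec%n", halfRevolution);
            }
        }
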
  18. Re: Disk synching

    I've measured the disk sync performance with java.nio and direct buffers, message sizes 1024 and 8192, and I'm getting 64 syncs/sec every time. Data is appended. That's not rocket science. HornetQ gets 190 syncs/sec, which is impossible. Hence, you either don't sync every write or you do it asynchronously.
  19. Re: Disk synching

    Andreas,

    Of course HornetQ syncs at the appropriate times when using the NIO journal. If you don't believe me then you can check the code, we have nothing to hide - after all, it's open source (unlike SwiftMQ), we *can't* hide anything :)

    As I explained in my previous post, the speed of fsync() depends on where the data that is being written is on the disk. If the data is fragmented over the disk, then the disk head will need to seek, which is slow. Appends can be optimised to occur on the same disk sector most of the time, in which case the maximum fsync() rate can approach 1/P, where P = period of revolution of the disk.

    If you're using AIO (that's *Linux* asynchronous IO) it doesn't use an fsync() approach. We get callbacks from the OS when each write has been persisted. With AIO we can typically halve disk latency, since the average time the disk needs to spin to get to a random point of the disk is given by 0.5 * P. That's why AIO is better than anything you can do in pure Java - and this is why HornetQ has a major advantage over pure Java persistence solutions.

    I trust that next time you will think a little more before making any further false accusations, to avoid further embarrassment to yourself and your company.

    Best regards
  20. Re: Disk synching

    Tim, if a test of java.nio disk syncs per second gives 64 and you get three times that, then it's quite suspect to me, and it seemed obvious that you either don't sync each time or do it asynchronously. However, I found out that the reason is simply writing into a pre-initialized file (what you do) vs. extending it (which leads to fragmentation; what the test program does). This gives you a factor of 3 more disk syncs per second on Linux. So sorry about the false accusations! I've checked your code and it's all done properly. Regards, Andreas
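
    The effect Andreas describes can be reproduced with a small java.nio comparison along these lines; file names, record size and write count are arbitrary, and the pre-allocation is done by filling the file with zeros up front:

        import java.io.File;
        import java.io.RandomAccessFile;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;

        // Compares the sync rate when extending a file on every write (new blocks and a new
        // file size to flush each time) with overwriting a file that was filled up front.
        public class PreallocatedVsExtending {
            private static final int WRITES = 500;
            private static final int RECORD_SIZE = 1024;

            public static void main(String[] args) throws Exception {
                System.out.printf("extending file:     %.1f syncs/sec%n", run(false));
                System.out.printf("pre-allocated file: %.1f syncs/sec%n", run(true));
            }

            private static double run(boolean preallocate) throws Exception {
                File file = new File("sync-test-" + preallocate + ".dat");
                try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                    FileChannel channel = raf.getChannel();
                    ByteBuffer record = ByteBuffer.allocateDirect(RECORD_SIZE);

                    if (preallocate) {
                        // Fill the file with zeros first so the timed writes only overwrite existing blocks.
                        for (int i = 0; i < WRITES; i++) {
                            record.clear();
                            channel.write(record);
                        }
                        channel.force(true);
                        channel.position(0);
                    }

                    long start = System.nanoTime();
                    for (int i = 0; i < WRITES; i++) {
                        record.clear();
                        channel.write(record);
                        channel.force(false);   // the extending file also has to flush newly allocated blocks
                    }
                    return WRITES / ((System.nanoTime() - start) / 1e9);
                } finally {
                    file.delete();
                }
            }
        }
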
  21. JMS Client setup

    Why are there differences in the client setups? How does this affect the benchmark results?
  22. Re: JMS Client setup

    For the HornetQ results, more client power was needed. However, we used the same client hardware for internal ActiveMQ experiments and did not observe any differences from our previous experiments using the other hardware.