JCR news: Alfresco creates CMS benchmark

  1. JCR news: Alfresco creates CMS benchmark (12 messages)

    Alfresco, along with Red Hat and MySQL, has published a white paper covering the creation of a benchmark for JCR (a topic often discussed on the Apache Jackrabbit lists). The paper (titled "The Fastest JCR Repository," freely downloadable but requiring registration) addresses Alfresco's JCR implementation, though the benchmark could apply more broadly, and Alfresco also plans to make the benchmark open source. The Alfresco numbers could serve as a reference point.
    A true performance profile, not just for Alfresco but for the stack being used, can be determined to deliver a reflective cost in terms of dollars per transaction per second. Benchmark metrics are assessed by independent experts through a formal auditing process so that they assure compliance with pre-determined benchmark rules. This validation process gives customers confidence when comparing the performance of different stack configurations. The benchmark is based on the following phases and principles:
    • Uses the JSR-170 Industry standard API
    • Large repository with 10 million documents and above across 10,000 folders
    And the following phases
    1. Perform a continuous load of documents, that is transactionally safe, to demonstrate linear scalability for document uploads, as the size of the repository increases
    2. The document load is performed with a failure, to validate that transactionality is maintained and the repository is not corrupted
    3. Mixed, concurrent read and write to demonstrate linear sub-second response times without performance degradation due to concurrency issues
    With
    • Different User Loads
    • Different Machine Configurations
    The white paper then walks through the benchmark process, showing what elements were tested, and what requirements were fulfilled. It then produces two sets of results, based on the ratio of reads and writes for the repository.
    The mixed read/write benchmark delivered the following performance results:
    • Read Content – 0.34961s
    • Read Property – 0.41976s
    • Create Content – 0.58788s
    • Create Folder – 0.54419s
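    As a rough way to read the latencies above in the paper's "dollars per transaction per second" terms, here is a minimal Java sketch. It assumes a single client thread issuing requests back to back, so throughput is simply the inverse of mean latency. The stack cost is a hypothetical placeholder, not a figure from the white paper, and the paper's own cost formula is not given in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatencyToThroughput {

    // Operations per second for ONE client thread issuing requests back to back.
    // Assumption: no think time and no overlap between requests.
    public static double opsPerSecond(double meanLatencySeconds) {
        if (meanLatencySeconds <= 0) {
            throw new IllegalArgumentException("latency must be positive");
        }
        return 1.0 / meanLatencySeconds;
    }

    // Cost of the stack divided by the throughput it sustains.
    public static double dollarsPerTps(double stackCostDollars, double tps) {
        if (tps <= 0) {
            throw new IllegalArgumentException("tps must be positive");
        }
        return stackCostDollars / tps;
    }

    public static void main(String[] args) {
        // Mean response times quoted in the post, in seconds.
        Map<String, Double> latencies = new LinkedHashMap<>();
        latencies.put("Read Content", 0.34961);
        latencies.put("Read Property", 0.41976);
        latencies.put("Create Content", 0.58788);
        latencies.put("Create Folder", 0.54419);

        // Hypothetical stack cost, for illustration only.
        double hypotheticalStackCost = 10_000.0;

        for (Map.Entry<String, Double> e : latencies.entrySet()) {
            double tps = opsPerSecond(e.getValue());
            System.out.printf("%-15s %6.2f ops/s per thread, $%,.0f per op/s%n",
                    e.getKey(), tps, dollarsPerTps(hypotheticalStackCost, tps));
        }
    }
}
```

    With many concurrent users the aggregate throughput would be higher than the per-thread figure, so this only gives a lower bound for comparison purposes.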
    The paper doesn't compare other stacks, so these numbers may be fantastic - or they might not. When the benchmark is made available, it will be possible to compare the Alfresco numbers and validate the test. Regardless of the benchmark scores, the fact that Alfresco is committing resources to this effort is excellent news.

    Threaded Messages (12)

  2. Link to a wrong paper

    Alfresco's web site links to the paper "Scale out for Enterprise Content Management" by John Newton instead of the JCR paper. Chester
  3. Re: Link to a wrong paper, ok now

    OK, looks like they have corrected it and point to the right paper now.
  4. I think it's VERY important to look at a variety of use cases. Alfresco is talking about benchmarking an absolutely massive deployment here, and their benchmarks may very well show it beating out Jackrabbit-based CMSs in this scenario, but many people don't have a need for anything that can handle this level of load. What if you only have a few hundred or thousand documents over a few hundred folders? This is going to be more common with the less "I'm willing to spend whatever it takes" crowd. Many people using some of the CMS products are simply using them for small to medium sized operations. I have often asked people (CMS developers) where the performance breakdown with a Jackrabbit solution is, and I've NEVER gotten a straight answer except when it comes to the style of system that would be running on CNN.com or similar. Yeah, it's clear that getting to that level will require something more robust - but is that the ONLY scenario?
  5. fantastic.

    i've been looking very seriously at alfresco these days. efforts like this on behalf of the alfresco team are to be commended.
  6. What if you only have a few hundred or thousand documents over a few hundred folders? This is going to be more common with the less "I'm willing to spend whatever it takes" crowd. Many people using some of the CMS products are simply using them for small to medium sized operations.
    But even if it is only for small/medium websites it can still be important, because it seems you are assuming that one server runs one website. In our hosting environment we have instances of our CMS that run between 30 and 50 websites on one server, ranging from super-small to medium-sized. In such cases raw performance is still important (as we noticed when one of our customers did a launch for China).
  7. I think it's VERY important to look at a variety of use cases. Alfresco is talking about benchmarking an absolutely massive deployment here, and their benchmarks may very well show it beating out Jackrabbit-based CMSs in this scenario, but many people don't have a need for anything that can handle this level of load.

    What if you only have a few hundred or thousand documents over a few hundred folders?
    From what we've seen, Alfresco is comparable to Jackrabbit for small-scale scenarios, but Alfresco is much more scalable and does not suffer from the locking issues with many read/write threads that Jackrabbit appears to have. We tried to load up Jackrabbit with millions of nodes but always ran into blocker issues after about 2 million or so objects. Also, when loading up Jackrabbit, the load needed to be carefully performed in small chunks: trying to load 100,000 nodes at a time would cause PermGen space errors (even with a HUGE PermGen space!) and potentially put the repo into a non-recoverable state, whereas Alfresco could happily deal with 100,000-node transactions with no issues.
    I personally wouldn't directly compare Jackrabbit with Alfresco anyway - they are different solutions trying to solve different problems. Jackrabbit is a reference library that implements a JSR-170 repository and that's basically it - a developer can quickly include it in a project as a simple JAR to get a bare-bones JSR-170 repo - nice and simple. Alfresco, by contrast, is a full enterprise-scale repository with a well thought out service-based architecture. It is a full development platform with a JSR-170 interface, CIFS interface, FTP interface, security framework, script engine, templating engine, rules framework, jBPM workflow integration, transformation services and a fully functional web-client application with customisation framework etc. etc. Need I go on? So you can't really compare them.
    This kind of thing is always good for the industry as it keeps all vendors on their toes, and it's good that Alfresco aren't scared to stand up and provide an open benchmark framework with real-world numbers.
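    The chunked loading described in the post above (committing in small batches rather than 100,000 nodes in one go) can be sketched roughly as follows. This is only an illustration: `NodeSink` is a hypothetical stand-in for a repository session, not a real JCR or Jackrabbit type, and the chunk size of 1,000 is arbitrary. With the JCR 1.0 API, the `commit()` step would presumably map to something like `Session.save()`, but that mapping is an assumption and not what the poster reports using.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedLoader {

    /** Hypothetical stand-in for a repository session; not a real JCR type. */
    public interface NodeSink {
        void add(String name);   // stage one node
        void commit();           // persist everything staged so far
    }

    /**
     * Adds all names to the sink, committing after every chunkSize additions
     * and once more at the end for any remainder. Returns the number of commits.
     */
    public static int load(List<String> names, int chunkSize, NodeSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (String name : names) {
            sink.add(name);
            pending++;
            if (pending == chunkSize) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the final, partially filled chunk.
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add("doc-" + i);
        }
        // In-memory sink that only counts calls, for demonstration.
        final int[] added = {0};
        NodeSink countingSink = new NodeSink() {
            @Override public void add(String name) { added[0]++; }
            @Override public void commit() { /* nothing to persist in this demo */ }
        };
        int commits = load(names, 1000, countingSink);
        System.out.println("added=" + added[0] + ", commits=" + commits);
    }
}
```

    Keeping each commit small bounds how much uncommitted state is held in memory at once, which is the concern the post raises about very large single loads.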
  8. Interestingly enough, we tested our own Jackrabbit-based repository (and Jackrabbit as well) in the same general setup (10 million files) in numerous different configurations. There is absolutely nothing in Jackrabbit's core that constrains a Jackrabbit repository in size. Scalability is mainly a matter of choosing and configuring the persistence layer correctly. Since I don't know the internals and tuning parameters of Alfresco's content repository well enough to make any statements about its scalability or performance, I really appreciate getting the information from the source. Likewise, I think it may not make sense for Alfresco to judge Jackrabbit's "tuned and configured" scalability.
    Looking at the benchmark, I can see that it is a very narrow test that covers basic read/write operations. These are obviously the most important; however, I think it could be interesting to include other more or less frequent real-life operations such as versioning, search, observation or even dealing with unstructured content. I think this benchmark is an excellent step in the right direction and I am very happy that Alfresco took that first step. Congratulations.
    We would be happy to release performance numbers for a variety of Jackrabbit configurations as soon as we get our hands on the actual scripts. We would also be more than happy to participate in the further evolution of the "opensource" benchmark, since I think something like this has huge potential and is very valuable for the industry as a whole. Let's make the "opensource" benchmark a community effort too.
    regards, david
  9. We would be happy to release performance numbers for a variety of Jackrabbit configurations as soon as we get our hands on the actual scripts.
    Are there any results yet? I'm very interested in how Jackrabbit performs against Alfresco.
  10. JCR by APL

    This is yet another marketing manipulation by Alfresco, similar to their APL license (http://www.crynwr.com/cgi-bin/ezmlm-cgi?3:mss:11955:200611:cehhbmkbfmficlkgglkf). Until they change their licensing policy, all such maneuvers will be just another marketing campaign.
  11. detailing the benchmark

    I haven't had the time to read their document completely, but I see a couple of questions that haven't been addressed upfront (and I consider them essential):
    1/ Why are MySQL and RedHat mentioned? Probably because Alfresco's storage is MySQL, but what does RedHat have to do with this? Why not also mention JBoss, AMD (dual-core Opteron), etc.?
    2/ The benchmark speaks about handling "documents", but there is no such JCR term, and I couldn't find a definition of "document" in their benchmark.
    3/ Maybe I am a bit ignorant, but before this report I hadn't heard of Optaros ("open source leaders").
    4/ Even though this benchmark wasn't run against any other implementation (and even though the scenario is not so generic, e.g. a flat storage benchmark), the document header is: "Alfresco 1.4 Network - The most scalable JSR-170 Repository".
    Even if the initial intentions are good, I am getting the feeling that the document is just marketing.
    ./alex -- .w( the_mindstorm )p.
  12. Re: detailing the benchmark

    1/ We worked with MySQL and RedHat as the most prominent independent proponents of open source. We are looking for open source results against proprietary results.
    2/ The JCR term of "nodes" is compatible with an XML view of the world and is a technically correct view. However, past benchmarks by Microsoft and EMC have referred to documents, hence our usage of the term.
    3/ In terms of specialist system integrators in open source, Optaros are leaders, having come originally from Cambridge Technology and having a very strong, global open source practice.
    4/ We attempted to run the benchmark on other systems, but had difficulty getting past somewhere around 1 to 2 million nodes, along with resource problems. We are making the benchmark open source to allow other JSR 170 implementations to produce their own results. You can download the benchmark at: http://sourceforge.net/project/showfiles.php?group_id=143373&package_id=195365&release_id=458300 Information on running the benchmark can be found here: http://wiki.alfresco.com/wiki/Running_Benchmark_Tests
    This is more than a marketing exercise. We constantly get requests from users and customers asking how high Alfresco can scale and whether it can match commercial systems. This is only the first answer to that question. We will continue to scale higher in this benchmark and expand the number and types of tests. We believe that these tests are representative of typical usage in a high-concurrency environment of accessing and updating information in a content repository. It is exemplary of usage in a medium-size enterprise or a department in a Fortune 1000 enterprise. We look forward to future input from developers and the results of even higher scalability tests.
  13. Re: detailing the benchmark

    John, thanks for the clarifications, but I feel there is still one more detail I would like to know :-)
    2/ The JCR term of "nodes" is compatible with an XML view of the world and is a technically correct view. However, past benchmarks by Microsoft and EMC have referred to documents, hence our usage of the term.
    This makes sense, but I would like to know the actual definition of the "document" used (in whatever format you would like to provide it: node type or anything else).
    We are making the benchmark open source to allow other JSR 170 implementations to do their own results.
    This is definitely a decision I salute. Gonna check it out soon.
    3/ In terms of specialist system integrators in open source, Optaros are leaders having come originally from Cambridge Technology and having a very strong, global open source practice.
    Still, haven't heard of them... but it is always a good time to learn new things (Google has only 141k results about them :-) ).
    This is more than a marketing exercise. We constantly get requests from users and customers on how high can Alfresco scale? Can it match commercial systems?
    Makes sense (... and thanks for saying "more than" :-) ). BR, ./alex -- .w( the_mindstorm )p.