Discussions

News: JCrawler 1.0 Released under CPL.

  1. JCrawler 1.0 Released under CPL. (17 messages)

    JCrawler is an open-source stress-testing tool for web applications.

    If you want to be more confident about the overall performance of your web application during peak loads, download and try it. JCrawler follows a more "human" browsing pattern and ensures a constant load on your application, hence it can give more realistic results than the available alternatives.
    The What

    JCrawler is an open-source (under the CPL) stress-testing tool for web applications. It comes with a crawling/exploratory feature: you can give JCrawler a set of starting URLs and it will begin crawling from that point onwards, going through any URLs it can find on its way and generating load on the web application. The load parameters (hits/sec) are configurable.
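
    To make the load-generation idea concrete, here is a rough sketch in Java (simplified for illustration; this is not the actual JCrawler code or configuration format):

        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.util.concurrent.LinkedBlockingQueue;

        // Simplified sketch: fire one new fetch thread every
        // 1000/hitsPerSec milliseconds, regardless of whether earlier
        // requests have finished, starting from a set of seed URLs.
        public class FixedRateCrawlSketch {
            public static void main(String[] args) throws Exception {
                final LinkedBlockingQueue<String> frontier =
                        new LinkedBlockingQueue<String>();
                frontier.add("http://localhost:8080/"); // example seed URL
                int hitsPerSec = 5;                     // configurable load

                while (true) {
                    final String url = frontier.take(); // next URL to hit
                    new Thread(new Runnable() {
                        public void run() {
                            try {
                                HttpURLConnection c = (HttpURLConnection)
                                        new URL(url).openConnection();
                                c.getResponseCode();    // generate the hit
                                // a real crawler would parse the page here
                                // and feed discovered links back into the
                                // frontier
                                c.disconnect();
                            } catch (Exception ignored) { }
                        }
                    }).start();
                    Thread.sleep(1000 / hitsPerSec);    // keep the rate constant
                }
            }
        }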

    The Why

    But wait a second! Aren't there already a whole bunch of tools like that? Why would anybody write a new one? You can bet there are a number of such programs in the open-source world, and there are bound to be some kick-ass commercial ones.

    Well, that's what we thought, too. Frankly, we had no desire to write a load-testing tool. We are writing a web-portal system (http://www.digijava.org), not load-testing tools. But then we had a problem with one of our portlets that would only occur on the production server, under high load, and none of the existing tools we tried was able to recreate it. Log-replay tools were not much help either, because the problem took several hours to occur, and we needed a tool that would really stress the application so it would crash in a more reasonable (i.e. shorter) time.

    We spent a lot of time trying not to "reinvent the wheel" and to find an existing wheel that would help us. There was none. We tried both OSS and commercial tools. None of them gave us the kind of result we needed. So we ended up writing JCrawler.

    JCrawler was irreplaceable in helping us identify and solve the problem we had. We have released it under an open-source license because we hope it may help somebody else, too, so that they will not have to go through what we went through. It may also be a good chance for JCrawler itself to get enhancements. We are very open to suggestions and, especially, help :) We continue to use JCrawler for testing our applications and, of course, would not mind it getting as good as it can.

    If you want to know why it is more than "yet another tool" in its class, visit http://www.jcrawler.com.

    Threaded Messages (17)

  2. threads != hits/sec

    I took a quick look at the site. It's nice to have more tools, but the last time I checked, TestMaker and JMeter both had the facility to simulate requests/second. Though honestly, requests/second is rather useless.

    I've worked on sites that get 10 million+ page views a day, and what matters, in my limited experience, is the number of concurrent requests per machine. Requests/second means very little if 3/4 of the hits are images. Even when you filter out the requests for images, that's still not enough.

    From my experience, it's usually concurrent load that causes problems. In many cases, it's as simple as not closing a DB connection. Other common causes are components that are not threadsafe. I believe Mercury LoadRunner can simulate a given rate also, so I'm not sure the statement on the website is accurate. Perhaps the documentation isn't as clear as it should be, but I'm pretty sure LoadRunner can simulate a specific rate.
  3. threads != hits/sec

    Peter,

    In my opinion, these two test different aspects. From my limited experience, the number of concurrent connections is a very important factor for testing infrastructural performance - for testing what load the application server container can take, and the same for the database, or even the layer between the application and the database (e.g. O/R mapping and persistence), if one exists.

    But there is also another very important factor - the performance of the application code itself, and possible bugs in it.

    Imagine a case where you hit your application with, say, 20 concurrent threads. Say, after some time, each of these threads reaches a URL that takes about two minutes to load, for some reason (e.g. slow queries). Each thread has to wait for its HTTP request to finish, so each thread halts for two minutes. And what is the load you get in human-understandable terms? With 20 threads each completing one request every two minutes, only about 10 HTTP requests per minute in total, right?

    That is nothing like the real-life situation. Users are not going to wait minutes for a page to load. One will go away, another will come, and you will soon accumulate more concurrent connections than you hoped for. Whether some of your requests have halted or not, if you are getting 3 new requests/sec of traffic, you will keep getting that - and the number of concurrent connections will jump way beyond what you planned for. This is _exactly_ what JCrawler is testing. It is trying to find glitches in the web-application code itself.

    Of course, I am exaggerating with the numbers in this example. No URL will, probably, take several minutes (or maybe it will?), but then you will also be getting more traffic than 20 concurrent requests. So the picture is still right, just put on a more human scale, I think.

    So, in my opinion, while it is right to test the infrastructure in terms of concurrent threads, the application itself is used by users who operate in terms of hits/sec, hence this kind of test cannot be neglected either. And yes, indeed threads != hits/sec, which is why testing in terms of threads leaves the other part untested.
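
    To put the two styles side by side in code, here is a simplified sketch (not any particular tool's implementation; doRequest() stands in for an HTTP call that may block for minutes):

        import java.util.concurrent.atomic.AtomicInteger;

        // Simplified sketch of the two load styles.
        public class LoadStyles {
            static final AtomicInteger inFlight = new AtomicInteger();

            static void doRequest() {
                inFlight.incrementAndGet();
                try { Thread.sleep(120000); }  // a pathological 2-minute page
                catch (InterruptedException ignored) { }
                finally { inFlight.decrementAndGet(); }
            }

            // Closed loop: 20 worker threads. When every worker is stuck
            // waiting, throughput collapses to ~10 requests/minute and
            // concurrency never exceeds 20.
            static void threadBased() {
                for (int i = 0; i < 20; i++) {
                    new Thread(new Runnable() {
                        public void run() { while (true) doRequest(); }
                    }).start();
                }
            }

            // Open loop: 3 new requests arrive every second no matter what.
            // With 2-minute responses, in-flight requests pile up toward
            // 3 * 120 = 360 concurrent connections - just as real users do.
            static void rateBased() throws InterruptedException {
                while (true) {
                    new Thread(new Runnable() {
                        public void run() { doRequest(); }
                    }).start();
                    Thread.sleep(1000 / 3);
                }
            }
        }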
  4. totally agree

    Peter, in my opinion, these two test different aspects. [...] And yes, indeed threads != hits/sec, which is why testing in terms of threads leaves the other part untested.

    I absolutely agree, that's why I wrote the distribution graph for JMeter. This way I can simulate concurrent load and look at the results in a distribution graph.

    I can see how the response times lay out and where the 50% and 90% lines are. If the 90% line is within my requirements, say 80 seconds, then I know I'm good. On the other hand, if the 50% line is within my requirement but the 90% line is 160 seconds, I know more work is needed.
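
    As a rough illustration of what those lines mean, here is a minimal sketch (my own example, not JMeter's actual implementation) of computing them from recorded sample times:

        import java.util.Arrays;

        // Minimal sketch: the 50% and 90% "lines" are just percentiles
        // of the recorded response times (in milliseconds).
        public class PercentileLines {

            // Value below which the given fraction of samples fall.
            static long percentile(long[] sortedMillis, double fraction) {
                int index = (int) Math.ceil(fraction * sortedMillis.length) - 1;
                return sortedMillis[Math.max(index, 0)];
            }

            public static void main(String[] args) {
                long[] samples = {1200, 850, 400, 96000, 2300, 780, 150000, 3100};
                Arrays.sort(samples);
                System.out.println("50% line: " + percentile(samples, 0.50) + " ms");
                System.out.println("90% line: " + percentile(samples, 0.90) + " ms");
            }
        }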

    I should have mentioned in the previous post that I am a committer on JMeter. My mistake for not mentioning that.
  5. one more note

    If you happen to use Tomcat 5 for your development, JMeter contains a health/performance monitor for Tomcat 5.0.19 and newer.

    It will graph the performance of Tomcat to give you extra information. Sorry for the shameless plug for JMeter.
  6. what about POSTs and forms?

    Looking through your code, I didn't see any facilities for handling POST requests or parsing forms in web pages. Does this tool, which supposedly does what no other tool can, not handle HTTP POST requests?
  7. what about POSTs and forms?

    Oops, I guess the pretentious message on the website ("A perfect load-testing tool") caused some confusion :) First off, as loud as that message may be (hey, we have to tell our management that we spent time on something important, too, eh? LOL), it uses the article "a", so at least it is not claiming to be *the* perfect one. Just kidding :) Second, let me clarify - JCrawler has no intention to "compete" with any existing project. It is not a commercial product.

    I do not think it is even serious to "compare" JCrawler to "monsters" like LoadRunner or JMeter or even Grinder. JCrawler is a MUCH smaller codebase, way more specific in its intentions.

    Yet we think it does its very specific job _better_. Maybe so, maybe not - but that's what our experience has been. Otherwise, we had no intention of spending our time creating a load-testing tool. But since we did, we wanted to share it, in case somebody experiences the same problem we did. We have reason to believe that our case was not that unique.

    Yes, as a matter of fact, LoadRunner, JMeter and Grinder were all among the tools we tried. You could expect that - they are definitely the obvious ones to try first.

    We are not saying that no other tool has any of the features JCrawler has; what we are saying is that none of the tools we found had ALL the features JCrawler does, and in our case the combination mattered a lot.

    The last time I checked (please correct me if I am wrong), neither Grinder nor JMeter was able to crawl URLs; they could only load-test explicitly indicated paths. Which is perfectly fine and not a shortcoming of those products, but we really needed a crawling tool. And I believe there are others who need this specific feature, too.

    Also, the last time I checked, LoadRunner was far from being open source :) The ability to customize and enhance the tool, as well as to understand what it really does (as opposed to what a PR flyer says), can be very handy sometimes, I am sure you would agree.

    As for HTTP POSTs: JCrawler does not use its own code for making HTTP requests; it uses Jakarta Commons HttpClient for that, and I believe it is able to do HTTP POSTs, too. The absence of POST submission in JCrawler is not about the actual ability, but about how to do it reasonably and how important it is.
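
    For illustration, here is roughly what a POST looks like with Commons HttpClient 3.x (the URL and parameter names below are made up; this is not code from JCrawler):

        import org.apache.commons.httpclient.HttpClient;
        import org.apache.commons.httpclient.methods.PostMethod;

        // Sketch: submitting a form via POST with Commons HttpClient 3.x.
        public class PostExample {
            public static void main(String[] args) throws Exception {
                HttpClient client = new HttpClient();
                PostMethod post = new PostMethod("http://localhost:8080/app/save");
                post.addParameter("title", "hello"); // hypothetical form fields
                post.addParameter("body", "world");
                try {
                    int status = client.executeMethod(post); // HTTP status code
                    System.out.println("Status: " + status);
                } finally {
                    post.releaseConnection(); // always release the connection
                }
            }
        }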

    Please correct me if I am wrong, but I believe HTTP POSTs are mostly used for form submission, and form submissions _mostly_ lead to save-or-update operations in a web application. In my experience, these are not the biggest causes of performance problems (with one obvious exception - search forms, but those you would want to test separately, with a profiler). Where the application slows down, and where it hurts the most, is during query operations, not update operations. As stated in its name, JCrawler is a tool that crawls URLs and creates load that way, not a universal performance meter. Yes, it does not have memory or CPU profilers, either :)


    I do understand that some web applications have more forms and are not mostly plain HTTP links. I would say JCrawler is not the best tool for those. Also, please forgive my ignorance in that area - maybe it is simple - but I am not sure how a small piece of code could independently fill out a complex HTML form in any reasonable way. That would go all the way up to AI, well beyond a simple tool. From what I know, even the Google bot avoids forms, and I definitely have far less expertise in web crawling than they do.

    If you have ideas about how these things could be done, I would be very grateful to listen and learn from you, and to add that feature - or let you add it, in the great spirit of open source :)

    P.S. In any case, thanks for looking into the code. Your interest is very much appreciated.
  8. what about POSTs and forms?

    JMeter currently does NOT have a crawl feature, but adding one wouldn't take more than 5-7 hours tops. Mercury definitely is not open source and can't really be extended by end users. I'm sure Mercury would gladly extend it for users, at a big price :)

    I have to admit our developer docs could be better and are kinda lacking. They are getting better, though; recently Mike and I wrote a tutorial for developers. I'm sure other people would find the crawling feature of great value. Most of the time I use access logs to perform stress/load testing, but crawling sounds like a good feature to have. If you want to port it to JMeter as a sampler, I'm willing to assist.
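
    For reference, a crawl sampler would plug in roughly like this skeleton (the class names come from JMeter's sampler API; the crawl logic itself is omitted and would have to be filled in):

        import org.apache.jmeter.samplers.AbstractSampler;
        import org.apache.jmeter.samplers.Entry;
        import org.apache.jmeter.samplers.SampleResult;

        // Hypothetical skeleton of a crawl sampler for JMeter.
        public class CrawlSampler extends AbstractSampler {
            public SampleResult sample(Entry entry) {
                SampleResult result = new SampleResult();
                result.setSampleLabel(getName());
                result.sampleStart();           // start the response timer
                try {
                    // fetch the next URL from a shared frontier, record the
                    // response, and feed newly discovered links back in
                    result.setSuccessful(true);
                } catch (Exception e) {
                    result.setSuccessful(false);
                } finally {
                    result.sampleEnd();         // stop the response timer
                }
                return result;
            }
        }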
  9. JCrawler 1.0 Released under CPL.

    but the crawling feature sounds like a good feature to have. If you want to port it to JMeter as a sampler, I'm willing to assist.
    Sure, why not? Sounds good to me. With pleasure.
  10. What about authentication?

    From your website:
    Http Redirects and Cookies - Some of the tools were not able to handle these properly. This can leave your application's authentication completely untested and give you another set of surprises in production. This is especially true if you are using a single-sign-on system of some kind, which usually employs transparent HTTP redirects.

    Which leads me to believe that authentication is supported by JCrawler. If it isn't, the tool wouldn't be that useful. However, I can find no place to configure any type of authentication. Am I missing something?
  11. What about authentication?

    From your website:
    Http Redirects and Cookies - Some of the tools were not able to handle these properly. This can leave your application's authentication completely untested and give you another set of surprises in production. This is especially true if you are using a single-sign-on system of some kind, which usually employs transparent HTTP redirects.
    Which leads me to believe that authentication is supported by JCrawler. If it isn't, the tool wouldn't be that useful. However, I can find no place to configure any type of authentication. Am I missing something?


    Mike,

    I think, "authentication" and "logged-in user" are not the same notions. In out system non-logged-in users are authenticated too. The authentication results in them getting a Guest role and they do not become logged-in but they are definitely authenticated. I believe, a lot of other systems do it the same way. You have to authenticate first to see if a user is logged-in and if yes - what is his role.

    As highlighted on the website, this becomes especially important for SSO-enabled portals. Such portals span different domains. HTTP cookies (which persist the authentication result) do not carry over from one domain to another, and it is tricky to propagate authentication information so that the user does not have to log in separately at each domain. In this case, a common approach is to always authenticate the user on one domain, using HTTP redirects to bounce to the SSO domain and back. I do not want to give a lengthy speech about SSO here - this is really not the place for it - but if cookies and redirects are not properly handled by a load-testing tool, it will have problems with such a system and will not be able to crawl it properly _even in the logged-out mode_.
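
    To illustrate the mechanics (a simplified sketch, not JCrawler's actual code), following such a redirect chain with Commons HttpClient 3.x, while one shared client carries the cookies across hops, looks roughly like this:

        import org.apache.commons.httpclient.Header;
        import org.apache.commons.httpclient.HttpClient;
        import org.apache.commons.httpclient.HttpStatus;
        import org.apache.commons.httpclient.methods.GetMethod;

        // Sketch: follow redirects by hand so that cross-domain SSO hops
        // work; the shared HttpClient keeps cookies in its HttpState.
        public class RedirectFollower {
            public static void main(String[] args) throws Exception {
                HttpClient client = new HttpClient(); // one client = one cookie jar
                String url = "http://portal.example.com/"; // hypothetical start URL
                for (int hop = 0; hop < 10; hop++) {  // cap the redirect chain
                    GetMethod get = new GetMethod(url);
                    get.setFollowRedirects(false); // we follow manually, across hosts
                    try {
                        int status = client.executeMethod(get);
                        if (status != HttpStatus.SC_MOVED_TEMPORARILY
                                && status != HttpStatus.SC_MOVED_PERMANENTLY) {
                            System.out.println("Landed on " + url + " (" + status + ")");
                            break;
                        }
                        Header location = get.getResponseHeader("Location");
                        if (location == null) break;
                        url = location.getValue(); // may point at another domain
                    } finally {
                        get.releaseConnection();
                    }
                }
            }
        }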

    This is what is meant on our website - not that login is supported by JCrawler.

    As for a login feature: I may be wrong, but I think it is not only an unimportant feature, it can be outright dangerous to allow crawling a web application with non-guest privileges. If a user does not have any specific privileges, it really does not matter: a guest-user cookie and a no-privileges-user cookie are pretty much the same. Except that the user may be able to access a personal page - but do you want the crawler to go there and mess everything up by clicking on "change setting" links? I do not even want to think what damage a crawler could do if assigned admin privileges.

    Also, in my limited experience, admin and settings pages are not the most often visited ones, and the performance requirements for them are _usually_ lower, not that critical.
  12. POST

    When writing this kind of tool, one must not make too many assumptions about the use developers will make of it.
    I, for one, while building rich DHTML apps, use POST not only to submit forms to save, but to call pages too.
  13. POST

    Vania,

    that means that JCrawler is not the best tool (or not suitable at all) to performance-test our application. I have had such a project, too. Pages were almost never refreshed; instead, content was downloaded via the MS DHTML doDownload() method. Of course, that needs an absolutely different tool for testing.

    Such DHTML-based applications, and applications heavily employing HTTP POST, will be very purely indexed by Google. Which sometimes does not matter. The application I mentioned, for example, was a banking system, secured with no guest entrance, and Google had nothing to do there :)

    That's perfectly fine. JCrawler covers a specific subclass of web applications and has been shown to work well there. There are other kinds of web applications, which other tools may be better suited for.
  14. POST

    Sorry for the misspellings.
    I meant:
    performance-test your application.

    and

    Such ... applications and applications ... will be very poorly indexed by Google
  15. POST

    Irakli,

    thanks for the clarification.
  16. Did you check The Grinder (http://grinder.sourceforge.net/) before writing your stress tester?
  17. We spent a lot of time trying not to "reinvent the wheel" and find an existing wheel that would help us. There was none.
    wget is your friend; what your tool does is basically equivalent to:
    wget -nv -r -w# -T# -O /dev/null -o logfile url

    wget has lots of other features

    Regards,
    Luis Neves
  18. Luis,

    I admire wget as much as you probably do, and still I have to disagree with you. Your example is very good, and yet not quite the same as what JCrawler does.

    JCrawler is multithreaded; wget is not. One might think that putting your command in a crontab with a given interval would fire up wget processes and effectively give the same result.

    Unfortunately, that is not true. Here is why (ordered from minor reasons to more important ones):
    1) Using crontab+wget does not seem very user-friendly, is not platform-independent and may require root access.
    2) It will not give you the same level of analyzed data (the monitor.log output) that JCrawler does. Merely crawling is not enough; we also need to analyze the results of the process.
    3) Having the operating system invoke a full-blown application is a MUCH heavier task than firing up a new thread inside a Java application, hence much less accurate. When we talk about tens or even hundreds of such new processes per second, the time required to start a new application (wget) is quite significant, in my opinion, and decreases the accuracy of the experiment.
    4) Each wget process fired up that way will traverse the same tree in the site's page hierarchy, which is different from what JCrawler does. JCrawler threads add URLs to be parsed to a shared, thread-safe collection, from which new threads fetch them in FIFO order. This gives a different style of testing which, in my opinion, exercises the web application more deeply in a shorter time, effectively "stressing" it more - which should be the intention of a stress test, by definition. A sketch of that shared frontier follows this list.
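
    For illustration, the shared frontier can be sketched like this (my own illustration of the idea, not JCrawler's actual classes):

        import java.util.Set;
        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.LinkedBlockingQueue;
        import java.util.concurrent.TimeUnit;

        // Sketch of a shared FIFO crawl frontier: every worker thread takes
        // the next URL from the head of the queue and appends whatever links
        // it discovers to the tail, so threads do not re-walk the same
        // branch of the site tree.
        public class CrawlFrontier {
            private final BlockingQueue<String> queue =
                    new LinkedBlockingQueue<String>();
            private final Set<String> seen =
                    ConcurrentHashMap.newKeySet(); // enqueue each URL once

            public void offer(String url) {
                if (seen.add(url)) {
                    queue.offer(url);
                }
            }

            // Next URL to fetch, or null if the frontier stays empty.
            public String next() throws InterruptedException {
                return queue.poll(5, TimeUnit.SECONDS);
            }
        }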