Upside down benchingmarking: a discussion from TSSJS Vegas

  1. Kirk Pepperdine has posted "Upside down benchingmarking," a blog entry discussing one of the more interesting (i.e., difficult) puzzlers he presented in Las Vegas at TSSJS - a benchmark whose results directly contradicted the expected outcome - and it's kinda fun to see "benchingmarking" used like this. (Just in case you're interested: Kirk is presenting a talk on Concurrency and High Performance at TSSJS 2008 in Prague.)
    In the case of garbage collection, conventional wisdom tells us that the concurrent collector should give us better, more consistent response times because it doesn't have as severe a "stop-the-world" phase as the default collectors do. The trade is CPU for responsiveness. So as long as we have enough CPU, things should be upside right. I can list a number of other examples, benchmarks that show using Hibernate is faster than using straight JDBC (without the caching and other fancy speed-me-up features turned on). At face value this doesn't make sense. What could be faster than a direct JDBC call? Adding Hibernate to the mix only lengthens the execution path. Without the caching and other fancy features, using Hibernate should add overhead to the whole process. So what is at work here?
    The first thing at work is our notion of what a response time is. We view the system as a black box where we throw it some work and some time later we get an answer. Of course the response time is not a simple linear sum of the time it took to complete each step in the work flow we issued. If we white-box the system, the first thing that we should notice is that it is a collection of queues. In this model, the response time is still a summation of the individual response times. However, these individual response times are determined by a more complex mix of rates of consumption of computing resources, moderated by their availability. Simply put, if we have several threads trying to share the CPU, then while one is using the CPU the others will have to wait. This is a queue, and we all know that queuing adds latency, which only adds to the response time. Just how much is a function of several variables.
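    To ground the queuing point, here is a minimal, hypothetical Java sketch (all constants are invented for illustration): ten identical tasks with a fixed 100 ms service time share a two-thread pool, so the measured response time (queue wait plus service time) of later tasks far exceeds the service time alone.

        // Sketch: response time = queue wait + service time.
        // Ten 100 ms tasks share a 2-thread pool, so later tasks observe
        // response times well above 100 ms. All constants are invented.
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        public class QueueLatency {
            public static void main(String[] args) throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(2);
                for (int i = 0; i < 10; i++) {
                    final int id = i;
                    final long submitted = System.nanoTime();
                    pool.execute(() -> {
                        // Fixed "service time" of roughly 100 ms.
                        try { Thread.sleep(100); } catch (InterruptedException e) { }
                        long responseMs = (System.nanoTime() - submitted) / 1_000_000;
                        // Anything above ~100 ms was spent waiting in the queue.
                        System.out.println("task " + id + " response time: " + responseMs + " ms");
                    });
                }
                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.SECONDS);
            }
        }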

    Threaded Messages (23)

  2. I can list a number of other examples, benchmarks that show using Hibernate is faster than using straight JDBC (without the caching and other fancy speed-me-up features turned on). At face value this doesn't make sense. What could be faster than a direct JDBC call? Adding Hibernate to the mix only lengthens the execution path. Without the caching and other fancy features, using Hibernate should add overhead to the whole process. So what is at work here?
    Well, this will hit anyone as soon as you start tuning any app in an app server. You'll have queues at least in the network, in the web server, in the servlet container, in the EJB container, and in the database connection pool. Almost always you want the majority of queuing to happen as early as possible, i.e. it is better to have the requests queue in the network than waiting for a database connection while holding lots of resources in the other pools. I haven't seen proper workload management in any app server yet.
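    A crude way to push queuing toward the front door can be sketched with a plain java.util.concurrent.Semaphore (the limit of 20 and the doWork method are assumptions for illustration): excess requests wait at the entrance holding nothing, instead of waiting deep in the stack while holding a servlet thread, an EJB, and a connection.

        // Sketch of early admission control: callers queue at the front door
        // (a Semaphore) instead of deep in the stack. The limit and doWork
        // are hypothetical placeholders.
        import java.util.concurrent.Semaphore;

        public class FrontDoorThrottle {
            // Allow at most 20 requests past this point; fair = FIFO queuing.
            private static final Semaphore admitted = new Semaphore(20, true);

            public static String handle(String request) throws InterruptedException {
                admitted.acquire();   // excess load waits here, holding nothing else
                try {
                    return doWork(request); // only admitted requests touch the pools below
                } finally {
                    admitted.release();
                }
            }

            private static String doWork(String request) {
                return "handled " + request; // placeholder for the real pipeline
            }
        }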
  3. This whole explanation depends on the services having non-linear (unscalable) performance characteristics with respect to the number of requests. It would be a much shorter blog if this were just pointed out at the beginning. It would also be much more interesting to talk about what causes non-linear performance with respect to requests and how to build systems that do scale linearly, or approximately so.
    I've actually worked on a system that had non-linear scaling characteristics (far removed from the requests themselves) where throttling attempts were actually exacerbating the issue. The system was a piece of junk, so don't get hung up on the specifics (no matter how crappy they seem); I didn't create the monster, I just had to tame it. The context was creating responses to asynchronous requests (orders). The responses were being driven by a database table that was being populated by a scheduled process. A second scheduled process would retrieve the new records in the table based on a timestamp column. At some point someone added a throttling mechanism, I suppose to relieve some problem downstream (probably because later a queue was being processed LIFO). This mechanism would limit the number of records processed from the table to a specified number.
    The problem was that when the transaction load was very high, this second process would take longer and longer as the day went on. The reason was that the SQL was extremely complex and it took a while to get the response, normally several minutes (yeah, I know), and this time was dependent on the number of rows remaining to be processed, not on how many were being processed. So if there were 500 rows to be processed and the limit was 500, it might take 30 seconds to get the rows. But if the limit were 100, the query still took 30 seconds, and we would still have 400 rows after the execution plus any more that showed up in the meantime. This caused a downward spiral where the more rows, the slower the query, until the process actually went outside its batch window, resulting in fewer runs per hour, causing more backlog and slower queries.
    It would be interesting to hear other scenarios that caused non-linear scaling issues and what was done to resolve it. It would be even more interesting to hear about any scenarios where it can't be avoided.
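    To make the spiral described above concrete, here is a minimal, hypothetical simulation (all constants are invented): the query cost grows with the backlog size, while the throttle caps how many rows each run removes, so a cap below the arrival rate makes every subsequent run slower.

        // Hypothetical simulation of the throttling spiral described above.
        // Assumption: query time is proportional to the total backlog, not to
        // the number of rows actually processed; all constants are invented.
        public class ThrottleSpiral {
            public static void main(String[] args) {
                int backlog = 500;          // rows waiting in the table
                int arrivalsPerRun = 120;   // new rows appearing between runs
                int limit = 100;            // throttle: max rows processed per run

                for (int run = 1; run <= 10; run++) {
                    // Query cost scales with rows remaining, not rows returned.
                    double querySeconds = backlog * 0.06; // ~30 s at 500 rows
                    int processed = Math.min(limit, backlog);
                    backlog = backlog - processed + arrivalsPerRun;
                    System.out.printf("run %2d: query=%5.1f s, processed=%3d, backlog=%d%n",
                            run, querySeconds, processed, backlog);
                }
                // With limit (100) below arrivals (120), the backlog, and hence the
                // query time, grows without bound; raising the limit breaks the spiral.
            }
        }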

  4. It would be interesting to hear other scenarios that caused non-linear scaling issues and what was done to resolve it. It would be even more interesting to hear about any scenarios where it can't be avoided.
    Indeed, but I have difficulty talking about specific cases.
    Straight queues typically result in a linear degradation of performance. To get non-linear behavior you typically need to hit some hardware limitation. Of course there are some obvious exceptions.
    Sorry about the length... Regards, Kirk

  5. It would be interesting to hear other scenarios that caused non-linear scaling issues and what was done to resolve it. It would be even more interesting to hear about any scenarios where it can't be avoided.

    Indeed, but I have difficulty talking about specific cases.

    Straight queues typically result in a linear degradation of performance. To get non-linear behavior you typically need to hit some hardware limitation. Of course there are some obvious exceptions.


    Sorry about the length...

    Regards,
    Kirk
    Actually, it was a good article and it solidified some things I had gathered over the years. My only beef is that in my browser, the diagrams are unreadable. I have to do a 'view image' to see the text on the graphs. Once I did that, everything was clear. Until then I couldn't figure out what you were getting at. One remedy is to just point out at some point that non-linear scaling is involved. Probably a better solution is to just enlarge the images a little. The other interesting thing about this is that for these non-linear scenarios, there is some optimal request rate that needs to be maintained externally in order to avoid performance degradation. Perhaps implementing some sort of proxy queue in front of this would be a good approach for dealing with these kinds of systems.
  6. The other interesting thing about this is that for these non-linear scenarios, there is some optimal request rate that needs to be maintained externally in order to avoid performance degradation. Perhaps implementing some sort of proxy queue in front of this would be a good approach for dealing with these kinds of systems.
    What you need is workload management. Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....
  7. What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....
    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate. This seems fairly trivial. What does workload management buy me over this approach?
  8. What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....


    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate.

    This seems fairly trivial. What does workload management buy me over this approach?
    One thing I see is people accidentally removing the throttle (queue) in front of the non-linear scaling resource without realizing what they are doing. As for the pull queue, IME, they work very well. The only danger is in getting the granularity of the workload right. Too small and you'll create contention by pulling too often. Too large and the system turns batch-like, which may not be good for end-user response. - Kirk
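    As a concrete illustration of the pull model under discussion, here is a minimal JMS 1.1-style sketch (the JNDI names and the process method are hypothetical): the consumer asks for work only when it is ready, so bursts accumulate in the queue rather than overloading the slow resource.

        // Minimal JMS pull-consumer sketch; the JNDI names are hypothetical.
        // The consumer drains the queue at its own rate, so upstream bursts
        // pile up in the queue instead of overloading the slow resource.
        import javax.jms.*;
        import javax.naming.InitialContext;

        public class PullWorker {
            public static void main(String[] args) throws Exception {
                InitialContext jndi = new InitialContext();
                ConnectionFactory cf = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
                Queue queue = (Queue) jndi.lookup("jms/WorkQueue");

                Connection conn = cf.createConnection();
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageConsumer consumer = session.createConsumer(queue);
                conn.start();

                while (true) {
                    // Blocking pull: we ask for work only when we are ready for it.
                    Message msg = consumer.receive();
                    if (msg instanceof TextMessage) {
                        process(((TextMessage) msg).getText());
                    }
                }
            }

            private static void process(String payload) {
                // Placeholder for the call into the non-linearly scaling resource.
            }
        }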
  9. GC is your lifeguard

    My personal favorite approach to better performance management is to have at least one thread in every JVM creating a large amount of memory. This gives my application a tremendous performance boost, especially as the GC collector is my friend, helping me by ensuring that I can throttle the work of all those other threads creating contention around other more important resources. When I am even more adventurous I connect all these processes together via a distributed monitor lock on an object managed by a transparent caching product, and then, with one thread still creating all those cheap objects and making sure my collector is wide awake helping me, I can bring every JVM under my control, bending each to my will. William
  10. Re: GC is your lifeguard

    My personal favorite approach to better performance management is to have at least one thread in every JVM creating a large amount of memory. This gives my application a tremendous performance boost, especially as the GC collector is my friend, helping me by ensuring that I can throttle the work of all those other threads creating contention around other more important resources. When I am even more adventurous I connect all these processes together via a distributed monitor lock on an object managed by a transparent caching product, and then, with one thread still creating all those cheap objects and making sure my collector is wide awake helping me, I can bring every JVM under my control, bending each to my will.
    That sounds at least a little bit fallacious, perhaps even facetious .. Peace, Cameron Purdy Oracle Coherence: Data Grid for Java and .NET
  11. Re: GC is your lifeguard

    My personal favorite approach to better performance management is to have at least one thread in every JVM creating a large amount of memory. This gives my application a tremendous performance boost, especially as the GC collector is my friend, helping me by ensuring that I can throttle the work of all those other threads creating contention around other more important resources. When I am even more adventurous I connect all these processes together via a distributed monitor lock on an object managed by a transparent caching product, and then, with one thread still creating all those cheap objects and making sure my collector is wide awake helping me, I can bring every JVM under my control, bending each to my will.


    That sounds at least a little bit fallacious, perhaps even facetious ..
    Maybe, but it is a sign that William is developing a sense of humor. And I say that is a good thing :-) Kirk
  12. Re: GC is your lifeguard

    Hi Kirk, I am glad you replied, as I was thinking after reading Cameron's email that I just cannot win no matter what I say or how I say it. William
  13. Re: GC is your lifeguard

    Hi Kirk,

    I am glad you replied, as I was thinking after reading Cameron's email that I just cannot win no matter what I say or how I say it.

    William
    Oh Cam was just yanking your chain also ;-) Kirk
  14. What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....


    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate.

    This seems fairly trivial. What does workload management buy me over this approach?
    The ability to monitor and change how the system reacts to workloads without re-designing the entire system? Perhaps even allowing this to be done by operators rather than someone who can recompile the entire system when the workload suddenly changes? The ability to move load between nodes in the system, when needed? Workload management is a general solution to the problem you solved with a particular queue design.
  15. What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....


    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate.

    This seems fairly trivial. What does workload management buy me over this approach?


    The ability to monitor and change how the system reacts to workloads without re-designing the entire system? Perhaps even allowing this to be done by operators rather than someone who can recompile the entire system when the workload suddenly changes? The ability to move load between nodes in the system, when needed?

    Workload management is a general solution to the problem you solved with a particular queue design.
    Are you answering my question or asking questions?
  16. What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....


    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate.

    This seems fairly trivial. What does workload management buy me over this approach?


    The ability to monitor and change how the system reacts to workloads without re-designing the entire system? Perhaps even allowing this to be done by operators rather than someone who can recompile the entire system when the workload suddenly changes? The ability to move load between nodes in the system, when needed?
    Just to note, the approach I described doesn't necessarily suffer from these issues. You could build it that way, but that's not really the proper approach. Adjusting how workload is managed with queues is mostly a configuration effort. Where I work this is managed by operations. When you are pulling from a queue, it's trivial to move new nodes in and out and route the messages around with basically unlimited flexibility. To be clear, I am talking about message queues, including (but not limited to) JMS implementations, and not queue objects or structures compiled into a program.
  17. What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....


    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate.

    This seems fairly trivial. What does workload management buy me over this approach?


    The ability to monitor and change how the system reacts to workloads without re-designing the entire system? Perhaps even allowing this to be done by operators rather than someone who can recompile the entire system when the workload suddenly changes? The ability to move load between nodes in the system, when needed?


    Just to note, the approach I described doesn't necessarily suffer from these issues. You could build it that way, but that's not really the proper approach. Adjusting how workload is managed with queues is mostly a configuration effort. Where I work this is managed by operations. When you are pulling from a queue, it's trivial to move new nodes in and out and route the messages around with basically unlimited flexibility.

    To be clear, I am talking about message queues, including (but not limited to) JMS implementations, and not queue objects or structures compiled into a program.
    Ok, so you have created your own workload management using queues. That's nice, but wouldn't it be nice if there was a container offering these services for you? I can configure the size of pools in my app server, which is great for tuning to different kinds of loads, but what I really want is to be able to prioritize based on business properties, or define rules for which load is to be delayed or rescheduled if the system is overloaded. Anyhow, I guess it is off topic...
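    As a rough sketch of the business-priority scheduling wished for here (the Job type, its priority field, and the pool sizing are assumptions, not a feature of any particular app server), a priority-ordered queue in front of a worker pool reorders whatever work is waiting:

        // Sketch: prioritizing queued work by a business property. The Job
        // type and its priority field are hypothetical; real workload
        // management would also cover delay and reschedule rules.
        import java.util.concurrent.PriorityBlockingQueue;
        import java.util.concurrent.ThreadPoolExecutor;
        import java.util.concurrent.TimeUnit;

        public class PriorityDispatcher {
            // A unit of work carrying a business priority; smaller = more urgent.
            static class Job implements Runnable, Comparable<Job> {
                final int priority;
                final String name;

                Job(int priority, String name) {
                    this.priority = priority;
                    this.name = name;
                }

                public void run() {
                    try { Thread.sleep(100); } catch (InterruptedException e) { }
                    System.out.println("processed " + name);
                }

                public int compareTo(Job other) {
                    return Integer.compare(priority, other.priority);
                }
            }

            public static void main(String[] args) throws InterruptedException {
                // One worker draining a priority-ordered queue. Caveat: a task
                // handed straight to an idle worker bypasses the queue, so only
                // waiting work gets reordered.
                ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L,
                        TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());

                pool.execute(new Job(9, "warm-up filler"));    // occupies the worker
                pool.execute(new Job(5, "nightly report"));    // waits in the queue
                pool.execute(new Job(1, "customer checkout")); // queued later, runs first
                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.SECONDS);
            }
        }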
  18. Ok, so you have created your own workload management using queues. That's nice, but wouldn't it be nice if there was a container offering these services for you?
    Sure, I guess. I'm not sure what I'm missing exactly. We currently use MQ in combination with other software which allows at least some of what you are talking about, I think.
    I can configure the size of pools in my app server, which is great for tuning to different kinds of loads, but what I really want is to be able to prioritize based on business properties, or define rules for which load is to be delayed or rescheduled if the system is overloaded.

    Anyhow, I guess it is off topic...
    I don't think it's off-topic. I get the feeling that you have misinterpreted my question as a challenge. I'm really not familiar enough with the type of tool you mention to make an assessment of whether it provides features beyond what I have at my disposal now. Any guidance you would like to provide would be appreciated. Perhaps these features are offered by different products under different names. Or are you asking why there isn't a standard for this kind of thing? I know that WS-* (which I detest) attempts to address some of the concerns you mention. Of course, for my two cents, this kind of throttling is a last resort. The preferred solution would be to modify the non-linear service to have nominally linear performance. And that of course is what Cameron's baby is all about, isn't it? But like many other people, I live in a world where we can't always fix the real problem, so I think some approach to throttling, be it a workload management package or not, is an invaluable tool.
  19. workload management

    What you need is workload management.
    Why doesn't a standard app server provide it? It's been available in select platforms for 30 years....


    My off-the-cuff answer would be to just put a message queue in front of the non-linear resource and let it pull requests. This tends to allow the systems to optimize themselves. Of course, if your resource can't pull on its own, you need to feed it at the right rate.

    This seems fairly trivial. What does workload management buy me over this approach?


    The ability to monitor and change how the system reacts to workloads without re-designing the entire system? Perhaps even allowing this to be done by operators rather than someone who can recompile the entire system when the workload suddenly changes? The ability to move load between nodes in the system, when needed?

    Workload management is a general solution to the problem you solved with a particular queue design.
    John, the only limiting factor to this today is the relative complexity of the systems. When systems only had a dozen queues representing load, it was relatively easy to name them and understand the implications of balancing, co-locating, etc. Relatively simple systems today will often have orders of magnitude more complexity in terms of queues and processing stages, and to achieve reasonable latencies many of those will be internalized (or at least side-effects of other decisions will result in them not being separable, e.g. relocatable). At any rate, while the goal is admirable, the reality is that our tools and processes haven't kept up at all with the growth in complexity of software systems. Peace, Cameron Purdy Oracle Coherence: Data Grid for Java and .NET
  20. Re: workload management

    At any rate, while the goal is admirable, the reality is that our tools and processes haven't kept up at all with the growth in complexity of software systems.
    Not entirely true, but maybe this was just a counter to my previous posting. There are tools that have tried to tackle the complexity by encouraging the construction and analysis of software execution models to assist in the development and management (through understanding) of such beasts. The need and awareness for such tools have been growing steadily over the last year as customers (and not necessarily the vendors) start to realize that some half-baked management dashboard (lipstick on a pig) with HUGE red and green circles or other simple chart junk is not going to cut it. William
  21. Re: workload management

    John, the only limiting factor to this today is the relative complexity of the systems. When systems only had a dozen queues representing load, it was relatively easy to name them and understand the implications of balancing, co-locating, etc. Relatively simple systems today will often have orders of magnitude more complexity in terms of queues and processing stages
    Yeah, I can understand that, especially in the kind of systems that use your software. But I still believe that the vast majority of systems are a lot simpler than this - i.e. network request->servlet request->business logic invocation/(ejb request)->database call. Even for this simple scenario I haven't seen any good approach to workload management available yet. Parts of it can be handled, but that usually depends on platform/OS-specific features external to the (Java-based) container. I would like this kind of feature and I believe it belongs in an "enterprise" platform.

  22. It would be interesting to hear other scenarios that caused non-linear scaling issues and what was done to resolve it. It would be even more interesting to hear about any scenarios where it can't be avoided.

    Indeed, but I have difficulty talking about specific cases.

    Straight queues typically result in a linear degradation of performance. To get non-linear behavior you typically need to hit some hardware limitation. Of course there are some obvious exceptions.


    Sorry about the length...

    Regards,
    Kirk

    Queues can actually be used to both increase throughput and lower average response times in some fairly common use cases, as they accumulate work backlogs into batches and so generate fewer (although larger) requests to downstream subsystems.
    Batched database, network, XA transaction, and disk calls, and even context switching to a certain extent, may show an overall speedup, as all of these items exhibit an interesting performance curve: each can consume about the same amount of time and resource overhead to process one small item in a single call as to process multiple small items in a single call. An example is JDBC batching.
    Of course batching is no panacea, as it can yield an "upside down" result and increase response times or even lower throughput. Tom Barnes, BEA WebLogic JMS Team
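    For the JDBC batching example mentioned above, here is a minimal sketch (the URL, credentials, and ORDERS table are hypothetical): five hundred small inserts share one round trip to the database instead of five hundred.

        // Minimal JDBC batching sketch; the URL, credentials, and ORDERS
        // table are hypothetical. N inserts share one round trip instead of N.
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;

        public class BatchInsert {
            public static void main(String[] args) throws SQLException {
                Connection conn = DriverManager.getConnection(
                        "jdbc:yourdb://host/db", "user", "password");
                conn.setAutoCommit(false); // commit the whole batch as one unit

                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO ORDERS (ID, AMOUNT) VALUES (?, ?)");
                for (int i = 0; i < 500; i++) {
                    ps.setInt(1, i);
                    ps.setBigDecimal(2, new java.math.BigDecimal("9.99"));
                    ps.addBatch();      // accumulate instead of executing
                }
                ps.executeBatch();      // one round trip for all 500 rows
                conn.commit();
                ps.close();
                conn.close();
            }
        }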
  23. Assumptions are the main cause

    My favorite example of where an optimization caused the system to run slower happened to me during my undergrad years. A professor spent several days showing just how great an advance the Fast Fourier Transform (FFT) was over the standard image convolution algorithm. After all of that work, he fired up the new, improved version of the program. And it ran slower. A lot slower. Why? Because it was running on a diskless Sun workstation, and when it swapped memory, it did so over the network. The standard convolution algorithm generated fewer page faults, even though it did a lot more math, so it ran much faster. The FFT algorithm, when run on a large image, generated so many page faults that it was orders of magnitude slower. So anytime you optimize the wrong thing, you can see this problem. Happens all the time.
  24. Thanks Joe for propagating my fat finger problems. -- Kirk