Discussions

News: "Sun T2000 Sax Sucks" points out what horizontal scalability is

  1. In "Sun T2000 Sax Sucks," blogger Richard Rodger finds that his Sun T2000 is much slower than he expected for a given process, then explains why, and why it's okay in the long run. The short answer is that horizontal scalability doesn't pay off for a single process, but in the ability to run multiple processes without losing performance.

    From the blog:
    And then it hit me. Duh! 32 hardware threads over 4 CPUs (and that's just the minimum config [Editor's note: the Sun T2000 with four CPUs actually has 16 hardware threads, not 32.]). And I was only exercising one of them! My performance test only runs in one thread. You see, the T2000 is designed for throughput. Serving lots of web requests all at the same time. That sort of thing.

    So I run a really brain-dead test. I opened up a whole load of consoles and started the test on all of them, all at the same time.

    Pay Dirt!

    Not a wince. Not a whine. Cool as a breeze. Same speed on all consoles! The T2000 just laughed at me. A few samples from mpstat, and I was a happy camper again. You see, each thread may be a bit on the slow side, but you do have 32 of 'em. So if each one is five times slower, you need five threads to do the same work. But: five into 32 total threads gives you six or so. Which means:

    Six Times Faster! Baby!

    So what you get with the T2000 is a big old scalability lever, and a fairly small performance lever. Same old story really. There's always a trade-off.
    It's an instructive, intuitive illustration of what horizontal scalability is: put together slower, less expensive pieces and scale out, providing lots of CPUs to do the work instead of spending more on a single CPU that can get bogged down. (A minimal sketch of that kind of multi-threaded throughput test appears below.)

    [Editor's Note, part II: this does not represent an endorsement of the Sun T2000 by TheServerSide in any way. It's being highlighted because of the usefulness of the example in illustrating scalability.]
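
    Here is a minimal sketch of the kind of throughput test the blog describes: the same CPU-bound task is run on 1, 2, 4, ... threads at once, and the wall-clock time for each batch is printed. The doWork() method is a hypothetical stand-in for whatever the original benchmark did, and the iteration counts are arbitrary.

        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.*;

        public class ThroughputTest {

            // Hypothetical stand-in for the single-threaded benchmark workload.
            static long doWork() {
                long acc = 0;
                for (int i = 0; i < 50_000_000; i++) {
                    acc += i * 31L;
                }
                return acc;
            }

            public static void main(String[] args) throws Exception {
                for (int threads = 1; threads <= 32; threads *= 2) {
                    ExecutorService pool = Executors.newFixedThreadPool(threads);
                    List<Future<Long>> futures = new ArrayList<>();
                    long start = System.nanoTime();
                    for (int i = 0; i < threads; i++) {
                        futures.add(pool.submit(() -> doWork()));
                    }
                    for (Future<Long> f : futures) {
                        f.get();                       // wait for every copy to finish
                    }
                    long millis = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(threads + " copies took " + millis + " ms");
                    pool.shutdown();
                }
            }
        }

    On a chip like the T1 the wall-clock time should stay roughly flat until the thread count passes the number of hardware threads, which is the throughput argument in a nutshell.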
  2. Amdahl's Law

    http://en.wikipedia.org/wiki/Amdahl's_law

    If the serial part of a workload is 20%, then the maximum speedup it can ever get from parallelization is a factor of 5.

    Hardly news, though. (A quick worked example of that factor-of-5 limit appears at the end of this post.)

    These guys ran a bunch of benchmarks, and I think you can tell that the processor has enough memory bandwidth to support 32 concurrent threads, which means it does what it's supposed to do (which is not to be taken for granted these days; the Xeon, for example, does not scale).

    http://www.rz.rwth-aachen.de/computing/hpc/hw/niagara.php

    The only gotcha is never to use it for floating point because that will be your bottleneck. It was designed that way.

    Guglielmo

    Enjoy the Fastest Known Reliable Multicast Protocol with Total Ordering

    .. or the World's First Pure-Java Terminal Driver
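
    To put a number on the Amdahl's-law point above, here is a small sketch that simply evaluates the formula speedup(N) = 1 / (s + (1 - s) / N) for a 20% serial fraction; the worker counts are arbitrary.

        public class Amdahl {

            // Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N), where s is the serial fraction.
            static double speedup(double serialFraction, int workers) {
                return 1.0 / (serialFraction + (1.0 - serialFraction) / workers);
            }

            public static void main(String[] args) {
                double s = 0.20;
                for (int n : new int[] {1, 2, 4, 8, 16, 32, 1000000}) {
                    System.out.printf("%8d workers -> speedup %.2f%n", n, speedup(s, n));
                }
                // As n grows, the speedup approaches 1 / s = 5 and never exceeds it.
            }
        }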
  3. I've been looking for some sort of benchmark comparison between Niagara and Intel/AMD-type processors, and I like what I read in the article. A CoolThread seems to deliver about 20% of the performance of a conventional hardware thread. That's enough to call it a success in my book. We have to look at it as a start, and a very good start at that. I read that Sun plans a newer version with 64 CoolThreads and the same power consumption. There is a whole category of software applications that can benefit from the Niagara (Sun's T1 and T2) paradigm. I am thinking of things like multi-agent systems, neural networks, and almost all AI-type applications. I have only two criticisms: 1 - Sun had better give each CoolThread its own floating-point unit with decent performance. 2 - Sun should attempt to boost performance from 20% to 30% or more (any improvement would solidify its position). If Niagara could support floating-point-intensive applications, it could be used in massively multiplayer games that could use the multiple threads!
  4. "If Niagara could support floating-point-intensive applications, it could be used in massively multiplayer games that could use the multiple threads!"

    I suspect that they just don't have the space on the die.

    Guglielmo

    Enjoy the Fastest Known Reliable Multicast Protocol with Total Ordering

    .. or the World's First Pure-Java Terminal Driver
  5. On Sun's sites there is mention of a future Niagara II that will have multiple floating-point units.
  6. As a clarification:

    - Adding more concurrent threads (e.g. more CPUs, more cores) to a box is vertical scaling.

    - Adding more boxes is horizontal scaling.

    What the blog is referring to is parallelism, and indirectly Amdahl's law.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  7. 16 threads is the right number. I wrote the entry from memory and I may have been somewhat over-enthusiastic. My bad. But in any case, you can get more T1 CPUs and scale pretty much without penalty, so the outcome is the same.
  8. You should not multiply by the number of hardware threads (16), but by the number of processor cores (4).

    Four hardware threads share a single core, and a single thread can use 100% of one core, or 1/4 of the total CPU power.

    If a single-threaded task is 5 times slower on your T2000 than on your Athlon 2800+, that is a bad result.

    Nebojsa
  9. I believe that for many Java server applications, you may expect an almost linear gain with multiple CPU cores and threads, up to a point. There are many tests related to this.
  10. "I believe that for many Java server applications, you may expect an almost linear gain with multiple CPU cores and threads, up to a point. There are many tests related to this."

    To test the scalability of the hardware-thread count, you have to compare machines with one hardware thread per core, two hardware threads per core, and four hardware threads per core, all with the same number of cores.

    I don't expect an almost linear gain in a test like that.

    Nebojsa
  11. "If a single-threaded task is 5 times slower on your T2000 than on your Athlon 2800+, that is a bad result." - Nebojsa

    Sorry for pointing out the obvious, but - no, this is not bad. From my POV:

    Pros:
    There are a great many applications that do not require much processing power for a single thread. Instead, they require some processing power across many threads.

    Cons:
    Yes, this is not a golden hammer. But that's not really a con, is it :)

    Regards,
    Einar
  12. Throughput computing

    Wrong.

    Niagara was designed with short pipelines and fast context switching, making it efficient to switch execution to another thread even while the original is waiting for something as relatively short as a main-memory access or an FPU operation. It basically keeps its execution units as busy as possible, whereas in a similar situation your 2800+ will spend hundreds of cycles waiting around for main memory.
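
    As a rough illustration of that latency-hiding argument, here is a sketch of a pointer-chasing loop that is dominated by main-memory stalls; running several independent chases at once should raise aggregate throughput far more than it slows each individual chase. The array size, hop count, thread counts, and starting offsets are arbitrary, and this is not the original poster's benchmark.

        import java.util.Random;
        import java.util.concurrent.*;

        public class PointerChase {

            // Sattolo's algorithm: a random single-cycle permutation, so every step
            // of the chase is a dependent load from an unpredictable address.
            static int[] buildChain(int size) {
                int[] next = new int[size];
                for (int i = 0; i < size; i++) next[i] = i;
                Random rnd = new Random(42);
                for (int i = size - 1; i > 0; i--) {
                    int j = rnd.nextInt(i);
                    int tmp = next[i]; next[i] = next[j]; next[j] = tmp;
                }
                return next;
            }

            static int chase(int[] next, int start, int hops) {
                int pos = start;
                for (int i = 0; i < hops; i++) pos = next[pos];
                return pos;                           // returned so the JIT cannot drop the loop
            }

            public static void main(String[] args) throws Exception {
                final int[] chain = buildChain(1 << 24);   // ~64 MB of ints, far larger than any cache; give the JVM enough heap
                final int hops = 10_000_000;
                for (int threads : new int[] {1, 2, 4, 8, 16, 32}) {
                    ExecutorService pool = Executors.newFixedThreadPool(threads);
                    long start = System.nanoTime();
                    Future<?>[] fs = new Future<?>[threads];
                    for (int t = 0; t < threads; t++) {
                        final int begin = t * 1000 % chain.length;   // read-only chain shared by all threads
                        fs[t] = pool.submit(() -> chase(chain, begin, hops));
                    }
                    for (Future<?> f : fs) f.get();
                    long ms = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(threads + " concurrent chases: " + ms + " ms wall clock");
                    pool.shutdown();
                }
            }
        }

    On a processor that can swap in another hardware thread while one waits on memory, the wall-clock time for a batch grows much more slowly than the thread count does.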
  13. "You should not multiply by the number of hardware threads (16), but by the number of processor cores (4). Four hardware threads share a single core, and a single thread can use 100% of one core, or 1/4 of the total CPU power. If a single-threaded task is 5 times slower on your T2000 than on your Athlon 2800+, that is a bad result." - Nebojsa

    It depends. If you do web hosting or other seriously parallel loads, then these machines are heaven-sent.
    If you read the benchmarks (and I read one of those in iX), when it comes to real loads in databases, web hosting, or any other heavily parallel area, these machines blow anything away.

    Now, 99% of all business applications nowadays are massively parallel but do not need that much processing time, so you know where the core market for those machines is. And funnily enough, that is the core market for servers at the moment.

    A web hoster who has one of those machines and can push dozens of users in parallel into VMs or jails will see it as heaven-sent.
  14. "It depends. If you do web hosting or other seriously parallel loads, then these machines are heaven-sent. If you read the benchmarks (and I read one of those in iX), when it comes to real loads in databases, web hosting, or any other heavily parallel area, these machines blow anything away. Now, 99% of all business applications nowadays are massively parallel but do not need that much processing time, so you know where the core market for those machines is. And funnily enough, that is the core market for servers at the moment. A web hoster who has one of those machines and can push dozens of users in parallel into VMs or jails will see it as heaven-sent."

    I was only pointing out the wrong reasoning in the article. I don't think the T2000 is a bad server.

    Nebojsa
  15. A few months ago, due to increased usage, we had to upgrade our system. The nature of the system: distributed, highly concurrent, with tens of thousands of users. The connection proxy (Java NIO+SSL) required additional resources.

    Of all the hardware out there, three types (all from Sun) were chosen for serious consideration and benchmarked:

    * V210 (2x 1.2GHz UltraSPARCs)
    * X4100 (2x 2.something GHz Opterons, dual core)
    * T2000 (8-core T1, 1GHz)

    The result? As expected, the T2000 aced the tests.
    1st place: T2000, 28k users
    2nd place: X4100, 16k users
    3rd place: V210, 4k users

    Also considering rack space and power consumption requirements, the T2000 won. By far.

    Regards,
    Einar
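
    For what it's worth, here is a bare-bones sketch of the kind of connection-concurrency test described above: a single-threaded NIO accept loop that just counts open connections. SSL and the actual proxying are left out, and the port number is arbitrary; this only shows the shape of such a benchmark, not the poster's actual setup.

        import java.io.IOException;
        import java.net.InetSocketAddress;
        import java.nio.ByteBuffer;
        import java.nio.channels.*;
        import java.util.Iterator;

        public class ConnectionCounter {
            public static void main(String[] args) throws IOException {
                Selector selector = Selector.open();
                ServerSocketChannel server = ServerSocketChannel.open();
                server.configureBlocking(false);
                server.socket().bind(new InetSocketAddress(9000));   // arbitrary port
                server.register(selector, SelectionKey.OP_ACCEPT);

                int open = 0;
                ByteBuffer buf = ByteBuffer.allocate(4096);
                while (true) {
                    selector.select();
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {
                            SocketChannel client = server.accept();
                            if (client != null) {
                                client.configureBlocking(false);
                                client.register(selector, SelectionKey.OP_READ);
                                System.out.println("open connections: " + ++open);
                            }
                        } else if (key.isReadable()) {
                            SocketChannel client = (SocketChannel) key.channel();
                            buf.clear();
                            if (client.read(buf) == -1) {            // peer closed the connection
                                key.cancel();
                                client.close();
                                System.out.println("open connections: " + --open);
                            }
                        }
                    }
                }
            }
        }

    Point a load generator with tens of thousands of simulated clients at it and the limiting factor becomes how many concurrent connections the box can keep serviced, which is exactly what the T2000-style throughput design is aimed at.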
  16. very good mirc mirc mırc mirç mırç mirc indir mırc indir mirc mırc mırç mirc yükle mirc download islami sohbet dini sohbet islami çet islami chat kelebek kelebek sohbet kelebek mirc kelebek indir kelebek script kameralı mirc kameralı chat mirc mırc kameralı sohbet kameralı chat chat chat yap chat sohbet chatsohbet çet çet odaları çet odası sohbet kanalları sohpet sohbet odaları sohbet kanalları yarışma soru cevap sevgili sevgili bul arkadaş arkadaş ara arkadaş bul arkadaşlık bedava sohbet erkek arkadaş bayan arkadaş oto araba mp3 mp3 indir astroloji gazeteler gazete marifetname bedava domain map