Discussions

News: Caching, Parallelism and Scalability

  1. Caching, Parallelism and Scalability (3 messages)

    "You need to be conscious of the fact that workloads will be distributed, and your software will need to be written specifically to take advantage of this parallelism. If your software wasn’t written with parallelism in mind in the first place, portions may have to be rewritten.

    "While making the most of parallelism does involve extra effort, the solution is still workable. It may involve some rewriting, using common approaches such as more piecemeal workloads, producer/consumer design patterns, efficient concurrency libraries, and inter-thread synchronization mechanisms.

    "But what about libraries and subsystems outside of your control? Specifically, databases that touch disks, which inherently involve sequential, serial processing? And there is no way you can change your database vendor’s code. Systems that involve all but the simplest, most infrequent database use would face a massive bottleneck thanks to this serialization. Databases are just one common example of process serialization, but there could be others as well. Serialization is the real enemy here, as it undoes any of the throughput gains parallelism has to offer." More....
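As a rough illustration of the producer/consumer pattern the excerpt mentions, here is a minimal Java sketch using the standard `java.util.concurrent` library. The class and method names (`ProducerConsumerDemo`, `sumWithWorkers`) are invented for this example and are not from the article:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    static final int POISON = -1; // sentinel item used to signal shutdown

    // A bounded queue decouples the producer from the consumers,
    // letting each consumer run on its own thread/core.
    public static int sumWithWorkers(int[] work, int consumers) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        final int[] total = {0};
        final Object lock = new Object();
        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        int item = queue.take();   // blocks until work arrives
                        if (item == POISON) break; // shutdown signal
                        synchronized (lock) { total[0] += item; }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (int item : work) queue.put(item);          // producer side
        for (int i = 0; i < consumers; i++) queue.put(POISON);
        for (Thread t : workers) t.join();
        return total[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumWithWorkers(new int[]{1, 2, 3, 4, 5}, 3)); // prints 15
    }
}
```

The bounded queue is the inter-thread synchronization mechanism here: the producer blocks when consumers fall behind, which gives back-pressure for free.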
  2. 'More....' link doesn't seem to work on Firefox and Chrome... at least on my machine. If you are having the same problem, the link is http://java.dzone.com/articles/caching-parallelism-scalability Good stuff Manik. -talip
  3. "Specifically databases that touch disks that inherently involve sequential, serial processing? ... Databases are just one common example of process serialization ..."
    Nonsense, unless you are talking about a database management system that (1) only supports one concurrently executing thread/query, (2) never keeps pages cached in memory, and (3) has business transaction semantics that forbid chopping the transaction up into smaller data-access interactions.

    Yes, there is latency (and more of it than with memory access), but is this really serialization? Latency prolongs serialization introduced elsewhere. Caches reduce latency, but at the end of the day the work still needs to be partitioned by the developer and scheduled and deployed by the grid computing runtime.

    The hardest part for most people is deciding how best to do this whilst not (irresponsibly) breaking the transaction semantics of the execution, and not introducing complexity for a gain which might not be determined until deployment time, and which even then is subject to massive fluctuations with possible topology/scheduling changes. William
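The "chopping up of the transaction into smaller data access interactions" point above can be sketched as a partitioning step: split one large batch into fixed-size chunks so each chunk can be processed (and committed) independently instead of as one long serial transaction. This is a hypothetical sketch; the `ChunkedWork` class and the empty per-chunk task are placeholders, not code from the discussion:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChunkedWork {
    // Split a large batch into fixed-size chunks. Each chunk can then
    // become its own short data-access interaction.
    public static List<int[]> chunk(int[] batch, int size) {
        List<int[]> chunks = new ArrayList<>();
        for (int start = 0; start < batch.length; start += size) {
            int end = Math.min(start + size, batch.length);
            int[] c = new int[end - start];
            System.arraycopy(batch, start, c, 0, c.length);
            chunks.add(c);
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        int[] batch = new int[10];
        System.out.println(chunk(batch, 4).size()); // prints 3 (chunks of 4, 4, 2)

        // Hand each chunk to the scheduler (here a plain thread pool stands in
        // for the grid computing runtime).
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int[] c : chunk(batch, 4)) {
            pool.submit(() -> { /* each chunk would be its own short transaction */ });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Whether this chopping is admissible is exactly William's caveat: it only works where the business transaction semantics tolerate it.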
  4. "But what about libraries and subsystems outside of your control? Specifically databases that touch disks that inherently involve sequential, serial processing? And there is no way you can change your database vendor’s code. Systems that involve all but the simplest, most infrequent database use would be facing a massive bottleneck thanks to this serialization. Databases are just one common example of process serialization, but there could be others as well. Serialization is the real enemy here, as it undoes any of the throughput gains parallelism has to offer." Serialization would allow only a single core in a multi-core system to work at a time, limiting usable computing power. This is made even worse in a cluster or grid. Using a bigger, beefier database server reduces this somewhat, but it still does not overcome the problem: as you add more servers to your grid, process serialization is still going on in your database server.

    Caching is a good thing in many cases. However, I will still take issue with the above writings about databases. Back in the '90s, database vendors, led by Informix at the time, significantly increased performance via multithreading in their engines and very efficient caching implemented in the engine itself.

    Disk I/O is also definitely not inherently sequential. With a reasonable-quality database server you parallelise disk I/O over a large number of disks. You fragment at the table level (rows from the same table spread over many disks) very easily, and can increase performance significantly. If that isn't enough, you can use solid-state disks. Some database vendors have also made in-memory database servers for increased speed; some will work transparently on top of a traditional disk-based database server. Other solutions include shared-nothing database servers that run on many independent computers (in a cluster/grid).

    The point here isn't that caches between the database server and the application aren't interesting. On the contrary, they are often the very best solution. The point is that there are also other solutions that should be considered, and the database-server-based solutions have the advantage of not impacting your application at all. You might still be able to play a little :-) And the costs may not be all that different from the caching solution.
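The claim in message 4 that serialization "undoes any of the throughput gains parallelism has to offer" can be quantified with Amdahl's law, which the thread does not name explicitly: with a serial fraction s and n cores, the maximum speedup is 1 / (s + (1 − s) / n). A small sketch (class name and example numbers are mine, not from the thread):

```java
public class AmdahlDemo {
    // Amdahl's law: with a serial fraction s of the work and n cores,
    // the best possible speedup is 1 / (s + (1 - s) / n).
    public static double speedup(double serialFraction, int cores) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        // Even a 10% serialized database path caps a 32-core grid well
        // below 32x; as cores grow without bound, the limit is 1/0.10 = 10x.
        System.out.printf("%.2f%n", speedup(0.10, 32)); // prints 7.80
    }
}
```

This is why both posters converge on the same remedy: shrink the serial fraction itself, whether via caching, partitioning, or a more parallel database engine, rather than just adding servers.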
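The "caching between the database server and the application" that the last message discusses can be sketched as a minimal read-through cache in front of a slow loader. This is a single-threaded illustration with invented names (`ReadThroughCache`, the `"row-" + id` loader stands in for a database query), not anyone's production design:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database query
    private int misses = 0;              // not thread-safe; fine for this demo

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent invokes the loader on a miss and caches the result;
        // subsequent reads for the same key skip the loader entirely.
        return cache.computeIfAbsent(key, k -> { misses++; return loader.apply(k); });
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        ReadThroughCache<Integer, String> c =
            new ReadThroughCache<>(id -> "row-" + id); // stands in for a DB round-trip
        c.get(1); c.get(1); c.get(2);
        System.out.println(c.misses()); // prints 2: the second get(1) was a cache hit
    }
}
```

As the poster notes, the trade-off versus in-engine caching, fragmentation, or in-memory databases is that this kind of application-side cache does touch your application code, while the database-server-based solutions do not.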