
News: TSS Asks: Scaling advice for a CMS?

  1. TSS Asks: Scaling advice for a CMS? (11 messages)

    Nick Lothian says that he's tried everything, including the "waving a dead chicken over the servers" trick, to optimize his content application. Some comments suggest optimizing the SQL used in the CMS, which apparently isn't an option; he's already using caching. So what would you suggest?
    In the past I’ve fixed a problem like this by using curl to make a static copy of the site and some mod_rewrite magic to redirect visitors. In this case that’s unlikely to work, because there is just enough dynamic content to make it more trouble than it is worth. The obvious solution is a rewrite, but that isn’t going to happen, and I don’t want to be doing any more patching of the *&$!#@ CMS. The only thing I can think of is to use Pound as a load balancer, with a second copy of the CMS taking over content generation when the first one crashes and restarts. I think that will work, but it is kind of a band-aid solution and comes with a whole set of its own problems. For example, it doesn’t look like running two copies of the CMS off the same database will work, so we’ll need to replicate the database. Then we’ll need to make sure the content updates go to the correct CMS/database combination, and so on.
    One crucial factor is that he doesn't mention the CMS involved, of course. If he were using a JCR-compatible CMS, he could probably export the content and import it into another CMS with better performance. Even that might not be an option in many environments, though (where software purchasing or management might prevent switching the CMS software). So what would you suggest? Does anyone have any good suggestions?
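    For reference, the JCR route really is mostly plumbing once a repository session is available. A minimal sketch, assuming a JSR-170/283 javax.jcr Session for each repository (repository setup, credentials and error handling omitted; the path and file name are illustrative):

```java
import javax.jcr.ImportUUIDBehavior;
import javax.jcr.Session;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class JcrContentMigration {

    // Dump a subtree of the source repository to a system-view XML file.
    static void exportContent(Session source, String path, String file) throws Exception {
        try (FileOutputStream out = new FileOutputStream(file)) {
            // skipBinary=false keeps binary properties, noRecurse=false exports the whole subtree
            source.exportSystemView(path, out, false, false);
        }
        // no save() needed -- export is read-only
    }

    // Import the dump under a parent node of the target repository.
    static void importContent(Session target, String parentPath, String file) throws Exception {
        try (FileInputStream in = new FileInputStream(file)) {
            target.importXML(parentPath, in, ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
        }
        target.save();
    }
}
```

    Whether the target CMS can make sense of the source repository's node types is, of course, the real migration problem; the XML round-trip is the easy part.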

    Threaded Messages (11)

  2. So what would you suggest?
    First thing that comes to my mind is a facade, and creating custom indices that are used to store static content or (serialized, prepared, combined) objects of complex content types. That can be done e.g. by using Lucene (or Hadoop, if you need the proverbial "egg-laying wool-milk-sow" that does everything); a sketch of the Lucene variant follows at the end of this message.
    Does anyone have any good suggestions?
    I think there is no better solution than indices and 'pre-compiled' content types, besides just buying monster RAM machines... Or is there?
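    Here is the promised sketch of the Lucene idea - storing pre-rendered content keyed by id, so a request costs one index lookup instead of a pile of SQL. The field names and the surrounding class are invented for illustration, and a recent Lucene release is assumed:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class RenderedContentIndex {
    private final Directory dir;

    public RenderedContentIndex(String path) throws Exception {
        this.dir = FSDirectory.open(Paths.get(path));
    }

    // Store the fully rendered HTML for a content id; the expensive joins happen once, at publish time.
    public void put(String contentId, String renderedHtml) throws Exception {
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("id", contentId, Store.YES));   // exact-match key
            doc.add(new StoredField("html", renderedHtml));         // opaque payload, not tokenized
            writer.updateDocument(new Term("id", contentId), doc);  // replaces any previous version
        }
    }

    // Request-time lookup: a single index hit instead of several SQL queries.
    public String get(String contentId) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("id", contentId)), 1);
            return hits.scoreDocs.length == 0 ? null
                    : searcher.doc(hits.scoreDocs[0].doc).get("html");
        }
    }
}
```

    A real implementation would keep the IndexWriter and reader open across requests rather than reopening them per call; the sketch only shows the data model.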
    First thing that comes to my mind is a facade, and creating custom indices that are used to store static content or (serialized, prepared, combined) objects of complex content types. That can be done e.g. by using Lucene (or Hadoop, if you need the proverbial "egg-laying wool-milk-sow" that does everything).
    That's a very similar system to one my group is just wrapping up. However, instead of using Lucene we use an Oracle table which stores CLOBs of XML-serialized data, mapped to and from domain objects using JAXB. We did look at Lucene, but the Oracle solution was slightly easier to implement and faster to load when using SQL*Loader. The lookup column is indexed with a plain old-fashioned Oracle index. Current indicators show a 3x speed increase over our original multiple-DB-query method, which is nice :). Of course, adopting a solution like this means you have to start thinking about how you re-populate an existing entry, and whether that would be just as bad as the situation you are currently in. You could always take the above solution and add a concept of stale data, where threads/processes run over the index and repopulate stale fields, but this could also get horribly complicated.
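    Roughly what that pattern looks like in code - a hedged sketch only, with an invented content_cache table and ContentEntry class, JAXB for the XML mapping, and plain JDBC standing in for the SQL*Loader bulk-load path:

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.StringReader;
import java.io.StringWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

@XmlRootElement
class ContentEntry {          // stand-in for the real domain object
    public String id;
    public String title;
    public String body;
}

public class ClobContentStore {
    private final JAXBContext ctx;

    public ClobContentStore() throws Exception {
        ctx = JAXBContext.newInstance(ContentEntry.class);
    }

    // Serialize the domain object to XML and upsert it as a single indexed row.
    public void save(Connection conn, ContentEntry entry) throws Exception {
        StringWriter xml = new StringWriter();
        Marshaller m = ctx.createMarshaller();
        m.marshal(entry, xml);
        try (PreparedStatement ps = conn.prepareStatement(
                "MERGE INTO content_cache c USING dual ON (c.content_id = ?) " +
                "WHEN MATCHED THEN UPDATE SET c.payload = ? " +
                "WHEN NOT MATCHED THEN INSERT (content_id, payload) VALUES (?, ?)")) {
            ps.setString(1, entry.id);
            ps.setString(2, xml.toString());   // setString is fine for moderate CLOBs; use a stream for huge ones
            ps.setString(3, entry.id);
            ps.setString(4, xml.toString());
            ps.executeUpdate();
        }
    }

    // Single indexed lookup replaces the original multi-query assembly of the object.
    public ContentEntry load(Connection conn, String id) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT payload FROM content_cache WHERE content_id = ?")) {
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                Unmarshaller u = ctx.createUnmarshaller();
                return (ContentEntry) u.unmarshal(new StringReader(rs.getString(1)));
            }
        }
    }
}
```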
  4. Difficult

    Hard to really comment with so little information, and hard to really do anything without modifying the CMS or rewriting the application. At this level you're basically stuck fixing things at the server level, like the DB or caching proxies. It also depends on the DB you're using. At the crux, this sounds like a DB scaling problem: your DB is hammered. So how do you fix that?
    1) Optimize the queries. If you can't change the SQL generated for your queries, then it's a matter of running EXPLAIN (or whatever) to see if playing games with indexes can improve the problem. Tens of DB hits shouldn't be that big a deal, unless they're horrible DB hits. Find the horrible hits and try to fix those.
    2) Speed up the DB. Vertical scaling: more horsepower for the DB, more RAM perhaps, better drives, fatter pipes. You know, money. Can your DB fit in RAM? How big is it really? One of the problems with DB-based CMSs is that when queries come back, all the content can bloat the rows. Ideally none of the actual "content" is stored in a table that actually has to be queried (like "give me all stories authored by Mark Twain"). All of the attributes should be on something else POINTING to the content, because you don't want to suck in that content data just to query the metadata. Of course, none of this is in your control either.
    3) Replicate the DB. Ouchies. Can be VERY ugly. But, again, depending on the database, it need not be THAT ugly. If the CMS DB is used solely for content delivery and not for other things (like activity logs and other transaction processing), then you're really just replicating content, and you can probably manage a "home brewed" replication system through triggers on key tables. You could have the content authors running against an internal system, make their changes, test them, etc., and then run a process to push the content to the live system(s). This also makes your CMS DBs more "read only" on the production tier. If the CMS is the "all in one", all-powerful "One CMS to rule them all", then it's kind of difficult to pull that apart.
    4) What does your vendor say when you call up screaming about their CMS crashing on you?
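    On point 1), since the generated SQL can't be touched, about all that's left is to capture the worst statements (from a query log or the driver) and look at their plans. A small, hedged helper for that; note that EXPLAIN syntax differs per database (Oracle wants EXPLAIN PLAN FOR plus DBMS_XPLAN instead):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ExplainHelper {

    // Print the query plan for a captured CMS query (MySQL/PostgreSQL-style EXPLAIN).
    static void explain(Connection conn, String capturedSql) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("EXPLAIN " + capturedSql)) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.append(md.getColumnLabel(i)).append('=').append(rs.getString(i)).append("  ");
                }
                System.out.println(row);   // full table scans and missing indexes show up here
            }
        }
    }
}
```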
  5. ... optimizing the SQL used in the CMS, which isn't an option ... he's already using caching
    The obvious solution is a rewrite, but that isn’t going to happen, and I don’t want to be doing any more patching of the *&$!#@ CMS.
    ... it doesn’t look like running two copies of the CMS off the same database will work...
    Keep on "waving a dead chicken over the servers".
  6. Two out of three

    You know the old problem: cheap, fast, reliable - you can only have two out of three. First, you need to find out what the bottleneck is. How do you know it is the database? If it is the DB, try the following, in order:
    - Explain the queries and try to optimize access paths, indexes, query optimizer hints, etc. A DBA for your DBMS is required.
    - Throw more memory at it, and consider a separate DB server with a fast connection (cluster?).
    - Better disks and disk arrangement. Have you tried the usual separation of table spaces and logs, and maybe even more and faster disks?
    Too much, too expensive? See above. One possible thing you could do to achieve all of it is to find a way to measure the current status in terms of performance, and serve static cached material if the server is overloaded and the real dynamic version when the server has breathing room. Not easy to do! K
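    That last idea could look something like the following servlet filter - purely a sketch, with an invented in-flight threshold and an assumed /static-snapshot mirror produced by a periodic wget/curl run:

```java
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Very rough "pressure valve": when too many requests are in flight,
// serve a pre-generated static snapshot instead of hitting the CMS.
public class OverloadFallbackFilter implements Filter {
    private static final int MAX_IN_FLIGHT = 50;          // tune against real measurements
    private final AtomicInteger inFlight = new AtomicInteger();

    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        if (inFlight.get() > MAX_IN_FLIGHT) {
            // /static-snapshot/** is assumed to hold a periodically refreshed mirror of the site
            req.getRequestDispatcher("/static-snapshot" + http.getServletPath()).forward(req, res);
            return;
        }
        inFlight.incrementAndGet();
        try {
            chain.doFilter(req, res);                      // normal, dynamic path through the CMS
        } finally {
            inFlight.decrementAndGet();
        }
    }
}
```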
  7. Re: Two out of three

    +1. You don't want to optimize parts that are responsible for 2% of the performance degradation. How hard is it to write a test case to stress the CMS and measure how it scales? The author says in his blog:
    The specific problem isn’t performance - it’s stability. After some hours running, the site just stops responding. We’re currently trying to figure out the exact cause of that via stack dumps, but with hundreds of threads it is a difficult process.
    I'm guessing a memory leak or maybe even a connection leak. Anyway, guesses aren't worth much if they aren't confirmed. A test case that recreates the exact same problem would give you the chance both to identify the problem without touching the production app and to test the solution.
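    Even a crude soak test can reproduce a "stops responding after some hours" failure faster than waiting for production to wedge. A hedged sketch - URLs, thread counts and durations are made up, and a real tool like JMeter is probably more practical - but it shows the shape of it:

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Crude soak test: hammer a few CMS URLs from many threads for hours and watch
// for the moment responses stop coming back (plus heap/connection-pool graphs on the server).
public class CmsSoakTest {
    public static void main(String[] args) throws Exception {
        String[] urls = { "http://staging-cms.example.com/", "http://staging-cms.example.com/news/" };
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(50);

        for (int i = 0; i < 500_000; i++) {
            final String url = urls[i % urls.length];
            pool.submit(() -> {
                try {
                    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                    conn.setConnectTimeout(5_000);
                    conn.setReadTimeout(10_000);
                    if (conn.getResponseCode() >= 500) failures.incrementAndGet();
                    conn.disconnect();
                } catch (Exception e) {
                    failures.incrementAndGet();   // timeouts land here once the site wedges
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(12, TimeUnit.HOURS);
        System.out.println("Failed requests: " + failures.get());
    }
}
```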
  8. CMS APIs indeed consist most of the time (in my experience) of a bunch of (badly written) JSP tags, which do not fit well into an MVC model. This can make caching difficult when you don't want to clutter your JSPs with a lot of caching code. We used OSCache for caching JSP fragments (like the navigation bar) and Spring AOP caching for objects that are returned from the CMS API (I admit, I do use my own components...). We wrapped the servlet for the CMS API with Spring's ServletWrappingController so we had more control over the HTTP headers used for caching (cacheSeconds=3600) and the mapping for each URL (e.g. /imgs/ was dynamic from the CMS but could be cached, so we added an HTTP cache header). We also introduced Squid as a reverse proxy and reworked some of the dynamic parts so they are fetched with AJAX, which makes the pages more cacheable. The problem is cache expiration, which we fixed by using the flushcaches tag from OSCache and the HTTP PURGE method that is available in Squid. If you have many CSS and JavaScript files, it is also good to combine them into one file and compress them, since JavaScript files are not fetched in parallel.
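    The cache-header part of that setup is easy to show without the Spring wiring: a plain servlet filter that marks effectively-static CMS paths as cacheable so the Squid tier can absorb them. The path and TTL below are illustrative, not taken from the post above:

```java
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

// Mark CMS-served but effectively static paths (e.g. /imgs/) as cacheable,
// so a reverse proxy like Squid can absorb the hits instead of the CMS.
public class StaticishCacheHeaderFilter implements Filter {
    private static final int CACHE_SECONDS = 3600;

    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        if (request.getRequestURI().startsWith(request.getContextPath() + "/imgs/")) {
            // set headers before the CMS writes the body, so they are not lost once the response commits
            response.setHeader("Cache-Control", "public, max-age=" + CACHE_SECONDS);
            response.setDateHeader("Expires", System.currentTimeMillis() + CACHE_SECONDS * 1000L);
        }
        chain.doFilter(req, res);
    }
}
```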
  9. Caching queries

    I worked on a fully custom-built CMS that had grown itself into severe performance problems. The solution used there was to introduce caching at several levels:
    - Introducing a new layer between the app and JDBC, which cached SQL query results using the SQL statement as the key. Cache lifetime was just a few minutes, but since a significant part of the SQL queries in this system were repeated often (end-user requests for page headers, footers, frequently-used pages), this offloaded the Oracle DB significantly.
    - Introducing an HTML cache on top: caching the generated HTML on disk for a few minutes or hours also gave significant gains, but the disk space used was of course significant. Today, I suspect one would use functionality in the container or web server/proxy for this.
    - Using the Akamai caching network on the outside. Akamai is a very powerful service, caching content in "invisible" caching servers around the globe and using DNS to direct end users to the "nearest" server. Caching time was set differently for different URLs (or URL patterns) to adjust for different degrees of "dynamicity" (the really dynamic URLs, which were supposed to give different responses every time, were not cached in the distribution network at all).
    This worked very well in our case, but at the expense of complicated publishing of new material (how do you purge the relevant subset of the cache for material that needs urgent publication, e.g. error corrections?). And of course, the more dynamic the web site is in nature, the less appropriate caching is.
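    The first layer described above is simple enough to sketch: a read-through cache keyed on the literal SQL string with a short TTL. Types and the TTL are illustrative; the real thing sat inside a JDBC wrapper:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Tiny read-through cache keyed by the literal SQL statement, with a short TTL,
// placed between the page-rendering code and JDBC.
public class SqlResultCache {
    private static final long TTL_MILLIS = 2 * 60 * 1000;   // "just a few minutes"

    private static final class Entry {
        final long loadedAt;
        final List<Map<String, Object>> rows;
        Entry(long loadedAt, List<Map<String, Object>> rows) {
            this.loadedAt = loadedAt;
            this.rows = rows;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    // loader runs the real JDBC query; identical statements repeated within the TTL skip the database.
    public List<Map<String, Object>> query(String sql, Callable<List<Map<String, Object>>> loader)
            throws Exception {
        Entry e = cache.get(sql);
        long now = System.currentTimeMillis();
        if (e != null && now - e.loadedAt < TTL_MILLIS) {
            return e.rows;                      // cache hit: header/footer queries mostly land here
        }
        List<Map<String, Object>> rows = loader.call();
        cache.put(sql, new Entry(now, rows));
        return rows;
    }
}
```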
  10. The specific problem isn’t performance - it’s stability. After some hours running, the site just stops responding. We’re currently trying to figure out the exact cause of that via stack dumps, but with hundreds of threads it is a difficult process.
    * check JVM memory settings,
    * use -Xloggc to monitor GC activity,
    * check the servlet engine's thread pool settings,
    * check the servlet engine's DB pool size,
    * use e.g. JAMon to monitor your DB activity / SQLs,
    * check stack dumps for locking problems
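    For the "hundreds of threads" problem, the JVM can also be asked directly whether anything is deadlocked instead of eyeballing full kill -3 dumps. A small sketch using the standard ThreadMXBean API; it has to run inside the CMS JVM (e.g. from a diagnostic servlet) or remotely over JMX:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Programmatic alternative to reading a full thread dump by hand:
// ask the JVM whether any threads are deadlocked, and print just those.
public class DeadlockProbe {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads();   // covers monitors and j.u.c locks
        if (deadlocked == null) {
            System.out.println("No deadlocked threads right now.");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(deadlocked, true, true)) {
            System.out.println(info);   // includes lock owner and a partial stack trace
        }
    }
}
```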
  11. This thread clearly highlights what the underlying problem behind this production problem is - the offering of, and acting on, unqualified advice without first clearly defining the problem. The lack of data offered up in the blog entry shows that the team themselves are acting as blindly as the rest of the advisors here. This underlying issue plagues all phases of the typical (software) application life cycle, from design through to deployment - developers start with a set of ad hoc solutions and guess which one best matches the symptoms, simplifying (or ignoring) the problem domain to suit the solution(s).
    The following blog entry shows the typical activities involved in software performance engineering: http://blog.jinspired.com/?p=38
    You first create a software execution model that defines the actual underlying processing steps across tiers, layers, and components. For each use case and its corresponding steps, a resource consumption table is created for the average and worst case scenarios. This can be constructed within a day using a professional software engineering solution. With this model one can already identify much of the low-hanging fruit, such as excessive chattiness across process and component boundaries. Following on from this, a system execution model is created that models the behavior of each element of the software execution model in relation to workload patterns and volumes, focusing on their impact in terms of the system's overall performance, scalability and reliability. The system execution model will normally highlight resource contention and capacity problems. A system execution model can easily be created by using a professional software load testing tool along with a performance monitoring & management solution.
    Most developers will shy away from the above because it feels like too much work, though in fact they will spend much more time diagnosing performance problems in production with very little knowledge of the software's behavior outside their own workstation or lab. Performance problems are common with many CMS solutions because of the level of customization afforded to the customers. Both the software execution model and the system execution model are subject to change at a site, whereas most product engineering teams and consultants deal only with the system execution model (system stats monitoring). The problem is that the impact of any change is not assessed (managed) in terms of performance and resource capacity management. Customers rarely create a performance profile before and after each change to the software - such changes rarely fall under proper change management control.
    We can offer up a number of fish for this person to taste, but I think it would be much more efficient and beneficial in the long term to teach them how to fish for themselves. regards, William
  12. This thread clearly highlights what the underlying problem behind this production problem is - the offering of, and acting on, unqualified advice without first clearly defining the problem. The lack of data offered up in the blog entry shows that the team themselves are acting as blindly as the rest of the advisors here.
    That's true. But the following advice is totally useless:
    This underlying issue plagues all phases of the typical (software) application life cycle, from design through to deployment - developers start with a set of ad hoc solutions and guess which one best matches the symptoms, simplifying (or ignoring) the problem domain to suit the solution(s). You first create a software execution model that defines the actual underlying processing steps across tiers, layers, and components. For each use case and its corresponding steps, a resource consumption table is created for the average and worst case scenarios. This can be constructed within a day using a professional software engineering solution. With this model one can already identify much of the low-hanging fruit, such as excessive chattiness across process and component boundaries. Following on from this, a system execution model is created that models the behavior of each element of the software execution model in relation to workload patterns and volumes, focusing on their impact in terms of the system's overall performance, scalability and reliability. The system execution model will normally highlight resource contention and capacity problems. A system execution model can easily be created by using a professional software load testing tool along with a performance monitoring & management solution.
    as it fails especially in the face of this constraint:
    The obvious solution is a rewrite, but that isn’t going to happen, and I don’t want to be doing any more patching of the *&$!#@ CMS.
    So, the only correct answer is this:
    Keep on "waving a dead chicken over the servers".
    Listen to your customer! ;-)