Confio Case Study: Wait-Time Methodology

By Confio

01 Aug 2006 | TheServerSide.com

In complex, multi-tiered Web/Java/database applications, resolving the issues that degrade performance is often the easy part. The real challenge for the teams of DBAs, system administrators, and application owners involved is pinpointing the root cause of the problem and bringing together team members who can reach beyond their own silos to resolve it rapidly.

Customer Description: GSI Commerce

GSI Commerce is a leading provider of e-commerce solutions that enable retailers, branded manufacturers, entertainment companies and professional sports organizations to operate e-commerce businesses. On an outsourced basis, GSI Commerce provides a comprehensive, integrated and centralized e-commerce platform, which includes technology, logistics and customer care and marketing services.

Because GSI Commerce is a service-based business supporting thousands of simultaneous transactions, response times are critical to the company, to its e-commerce retailing partners, and in turn to the partners' online customers. GSI Commerce uses proprietary technology for its core e-commerce engine. This engine is built on a J2EE application driven by an Oracle database. Customer applications are hosted on this common, three-tiered Web/Java/Oracle infrastructure. To the base solution, GSI Commerce adds customized applications and services to meet the unique needs of each partner's e-commerce business.

Customer Challenge: One-Week Deadline

Before Thanksgiving 2005, GSI Commerce noticed that a nightly process for updating its merchandise databases was not performing as needed. This is a critical operation for the company as this process updates the product information that is made available to online shoppers. It was determined that GSI Commerce's recent implementation of an Oracle upgrade was at the root of the issue. Customer service level agreements were in jeopardy. With the year's peak online shopping days fast approaching, the GSI Commerce database administration (DBA) team had one week to find and fix the issue.

DBAs at GSI Commerce had used standard Oracle tools in attempting to find the problems, but these traditional approaches did not provide enough detail to isolate the exact cause of the bottlenecks. Attention was focused on a four-node Oracle RAC cluster running the merchandising database, but the specific queries causing the delays could not be identified.

Ignite Solution

"I was given a week to fix the critical customer situation. In 24 hours, I had Ignite installed and found and fixed the problem." David Park, Sr. Oracle DBA.

To get to the answer, GSI Commerce turned to Ignite for Oracle, from Confio Software, which identified the specific SQLs and the I/O bottlenecks causing the delays. Ignite demonstrated exactly why the RAC configuration was contributing to the slowdown, and it allowed the DBAs to change the Oracle configuration to accommodate the application, rather than forcing a long and expensive set of development changes to the e-commerce application.

How we did it

Ignite identified the delays as being tied to excessive disk reads from the merchandising application, and showed how the increased Wait-Time in the database was directly tied to these read operations. To solve the problem, the technical team needed to know which queries and user operations caused those reads. Ignite identified the exact queries that were causing the problem. Surprisingly, the database was not taking advantage of caching, and Ignite showed that the queries in question were resulting in disk reads. Based on this information, the DBA increased the memory available to these queries. As a result, the Wait-Time dropped dramatically, without having to push the problem back to development and modify the SQL queries.
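The idea of tying Wait-Time to specific queries can be illustrated with a minimal sketch. This is not Ignite's implementation; it only assumes that wait samples are available as (SQL identifier, wait event, seconds waited) records. The identifiers `sql_a`, `sql_b`, and `sql_c` and all timings are hypothetical, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical samples: (sql_id, wait_event, seconds_waited).
# None of these values come from the GSI Commerce system.
SAMPLES = [
    ("sql_a", "db file sequential read", 4.0),
    ("sql_a", "db file sequential read", 6.0),
    ("sql_b", "log file sync", 1.5),
    ("sql_a", "CPU", 0.5),
    ("sql_c", "db file scattered read", 3.0),
    ("sql_b", "db file sequential read", 0.5),
]


def total_wait_by_sql(samples):
    """Sum wait time per SQL statement, broken down by wait event."""
    totals = defaultdict(lambda: defaultdict(float))
    for sql_id, event, seconds in samples:
        totals[sql_id][event] += seconds
    return totals


def rank_by_wait(samples):
    """Return (sql_id, total_seconds, top_event) rows, largest total first."""
    ranked = []
    for sql_id, events in total_wait_by_sql(samples).items():
        top_event = max(events, key=events.get)  # event with the most wait
        ranked.append((sql_id, sum(events.values()), top_event))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


if __name__ == "__main__":
    for sql_id, seconds, event in rank_by_wait(SAMPLES):
        print(f"{sql_id}: {seconds:.1f}s total, mostly '{event}'")
```

Ranking by accumulated wait rather than by execution count is what puts a read-bound query at the top of the list, along with the wait event that dominates it.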

Additionally, DBAs were alerted to abnormal bottlenecks in the application, all pointing to a database delay. Ignite showed immediately that the delays were tied to one specific SQL statement, a query that rarely accumulated any Wait-Time in typical usage. Using this specific clue, the DBAs learned that the J2EE application owners had just flushed all of the application caches, causing the applications to execute a high volume of Oracle queries. The DBAs identified this as a problem that could not have been detected using conventional tools.
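Spotting a query that "rarely accumulated any Wait-Time" and then suddenly does amounts to comparing its current wait against its own history. The sketch below shows one simple way to do that, under stated assumptions: the per-interval wait history is available as a list of seconds, and the threshold rule (mean plus three standard deviations, with a one-second floor) is an illustrative choice, not a rule taken from Ignite. All numbers are hypothetical.

```python
from statistics import mean, pstdev


def is_abnormal(history, current, min_sigma=3.0, floor_seconds=1.0):
    """Flag a query whose current wait far exceeds its own baseline.

    history       -- past per-interval wait totals for one query, in seconds
    current       -- wait total for the interval being checked, in seconds
    min_sigma     -- how many standard deviations above the mean counts as abnormal
    floor_seconds -- minimum threshold, so tiny baselines do not trigger on noise
    """
    baseline = mean(history)
    spread = pstdev(history)
    threshold = max(baseline + min_sigma * spread, floor_seconds)
    return current > threshold


if __name__ == "__main__":
    quiet_history = [0.1, 0.2, 0.1, 0.15]  # hypothetical: a normally cheap query
    print(is_abnormal(quiet_history, 0.2))   # ordinary interval
    print(is_abnormal(quiet_history, 45.0))  # interval after a cache flush
```

Because the comparison is per query, a statement that is cheap in normal operation stands out as soon as it starts accumulating wait, even when the database as a whole looks healthy.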

Finally, the DBAs ran all of the procedures prescribed in Oracle documentation to try to isolate the causes of slow responses, reaching a cache-hit ratio of 99% without any measurable improvement in application performance. Ignite identified the Oracle block layout as the real problem and recommended an increase in block size. Doubling the block size cut the number of read operations dramatically, along with the Wait-Time.
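Some back-of-the-envelope arithmetic shows why a 99% cache-hit ratio can coexist with a read-bound workload, and why block size matters. The sketch below is an idealized model, not a measurement: the request count, data volume, and the 8 KB and 16 KB block sizes are assumptions chosen for illustration (the case study does not state the actual sizes), and it assumes data is read in whole blocks.

```python
import math


def physical_reads(logical_reads, hit_ratio):
    """Reads that miss the cache and go to disk, given a cache-hit ratio."""
    return round(logical_reads * (1.0 - hit_ratio))


def blocks_to_scan(data_bytes, block_size_bytes):
    """Number of block reads needed to cover a given volume of data."""
    return math.ceil(data_bytes / block_size_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 100 million logical reads at a 99% hit ratio
    # still leaves one million reads that must wait on disk.
    print(physical_reads(100_000_000, 0.99))

    # Hypothetical 8 GiB of data read in full: doubling the block size
    # halves the number of read operations in this idealized model.
    data = 8 * 1024**3
    print(blocks_to_scan(data, 8 * 1024))
    print(blocks_to_scan(data, 16 * 1024))
```

The ratio says nothing about how long the remaining misses take, which is why measuring the wait itself was more informative than tuning toward a target hit ratio.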

Results

After tuning the RAC configuration to accommodate the I/O demands of the identified SQLs, the merchandising batch jobs returned to normal operation. GSI Commerce's platform barely missed a beat and was ready to handle the peak online shopping season for the online stores it operates on behalf of more than 50 partners.

Not only did Confio Ignite allow GSI Commerce to meet its immediate service-level commitments, but it also enabled the DBA team to complete the upgrade to Oracle RAC and reduce the number of processors assigned to the merchandising database.

"Without the use of Ignite, solving the performance issues and completing the upgrade would have been extraordinarily difficult for our team to do on its own, particularly when you consider the severe time constraints we faced. Confio's Ignite quickly pinpointed the issues and helped us get up to optimal performance for peak season. Ignite is now a major part of our performance and monitoring strategy and helps us maintain the best possible user experience for our partners and customers," said John McGivern, VP IT Operations, GSI Commerce.

About Confio

Confio develops tools that improve the performance and service levels of Oracle databases and the J2EE applications that depend on them. Confio tools are built on industry best-practice Wait-Time methods and are designed to be the fastest to install and the fastest to deliver results. To learn more about Confio Software or Ignite, please email info@confio.com or see www.confio.com.