Hewlett-Packard conducts a wide variety of business in every country in the world. Suppliers, channel partners, and many other companies help us design, manufacture, market, sell, and support our products to customers in five separate but often overlapping segments. When you consider that almost every permutation of that last sentence has a web site associated with it, what becomes obvious is that it takes a large, well-organized group of people to deal with the massive scale of sites that must be managed. HP.com’s architecture is a huge conceptual jigsaw puzzle assembled by a variety of Chief Architects and Solution Architects, each with specific areas of expertise, and all with common goals and processes that bind us together.
How do you get thousands of people to construct over 1,000 web sites in such a way that they all work together and consistently carry the same branding? What processes give individual groups room to be creative without introducing hundreds of different ways of doing things, increasing the overall cost to HP along the way? Which technologies are being utilized to promote reusable pieces of functionality?
The Guiding Light: HP IT Transformation
HP IT is undergoing arguably the largest transformation in the history of the industry. When we started this journey in 2005, we were managing over 4000 applications among 1200 active IT projects. We had 85 data centers sprinkled around the world and only 30% of IT spending was managed by the IT organization. Less than 50% of our IT spending went to new innovation as we dealt with massive maintenance headaches, and IT employees were spread across more than 100 different work locations. HP’s overall IT spending was roughly 4% of the company’s revenue.
At the end of the process, there will be around 1500 applications and 500 active projects. HP hardware in 6 data centers will host our infrastructure. 100% of the IT budget is already managed by the IT organization that uses 30 core locations worldwide. Once the data center moves have been completed, these improvements will allow 80% of our IT resources to work on innovations that affect HP’s bottom line, all at a cost of 1.8% of the company revenue.
Longer term, a leading-edge portfolio management process helps us plan where our IT spending will go. Heavily focused on return on investment analysis, this process measures all projects using the same set of financial criteria and lets us discover what programs will deliver the highest business benefit versus alternatives. A nice side effect is that such planning encourages reuse where possible because it significantly reduces the spending needed to achieve certain functionality.
This transformation not only serves as a humungous case study for how customers can best use HP products, emulate our processes, and adopt our organizational and financial models, it also establishes the parameters under which HP.com’s architecture must function. How do you cut IT cost while increasing efficiency? Perhaps surprisingly, it all starts with governance.
Governance That Actually Saves Money
At no point in any project has any software developer ever said, “Oh boy, I get to have my project reviewed by a governance council today!” It is, however, a necessary evil to have governance in place so that those 1000 different websites do not have 1000 different ways of doing things. Without it the solutions, in sum, become too expensive. Not that there has to be one way of doing things, but there has to be a manageable number of ways. To this end, teams of seasoned architects across HP IT maintain a master list of components that are deemed most beneficial on various topics of software development. IDEs, web services frameworks, and AJAX libraries are among the pieces reviewed and documented by these experts.
For HP, the business benefit of having a manageable number of solution components is that we enjoy great economies of scale on licensing costs with third parties. Given the number of sites within the HP.com domain, we are able to negotiate volume pricing on functionality such as middleware, databases, and web servers. If each site were allowed to pick whatever technology mix it wanted without bound and were left to acquire licenses individually, the cost across the company would increase dramatically. Instead, by limiting the choices, more opportunities arise for centrally managed, wider-scoped licensing and overall costs decrease. This translates into huge cost savings on the order of several million dollars.
How do you enforce these choices? By having a limited number of data centers to which the development teams have no direct access. HP had 85 data centers around the world that are being consolidated into 6. It is far easier to manage what goes into a smaller number of data centers, so before a release can be published on a set of machines in these environments, it must undergo the scrutiny of a review, during which compliance with the published standards is checked by the various governing bodies. No compliance, no release. Although seemingly strict, the process is necessary to ensure consistency in the solutions and to enable financial gains in the licensing.
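To make the "no compliance, no release" rule concrete, here is a minimal sketch of how a release's declared components might be checked against a master list. Everything in it is an assumption made for illustration: the class name `ReleaseGate`, the category keys, and the component names such as `ApprovedDb` are hypothetical, and the actual review is carried out by governing bodies rather than by a single piece of code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: names and categories are illustrative, not HP's actual tooling.
final class ReleaseGate {

    // Approved components per category, standing in for the architects' master list.
    private final Map<String, Set<String>> approved;

    ReleaseGate(Map<String, Set<String>> approved) {
        this.approved = approved;
    }

    // Returns the declared components that are not on the approved list,
    // formatted as "category=component" and sorted for stable output.
    List<String> violations(Map<String, String> declared) {
        return declared.entrySet().stream()
                .filter(e -> !approved.getOrDefault(e.getKey(), Set.of()).contains(e.getValue()))
                .map(e -> e.getKey() + "=" + e.getValue())
                .sorted()
                .collect(Collectors.toList());
    }

    // "No compliance, no release": deployment is allowed only with zero violations.
    boolean mayRelease(Map<String, String> declared) {
        return violations(declared).isEmpty();
    }

    public static void main(String[] args) {
        ReleaseGate gate = new ReleaseGate(Map.of(
                "web-framework", Set.of("Struts", "Spring Portlet MVC"),
                "database", Set.of("ApprovedDb")));

        // A release that uses an approved framework but an unapproved database.
        Map<String, String> release = Map.of(
                "web-framework", "Struts",
                "database", "HomegrownDb");

        System.out.println("Violations: " + gate.violations(release));
        System.out.println("May release: " + gate.mayRelease(release));
    }
}
```

A category that does not appear in the master list at all is treated as a violation here, which mirrors the idea that choices are limited to what has been reviewed.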
Web Site Flavors and Sample Services
With the set of choices now limited, different combinations of commonly needed patterns can be sculpted into platforms and services. Getting price breaks on licensing will only cut IT spending so far. In order to achieve the next order of savings magnitude, reuse must be the enabler.
But, at what places in the architecture is this reuse best suited? In order to answer that, you need to be able to classify the kinds of offerings you need and then you can build platforms that jumpstart development for each category.
Web sites with an “hp.com” domain name fall into three general categories, as depicted around the outside of the diagram (in order of complexity):
- Publishing – Generated HTML that is either static or dynamic per visitor based on a few key parameters (locale, customer segment, etc.). Authoring tools are used by business partners to change content independent of software releases so that alterations can be made without enlisting the help of IT.
- Portals – Highly personalized experiences based on heavy authorization rules applied to individual pages or even snippets of pages. Portlet technologies are used here to facilitate these needs.
- Web Applications – The grey area in between the first two where more dynamic content than Publishing is being generated, often based on form input, but not as varied as Portals. Most of the direct sales sites fall into this classification bin.
At either end of the spectrum, the HP.com organization offers platforms on top of which internal partners can quickly build solutions. Where possible, integrations with key services are provided at the platform level (shown in blue). In the web application category, a variety of technologies are used and service consumption is typically up to the development team working on a particular site given that the variety of needs in that space tends to be too wide to provide a general-purpose platform that meets every possible need.
The bottom center of the diagram depicts a partial list of services utilized by a variety of sites that fall into the various categories, including (but not limited to):
- Authentication – Single sign-on across all HP.com sites.
- Authorization – Rules that bind users into groups based on a variety of factors, which can then be used to decide what content to render.
- Search – Indexing and querying services across all sites with the ability to drill into specific sites.
- Product Catalog – Access into the HP product catalog, whose contents can vary greatly based on locale and customer segment.
An example of mapping individual sites to this model is as follows:
The HP.com home page is comprised of static pages translated for a variety of locales. It does not require any authentication or authorization, but does need to publish changes - in some cases several times a week depending upon what marketing campaigns may be going on in different parts of the world. As such, it is a perfect candidate for the Publishing Platform, which provides integration with HP.com Search automatically.
The Home/Home Office storefront (aka shopping.hp.com) offers a similar product catalog browsing experience to all users. Its content is highly dependent on data in the Product Catalog back end service and much form input is required during the check out process where a purchase is completed. Unsuitable for either Publishing or Portal Platforms, it relies on an HP-enhanced Struts code base to serve its community. It maintains its own integration with the general-purpose services it needs (shown in green).
The Global Partner Portal helps collaborating channel partners who sell HP products hand in hand with HP marketing, sales and support teams. Built on top of the HP.com Shared Portal Platform, it benefits from ready-to-use services such as site administration, HP.com single sign-on, user profile integration, rule-based user groupings that drive site customization and personalization, and content integration. This enables a highly personalized experience for a wide variety of user roles ranging from site administrators, support agents, and HP internal users to delegated partner administrators and normal users.
In Depth: HP’s User Profile Service
HP websites serve suppliers, channel partners, and other companies that assist in various portions of the product lifecycle in addition to customers in different market segments. Given the different needs of those audiences, it is next to impossible to create a schema that captures profile information for web site visitors in a single place. Yet, there are occasions where information from multiple data stores might be beneficial. Asking each site to maintain connections with every data store it might use becomes costly as the number of sites multiplies. In order to solve this problem for web user profile data store access, HP created an internal User Profile Service (UPS) so that end sites only need to maintain a connection to a single entity which then aggregates access to the various back ends.
UPS offers a SOAP interface to any client front end that needs to read user profile information and an administration tool (built using Struts) for making configuration changes to the system. HP’s Customer Identity Management System is the external user single sign-on solution, and when a client calls the SOAP interface, UPS expects to be passed the User ID of the user in question along with the name of the profile the caller is interested in. UPS maintains a list of these profiles, each essentially a list of items and what their sources are.
For example, the sample file above shows a single profile (“Profile1”) that pulls attributes from three separate data sources. Multiple profiles can be defined in this same way to customize the data requested by individual sites.
The data sources get plugged into UPS using a Java interface called an Adapter. In order to achieve high availability requirements, it became necessary to add Adapters without bringing down the entire service. To achieve this, each Adapter gets packaged in its own EAR file and has an associated startup EJB. Within this startup code, information about the Adapter is inserted into the JNDI tree so that the UPS core can find it when referenced in one of the profile-defining XML files.
Within a particular Adapter, a variety of mechanisms might be used to connect to the actual user profile data store, which must index information based on User ID making it essentially a master foreign key across all back ends. It is this layer that insulates the client applications from maintaining connections individually. In the configuration of a specific adapter, it identifies what items it offers so that profiles can be constructed based on the elements it is capable of providing. With this structure, then, UPS provides user facing web sites with a broad spectrum of user profile information while maintaining a single interface, even as additional back end data stores are made available.
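The aggregation idea can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the UPS code: the `Adapter` method signature, the `Item` type, the attribute names, and the adapter names are all hypothetical, and a `ConcurrentHashMap` stands in for the JNDI tree so that the example runs without an application server.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the aggregation idea; none of these types are HP's actual API.
final class ProfileAggregator {

    // One Adapter per back-end data store; every store is keyed by User ID.
    interface Adapter {
        Map<String, String> fetch(String userId);
    }

    // One profile entry: which item to return and which adapter supplies it.
    static final class Item {
        final String name;
        final String adapterName;

        Item(String name, String adapterName) {
            this.name = name;
            this.adapterName = adapterName;
        }
    }

    // Stand-in for the JNDI tree: adapters can be registered while the service runs.
    private final Map<String, Adapter> adapters = new ConcurrentHashMap<>();
    private final Map<String, List<Item>> profiles = new ConcurrentHashMap<>();

    void registerAdapter(String name, Adapter adapter) {
        adapters.put(name, adapter);
    }

    void defineProfile(String name, List<Item> items) {
        profiles.put(name, items);
    }

    // The single entry point a client site calls: User ID plus profile name.
    Map<String, String> read(String userId, String profileName) {
        List<Item> items = profiles.get(profileName);
        if (items == null) {
            throw new IllegalArgumentException("Unknown profile: " + profileName);
        }
        Map<String, Map<String, String>> fetched = new HashMap<>(); // one call per adapter
        Map<String, String> result = new LinkedHashMap<>();
        for (Item item : items) {
            Adapter adapter = adapters.get(item.adapterName);
            if (adapter == null) {
                continue; // adapter not registered (yet); its items are skipped
            }
            Map<String, String> data =
                    fetched.computeIfAbsent(item.adapterName, n -> adapter.fetch(userId));
            String value = data.get(item.name);
            if (value != null) {
                result.put(item.name, value); // only items named in the profile are returned
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ProfileAggregator ups = new ProfileAggregator();
        ups.registerAdapter("customerStore", id -> Map.of("segment", "smb", "internalFlag", "x"));
        ups.registerAdapter("partnerStore", id -> Map.of("partnerTier", "gold"));
        ups.defineProfile("Profile1", List.of(
                new Item("segment", "customerStore"),
                new Item("partnerTier", "partnerStore")));

        System.out.println(ups.read("user-42", "Profile1"));
    }
}
```

The client only ever sees `read(userId, profileName)`. Registering another adapter later changes what a profile can return without touching the caller, which is the insulation described above.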
In Depth: HP’s Shared Portal Platform
One key platform within the HP.com domain is the Shared Portal Platform (SPP), which uses portlet technology to deliver highly personalized experiences.
At the top of the stack sits the Portal Server. This is third party software that provides site management, site navigation management, page creation tools, security, and portlet provider management. Content is provided to the Portal Server by a variety of Portlets, which utilize WSRP to facilitate the communication between the two. In an effort to insulate ourselves from Portal Server vendor lock-in, things like authentication, user grouping, and interactions with other sites via punch-outs are handled by a set of Common Web Services that both the Portal Server and the Portlets interact with over a SOAP interface.
Using the administrative tools of the Portal Server, sites are created, pages are added to that site, and pages are tied together with navigation. The tools allow administrators (typically a business user as opposed to an IT resource) to specify what end user groups (created in the Common Web Services based on the user’s aggregated profile and imported into the Portal Server) are entitled to view which pages or which content on specific pages. This creates a user experience that varies greatly depending upon the entitlements a specific user has.
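As a rough sketch of how group-based entitlements of this kind can work, the Java below derives a user's groups from profile attributes and then decides which pages that user may see. It is not the Portal Server's or the Common Web Services' API: the `Entitlements` class, the group names, the page names, and the `partnerTier` attribute are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch of group-based page entitlement; not the Portal Server's API.
final class Entitlements {

    // Group name -> rule evaluated against the user's aggregated profile.
    private final Map<String, Predicate<Map<String, String>>> groupRules = new LinkedHashMap<>();

    // Page name -> groups entitled to view it.
    private final Map<String, Set<String>> pageGroups = new LinkedHashMap<>();

    void defineGroup(String group, Predicate<Map<String, String>> rule) {
        groupRules.put(group, rule);
    }

    void entitle(String page, Set<String> groups) {
        pageGroups.put(page, groups);
    }

    // Every group whose rule matches the profile.
    Set<String> groupsFor(Map<String, String> profile) {
        Set<String> groups = new TreeSet<>();
        groupRules.forEach((group, rule) -> {
            if (rule.test(profile)) {
                groups.add(group);
            }
        });
        return groups;
    }

    // A page is visible when the user belongs to at least one entitled group.
    Set<String> visiblePages(Map<String, String> profile) {
        Set<String> groups = groupsFor(profile);
        Set<String> pages = new TreeSet<>();
        pageGroups.forEach((page, entitled) -> {
            if (entitled.stream().anyMatch(groups::contains)) {
                pages.add(page);
            }
        });
        return pages;
    }

    public static void main(String[] args) {
        Entitlements site = new Entitlements();
        site.defineGroup("everyone", profile -> true);
        site.defineGroup("gold-partners", profile -> "gold".equals(profile.get("partnerTier")));
        site.entitle("home", Set.of("everyone"));
        site.entitle("partner-pricing", Set.of("gold-partners"));

        System.out.println(site.visiblePages(Map.of("partnerTier", "gold")));
        System.out.println(site.visiblePages(Map.of("partnerTier", "silver")));
    }
}
```

Because both the group rules and the page entitlements are data rather than code, they are the kind of thing an administrator can change without a software release.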
The Portlets providing content to the pages, which are usually created using Spring Portlet MVC, use an HP-enhanced version of WSRP4J to remote themselves. For more personalized content, the Portlets can configure the extended WSRP4J to pass UPS data to them, or they can make SOAP calls to the Common Web Services to access other pieces of data. A Portlet can then combine this information with whatever back end it interacts with to create personalized rendering.
When constructed in this way, this portal architecture gives HP a number of key features:
- Through the Portlets and the various standards governing them, content generation is completely independent of the front end Portal Server, making it extremely portable.
- Once content is exposed through a Portlet and a particular profile data store inserted into UPS, a business user can construct groups based on the profile, create site pages, place content on those pages, and apply authorization rules based on the groups – all without IT intervention or code release.
- By having one infrastructure that serves multiple sites, content across sites is more easily reusable.
The size and scale of HP.com’s architecture mirrors that of Hewlett-Packard’s business. Company leaders have placed a priority on an efficiently run IT operation, and those ideas trickle down to technology leaders as well as individual software engineers worldwide. Governance enforced through entry into a small number of data centers relative to the number of IT assets narrows the scope of choices available to development teams, but doing so reduces licensing costs through volume pricing. Services provide functionality that spans different website categories. Platforms for each site classification speed time to market by pre-integrating with commonly used services in addition to offering the core functionality that sites of certain types typically need.
What’s next for HP.com? As the data center moves complete, more assets will be consolidated onto the main platforms to further reduce the overall maintenance load. The freed resources will focus on adding even more value to HP’s variety of businesses. As new initiatives emerge, code generation techniques like JRuby and Grails are being investigated to increase developer productivity while preserving the established JVM infrastructure. Together, the combination of these architectural approaches, with heavy influence from the HP IT transformation and a portfolio management process focused on benefit delivery, gives HP.com a model that is as sustainable and innovative as it is cost effective.
Pete Johnson is the Chief Architect of HP.com, where he has worked with over 400 engineers worldwide. Pete has presented at the HP World trade show and written for HP Professional magazine. Active in HP’s intellectual property initiatives, he was awarded a patent in 2002 with five more patents pending in the US, primarily relating to web technologies. He blogs about how improved non-technical skills can accelerate an engineering career at http://nerdguru.net.