Java Development News:
Corporate Data Centers, a soon to be extinct species?
By Billy Newport
01 May 2001 | TheServerSide.com
Increased network bandwidth, coupled with the adoption of open standards such as J2EE or the Linux API for enterprise application development, will lead to the IT-less corporation. Open standards will decouple applications from the platform they run on, which will increase competition between hardware platform providers. Increased network bandwidth will allow companies to access their data centers remotely. Together, these two factors can foster a new market in which corporations outsource their data centers to companies that use economies of scale to provide application hosting cheaply. This article shows how the 'IT-less' corporation will emerge, and how mainframes or large partitionable Unix servers will play a key role in it.
Most large enterprises currently have their own data centers and hardware support staff. The data center holds all the servers needed to run the company's applications: databases, application servers, custom applications, and messaging servers. The support staff is responsible for backups and hardware maintenance such as fixing failed disks and adding more storage, processors, or memory. Thus support costs increase as the number of servers in a data center increases. Keeping data center operating costs low is a key goal of every corporation seeking to improve its bottom line. Two key strategies for reducing data center operating costs are server and storage consolidation.
Because we pay a fixed support cost per machine, it may be more cost effective to run more than one application on a server. Running several applications on one server requires a larger server. Although a larger server costs more, the support savings over 2 to 3 years can make it the more attractive option.
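To make the trade-off concrete, here is a minimal sketch of the break-even arithmetic in Python. The function name and all the prices and support costs are hypothetical, chosen only for illustration; none of them come from a real vendor quote.

```python
def break_even_years(n_small, small_price, small_support_per_year,
                     large_price, large_support_per_year):
    """Years after which one large server is cheaper than n_small small ones.

    Returns None if the large server never pays for itself.
    """
    # Extra money spent up front on the large server.
    extra_capital = large_price - n_small * small_price
    # Support money saved every year by running one machine instead of many.
    annual_saving = n_small * small_support_per_year - large_support_per_year
    if annual_saving <= 0:
        return None
    return max(extra_capital, 0) / annual_saving


# Hypothetical figures (assumptions, not vendor data):
# eight small servers at $50K each with $40K/year support per machine,
# versus one $900K server that costs $60K/year to support.
years = break_even_years(8, 50_000, 40_000, 900_000, 60_000)
print(f"Break-even after about {years:.1f} years")
```

With these made-up numbers the large server pays for itself in a little under two years, so the savings start to show within the 2 to 3 year horizon mentioned above. Different support costs will move the break-even point considerably.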
The high cost of large server machines, which can cost in excess of $1 million US, requires a big commitment from a company. Sometimes application teams are nervous about sharing a machine and want exclusive root access. To address these fears, companies like Sun and IBM are selling machines that can run multiple independent Unix installations concurrently while dynamically allocating hardware resources such as CPUs and memory among the Unix installations.
Multiple Unix installations per machine benefit application teams because they make it easy to scale the hardware to growing needs. Without server consolidation, an application team will run the application on a small box (i.e., at most 4 CPUs). If the application grows to need more CPU power, then we need to upgrade from the small box to a larger one. With server consolidation, it may be possible to simply allocate a spare processor on the large server to the application's partition.
Anybody who has tried to order Unix servers lately will know the lead times. It can take a few weeks for the boxes to arrive. If your company uses consolidated servers then it may be possible to allocate some spare processors and build your server in a couple of days; this should cost less than buying your own server and supporting it.
The problem with server consolidation is that the potential cost savings are only available to companies with data centers large enough to justify the capital investment. Such installations can easily cost millions of dollars. Smaller companies may lack the skills or the money to build a 'professional' data center based on server consolidation.
Equipment recycling problems
A company that buys a lot of small servers for specific applications has a problem when these applications grow. It's more difficult to recycle a low-end server (<4 CPUs) than a medium or high-end server (> 4 CPUs).
When a company needs to upgrade an application, it usually purchases a larger server or tries to re-deploy the application on a larger box that has spare capacity. Consequently, the existing boxes can become a problem if they cannot be reused. For this reason, companies tend to avoid buying low-end boxes. As a rule of thumb, the smallest box to consider should be able to expand to 4 processors or more.
J2EE can help with recycling smaller boxes
If you are running a J2EE application, then the chances are that it can scale horizontally. This provides flexibility when scaling up a J2EE application. It should not be necessary to re-deploy the application on a larger box, because we can just add another box running the application to get more performance. This flexibility should allow companies and outsource service providers to leverage the smaller servers in their data centers. However, per-machine support costs would need to be reduced for this to be viable. The data center may occasionally need to be rationalized, with machines consolidated in order to make savings on support and floor space.
A big advantage with J2EE is that you have a choice. You can choose to scale horizontally or vertically. If you scale horizontally you can recycle your smaller servers; however, these servers will still cost you more to operate than the big servers. If the support costs become too high, you should be able to consolidate smaller boxes to larger servers and reduce your support costs. The horizontal/vertical scaling choice gives you a lot of flexibility in terms of the hardware topology (cluster type) that you deploy and is a big advantage of application servers. Server applications should be designed with this in mind. You won't get this for free just because you are using a J2EE server. You'll need to look at how you can develop your application so that it can take advantage of these features. You'll still need distributed systems experts available to your development team to be sure the application will really scale vertically or horizontally.
Problems with Storage Consolidation
Most large storage servers currently have limitations on how many hosts can be attached. These boxes are very expensive, starting at 500K USD, moving up to 3-4M USD. Ideally, you attach your storage server to a SAN and then attach your application servers (boxes, not J2EE) to the SAN using a fiber port. While you can attach as many servers to the SAN as your SAN network allows, a storage server will have a limit of how many hosts it can support. This is a big disadvantage of a SAN.
Before SANs, storage servers were attached to servers directly using SCSI channels. A SAN allows servers to connect to the storage server over the network rather than directly, potentially removing the need for X SCSI channels on the storage server for X servers. It also gives more flexibility in where the storage server and application servers are located: they no longer need to be co-located.
Now, we just plug the storage server into the SAN and then attach as many servers as we want to the SAN. However, there is still a limit to the number of servers that can be attached to a storage server via the SAN (typically between 16 and 64). This causes a problem.
Suppose we had a data center with a lot of small Unix servers or PC servers, probably all configured in pairs for HA (high availability). Typically each application doesn't need a huge amount of disk space (1-20GB). If the storage server can only handle 20 servers, then we're spreading the cost of a 2 million dollar box among 20 low-end (and potentially low-budget) applications. That works out to 100K per application for the hardware, plus the support staff, and the groups responsible for those applications will not accept this cost. The storage server group may also require them to 'rent' 100GB of space. If the application only needs 5GB, then they won't go with the SAN. They will probably rent more rack space and buy a storage rack populated with disks and attached with fiber to the servers. This is likely to be more cost effective than going the SAN route.
A storage server is cost effective when fully populated with storage. The cost per gigabyte drops as you attach more storage: the storage server cost is fixed, so as we attach more disks to it the overhead cost of the server is shared among more disks. However, if we attach 2 or 4 terabytes of storage to the storage server and can only attach 64 hosts, then each host needs to 'rent' roughly 31GB or 62GB of space respectively. This is probably too much storage for most applications.
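The arithmetic behind these storage figures can be sketched in a few lines of Python. The function names are my own, and I am assuming decimal terabytes (1TB = 1000GB); the input numbers are the ones used in the text above.

```python
def hardware_cost_per_host(storage_server_cost, max_hosts):
    """Share of the storage server's fixed cost carried by each attached host."""
    return storage_server_cost / max_hosts


def storage_share_gb(total_storage_tb, max_hosts, gb_per_tb=1000):
    """Storage each host must 'rent' if the server is to be fully used.

    gb_per_tb=1000 is an assumption (decimal units).
    """
    return total_storage_tb * gb_per_tb / max_hosts


def utilization(needed_gb, rented_gb):
    """Fraction of the rented space an application actually uses."""
    return needed_gb / rented_gb


# A 2 million dollar storage server limited to 20 hosts.
print(hardware_cost_per_host(2_000_000, 20))
# 4 terabytes shared among at most 64 hosts.
print(storage_share_gb(4, 64))
# An application that needs 5GB but has to rent 100GB.
print(utilization(5, 100))
```

The last figure is the one that hurts: an application that uses only a twentieth of what it pays for is unlikely to sign up for the SAN.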
The point here is that due to the attached server limit of storage servers and their high cost, the only way to get them to pay for themselves is to use fewer larger hosts that need more storage. Running multiple applications on a larger box drives us towards this goal. If we have fewer servers on the SAN, and if each server uses more storage by running more applications, the SAN becomes a good idea and helps us lower the costs of supporting the application/hardware infrastructure.
Why Not Outsource the Data Center?
Suppose current box makers such as IBM or Sun decided to build large high-end data centers filled with big computers and SAN servers. With such a setup, IBM or Sun could host your entire data center from afar. You get full root access to your server and can install whatever software you want on the box, add users, etc. Sun or IBM would be responsible for the hardware side of things.
Hosting from a distance requires a very fast network connection between your company and the data center. Fast network connections are possible today. The network traffic would be encrypted using VPN (virtual private network) technology to ensure data security.
You would use a web page to log in to their data center and then 'order' a server with X CPU power, Y storage and Z memory. The data center provider would then configure their SAN to provide the storage, configure a large server to create a new partition with the number of processors you need, allocate the new SAN volumes to the box, and then install the operating system and assign an IP address and host name. The next day you receive an email that your 'machine' is ready to go and is guaranteed not to be down for more than 30 minutes a year. Since your storage is mirrored or uses RAID-5 for safety, you don't need to worry about disk spares or mirroring, etc.; it's taken care of. You can also order more complex configurations over the web, such as pairs of machines configured in a high availability pair (database servers, for example).
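It helps to translate a downtime guarantee like '30 minutes a year' into the availability percentage that SLAs usually quote. The following is a small sketch of that conversion; the function names are hypothetical and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def availability_percent(downtime_minutes_per_year):
    """Availability implied by a maximum yearly downtime."""
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


def allowed_downtime_minutes(availability_pct):
    """Maximum yearly downtime permitted by a given availability percentage."""
    return (1 - availability_pct / 100.0) * MINUTES_PER_YEAR


print(f"30 minutes/year -> {availability_percent(30):.4f}% available")
print(f"99.99% available -> {allowed_downtime_minutes(99.99):.2f} minutes/year")
```

In other words, a 30 minute guarantee is slightly stricter than the 'four nines' (99.99%) level, which would allow a little over 52 minutes of downtime a year.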
This is a very flexible approach and means that even small companies can leverage the savings made possible by building their server infrastructure at their own pace with much lower capital investment than if they were to build their own data centers. In order for the outsourcing to work the following problems need to be worked out.
Data Center Vendors may need to repurchase servers from clients
Companies that already have data centers may not be tempted if outsourcing the boxes means throwing away the existing server boxes and/or storage already purchased. The outsource vendor may need to purchase their boxes from the customer to offset the potential loss. These repurchased boxes can be either resold or integrated into the data center. Only large boxes are probably suitable for integration although J2EE based applications are being built to scale horizontally (run on multiple boxes). This may allow J2EE applications to be hosted on farms of smaller boxes (2 or 4 processors).
Cost of WAN is a crucial factor
If a company agrees to shut down its existing data center and move all the applications to an off-site center, then the WAN between the data center and the company's corporate LAN needs to be as good as a LAN in terms of bandwidth, latency, and reliability.
If the cost of this can be lowered to the point where it doesn't offset the savings that outsourcing brings then the whole approach becomes viable. If the WAN cost is too high then it may significantly weaken the financial case for outsourcing the hardware.
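The viability test described here is simple subtraction, sketched below in Python. The function names and the yearly figures are hypothetical and only there to show the shape of the calculation.

```python
def net_annual_saving(in_house_cost, hosted_cost, wan_cost):
    """Yearly saving from outsourcing once the WAN has been paid for.

    A negative result means outsourcing costs more than staying in-house.
    """
    return in_house_cost - (hosted_cost + wan_cost)


def max_affordable_wan_cost(in_house_cost, hosted_cost):
    """Highest yearly WAN cost at which outsourcing still breaks even."""
    return in_house_cost - hosted_cost


# Hypothetical yearly figures (assumptions for illustration only).
in_house = 3_000_000   # running your own data center
hosted = 2_000_000     # fees paid to the hosting vendor
wan = 400_000          # LAN-quality link to the vendor's data center

print(net_annual_saving(in_house, hosted, wan))
print(max_affordable_wan_cost(in_house, hosted))
```

In this made-up case the WAN eats 40% of the gross saving but the deal still works; if the link cost more than the gross saving, it would not.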
Data Center Hosting is different from Web Hosting, where in some cases the web server runs with no or limited connectivity to the corporate LAN. If a company puts its entire server infrastructure on the other side of a WAN, then that WAN needs to be fast, reliable, and cheap.
A cost-effective WAN can be built using a thin client solution like Citrix. Here, the client applications run in the data center on large Unix or Windows servers, co-located with the servers they talk to, and users on the corporate LAN use Citrix to access them. This further lowers costs for the client, as it no longer needs to maintain a full PC on everyone's desktop, and the desktops can be managed more cost effectively.
Company Storage Security is a significant problem
I think it's one thing for a company to outsource an email system, but it's an entirely different matter for an investment bank to outsource its core software and databases. Outsourcing vendors need to reassure the client about security, and they must do this convincingly. For now, security is probably the main factor inhibiting this scenario.
Full Commercial Application Hosting
Full data center hosting is not the only way to outsource a company's IT needs. With Full Commercial Application Hosting you just tell the hosting company, 'I want an Oracle or a Sybase database with this much disk space and this much processor/memory power.' They then create such a database on a box at their data center and give you administrator access to the database.
You don't get to choose which box or operating system hosts the database. You are only interested in the SLA they provide. It shouldn't matter to you what platform is hosting the database.
I think this is IBM's long-term strategy. They bought Sequent to get high-end Intel server technology. They can now build Intel boxes with 64 Pentium III processors. That's a lot of processor power and, in terms of 'bang for your buck', its cost is probably lower than that of any other platform.
IBM is building the new version of AIX (5L/Monterey) so that it can run on Itanium or RS/6000s. AIX 5L will also run Linux software and will support the Linux API. RS/6000s and Intel servers are the most price competitive Unix platforms available. Companies selling these application-hosting services want the cheapest hardware platform that can deliver their SLA with the software. It doesn't matter what the operating system or hardware is so long as they can deliver it at a competitive price. Large Intel boxes or large RS/6000s such as S80s or future revisions running applications such as Oracle may give IBM a competitive advantage over Sun in this market, where Sun's branding is arguably less important. So long as the vendor can give you Oracle at the required level of performance with the required level of reliability, it doesn't matter whether it's running on Linux, Windows NT, CP/M, or Solaris.
Sun is being very quiet on the subject of running recompiled Linux applications on Solaris through a similar Linux layer. Such a layer is attractive for companies that have applications running on Solaris that can't be ported for cost or other reasons. They can start using Linux as their development platform and then move from Solaris to a lower cost platform at a later point.
The Importance of Linux for Application Hosting
Linux's hardware platform neutrality is important in application hosting as it allows Unix applications to be written and then deployed (by recompilation) on a choice of platforms. This is a key point for the application hosting companies. They want to be able to run these applications on the most cost competitive hardware platform regardless of whether it's Sun, IBM or Intel. They really don't care so long as it's price competitive.
Linux based software is also important for companies developing applications as they know that they will be able to choose the most cost effective platform for their applications and still be able to change their choice in the future. This flexibility gives them the option to start using a data center and move to a new, cheaper Linux platform, if it becomes available and if it makes financial sense to do so.
Linux API rather than kernel may be best Linux asset
The current Linux kernel has problems running on very large boxes (>4 CPUs). This may not be so important, because IBM has said that it will support the Linux API on its new operating system, AIX 5L. This is just an API wrapper that allows Linux software to be easily ported to the platform. The underlying kernel will be a normal AIX kernel that does scale up on large boxes.
The Linux kernel may eventually scale up on these large boxes, but right now it looks like the hardware vendors are mainly looking at supporting the Linux API on them. Sun seems to be the exception here. They only support Linux on their Intel Solaris offering, using an emulator that will run a binary Linux application on an Intel Solaris box. They don't support this technology on Sparc, and they don't plan to support the Linux API or Linux porting tools (a la AIX 5L) to ease porting or to allow applications to be written to the Linux API and built on Solaris. I think this shows the difference in thinking between Sun and IBM in the Unix space. Sun, the leader, is trying to prevent inroads by keeping you locked in to Solaris, whereas IBM, the underdog, is eagerly supporting the Linux API to try to gain share by being open.
Application Server-based Applications are suitable for Data Centers
If companies start to deploy applications on J2EE servers, then data center outsourcing becomes more appealing. Applications written to the J2EE specification should be easier to port to a different platform that also supports J2EE. A customer can develop a J2EE application using application server A and then deploy it to an outsourced data center that may be running a different vendor's J2EE server. Currently, deploying the same application on different J2EE app servers is difficult for the following reasons.
- J2EE specification is still maturing. There are still lots of features that are missing from the specification.
- J2EE vendors may not be keeping up with the J2EE specification.
Implementing the specification is crucial for portability. IBM has been guilty of not doing this in the past. WAS 4.0 should help them in this regard, but people are already asking about EJB 2.0 support: will IBM support EJB 2.0 a year later than the others or not?
If, and it is an 'if', the J2EE specification matures and becomes widely adopted by companies then this should help the outsourcing phenomenon. Linux will help from an infrastructure point of view. Middleware providers (databases, messaging, application servers etc) would write to the Linux API making the applications portable. ISVs and companies doing custom applications would use J2EE for their new eBusiness applications. This should port well to 'value' platforms so long as the J2EE specification is adequate. This is obviously a best-case scenario, however.
Linux/390 and VM/390
IBM's VM technology on 390s is tailor-made for an outsourced, 'many applications on large boxes' type model. Their VM technology is probably the best on the market. IBM currently recommends buying a 390 box and then consolidating your Unix machines on it: run each Unix installation in its own VM and configure the VM to guarantee a set of resources for it. The 390 VM has a feature called WLM that allows the VM, almost second by second, to allocate resources for your application. This level of sophistication is still beyond the Unix vendors who are attempting to do the same thing. They usually just allow you to assign a complete processor to a partition until you explicitly remove it, which is not as convenient as the VM approach.
The trouble with the IBM approach is that it doesn't address a large percentage of the potential market. It's attractive to the following types of companies:
- Linux ISPs.
They can use it instead of a large farm of Intel Servers so long as they can recompile the software for Linux/390. A 390 configured like this with a SAN should let them do this at a lower cost than having a large farm of Intel boxes.
- Existing 390 shops that want to use Linux software.
If you already have a 390 then this may work for you. You keep running your old apps on the 390 and use Linux VMs for your Linux applications. This only works for software that can be converted to Linux/390.
If you just want to run Apache, Perl, CGI, MySQL/PostgreSQL or any other open source software, then it'll probably work simply by recompiling the source on the 390. However, if you need Oracle, Sybase or other commercial Linux software, then you may not be able to get a version compiled for Linux/390. You then enter the situation where you also need Solaris hardware, etc., and this may eat into the savings that running the 390 was bringing. It may still be viable, but it's less attractive than running everything on the 390.
I heard some people suggest that companies could port their existing Solaris software to Linux/390 and then consolidate the applications from their existing Suns to a 390 box. This may be possible but the non-trivial costs of porting the software may put a big dent into your potential savings and reduce the viability of the scenario. If software vendors start supporting Linux as their Unix version then this porting should be eased dramatically.
IBM is looking at leveraging the huge investment that it has made in the 390 technology. But, they'll only be able to do this if Linux takes off for server applications and third party software is ported to Linux/390. Alternatively, J2EE technology may allow OS/390 to host applications that in the past would have required extensive porting in order for them to run on 390.
Another issue with Linux/390 is that it's a Linux kernel. The 390 has a lot of features in hardware and in VM but the Linux/390 kernel currently doesn't make use of these features. As a result you don't get the same level of reliability with Linux/390 when compared to a native mainframe application.
Sun isn't standing still
Of course, Sun has also seen the writing on the wall. They have been very busy trying to bolster Solaris so that it also has mainframe-like features such as multiple dynamic partitions on a single large server. Currently, IBM is in front if you use 390 boxes, but for Unix boxes it's a game of leapfrog: Solaris versus AIX. IBM's advantage is that, if the platform doesn't matter, IBM currently offers a price/performance benefit over Sun, especially since IBM has hedged its bets across a choice of platforms (Itanium, Pentium 4 or RS/6000) with which it can compete. This is the essence of IBM's support of Linux.
Of course, Sun will argue that you need all of the Solaris APIs to build real applications that take advantage of the high-end features of Solaris. IBM says the same thing. These APIs are the vendors' way of locking you in. What we really need is a standard API, and I think Linux could succeed where the various Unix standardization bodies have failed in the past.
This is where the Linux groups are making a mistake IMHO. They should be looking at adding high-end features to the Linux API rather than leaving it to Sun to add high-end features to J2EE through JSRs. I really think the future of server-side Linux is tied to its API and how well the API can be extended to support the high-end features found on proprietary Unixes such as Solaris and AIX.
Full Custom Application Hosting
Here, the company has written applications internally that it wants to run on outsourced hardware. Typically this software will be specific to a platform such as Solaris, AIX or Windows. The data center vendor then needs to provide an infrastructure that supports that platform cost effectively for the customers requiring it. Farms of Solaris and AIX boxes can also share the SAN infrastructure for cost savings. The vendors may just have to provide these separate platforms in any case. The 'best' case for the vendor is that it only needs a single platform. Linux is the key to platform choice, since Linux software can be readily ported to whatever platform the ISP chooses.
Market Places for IT services
Once application hosting can have an SLA specified then you could imagine Internet market places where a company looks for bids for outsourcing its hardware needs. The RFQ would include the applications, the outsourcing term and the required SLA for the applications. The vendor that can offer the lowest cost and meet the SLA gets the job. Such a marketplace has the potential to further lower costs for clients as data center vendors compete against each other on price to get the work. The contracts could be renegotiated periodically as costs dictate.
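The selection rule for such a marketplace (the cheapest bid that still meets the SLA wins) can be sketched as follows. The `Bid` class, its fields and the vendor names are all hypothetical; a real RFQ would carry many more SLA terms than a single downtime figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    vendor: str                            # hypothetical vendor name
    annual_cost: float                     # quoted price per year
    max_downtime_minutes_per_year: float   # downtime the vendor will guarantee


def select_bid(bids: List[Bid], required_max_downtime: float) -> Optional[Bid]:
    """Return the cheapest bid whose guarantee meets the required SLA.

    Returns None when no vendor can meet the SLA.
    """
    qualifying = [b for b in bids
                  if b.max_downtime_minutes_per_year <= required_max_downtime]
    return min(qualifying, key=lambda b: b.annual_cost, default=None)


# Made-up bids for an RFQ that requires at most 30 minutes of downtime a year.
bids = [
    Bid("Vendor A", 900_000, 60),    # cheapest, but misses the SLA
    Bid("Vendor B", 1_100_000, 30),
    Bid("Vendor C", 1_400_000, 10),
]
winner = select_bid(bids, required_max_downtime=30)
print(winner.vendor if winner else "no qualifying bid")
```

Note that the cheapest bid overall loses here because it cannot meet the SLA, which is why a precisely specified SLA is the precondition for this kind of marketplace.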
Once the vendor is hosting your applications on its hardware then the next step may be complete outsourcing including development. Large vendors such as IBM or companies with offshore development labs may be able to maintain and develop new applications for you cheaper than you can.
The Biggest Market for Storage and Processor Outsourcing
The home market is the biggest for storage and processor outsourcing. Digital media is becoming more and more popular. Digital images, audio (MP3s) and, more importantly, digital video require a lot of disk space. As consumers start to purchase digital content such as music and movies in electronic form, they will need plenty of storage space, and backing up this 'property' will be problematic for most consumers.
Consumers may also use TiVo-type live TV recording, and again, storing these recordings of live shows will require a lot of storage. If this is ever to compete with VHS cassettes, then cheap, easy-to-use storage is a requirement.
Software for managing this digital property (movies, music, photo albums, etc.) is also required. This software could be provided as a bundled service with the storage. Consumers may view such a service as just another utility like electricity or water. These services will probably be provided by cable TV companies or broadband companies. The cable companies may be the direct customer of the data center, buying processor and storage in bulk to resell to consumers. Selling directly to the consumer may be too expensive for storage providers.
So, at the end we arrive at the 'IT-less' company. All IT is outsourced to a vendor that can do it cheaper through economies of scale. The clients concentrate on their business and use the partner for all IT needs. Linux is a key enabler in terms of cost. Linux allows applications to be readily ported to the most cost effective platform. One can imagine tiers of vendors.
Imagine if EMC, IBM and Sun stopped selling boxes directly to customers. Instead, EMC would rent you gigabytes or terabytes of storage accessible over the Internet. SAN over IP technologies may help encourage this approach. Equally, IBM and Sun could rent you processor power that you also access over the Internet, and you would tell them to get the storage from the vendor you rented it from. A company may contract directly with them and get access to their boxes over a very fast Internet connection.
Another value-added second tier may provide the application hosting services. They buy computing power from IBM or Sun and add value by providing the administration or development staff depending on the level of outsourcing.
The 'IT-less' company could be a very different world indeed. Your processing power may be located in the US, storage in Canada and administrators in Mexico. It sounds far-fetched, but if Internet bandwidth keeps increasing and costs continue to fall then this may be possible in the next few years. The following points are key to the emergence of the IT-less company.
- The Internet's growth and capacity for growth must increase at a huge pace if the IT-less 'vision' is to become a reality.
- Hardware companies will become service companies.
- Companies should start using J2EE and Linux standards for developing new applications. This decision will give them the choice of platform when the time arrives to outsource their applications.
- J2EE and Linux need to be extended to support APIs to allow these applications to be built portably without losing the high-end value offered by proprietary operating systems.
- Linux's kernel has a limited role unless it can be proven to be competitive with Solaris and AIX on these large boxes (vertical scalability, dynamic partitioning, etc).
- IT staff will become less common in normal companies.
- The location of your IT assets becomes unimportant. Fast, reliable and guaranteed low cost bandwidth allows this to happen.
- Business Analysts will become the geeks of normal companies.
- Households/cable companies may become the biggest users of such a service.
| Term | Definition |
|---|---|
| AIX | IBM's version of the Unix operating system. |
| API | Application Programming Interface. A set of routines used by a developer for a specific purpose. |
| Citrix | This is an extension for Windows servers that allows applications running on the server to be displayed on a low cost network computer or PC over a network. It is similar in function to X Windows on Unix machines. See http://www.citrix.com for more information. |
| CPU | Central Processing Unit. This is the heart of a computer; common examples are Intel processors, Sun SPARCs and IBM/Motorola PowerPCs. |
| EJB | Enterprise Java Bean. This is a component that is hosted by a J2EE application server. It is a form of distributed object. |
| IMHO | In my humble opinion. |
| ISV | Independent Software Vendor. This is a company that writes software applications. |
| IT | Information Technology. A general term for technology systems. |
| J2EE | Java 2 Enterprise Edition. This is a Java based standard for application servers. |
| JSR | Java Specification Request. These are the way extensions to the Java platform are discussed, specified and then later implemented. |
| LAN | Local area network. A network that usually connects devices to a common network within a building. This is usually rated at 10 or 100 Mbits (million bits) per second. Gigabit speeds are also now available. |
| Linux | An open source version of the Unix operating system. |
| OS/390 | An operating system for IBM's 390 mainframes. This usually runs under the VM operating system. |
| RAID | Redundant Arrays of Inexpensive Disks. This is a technique for clustering cheap disks so that they offer improved performance and reliability at a lower cost. |
| RFQ | Request for Quote. This is a document that requests a price from a provider for a particular service. |
| SAN | Storage Area Network. This is usually a fiber (2Gb/sec) based network to which high speed storage devices and servers are attached. The SAN connects the attached servers to the attached storage. |
| SCSI | Small Computer System Interface. This was once the dominant mechanism for attaching drives to servers or workstations. It is now being replaced by SAN technology. |
| SLA | Service Level Agreement. This is a contract between a provider and a customer that sets out the details of the service to be provided. It may include maximum downtime per year, per incident, penalties, maximum response times for applications, etc. It will also include pricing for any services being provided. |
| Solaris | Sun's version of the Unix operating system. |
| VM | An operating system for 390 mainframes that runs other operating systems in parallel on the same physical box. Resources on the box are assigned (dynamically) to these operating systems by administrators. |
| WAN | Wide area network. A network that spans locations. The speed of this is typically lower than a LAN, and the Internet is the best known example of a WAN. However, speeds are increasing and the costs of such connections are dropping. |
| WLM | Workload Management. This describes how to control the execution of a job so that it is completed by a certain time. WLM on a mainframe is used to guarantee a level of performance to the owner of a job that must be executed. |