Jakarta Commons Online Bookshelf: NET

By Vikram Goyal

01 Sep 2005 | TheServerSide.com

Jakarta Commons Net is a feature-rich protocol factory that allows developers to write applications that require high-level protocol access. This excerpt from module 3 of the Jakarta Commons Online Bookshelf explains the building blocks of these protocols and then discusses the Net API, with a few protocol API examples.

The Net component brings together implementations for a diverse range of Internet protocols. It’s a feature-rich component and had been around a long time before it was open-sourced through Jakarta Commons. (It was originally built by ORO, Inc.)

Most programmers have to deal with only a subset of the vast array of Internet protocols. The ones used most often include Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and, perhaps, the mail protocols: Simple Mail Transfer Protocol (SMTP) and Post Office Protocol (POP3). However, several lesser-known protocols are also in use. The Net component offers both the well-known and the not-so-well-known protocols through an easy-to-use interface.

In this chapter, we’ll explore the Net component and look at the protocols it makes available. This will help you understand the basics of these protocols before you begin using them. Next, we’ll examine the structure of the Net component and explore its API. Finally, we’ll use all this information to develop a multiprotocol handler that uses each of the protocols.

Getting to know the protocols

A protocol, in the real world, is a way something should be done. By adhering to a protocol, you’re following a strict procedure for doing certain things—for example, you shouldn’t take the last helping of dessert without asking a dinner guest if they’d like to have it!

In the computer world, a protocol is a previously agreed upon way for two machines to exchange information with each other. Without a protocol to guide and define how machines talk to each other, there would be anarchy and confusion. A protocol may determine several things: how the two machines will handshake, how the transmitting machine will initiate transfer of data, what the format of the data will be, what the recipient machine will do to indicate that it has received the data, what the recipient machine will do to indicate an error condition, and so on.

TCP and UDP: the building blocks

Before we talk about the protocols covered by the Net component, let’s discuss the protocols that are the building blocks of these protocols. A discussion of network technologies is incomplete without a basic understanding of the TCP and UDP protocols, which are the low-level protocols that act as the message carriers for the high-level, application-specific protocols.

The Transmission Control Protocol (TCP) is specified in RFC 793 (see module 2, section 2.1.1, for the definition of an RFC). It’s a reliable, connection-oriented, complex protocol:

  • It’s reliable because TCP detects and recovers from lost data: The protocol marks the data it transmits with sequence numbers, and the receiving end acknowledges what it has received. Anything that goes unacknowledged is retransmitted by the sender, and the sequence numbers let the receiver spot gaps and put the data back in order.
  • It’s connection-oriented because a connection must be established between machines before data can be exchanged between them.
  • It’s complex because it requires error correction and retransmission policies built in at the protocol layer itself.

The User Datagram Protocol (UDP) is specified in RFC 768. It’s an unreliable, connectionless, simple protocol:

  • It’s unreliable because packets sent via UDP may or may not reach their destination: The packets may get lost along the route due to incorrect addressing or checksum errors.
  • It’s connectionless because each packet is self-contained with the source host and destination host address. No direct connection stream is established between the source and the destination machines.
  • It’s simple because it doesn’t require a connection, and no acknowledgment or retransmission of packets is involved. This also means that if an application-level protocol is using UDP as the underlying protocol for transmission of data, it must be ready to accept loss of data or handle error checking and retransmission of lost data on its own.

It helps to think of TCP as a continuous phone line communication channel and UDP as a standard postal letter communication channel. With TCP, a continuous full duplex (two-way) channel is established before any data is exchanged, similar to numbers being dialed before a phone call begins. With UDP, a letter is marked with a destination address and is posted; it will probably reach its destination, but it may not.
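The difference is easy to see with the plain java.net classes that the Net component ultimately wraps. The following is a minimal sketch, not code from the Net component: the class name TransportDemo and its two methods are made up for illustration, and both ends of each exchange run on the local machine over the loopback address.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it is not part of the Net component.
class TransportDemo {

    // TCP: a connection is established first, then bytes flow over a stream.
    static String sendOverTcp(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.getOutputStream().write((message + "\n").getBytes(StandardCharsets.UTF_8));
            client.getOutputStream().flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(accepted.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // UDP: no connection; each datagram carries its own destination address.
    static String sendOverUdp(String message) {
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {
            // A datagram may never arrive, so don't wait forever for it.
            receiver.setSoTimeout(2000);
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                receiver.getLocalAddress(), receiver.getLocalPort()));
            DatagramPacket incoming = new DatagramPacket(new byte[512], 512);
            receiver.receive(incoming);
            return new String(incoming.getData(), 0, incoming.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP delivered: " + sendOverTcp("hello"));
        System.out.println("UDP delivered: " + sendOverUdp("hello"));
    }
}
```

Notice that the TCP version needs an accepted connection before any data moves, whereas the UDP version simply addresses a packet and sends it. The receive timeout in the UDP method is there because nothing in the protocol guarantees the packet will arrive.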

Both TCP and UDP are protocols on the Transport layer of the standard TCP/IP four-layer model, which is shown in figure 3.1. For this reason, these protocols are low-level protocols as compared to, for example, HTTP, which is considered a high-level protocol.

 Figure 3.1 The four-layer TCP/IP model, showing the different levels of protocols

High- and low-level protocols

Whenever an application-level protocol is used, it’s likely to be running on top of other low-level protocols. Consider HTTP. Although this protocol is designed to transfer information between a web browser and a web server, it relies on an underlying protocol (TCP) to gather together the packets to be transmitted to the server. HTTP (built into the web browser) at the client end talks with TCP on the client’s machine. TCP, in turn, talks to a protocol at a still lower level (IP), again on the client’s machine. This protocol talks across the physical layer to the server and transmits the packets.

Conceptually, the web browser’s HTTP is talking with the HTTP on the web server end. Physically, the information goes through a series of protocols before being transmitted. A protocol at the top end of the chain is a high-level protocol; it’s the one making all the initial requests (if you ignore the user). The protocols it talks to in order to complete the user requests are low-level protocols.

Note that HTTP, TCP, and IP are software protocols; they’re conceptual entities. For data to actually be transmitted, a hardware protocol must also be present. Thus, continuing our example, IP talks with the local hardware/network combination to transmit the information. This combination is an actual wired or wireless entity that transmits the information over network channels. Examples of these network entities, which are hardware protocols, include Ethernet, AppleTalk, 802.11g, and so on.

The protocols covered by the Net component are all high-level protocols like HTTP. They’re called application-level protocols because they’re designed to help applications on different machines to transfer information. These protocols don’t deal with issues that primarily involve creating and transferring packets, error recovery, and checksums; lower-level protocols such as TCP and IP (which aren’t covered by the Net component) take care of these tasks.

The Net API

The Net API contains all the classes that make up the Net component. No external classes or libraries are required in order to use it.

Most of the classes in the API are divided into two groups: One group deals with TCP-based protocols and the other with UDP-based protocols. Figure 3.4 shows a static UML diagram that depicts the classes in the TCP group. Similarly, figure 3.5 shows a diagram for the UDP group classes. These diagrams and the classes they contain are discussed in later sections.

The overall picture

As you can see from the diagrams, all protocol implementation classes derive from an abstract class: SocketClient (for TCP-based protocols) or DatagramSocketClient (for UDP-based protocols). These abstract classes provide the basic operations: for TCP, connecting to remote hosts, setting timeouts, accessing input and output streams, and closing connections; for UDP, opening local ports and setting timeouts. However, you don’t create the underlying sockets yourself; they’re created using factory classes. You can either let SocketClient and DatagramSocketClient manage their factories internally, which they do by using the default factory classes, or you can supply your own SocketFactory (or DatagramSocketFactory, for UDP) to customize the way the sockets are created. The default implementations of these interfaces, DefaultSocketFactory and DefaultDatagramSocketFactory, are simple wrapper classes around the Java-supplied Socket and DatagramSocket classes.

Since all protocol implementation classes derive from either the SocketClient class or the DatagramSocketClient class, they all inherit the same common methods, whatever the protocol: connecting to and disconnecting from remote servers for TCP, and opening and closing sockets for UDP. (Note that since UDP-based protocols are connectionless, the socket that is being opened is on your machine.)

A typical TCP protocol implementation class operates as shown here:

XXXProtocol.connect(remoteHost);
--- do some protocol specific work ---
XXXProtocol.disconnect();

Figure 3.4 TCP-based static class structure for the Net API

 Figure 3.5 UDP-based static class structure for the Net API

Both the connect and disconnect methods are inherited from the SocketClient class.

Similarly, a typical UDP protocol implementation class operates like this:

XXXUDPClient.open();
--- do some protocol specific work ---
XXXUDPClient.close();

Again, the DatagramSocketClient class defines the open and close methods.

This way, the implementation classes don’t have to worry about connection and disconnection methods. All they need to do is to look after implementing the basics of their protocol.
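To make this division of labor concrete, here is a small sketch of the same idea built only on the JDK. Everything in it is hypothetical: SketchSocketClient stands in for the role that SocketClient plays, SketchDaytimeClient stands in for a protocol class, and javax.net.SocketFactory is used as the pluggable factory purely for illustration (the Net component defines its own factory types). The demo talks to a stub server on the local machine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.SocketFactory;

// Hypothetical stand-in for the role SocketClient plays; not the real class.
abstract class SketchSocketClient {
    private SocketFactory factory = SocketFactory.getDefault(); // default factory
    protected Socket socket;

    // Lets callers customize how sockets are created.
    void setSocketFactory(SocketFactory custom) {
        factory = custom;
    }

    void connect(InetAddress host, int port) throws IOException {
        socket = factory.createSocket(host, port);
    }

    void disconnect() throws IOException {
        socket.close();
    }
}

// A protocol class adds only the protocol-specific work.
class SketchDaytimeClient extends SketchSocketClient {
    String getTime() throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
        return in.readLine();
    }
}

class SketchClientDemo {
    // Runs the connect / work / disconnect cycle against a local stub server
    // that sends a single line and closes its end of the connection.
    static String readFromLocalStub(String serverLine) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            SketchDaytimeClient client = new SketchDaytimeClient();
            client.connect(server.getInetAddress(), server.getLocalPort());
            try (Socket peer = server.accept()) {
                OutputStream out = peer.getOutputStream();
                out.write((serverLine + "\r\n").getBytes(StandardCharsets.US_ASCII));
                out.flush();
            }
            String result = client.getTime();
            client.disconnect();
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromLocalStub("Thu Sep 01 12:00:00 2005"));
    }
}
```

The protocol subclass contains no connection handling at all, which is the point of the design: connection management lives in one place, and swapping the factory changes how every protocol class obtains its socket.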

The Net API structure

The Net API was designed around the principle of giving basic access to the low-level programmer and broad, easy abstractions to the high-level API user. Each protocol package reflects this split: You can either work with the low-level classes directly, for any protocol, or bypass them and use the higher-level abstractions. Most developers fall into the second category and use the low-level classes only when they need to tweak parameters beyond what the default implementations allow.

The low-level class is named after the protocol it implements. For example, SMTP is implemented in the low-level mode in the class SMTP in the package org.apache.commons.net.smtp. As we said before, you probably won’t ever need to call or work on this class; you’ll almost always use the high-level SMTPClient in the same package.

Note: Notice the Client at the end of this class name. All high-level protocol implementations have Client added to the end of the class name. This distinguishes between the low- and high-level classes for a protocol. Only complex protocols like POP3, NNTP, FTP, TFTP, SMTP, and Telnet have this distinction, though. The simpler protocols, like CharGen and Daytime, have only low-level classes called CharGenTCPClient and DaytimeTCPClient, respectively.

Almost all protocol implementations have their own package in the Net component API. Table 3.1 lists each protocol and the corresponding package(s) in which its implementation classes are found.

Table 3.1 Package information for protocols supported by the Net API

Protocol              Package(s)
--------------------  ----------------------------------------------------------------
BSD rlogin            org.apache.commons.net.bsd
FTP                   org.apache.commons.net.ftp, org.apache.commons.net.ftp.parser
NNTP                  org.apache.commons.net.nntp
POP3                  org.apache.commons.net.pop3
SMTP                  org.apache.commons.net.smtp
Telnet                org.apache.commons.net.telnet
TFTP                  org.apache.commons.net.tftp
All other protocols   org.apache.commons.net
Support and utility   org.apache.commons.net.io, org.apache.commons.net.util

In the next few sections, we’ll look at the package structures of some of the complex protocols listed in table 3.1. The classes for the simpler protocols, like CharGenTCPClient, are self-explanatory once you’re familiar with the complex ones.

org.apache.commons.net.bsd

This package contains the following three classes, which implement the rlogin protocol discussed earlier:

  • RExecClient—The base class, which provides functionality for executing commands on BSD Unix, equivalent to the rexec() command.
  • RCommandClient—Extends RExecClient and provides facilities for executing rcmd(). This command allows remote trusted hosts to connect to a Unix server without requiring explicit authorization. An added constraint on the rcmd() utility requires client machines to connect to the server from a local port numbered between 512 and 1023. The RCommandClient class enforces this restriction internally without your having to specify it.
  • RLoginClient—A simple extension of RCommandClient. It logs in using the trust relationship established in the superclass.

Notice that since all three classes’ names contain the word Client, the classes can be used individually at a higher level.

org.apache.commons.net.ftp and org.apache.commons.net.ftp.parser

The FTP classes have been divided into two packages as an enhancement of the older FTP implementation of Net. The new package org.apache.commons.net.ftp.parser contains classes that implement the FTPFileListParserImpl abstract class. These classes make it easier to handle different file lists. (We’ll talk about this later.)

The main package org.apache.commons.net.ftp contains the FTP and FTPClient classes, which are the low-level and high-level implementations, respectively, for the FTP protocol. In addition to these two classes, there are several classes that supplement FTP behavior. The FTPCommand and FTPReply classes are constant classes that contain FTP command and reply codes, respectively, as constant integers.

In addition, the FTPReply class contains several convenience methods that allow you to test and group the status of the reply code sent by an FTP server. Thus, if you wanted to test whether the reply sent by the server was preliminarily positive, you’d use the method boolean isPositivePreliminary(int reply). The complete list of methods is as follows:

  • isPositivePreliminary(int reply)—The server sends a positive preliminary reply only if part of the original request has been completed and the user should expect another reply before sending a new command. All reply codes that start with 1 are examples of such a reply.
  • isPositiveCompletion(int reply)—Determines whether the server sent a reply indicating that the original command was successfully completed. All reply codes that start with 2 indicate that the request was successfully completed.
  • isPositiveIntermediate(int reply)—Determines the success of part of a series of commands sent to a server. The reply codes for such responses start with 3. For example, the server would send such a reply code for the USER command (if the user specified by this command was accepted by the server). The server would still wait for the user to send the PASS command.
  • isNegativeTransient(int reply)—Indicates that the original command couldn’t be executed but that if the user sent the command again, it would probably succeed. These reply codes start with 4.
  • isNegativePermanent(int reply)—Indicates that the original command couldn’t be executed and that retrying the command is also likely to fail. All reply codes beginning with 5 are examples of such a reply.
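The grouping described in this list comes down to checking the first digit of the three-digit reply code. The sketch below re-creates that idea in a standalone class so you can see the logic; the class name ReplyClassSketch and the describe method are made up for illustration, and this isn’t the source of FTPReply. The sample codes (150, 226, 331, 421, and 550) are standard FTP reply codes.

```java
// Hypothetical re-creation of the classification idea; not the FTPReply source.
final class ReplyClassSketch {
    private ReplyClassSketch() {}

    // 1xx: the request was started; expect another reply.
    static boolean isPositivePreliminary(int reply) { return reply >= 100 && reply < 200; }

    // 2xx: the request completed successfully.
    static boolean isPositiveCompletion(int reply) { return reply >= 200 && reply < 300; }

    // 3xx: accepted so far; the server is waiting for the next command.
    static boolean isPositiveIntermediate(int reply) { return reply >= 300 && reply < 400; }

    // 4xx: failed for now; retrying may succeed.
    static boolean isNegativeTransient(int reply) { return reply >= 400 && reply < 500; }

    // 5xx: failed; retrying is likely to fail as well.
    static boolean isNegativePermanent(int reply) { return reply >= 500 && reply < 600; }

    // Maps a reply code to a short label for its group.
    static String describe(int reply) {
        if (isPositivePreliminary(reply)) return "positive preliminary";
        if (isPositiveCompletion(reply)) return "positive completion";
        if (isPositiveIntermediate(reply)) return "positive intermediate";
        if (isNegativeTransient(reply)) return "negative transient";
        if (isNegativePermanent(reply)) return "negative permanent";
        return "not a valid reply code";
    }

    public static void main(String[] args) {
        int[] samples = {150, 226, 331, 421, 550};
        for (int code : samples) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

In client code, a check like isPositiveCompletion is typically what decides whether to continue with the next command or to disconnect and report an error.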

A file on the FTP server is represented using the FTPFile class. This class is an abstraction of the way a file could be represented over several servers and contains methods that let you gather meta information about the file. This information represents the attributes of a file as it’s stored on the server. For example, there are methods that allow you to gather the size of the file or its name or group. You can also get information about the permissions associated with the file. More information may be available from some FTP servers about the files on the server, so you may want to subclass the FTPFile class and add those enhancements.

But how does the Net component get information about these files on the server? The Net component runs on the client side, and RFC 959 doesn’t define a standard, machine-readable way for an FTP server to describe individual files. The Net component overcomes this problem by requesting a listing of a directory with the FTP command LIST and then parsing the server’s reply to create an array of FTPFile objects that more or less represent the individual files on the server. This is where the parser package comes into the picture.
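To get a feel for what such parsing involves, consider one line of a Unix-style listing. The sketch below pulls the file attributes out of that line with a regular expression. It is an illustration only: the ListedFile and UnixListLineSketch classes are hypothetical, the sample lines are invented, and the parsers that ship with the Net component handle far more variations than this pattern does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type; the real FTPFile class exposes more attributes.
final class ListedFile {
    final boolean directory;
    final String permissions;
    final String owner;
    final String group;
    final long size;
    final String name;

    ListedFile(boolean directory, String permissions, String owner,
               String group, long size, String name) {
        this.directory = directory;
        this.permissions = permissions;
        this.owner = owner;
        this.group = group;
        this.size = size;
        this.name = name;
    }
}

final class UnixListLineSketch {
    private UnixListLineSketch() {}

    // Fields, in order: type flag, nine permission characters, link count,
    // owner, group, size, three date tokens, and finally the file name.
    private static final Pattern LINE = Pattern.compile(
        "^([-dl])([-rwxsStT]{9})\\s+\\d+\\s+(\\S+)\\s+(\\S+)\\s+(\\d+)"
        + "\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.+)$");

    // Returns null when the line doesn't look like a Unix-style entry.
    static ListedFile parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new ListedFile("d".equals(m.group(1)), m.group(2), m.group(3),
            m.group(4), Long.parseLong(m.group(5)), m.group(6));
    }

    public static void main(String[] args) {
        ListedFile file =
            parse("-rw-r--r--   1 vikram   staff        1024 Sep  1  2005 readme.txt");
        System.out.println(file.name + " is " + file.size
            + " bytes and is owned by " + file.owner);
    }
}
```

Because the listing format isn’t standardized, a pattern like this works only for servers that produce Unix-style output, which is exactly why more than one parser is needed.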

When the Net component was initially released, the FTP package contained only a single parser called DefaultFTPFileListParser. This parser implemented the FTPFileListParser interface, which contains a single method:

public FTPFile[] parseFileList(java.io.InputStream listStream) throws java.io.IOException

The DefaultFTPFileListParser class is suitable for most cases. However, using the parseFileList method of this default parser led to problems when a single directory contained a large number of files: Because this default parser created an FTPFile object for every file in a directory up front, there were memory and performance issues with large listings. To resolve this problem, a new set of parsers was implemented. These parsers don’t create FTPFile objects until the end user actually asks for them. Instead, a whole directory listing is represented using the FTPFileList class, which contains methods that let you extract or iterate over the files. In addition, a new interface called FTPFileEntryParser was defined. To maintain backward compatibility, FTPFileEntryParser was implemented using a class called FTPFileListParserImpl. This class implements both the old interface (FTPFileListParser) and the new one (FTPFileEntryParser) and provides default functionality for the old interface. The new parsers are collected in the org.apache.commons.net.ftp.parser package, and all of them extend the FTPFileListParserImpl class.
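The idea of deferring the parsing work can be sketched in a few lines. In the hypothetical class below, the raw lines of a listing are stored as they are, and an entry is parsed only when the iterator hands it out. LazyListingSketch, its parsedCount method, and the whitespace-splitting stand-in for an entry parser are all made up for illustration; this isn’t how FTPFileList is actually written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of deferred parsing; not the FTPFileList implementation.
final class LazyListingSketch implements Iterable<String[]> {
    private final List<String> rawLines;
    private int parsedCount = 0; // how many entries have actually been parsed

    LazyListingSketch(List<String> rawLines) {
        // Only the raw text is stored; nothing is parsed here.
        this.rawLines = new ArrayList<>(rawLines);
    }

    int size() {
        return rawLines.size();
    }

    int parsedCount() {
        return parsedCount;
    }

    // Stand-in for an entry parser: splits one raw line into its fields.
    private String[] parseEntry(String line) {
        parsedCount++;
        return line.trim().split("\\s+");
    }

    @Override
    public Iterator<String[]> iterator() {
        final Iterator<String> raw = rawLines.iterator();
        return new Iterator<String[]>() {
            @Override
            public boolean hasNext() {
                return raw.hasNext();
            }

            @Override
            public String[] next() {
                // The parsing cost is paid here, one entry at a time.
                return parseEntry(raw.next());
            }
        };
    }

    public static void main(String[] args) {
        LazyListingSketch listing = new LazyListingSketch(
            Arrays.asList("readme.txt 1024", "notes.txt 2048", "docs 512"));
        System.out.println("Parsed before iterating: " + listing.parsedCount());
        String[] first = listing.iterator().next();
        System.out.println("First entry: " + first[0]);
        System.out.println("Parsed after one step: " + listing.parsedCount());
    }
}
```

A caller that needs only the first few entries of a very large directory never pays for parsing the rest, which is the memory and performance benefit described above.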

Finally, there is an exception class called FTPConnectionClosedException, which signals the premature or unexpected closing of a connection. This exception extends the IOException class and is thrown when the FTP server closes the connection, typically because of inactivity. To keep an idle connection from timing out, you should call the sendNoOp() method at regular intervals. This method sends the NOOP command to the remote server, which keeps the connection active; it does nothing else.

org.apache.commons.net.nntp

This package contains classes for the NNTP implementation. The classes in this package are as follows:

  • NNTP—A low-level NNTP implementation that provides direct access to NNTP commands
  • NNTPClient—A high-level NNTP implementation that accesses the NNTP class and provides convenience methods for use by developers
  • NNTPCommand—Contains NNTP command constants
  • NNTPReply—Contains NNTP reply constants
  • NewsgroupInfo—Provides information about a newsgroup
  • NewGroupsOrNewsQuery—A utility class that issues queries for new groups and new news
  • ArticlePointer—Contains information about the location of newsgroup articles
  • SimpleNNTPHeader—Lets you create message headers
  • NNTPConnectionClosedException—An exception that is thrown when the server sends a 400 error code, indicating a closed connection for a command the client is sending to it

While using this package, you’ll almost always work with the higher-level NNTPClient class. The low-level class provides convenience methods to send various NNTP commands directly; for example, the newgroups(date, time, GMT, distributions) method lets you send the NEWGROUPS command to a server and returns the reply code as a constant. There are similar methods for all the NNTP commands.

The NewsgroupInfo class represents information about a newsgroup. Methods in this class allow you to get the count of articles in a newsgroup (getArticleCount()), get the first and last article numbers (getFirstArticle(), getLastArticle()), get the newsgroup’s name (getNewsgroup()), and determine whether posting of articles in this group is allowed (getPostingPermission()). NewsgroupInfo instances are created as the result of invoking either the listNewsgroups() or listNewNewsgroups(NewGroupsOrNewsQuery query) method on the NNTPClient class.

NewGroupsOrNewsQuery is a handy class because it abstracts the tasks of working out times, dates, and group names and leaves you to specify the date from which you want to retrieve the new groups or new messages. Recall that NNTP lets you retrieve new groups and messages by issuing the NEWGROUPS and NEWNEWS commands. However, these commands must be issued with respect to a timeline from which a group or message is considered new. Further, you may want to narrow the search to groups starting with a particular name. The NewGroupsOrNewsQuery class helps formulate such queries by allowing you to specify the date and group (called the distribution) you want to search for new groups or messages. Here’s an example of how you would use this:

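// Requires java.util.Calendar and java.util.GregorianCalendar;
// client is a connected NNTPClient.
NewGroupsOrNewsQuery query =
  new NewGroupsOrNewsQuery(
    new GregorianCalendar(1999, Calendar.MARCH, 3), false);
query.addDistribution("comp");
NewsgroupInfo[] newsgroups = client.listNewNewsgroups(query);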

This code lists all newsgroups that were created after midnight on March 3, 1999, and whose names begin with the prefix comp.
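Two details are worth a closer look here: how a calendar date turns into the text of a NEWGROUPS command, and how easy it is to get the date wrong. The sketch below is a hypothetical helper, not the NewGroupsOrNewsQuery implementation. It assumes the RFC 977 syntax of a YYMMDD date, an HHMMSS time, an optional GMT keyword, and an optional distribution in angle brackets, and it performs no time zone conversion. The main method also shows why the year and month passed to GregorianCalendar deserve care: the year is taken literally, and months are counted from zero.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper; not the NewGroupsOrNewsQuery implementation.
final class NewGroupsCommandSketch {
    private NewGroupsCommandSketch() {}

    // Builds "NEWGROUPS YYMMDD HHMMSS [GMT] [<distribution>]" from the
    // calendar's own field values (no time zone conversion is attempted).
    static String build(Calendar since, boolean gmt, String distribution) {
        String date = String.format(Locale.ROOT, "%02d%02d%02d",
            since.get(Calendar.YEAR) % 100,
            since.get(Calendar.MONTH) + 1, // Calendar months are zero-based
            since.get(Calendar.DAY_OF_MONTH));
        String time = String.format(Locale.ROOT, "%02d%02d%02d",
            since.get(Calendar.HOUR_OF_DAY),
            since.get(Calendar.MINUTE),
            since.get(Calendar.SECOND));
        StringBuilder command = new StringBuilder("NEWGROUPS ");
        command.append(date).append(' ').append(time);
        if (gmt) {
            command.append(" GMT");
        }
        if (distribution != null) {
            command.append(" <").append(distribution).append('>');
        }
        return command.toString();
    }

    public static void main(String[] args) {
        Calendar march3 = new GregorianCalendar(1999, Calendar.MARCH, 3);
        System.out.println(build(march3, false, "comp"));

        // A two-digit year and a one-based month silently mean something else.
        Calendar mistaken = new GregorianCalendar(99, 3, 3);
        System.out.println("Year: " + mistaken.get(Calendar.YEAR)
            + ", April? " + (mistaken.get(Calendar.MONTH) == Calendar.APRIL));
    }
}
```

Using the Calendar month constants, as in the first call, avoids the off-by-one mistake entirely.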

The ArticlePointer class is a small structure that contains information about the location of an article. It’s used as a result holder when the STAT command is issued; this command returns the message (article) number and the message’s unique ID.

The SimpleNNTPHeader class is used when an article (message) needs to be posted to a group and the appropriate headers for the article must be created in a format that is acceptable to the group. It contains methods to add the newsgroup to which the message is being posted and any other headers that may be required. A SimpleNNTPHeader instance is created by passing in the From address and the subject of the message being posted.
