Java Development News:

Book Review: Algorithms in Java

By Joseph Ottinger

01 Jun 2008 |

Algorithms in Java, by Robert Sedgewick, is a Java-based version of a curriculum written in 1983 by Professor Sedgewick. It's currently in two volumes: the first covers fundamentals, data structures, sorting, and searching, and the second volume covers graph algorithms. Both use idiomatic Java for all code examples, and provide a thorough walkthrough of the underlying concepts covered.

The first volume addresses such things as:

  • Terms ("What is an algorithm? What does connectivity mean? What makes something a set? What does it mean to find something?")
  • Data structures (familiar ones like arrays, linked lists, and even strings, followed up by trees)
  • Sorting (Quicksort, merge sort, two-way merging, priority queues, radix sorting, and special sorts)
  • Searching (... in which we meet more data structures, like BSTs, balanced trees, specialized hashes, and a data structure called a "Trie.")

In Java, many of these don't "feel" relevant. After all, Java has a canonical concept of what a Set is, or a List, or a Map; it even has specialized implementations of these such as TreeSet, HashMap, LinkedList, et al. You have implementations of sorting algorithms as part of the API, without having to do anything to get them. With recent versions of Java, they're even largely typesafe.

However, based on conversations online about what set methods are available, it doesn't seem like everyone knows about these, and when they do, they don't know the cost of the Collections.sort() method, for example. All they know is that it's usually "good enough."

Being "good enough" is excellent, really, and serves as a testament to Java and the runtime library's coders. However, if "good enough" isn't actually good enough, where do you turn?

To books like Algorithms in Java, of course. Here, you'll learn exactly how to write your own balanced trees - which happen to be just like TreeMap, as it turns out. (Look at the TreeMap source code; it's an implementation of a red-black tree, which is a very efficient storage mechanism.) However, there are cases where a TreeMap isn't the best choice - and neither is HashMap, which uses a backing array to store the elements. In those cases - where you need performance most - the "good enough" API makes things much harder, unless you know the storage algorithms Sedgewick describes.

The algorithms Dr. Sedgewick describes are actually fairly well known. It's not that hard to look up a Patricia Trie, for example, nor is it especially difficult to find an implementation in Java - but using an implementation you don't understand has implications in and of itself. The value Dr. Sedgewick provides is in offering you the theory behind each data structure, and offering some benchmarks in terms of what the data structures offer you.

(For example, the red-black tree implementation - the implementation offered by the Collections API in TreeMap - is the fastest general-case tree. Nice, eh?)

The second volume - which covers graph theory - is a little more esoteric. The first volume offers you more concrete uses, while graphs tend to be harder to grasp outside of direct visualization - and the graphs here are thought of as connected nodes, not pie charts and the like. Here, graphs are referring to things like best-path discovery, cycle detection, etc.

In every case, co-author Michael Schidlowsky has written Java implementations for the concepts Dr. Sedgewick describes, so readers will not only come away with an explanation of a Patricia Trie, but code that implements the Trie itself. The same goes for every other data structure, sorting algorithm, graph mechanism in the books.

The only issue with the code is that while it's generic, it doesn't use Java generics - which means that some of the type safety offered by generics is lost in translation. That doesn't mean that you couldn't add generics to the code, but it's something to keep in mind, if you've gotten spoiled by the cleaner Java 5 syntax.

All told, this series is one of the best series on actual code around. While thinking about sorting or data storage may not be the sexiest or most appealing thought at first, when you dig under the covers of how you organize your own data, the concepts expressed and explained by these volumes are invaluable.

Related Topics: JSRs and APIs, VIEW ALL TOPICS