7 tips to choose the right Java library
Your application is only as secure and reliable as the external libraries you use. Here's a list of the top 7 things to consider when choosing a software library for your project.
Java's a great language to work with, for a lot of reasons. One of the strongest is its ecosystem -- if you need a library to do something, there's a good chance that someone else published a library for that function.
That said, choosing a library is rarely that simple. To me, there are seven factors to consider when selecting a library.
How to choose a software library
Here are the seven top criteria to consider when choosing a software library for your project:
- Function and form
- Documentation
- License
- Amount of effort saved
- Scope
- Support
- Deployment size
An eighth factor, search engine optimization, is mostly a passive concern for library consumers but is vital for library authors.
Let's step through these one by one.
Function and form
The first consideration about the library's function is, "Does this library do what it says on the tin?" The secondary consideration is, "Is it a good fit for what I actually want?" Put another way, make sure the library performs its function in a way that fits your needs.
If your goal is to parse freeform dates -- for example, an input of "tomorrow" should yield a LocalDate that matches tomorrow's date -- there's a simple pass/fail rule. Find a library that takes a string as input (e.g., "tomorrow") and passes back something that can be used as a LocalDate. If the library you're looking at doesn't meet that basic requirement, keep looking; there's probably a better option out there.
There's a level of effort to consider here as well. Imagine that the aforementioned date parsing library converts "tomorrow" into a date string instead of a date, but LocalDate can consistently parse that string. If you can nudge what the library returns into the data type you want, it might be acceptable to keep using that library, but the level of work is higher than if it returned a date directly.
If there's a bigger jump from what is to what should be, look for a different library or patch the one you have and submit a pull request to the author.
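As a rough sketch of that kind of nudge, the glue code might look like the following. Note that `libraryParse` is a hypothetical stand-in for a third-party call that returns a date as an ISO-8601 string; it is not the API of any real library, and the class and method names are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class DateAdapter {
    // Hypothetical stand-in for a third-party call that returns a date as text.
    static String libraryParse(String input, LocalDate today) {
        if ("tomorrow".equalsIgnoreCase(input.trim())) {
            return today.plusDays(1).toString(); // LocalDate.toString() is ISO-8601
        }
        return input; // hand anything else back untouched
    }

    // The glue code you would own: turn the library's string into a LocalDate.
    static Optional<LocalDate> toLocalDate(String input, LocalDate today) {
        try {
            return Optional.of(LocalDate.parse(libraryParse(input, today)));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // the string was not something LocalDate can parse
        }
    }
}
```

The adapter is only a few lines, but it's code you have to write, test and maintain, which is the extra effort to weigh.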
Documentation
Another aspect closely related to function and form is the level of documentation available for the given library.
The "right level" of documentation is hard to qualify because it depends on the complexity of the library's tasks. A library that converts "tomorrow" into a date might have limited documentation that says little more than, "Here's how to install the library."
By comparison, when you work with the Spring Data project you want an in-depth manual, perhaps a book. The subject matter is entirely nontrivial, and users must understand a lot of corner cases.
Also, consider how and for whom the library's documentation is written. It might target developers of the library, and focus on internals and Big-O notations of data structure complexity. As users, we might care about that a little, but we probably want to know more about how to use the library, not develop it.
If you can't easily use the library, keep looking for another one that fits not only your use case but also how you think about your problem.
License
Another very important consideration is the library's license. Again, there's no simple metric for a "good" license. Some open source licenses are commonly accepted as being "good," but that doesn't mean they satisfy your legal requirements. I am not a lawyer -- I offer assumptions as conventional wisdom -- but in the real world that's simply not good enough.
If library licensing matters to you and your company, consult a lawyer. A judge will not accept "... but I read an article" as legal justification.
The licenses for the libraries you select should be "permissive enough" for the given circumstances. In general, permissive licenses that fit most people's needs are the Apache License v2.0 and the BSD and MIT licenses. Others also may work, but with more risk. The LGPL is also usually fine, but in my experience lawyers get hives at its mention. The GPL tends to be a viral (bad) license in its intent, even if Java's deployment model somewhat mitigates its harmful nature.
Again, if you have a question about licensing -- and you should -- ask a lawyer.
Amount of effort saved
I've cited the example of parsing relative dates partly because the amount of effort expended to fulfill a requirement is unknown. How would you parse "tomorrow" as tomorrow's date? How much work would it take to address that as a requirement?
Compare the base level of effort it will take to satisfy a requirement yourself with the effort of finding a library in which someone else has already done it. A library search is fast if you know the common search terms library authors might use. However, you also must check out the API and whether it does what it says on the tin, and confirm the library license is suitable for your use.
Most people just search for a library and use it, and evaluate the other metrics for suitability later. Ask yourself how hard it would be to actually implement what you need yourself. If literally all you require is that "tomorrow" is a date offset from today, why not use a simple "if" and check for the word?
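To make that concrete, here's a minimal sketch of the do-it-yourself version, assuming "tomorrow" really is the only relative word the project needs. The class name `RelativeDates` and the choice to fall back to ISO-8601 parsing are my assumptions for the example, not a recommendation.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.util.Locale;

class RelativeDates {
    // Handles the single word "tomorrow"; any other input must be an ISO-8601 date.
    // Passing in a Clock keeps the method testable without depending on the real date.
    static LocalDate parse(String input, Clock clock) {
        String word = input.trim().toLowerCase(Locale.ROOT);
        if (word.equals("tomorrow")) {
            return LocalDate.now(clock).plusDays(1);
        }
        return LocalDate.parse(word); // throws DateTimeParseException on anything else
    }
}
```

In production code you would call it as `RelativeDates.parse(text, Clock.systemDefaultZone())`. The moment the requirement grows to "next Tuesday" or "in three weeks," the balance tips back toward a library.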
That's not to downplay the usefulness of libraries that have more general application than this simple and specific use case. If all you need is to take an object and write it into a database, consider if Hibernate or Spring Data costs you more than it provides.
Scope
Scope refers to how well a library's purpose matches your needs. If you need a cache, for example, Google's Guava library has a good one. However, Guava's scope is a lot wider than just a cache -- you also get a host of features in a single library, including collections, graphs, concurrency utilities, string utilities and more. And it's certainly battle-tested. I view Guava as one of few generally useful libraries that I include by default for personal projects.
That said, Guava and all its functionality may not be right for every project. I'm familiar enough with Guava that if I need a cache, for example, I create the structure and don't think twice, which saves a lot of effort in the long run.
That doesn't justify a cargo-cult approach to programming. Consider how well a library's functions actually solve what you need. Even if I can create what I need with Guava without much thought, there might be a more suitable solution for a specific project.
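As one illustration of a narrower fit, suppose all a project needs is a small, single-threaded, least-recently-used cache. Under that assumption, a sketch built only on the JDK's `LinkedHashMap` might be enough. The `LruCache` name is mine, and this is in no way a substitute for what Guava's cache offers (it has no expiry, statistics or thread safety).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache; not thread-safe.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Whether a dozen lines like these beat pulling in a general-purpose library depends on what else the project would use from that library.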
Support
Another factor in choosing a library is where you turn when you need help. Consider various support options for a given library:
- There might be a company that created the library, as you find with Spring or Hibernate.
- You can also turn to the project's source, for example by requesting support on a GitHub issues page or another kind of bug tracker.
- Third-party companies might offer support, in competition with the project authors. Their employees have expertise but aren't part of the maintaining bodies for the projects in question.
- A community of like-minded users could offer real-time support, through communication platforms such as Discord, Slack or IRC.
- On social networks such as Stack Overflow, Reddit or Quora, you can leave a message and hope someone responds eventually.
Beyond these options, you're looking at working with no support at all. Anything you need, you're responsible for figuring out. A bug report might elicit a response, but realistically you're on the hook for your usage of the code if the other support networks are unavailable or don't pan out.
In all these cases, a library with an active, vibrant community available for support has a tangible advantage over one that does not.
Deployment size
Deployment size refers to the actual amount of disk space taken up by a library's deployment, along with its transitive dependencies. If the library is 100 KB, but it needs a set of transitive dependencies such as Spring, the actual impact of that library is Spring plus the 100 KB.
In Java this is a relatively small concern, because the JVM's deployed size is likely to be larger than the dependencies. However, if you deploy into resource-constrained environments, such as server clusters with limited disk space or embedded devices (as does your humble author), deployment size is a factor to consider.
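If you want a number rather than a guess, one rough approach is to add up the JAR files your build copies into a directory. The sketch below assumes such a directory exists (for example, one your build tool populates with the library and its transitive dependencies); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class DeploymentSize {
    // Sums the size in bytes of every .jar file under the given directory, recursively.
    static long totalJarBytes(Path libDir) throws IOException {
        try (Stream<Path> paths = Files.walk(libDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }
}
```

Running it before and after you add a dependency shows what that library really costs on disk, transitive dependencies included.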
This list of seven criteria is not complete, of course, nor is it prescriptive. Each project and project owner relies on goals and measurements to determine the importance of each facet. Use this list as a workable starting point to determine if and how much each aspect should matter.