How to create secure Java software: A talk with Black Duck's Tim Mackey
In TheServerSide’s ongoing coverage of developing secure Java software, I spoke recently with Tim Mackey, the IT evangelist for Black Duck Software. The conversation was interesting enough to pull some quotes into several articles that ran on TSS, including the following two on secure open source software and secure microservices and containers:
The hidden threat lurking in an otherwise secure software stack
Do microservices and containers simplify the task of software security?
Given that the interview has been mined for quotes across various articles on creating secure Java apps, I figured it would be worthwhile to provide a full transcript of the interview, as Mackey’s insights provide greater value when heard in their larger context, and not simply as small quotes within a smaller story.
An interview with Black Duck’s Tim Mackey
Cameron McKenzie: When it comes to open source software, open source governance, and creating secure Java applications, one of the organizations I like to talk to is Black Duck Software. I was fortunate the other day to have Tim Mackey on the phone. He is the IT evangelist at Black Duck Software, and the last time I spoke to him, he was heading to the Red Hat Summit back in May of 2017, I believe.
And so, the first thing I wanted to know was when you went to that Red Hat Summit, what were some of the things that you saw, some of the trends that you picked out, and some of the overall themes that you saw at the conference that you thought were interesting that were indicative of what’s gonna be trends for 2017 and 2018?
Tim Mackey: So the big hot topic was OpenShift and what people can do to increase their adoption of container technologies in a very agile manner in a production scenario. So it’s gone from being the, “Well, we’re not really certain how this is gonna play out, and we’re not certain exactly how we’re gonna tame this thing down, because it’s moving so quickly” to Red Hat actually having something that is nice, solid, scalable, and has enough of all the pieces that an enterprise would want to hear to say, “Let’s talk. You guys got somethin’ here. Let’s go figure it out.”
So that was the big buzz for me. And that validated something that we’ve been working on for, let’s call it the last six months, which was taking our core technologies and bringing them into a container environment, with the end objective being to dramatically shorten the time from security incident to remediation. So, for example, the Canada Revenue Agency got hit by the Apache Struts vulnerability about a month ago, and it took them the better part of a week to actually figure it out. If they had had a solution like what we offer, we would’ve already been able to say, “Look, this is the application stack that you’re impacted by. Go figure out what your remediation is based on governance requirements for that app.”
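For context, the Struts vulnerability Mackey references (CVE-2017-5638) was disclosed against specific version ranges: Struts 2.3.5 through 2.3.31 and 2.5 through 2.5.10. The kind of version-range check a remediation tool performs can be sketched in a few lines of Java; the class and method names here are invented for illustration, not any vendor’s API:

```java
import java.util.Arrays;

/** Illustrative sketch: is a given Apache Struts version inside the
 *  published vulnerable ranges for CVE-2017-5638? */
public class StrutsVulnCheck {

    static int[] parse(String v) {
        return Arrays.stream(v.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Compares dotted versions numerically, padding missing parts with zero. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    static boolean inRange(String v, String lo, String hi) {
        return compare(parse(v), parse(lo)) >= 0 && compare(parse(v), parse(hi)) <= 0;
    }

    /** True if the version falls in either published vulnerable range. */
    public static boolean isVulnerable(String version) {
        return inRange(version, "2.3.5", "2.3.31") || inRange(version, "2.5.0", "2.5.10");
    }

    public static void main(String[] args) {
        System.out.println("2.3.20 vulnerable? " + isVulnerable("2.3.20"));
        System.out.println("2.5.13 vulnerable? " + isVulnerable("2.5.13"));
    }
}
```

A real tool would of course resolve the version from the deployed artifact rather than take it as a string, which is exactly the inventory problem Mackey returns to later in the interview.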
Containers, microservices and secure Java software
Cameron McKenzie: Now, there’s no debating the fact that microservices and containers are the big trend nowadays, but doesn’t the whole idea of having hundreds of microservices deployed to hundreds of containers create a software management nightmare?
Tim Mackey: You’re looking at it through the lens of an existing data center operator, so it’s a completely valid question. And one of the beauties of the way that containerization has progressed is it’s taken the whole cloud templating model that the Amazons and the Microsoft Azures, and the OpenStacks and the CloudStacks of the world have really gotten behind, and said, “Okay, let’s have a golden master for a container, and that container itself should have just enough operating environment to actually be useful.”
So, if you’re talking about an Apache web server, maybe you’re talking about Apache Tomcat, so that’s got some Java in there. That’s got some user space components, but because Docker takes the user space and separates it from the kernel, I don’t have any of the kernel components in there, so I’ve got something that’s already smaller from an attack surface perspective. And because these containers can spin up very quickly, and by extension, spin down very quickly, if I need to patch them, then I can very easily build a rolling upgrade that is minimally disruptive.
Cameron McKenzie: So what does one of these rolling upgrades look like? Is it achieved through a Groovy script that knows where all the containers are running and what ports they’re running on, or is it done through an open source container orchestration tool?
Tim Mackey: So, if we were talking say two years ago, your scenario for the Groovy script would be the way that you would approach it. You would have created your own orchestration management paradigm. So, if I look at it from an OpenShift perspective, for example, when I decide that I’m going to create an application, that application’s gonna have a container image, and I’m going to decide how many replicas I’m going to have in there. So, basically, how big is my web farm, for example. And I’m going to be able to scale that up and scale that down. Part of the deployment configuration spec is what I should do in the event that something changes. So, if I update that container image, what should I do? One of the scenarios is that I can go and move a percentage of my existing containers over to that new image, maybe 50%, and do an A/B test. Maybe I can go and put a single container out there, and have it as a test so I’m not disturbing the system. That would be a canary scenario.
Once I’m happy with the results, I can go flip a switch, and then they all automatically roll, because one of the tenets of a microservice is that it’s stateless. And then I can scale it any way that I need to, and if it happens to fail, well, another one will pick up over there, as opposed to over here. And so that’s one of the values that you get when you’re trying to patch in a containerized microservice scenario. So, once you do get that security vulnerability disclosed and you identify where it is, you’re now in a position to test it pretty effectively. You go and vet it and put it in as a “Golden Master”, and you’re able to roll that out in production pretty quickly.
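The A/B and canary scenarios Mackey describes are normally handled by the orchestrator itself, but the underlying routing idea is easy to sketch in Java. The class name and the percentage knob below are illustrative only, not part of any OpenShift API:

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of canary routing: send a configurable share of
 *  traffic to the new container image ("v2"), the rest to the old ("v1"). */
public class CanaryRouter {

    private final int canaryPercent; // share of requests routed to the new image

    public CanaryRouter(int canaryPercent) {
        if (canaryPercent < 0 || canaryPercent > 100)
            throw new IllegalArgumentException("percent must be 0..100");
        this.canaryPercent = canaryPercent;
    }

    /** Returns "v2" for roughly canaryPercent% of calls, "v1" otherwise. */
    public String route() {
        return ThreadLocalRandom.current().nextInt(100) < canaryPercent ? "v2" : "v1";
    }

    public static void main(String[] args) {
        CanaryRouter router = new CanaryRouter(50); // 50/50 A/B test
        for (int i = 0; i < 5; i++) System.out.println(router.route());
    }
}
```

Flipping the switch Mackey mentions amounts to moving the percentage to 100, at which point every stateless replica can be rolled to the new image.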
The hybrid cloud and software security
Cameron McKenzie: Now, with tools like OpenStack, organizations can bring the public cloud home. They can create hybrid clouds. They can install cloud computing software in their local data center. Does that reduce the security risk, or does it create new, unforeseen security risks?
Tim Mackey: So, what it does is it creates a documentation and communications, I’m gonna use the word “burden.” It may not be the right word, but it’s obligation. So if, for example, I’m going and deploying everything in AWS, I have trusted that Amazon is doing exactly the right things. I’m trusting my engineers to have gone and figured out…and I apologize for the background noise…that they understand how the networking components are configured, that the VPNs are set up correctly, the subnets are all in the right place, snapshotting is happening, all of those kinds of things are occurring the way that they should, and that Amazon is managing that infrastructure appropriately.
Once I move to OpenStack and bring it in-house, now I have to take on those responsibilities. So, effectively, when I offshored it, as it were, existing data center operations, into an AWS environment, by going to OpenStack, I’m bringing it back in. I’m now regaining control, and ownership, and responsibility for the entire infrastructure, as opposed to just an app running on a piece of virtualized infrastructure.
Node.js, npm and open source security
Tim Mackey: So, Node is a funny environment. I liken it to the Wild West with Wyatt Earp. There’s a lot of change that’s happening. You’re not necessarily completely aware of where the packages are coming from. A package could change multiple times per day. You have dependencies in place, and in the Node world, you’ve got npm to kind of smooth the world over a little bit, and so npm’s kind of like Wyatt Earp. He’s gonna impose a certain set of rules on the environment, but he’s not aware of everything that’s going on. And so the real challenge from a security perspective is, because things are moving rapidly, a component, like say a data grid, could be forked many, many times, and when a security issue is raised against some point in the intermediate stream of forks, it becomes a lot more difficult to actually recognize where the vulnerable code exists and where it doesn’t. And so that’s where a solution like what we have in our ability to monitor containers really starts to shine, because we’ve been designed around how open source behaves.
Cameron McKenzie: So how does Black Duck Software, working with open source software, and as you say, your knowledge of how open source software behaves, give you guys a competitive advantage in the industry?
Tim Mackey: So, the poster child for me is OpenSSL. So, if you look at Black Duck’s history, we’ve been talking about license compliance since inception. And, along comes Heartbleed and it attacks the world. When I do talks, I regularly say, “I assert that everyone in this room knows exactly what they were working on and exactly how painful it was when Heartbleed hit, because that was the wakeup call for the industry.”
One of the things that people don’t realize about the way that security in open source works is that if you take OpenSSL, for example, Heartbleed was associated with one commit ID to another commit ID. There was a range that was involved in it. And one of the aspects of open source is that I can fork. And if you look at GitHub today, and the OpenSSL project, you’ll see something like 1,400, 1,500, maybe 1,800 different forks of OpenSSL have occurred. So that’s a scenario where there are at least that many derivatives of an OpenSSL implementation out there, but the security vulnerability will only ever be reported against that mainline version.
So, if you’ve taken OpenSSL, you’ve forked it, maybe you’ve removed a cipher suite, added a cipher suite, embedded it into your set of dependencies that then move on to somebody else who’s maybe modified it ever so slightly. And the process repeats, and you end up with it embedded in a different application stack. Maybe it’s part of the base image for your container. You might not be aware that you are in a vulnerable state unless you understand how open source actually works, and unless you’re monitoring not just for a vulnerability against the mainline component, but against the derivative branches as well.
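The fork problem Mackey describes is why derivative tracking tends to lean on content fingerprints rather than version strings: a forked file keeps its content even when its version label disappears. A minimal sketch, assuming a whitespace-normalizing SHA-256 hash stands in for whatever proprietary matching a real scanner uses:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: fingerprint source content so an unmodified file can be
 *  recognized inside a fork, regardless of how the project was renamed. */
public class FingerprintMatcher {

    /** SHA-256 fingerprint of source text, insensitive to whitespace changes. */
    public static String fingerprint(String source) {
        try {
            String normalized = source.replaceAll("\\s+", " ").trim();
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every JVM ships SHA-256
        }
    }

    public static void main(String[] args) {
        // A merely reformatted fork still matches the original's fingerprint.
        String original = "int add(int a, int b) { return a + b; }";
        String forked   = "int add(int a, int b) {\n    return a + b;\n}";
        System.out.println(fingerprint(original).equals(fingerprint(forked)));
    }
}
```

A file whose code was actually changed would hash differently, which is precisely why matching at scale needs the kind of knowledge base of fork history that Mackey describes next.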
Black Duck Software’s Hub
Cameron McKenzie: Now you offer a solution to help organizations plug open source security, governance and policy holes. What exactly is Hub, and how does it help to create secure Java software?
Tim Mackey: So, the application itself is called Hub, and the Hub application is…it has three main components to it. The first component is our knowledge base. The knowledge base is something that we’ve been creating over our entire history, so call it a dozen or so years. And it was designed specifically around the types of behaviors that someone will exhibit in an open source world. It was created at a point in time when corporate intellectual property attorneys were really quite uncertain about what GPL meant for them. And so, we had to figure out a way to track a given license, wherever it might be in a code base, and however that may have entered. And so that was one of the design tenets for that knowledge base.
So, the knowledge base today contains, for practical purposes, the entire history of modern open source computing. It’s weighing in at a little over 500 terabytes in size, and has 9,000 different data sources coming into it. So it’s a very, very rich environment. And to give you an idea of the types of data sources, the entirety of GitHub counts as one data source. The entirety of the National Vulnerability Database from NIST counts as another data source. The entirety of Bugtraq counts as another data source. Red Hat Errata is another one, and usually if I sit down and talk to people, I can come up with 20 or 25 different obvious data sources that go in, and we’re adding more on a continual basis.
That is a piece of componentry that few people want to be in the business of hosting, so we host that. That lives in our datacenter and we manage that, and we update that in real time. So as activity happens in the world of open source, it’s being updated within the knowledge base directly. The Hub application itself is something that is installed on premises [inaudible 00:12:39] at the customer site, and that could be in an AWS environment. That could be in Azure. That could be in their own colo, in OpenStack for that matter. It really doesn’t matter. That’s where the UI is, that’s where the API endpoint is.
And the third component that’s in the required set is our scan client. The scan client itself ends up being embedded in some form of workflow. So, that might be in a Jenkins pipeline for continuous integration. [inaudible 00:13:08] Jenkins put it in Bamboo, for example. If you want to have information appear directly in the developer’s console, that could be Microsoft TFS, that could be inside of Eclipse. We can move right as well, and that’s what we’ve done on the Red Hat side. We have Docker in the middle, so we’ve got a pretty broad spectrum of activity that we can account for. It’s not necessarily just a case of monitoring, so in the Jenkins example, we’re injecting ourselves into the build cycle. We’ll see the source code that is part of that build. So we’ll be able to scan that, we’ll be able to fingerprint it, and the fingerprints go up to the Hub server. The Hub server matches against the knowledge base, and comes back with a set of answers around things related to license compliance, security compliance, and operational risk. And we don’t need to scan it ever again, because we already know what’s in there, unless of course something changes, in which case you can do another scan.
If the outside world changes, so a new security disclosure happens, or new versions of components are released, we’re able to generate a set of notifications, the most common of which would be to create a JIRA ticket, so there’s an active workflow through which the developers, the operations teams, and the Agile or Scrum product owner are all aware of the state of security for their entire application as they move along. So that’s kind of us in a slightly long-winded nutshell.
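That notification step can be sketched as a simple match of a new disclosure against a stored bill of materials. The component names, versions, and ticket format below are illustrative inventions, not Black Duck’s actual schema:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: when a new disclosure arrives, compare its affected
 *  component list against the application's bill of materials and emit one
 *  ticket summary per match. */
public class DisclosureNotifier {

    /** Bill of materials: component name -> version in use (example data). */
    static final Map<String, String> BOM = Map.of(
            "openssl", "1.0.1e",
            "struts2", "2.3.20",
            "log4j", "1.2.17");

    /** Returns one ticket summary per BOM entry named in the new disclosure. */
    public static List<String> ticketsFor(String cveId, List<String> affectedComponents) {
        List<String> tickets = new ArrayList<>();
        for (String component : affectedComponents) {
            if (BOM.containsKey(component)) {
                tickets.add(cveId + ": " + component + " " + BOM.get(component)
                        + " requires review");
            }
        }
        return tickets;
    }

    public static void main(String[] args) {
        // Heartbleed arrives; only the components actually in the BOM match.
        ticketsFor("CVE-2014-0160", List.of("openssl")).forEach(System.out::println);
    }
}
```

A production workflow would additionally check version ranges and derivative fingerprints before opening a ticket, but the core idea is this BOM lookup: no inventory, no targeted remediation.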
Cameron McKenzie: So as we wrap things up here, I’ll give you the last word. What’s the message that you’d like to push out to the Agile enterprise computing community?
Tim Mackey: The big message that I would like to give to the audience is that today open source is kind of the way the world works, but having a complete awareness of the open source components that you’re dependent upon requires a little bit of work. And that’s where a solution like Black Duck’s starts to really shine. Once you actually have that bill of materials, that inventory as it were, you’re in a much better position to assess whether or not the risk to the organization is acceptable, whether it’s increasing or decreasing in a way that meets the business objectives, and, when external events happen, like new security disclosures, you’re in much better shape to identify where you should start remediation, so that it’s not just this, “Oh, great. We’ve just had a new vulnerability. Let’s start the fire drill again and react accordingly.” So you can actually start building this into process.
Cameron McKenzie: So to find out more about Black Duck and Hub, head over to blackducksoftware.com, and if you want to follow Tim Mackey on Twitter, his handle is @TimInTech. And you can follow me too: @cameronmcnz.