
The history of GitHub and Java's role in it

Ruby played a big role in the history of GitHub, but Java now plays a bigger part. At Oracle Code One, GitHub engineering manager Rafer Hazen provided plenty of reasons why.

You wouldn't think that the implementation of GitHub would be all that hard. After all, the Git tool itself was developed with distributed collaboration in mind, and it comes with a variety of built-in features that facilitate secure, network-based interactions over SSH and HTTP. Git was successfully implementing network-based, distributed version control years before vendors like GitLab, GitHub or Bitbucket entered the scene.

So how hard could it be for an organization such as GitHub to take the existing Git software, install it in the cloud and make it available as a SaaS offering? Surely the only tough technical decision ever made in the history of GitHub was what to name the company's mascot.

A brief history of GitHub

According to Rafer Hazen, GitHub's data pipelines engineering manager, it wasn't as easy as it seems. "GitHub started as a fairly conventional Rails app, but we quickly found ourselves needing to process background jobs to offload things that were too slow or too computationally expensive to handle during a request," Hazen said. To address this need, GitHub created the Redis-backed Ruby library known as Resque.

The next big technical challenge to arise in the history of GitHub was how to support webhooks, a mechanism that allows developers to respond to events that occur within a GitHub repository: when a subscribed event occurs, GitHub sends an HTTP POST request describing the event to a URL the developer has configured. For example, if the DevOps lead wants to kick off a Jenkins CI job every time code is merged into the master branch, a GitHub webhook is one way to do it.
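To make the webhook mechanism concrete, here is a minimal sketch of the receiving end, assuming a webhook configured with a shared secret and a JSON content type. It relies on the X-GitHub-Event and X-Hub-Signature-256 headers that GitHub's webhook documentation describes (the signature is "sha256=" followed by the hex HMAC-SHA256 of the raw body). The class name, the /webhook path and the triggerCiJob method are hypothetical; a real setup would call Jenkins where the placeholder prints a message, and would use a proper JSON parser instead of the regex shortcut. It uses only the JDK (Java 17 or later, for HexFormat).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class WebhookReceiver {

    // Simplification: a regex stands in for a real JSON parser.
    private static final Pattern MASTER_REF =
            Pattern.compile("\"ref\"\\s*:\\s*\"refs/heads/master\"");

    // Builds "sha256=" + hex(HMAC-SHA256(secret, body)), the X-Hub-Signature-256 format.
    static String sign(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return "sha256=" + HexFormat.of().formatHex(mac.doFinal(body));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a standard JCA algorithm, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Constant-time comparison so the check does not leak timing information.
    static boolean signatureMatches(String secret, byte[] body, String header) {
        if (header == null) {
            return false;
        }
        byte[] expected = sign(secret, body).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, header.getBytes(StandardCharsets.UTF_8));
    }

    // A merge into master arrives as a "push" event whose ref is refs/heads/master.
    static boolean isPushToMaster(String event, String payload) {
        return "push".equals(event) && MASTER_REF.matcher(payload).find();
    }

    // Hypothetical placeholder for triggering the Jenkins CI job.
    static void triggerCiJob() {
        System.out.println("Triggering CI job (placeholder)");
    }

    // Starts a JDK HTTP server that accepts deliveries on /webhook.
    static HttpServer startServer(int port, String secret) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/webhook", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                byte[] body = in.readAllBytes();
                String signature = exchange.getRequestHeaders().getFirst("X-Hub-Signature-256");
                String event = exchange.getRequestHeaders().getFirst("X-GitHub-Event");
                int status;
                if (!signatureMatches(secret, body, signature)) {
                    status = 401; // reject deliveries that were not signed with our secret
                } else {
                    if (isPushToMaster(event, new String(body, StandardCharsets.UTF_8))) {
                        triggerCiJob();
                    }
                    status = 204;
                }
                exchange.sendResponseHeaders(status, -1); // -1 means no response body
            } catch (Exception e) {
                exchange.sendResponseHeaders(500, -1);
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }
}
```

Verifying the signature before acting on the payload matters because the webhook URL is reachable by anyone who knows it; only GitHub and the receiver should know the shared secret.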

"To support webhooks, we needed a queuing layer that could scale. It had to scale across many hosts, and it had to support a highly parallel workload," Hazen said. That need led GitHub to Kestrel, an open source, distributed message queue that runs on the JVM and was originally developed at Twitter.

As GitHub grew, the need for data warehousing technology emerged. "We needed to be able to ingest high volume event streams, things such as page views or fetches to GitHub repositories," Hazen said. So GitHub built an analytics pipeline around Apache Kafka.
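Kafka stores each stream of events as an ordered, append-only log, and every consumer group tracks its own read position (offset) in that log. That is also what lets one log feed both a bulk analytics loader and queue-style workers, the dual role Hazen describes below. The toy, single-partition, in-memory Java sketch below illustrates only that idea; it is not Kafka's API or GitHub's design, and the class, method and group names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one Kafka-style log partition (hypothetical names, in memory only).
public class EventLog {
    // Records are only ever appended; reading never removes them.
    private final List<String> records = new ArrayList<>();
    // Each consumer group remembers how far it has read, independently of the others.
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    // Appends an event and returns its offset (its position in the log).
    public synchronized int append(String event) {
        records.add(event);
        return records.size() - 1;
    }

    // Returns up to maxRecords events this group has not seen yet and advances its offset.
    public synchronized List<String> poll(String group, int maxRecords) {
        int start = committedOffsets.getOrDefault(group, 0);
        int end = Math.min(records.size(), start + maxRecords);
        List<String> batch = new ArrayList<>(records.subList(start, end));
        committedOffsets.put(group, end);
        return batch;
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("page_view /example/repo-a");
        log.append("git_fetch example/repo-a");
        log.append("page_view /example/repo-b");

        // A warehouse loader reads everything in one large batch...
        System.out.println("warehouse: " + log.poll("warehouse-loader", 100));
        // ...while queue-style workers consume the same events in small batches.
        System.out.println("workers, batch 1: " + log.poll("webhook-workers", 2));
        System.out.println("workers, batch 2: " + log.poll("webhook-workers", 2));
    }
}
```

Because reading does not delete anything, adding a new consumer group costs nothing for the existing ones; a real system would also need partitioning, persistence and retention, which this sketch leaves out.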

From Rafer Hazen's Oracle Code One keynote presentation: GitHub loves Java.

This brings us to where the history of GitHub stands today. GitHub has cleverly solved a number of important problems using a variety of technologies, but going forward, these systems must be unified. How best to achieve that is not a question to be taken lightly.

"We wanted a single system that could power analytics and the data warehouse loads, but also act as a message bus or a back-end queuing system to power our applications," Hazen said. "But how do we build it, and what do we build it with?"

Given GitHub's long history with Ruby, not to mention its extensive in-house Ruby expertise, one might expect the company to build the system in Ruby. Or perhaps, given Microsoft's recent acquisition of GitHub, a .NET implementation might be more enticing than ever. But in the end, GitHub decided to build the project, Project Hydro, in the language Sun Microsystems created over twenty years ago. GitHub went with Java.

Citing Java's ability to handle parallel workloads, manage concurrency and achieve high levels of performance, Hazen asserted that Java was the right tool for building data infrastructure. The history of GitHub includes a variety of languages and technologies, but its future is heavily invested in Java. Most attendees at Oracle Code One likely already believed as much, but it's always reassuring to see a growing company in a highly competitive space reinforce what we already know about the programming language we continue to use every day.
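As a small illustration of the concurrency tooling Hazen refers to, the sketch below counts page views per repository on a fixed-size thread pool, using the standard java.util.concurrent classes ExecutorService, ConcurrentHashMap and LongAdder. It is a generic example, not code from GitHub; the class name and the idea of representing each page view as a repository name string are assumptions made for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ParallelPageViewCounter {

    // Counts page views per repository, processing each event as a task on a thread pool.
    public static Map<String, Long> count(List<String> pageViewRepos, int threads) {
        // ConcurrentHashMap plus LongAdder lets many threads update counts without explicit locks.
        Map<String, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String repo : pageViewRepos) {
                // computeIfAbsent is atomic on ConcurrentHashMap, so each repo gets exactly one adder.
                pool.submit(() -> counts.computeIfAbsent(repo, r -> new LongAdder()).increment());
            }
        } finally {
            // Stop accepting new tasks; tasks already submitted still run to completion.
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("page-view tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        // Copy into a sorted map so callers get a stable snapshot of the totals.
        Map<String, Long> result = new TreeMap<>();
        counts.forEach((repo, adder) -> result.put(repo, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        List<String> views = List.of("example/repo-a", "example/repo-b", "example/repo-a");
        System.out.println(count(views, 4)); // prints {example/repo-a=2, example/repo-b=1}
    }
}
```

LongAdder is chosen over a single AtomicLong per key because it spreads contended increments across internal cells, which tends to scale better when many threads hit the same popular repository.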

This was last published in October 2018
