You wouldn't think that the implementation of GitHub would be all that hard. After all, the Git tool itself was developed with distributed collaboration in mind, and it comes with a variety of built-in features that facilitate secure, network-based interactions over SSH and HTTP. Git was successfully implementing network-based, distributed version control years before vendors like GitLab, GitHub or Bitbucket entered the scene.
So how hard could it be for an organization such as GitHub to take the existing Git software, install it in the cloud and make it available as a SaaS offering? Surely the only tough technical decision ever made in the history of GitHub was what to name the company's mascot.
A brief history of GitHub
According to Rafer Hazen, GitHub's data pipelines engineering manager, it wasn't as easy as it seems. "GitHub started as a fairly conventional Rails app, but we quickly found ourselves needing to process background jobs to offload things that were too slow or too computationally expensive to handle during a request," Hazen said. To address this need, GitHub created the Redis-backed Ruby library known as Resque.
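The offloading pattern Hazen describes can be illustrated with a minimal sketch. Resque itself is a Ruby library backed by Redis; the Java below is only an in-process stand-in built on a thread pool, and the class name, method names and repository string are all hypothetical. It shows the request path handing slow work to workers and returning immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an in-process stand-in for a background job queue.
// Resque is a Redis-backed Ruby library; nothing here mirrors its API.
final class BackgroundJobSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Called on the request path: queue the slow work and return at once.
    Future<String> enqueue(String repository) {
        return workers.submit(() -> slowJob(repository));
    }

    // Stand-in for work that is too slow to run during a web request.
    static String slowJob(String repository) throws InterruptedException {
        Thread.sleep(100); // simulated expensive computation
        return "archive built for " + repository;
    }

    // Stop accepting jobs and wait briefly for queued ones to finish.
    void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BackgroundJobSketch queue = new BackgroundJobSketch();
        Future<String> pending = queue.enqueue("example/repo");
        System.out.println("request handled; job runs in the background");
        System.out.println(pending.get()); // blocks until a worker finishes
        queue.shutdown();
    }
}
```

A Redis-backed queue like Resque goes further than this sketch: jobs are stored outside the web process, so they survive restarts and can be picked up by workers on other machines.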
The next big technical challenge to arise in the history of GitHub was how to support webhooks, a mechanism that allows developers to respond to events that occur within a GitHub repository. For example, if the DevOps lead wants to kick off a Jenkins CI job every time code is merged into the master branch, a GitHub webhook is the way to do it.
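From the receiving side, a webhook is simply an HTTP endpoint that GitHub calls when an event occurs. The sketch below is a hypothetical receiver, not GitHub's or Jenkins' implementation: the class name, the /webhook path, port 8080 and the response strings are assumptions. It relies on the X-GitHub-Event header and the "ref" field that GitHub includes in push deliveries, and it uses the JDK's built-in HTTP server.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical webhook receiver; names, path and port are illustrative.
final class WebhookReceiverSketch {
    // Naive extraction of the first "ref" field in the body. A real
    // receiver should use a JSON parser and verify the payload signature.
    private static final Pattern REF =
            Pattern.compile("\"ref\"\\s*:\\s*\"([^\"]+)\"");

    // Decide whether a delivery should start a CI job:
    // only push events whose ref is the master branch qualify.
    static boolean shouldTriggerBuild(String event, String body) {
        if (!"push".equals(event)) {
            return false;
        }
        Matcher m = REF.matcher(body);
        return m.find() && "refs/heads/master".equals(m.group(1));
    }

    // Start an HTTP server that answers webhook deliveries on /webhook.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/webhook", exchange -> {
            String event = exchange.getRequestHeaders().getFirst("X-GitHub-Event");
            String body = new String(
                    exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            boolean build = shouldTriggerBuild(event, body);
            // Placeholder for the real action, such as calling the CI server.
            byte[] reply = (build ? "build queued" : "ignored")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            exchange.getResponseBody().write(reply);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // keeps running until the process is stopped
    }
}
```

Keeping the decision in a small pure method such as shouldTriggerBuild makes the rule easy to test without opening a network socket.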
"To support webhooks, we needed a queuing layer that could scale. It had to scale across many hosts, and it had to support a highly parallel workload," Hazen said. This need begot the creation of the event-driven, open source, asynchronous I/O server known as Kestrel.
As GitHub grew, the need for data warehousing technology emerged. "We needed to be able to ingest high volume event streams, things such as page views or fetches to GitHub repositories," Hazen said. So GitHub built an analytics pipeline around Apache Kafka.
Which brings us to what is happening today in the history of GitHub. GitHub has cleverly solved a number of important problems using a variety of different technologies, but going forward, these systems must be unified. How best to achieve this is a question that can't be taken lightly.
"We wanted a single system that could power analytics and the data warehouse loads, but also act as a message bus or a back-end queuing system to power our applications," Hazen said. "But how do we build it, and what do we build it with?"
Given GitHub's long history with Ruby, not to mention its extensive in-house expertise, the first thought would be that the company would build the system in Ruby. Or perhaps, given its recent acquisition by Microsoft, a .NET implementation might be more enticing than ever. But in the end, GitHub decided to develop the project using the language Sun Microsystems created over twenty years ago. GitHub went with Java.
Citing Java's ability to handle parallel workloads and manage concurrency, along with its high levels of performance, Hazen asserted that the language was the right tool for building data infrastructure. The history of GitHub includes a variety of different languages and technologies, but the future is heavily invested in Java. It's a truth that all of the attendees at Oracle Code One likely already knew, but it's always reassuring to see growing companies in highly competitive spaces reinforcing what we already know about the programming language we continue to use every day.