Introduction

Apache Flink is an open source streaming dataflow engine that provides communication, fault tolerance, and data distribution for distributed computations over data streams. It is a top-level Apache project. Flink is a scalable data analytics framework that is fully compatible with Hadoop, and it can run both stream processing and batch processing workloads.

Apache Flink started as a project called Stratosphere. In 2008, Volker Markl formed the idea for Stratosphere and attracted other co-principal investigators from HU Berlin, TU Berlin, and the Hasso Plattner Institute Potsdam. They worked jointly on a shared vision and put considerable effort into systems building and open source deployment. Several decisive steps followed to make the project popular in the commercial, research, and open source communities. The project carried the name Stratosphere until it applied for Apache incubation in April 2014, when the name Flink was finalized. Flink is a German word meaning swift or agile.

Why Flink?

The key vision for Apache Flink is to reduce the complexity faced by other distributed data-driven engines. It does so by integrating query optimization, concepts from database systems, and efficient parallel in-memory and out-of-core algorithms with the MapReduce framework. Because Apache Flink is built on a streaming model, it also runs iterative computations on its streaming architecture, and the concept of iterative algorithms is tightly bound into the Flink query optimizer. Apache Flink's pipelined architecture allows it to process streaming data with lower latency than micro-batch architectures such as Spark.
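
The latency difference between pipelined and micro-batch processing can be illustrated with a small sketch. This is plain Python, not Flink or Spark code; the names `source`, `pipelined`, `micro_batched`, and `records_read_before_first_result` are hypothetical helpers introduced only for this illustration, and the "latency" here is simply how many input records must arrive before the first result is emitted.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def source(n: int) -> Iterator[int]:
    """Hypothetical unbounded-style source, truncated to n records."""
    for i in range(n):
        yield i


def pipelined(records: Iterable[int],
              fn: Callable[[int], int]) -> Iterator[Tuple[int, int]]:
    """Emit each result as soon as its record arrives.

    Yields (records_read_so_far, result).
    """
    for read, record in enumerate(records, start=1):
        yield read, fn(record)


def micro_batched(records: Iterable[int],
                  fn: Callable[[int], int],
                  batch_size: int) -> Iterator[Tuple[int, int]]:
    """Buffer records into batches and emit results only per full batch.

    Yields (records_read_so_far, result).
    """
    batch: List[int] = []
    read = 0
    for record in records:
        batch.append(record)
        read += 1
        if len(batch) == batch_size:
            for item in batch:
                yield read, fn(item)
            batch = []
    # Flush the final, possibly partial batch.
    for item in batch:
        yield read, fn(item)


def records_read_before_first_result(results: Iterator[Tuple[int, int]]) -> int:
    """Return how many records had been read when the first result appeared."""
    read, _ = next(results)
    return read
```

In this sketch the pipelined version produces its first result after reading a single record, while the micro-batched version has to wait until a whole batch has been buffered. Both produce the same results in the same order; only the time at which each result becomes available differs.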
