
How to perform streaming transformation operations in Apache Spark

Through this Apache Spark transformation operations tutorial, you will learn about the various Spark Streaming transformation operations, with examples, that Spark professionals use when working with Spark Streaming concepts. You will learn streaming operations such as the Spark map, flatMap, filter, count, reduceByKey, countByValue and updateStateByKey operations, with examples that will help you in your Spark jobs.

Introduction to Apache Spark Streaming Transformation Operations

Before we start learning the various streaming operations in Spark, let us first review the basic Spark Streaming concepts.

Below are the most common streaming transformation functions used in the Spark industry:

a. map()

The map function in Spark passes each element of the source DStream through a function and returns a new DStream of the results.

Spark map() example

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("MapOpTest")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val ans = lines.map { line => ("hello", line) }    // pair "hello" with each incoming line
ans.print()
ssc.start()    // Start the computation
ssc.awaitTermination()    // Wait for termination
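
If, for example, the line "streaming" arrives on port 9999, the batch output would contain the pair (hello,streaming).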

b. flatMap()

The flatMap function in Spark is similar to the map function, but with flatMap each input item can be mapped to zero or more output items. This is what creates the difference between the map and flatMap operations in Spark.

Spark FlatMap Example

val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))    // split each line into words on spaces
val pairs = words.map(word => (word, 1))    // pair each word with a count of 1
val wordCounts = pairs.reduceByKey(_ + _)    // sum the counts for each word
wordCounts.print()
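
This is the classic streaming word count: flatMap emits one record per word, so an input line such as "hello hello spark" becomes three records, which map and reduceByKey then turn into (hello,2) and (spark,1) for that batch. With map instead of flatMap, each line would have produced a single Array[String] record rather than individual words.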

c. filter()

The filter function in Apache Spark selects only those records of the source DStream on which func returns true, and returns a new DStream of those records.

Spark Filter function example

val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val output = words.filter { word => word.startsWith("s") }    // keep only the words that start with the letter "s"
output.print()
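
For example, if the line "spark makes streaming simple" arrives, only the words "spark", "streaming" and "simple" pass the filter and are printed.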

d. reduceByKey(func, [numTasks])

When called on a DStream of (K, V) pairs, the reduceByKey function in Spark returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function.

Spark reduceByKey example

val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))    // map each word to a (word, 1) pair
val wordCounts = pairs.reduceByKey(_ + _)    // reduce by key: sum the counts for each word
wordCounts.print()
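
The countByValue and updateStateByKey operations mentioned in the introduction follow the same pattern. Below is a minimal sketch of each, reusing the words and pairs DStreams from the example above; the checkpoint directory path is an assumption for illustration, and updateStateByKey requires checkpointing to hold its running state across batches:

// countByValue: count the occurrences of each distinct word within a batch
val wordFreqs = words.countByValue()    // yields a DStream[(String, Long)]
wordFreqs.print()

// updateStateByKey: maintain a running count per word across batches
ssc.checkpoint("/tmp/spark-checkpoint")    // assumed local path; required for stateful operations
val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) =>
  Some(runningCount.getOrElse(0) + newValues.sum)    // add this batch's counts to the running total
val runningCounts = pairs.updateStateByKey[Int](updateFunc)
runningCounts.print()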

Read the complete article>>
