Spark Measure package: a proof-of-concept tool for measuring Spark performance metrics.
It uses Spark listeners as its data source and collects the metrics in a ListBuffer.
The ListBuffer is then converted into a DataFrame for analysis.
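The listener-and-buffer approach described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: StageRecord, BufferingStageListener and ListenerSketch are hypothetical names, and only a few of the metrics exposed by Spark are captured. It relies on the standard SparkListener API and assumes Spark is on the classpath.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable.ListBuffer

// Hypothetical record type: one row per completed stage.
case class StageRecord(stageId: Int, name: String, numTasks: Int,
                       durationMs: Long, executorRunTimeMs: Long)

// Hypothetical listener: appends one record to a ListBuffer when a stage completes.
class BufferingStageListener extends SparkListener {
  private val records = ListBuffer.empty[StageRecord]

  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    // Both timestamps are Options; fall back to 0 if either is missing.
    val durationMs =
      (for (s <- info.submissionTime; e <- info.completionTime) yield e - s).getOrElse(0L)
    // Guard against a missing metrics object.
    val runTimeMs = Option(info.taskMetrics).map(_.executorRunTime).getOrElse(0L)
    records.synchronized {
      records += StageRecord(info.stageId, info.name, info.numTasks, durationMs, runTimeMs)
    }
  }

  // Immutable copy of what has been collected so far.
  def snapshot(): List[StageRecord] = records.synchronized(records.toList)
}

object ListenerSketch {
  // Runs the workload with the listener attached and returns the buffer as a DataFrame.
  def measure(spark: SparkSession)(workload: => Unit): DataFrame = {
    import spark.implicits._
    val listener = new BufferingStageListener
    spark.sparkContext.addSparkListener(listener)
    try {
      workload
      // Listener events are delivered asynchronously; this crude pause gives
      // the last events time to arrive. It is a simplification for the sketch.
      Thread.sleep(1000)
    } finally {
      spark.sparkContext.removeSparkListener(listener)
    }
    listener.snapshot().toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("listener-sketch").getOrCreate()
    measure(spark) { spark.range(1000000).selectExpr("sum(id)").show() }.show()
    spark.stop()
  }
}
```

Once the buffer is a DataFrame, the collected metrics can be queried with the usual DataFrame or SQL operations.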
Stage Metrics: collects and aggregates metrics at the end of each stage
Task Metrics: collects data at task granularity
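The difference between the two granularities can be illustrated without Spark at all. The sketch below is a toy under stated assumptions: TaskRecord, StageSummary and their fields are hypothetical and do not reflect the package's real schema. It keeps task-level rows in a ListBuffer and rolls them up into one summary per stage.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical task-level record (field names are illustrative only).
case class TaskRecord(stageId: Int, taskId: Long, durationMs: Long, recordsRead: Long)

// Hypothetical per-stage roll-up of the task records.
case class StageSummary(stageId: Int, numTasks: Int, totalDurationMs: Long, recordsRead: Long)

object Granularity {
  // Group task rows by stage and aggregate them; results are sorted by stage id.
  def summarize(tasks: Seq[TaskRecord]): Seq[StageSummary] =
    tasks.groupBy(_.stageId).toSeq.sortBy(_._1).map { case (stageId, ts) =>
      StageSummary(stageId, ts.size, ts.map(_.durationMs).sum, ts.map(_.recordsRead).sum)
    }

  def main(args: Array[String]): Unit = {
    val buffer = ListBuffer(
      TaskRecord(0, 0L, 120L, 1000L),
      TaskRecord(0, 1L, 480L, 1000L), // a slow task, visible only at task granularity
      TaskRecord(1, 2L, 60L, 2L))
    summarize(buffer.toList).foreach(println)
  }
}
```

The stage-level view is compact, while the task-level rows preserve details such as skew between tasks of the same stage, at the cost of collecting more data.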
Usage modes:
Interactive mode: measure workloads from the REPL
Flight recorder mode: records data and saves it for later processing
Supported languages:
The tool is written in Scala and can be used from both Scala and Python.
Example usage for stage metrics:
val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
stageMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
Example usage for task metrics:
val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show()
val df = taskMetrics.createTaskMetricsDF()
To use flight recorder mode, add the following option when launching Spark:
--conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
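The same setting can also be supplied programmatically when the session is created. The following is a sketch under assumptions: it presumes the package's jar is already on the driver classpath, and the application name and query are placeholders. It relies only on the standard spark.extraListeners property, which Spark reads when the SparkContext is initialized.

```scala
import org.apache.spark.sql.SparkSession

object FlightRecorderSketch {
  def main(args: Array[String]): Unit = {
    // The listener class is instantiated and registered by Spark at context startup,
    // so the property must be set before the first SparkContext is created.
    val spark = SparkSession.builder()
      .appName("flight-recorder-sketch") // placeholder name
      .config("spark.extraListeners", "ch.cern.sparkmeasure.FlightRecorderStageMetrics")
      .getOrCreate()

    // Any workload run from here on is observed by the listener.
    spark.sql("select count(*) from range(1000) cross join range(1000)").show()

    spark.stop()
  }
}
```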
Created by [email protected], March 2017