When the application stops, serialize the contents of stageMetricsData to a file and/or print them to stdout.
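A minimal sketch of the application-end step, under stated assumptions: `StageRow` is a hypothetical record with only three fields, `FlushOnAppEnd.flush` is a hypothetical helper, and the hand-written JSON encoder stands in for whatever serializer the listener actually uses. The sketch writes the serialized data to a local file and optionally echoes it to stdout.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical flattened record; the real record carries many more fields.
final case class StageRow(stageId: Int, stageDurationMs: Long, executorCpuTimeMs: Long)

object FlushOnAppEnd {
  // Minimal hand-written JSON encoding, standing in for a real JSON library.
  def toJson(rows: Seq[StageRow]): String =
    rows.map { r =>
      s"""{"stageId":${r.stageId},"stageDuration":${r.stageDurationMs},"executorCpuTime":${r.executorCpuTimeMs}}"""
    }.mkString("[", ",", "]")

  // What an application-end callback might do: write the file, then optionally print to stdout.
  def flush(rows: Seq[StageRow], outputFile: Path, printToStdout: Boolean): String = {
    val json = toJson(rows)
    Files.write(outputFile, json.getBytes(StandardCharsets.UTF_8))
    if (printToStdout) println(json)
    json
  }
}
```

In a real Spark listener this logic would live in the `onApplicationEnd` callback, so the file is written once, after all stages have been recorded.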
This method fires at the end of each stage and collects that stage's metrics, flattened, into the stageMetricsData ListBuffer. Note: all times are in ms; CPU time and shuffle write time are originally reported in nanoseconds, so the code divides them by 1e6.
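The unit handling described above can be sketched in plain Scala, without a Spark dependency. `RawStageMetrics`, `StageRecord` and `StageCollector` are hypothetical names chosen for illustration, and the conversion truncates toward zero, which is an assumption rather than confirmed library behavior. In a real listener the same work would happen in the `onStageCompleted` callback.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical raw values: CPU time and shuffle write time in nanoseconds, run time already in ms.
final case class RawStageMetrics(
  stageId: Int,
  executorRunTimeMs: Long,
  executorCpuTimeNs: Long,
  shuffleWriteTimeNs: Long
)

// Hypothetical flattened record: every time field is in milliseconds.
final case class StageRecord(
  stageId: Int,
  executorRunTimeMs: Long,
  executorCpuTimeMs: Long,
  shuffleWriteTimeMs: Long
)

object StageCollector {
  // Accumulates one flattened record per completed stage.
  val stageMetricsData: ListBuffer[StageRecord] = ListBuffer.empty[StageRecord]

  // Nanoseconds to milliseconds: divide by 1e6, then truncate toward zero.
  def nsToMs(ns: Long): Long = (ns / 1e6).toLong

  // Called once per completed stage (the role onStageCompleted plays in a Spark listener).
  def onStageEnd(raw: RawStageMetrics): Unit =
    stageMetricsData += StageRecord(
      raw.stageId,
      raw.executorRunTimeMs,
      nsToMs(raw.executorCpuTimeNs),
      nsToMs(raw.shuffleWriteTimeNs)
    )
}
```

Keeping every time field in one unit in the flattened record means downstream aggregation never needs to know which metrics Spark originally reported in nanoseconds.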
FlightRecorderStageMetrics: uses the Spark listeners defined in stagemetrics.scala to record task metrics aggregated at the stage level, without changing the application code. The resulting data can be saved to a file and/or printed to stdout.
Usage: add the following to the spark-submit (or SparkSession) configuration: --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
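The same listener can also be registered programmatically when building a SparkSession. This fragment assumes the sparkMeasure jar is already on the driver classpath; the application name is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

// Equivalent to: --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
val spark = SparkSession.builder()
  .appName("flightRecorderExample") // placeholder application name
  .config("spark.extraListeners", "ch.cern.sparkmeasure.FlightRecorderStageMetrics")
  .getOrCreate()
```

Because `spark.extraListeners` is read when the SparkContext starts, the setting must be in place before `getOrCreate()` creates the context; setting it on an already running session has no effect.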
Additional configuration parameters:
--conf spark.sparkmeasure.outputFormat=<format>, valid values: java, json, json_to_hadoop; default: "json".
  Note: the json and java serialization formats write to the driver's local filesystem; json_to_hadoop writes JSON-serialized metrics to HDFS or to a Hadoop-compliant filesystem, such as s3a.
--conf spark.sparkmeasure.outputFilename=<output file>, default: "/tmp/stageMetrics_flightRecorder"
--conf spark.sparkmeasure.printToStdout=<true|false>, default: false. Set to true to print JSON-serialized metrics to stdout.
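A sketch of how the three optional parameters and their defaults might be resolved. A plain `Map[String, String]` stands in for the Spark configuration so the example runs without Spark, and `RecorderSettings` with its `fromConf` method is a hypothetical name; rejecting unknown formats with `require` is an illustrative choice, not confirmed library behavior.

```scala
// Hypothetical holder for the optional flight-recorder parameters.
final case class RecorderSettings(outputFormat: String, outputFilename: String, printToStdout: Boolean)

object RecorderSettings {
  val ValidFormats: Set[String] = Set("java", "json", "json_to_hadoop")

  // `conf` stands in for the Spark configuration; missing keys fall back to the documented defaults.
  def fromConf(conf: Map[String, String]): RecorderSettings = {
    val format = conf.getOrElse("spark.sparkmeasure.outputFormat", "json")
    require(ValidFormats.contains(format), s"Unsupported outputFormat: $format")
    RecorderSettings(
      outputFormat = format,
      outputFilename = conf.getOrElse("spark.sparkmeasure.outputFilename", "/tmp/stageMetrics_flightRecorder"),
      printToStdout = conf.getOrElse("spark.sparkmeasure.printToStdout", "false").toBoolean
    )
  }
}
```

With an empty configuration this yields the defaults listed above: JSON output to /tmp/stageMetrics_flightRecorder, with nothing printed to stdout.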