package spark
Type Members
- class DatasetFromStream extends AnyRef
- class DatasetMapper[T] extends Serializable
A generic class for mapping and transforming datasets from text-based input to structured data of type T using Apache Spark.
This class enables parsing and conversion of raw textual data into Spark Datasets, applying a user-defined mapping function with error handling for missing or malformed data.
- T: the type of the elements in the resulting Spark Dataset.
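The idea of "a user-defined mapping function with error handling" can be illustrated without Spark. The sketch below is not the library's DatasetMapper: LineMapper, mapLine and mapAll are hypothetical names, and it works on plain Scala collections purely to show the fallback behaviour. In Spark, a function shaped like mapLine would typically be passed to Dataset.map.

```scala
import scala.util.Try

// Hypothetical helper, NOT the library's DatasetMapper: it only illustrates
// "user-defined mapping function plus a default for bad input".
final class LineMapper[T](parse: String => T, default: T) extends Serializable {
  // Apply the user function; any non-fatal exception yields the default.
  def mapLine(line: String): T = Try(parse(line)).getOrElse(default)

  // Map every line, so one malformed row never aborts the whole run.
  def mapAll(lines: Seq[String]): Seq[T] = lines.map(mapLine)
}

object LineMapperDemo {
  def main(args: Array[String]): Unit = {
    // -1 plays the role of the "missing" value for unparseable input.
    val toInt = new LineMapper[Int](_.trim.toInt, -1)
    println(toInt.mapAll(Seq("1", " 2 ", "oops"))) // prints List(1, 2, -1)
  }
}
```

Substituting a default keeps the output the same length as the input, at the cost of hiding which rows failed; returning the Try itself would preserve that information.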
- class DatasetParser[T] extends AnyRef
The DatasetParser class is a utility designed to parse structured data and create a Spark Dataset of a specific type T. It relies on an implicit Spark session and an implicit TableParser for transforming raw data into a Table representation before converting it into a Spark Dataset. This class provides a safe and reusable way to load and parse datasets from resources such as URLs.
- T: the type of the elements in the resulting Dataset. It should have an implicit Encoder available for serialization.
Usage example:
    import org.apache.spark.sql.SparkSession
    import scala.util.{Failure, Success}

    implicit val spark: SparkSession = SparkSession.builder.appName("DatasetParser").master("local[*]").getOrCreate()
    import spark.implicits._

    // TableParser instance for the specific data type
    implicit val movieTableParser: StringTableParser[Table[Movie]] = implicitly[MovieTableParser]

    val parser = new DatasetParser[Movie]()
    parser.createDataset[DatasetParser[_]]("movie_metadata.csv") match {
      case Success(ds) => ds.show(10)
      case Failure(error) => throw error
    }
Value Members
- object DatasetMapper extends App with Serializable
The DatasetMapper object is the entry point for mapping and processing datasets using Spark. It uses the DatasetMapper generic class to parse and transform text-based data into structured datasets. This object is specifically tailored to process data of type Movie using the provided parser and configuration.
It sets up the necessary implicit Spark session and encoders, and executes the main logic for dataset transformation.
Functionality Overview:
- Initializes a Spark session configured for local execution.
- Configures the DatasetMapper to use the MovieDatabase parser for processing rows of movie data.
- Defines default handling for missing or malformed data using the Movie.missing value.
- Invokes the dataset processing logic with a specified input file containing raw text data, displaying the first 20 rows of the resulting dataset.
Note: This object is designed to run as a standalone Spark application to demonstrate dataset mapping functionality.
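The steps in the overview can be sketched as a small standalone program. This is an approximation under stated assumptions, not the object's actual source: Film, Film.missing, Film.fromLine, the "title|year" line format and the temporary sample file are all hypothetical stand-ins for Movie, Movie.missing, the MovieDatabase parser and the real input file. Only standard Spark calls (SparkSession.builder, spark.read.textFile, Dataset.map, Dataset.show) are used.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, SparkSession}
import scala.util.Try

// Hypothetical stand-in for the library's Movie type.
case class Film(title: String, year: Int)

object Film {
  // Hypothetical placeholder, playing the role the text gives to Movie.missing.
  val missing: Film = Film("<missing>", 0)

  // Parse "title|year" (an assumed format); malformed input yields the placeholder.
  def fromLine(line: String): Film =
    line.split('|') match {
      case Array(title, year) => Try(Film(title.trim, year.trim.toInt)).getOrElse(missing)
      case _                  => missing
    }
}

object FilmMapperApp {
  def main(args: Array[String]): Unit = {
    // Step 1: Spark session configured for local execution.
    implicit val spark: SparkSession =
      SparkSession.builder.appName("FilmMapperApp").master("local[*]").getOrCreate()
    import spark.implicits._ // supplies the Encoder[Film] needed by map

    // Write a small sample file so the sketch runs on its own.
    val path = Files.createTempFile("films", ".txt")
    Files.write(path, "Alien|1979\nnot a valid row\nHeat|1995\n".getBytes(StandardCharsets.UTF_8))

    // Steps 2 and 3: read raw lines, map each one, fall back on failure.
    val films: Dataset[Film] =
      spark.read.textFile(path.toString).map(line => Film.fromLine(line))

    // Step 4: display the first 20 rows.
    films.show(20)

    spark.stop()
  }
}
```

Running it requires Spark SQL on the classpath; the malformed second line should appear in the output as the placeholder row rather than stopping the job.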
- object DatasetParser extends App
The DatasetParser object demonstrates how to parse and process a CSV file containing movie metadata into a structured dataset using Spark and an implicit StringTableParser. It showcases parsing, error handling, and displaying results.
This object extends the App trait, making it directly executable as a Scala program.
- object MovieDatabase
The MovieDatabase object provides utility functions and constants for handling movie metadata. It serves as a central repository for defining parsers and file paths required for processing movie-related data.
This object includes mechanisms for parsing movie data and manages access to a CSV file containing movie metadata.