Packages

package spark

Type Members

  1. class DatasetFromStream extends AnyRef
  2. class DatasetMapper[T] extends Serializable

    A generic class for mapping and transforming datasets from text-based input to structured data of type T using Apache Spark.

    This class enables parsing and conversion of raw textual data into Spark Datasets, applying a user-defined mapping function with error handling for missing or malformed data.

    T

    the type of the elements in the resulting Spark Dataset.
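
    The mapping-with-fallback idea described above can be illustrated without Spark. The following is a minimal sketch, not the library's implementation: Film, Film.missing, parseLine and mapLines are hypothetical names introduced here for illustration, and plain Scala collections stand in for a Spark Dataset. With Spark, the same per-line function could presumably be passed to Dataset.map after reading the input with spark.read.textFile.

```scala
import scala.util.Try

// Hypothetical record type standing in for the library's Movie class.
final case class Film(title: String, year: Int)

object Film {
  // Hypothetical sentinel used for rows that are missing or malformed.
  val missing: Film = Film("<missing>", 0)
}

object MappingSketch {
  // Parse one "title,year" line; any parsing problem is captured as a Failure.
  def parseLine(line: String): Try[Film] = Try {
    val fields = line.split(",", -1).map(_.trim)
    require(fields.length == 2, s"expected 2 fields, got ${fields.length}")
    require(fields(0).nonEmpty, "empty title")
    Film(fields(0), fields(1).toInt)
  }

  // Apply the parser to every line, substituting the default for rows that fail.
  def mapLines[T](lines: Seq[String])(parse: String => Try[T], default: T): Seq[T] =
    lines.map(line => parse(line).getOrElse(default))

  def main(args: Array[String]): Unit = {
    val raw = Seq("Alien,1979", "not a valid row", "Heat,nineteen95")
    mapLines[Film](raw)(parseLine, Film.missing).foreach(println)
  }
}
```

    Returning a default value keeps the output the same length as the input, at the cost of hiding which rows failed; a caller that needs that information could keep the Try values instead.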

  3. class DatasetParser[T] extends AnyRef

    The DatasetParser class is a utility designed to parse structured data and create a Spark Dataset of a specific type T. It relies on an implicit Spark session and an implicit TableParser for transforming raw data into a Table representation before converting it into a Spark Dataset. This class provides a safe and reusable way to load and parse datasets from resources such as URLs.

    T

    the type of the elements in the resulting Dataset. It should have an implicit Encoder available for serialization.

    Usage example:

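    import org.apache.spark.sql.SparkSession
    import scala.util.{Failure, Success}

    // Local Spark session; importing spark.implicits._ brings the Encoders into scope
    implicit val spark: SparkSession = SparkSession.builder.appName("DatasetParser").master("local[*]").getOrCreate()
    import spark.implicits._

    // TableParser instance for the specific data type
    implicit val movieTableParser: StringTableParser[Table[Movie]] = implicitly[MovieTableParser]

    val parser = new DatasetParser[Movie]()
    // createDataset yields a Try: show the first 10 rows on success, rethrow on failure
    parser.createDataset[DatasetParser[_]]("movie_metadata.csv") match {
      case Success(ds) => ds.show(10)
      case Failure(error) => throw error
    }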
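
    The usage example rethrows on Failure. A caller that prefers to report the problem and carry on can pattern-match on the Try and produce a message instead. The sketch below shows only that pattern: load is a hypothetical stand-in for a loader such as createDataset, and it returns a Seq[String] rather than a Spark Dataset so that the example runs without Spark.

```scala
import scala.util.{Failure, Success, Try}

object TryHandlingSketch {
  // Hypothetical loader: succeeds for ".csv" resources and fails for anything else.
  def load(resource: String): Try[Seq[String]] =
    if (resource.endsWith(".csv")) Success(Seq("row1", "row2"))
    else Failure(new IllegalArgumentException(s"unsupported resource: $resource"))

  // Summarise the outcome as a message instead of rethrowing the error.
  def describe(resource: String): String =
    load(resource) match {
      case Success(rows) => s"loaded ${rows.size} rows from $resource"
      case Failure(e)    => s"could not load $resource: ${e.getMessage}"
    }

  def main(args: Array[String]): Unit = {
    println(describe("movie_metadata.csv"))
    println(describe("movie_metadata.json"))
  }
}
```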

Value Members

  1. object DatasetMapper extends App with Serializable

    The DatasetMapper object is the entry point for mapping and processing datasets using Spark. It leverages the DatasetMapper generic class to parse and transform text-based data into structured datasets. This object is specifically tailored to process data of type Movie using the provided parser and configuration.

    It sets up the necessary implicit Spark session and encoders, and executes the main logic for dataset transformation.

    Functionality overview:

      - Initializes a Spark session configured for local execution.
      - Configures the DatasetMapper to use the MovieDatabase parser for processing rows of movie data.
      - Uses the Movie.missing value as the default for missing or malformed data.
      - Invokes the dataset processing logic on a specified input file of raw text data and displays the first 20 rows of the resulting dataset.

    Note: This object is designed to run as a standalone Spark application to demonstrate dataset mapping functionality.

  2. object DatasetParser extends App

    The DatasetParser object demonstrates how to parse and process a CSV file containing movie metadata into a structured dataset using Spark and an implicit StringTableParser. It showcases capabilities to handle parsing, error management, and displaying results.

    This object extends the App trait, making it directly executable as a Scala program.

  3. object MovieDatabase

    The MovieDatabase object provides utility functions and constants for handling movie metadata. It serves as a central repository for defining parsers and file paths required for processing movie-related data.

    This object includes mechanisms for parsing movie data and manages access to a CSV file containing movie metadata.
