class DatasetParser[T] extends AnyRef

DatasetParser parses structured data into a Spark Dataset of element type T. It relies on an implicit SparkSession and an implicit TableParser: the TableParser transforms the raw data into a Table, which is then converted into a Spark Dataset. Both public methods return a Try, so failures are reported as values, and data can be loaded either by resource name (createDataset) or by URL (parse).

T

the type of the elements in the resulting Dataset. An implicit Encoder[T] must be in scope, because the constructor requires one for serialization.

Usage example:

import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success}

// Movie and MovieTableParser are assumed to be defined elsewhere and in scope.
implicit val spark: SparkSession = SparkSession.builder.appName("DatasetParser").master("local[*]").getOrCreate()
import spark.implicits._

// TableParser instance for the specific data type
implicit val movieTableParser: StringTableParser[Table[Movie]] = implicitly[MovieTableParser]

val parser = new DatasetParser[Movie]()
// The type argument names the class used to locate the resource.
parser.createDataset[DatasetParser[_]]("movie_metadata.csv") match {
  case Success(ds) => ds.show(10)
  case Failure(error) => throw error
}
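
The type argument passed to createDataset (DatasetParser[_] in the usage example) serves only to locate the resource. The following is a minimal sketch of one way such a lookup could work, assuming the resource is resolved relative to the runtime class of U with Class.getResource. The source does not confirm this, and ResourceLookup and resourceUrl are hypothetical names that are not part of the library.

```scala
import java.io.FileNotFoundException
import java.net.URL
import scala.reflect.{ClassTag, classTag}
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of DatasetParser: illustrates how a ClassTag
// can be used to find a classpath resource next to a given class.
object ResourceLookup {
  def resourceUrl[U: ClassTag](name: String): Try[URL] =
    // Class.getResource returns null when nothing is found, so wrap it in Option.
    Option(classTag[U].runtimeClass.getResource(name)) match {
      case Some(url) => Success(url)
      case None      => Failure(new FileNotFoundException(s"resource not found: $name"))
    }
}
```

Under this assumption, a missing resource yields a Failure instead of a null URL, which matches the Try-based style of createDataset and parse.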
Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new DatasetParser()(implicit arg0: Encoder[T], sparkSession: SparkSession, tableParser: TableParser[Table[T]])

    sparkSession

    an implicit SparkSession used for managing Spark operations.

    tableParser

    an implicit TableParser instance that defines how to parse the raw data into a Table[T].

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
  6. def createDataset[U](name: String)(implicit arg0: ClassTag[U]): Try[Dataset[T]]

    Creates a Spark Dataset of type T by parsing the resource with the given name. The method uses the implicit TableParser to parse the raw data into a structured table, then converts that table into a Spark Dataset. Errors are reported through the returned Try.

    U

    the class used to locate the resource. An implicit ClassTag[U] must be available.

    name

    the name of the resource to be loaded, such as a file name in the classpath.

    returns

    a Try[Dataset[T]], indicating success with the parsed dataset or failure with an appropriate error.

  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  9. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  15. def parse(url: URL)(implicit codec: Codec): Try[Dataset[T]]

    Parses the resource at the given URL to create a Spark Dataset of type T. The method uses an implicit Codec to decode the content and the implicit TableParser to interpret the raw data as a structured table. The table is then converted into a Spark Dataset, and the result is wrapped in a Try to capture any errors.

    url

    the URL pointing to the resource to parse.

    codec

    an implicit Codec used to decode the content of the resource.

    returns

    a Try[Dataset[T]], containing the parsed Spark Dataset on success or an exception on failure.

  16. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
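
The parse method above decodes the content behind a URL with an implicit Codec before the TableParser sees it. The following is a minimal sketch of that decoding step alone, using only the Scala standard library. UrlLines and readLines are hypothetical names, and the library's actual reading logic may differ.

```scala
import java.net.URL
import scala.io.{Codec, Source}
import scala.util.Try

// Hypothetical helper, not part of DatasetParser: reads all lines from a URL,
// decoding bytes with the implicit Codec and capturing any error in a Try.
object UrlLines {
  def readLines(url: URL)(implicit codec: Codec): Try[List[String]] = Try {
    val source = Source.fromURL(url) // picks up the implicit codec
    try source.getLines().toList     // materialize before the source is closed
    finally source.close()
  }
}
```

Passing a Codec explicitly, for example readLines(url)(Codec.UTF8), avoids depending on the platform default encoding.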

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
