class DatasetParser[T] extends AnyRef
The DatasetParser class is a utility that parses structured data and creates a Spark Dataset of a specific type T. It relies on an implicit SparkSession and an implicit TableParser to transform raw data into a Table representation before converting it into a Spark Dataset. Failures are reported through Try rather than thrown, which gives a safe and reusable way to load and parse datasets from resources such as URLs.
- T
the type of the elements in the resulting Dataset. An implicit Encoder for T must be available for serialization.
Usage example:
    import org.apache.spark.sql.SparkSession
    import scala.util.{Failure, Success}

    // Movie, MovieTableParser, Table and StringTableParser are assumed to be in scope.
    implicit val spark: SparkSession = SparkSession.builder.appName("DatasetParser").master("local[*]").getOrCreate()
    import spark.implicits._

    // TableParser instance for the specific data type
    implicit val movieTableParser: StringTableParser[Table[Movie]] = implicitly[MovieTableParser]

    val parser = new DatasetParser[Movie]()
    parser.createDataset[DatasetParser[_]]("movie_metadata.csv") match {
      case Success(ds) => ds.show(10)
      case Failure(error) => throw error
    }
Instance Constructors
- new DatasetParser()(implicit arg0: Encoder[T], sparkSession: SparkSession, tableParser: TableParser[Table[T]])
- sparkSession
an implicit SparkSession used for managing Spark operations.
- tableParser
an implicit TableParser instance that defines how to parse the raw data into a Table[T].
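The arg0 parameter in the constructor signature is the kind of evidence parameter that Scala generates for a context bound such as T: Encoder. The following is a minimal sketch of that desugaring using only the standard library; Show, Printer and intShow are hypothetical stand-ins for illustration and are not part of DatasetParser or Spark.

```scala
object ContextBoundSketch {
  // Hypothetical type class standing in for something like Encoder.
  trait Show[T] { def show(t: T): String }

  // `T: Show` is sugar for an extra implicit parameter list,
  // roughly: class Printer[T]()(implicit ev: Show[T]).
  class Printer[T: Show] {
    def print(t: T): String = implicitly[Show[T]].show(t)
  }

  // An implicit instance must be in scope wherever Printer[Int] is constructed.
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(t: Int): String = s"Int($t)"
  }

  def main(args: Array[String]): Unit =
    println(new Printer[Int]().print(42)) // prints "Int(42)"
}
```

In the same way, constructing a DatasetParser[T] compiles only when an Encoder[T], a SparkSession and a TableParser[Table[T]] can all be resolved implicitly at the call site.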
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
- def createDataset[U](name: String)(implicit arg0: ClassTag[U]): Try[Dataset[T]]
Creates a Spark Dataset of type T by parsing the resource with the given name. This method uses the implicit TableParser to parse the raw data into a structured table and then converts the parsed table into a Spark Dataset. Any error is captured in the returned Try.
- U
the class type used to locate the resource. A ClassTag for U is required.
- name
the name of the resource to be loaded, such as a file name on the classpath.
- returns
a Try[Dataset[T]]: a Success holding the parsed dataset, or a Failure holding the error.
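One plausible way a ClassTag can be used to locate a named resource is sketched below. It assumes the lookup goes through the runtime class of U via Class.getResource, which this page does not confirm; resourceFor is a hypothetical helper and not part of DatasetParser.

```scala
import java.io.FileNotFoundException
import java.net.URL
import scala.reflect.{ClassTag, classTag}
import scala.util.{Failure, Success, Try}

object ResourceLookupSketch {
  // Hypothetical helper: resolves `name` relative to the runtime class of U.
  // Class.getResource returns null when nothing is found, so Option(...) guards
  // against it and the miss is reported as a Failure instead of a null URL.
  def resourceFor[U: ClassTag](name: String): Try[URL] =
    Option(classTag[U].runtimeClass.getResource(name)) match {
      case Some(url) => Success(url)
      case None      => Failure(new FileNotFoundException(s"resource not found: $name"))
    }

  def main(args: Array[String]): Unit =
    // No such resource exists next to java.lang.String, so this is a Failure.
    println(resourceFor[String]("no-such-file.csv").isFailure)
}
```

Under that assumption, a relative name such as "movie_metadata.csv" would be resolved in the package of U, so the choice of U decides where the file is searched for.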
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- def parse(url: URL)(implicit codec: Codec): Try[Dataset[T]]
Parses a given URL to create a Spark Dataset of type T. Uses the implicit Codec to decode the content of the resource and the implicit TableParser to interpret the raw data as a structured table. The table is then converted into a Spark Dataset and wrapped in a Try to capture any errors.
- url
the URL pointing to the resource to parse.
- codec
an implicit Codec used to decode the content of the resource.
- returns
a Try[Dataset[T]], containing the parsed Spark Dataset on success or an exception on failure.
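The role of the implicit Codec can be illustrated with the standard library alone. The sketch below reads lines from a URL with scala.io.Source and captures errors in a Try; readLines is a hypothetical helper, and whether DatasetParser reads its input through Source.fromURL is an assumption, not something this page states.

```scala
import java.net.URL
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.{Codec, Source}
import scala.util.Try

object CodecReadSketch {
  // Hypothetical helper: reads all lines from `url`, decoding bytes with the
  // implicit Codec. Any exception (missing resource, I/O error) ends up in the Try.
  def readLines(url: URL)(implicit codec: Codec): Try[List[String]] = Try {
    val source = Source.fromURL(url) // picks up the implicit codec parameter
    try source.getLines().toList
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    implicit val codec: Codec = Codec.UTF8
    // Write a small UTF-8 file and read it back through a file: URL.
    val tmp = Files.createTempFile("movies", ".csv")
    Files.write(tmp, "title,year\nAvatar,2009\n".getBytes(StandardCharsets.UTF_8))
    println(readLines(tmp.toUri.toURL))
  }
}
```

Declaring the Codec explicitly, as main does here, avoids depending on the platform default encoding when the data contains non-ASCII characters.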
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)