Class

com.salesforce.op.readers

CSVReader

class CSVReader[T <: GenericRecord] extends DataReader[T]

Data Reader for CSV data. Each CSV record will be automatically converted to an Avro record using the provided schema.

Linear Supertypes
DataReader[T], ReaderKey[T], Reader[T], ReaderType[T], Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new CSVReader(readPath: Option[String], key: (T) ⇒ String, schema: String, options: CSVOptions = CSVDefaults.CSVOptions, timeZone: String = CSVDefaults.TimeZone)(implicit arg0: ClassTag[T], wtt: scala.reflect.api.JavaUniverse.WeakTypeTag[T])

    readPath

    default path to data

    key

    function for extracting key from avro record

    schema

    Avro schema. Note that dateTime fields should be of type Long; they are automatically converted to Unix timestamps in milliseconds

    options

    CSV options

    timeZone

    timeZone to be used for any dateTime fields
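
    To illustrate the schema and timeZone notes above, here is a minimal sketch of turning a date-time string into Unix milliseconds for a given time zone. It is not the library's conversion code: the object name, the method name, and the input pattern `uuuu-MM-dd HH:mm:ss` are assumptions made for this example, and the formats CSVReader actually accepts are not stated on this page.

    ```scala
    import java.time.{LocalDateTime, ZoneId}
    import java.time.format.DateTimeFormatter

    object DateTimeToMillis {
      // Assumed input pattern; the real reader may accept different formats.
      private val formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

      /** Interprets `text` as a wall-clock time in `timeZone` and returns epoch milliseconds. */
      def toEpochMillis(text: String, timeZone: String): Long =
        LocalDateTime
          .parse(text, formatter)
          .atZone(ZoneId.of(timeZone))
          .toInstant
          .toEpochMilli
    }
    ```

    The same text yields different Long values under different time zones, which is why a timeZone argument matters for any dateTime field.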

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def fullTypeName: String

    Full reader input type name

    returns

    full input type name

    Definition Classes
    ReaderType
  10. def generateDataFrame(rawFeatures: Array[OPFeature], opParams: OpParams = new OpParams())(implicit spark: SparkSession): DataFrame

    Generates the DataFrame used by the OpPipeline that calls this method

    rawFeatures

    features to generate from the dataset read in by this reader

    opParams

    op parameters

    spark

    spark instance to do the reading and conversion from RDD to Dataframe

    returns

    A Dataframe containing columns with all of the raw input features expected by the pipeline

    Definition Classes
    DataReader → Reader
  11. def generateRow(key: String, record: T, rawFeatures: Array[OPFeature]): Option[Row]

    Attributes
    protected
    Definition Classes
    DataReader
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. final def getFinalReadPath(params: OpParams): String

    Default method for extracting the path used in read method. The path is taken in the following order of priority: readerPath, params

    returns

    final path to use

    Attributes
    protected
    Definition Classes
    DataReader
  14. def getGenStage[I](f: OPFeature): FeatureGeneratorStage[I, _ <: FeatureType]

    Attributes
    protected[com.salesforce.op]
    Definition Classes
    Reader
  15. final def getReaderParams(opParams: OpParams): Option[ReaderParams]

    Default method for extracting this reader's parameters from readerParams in OpParams

    opParams

    contains map of reader type to ReaderParams instances

    returns

    ReaderParams instance if it exists

    Definition Classes
    ReaderType
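
    The lookup described above (a map from reader type to parameters, returning an Option) can be sketched in plain Scala. This is only an illustration: `SketchReaderParams` is a hypothetical stand-in for the library's ReaderParams, and keying the map by a type-name string is an assumption, not something this page confirms.

    ```scala
    object ReaderParamsLookup {
      // Hypothetical stand-in for ReaderParams; only an optional path is modelled.
      final case class SketchReaderParams(path: Option[String])

      /** Returns the parameters registered under `typeName`, or None if absent. */
      def paramsFor(
        typeName: String,
        readerParams: Map[String, SketchReaderParams]
      ): Option[SketchReaderParams] =
        readerParams.get(typeName)
    }
    ```

    Returning an Option lets the caller fall back to defaults when no entry exists for the reader's type.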
  16. final def getSchema(rawFeatures: Array[OPFeature]): StructType

    Derives DataFrame schema for raw features.

    rawFeatures

    feature array representing raw feature-data

    returns

    a StructType instance

    Attributes
    protected
    Definition Classes
    DataReader
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. final def innerJoin[U](other: DataReader[U], joinKeys: JoinKeys = JoinKeys()): JoinedDataReader[T, U]

    Inner join

    U

    Type of data read by right data reader

    other

    reader from right side of join

    joinKeys

    join keys to use

    returns

    joined reader

    Definition Classes
    Reader
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. final def join[U](other: DataReader[U], joinType: JoinType, joinKeys: JoinKeys = JoinKeys()): JoinedDataReader[T, U]

    Join readers

    U

    Type of data read by right data reader

    other

    reader from right side of join

    joinType

    type of join to perform

    joinKeys

    join keys to use

    returns

    joined reader

    Attributes
    protected
    Definition Classes
    Reader
  21. val key: (T) ⇒ String

    function for extracting key from avro record

    Definition Classes
    CSVReader → ReaderKey
  22. final def leftOuterJoin[U](other: DataReader[U], joinKeys: JoinKeys = JoinKeys()): JoinedDataReader[T, U]

    Left Outer join

    U

    Type of data read by right data reader

    other

    reader from right side of join

    joinKeys

    join keys to use

    returns

    joined reader

    Definition Classes
    Reader
  23. final def maybeRepartition(data: Dataset[T], params: OpParams): Dataset[T]

    Function to repartition the data based on the op params of this reader

    data

    dataset

    params

    op params

    returns

    maybe repartitioned dataset

    Attributes
    protected
    Definition Classes
    DataReader
  24. final def maybeRepartition(data: RDD[T], params: OpParams): RDD[T]

    Function to repartition the data based on the op params of this reader

    data

    rdd

    params

    op params

    returns

    maybe repartitioned rdd

    Attributes
    protected
    Definition Classes
    DataReader
  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. val options: CSVOptions

    CSV options

  29. final def outerJoin[U](other: DataReader[U], joinKeys: JoinKeys = JoinKeys()): JoinedDataReader[T, U]

    Outer join

    U

    Type of data read by right data reader

    other

    reader from right side of join

    joinKeys

    join keys to use

    returns

    joined reader

    Definition Classes
    Reader
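
    The difference between innerJoin, leftOuterJoin and outerJoin lies in which keys survive the join. The sketch below shows standard join semantics on keyed in-memory maps. It is not how JoinedDataReader is implemented: the object and method names are hypothetical, and how the library treats keys and unmatched records is not described on this page.

    ```scala
    object JoinSemantics {
      /** Inner join: keeps only keys present on both sides. */
      def inner[A, B](left: Map[String, A], right: Map[String, B]): Map[String, (A, B)] =
        left.collect { case (k, a) if right.contains(k) => k -> (a, right(k)) }

      /** Left outer join: keeps every left key; the right value is optional. */
      def leftOuter[A, B](left: Map[String, A], right: Map[String, B]): Map[String, (A, Option[B])] =
        left.map { case (k, a) => k -> (a, right.get(k)) }

      /** Full outer join: keeps every key from either side; both values are optional. */
      def outer[A, B](left: Map[String, A], right: Map[String, B]): Map[String, (Option[A], Option[B])] =
        (left.keySet ++ right.keySet).map(k => k -> (left.get(k), right.get(k))).toMap
    }
    ```

    With left keys `a, b` and right keys `b, c`, the inner join keeps only `b`, the left outer join keeps `a` and `b`, and the outer join keeps all three.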
  30. def read(params: OpParams = new OpParams())(implicit spark: SparkSession): Either[RDD[T], Dataset[T]]

    Reads raw data from the specified location for use in DataFrame creation, i.e. by generateDataFrame. Returns either an RDD or a Dataset of the type specified by this reader. It can be overridden to carry out any special logic the reader requires (for example, filters or joins needed to produce the specified reader type).

    params

    parameters used to carry out specialized logic in reader (passed in from workflow)

    spark

    spark instance to do the reading and conversion from RDD to Dataframe

    returns

    either RDD or Dataset of type T

    Definition Classes
    CSVReader → DataReader
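
    Because read returns an Either, callers have to handle both the RDD and the Dataset case. The sketch below shows that pattern with `fold`. To stay runnable without Spark it uses stand-in types: `SketchRDD` (a List) and `SketchDataset` (a Vector) are hypothetical aliases introduced only for this example.

    ```scala
    object EitherHandling {
      // Stand-ins so the sketch runs without Spark on the classpath.
      type SketchRDD[T] = List[T]
      type SketchDataset[T] = Vector[T]

      /** Counts records regardless of which side of the Either holds them. */
      def countRecords[T](data: Either[SketchRDD[T], SketchDataset[T]]): Int =
        data.fold(rdd => rdd.size, dataset => dataset.size)

      /** Names the side that was returned, e.g. for logging. */
      def describe[T](data: Either[SketchRDD[T], SketchDataset[T]]): String =
        data match {
          case Left(_)  => "rdd"
          case Right(_) => "dataset"
        }
    }
    ```

    When only one representation is wanted, the readRDD and readDataset members listed on this page return an RDD or a Dataset directly.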
  31. final def readDataset(params: OpParams = new OpParams())(implicit sc: SparkSession, encoder: Encoder[T]): Dataset[T]

    Reads raw data from the specified location for use in DataFrame creation, i.e. by generateDataFrame. Returns a Dataset of the type specified by this reader.

    params

    parameters used to carry out specialized logic in reader (passed in from workflow)

    sc

    spark session

    returns

    Dataset of type T

    Definition Classes
    DataReader
  32. val readPath: Option[String]

    default path to data

    Definition Classes
    CSVReader → DataReader
  33. final def readRDD(params: OpParams = new OpParams())(implicit sc: SparkSession): RDD[T]

    Reads raw data from the specified location for use in DataFrame creation, i.e. by generateDataFrame. Returns an RDD of the type specified by this reader.

    params

    parameters used to carry out specialized logic in reader (passed in from workflow)

    sc

    spark session

    returns

    RDD of type T

    Definition Classes
    DataReader
  34. val schema: String

    Avro schema. Note that dateTime fields should be of type Long; they are automatically converted to Unix timestamps in milliseconds

  35. final def subReaders: Seq[DataReader[_]]

    All the reader's sub readers (used in joins)

    returns

    sub readers

    Definition Classes
    DataReader → Reader
  36. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  37. val timeZone: String

    timeZone to be used for any dateTime fields

  38. def toString(): String

    Definition Classes
    AnyRef → Any
  39. final def typeName: String

    Short reader input type name

    returns

    short reader input type name

    Definition Classes
    ReaderType
  40. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. implicit val wtt: scala.reflect.api.JavaUniverse.WeakTypeTag[T]

    Reader type tag

    Definition Classes
    CSVReader → ReaderType
