Class HadoopFsRelation

Package: org.apache.spark.sql.execution.datasources


case class HadoopFsRelation(location: FileIndex, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String])(sparkSession: SparkSession) extends BaseRelation with FileRelation with Product with Serializable

Acts as a container for all of the metadata required to read from a datasource. All discovery, resolution and merging logic for schemas and partitions has been removed.

location

A FileIndex that can enumerate the locations of all the files that comprise this relation.

partitionSchema

The schema of the columns (if any) that are used to partition the relation.

dataSchema

The schema of any remaining columns. Note that if any partition columns are present in the actual data files as well, they are preserved.

bucketSpec

Describes the bucketing (hash-partitioning of the files by some column values).

fileFormat

A file format that can be used to read and write the data in files.

options

Configuration used when reading / writing data.
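
For illustration, a minimal construction sketch in Scala. This is a hedged example, not canonical usage: the path and schemas are hypothetical, and InMemoryFileIndex (used here as the FileIndex) is internal API whose constructor signature varies across Spark releases.

  import org.apache.hadoop.fs.Path
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, InMemoryFileIndex}
  import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
  import org.apache.spark.sql.types._

  val spark = SparkSession.builder().master("local[*]").getOrCreate()

  // A FileIndex that enumerates all files under a (hypothetical) root path.
  val index = new InMemoryFileIndex(spark, Seq(new Path("/tmp/events")), Map.empty, None)

  val relation = HadoopFsRelation(
    location = index,
    partitionSchema = StructType(Seq(StructField("date", StringType))),
    dataSchema = StructType(Seq(
      StructField("id", LongType),
      StructField("value", DoubleType))),
    bucketSpec = None,                    // no bucketing
    fileFormat = new ParquetFileFormat(), // reads/writes Parquet files
    options = Map.empty                   // format-specific read/write options
  )(spark)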

Linear Supertypes
Serializable, Serializable, Product, Equals, FileRelation, BaseRelation, AnyRef, Any

Instance Constructors

  1. new HadoopFsRelation(location: FileIndex, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String])(sparkSession: SparkSession)


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val bucketSpec: Option[BucketSpec]


    Describes the bucketing (hash-partitioning of the files by some column values).
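
    For context: bucketed layouts are normally produced through the public DataFrameWriter API rather than by constructing a BucketSpec directly. An illustrative write (df is assumed to be an existing DataFrame):

      // Hash-partition the output into 8 buckets by `id`, sorted within each bucket.
      df.write.bucketBy(8, "id").sortBy("id").saveAsTable("events_bucketed")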

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val dataSchema: StructType

    The schema of any remaining columns. Note that if any partition columns are present in the actual data files as well, they are preserved.

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. val fileFormat: FileFormat


    A file format that can be used to read and write the data in files.

  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def inputFiles: Array[String]

    Returns the list of files that will be read when scanning this relation.

    Definition Classes
    HadoopFsRelation → FileRelation
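
    A small usage sketch (relation as constructed in the class example above):

      relation.inputFiles.take(5).foreach(println) // sample of the files a scan would read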
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. val location: FileIndex


    A FileIndex that can enumerate the locations of all the files that comprise this relation.

  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def needConversion: Boolean

    Whether the relation needs to convert the objects in Row to the internal representation, for example: java.lang.String to UTF8String, java.lang.Decimal to Decimal.

    If needConversion is false, buildScan() should return an RDD of InternalRow.

    Definition Classes
    BaseRelation
    Since

    1.4.0

    Note

    The internal representation is not stable across releases and thus data sources outside of Spark SQL should leave this as true.
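
    A hedged sketch of an external source that keeps the default (the class name and data are illustrative):

      import org.apache.spark.rdd.RDD
      import org.apache.spark.sql.{Row, SQLContext}
      import org.apache.spark.sql.sources.{BaseRelation, TableScan}
      import org.apache.spark.sql.types.{StringType, StructField, StructType}

      class NamesRelation(override val sqlContext: SQLContext) extends BaseRelation with TableScan {
        override def schema: StructType = StructType(Seq(StructField("name", StringType)))

        // Leave needConversion as true so Spark itself converts external
        // types such as java.lang.String to UTF8String.
        override def needConversion: Boolean = true

        override def buildScan(): RDD[Row] =
          sqlContext.sparkContext.parallelize(Seq(Row("alice"), Row("bob")))
      }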

  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. val options: Map[String, String]


    Configuration used when reading / writing data.
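
    For example, a Parquet relation might be given options such as (illustrative values):

      Map("mergeSchema" -> "false", "compression" -> "snappy")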

  20. val partitionSchema: StructType

    The schema of the columns (if any) that are used to partition the relation.

  21. def partitionSchemaOption: Option[StructType]

    The partition schema, or None if the relation is not partitioned.
  22. val schema: StructType

    Definition Classes
    HadoopFsRelation → BaseRelation
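
    The combined schema is, roughly, dataSchema followed by any partition columns not already present in it. A simplified sketch of that derivation (ignoring the case-sensitivity configuration the real implementation consults):

      import org.apache.spark.sql.types.StructType

      def fullSchema(dataSchema: StructType, partitionSchema: StructType): StructType = {
        val dataCols = dataSchema.map(_.name.toLowerCase).toSet
        StructType(dataSchema ++ partitionSchema.filterNot(f => dataCols.contains(f.name.toLowerCase)))
      }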
  23. def sizeInBytes: Long

    Returns an estimated size of this relation in bytes. This information is used by the planner to decide when it is safe to broadcast a relation and can be overridden by sources that know the size ahead of time. By default, the system will assume that tables are too large to broadcast. This method will be called multiple times during query planning and thus should not perform expensive operations for each invocation.

    Definition Classes
    HadoopFsRelation → BaseRelation
    Since

    1.3.0

    Note

    It is always better to overestimate size than to underestimate it, because underestimation could lead to suboptimal execution plans (e.g. broadcasting a very large table).
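
    Sources that know their size ahead of time can override this; an illustrative override inside a custom BaseRelation:

      // Report ~10 MB, small enough that the planner may choose a broadcast join.
      override def sizeInBytes: Long = 10L * 1024 * 1024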

  24. val sparkSession: SparkSession

  25. def sqlContext: SQLContext

    Definition Classes
    HadoopFsRelation → BaseRelation
  26. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  27. def toString(): String

    Definition Classes
    HadoopFsRelation → AnyRef → Any
  28. def unhandledFilters(filters: Array[Filter]): Array[Filter]

    Returns the list of Filters that this datasource may not be able to handle. These returned Filters will be evaluated by Spark SQL after data is output by a scan. By default, this function will return all filters, as it is always safe to double evaluate a Filter. However, specific implementations can override this function to avoid double filtering when they are capable of processing a filter internally.

    Definition Classes
    BaseRelation
    Since

    1.6.0
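
    An illustrative override for a hypothetical source that can only push down equality predicates, handing everything else back to Spark for post-scan evaluation:

      import org.apache.spark.sql.sources.{EqualTo, Filter}

      override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
        filters.filterNot(_.isInstanceOf[EqualTo])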

  29. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
