case class HadoopFsRelation(location: FileIndex, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String])(sparkSession: SparkSession) extends BaseRelation with FileRelation with Product with Serializable
Acts as a container for all of the metadata required to read from a datasource. All discovery, resolution and merging logic for schemas and partitions has been removed.
- location
A FileIndex that can enumerate the locations of all the files that comprise this relation.
- partitionSchema
The schema of the columns (if any) that are used to partition the relation.
- dataSchema
The schema of any remaining columns. Note that if any partition columns are present in the actual data files as well, they are preserved.
- bucketSpec
Describes the bucketing (hash-partitioning of the files by some column values).
- fileFormat
A file format that can be used to read and write the data in files.
- options
Configuration used when reading / writing data.
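A HadoopFsRelation is normally assembled by Spark's own DataSource resolution rather than constructed by hand. The sketch below, assuming a local SparkSession and a hypothetical Parquet path, shows one way to inspect the relation behind a file-based DataFrame; LogicalRelation and HadoopFsRelation are internal APIs and may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

val spark = SparkSession.builder().master("local[*]").appName("hfs-inspect").getOrCreate()
val df = spark.read.parquet("/tmp/example/table")  // hypothetical path; any file-based source works

// Collect the relations backing the optimized plan and print the HadoopFsRelation metadata.
df.queryExecution.optimizedPlan
  .collect { case l: LogicalRelation => l.relation }
  .foreach {
    case rel: HadoopFsRelation =>
      println(s"root paths:       ${rel.location.rootPaths.mkString(", ")}")
      println(s"partition schema: ${rel.partitionSchema.simpleString}")
      println(s"data schema:      ${rel.dataSchema.simpleString}")
      println(s"file format:      ${rel.fileFormat}")
    case other =>
      println(s"not a file-based relation: $other")
  }
```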
Inheritance
- HadoopFsRelation
- Serializable
- Serializable
- Product
- Equals
- FileRelation
- BaseRelation
- AnyRef
- Any
Instance Constructors
- new HadoopFsRelation(location: FileIndex, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String])(sparkSession: SparkSession)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- val bucketSpec: Option[BucketSpec]
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- val dataSchema: StructType
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val fileFormat: FileFormat
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def inputFiles: Array[String]
Returns the list of files that will be read when scanning this relation.
- Definition Classes
- HadoopFsRelation → FileRelation
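The same list is also surfaced through the public Dataset#inputFiles method, which delegates to the underlying FileRelation; reusing the df from the sketch above:

```scala
// Print a few of the files a scan of this relation would read.
df.inputFiles.take(5).foreach(println)
```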
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val location: FileIndex
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def needConversion: Boolean
Whether the objects in a Row need to be converted to the internal representation, for example: java.lang.String to UTF8String, java.math.BigDecimal to Decimal. If needConversion is false, buildScan() should return an RDD of InternalRow.
- Definition Classes
- BaseRelation
- Since
1.4.0
- Note
The internal representation is not stable across releases and thus data sources outside of Spark SQL should leave this as true.
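To illustrate the contract, a hypothetical external relation can simply keep the default needConversion = true and return plain Rows from buildScan(); this is a sketch using the public sources API, not code from Spark itself.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Minimal external relation. needConversion is left at its default (true),
// so buildScan() may return plain Rows and Spark converts them to InternalRow.
class GreetingRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {
  override def schema: StructType = StructType(Seq(StructField("greeting", StringType)))
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row("hello"), Row("world")))
}

// Usage: spark.sqlContext.baseRelationToDataFrame(new GreetingRelation(spark.sqlContext)).show()
```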
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val options: Map[String, String]
- val overlappedPartCols: Map[String, StructField]
- val partitionSchema: StructType
- def partitionSchemaOption: Option[StructType]
- val schema: StructType
- Definition Classes
- HadoopFsRelation → BaseRelation
- def sizeInBytes: Long
Returns an estimated size of this relation in bytes. This information is used by the planner to decide when it is safe to broadcast a relation and can be overridden by sources that know the size ahead of time. By default, the system will assume that tables are too large to broadcast. This method will be called multiple times during query planning and thus should not perform expensive operations for each invocation.
- Definition Classes
- HadoopFsRelation → BaseRelation
- Since
1.3.0
- Note
It is always better to overestimate size than underestimate, because underestimation could lead to execution plans that are suboptimal (i.e. broadcasting a very large table).
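For example, a rough check of whether the planner's broadcast heuristic would consider a relation small enough might look like the sketch below; df and spark come from the first sketch, and spark.sessionState is an unstable API.

```scala
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

// The HadoopFsRelation behind df, obtained as in the first sketch.
val rel = df.queryExecution.optimizedPlan
  .collect { case l: LogicalRelation => l.relation }
  .collectFirst { case r: HadoopFsRelation => r }
  .get

val threshold = spark.sessionState.conf.autoBroadcastJoinThreshold  // bytes; -1 disables broadcast
if (threshold >= 0 && rel.sizeInBytes <= threshold)
  println(s"~${rel.sizeInBytes} bytes: small enough for the planner to broadcast")
else
  println(s"~${rel.sizeInBytes} bytes: treated as too large to broadcast")
```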
- val sparkSession: SparkSession
- def sqlContext: SQLContext
- Definition Classes
- HadoopFsRelation → BaseRelation
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- HadoopFsRelation → AnyRef → Any
- def unhandledFilters(filters: Array[Filter]): Array[Filter]
Returns the list of Filters that this datasource may not be able to handle. These returned Filters will be evaluated by Spark SQL after data is output by a scan. By default, this function will return all filters, as it is always safe to double evaluate a Filter. However, specific implementations can override this function to avoid double filtering when they are capable of processing a filter internally.
- Definition Classes
- BaseRelation
- Since
1.6.0
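To make the contract concrete, a hypothetical relation might evaluate GreaterThan("age", _) itself and therefore omit it from the unhandled filters, while every other filter is left for Spark SQL to re-evaluate after the scan. A sketch using the public sources API (not Spark's own code):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, GreaterThan, PrunedFilteredScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation over two hard-coded rows. It applies GreaterThan("age", _)
// inside buildScan, so it reports that filter as handled; everything else is
// returned unchanged and will be re-evaluated by Spark SQL after the scan.
class PeopleRelation(val sqlContext: SQLContext) extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType =
    StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))

  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot {
      case GreaterThan("age", _) => true  // handled below in buildScan
      case _                     => false
    }

  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    val minAge = filters.collectFirst { case GreaterThan("age", v: Int) => v }.getOrElse(Int.MinValue)
    val people = Seq(("ada", 36), ("bob", 17)).filter { case (_, age) => age > minAge }
    // Project only the requested columns, in the requested order.
    val rows = people.map { case (name, age) =>
      Row.fromSeq(requiredColumns.toSeq.map { case "name" => name; case "age" => age })
    }
    sqlContext.sparkContext.parallelize(rows)
  }
}
```

During planning Spark calls unhandledFilters; because the age filter is absent from the returned array, it is not applied a second time on top of the scan's output.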
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()