
org.apache.spark.sql.execution.datasources.parquet

ParquetFileFormat

object ParquetFileFormat extends Logging with Serializable

Linear Supertypes
Serializable, Serializable, Logging, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. val apacheParquetLogger: Logger

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  13. final def isDebugEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  14. final def isInfoEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. final def isTraceEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  17. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def mergeMetastoreParquetSchema(metastoreSchema: StructType, parquetSchema: StructType): StructType

    Reconciles Hive Metastore case insensitivity issue and data type conflicts between Metastore schema and Parquet schema.

    Hive doesn't retain case information, while Parquet is case sensitive. On the other hand, the schema read from Parquet files may be incomplete (e.g. older versions of Parquet don't distinguish binary from string). This method generates a correct schema by combining the data types from the Metastore schema with the field names from the Parquet schema.
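
    The merge rule described above can be illustrated with a small, dependency-free sketch. It is not Spark's implementation: `Field` is a hypothetical stand-in for `StructField`, data types are plain strings, and falling back to the Metastore name when Parquet has no matching field is an assumption of this sketch.

```scala
import java.util.Locale

object MetastoreMergeSketch {
  // Hypothetical stand-in for a schema field; Spark's real type is StructField.
  final case class Field(name: String, dataType: String)

  // For each Metastore field, keep its data type but adopt the case-preserving
  // name of the Parquet field that matches it case-insensitively.
  def merge(metastore: Seq[Field], parquet: Seq[Field]): Seq[Field] = {
    // Lower-cased Parquet name -> original (case-preserving) Parquet name.
    val parquetNames: Map[String, String] =
      parquet.map(f => f.name.toLowerCase(Locale.ROOT) -> f.name).toMap

    metastore.map { m =>
      val key = m.name.toLowerCase(Locale.ROOT)
      // Assumption of this sketch: keep the Metastore name if Parquet lacks the field.
      m.copy(name = parquetNames.getOrElse(key, m.name))
    }
  }
}
```

    Here the Metastore supplies `string` and `timestamp` while Parquet supplies `userId` and `eventTime`, so merging `userid: string, eventtime: timestamp` with `userId: binary, eventTime: int96` yields the Parquet names paired with the Metastore types.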

  30. def mergeSchemasInParallel(filesToTouch: Seq[FileStatus], sparkSession: SparkSession): Option[StructType]

    Figures out a merged Parquet schema with a distributed Spark job.

    Note that locality is not taken into consideration here because:

    1. For a single Parquet part-file, the footer usually resides only in the last block of that file, so only the location of the last block is needed. However, the Hadoop FileSystem API can only retrieve the locations of all blocks, which can be expensive.

    2. This optimization is mainly useful for S3, where file metadata operations can be quite slow. Locality is not available on S3 in any case, because computation cannot run on S3 nodes.
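
    The general shape of such a merge can be sketched without Spark as a two-level reduction: schemas are merged within each group of files, and the partial results are then merged into one. This is only an illustration under stated assumptions: `Schema`, `mergeTwo`, `mergeAll`, and `partitionSize` are hypothetical names, the groups are processed sequentially here rather than by distributed tasks, and the conflict rule (same field, different type is an error) is a simplification.

```scala
object ParallelMergeSketch {
  // Hypothetical simplified schema: field name -> type name.
  type Schema = Map[String, String]

  // Union of two schemas; a field present in both must have the same type.
  def mergeTwo(a: Schema, b: Schema): Schema =
    b.foldLeft(a) { case (acc, (name, tpe)) =>
      acc.get(name) match {
        case Some(existing) if existing != tpe =>
          throw new IllegalArgumentException(
            s"Conflicting types for $name: $existing vs $tpe")
        case _ => acc.updated(name, tpe)
      }
    }

  // Merge per group first (the work a task would do), then merge the partial
  // results (the work the driver would do). Returns None when there are no files.
  def mergeAll(fileSchemas: Seq[Schema], partitionSize: Int): Option[Schema] = {
    require(partitionSize > 0, "partitionSize must be positive")
    val partials = fileSchemas.grouped(partitionSize).map(_.reduce(mergeTwo)).toSeq
    partials.reduceOption(mergeTwo)
  }
}
```

    Returning an `Option` mirrors the signature above: an empty input has no schema to report.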

  31. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  32. final def notify(): Unit

    Definition Classes
    AnyRef
  33. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  34. val parquetLogger: Logger

  35. def readSchemaFromFooter(footer: Footer, converter: ParquetSchemaConverter): StructType

    Reads a Spark SQL schema from a Parquet footer. If a valid serialized Spark SQL schema string is found in the file metadata, returns the deserialized StructType; otherwise, returns a StructType converted from the MessageType stored in the footer.
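
    The "use the serialized schema if it is valid, otherwise convert" rule can be sketched as follows. Everything here is hypothetical and chosen for illustration: the metadata key `example.schema`, the `name:type,name:type` encoding, and the `convertNative` callback standing in for the MessageType conversion. None of it is Spark's actual key, format, or API.

```scala
import scala.util.Try

object FooterSchemaSketch {
  // Hypothetical simplified schema: ordered (field name, type name) pairs.
  type Schema = Seq[(String, String)]

  // Hypothetical metadata key; not the key Spark uses.
  val SchemaKey = "example.schema"

  // Parses the hypothetical "name:type,name:type" encoding.
  def parse(serialized: String): Try[Schema] = Try {
    serialized.split(",").toSeq.map { entry =>
      entry.split(":") match {
        case Array(name, tpe) if name.nonEmpty && tpe.nonEmpty => name -> tpe
        case _ => throw new IllegalArgumentException(s"Malformed entry: $entry")
      }
    }
  }

  // Prefer a valid serialized schema from the metadata; otherwise fall back
  // to converting the file's native schema.
  def readSchema(metadata: Map[String, String], convertNative: () => Schema): Schema =
    metadata.get(SchemaKey).flatMap(s => parse(s).toOption).getOrElse(convertNative())
}
```

    Both a missing key and an unparseable value take the fallback path, which matches the "valid serialized schema" condition in the description.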

  36. val redirectParquetLogsViaSLF4J: Unit

  37. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  38. def toString(): String

    Definition Classes
    AnyRef → Any
  39. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
