Trait

com.coxautodata.waimak.storage

FileStorageOps

Related Doc: package storage

trait FileStorageOps extends AnyRef

Contains operations that interact with physical storage. Implementations also handle committing data to the file system.

Created by Alexei Perelighin on 2018/03/05

Linear Supertypes
AnyRef, Any

Abstract Value Members

  1. abstract def atomicWriteAndCleanup(tableName: String, compactedData: Dataset[_], newDataPath: Path, cleanUpBase: Path, cleanUpFolders: Seq[String]): Unit

    During compaction, data from multiple folders needs to be merged and re-written into one folder with fewer files. The operation has to be fail-safe: old data can only be moved out after the new version has been fully written and committed.

    E.g. data from cleanUpBase=/data/db/tbl1/type=hot and cleanUpFolders=Seq("region=11", "region=12", "region=13", "region=14") will be merged and coalesced into an optimal number of partitions in the Dataset compactedData and written out into newDataPath=/data/db/tbl1/type=cold/region=15, with the old folders being moved into the table's trash folder.

    Starting state:

    /data/db/tbl1/type=hot/region=11
    .../region=12
    .../region=13
    .../region=14

    Final state:

    /data/db/tbl1/type=cold/region=15
    /data/db/.Trash/tbl1/region=11
    .../region=12
    .../region=13
    .../region=14

    tableName

    name of the table

    compactedData

    the dataset with data from cleanUpFolders already repartitioned; it will be saved into newDataPath

    newDataPath

    path into which the combined and repartitioned data from the dataset will be committed

    cleanUpBase

    parent folder from which to remove the cleanUpFolders

    cleanUpFolders

    list of sub-folders to remove once the writing and committing of the combined data is successful
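
The fail-safe ordering described above (write and commit the new data first, only then move the old folders out) can be sketched on a local file system with java.nio.file. This is a minimal sketch under stated assumptions, not the waimak implementation: CompactionSketch, atomicWriteAndCleanupLocal, the trashDir parameter and the writeData callback (standing in for writing compactedData with Spark) are all hypothetical, and java.nio Path is used instead of Hadoop's Path.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical local-file-system sketch of the fail-safe compaction ordering.
object CompactionSketch {

  def atomicWriteAndCleanupLocal(newDataPath: Path,
                                 cleanUpBase: Path,
                                 cleanUpFolders: Seq[String],
                                 trashDir: Path)(writeData: Path => Unit): Unit = {
    // 1. Write the compacted data into a temporary sibling of newDataPath
    //    (same parent, so the later rename stays on one file system).
    val tmp = newDataPath.resolveSibling(newDataPath.getFileName.toString + ".tmp")
    Files.createDirectories(tmp)
    writeData(tmp) // if this throws, steps 2 and 3 never run

    // 2. Commit: rename the fully written folder to its final name.
    Files.move(tmp, newDataPath, StandardCopyOption.ATOMIC_MOVE)

    // 3. Only after the commit succeeded, move the old folders into the trash.
    Files.createDirectories(trashDir)
    cleanUpFolders.foreach { folder =>
      Files.move(cleanUpBase.resolve(folder), trashDir.resolve(folder))
    }
  }
}
```

If writeData throws, neither the commit nor the cleanup step runs, so the hot folders stay where they were.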

  2. abstract def globTablePaths[A](basePath: Path, tableNames: Seq[String], tablePartitions: Seq[String], parFun: PartialFunction[FileStatus, A]): Seq[A]

    Globs a list of table paths with partitions and applies a partial function to collect (filter + map) the results, transforming each FileStatus into a value of type A.

    A

    element type of the returned sequence

    basePath

    parent folder which contains folders with table names

    tableNames

    list of table names to search under

    tablePartitions

    list of partition columns to include in the path

    parFun

    a partial function to transform a FileStatus into a value of type A; statuses for which it is not defined are dropped
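
The collect (filter + map) step can be illustrated with Scala's Seq.collect. In this sketch FileStatusLike is a hypothetical two-field stand-in for Hadoop's FileStatus, GlobCollectSketch and its members are hypothetical names, and the glob itself is omitted: how tableNames and tablePartitions are turned into a glob pattern is not shown here.

```scala
// Hypothetical stand-in for org.apache.hadoop.fs.FileStatus, reduced to two fields.
final case class FileStatusLike(path: String, isDirectory: Boolean)

object GlobCollectSketch {
  // Mirrors the collect (filter + map) step: statuses for which parFun is not
  // defined are dropped; the rest are transformed into values of type A.
  def collectStatuses[A](statuses: Seq[FileStatusLike],
                         parFun: PartialFunction[FileStatusLike, A]): Seq[A] =
    statuses.collect(parFun)

  // Example partial function: keep directories only and return their last path segment.
  val directoryNames: PartialFunction[FileStatusLike, String] = {
    case s if s.isDirectory => s.path.split('/').last
  }
}
```

A Hadoop-backed implementation would presumably obtain the statuses from a glob such as FileSystem.globStatus before applying parFun.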

  3. abstract def listTables(basePath: Path): Seq[String]

    Lists tables in basePath, ignoring any folder/table whose name starts with '.'.

    basePath

    parent folder which contains folders with table names
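
A minimal local-file-system sketch of the listing rule follows. It assumes that only directories count as tables, that a missing basePath yields an empty list, and that names are returned sorted; all three are assumptions, and ListTablesSketch and listTablesLocal are hypothetical names. A folder such as .Trash from the compaction example is skipped because its name starts with '.'.

```scala
import java.io.File

object ListTablesSketch {
  // Lists sub-folders of basePath as table names, skipping any name that
  // starts with '.'; sorted for a deterministic result (an assumption).
  def listTablesLocal(basePath: File): Seq[String] = {
    // listFiles() returns null when basePath does not exist or is not a directory.
    val children = Option(basePath.listFiles()).getOrElse(Array.empty[File])
    children.toSeq
      .filter(f => f.isDirectory && !f.getName.startsWith("."))
      .map(_.getName)
      .sorted
  }
}
```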

  4. abstract def mkdirs(path: Path): Boolean

    Creates folders on the physical storage.

    path

    path to create

    returns

    true if the folder exists or was created without problems, false if any of the folders in the path could not be created
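
The "exists or was created" contract differs from java.io.File.mkdirs, which returns false when the directory already exists. A local equivalent therefore has to check for an existing directory first; the following is a sketch only, and MkdirsSketch and mkdirsLocal are hypothetical names.

```scala
import java.io.File

object MkdirsSketch {
  // java.io.File.mkdirs() returns false for an already existing directory,
  // so check isDirectory first to get "true if exists or was created".
  def mkdirsLocal(path: File): Boolean =
    path.isDirectory || path.mkdirs()
}
```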

  5. abstract def openParquet(path: Path, paths: Path*): Option[Dataset[_]]

    Opens parquet data from the path, which can be a folder or a file. If partitioned sub-folders contain files with slightly different schemas, it will attempt to merge the schemas to accommodate schema evolution.

    path

    path to open

    returns

    Some(dataset) if there is data, None if the path does not exist or cannot be opened

    Exceptions thrown

    Exception in case of connectivity problems

  6. abstract def pathExists(path: Path): Boolean

    Checks if the path exists in the physical storage.

    returns

    true if path exists in the storage layer

  7. abstract def readAuditTableInfo(basePath: Path, tableName: String): Try[AuditTableInfo]

    Reads the table info back.

    basePath

    parent folder which contains folders with table names

    tableName

    name of the table whose info is read

  8. abstract def sparkSession: SparkSession

  9. abstract def writeAuditTableInfo(basePath: Path, info: AuditTableInfo): Try[AuditTableInfo]

    Writes out static data about the audit table into the basePath/table_name/.table_info file.

    basePath

    parent folder which contains folders with table names

    info

    static information about the table that will not change during the table's existence
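
The read/write pair can be sketched as a Try-based round trip through basePath/table_name/.table_info. TableInfoLike is a hypothetical, simplified stand-in for AuditTableInfo that stores its payload as plain UTF-8 text, and TableInfoSketch with its methods is likewise hypothetical; the actual fields and serialization format of AuditTableInfo are not described in this document.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.Try

// Hypothetical, simplified stand-in for AuditTableInfo.
final case class TableInfoLike(tableName: String, payload: String)

object TableInfoSketch {
  // Location described for writeAuditTableInfo: basePath/table_name/.table_info
  def infoFile(basePath: Path, tableName: String): Path =
    basePath.resolve(tableName).resolve(".table_info")

  // I/O failures (permissions, missing folders) are captured in the Try.
  def write(basePath: Path, info: TableInfoLike): Try[TableInfoLike] = Try {
    val file = infoFile(basePath, info.tableName)
    Files.createDirectories(file.getParent)
    Files.write(file, info.payload.getBytes(StandardCharsets.UTF_8))
    info
  }

  // Reading a table that was never written yields a Failure, not an exception.
  def read(basePath: Path, tableName: String): Try[TableInfoLike] = Try {
    val bytes = Files.readAllBytes(infoFile(basePath, tableName))
    TableInfoLike(tableName, new String(bytes, StandardCharsets.UTF_8))
  }
}
```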

  10. abstract def writeParquet(tableName: String, path: Path, ds: Dataset[_]): Unit

    Commits the dataset into the full destination path. The parquet data is first written fully into a temp folder and only then placed into path.

    tableName

    name of the table; only used for writing into the temp folder

    path

    full destination path

    ds

    dataset to write out; no partitioning will be performed on it

    Exceptions thrown

    Exception can be thrown due to access permissions, connectivity problems or Spark UDFs (datasets are executed lazily, so UDF failures surface during the write)
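
The write-to-temp-then-commit behavior can be sketched with java.nio.file: data is written into a staging folder under a temp location keyed by tableName and is moved to path in one rename only after the write finishes. WriteCommitSketch, writeAndCommit, tmpBase and the write callback (standing in for the Spark parquet write) are hypothetical, and the atomic move assumes the temp folder and path are on the same file system.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object WriteCommitSketch {
  // Writes into tmpBase/tableName/staging-* first, then renames the finished
  // folder to path. Assumes tmpBase and path share one file system.
  def writeAndCommit(tmpBase: Path, tableName: String, path: Path)(write: Path => Unit): Unit = {
    val tmpTableDir = Files.createDirectories(tmpBase.resolve(tableName))
    val staging = Files.createTempDirectory(tmpTableDir, "staging-")
    write(staging) // if this throws (e.g. a failing UDF), path is never created
    Files.createDirectories(path.getParent)
    Files.move(staging, path, StandardCopyOption.ATOMIC_MOVE)
  }
}
```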

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. final def notify(): Unit

    Definition Classes
    AnyRef
  14. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  15. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  16. def toString(): String

    Definition Classes
    AnyRef → Any
  17. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
