Object

com.coxautodata.waimak.filesystem

FSUtils

object FSUtils extends Logging

Created by Alexei Perelighin on 23/10/17.

Linear Supertypes
Logging, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  13. def keepNotPresent[O](fs: FileSystem, inParentFolder: Path, toTest: Seq[O])(getObjectPath: (O) ⇒ Path): Seq[O]

    Checks whether the objects in the toTest collection can be mapped to existing folders in HDFS and returns the objects that have not yet been mapped to an HDFS folder.

    The implementation is quite efficient: it uses an HDFS PathFilter and avoids globs and full directory listings, which could be very large.

    For example, with the inputs:
    1) the HDFS folder inParentFolder contains partition folders, one per day from 2017/01/01 to 2017/03/15;
    2) toTest is a suggested range of dates from 2017/03/10 to 2017/03/20;

    the output is the list of dates from 2017/03/16 to 2017/03/20.

    inParentFolder

    - HDFS folder that contains the folders that could be mapped to the tested objects

    getObjectPath

    - maps a tested object to its HDFS path

    returns

    objects from toTest that could not be mapped into the folder inParentFolder via the function getObjectPath
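
    A minimal in-memory sketch of the filtering idea behind keepNotPresent, reproducing the date example above. It does not use Hadoop: the hypothetical parameter existingFolders stands in for the sub-folder names that a filtered listing of inParentFolder would return, getObjectName is a simplified getObjectPath that yields only a folder name, and naming each daily folder by its ISO date is an assumption.

```scala
import java.time.LocalDate

// In-memory sketch of keepNotPresent. NOT the Hadoop implementation:
// `existingFolders` is a hypothetical stand-in for the child folder names of
// inParentFolder, and `getObjectName` is a simplified getObjectPath.
object KeepNotPresentSketch {

  def keepNotPresent[O](existingFolders: Iterator[String], toTest: Seq[O])(getObjectName: O => String): Seq[O] = {
    // Folder names the caller is asking about.
    val candidates: Set[String] = toTest.map(getObjectName).toSet
    // Filter-style listing: only folders whose names are candidates are kept,
    // so the retained set is never larger than toTest.
    val present: Set[String] = existingFolders.filter(candidates.contains).toSet
    // Keep the objects whose folder does not exist yet, preserving input order.
    toTest.filterNot(o => present.contains(getObjectName(o)))
  }

  // Inclusive range of days, used to reproduce the documented example.
  def days(from: LocalDate, to: LocalDate): List[LocalDate] =
    Iterator.iterate(from)(_.plusDays(1)).takeWhile(d => !d.isAfter(to)).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical folder naming: one folder per day, named by its ISO date.
    val existing = days(LocalDate.of(2017, 1, 1), LocalDate.of(2017, 3, 15)).map(_.toString)
    val toTest = days(LocalDate.of(2017, 3, 10), LocalDate.of(2017, 3, 20))
    val missing = keepNotPresent(existing.iterator, toTest)(_.toString)
    println(missing.mkString(", "))
  }
}
```

    Running main prints the five dates from 2017-03-16 to 2017-03-20. Only names that also appear in toTest are retained from the listing, so memory use is bounded by toTest rather than by the number of folders.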

  14. def listPartitions(fs: FileSystem, folder: String): Seq[(String, String)]

    Lists the Hive partition column names and their values by looking into the folder.

    returns

    (PARTITION COLUMN NAME, VALUE) pairs
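
    A minimal sketch of how Hive-style partition folder names can be turned into (PARTITION COLUMN NAME, VALUE) pairs. It assumes the child folder names of folder have already been listed (the Hadoop listing is left out) and that each partition folder follows the Hive column=value naming convention; Hive's escaping of special characters in values is not undone. The object and method names are hypothetical.

```scala
// Sketch only: parses already-listed child folder names; no Hadoop calls.
// Assumes the Hive `column=value` naming convention for partition folders.
object ListPartitionsSketch {

  def parsePartitions(childNames: Seq[String]): Seq[(String, String)] =
    childNames.flatMap { name =>
      val i = name.indexOf('=')
      // Names without '=' (or with an empty column name) are not partitions.
      if (i > 0) Some(name.substring(0, i) -> name.substring(i + 1)) else None
    }

  def main(args: Array[String]): Unit = {
    val children = Seq("date=2017-03-10", "date=2017-03-11", "_SUCCESS")
    parsePartitions(children).foreach(println)
  }
}
```

    Entries without an = sign, such as a _SUCCESS marker, are skipped, and only the first = separates the column name from the value.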

  15. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  16. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  17. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  18. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  19. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  22. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def moveAll(fs: FileSystem, subs: Seq[String], fromPath: Path, toPath: Path): Boolean

    Moves all sub-folders listed in subs from fromPath into toPath. If a folder already exists in the destination, it is overwritten. It uses an efficient approach that minimises the number of calls to HDFS for checks and validations, which could otherwise add a significant amount of time to the end-to-end execution.

    fs

    - current Hadoop file system

    subs

    - sub-folders to move, usually folders in the staging folder

    fromPath

    - parent folder that contains the sub-folders

    toPath

    - folder into which the subs folders are moved; any that already exist there are overwritten
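
    A sketch of the move-with-overwrite pattern described above, written as a local-filesystem analogue: it uses java.nio.file rather than the Hadoop FileSystem API, and the object and helper names are hypothetical. What it illustrates is the single listing of toPath, which replaces a separate existence check for each entry in subs. Returning true only when every sub ended up in toPath is an assumption, since the documented return value is not described.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

// Local-filesystem analogue of moveAll built on java.nio.file, NOT the Hadoop
// FileSystem API; an assumption made so the sketch runs on its own.
object MoveAllSketch {

  // Deletes a directory tree; reverse path order visits children before parents.
  def deleteRecursively(p: Path): Unit = {
    val stream = Files.walk(p)
    try stream.sorted(Comparator.reverseOrder[Path]()).forEach((q: Path) => Files.delete(q))
    finally stream.close()
  }

  def moveAll(subs: Seq[String], fromPath: Path, toPath: Path): Boolean = {
    Files.createDirectories(toPath)
    // One listing of the destination instead of one existence check per sub.
    val existing: Set[String] = Option(toPath.toFile.list()).map(_.toSet).getOrElse(Set.empty)
    // Overwrite semantics: clear destination folders that would clash.
    subs.filter(existing.contains).foreach(s => deleteRecursively(toPath.resolve(s)))
    subs.foreach(s => Files.move(fromPath.resolve(s), toPath.resolve(s)))
    // Assumed meaning of the Boolean: every sub is now a folder under toPath.
    subs.forall(s => Files.isDirectory(toPath.resolve(s)))
  }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("moveall")
    val staging = Files.createDirectories(root.resolve("staging"))
    val target = Files.createDirectories(root.resolve("target"))
    Files.createDirectories(staging.resolve("day=1"))
    // An older copy of day=1 already exists in the destination.
    Files.createDirectories(target.resolve("day=1").resolve("old"))
    println(moveAll(Seq("day=1"), staging, target))
    println(Files.exists(target.resolve("day=1").resolve("old")))
  }
}
```

    main prints true and then false: the older day=1 folder in the destination is deleted before the staged copy is moved in.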

  27. def moveOverwriteFolder(fs: FileSystem, toMove: Path, toPath: Path): Boolean

    Moves toMove into toPath. The parent folder of toPath is created if it does not exist.

    fs

    - FileSystem, which can be HDFS or local.

    toMove

    - full path to the folder to be moved.

    toPath

    - full destination path, including the folder name itself.

    returns

    true if the move was successful.
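
    A local-filesystem sketch of moveOverwriteFolder, using java.nio.file instead of the Hadoop FileSystem API; the object name is hypothetical. Creating the missing parent of toPath follows the description above. Removing an existing folder at toPath first is an assumption suggested by the method name, not something the description states.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

// Local-filesystem analogue of moveOverwriteFolder using java.nio.file, NOT the
// Hadoop FileSystem API. Assumption (from the method name, not its description):
// an existing folder at toPath is removed before the move.
object MoveOverwriteSketch {

  private def deleteRecursively(p: Path): Unit = {
    val stream = Files.walk(p)
    // Reverse path order visits children before their parents.
    try stream.sorted(Comparator.reverseOrder[Path]()).forEach((q: Path) => Files.delete(q))
    finally stream.close()
  }

  def moveOverwriteFolder(toMove: Path, toPath: Path): Boolean = {
    if (Files.exists(toPath)) deleteRecursively(toPath)
    // As documented: create the parent of toPath if it does not exist.
    Files.createDirectories(toPath.getParent)
    Files.move(toMove, toPath)
    // Report success when the folder now lives only at its new location.
    Files.isDirectory(toPath) && !Files.exists(toMove)
  }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("moveoverwrite")
    val src = Files.createDirectories(root.resolve("staging").resolve("table"))
    // toPath includes the folder name; its parent warehouse/db does not exist yet.
    val dest = root.resolve("warehouse").resolve("db").resolve("table")
    println(moveOverwriteFolder(src, dest))
  }
}
```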

  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. def removeFolder(fs: FileSystem, folder: String): Unit

    Deletes the folder with all of its content; if the folder does not exist, does nothing.
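
    A local-filesystem sketch of the same behaviour, using java.nio.file rather than the Hadoop FileSystem API and taking a Path instead of the documented fs and folder: String parameters; the object name is hypothetical. It shows the two documented properties: the whole tree is removed, and a missing folder is silently ignored.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

// Local-filesystem analogue of removeFolder using java.nio.file; a Hadoop
// version would go through FileSystem instead. A missing folder is a no-op.
object RemoveFolderSketch {

  def removeFolder(folder: Path): Unit =
    if (Files.exists(folder)) {
      val stream = Files.walk(folder)
      // Reverse path order deletes children before their parent directories.
      try stream.sorted(Comparator.reverseOrder[Path]()).forEach((p: Path) => Files.delete(p))
      finally stream.close()
    }

  def main(args: Array[String]): Unit = {
    val dir = Files.createDirectories(Files.createTempDirectory("rm").resolve("a").resolve("b"))
    Files.write(dir.resolve("part-0000"), Array[Byte](1, 2, 3))
    val top = dir.getParent
    removeFolder(top)
    removeFolder(top) // second call: the folder is gone, so nothing happens
    println(Files.exists(top))
  }
}
```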

  32. def removeSubFoldersPresentInList(fs: FileSystem, folder: Path, subs: Seq[String]): Boolean

    Checks whether any folders with the names in subs already exist in folder and removes them. The main benefit is that it performs the checks in one round trip to HDFS, which matters in day-zero scenarios, where checking folders one by one could take a lot of time.

    folder

    - parent folder in which to check for existing sub-folders

    subs

    - names to check; a name that is not present is ignored, and one that is present is removed

    returns

    - true if everything was fine
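
    A local-filesystem sketch of the single-round-trip idea, using java.nio.file rather than the Hadoop FileSystem API; the object name is hypothetical. The parent is listed once, and only names from subs that appear in that listing are deleted. Interpreting "everything was fine" as "every matched folder is gone" is an assumption.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

// Local-filesystem analogue of removeSubFoldersPresentInList using
// java.nio.file, NOT the Hadoop API. The parent is listed once and only names
// from subs that appear in that listing are deleted.
object RemoveSubFoldersSketch {

  private def deleteRecursively(p: Path): Unit = {
    val stream = Files.walk(p)
    try stream.sorted(Comparator.reverseOrder[Path]()).forEach((q: Path) => Files.delete(q))
    finally stream.close()
  }

  def removeSubFoldersPresentInList(folder: Path, subs: Seq[String]): Boolean = {
    // Single listing; File.list returns null when folder does not exist.
    val existing: Set[String] = Option(folder.toFile.list()).map(_.toSet).getOrElse(Set.empty)
    val toRemove = subs.filter(existing.contains)
    toRemove.foreach(s => deleteRecursively(folder.resolve(s)))
    // Assumed meaning of "everything was fine": all matched folders are gone.
    toRemove.forall(s => !Files.exists(folder.resolve(s)))
  }

  def main(args: Array[String]): Unit = {
    val parent = Files.createTempDirectory("subs")
    Files.createDirectories(parent.resolve("day=1"))
    Files.createDirectories(parent.resolve("day=2"))
    // day=3 is not present, so it is simply ignored.
    println(removeSubFoldersPresentInList(parent, Seq("day=1", "day=3")))
    println(Files.exists(parent.resolve("day=2")))
  }
}
```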

  33. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  34. def toString(): String

    Definition Classes
    AnyRef → Any
  35. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
