com.twitter.scalding.commons.source

PailSource

object PailSource extends Serializable

PailSource integrates Scalding with the Pail class from the dfs-datastores library. It lets Scalding sink 1-tuples into subdirectories of a root folder by applying a routing function to each tuple.

See example: https://gist.github.com/krishnanraman/5224937
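
To make the routing idea concrete, here is a minimal standalone sketch of a routing function with the documented shape `(T) => List[String]`. The object name `RoutingSketch`, the root `"pailtest"`, the directory names, the threshold of 50, and the helper `subdirFor` are all hypothetical and chosen only for illustration. Joining the segments with `/` is an assumption about how the returned list maps to a directory, not something this page states. The sketch does not call PailSource itself, so it runs with the Scala standard library alone.

```scala
// Standalone sketch: runs without Scalding or dfs-datastores on the classpath.
object RoutingSketch {
  // Hypothetical root directory for the Pail.
  val rootPath: String = "pailtest"

  // Hypothetical routing function of the documented shape (T) => List[String], with T = Int.
  val targetFn: Int => List[String] =
    n => if (n < 50) List("belowfifty") else List("abovefifty")

  // Illustration only (assumption): join the root and the routed segments into a directory name.
  def subdirFor(n: Int): String = (rootPath :: targetFn(n)).mkString("/")

  def main(args: Array[String]): Unit = {
    // Group a few sample records by the subdirectory they would be routed to.
    val routed = Seq(3, 49, 50, 97).groupBy(subdirFor)
    routed.toSeq.sortBy(_._1).foreach { case (dir, records) =>
      println(s"$dir <- ${records.mkString(", ")}")
    }
    // With Scalding Commons on the classpath, the same function would be passed to the
    // documented two-parameter overload, roughly:
    //   PailSource.sink[Int](rootPath, targetFn)
    // That call also needs an implicit Injection[Int, Array[Byte]] in scope (not shown here).
  }
}
```

A function like `targetFn` is the only piece of logic the two-parameter `sink` overload asks for; the ClassManifest and Injection are supplied implicitly according to its signature.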

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. def sink[T](rootPath: String, targetFn: (T) ⇒ List[String], validator: (List[String]) ⇒ Boolean)(implicit cmf: ClassManifest[T], injection: Injection[T, Array[Byte]]): PailSource[T]

    Alternate sink construction, using an implicit Injection and ClassManifest for the type.

  18. def sink[T](rootPath: String, targetFn: (T) ⇒ List[String], validator: (List[String]) ⇒ Boolean, mytype: Class[T], injection: Injection[T, Array[Byte]]): PailSource[T]

    A Pail sink can also build its structure on the fly from two functions: the routing function (targetFn) and the validator.

  19. def sink[T](rootPath: String, structure: PailStructure[T]): PailSource[T]

    Generic version of the Pail sink; accepts a PailStructure.

  20. def sink[T](rootPath: String, targetFn: (T) ⇒ List[String])(implicit cmf: ClassManifest[T], injection: Injection[T, Array[Byte]]): PailSource[T]

    The simplest version of sink, and the most common use case. Specify exactly two parameters:

    rootPath - the location, i.e. where you want your Pail to reside.
    targetFn - the partition function, i.e. how Pail subdirectories are created from your input space.

    See example: https://gist.github.com/krishnanraman/5224937

  21. def source[T](rootPath: String, validator: (List[String]) ⇒ Boolean, subPaths: Array[List[String]])(implicit cmf: ClassManifest[T], injection: Injection[T, Array[Byte]]): PailSource[T]

    Alternate Pail source construction: specify three parameters; the ClassManifest and Injection are implicit.

  22. def source[T](rootPath: String, validator: (List[String]) ⇒ Boolean, mytype: Class[T], injection: Injection[T, Array[Byte]], subPaths: Array[List[String]]): PailSource[T]

    The most explicit way to construct a Pail source: specify all five parameters.

  23. def source[T](rootPath: String, structure: PailStructure[T], subPaths: Array[List[String]]): PailSource[T]

    Generic version of the Pail source; accepts a PailStructure.

  24. def source[T](rootPath: String, subPaths: Array[List[String]])(implicit cmf: ClassManifest[T], injection: Injection[T, Array[Byte]]): PailSource[T]

    The simplest version of source, and the most common use case. Specify exactly two parameters:

    rootPath - the root directory of your Pail.
    subPaths - the subdirectories of your Pail to read.

    For example, say your data resides in foo/bar, foo/obj and foo/ghj. If you care about obj and ghj, then rootPath = "foo" and subPaths = Array(List("obj"), List("ghj")). Note that subPaths must not be Array(List("obj", "ghj")); that would fail. Every subdirectory goes in its own list.

    See example: https://gist.github.com/krishnanraman/5224937

  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
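
The `subPaths` shape described for `source`, and the `validator` parameter taken by several overloads, can be sketched in plain Scala as follows. The object name `SubPathsSketch`, the helper `resolve`, and the validator body are hypothetical. The sketch assumes that each list is read as nested path segments under `rootPath`, which would explain why a single list naming two sibling directories fails; this page does not state exactly how PailSource resolves the lists or when it calls the validator.

```scala
// Standalone sketch: runs without Scalding or dfs-datastores on the classpath.
object SubPathsSketch {
  // Layout from the description above: data lives in foo/bar, foo/obj and foo/ghj.
  val rootPath: String = "foo"

  // Documented shape: one List per subdirectory.
  val subPaths: Array[List[String]] = Array(List("obj"), List("ghj"))

  // The shape the description warns against: one List naming both directories.
  val wrongSubPaths: Array[List[String]] = Array(List("obj", "ghj"))

  // Assumption for illustration: each list is joined under the root as nested segments.
  def resolve(root: String, paths: Array[List[String]]): List[String] =
    paths.toList.map(segments => (root :: segments).mkString("/"))

  // Hypothetical validator of the documented shape (List[String]) => Boolean.
  // It accepts only the two known single-level directories.
  val validator: List[String] => Boolean = {
    case List(dir) => Set("obj", "ghj").contains(dir)
    case _         => false
  }

  def main(args: Array[String]): Unit = {
    println(resolve(rootPath, subPaths))      // two sibling directories under foo
    println(resolve(rootPath, wrongSubPaths)) // one nested directory, foo/obj/ghj
    println(subPaths.forall(validator))
    println(wrongSubPaths.forall(validator))
  }
}
```

Under that assumption, `Array(List("obj", "ghj"))` points at a single nested directory that does not exist in the example layout, while `Array(List("obj"), List("ghj"))` names the two directories that do.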
