Alternate sink construction, using an implicit injection and a ClassManifest for the type
A Pail sink can also build its structure on the fly from a couple of functions.
Generic version of Pail sink accepts a PailStructure.
The simplest version of sink, and THE MOST COMMON USE CASE: specify exactly 2 parameters.
rootPath - the location, i.e. where do you want your Pail to reside?
targetFn - the partition function, i.e. how do we create Pail subdirectories out of your input space?
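To make the role of targetFn more concrete, here is a minimal sketch in plain Scala that does not call the library at all. The names PartitionSketch, relativeDir and route are hypothetical, and the sketch assumes that each element of the list returned by targetFn becomes one nested directory level under rootPath.

```scala
object PartitionSketch {
  // Hypothetical partition function: maps one record to the list of
  // directory components it should be stored under.
  val targetFn: Int => List[String] =
    n => List(if (n % 2 == 0) "even" else "odd", if (n < 10) "small" else "large")

  // Assumption: the list elements are joined into a nested relative path.
  def relativeDir(n: Int): String = targetFn(n).mkString("/")

  // Simulates the routing step: group records by the directory they map to.
  def route(records: Seq[Int]): Map[String, Seq[Int]] =
    records.groupBy(relativeDir)
}
```

Under that assumption, PartitionSketch.route(Seq(1, 2, 12)) places 1 under odd/small, 2 under even/small and 12 under even/large. A function of this shape is what would be passed as targetFn alongside rootPath; see the linked gist for the actual call.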
SEE EXAMPLE : https://gist.github.com/krishnanraman/5224937
Alternate Pail source construction - specify 3 params, rest implicit
The most explicit method to construct a Pail source - specify all 5 params
Generic version of Pail source accepts a PailStructure.
The simplest version of source, and THE MOST COMMON USE CASE: specify exactly 2 parameters.
rootPath - the location, i.e. where does your Pail reside (its root directory)?
subPaths - the location, i.e. where does your Pail reside (its subdirectories)?
E.g. say your data resides in foo/bar, foo/obj, foo/ghj. If you care about obj & ghj, then rootPath = "foo" and subPaths = Array(List("obj"), List("ghj")).
Notice that subPaths != Array(List("obj", "ghj")); this would fail. Every subdirectory goes in its own list.
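The difference between the two subPaths shapes can be illustrated with a small stand-alone sketch. The helper SubPathSketch.resolve is hypothetical and is not part of the library; it assumes that each inner list is read as a sequence of nested path components below rootPath, which would explain why putting both names in one list fails.

```scala
object SubPathSketch {
  // Hypothetical helper: shows which directories a subPaths value would
  // refer to, assuming each inner list is one nested path under rootPath.
  def resolve(rootPath: String, subPaths: Array[List[String]]): List[String] =
    subPaths.toList.map(parts => (rootPath :: parts).mkString("/"))
}
```

With this reading, Array(List("obj"), List("ghj")) refers to foo/obj and foo/ghj, while Array(List("obj", "ghj")) refers to the single nested directory foo/obj/ghj, which does not exist in the example layout.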
SEE EXAMPLE : https://gist.github.com/krishnanraman/5224937
The PailSource enables scalding integration with the Pail class in the dfs-datastores library. PailSource allows scalding to sink 1-tuples to subdirectories of a root folder by applying a routing function to each tuple.
SEE EXAMPLE : https://gist.github.com/krishnanraman/5224937