



package typed

Type Members

  1. class BijectedSourceSink[T, U] extends TypedSource[U] with TypedSink[U]

  2. trait CoGroupable[K, +R] extends HasReducers with HasDescription with Serializable


    Represents something that can be CoGrouped with another CoGroupable.

  3. trait CoGrouped[K, +R] extends KeyedListLike[K, R, CoGrouped] with CoGroupable[K, R] with WithReducers[CoGrouped[K, R]] with WithDescription[CoGrouped[K, R]]

  4. abstract class CoGroupedJoiner[K] extends Joiner

  5. case class ComputedValue[T](toTypedPipe: TypedPipe[T]) extends ValuePipe[T] with Product with Serializable

  6. case class Converter[R](conv: TupleConverter[R]) extends FlatMapFn[R] with Product with Serializable

  7. class DistinctCoGroupJoiner[K] extends CoGroupedJoiner[K]

  8. case class FilteredFn[R](fmap: FlatMapFn[R], fn: (R) ⇒ Boolean) extends FlatMapFn[R] with Product with Serializable

  9. sealed trait FlatMapFn[+R] extends (TupleEntry) ⇒ TraversableOnce[R] with Serializable


    Closures are difficult for serialization. This class avoids that.

  10. case class FlatMappedFn[T, R](fmap: FlatMapFn[T], fn: (T) ⇒ TraversableOnce[R]) extends FlatMapFn[R] with Product with Serializable

  11. trait Grouped[K, +V] extends KeyedListLike[K, V, UnsortedGrouped] with HashJoinable[K, V] with Sortable[V, [+x]SortedGrouped[K, x] with Reversable[SortedGrouped[K, x]]] with WithReducers[Grouped[K, V]] with WithDescription[Grouped[K, V]]


    This encodes the rules that 1) sorting is only possible before doing any reduce, 2) reversing is only possible after sorting, and 3) unsorted Groups can be CoGrouped or HashJoined.

    This may appear to be a complex type, but it ensures that code that breaks these rules won't compile.

  12. trait HasDescription extends AnyRef


    Used for objects that may have a description set to be used in .dot and MR step names.

  13. trait HasReducers extends AnyRef


    Used for types that may know how many reducers they need, e.g. CoGrouped, Grouped, SortedGrouped, UnsortedGrouped.

  14. sealed trait HashEqualsArrayWrapper[T] extends AnyRef

  15. final class HashEqualsBooleanArrayWrapper extends HashEqualsArrayWrapper[Boolean]

  16. final class HashEqualsByteArrayWrapper extends HashEqualsArrayWrapper[Byte]

  17. final class HashEqualsCharArrayWrapper extends HashEqualsArrayWrapper[Char]

  18. final class HashEqualsDoubleArrayWrapper extends HashEqualsArrayWrapper[Double]

  19. final class HashEqualsFloatArrayWrapper extends HashEqualsArrayWrapper[Float]

  20. final class HashEqualsIntArrayWrapper extends HashEqualsArrayWrapper[Int]

  21. final class HashEqualsLongArrayWrapper extends HashEqualsArrayWrapper[Long]

  22. final class HashEqualsObjectArrayWrapper[T] extends HashEqualsArrayWrapper[T]

  23. final class HashEqualsShortArrayWrapper extends HashEqualsArrayWrapper[Short]

  24. trait HashJoinable[K, +V] extends CoGroupable[K, V] with KeyedPipe[K]


    If we can HashJoin, then we can CoGroup, but not vice versa; i.e., HashJoinable is a strict subset of CoGroupable (CoGrouped, for instance, is CoGroupable but not HashJoinable).

  25. class HashJoiner[K, V, W, R] extends Joiner


    Only intended to be used to implement the hashCogroup on TypedPipe/Grouped.

  26. case class IdentityReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with Grouped[K, V1] with Product with Serializable

  27. case class IdentityValueSortedReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], valueSort: Ordering[_ >: V1], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with SortedGrouped[K, V1] with Reversable[IdentityValueSortedReduce[K, V1]] with Product with Serializable

  28. final case class IterablePipe[T](iterable: Iterable[T]) extends TypedPipe[T] with Product with Serializable


    Creates a TypedPipe from an Iterable[T]. Prefer TypedPipe.from.

    If you avoid toPipe, this class is more efficient than IterableSource.

  29. case class IteratorMappedReduce[K, V1, V2](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reduceFn: (K, Iterator[V1]) ⇒ Iterator[V2], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with UnsortedGrouped[K, V2] with Product with Serializable

  30. trait KeyedList[K, +T] extends KeyedListLike[K, T, KeyedList]


    This is for the case where you don't want to expose any structure but the ability to operate on an iterator of the values

  31. trait KeyedListLike[K, +T, +This[K, +T] <: KeyedListLike[K, T, This]] extends Serializable


    Represents sharded lists of items of type T. There are exactly two fundamental operations:

    toTypedPipe: marks the end of the grouped-on-key operations.
    mapValueStream: further transforms all values, in order, one at a time, with a function from Iterator to another Iterator.
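    The two operations can be pictured with an in-memory model. The sketch below is plain Scala, not Scalding: the `Keyed` class and its `toPairs` method are hypothetical stand-ins for a keyed list and for `toTypedPipe`, and the only assumption is that each key's values are handed to the function as an Iterator.

```scala
object KeyedListSketch {
  // Hypothetical in-memory stand-in for a keyed list: one List of values per key.
  final case class Keyed[K, V](groups: Map[K, List[V]]) {
    // Transform every key's values with an Iterator => Iterator function.
    def mapValueStream[U](fn: Iterator[V] => Iterator[U]): Keyed[K, U] =
      Keyed(groups.map { case (k, vs) => k -> fn(vs.iterator).toList })

    // Leave the keyed world again: flatten back to (key, value) pairs.
    def toPairs: List[(K, V)] =
      groups.toList.flatMap { case (k, vs) => vs.map(v => (k, v)) }
  }
}
```

    For instance, `mapValueStream(_.take(2))` keeps at most two values per key without ever looking at the others.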

  32. trait KeyedPipe[K] extends AnyRef


    Represents anything that starts as a TypedPipe of Key Value, where the value type has been erased. Acts as proof that the K in the tuple has an Ordering

  33. case class LiteralValue[T](value: T) extends ValuePipe[T] with Product with Serializable

  34. case class MapFn[T, R](fmap: FlatMapFn[T], fn: (T) ⇒ R) extends FlatMapFn[R] with Product with Serializable

  35. class MappablePipeJoinEnrichment[T] extends AnyRef


    This class is for the syntax enrichment enabling .joinBy on TypedPipes. To access this, do import Syntax.joinOnMappablePipe

  36. class MemorySink[T] extends TypedSink[T]

  37. final case class MergedTypedPipe[T](left: TypedPipe[T], right: TypedPipe[T]) extends TypedPipe[T] with Product with Serializable

  38. trait MustHaveReducers extends HasReducers


    Used for types that must know how many reducers they need, e.g. Sketched.

  39. sealed trait NoStackAndThen[-A, +B] extends Serializable


    This type is used to implement .andThen on a function in a way that will never blow up the stack. This is done to prevent deep scalding TypedPipe pipelines from blowing the stack.

    This may be slow, but it is only used in scalding at planning time.
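    One way to get a stack-safe .andThen is to store the composed steps as data and run them in a loop. The sketch below illustrates that idea only; the `Chain` class is hypothetical and is not the actual NoStackAndThen implementation, which may be structured differently.

```scala
object StackSafeCompose {
  // Hypothetical illustration: the composed functions are kept in a Vector
  // and applied iteratively, so call depth does not grow with chain length.
  final class Chain[A, B] private (steps: Vector[Any => Any]) {
    def andThen[C](f: B => C): Chain[A, C] =
      new Chain[A, C](steps :+ f.asInstanceOf[Any => Any])

    def apply(a: A): B = {
      var current: Any = a
      val it = steps.iterator
      while (it.hasNext) current = it.next()(current)
      current.asInstanceOf[B]
    }
  }

  object Chain {
    def apply[A, B](f: A => B): Chain[A, B] =
      new Chain[A, B](Vector(f.asInstanceOf[Any => Any]))
  }
}
```

    Composing the same number of steps with `Function1.andThen` nests one call per step, which is what can overflow the stack on very deep pipelines.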

  40. trait PartitionSchemed[P, T] extends SchemedSource with TypedSink[(P, T)] with Mappable[(P, T)] with HfsTapProvider


    Trait to assist with creating partitioned sources.

    Apart from the abstract members below, hdfsScheme and localScheme also need to be set. Note that for both of them the sink fields need to be set to only include the actual fields that should be written to file and not the partition fields.

  41. trait PartitionedDelimited extends Serializable


    Trait to assist with creating objects such as PartitionedTsv to read from separated files. Override separator, skipHeader, writeHeader as needed.

  42. case class PartitionedDelimitedSource[P, T](path: String, template: String, separator: String, fields: Fields, skipHeader: Boolean = false, writeHeader: Boolean = false, quote: String = "\"", strict: Boolean = true, safe: Boolean = true)(implicit mt: Manifest[T], valueSetter: TupleSetter[T], valueConverter: TupleConverter[T], partitionSetter: TupleSetter[P], partitionConverter: TupleConverter[P]) extends SchemedSource with PartitionSchemed[P, T] with Serializable with Product with Serializable


    Scalding source to read or write partitioned delimited text.

    For writing it expects a pair of (P, T), where P is the data used for partitioning and T is the output to write out. Below is an example.

    val data = List(
      (("a", "x"), ("i", 1)),
      (("a", "y"), ("j", 2)),
      (("b", "z"), ("k", 3))
    )
    IterablePipe(data, flowDef, mode)
      .write(PartitionedDelimited[(String, String), (String, Int)](args("out"), "col1=%s/col2=%s"))

    For reading it produces a pair (P, T) where P is the partition data and T is data in the files. Below is an example.

    val in: TypedPipe[((String, String), (String, Int))] = PartitionedDelimited[(String, String), (String, Int)](args("in"), "col1=%s/col2=%s")

  43. case class PartitionedTextLine[P](path: String, template: String, encoding: String = TextLine.DEFAULT_CHARSET)(implicit valueSetter: TupleSetter[String], valueConverter: TupleConverter[(Long, String)], partitionSetter: TupleSetter[P], partitionConverter: TupleConverter[P]) extends SchemedSource with TypedSink[(P, String)] with Mappable[(P, (Long, String))] with HfsTapProvider with Serializable with Product with Serializable


    Scalding source to read or write partitioned text.

    For writing it expects a pair of (P, String), where P is the data used for partitioning and String is the output to write out. Below is an example.

    val data = List(
      (("a", "x"), "line1"),
      (("a", "y"), "line2"),
      (("b", "z"), "line3")
    )
    IterablePipe(data, flowDef, mode)
      .write(PartitionedTextLine[(String, String)](args("out"), "col1=%s/col2=%s"))

    For reading it produces a pair (P, (Long, String)) where P is the partition data, Long is the offset into the file and String is a line from the file. Below is an example.

    val in: TypedPipe[((String, String), (Long, String))] = PartitionedTextLine[(String, String)](args("in"), "col1=%s/col2=%s")

    path: Base path of the partitioned directory

    template: Template for the partitioned path

    encoding: Text encoding of the file content

  44. class PipeTExtensions extends Serializable

  45. sealed trait ReduceStep[K, V1] extends KeyedPipe[K]


    This is a class that models the logical portion of the reduce step. Details like where this occurs, the number of reducers, etc. are left in the Grouped class.

  46. trait Reversable[+R] extends AnyRef

  47. case class SketchJoined[K, V, V2, R](left: Sketched[K, V], right: TypedPipe[(K, V2)], numReducers: Int)(joiner: (K, V, Iterable[V2]) ⇒ Iterator[R])(implicit evidence$1: Ordering[K]) extends MustHaveReducers with Product with Serializable

  48. case class Sketched[K, V](pipe: TypedPipe[(K, V)], numReducers: Int, delta: Double, eps: Double, seed: Int)(implicit serialization: (K) ⇒ Array[Byte], ordering: Ordering[K]) extends MustHaveReducers with Product with Serializable


    This class is generally only created by users with the TypedPipe.sketch method

  49. trait Sortable[+T, +Sorted[+_]] extends AnyRef


    All sorting methods defined here trigger Hadoop secondary sort on key + value. Hadoop secondary sort is external sorting, i.e. it won't materialize all values of each key in memory on the reducer.

  50. trait SortedGrouped[K, +V] extends KeyedListLike[K, V, SortedGrouped] with WithReducers[SortedGrouped[K, V]] with WithDescription[SortedGrouped[K, V]]


    After sorting, we are no longer CoGroupable, and we can only call reverse in the initial SortedGrouped created from the Sortable: .sortBy(_._2).reverse for instance

    Once we have sorted, we cannot do a HashJoin or a CoGrouping

  51. case class TemplatePartition(partitionFields: Fields, template: String) extends Partition with Product with Serializable


    Creates a partition using the given template string.

    The template string needs to have %s as placeholder for a given field.
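    The placeholder substitution can be pictured with ordinary string formatting. This is only a plain-Scala illustration of how one %s per partition field is filled in; the `render` helper is hypothetical, and how the underlying partition implementation actually builds paths is not shown here.

```scala
object TemplateSketch {
  // Fill each %s placeholder, in order, with one partition field value.
  def render(template: String, values: Seq[String]): String =
    template.format(values: _*)
}
```

    With the template "col1=%s/col2=%s" used in the examples of this package, the partition values ("a", "x") would correspond to the relative path "col1=a/col2=x".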

  52. trait TypedPipe[+T] extends Serializable


    Think of a TypedPipe as a distributed unordered list that may or may not yet have been materialized in memory or disk.

    Represents a phase in a distributed computation on an input data source. Wraps a cascading Pipe object and holds the transformations done up until that point.

  53. class TypedPipeFactory[T] extends TypedPipe[T]


    This is a TypedPipe that delays having access to the FlowDef and Mode until toPipe is called

  54. class TypedPipeInst[T] extends TypedPipe[T]


    This is an instance of a TypedPipe that wraps a cascading Pipe

  55. trait TypedSink[-T] extends Serializable


    The opposite of TypedSource: a destination that values of type T are written into.

  56. trait TypedSource[+T] extends Serializable

  57. trait UnsortedGrouped[K, +V] extends KeyedListLike[K, V, UnsortedGrouped] with HashJoinable[K, V] with WithReducers[UnsortedGrouped[K, V]] with WithDescription[UnsortedGrouped[K, V]]


    This is the state after we have done some reducing. It is not possible to sort at this phase, but it is possible to do a CoGrouping or a HashJoin.

  58. case class UnsortedIdentityReduce[K, V1](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with UnsortedGrouped[K, V1] with Product with Serializable

  59. sealed trait ValuePipe[+T] extends Serializable


    ValuePipe is a special case of a TypedPipe holding just an optional single element. It is like a distributed Option type. It allows scalar-based operations on pipes, such as normalization.
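    The Option analogy can be sketched in plain Scala. Nothing below uses Scalding: a List stands in for a pipe, an Option stands in for the single optional value, and the `normalize` helper is hypothetical. The choice to return an empty list when the total is missing or zero is an assumption of this sketch.

```scala
object ValuePipeSketch {
  // Divide every element by a scalar that may or may not be present.
  def normalize(xs: List[Double], total: Option[Double]): List[Double] =
    total match {
      case Some(t) if t != 0.0 => xs.map(_ / t)
      case _                   => Nil // no usable scalar: nothing to emit
    }
}
```

    Here the scalar would typically be an aggregate of the same data, such as its sum, so that the normalized values add up to one.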

  60. case class ValueSortedReduce[K, V1, V2](keyOrdering: Ordering[K], mapped: TypedPipe[(K, V1)], valueSort: Ordering[_ >: V1], reduceFn: (K, Iterator[V1]) ⇒ Iterator[V2], reducers: Option[Int], descriptions: Seq[String]) extends ReduceStep[K, V1] with SortedGrouped[K, V2] with Product with Serializable

  61. trait WithDescription[+This <: WithDescription[This]] extends HasDescription


    Used for objects that may _set_ a description to be used in .dot and MR step names.

  62. case class WithDescriptionTypedPipe[T](typedPipe: TypedPipe[T], description: String) extends TypedPipe[T] with Product with Serializable

  63. case class WithOnComplete[T](typedPipe: TypedPipe[T], fn: () ⇒ Unit) extends TypedPipe[T] with Product with Serializable

  64. trait WithReducers[+This <: WithReducers[This]] extends HasReducers


    Used for objects that may _set_ how many reducers they need, e.g. CoGrouped, Grouped, SortedGrouped, UnsortedGrouped.

Value Members

  1. object BijectedSourceSink extends Serializable

  2. object CoGroupable extends Serializable

  3. object CoGrouped extends Serializable

  4. object CumulativeSum


    Extension for TypedPipe to add a cumulativeSum method. Given a TypedPipe with T = (GroupField, (SortField, SummableField)), cumulativeSum will return a SortedGrouped with the SummableField accumulated according to the sort field. For example:

      ('San Francisco', (100, 100)),
      ('San Francisco', (101, 50)),
      ('San Francisco', (200, 200)),
      ('Vancouver', (100, 50)),
      ('Vancouver', (101, 300)),
      ('Vancouver', (200, 100))

    becomes

      ('San Francisco', (100, 100)),
      ('San Francisco', (101, 150)),
      ('San Francisco', (200, 350)),
      ('Vancouver', (100, 50)),
      ('Vancouver', (101, 350)),
      ('Vancouver', (200, 450))

    If you provide cumulativeSum with a partition function, you get the same result but allow for more than one reducer per group. This is useful when you have a single group that has a very large number of entries. For example, if you gave the previous example a partition function of the form { _ / 100 }, then no single reducer would ever deal with more than 2 entries.
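    The accumulation itself can be reproduced on in-memory collections. The sketch below is plain Scala rather than the Scalding extension: `cumulativeSum` here is a hypothetical local function that groups by key, sorts by the sort field, and keeps a running total, which gives 100, 150, 350 for San Francisco and 50, 350, 450 for Vancouver on the example data.

```scala
object CumulativeSumSketch {
  // Group by key, sort each group by the sort field, then keep a running total.
  def cumulativeSum[K, S: Ordering, V: Numeric](rows: Seq[(K, (S, V))]): Map[K, List[(S, V)]] = {
    val num = implicitly[Numeric[V]]
    rows.groupBy(_._1).map { case (key, keyed) =>
      val sorted = keyed.map(_._2).sortBy(_._1).toList
      // scanLeft yields the seed first, so drop it with .tail
      val running = sorted.scanLeft(num.zero)((acc, sv) => num.plus(acc, sv._2)).tail
      key -> sorted.map(_._1).zip(running)
    }
  }
}
```

    This model processes each group in one place; the partition function described above exists so that the distributed version does not have to.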

  5. object Empty extends FlatMapFn[Nothing] with Product with Serializable

  6. object EmptyTypedPipe extends TypedPipe[Nothing] with Product with Serializable


    This object is the EmptyTypedPipe. Prefer to create it with TypedPipe.empty

  7. object EmptyValue extends ValuePipe[Nothing] with Product with Serializable

  8. object FlattenGroup


    Autogenerated methods for flattening the nested value tuples that result after joining many pipes together. These methods can be used directly, or via the joins available in MultiJoin.

  9. object Grouped extends Serializable

  10. object HashEqualsArrayWrapper

  11. object Joiner extends Serializable

  12. object KeyedListLike extends Serializable

  13. object LookupJoin extends Serializable


    lookupJoin simulates the behavior of a realtime system attempting to leftJoin (K, V) pairs against some other value type (JoinedV) by performing realtime lookups on a key-value Store.

    An example would join (K, V) pairs of (URL, Username) against a service of (URL, ImpressionCount). The result of this join would be a pipe of (ShortenedURL, (Username, Option[ImpressionCount])).

    To simulate this behavior, lookupJoin accepts pipes of key-value pairs with an explicit time value T attached. T must have some sensible ordering. The semantics are that, if one were to hit the right pipe's simulated realtime service at any time between T(tuple) and T(tuple + 1), one would receive Some((K, JoinedV)(tuple)).

    The entries in the left pipe's tuples have the following meaning:

    T: the time at which the (K, W) lookup occurred.
    K: the join key.
    W: the current value for the join key.

    The right pipe's entries have the following meaning:

    T: the time at which the "service" was fed an update.
    K: the join K.
    V: the value of the key at time T.

    Before the time T in the right pipe's very first entry, the simulated "service" will return None. After this time T, the right side will return None only if the key is absent, else, the service will return Some(joinedV).
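    These semantics can be modeled on in-memory sequences. The sketch below is plain Scala, not the Scalding implementation: for every left entry it returns the most recent right value for the same key fed before the lookup time, or None if there is none. Treating an update as visible only to strictly later lookups is an assumption of this sketch; the text above does not say how equal times are resolved.

```scala
object LookupJoinSketch {
  // left:  (lookup time, (key, left value))
  // right: (update time, (key, value fed to the simulated service))
  def lookupJoin[T: Ordering, K, W, V](
      left: Seq[(T, (K, W))],
      right: Seq[(T, (K, V))]): Seq[(T, (K, (W, Option[V])))] = {
    val ord = implicitly[Ordering[T]]
    left.map { case (t, (k, w)) =>
      // Updates for this key that happened before the lookup (assumed strict).
      val visible = right.filter { case (rt, (rk, _)) => rk == k && ord.lt(rt, t) }
      val latest = if (visible.isEmpty) None else Some(visible.maxBy(_._1)._2._2)
      (t, (k, (w, latest)))
    }
  }
}
```

    A lookup made before the first update for its key, or for a key that was never updated, therefore yields None.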

  14. object MultiJoin extends Serializable


    This is an autogenerated object which gives you easy access to doing N-way joins so the types are cleaner. However, it just calls the underlying methods on CoGroupable and flattens the resulting tuple

  15. object NoStackAndThen extends Serializable

  16. object PartitionUtil


    Utility functions to assist with creating partitioned sources.

  17. object PartitionedCsv extends PartitionedDelimited


    Partitioned typed comma separated source.

  18. object PartitionedOsv extends PartitionedDelimited


    Partitioned typed \1 separated source (commonly used by Pig).

  19. object PartitionedPsv extends PartitionedDelimited


    Partitioned typed pipe separated source.

  20. object PartitionedTsv extends PartitionedDelimited


    Partitioned typed tab separated source.

  21. object SketchJoined extends Serializable

  22. object Syntax


    These are named syntax extensions that users can optionally import. Avoid import Syntax._

  23. object TDsl extends Serializable with GeneratedTupleAdders


    Implicits for the type-safe DSL. Import TDsl._ to get the implicit conversions from Grouping/CoGrouping to Pipe, to get the .toTypedPipe method on standard cascading Pipes, and to get automatic conversion of Mappable[T] to TypedPipe[T].

  24. object TypedPipe extends Serializable


    factory methods for TypedPipe, which is the typed representation of distributed lists in scalding. This object is here rather than in the typed package because a lot of code was written using the functions in the object, which we do not see how to hide with package object tricks.

  25. object TypedPipeDiff


    Some methods for comparing two typed pipes and finding out the difference between them.

    Has support for the normal case where the typed pipes are pipes of objects usable as keys in scalding (have an ordering, proper equals and hashCode), as well as some special cases for dealing with Arrays and thrift objects.

    See diffByHashCode for comparing typed pipes of objects that have no ordering but a stable hash code (such as Scrooge thrift).

    See diffByGroup for comparing typed pipes of objects that have no ordering *and* an unstable hash code.
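    The basic idea of comparing two pipes can be sketched on in-memory sequences. The code below is plain Scala and not TypedPipeDiff itself: the `diff` function is hypothetical, and reporting each differing element with its (left count, right count) pair is an assumption of this sketch. It relies on equals and hashCode being well behaved, which is the "normal case" described above.

```scala
object PipeDiffSketch {
  // Count occurrences on both sides and keep only elements whose counts differ.
  def diff[T](left: Seq[T], right: Seq[T]): Map[T, (Long, Long)] = {
    def counts(xs: Seq[T]): Map[T, Long] =
      xs.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }
    val l = counts(left)
    val r = counts(right)
    (l.keySet ++ r.keySet).iterator
      .map(k => (k, (l.getOrElse(k, 0L), r.getOrElse(k, 0L))))
      .filter { case (_, (lc, rc)) => lc != rc }
      .toMap
  }
}
```

    An empty result means the two sides contain the same elements with the same multiplicities, regardless of order.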

  26. object TypedPipeFactory extends Serializable


    This is an implementation detail (and should be marked private)

  27. object TypedSink extends Serializable

  28. object ValuePipe extends Serializable

