com.twitter.scalding.commons.extensions

Checkpoint

object Checkpoint

Checkpoint provides a simple mechanism to read and write intermediate results from a Scalding flow to HDFS.

Checkpoints are useful for debugging one part of a long flow, when you would otherwise have to rerun many steps to reach the one you care about. To use checkpoints, sprinkle calls to Checkpoint() throughout your flow, ideally after expensive steps, and then enable them at run time with the --checkpoint.file flag described below.

When checkpoints are enabled, each Checkpoint() looks for a checkpoint file on HDFS. If it exists we read results from the file; otherwise we execute the flow and write the results to the file. When checkpoints are disabled, the flow is always executed and the results are never stored.
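The read-or-compute behaviour described above can be modelled in plain Scala. This is a minimal sketch, not the Scalding implementation: `CheckpointModel`, `run`, and the in-memory `store` standing in for HDFS files are hypothetical names, the real `Checkpoint` operates on Cascading pipes rather than Scala values, and clobbering (--checkpoint.clobber) is left out for brevity. The `flow` argument is by-name, mirroring the `flow: ⇒ Pipe` parameter of `apply`, so the flow is only evaluated when its result is actually needed.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of Checkpoint's read-or-compute decision.
// `store` stands in for checkpoint files on HDFS, keyed by file name.
final class CheckpointModel(enabled: Boolean) {
  val store: mutable.Map[String, Any] = mutable.Map.empty

  // `flow` is by-name, so it is evaluated only when no stored result is used.
  def run[A](fileName: String)(flow: => A): A =
    if (!enabled) {
      flow // disabled: always execute the flow, never store the result
    } else {
      store.get(fileName) match {
        case Some(saved) => saved.asInstanceOf[A] // checkpoint exists: read it back
        case None =>
          val result = flow        // no checkpoint yet: execute the flow...
          store(fileName) = result // ...and write the result for later runs
          result
      }
    }
}

object CheckpointModelDemo {
  def main(args: Array[String]): Unit = {
    var evaluations = 0
    val cp = new CheckpointModel(enabled = true)
    def clicks: Long = { evaluations += 1; 42L } // stands in for an expensive flow

    cp.run("checkpoint_clicks")(clicks) // no checkpoint: computes and stores
    cp.run("checkpoint_clicks")(clicks) // checkpoint exists: reads, skips the flow
    println(s"evaluations = $evaluations") // prints "evaluations = 1"
  }
}
```

Because the flow is passed by name rather than as a computed value, skipping an expensive step costs nothing when its checkpoint already exists.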

Each call to Checkpoint() takes the checkpoint name, as well as the types and names of the expected fields. A sample invocation might look like this:

    val pipe = Checkpoint[(Long, String, Long)]("clicks", ('tweetId, 'clickUrl, 'clickCount)) { ... }

where { ... } contains a flow which computes the result.

Most checkpoint parameters are specified via command-line flags:

  --checkpoint.clobber: if true, recompute and overwrite any existing checkpoint files.
  --checkpoint.clobber.<name>: override clobber for the given checkpoint.
  --checkpoint.file: specifies a filename prefix to use for checkpoint files. If blank, checkpoints are disabled; otherwise the file for checkpoint <name> is <prefix>_<name>.
  --checkpoint.file.<name>: override --checkpoint.file for the given checkpoint; specifies the whole filename, not the prefix.
  --checkpoint.format: specifies a file format, either sequencefile or tsv. The default is sequencefile for HDFS and tsv for local mode.
  --checkpoint.format.<name>: specifies the file format for the given checkpoint.
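The precedence rules above can be summarised as a small resolution function. This is a hedged sketch in plain Scala, not Scalding's code: `CheckpointFlags`, `resolve`, and `CheckpointConfig` are hypothetical names, flags are modelled as a `Map[String, String]` keyed without the leading `--` instead of Scalding's `Args`, run mode is reduced to an `isHdfs` boolean, and treating only the string "true" as enabling clobber is an assumption.

```scala
// Hypothetical model of how the --checkpoint.* flags resolve for one checkpoint.
// Flag keys omit the leading "--", e.g. "checkpoint.file" -> "/tmp/ckpt".
final case class CheckpointConfig(fileName: String, format: String, clobber: Boolean)

object CheckpointFlags {
  // A per-checkpoint flag "<key>.<name>" takes precedence over the global "<key>".
  private def lookup(flags: Map[String, String], key: String, name: String): Option[String] =
    flags.get(s"$key.$name").orElse(flags.get(key))

  // Returns None when checkpoints are disabled for this checkpoint.
  def resolve(flags: Map[String, String], name: String, isHdfs: Boolean): Option[CheckpointConfig] = {
    val fileName: Option[String] =
      flags.get(s"checkpoint.file.$name") // whole filename, not a prefix
        .orElse(
          flags.get("checkpoint.file")
            .filter(_.nonEmpty) // blank prefix: checkpoints disabled
            .map(prefix => s"${prefix}_$name"))

    fileName.map { file =>
      val defaultFormat = if (isHdfs) "sequencefile" else "tsv"
      val format = lookup(flags, "checkpoint.format", name).getOrElse(defaultFormat)
      // Assumption: only the literal string "true" enables clobbering.
      val clobber = lookup(flags, "checkpoint.clobber", name).contains("true")
      CheckpointConfig(file, format, clobber)
    }
  }
}
```

Returning `Option` keeps the "checkpoints disabled" case explicit. Note that the sketch lets --checkpoint.file.<name> take effect even when the global prefix is blank; that reading of "override" is an assumption.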

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def apply[A](checkpointName: String)(flow: ⇒ TypedPipe[A])(implicit args: Args, mode: Mode, flowDef: FlowDef, conv: TupleConverter[A], setter: TupleSetter[A]): TypedPipe[A]

  7. def apply[A](checkpointName: String, resultFields: Fields)(flow: ⇒ Pipe)(implicit args: Args, mode: Mode, flowDef: FlowDef, conv: TupleConverter[A], setter: TupleSetter[A]): Pipe

    Type parameters:
      A: tuple of result types


    Parameters:
      checkpointName: name of the checkpoint
      resultFields: tuple of result field names
      flow: a function to run a flow to compute the result

    Implicit parameters:
      args: provided by com.twitter.pluck.job.TwitterJob
      mode: provided by com.twitter.scalding.Job
      flowDef: provided by com.twitter.scalding.Job
      conv: provided by com.twitter.scalding.TupleConversions
      setter: provided by com.twitter.scalding.TupleConversions

  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
