com.twitter.scalding.examples

PageRank

class PageRank extends Job

Options:
  --input: the three-column TSV with node, comma-separated out-neighbors, and initial PageRank (set to 1.0 on the first run).
  --output: the name of the TSV to write to, in the same format as the input.

Optional arguments:
  --errorOut: where to write the L1 error between the input PageRank and the output. If this is omitted, the error is not computed.
  --iterations: how many iterations to run inside this job. The default is 1; 10 is about as much as Cascading can handle.
  --jumpprob: probability of a random jump. The default is 0.15.
  --convergence: if this is set, the error is checked after every "--iterations" steps to decide whether to continue. Since the error check is expensive (it involves a join), avoid doing it too frequently; 10 iterations is probably a good number to set.
  --temp: where to store a temporary output so it can be compared to the previous one for convergence checking. If --convergence is set, --temp MUST be set as well.
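The L1 error mentioned under --errorOut can be illustrated with plain Scala collections. This is only a sketch of the metric, not the job's computeError: the object name L1ErrorSketch, the Map-based representation, and the choice to treat a missing node as rank 0.0 are all assumptions made for illustration.

```scala
object L1ErrorSketch {
  // Sum of absolute rank differences over every node seen on either side.
  // Assumption of this sketch: a node missing from one side has rank 0.0.
  def l1Error(before: Map[Long, Double], after: Map[Long, Double]): Double =
    (before.keySet ++ after.keySet).toList
      .map(node => math.abs(before.getOrElse(node, 0.0) - after.getOrElse(node, 0.0)))
      .sum
}
```

The key set is converted to a List before mapping so that two nodes with the same difference are both counted rather than collapsed by Set semantics.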

Linear Supertypes
Job, Serializable, FieldConversions, LowPriorityFieldConversions, AnyRef, Any

Instance Constructors

  1. new PageRank(args: Args)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. val ALPHA: Double

  7. val EDGE: Int

  8. val JOB_COUNT: Int

  9. val NODESET: Int

  10. val STEPS: Int

  11. implicit def _implicitJobArgs: Args

    Attributes
    protected
    Definition Classes
    Job
  12. def anyToFieldArg(f: Any): Comparable[_]

    Attributes
    protected
    Definition Classes
    LowPriorityFieldConversions
  13. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  14. def asList(f: Fields): List[Comparable[_]]

    Definition Classes
    FieldConversions
  15. def asSet(f: Fields): Set[Comparable[_]]

    Definition Classes
    FieldConversions
  16. def buildFlow: Flow[_]

    Definition Classes
    Job
  17. def classIdentifier: String

    Definition Classes
    Job
  18. def clear(): Unit

    Definition Classes
    Job
  19. def clone(nextargs: Args): Job

    Definition Classes
    Job
  20. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. def computeError(pr: RichPipe): RichPipe

  22. def config: Map[AnyRef, AnyRef]

    Definition Classes
    Job
  23. implicit def dateParser: DateParser

    Definition Classes
    Job
  24. def defaultComparator: Option[Class[_ <: Comparator[_]]]

    Definition Classes
    Job
  25. def defaultMode(fromFields: Fields, toFields: Fields): Fields

    Definition Classes
    FieldConversions
  26. def defaultSpillThreshold: Int

    Definition Classes
    Job
  27. final def doPageRank(steps: Int)(pagerank: RichPipe): RichPipe

    The basic idea is to groupBy the dst key with BOTH the nodeset and the edge rows. The nodeset rows carry the old PageRank; the edge rows are reversed, so each destination receives the incoming PageRank from the nodes that point to it.

    Annotations
    @tailrec()
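The grouping idea described for doPageRank can be sketched with in-memory collections instead of pipes. This is not the Scalding implementation: the object PageRankStepSketch, the Graph alias, and the update rule rank = alpha + (1 - alpha) * (sum of incoming rank / out-degree) are assumptions chosen to be consistent with initial ranks of 1.0 and a jump probability alpha.

```scala
object PageRankStepSketch {
  // node -> (out-neighbors, current rank); a stand-in for the nodeset rows.
  type Graph = Map[Long, (List[Long], Double)]

  def step(graph: Graph, alpha: Double): Graph = {
    // "Edge rows": reverse each edge so the contribution is keyed by dst.
    val contributions: List[(Long, Double)] =
      graph.toList.flatMap { case (_, (neighbors, rank)) =>
        neighbors.map(dst => (dst, rank / neighbors.size))
      }
    // Group by dst and sum the incoming rank.
    val incoming: Map[Long, Double] =
      contributions.groupBy(_._1).map { case (dst, cs) => (dst, cs.map(_._2).sum) }
    // "Nodeset rows": keep each neighbor list, replace the old rank.
    graph.map { case (node, (neighbors, _)) =>
      (node, (neighbors, alpha + (1.0 - alpha) * incoming.getOrElse(node, 0.0)))
    }
  }

  // Tail-recursive driver, mirroring the curried shape of doPageRank(steps)(pagerank).
  @annotation.tailrec
  def run(steps: Int, alpha: Double)(graph: Graph): Graph =
    if (steps <= 0) graph else run(steps - 1, alpha)(step(graph, alpha))
}
```

In this sketch a node with no incoming edges ends up with rank alpha, and a node with no out-neighbors simply contributes nothing; how the real job treats such nodes is not stated on this page.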
  28. final def ensureUniqueFields(left: Fields, right: Fields, rightPipe: Pipe): (Fields, Pipe)

    Definition Classes
    FieldConversions
  29. implicit def enumValueToFields(x: Value): Fields

    Definition Classes
    FieldConversions
  30. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  31. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  32. implicit def fieldFields[T <: TraversableOnce[Field[_]]](f: T): RichFields

    Definition Classes
    FieldConversions
  33. implicit def fieldToFields(f: Field[_]): RichFields

    Definition Classes
    FieldConversions
  34. implicit def fields[T <: TraversableOnce[Symbol]](f: T): Fields

    Definition Classes
    FieldConversions
  35. implicit def fieldsToRichFields(fields: Fields): RichFields

    Definition Classes
    FieldConversions
  36. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  37. implicit val flowDef: FlowDef

    Attributes
    protected
    Definition Classes
    Job
  38. implicit def fromEnum[T <: Enumeration](enumeration: T): Fields

    Definition Classes
    FieldConversions
  39. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  40. def getField(f: Fields, idx: Int): Fields

    Definition Classes
    FieldConversions
  41. def handleStats(statsData: CascadingStats): Unit

    Attributes
    protected
    Definition Classes
    Job
  42. def hasInts(f: Fields): Boolean

    Definition Classes
    FieldConversions
  43. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  44. def initialize(nodeCol: Symbol, neighCol: Symbol, pageRank: Symbol): Pipe

    Override this function to change how you generate a pipe of (Long, String, Double), where the first entry is the node id, the second is the list of neighbors as a comma-separated (no spaces) string of numeric node ids, and the third is the initial PageRank (if not starting from a previous run, this should be 1.0).

    NOTE: if you want to run until convergence, the initialize method must read EXACTLY the same format that the output method writes. Ensuring this is your responsibility.
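The row format described for initialize, and the requirement that reading and writing agree, can be illustrated with a small round-trip sketch. The names NodeRowSketch, NodeRow, parse and format are hypothetical and are not part of Scalding; the sketch assumes tab-separated columns and an empty second column for a node with no out-neighbors.

```scala
object NodeRowSketch {
  final case class NodeRow(node: Long, neighbors: List[Long], rank: Double)

  // Reads one "node<TAB>n1,n2,...<TAB>rank" line.
  def parse(line: String): NodeRow =
    line.split("\t", -1) match { // limit -1 keeps empty columns
      case Array(node, neighbors, rank) =>
        val outs =
          if (neighbors.isEmpty) List.empty[Long]
          else neighbors.split(",").toList.map(_.toLong)
        NodeRow(node.toLong, outs, rank.toDouble)
      case other =>
        throw new IllegalArgumentException(s"expected 3 columns, got ${other.length}")
    }

  // Writes a row in the same format that parse reads.
  def format(row: NodeRow): String =
    List(row.node.toString, row.neighbors.mkString(","), row.rank.toString).mkString("\t")
}
```

Because format is the inverse of parse, the output of one run can be fed back in as the input of the next, which is the property the note above asks for.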

  45. implicit def intFields[T <: TraversableOnce[Int]](f: T): Fields

    Definition Classes
    FieldConversions
  46. implicit def intToFields(x: Int): Fields

    Definition Classes
    FieldConversions
  47. implicit def integerToFields(x: Integer): Fields

    Definition Classes
    FieldConversions
  48. def ioSerializations: List[Class[_ <: Serialization[_]]]

    Definition Classes
    Job
  49. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  50. implicit def iterableToRichPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): RichPipe

    Definition Classes
    Job
  51. def keepAlive(): Unit

    Definition Classes
    Job
  52. def listeners: List[FlowListener]

    Definition Classes
    Job
  53. implicit def mode: Mode

    Definition Classes
    Job
  54. def name: String

    Definition Classes
    Job
  55. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  56. final def newSymbol(avoid: Set[Symbol], guess: Symbol, trial: Int): Symbol

    Definition Classes
    FieldConversions
    Annotations
    @tailrec()
  57. def next: Option[Job]

    Checks for convergence and, if the ranks have not yet converged, returns the next job to run.

    Definition Classes
    PageRank → Job
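The control flow behind next can be sketched as a loop that runs one batch of iterations, measures the error, and decides whether to continue. This is an illustration only: ConvergenceSketch, runUntilConverged, the threshold comparison, and the maxJobs cap are assumptions, and the real method returns an Option[Job] rather than looping in process.

```scala
object ConvergenceSketch {
  // `batch` stands in for one job of "--iterations" steps.
  // `error` compares the previous and the new result (for example, an L1 error).
  // Stops when the error is at or below `threshold`, or after `maxJobs` batches.
  @annotation.tailrec
  def runUntilConverged[A](state: A, maxJobs: Int)(
      batch: A => A,
      error: (A, A) => Double,
      threshold: Double
  ): A =
    if (maxJobs <= 0) state
    else {
      val next = batch(state)
      if (error(state, next) <= threshold) next
      else runUntilConverged(next, maxJobs - 1)(batch, error, threshold)
    }
}
```

For example, halving a Double starting at 1.0 with a threshold of 0.1 stops at 0.0625, the first result whose distance from its predecessor is at most 0.1.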
  58. final def notify(): Unit

    Definition Classes
    AnyRef
  59. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  60. def output(pipe: RichPipe): Pipe

  61. implicit def parseAnySeqToFields[T <: TraversableOnce[Any]](anyf: T): Fields

    Definition Classes
    FieldConversions
  62. implicit def pipeToRichPipe(pipe: Pipe): RichPipe

    Definition Classes
    Job
  63. implicit def productToFields(f: Product): Fields

    Definition Classes
    LowPriorityFieldConversions
  64. implicit def read(src: Source): Pipe

    Definition Classes
    Job
  65. def run(): Boolean

    Definition Classes
    Job
  66. implicit def scaldingConfig: Config

    Attributes
    protected
    Definition Classes
    Job
  67. def skipStrategy: Option[FlowSkipStrategy]

    Definition Classes
    Job
  68. implicit def sourceToRichPipe(src: Source): RichPipe

    Definition Classes
    Job
  69. def stepListeners: List[FlowStepListener]

    Definition Classes
    Job
  70. def stepStrategy: Option[FlowStepStrategy[_]]

    Definition Classes
    Job
  71. implicit def strFields[T <: TraversableOnce[String]](f: T): Fields

    Definition Classes
    FieldConversions
  72. implicit def stringToFields(x: String): Fields

    Definition Classes
    FieldConversions
  73. implicit def symbolToFields(x: Symbol): Fields

    Definition Classes
    FieldConversions
  74. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  75. def timeout[T](timeout: AbsoluteDuration)(t: ⇒ T): Option[T]

    Definition Classes
    Job
  76. implicit def toPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): Pipe

    Definition Classes
    Job
  77. def toString(): String

    Definition Classes
    AnyRef → Any
  78. implicit def tuple2ToFieldsPair[T, U](pair: (T, U))(implicit tf: (T) ⇒ Fields, uf: (U) ⇒ Fields): (Fields, Fields)

    Definition Classes
    FieldConversions
  79. implicit def unitToFields(u: Unit): Fields

    Definition Classes
    FieldConversions
  80. def validate(): Unit

    Definition Classes
    Job
  81. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  82. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  83. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  84. def write(pipe: Pipe, src: Source): Unit

    Definition Classes
    Job
