com.twitter.scalding.examples

WeightedPageRank

class WeightedPageRank extends Job

Computes weighted page rank for the given graph. Starting from the given pagerank, the job performs one iteration and tests for convergence; if it has not yet converged, it clones itself and starts the next page rank job with the updated pagerank as input.

This class is very similar to the PageRank class. The main differences are:
  1. weighted pagerank is supported
  2. the reset pagerank is pregenerated, possibly by a previous job
  3. dead pagerank is evenly distributed

Options:
  --pwd: working directory; the job reads and generates the following files there:
    numnodes: total number of nodes
    nodes: nodes file <'src_id, 'dst_ids, 'weights, 'mass_prior>
    pagerank: the page rank file, e.g. pagerank_0, pagerank_1, etc.
    totaldiff: the current max pagerank delta

Optional arguments:
  --weighted: do weighted pagerank; default is false
  --curiteration: the current iteration; default is 0
  --maxiterations: how many iterations to run; default is 20
  --jumpprob: probability of a random jump; default is 0.1
  --threshold: total difference before finishing early; default is 0.001
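To make the defaults listed above concrete, here is a minimal sketch in plain Scala that collects the options into a case class. The names `OptionsSketch`, `Options` and `fromMap` are hypothetical and are not part of Scalding; the real job reads its arguments through `Args`, which this sketch does not use.

```scala
object OptionsSketch {
  // Hypothetical container for the documented options and their defaults.
  final case class Options(
      pwd: String,
      weighted: Boolean = false,
      curIteration: Int = 0,
      maxIterations: Int = 20,
      jumpProb: Double = 0.1,
      threshold: Double = 0.001
  )

  // Builds Options from already-split key/value pairs (keys without "--").
  // Only --pwd is treated as required; that is an assumption of this sketch.
  def fromMap(raw: Map[String, String]): Options = Options(
    pwd = raw.getOrElse("pwd", sys.error("--pwd is required")),
    weighted = raw.get("weighted").map(_.toBoolean).getOrElse(false),
    curIteration = raw.get("curiteration").map(_.toInt).getOrElse(0),
    maxIterations = raw.get("maxiterations").map(_.toInt).getOrElse(20),
    jumpProb = raw.get("jumpprob").map(_.toDouble).getOrElse(0.1),
    threshold = raw.get("threshold").map(_.toDouble).getOrElse(0.001)
  )
}
```

For example, `OptionsSketch.fromMap(Map("pwd" -> "/tmp/pr"))` yields every documented default, with only the working directory set.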

Linear Supertypes
Job, Serializable, FieldConversions, LowPriorityFieldConversions, AnyRef, Any

Instance Constructors

  1. new WeightedPageRank(args: Args)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. val ALPHA: Double

  7. val CURITERATION: Int

  8. val MAXITERATIONS: Int

  9. val PWD: String

  10. val ROW_TYPE_1: Int

  11. val ROW_TYPE_2: Int

  12. val THRESHOLD: Double

  13. val WEIGHTED: Boolean

  14. implicit def _implicitJobArgs: Args

    Attributes
    protected
    Definition Classes
    Job
  15. def anyToFieldArg(f: Any): Comparable[_]

    Attributes
    protected
    Definition Classes
    LowPriorityFieldConversions
  16. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  17. def asList(f: Fields): List[Comparable[_]]

    Definition Classes
    FieldConversions
  18. def asSet(f: Fields): Set[Comparable[_]]

    Definition Classes
    FieldConversions
  19. def buildFlow: Flow[_]

    Definition Classes
    Job
  20. def classIdentifier: String

    Definition Classes
    Job
  21. def clear(): Unit

    Definition Classes
    Job
  22. def clone(nextargs: Args): Job

    Definition Classes
    Job
  23. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. def config: Map[AnyRef, AnyRef]

    Definition Classes
    Job
  25. implicit def dateParser: DateParser

    Definition Classes
    Job
  26. def defaultComparator: Option[Class[_ <: Comparator[_]]]

    Definition Classes
    Job
  27. def defaultMode(fromFields: Fields, toFields: Fields): Fields

    Definition Classes
    FieldConversions
  28. def defaultSpillThreshold: Int

    Definition Classes
    Job
  29. def doPageRank(nodeRows: RichPipe, inputPagerank: RichPipe): RichPipe

    One iteration of pagerank.
    inputPagerank: <'src_id_input, 'mass_input>
    returns: <'src_id, 'mass_n, 'mass_input>

    Here is a high-level view of the unweighted algorithm. Let
      N: the number of nodes
      inputPagerank(N_i): the probability of walking to node i
      d(N_j): the out-degree of N_j
    Then
      pagerankNext(N_i) = \sum_{j points to i} inputPagerank(N_j) / d(N_j)
      deadPagerank = (1 - \sum_{i} pagerankNext(N_i)) / N
      randomPagerank(N_i) = userMass(N_i) * ALPHA + deadPagerank * (1 - ALPHA)
      pagerankOutput(N_i) = randomPagerank(N_i) + pagerankNext(N_i) * (1 - ALPHA)

    For the weighted algorithm, let
      w(N_j, N_i): the weight from N_j to N_i
      tw(N_j): the total out-weight of N_j
    Then
      pagerankNext(N_i) = \sum_{j points to i} inputPagerank(N_j) * w(N_j, N_i) / tw(N_j)
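The formulas above can be traced on a small in-memory graph. The following is a minimal sketch in plain Scala, not the Scalding pipeline that doPageRank builds. `PageRankSketch`, `Node` and `step` are hypothetical names. The sketch assumes that userMass(N_i) corresponds to the 'mass_prior column, that `alpha` plays the role of ALPHA, and that total out-weights are positive.

```scala
object PageRankSketch {
  // One node row, mirroring <'src_id, 'dst_ids, 'weights, 'mass_prior>.
  final case class Node(srcId: Int, dstIds: Seq[Int], weights: Seq[Double], massPrior: Double)

  /** One iteration over an in-memory graph; returns srcId -> pagerankOutput. */
  def step(nodes: Seq[Node], input: Map[Int, Double], alpha: Double, weighted: Boolean): Map[Int, Double] = {
    val n = nodes.size.toDouble

    // Mass that each source node sends along each of its out-edges.
    val contributions: Seq[(Int, Double)] = nodes.flatMap { node =>
      val mass = input.getOrElse(node.srcId, 0.0)
      if (node.dstIds.isEmpty) {
        Seq.empty[(Int, Double)] // dead end: its mass becomes dead pagerank
      } else if (weighted) {
        val tw = node.weights.sum // tw(N_j), assumed to be > 0
        node.dstIds.zip(node.weights).map { case (dst, w) => dst -> mass * w / tw }
      } else {
        val d = node.dstIds.size.toDouble // d(N_j)
        node.dstIds.map(dst => dst -> mass / d)
      }
    }

    // pagerankNext(N_i): sum of the contributions arriving at each node.
    val next: Map[Int, Double] =
      contributions.groupBy(_._1).map { case (id, cs) => id -> cs.map(_._2).sum }

    // deadPagerank: mass that was not propagated, spread evenly over all nodes.
    val dead = (1.0 - next.values.sum) / n

    nodes.map { node =>
      val random = node.massPrior * alpha + dead * (1.0 - alpha)
      node.srcId -> (random + next.getOrElse(node.srcId, 0.0) * (1.0 - alpha))
    }.toMap
  }
}
```

When the priors and the input pagerank each sum to 1, the output also sums to 1, because the dead mass is redistributed rather than dropped.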

  30. final def ensureUniqueFields(left: Fields, right: Fields, rightPipe: Pipe): (Fields, Pipe)

    Definition Classes
    FieldConversions
  31. implicit def enumValueToFields(x: Value): Fields

    Definition Classes
    FieldConversions
  32. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  33. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  34. implicit def fieldFields[T <: TraversableOnce[Field[_]]](f: T): RichFields

    Definition Classes
    FieldConversions
  35. implicit def fieldToFields(f: Field[_]): RichFields

    Definition Classes
    FieldConversions
  36. implicit def fields[T <: TraversableOnce[Symbol]](f: T): Fields

    Definition Classes
    FieldConversions
  37. implicit def fieldsToRichFields(fields: Fields): RichFields

    Definition Classes
    FieldConversions
  38. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  39. implicit val flowDef: FlowDef

    Attributes
    protected
    Definition Classes
    Job
  40. implicit def fromEnum[T <: Enumeration](enumeration: T): Fields

    Definition Classes
    FieldConversions
  41. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  42. def getField(f: Fields, idx: Int): Fields

    Definition Classes
    FieldConversions
  43. def getInputPagerank(fileName: String): Pipe

  44. def getNodes(fileName: String): Pipe

    read the pregenerated nodes file <'src_id, 'dst_ids, 'weights, 'mass_prior>

  45. def getNumNodes(fileName: String): Pipe

    the total number of nodes, single line file

  46. def handleStats(statsData: CascadingStats): Unit

    Attributes
    protected
    Definition Classes
    Job
  47. def hasInts(f: Fields): Boolean

    Definition Classes
    FieldConversions
  48. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  49. val inputPagerank: Pipe

  50. implicit def intFields[T <: TraversableOnce[Int]](f: T): Fields

    Definition Classes
    FieldConversions
  51. implicit def intToFields(x: Int): Fields

    Definition Classes
    FieldConversions
  52. implicit def integerToFields(x: Integer): Fields

    Definition Classes
    FieldConversions
  53. def ioSerializations: List[Class[_ <: Serialization[_]]]

    Definition Classes
    Job
  54. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  55. implicit def iterableToRichPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): RichPipe

    Definition Classes
    Job
  56. def keepAlive(): Unit

    Definition Classes
    Job
  57. def listeners: List[FlowListener]

    Definition Classes
    Job
  58. implicit def mode: Mode

    Definition Classes
    Job
  59. def name: String

    Definition Classes
    Job
  60. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  61. final def newSymbol(avoid: Set[Symbol], guess: Symbol, trial: Int): Symbol

    Definition Classes
    FieldConversions
    Annotations
    @tailrec()
  62. def next: Option[Job]

    Tests for convergence; if the job has not yet converged, kicks off the next iteration.

    Definition Classes
    WeightedPageRank → Job
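The stop-or-continue decision can be sketched as a pure function. This is an illustration only: `NextIterationSketch` and `nextIteration` are hypothetical names, and the exact comparisons (strict versus non-strict, and whether the iteration count is checked before or after incrementing) are assumptions rather than the documented behavior. In the real job the returned value would be a cloned Job with updated arguments, not an iteration index.

```scala
object NextIterationSketch {
  /** Returns the index of the next iteration, or None when the job should stop. */
  def nextIteration(
      curIteration: Int,
      maxIterations: Int,
      totalDiff: Double,
      threshold: Double
  ): Option[Int] =
    // Continue only while iterations remain and the pagerank is still changing
    // by more than the threshold.
    if (curIteration + 1 < maxIterations && totalDiff > threshold) Some(curIteration + 1)
    else None
}
```

With the documented defaults (20 iterations, threshold 0.001), a run at iteration 3 whose totaldiff has dropped to 0.0005 would stop early under these assumptions.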
  63. val nodes: Pipe

  64. final def notify(): Unit

    Definition Classes
    AnyRef
  65. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  66. val numNodes: Pipe

  67. val outputFileName: String

  68. val outputPagerank: RichPipe

  69. implicit def parseAnySeqToFields[T <: TraversableOnce[Any]](anyf: T): Fields

    Definition Classes
    FieldConversions
  70. implicit def pipeToRichPipe(pipe: Pipe): RichPipe

    Definition Classes
    Job
  71. implicit def productToFields(f: Product): Fields

    Definition Classes
    LowPriorityFieldConversions
  72. implicit def read(src: Source): Pipe

    Definition Classes
    Job
  73. def run(): Boolean

    Definition Classes
    Job
  74. implicit def scaldingConfig: Config

    Attributes
    protected
    Definition Classes
    Job
  75. def skipStrategy: Option[FlowSkipStrategy]

    Definition Classes
    Job
  76. implicit def sourceToRichPipe(src: Source): RichPipe

    Definition Classes
    Job
  77. def stepListeners: List[FlowStepListener]

    Definition Classes
    Job
  78. def stepStrategy: Option[FlowStepStrategy[_]]

    Definition Classes
    Job
  79. implicit def strFields[T <: TraversableOnce[String]](f: T): Fields

    Definition Classes
    FieldConversions
  80. implicit def stringToFields(x: String): Fields

    Definition Classes
    FieldConversions
  81. implicit def symbolToFields(x: Symbol): Fields

    Definition Classes
    FieldConversions
  82. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  83. def timeout[T](timeout: AbsoluteDuration)(t: ⇒ T): Option[T]

    Definition Classes
    Job
  84. implicit def toPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): Pipe

    Definition Classes
    Job
  85. def toString(): String

    Definition Classes
    AnyRef → Any
  86. val totalDiff: Pipe

  87. implicit def tuple2ToFieldsPair[T, U](pair: (T, U))(implicit tf: (T) ⇒ Fields, uf: (U) ⇒ Fields): (Fields, Fields)

    Definition Classes
    FieldConversions
  88. implicit def unitToFields(u: Unit): Fields

    Definition Classes
    FieldConversions
  89. def validate(): Unit

    Definition Classes
    Job
  90. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  91. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  92. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  93. def write(pipe: Pipe, src: Source): Unit

    Definition Classes
    Job
