com.twitter.scalding.examples

WeightedPageRankFromMatrix

class WeightedPageRankFromMatrix extends Job

A weighted PageRank implementation using the Scalding Matrix API. It assumes that all row and column indices are of type Int and that values (edge weights) are of type Double. For an unweighted PageRank, set the weight of every edge to 1.
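To make the weighted versus unweighted distinction concrete, here is a minimal sketch in plain Scala with no Scalding dependency. It assumes, as is usual for PageRank, that each node's outgoing weights are normalized to sum to 1 before iterating; this page does not state how the job normalizes. The names `EdgeWeights` and `rowNormalize` are hypothetical, and every row is assumed to have a positive total weight.

```scala
object EdgeWeights {
  // An edge is (row, column, weight), mirroring a (row, column, value) TSV record.
  type Edge = (Int, Int, Double)

  // Divide each weight by the total outgoing weight of its source row.
  // Assumption: every row that appears has a strictly positive total weight.
  def rowNormalize(edges: Seq[Edge]): Seq[Edge] = {
    val rowTotals: Map[Int, Double] =
      edges.groupBy(_._1).map { case (row, es) => row -> es.map(_._3).sum }
    edges.map { case (row, col, w) => (row, col, w / rowTotals(row)) }
  }
}
```

With all weights set to 1, a node with two outgoing edges gives each edge a share of 0.5, which is the unweighted case; with weights 3 and 1, the shares become 0.75 and 0.25.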

Input arguments:

d -- damping factor
n -- number of nodes in the graph
currentIteration -- index of the iteration to start from; normally 0
maxIterations -- stop after this many iterations
convergenceThreshold -- iteration stops once the sum of the absolute differences between successive iteration solutions falls to this threshold
rootDir -- the root directory holding all starting, intermediate and final data/output
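The stopping rule described by maxIterations and convergenceThreshold can be sketched in plain Scala as follows. This is an illustration, not the job's code: `Convergence`, `l1Diff` and `shouldContinue` are hypothetical names, and the exact boundary conditions (strict versus non-strict comparisons) are assumptions.

```scala
object Convergence {
  // Sum of absolute differences between two solution vectors keyed by row index.
  // Rows missing from one vector are treated as 0.0.
  def l1Diff(previous: Map[Int, Double], next: Map[Int, Double]): Double =
    (previous.keySet ++ next.keySet).toSeq
      .map(k => math.abs(previous.getOrElse(k, 0.0) - next.getOrElse(k, 0.0)))
      .sum

  // Continue only while under the iteration cap and not yet converged.
  // Assumption: iteration indices start at 0 and the comparisons are strict.
  def shouldContinue(
      currentIteration: Int,
      maxIterations: Int,
      diff: Double,
      convergenceThreshold: Double): Boolean =
    currentIteration + 1 < maxIterations && diff > convergenceThreshold
}
```

For example, `Convergence.l1Diff(Map(0 -> 0.5, 1 -> 0.5), Map(0 -> 0.25, 1 -> 0.75))` sums two differences of 0.25 each.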

The expected structure of the rootDir is:

rootDir
  |- iterations
  |  |- 0          <-- a TSV of (row, value) of size n, value can be 1/n (generate this)
  |  |- n          <-- holds future iterations/solutions
  |- edges         <-- a TSV of (row, column, value) for edges in the graph
  |- onesVector    <-- a TSV of (row, 1) of size n (generate this)
  |- diff          <-- a single line representing the difference between the last iterations
  |- constants     <-- built at iteration 0, these are constant for any given matrix/graph
     |- M_hat
     |- priorVector
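The two inputs marked "generate this" can be produced with a few lines of plain Scala. The sketch below writes them to a local directory; `PageRankInputs`, `constantVectorLines` and `writeInitialFiles` are hypothetical names. It assumes node ids run from 0 to n - 1 (they must match the row and column ids used in the edges file) and that the files live on the local filesystem; on a cluster they would need to be copied to the job's filesystem.

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Paths}

object PageRankInputs {
  // Lines of a (row, value) TSV assigning the same value to rows 0 until n.
  // Assumption: node ids are the integers 0 until n.
  def constantVectorLines(n: Int, value: Double): Seq[String] =
    (0 until n).map(row => s"$row\t$value")

  // Write rootDir/iterations/0 (every value 1/n) and rootDir/onesVector (every value 1).
  def writeInitialFiles(rootDir: String, n: Int): Unit = {
    Files.createDirectories(Paths.get(rootDir, "iterations"))
    writeLines(Paths.get(rootDir, "iterations", "0").toString, constantVectorLines(n, 1.0 / n))
    writeLines(Paths.get(rootDir, "onesVector").toString, constantVectorLines(n, 1.0))
  }

  // Write one line per element, always closing the file.
  private def writeLines(path: String, lines: Seq[String]): Unit = {
    val out = new PrintWriter(path)
    try lines.foreach(line => out.println(line))
    finally out.close()
  }
}
```

Calling `PageRankInputs.writeInitialFiles("/tmp/pagerank", 4)` would create the starting vector and the ones vector for a four-node graph; the edges file still has to be supplied separately.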

Don't forget to set the number of reducers for this job: -D mapred.reduce.tasks=n (here n is the reducer count, not the number of nodes)

Linear Supertypes
Job, Serializable, FieldConversions, LowPriorityFieldConversions, AnyRef, Any

Instance Constructors

  1. new WeightedPageRankFromMatrix(args: Args)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def M_hat: Matrix[Int, Int, Double]

    Loads the matrix M_hat, or generates it from the matrix A on the first iteration.

  5. def anyToFieldArg(f: Any): Comparable[_]

    Attributes
    protected
    Definition Classes
    LowPriorityFieldConversions
  6. val args: Args

    Definition Classes
    Job
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def asList(f: Fields): List[Comparable[_]]

    Definition Classes
    FieldConversions
  9. def asSet(f: Fields): Set[Comparable[_]]

    Definition Classes
    FieldConversions
  10. def buildFlow: Flow[_]

    Combines the config, flowDef and Mode to produce a flow.

    Definition Classes
    Job
  11. def classIdentifier: String

    Definition Classes
    Job
  12. def clear: Unit

    Definition Classes
    Job
  13. def clone(nextargs: Args): Job

    Copies this job. By default, this uses reflection and the single-argument Args constructor.

    Definition Classes
    Job
  14. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  15. def colVectorFromTsv(input: String): ColVector[Int, Double]

  16. def config: Map[AnyRef, AnyRef]

    This is the exact config that is passed to the Cascading FlowConnector. By default: if there are no spill thresholds in mode.config, we replace them with defaultSpillThreshold; we overwrite io.serializations with ioSerializations; we overwrite cascading.tuple.element.comparator.default with defaultComparator; and we add some scalding keys for debugging/logging.

    Tip: override this method, call super, and ++ your additional map to add or overwrite more options

    This returns Map[AnyRef, AnyRef] for compatibility with older code

    Definition Classes
    Job
  17. val convergenceThreshold: Double

  18. val currentIteration: Int

  19. val d: Double

  20. implicit def dateParser: DateParser

    Override this to control how dates are parsed

    Definition Classes
    Job
  21. def defaultComparator: Option[Class[_ <: Comparator[_]]]

    Override this if you want to customize comparisons/hashing for your job. The config method overwrites the comparator setting with this value before sending the config to Cascading. The default is needed to make joins in the Fields API more robust to Long vs Int differences. If you only use the Typed API, consider changing this to return None.

    Definition Classes
    Job
  22. def defaultMode(fromFields: Fields, toFields: Fields): Fields

    Rather than give the full power of Cascading's selectors, we have a simpler set of rules:
    1) If the input is non-definite (ALL, GROUP, ARGS, etc.), ALL is the output. Perhaps only fromFields=ALL will make sense.
    2) If one of from or to is a strict superset of the other, SWAP is used.
    3) If they are equal, REPLACE is used.
    4) Otherwise, ALL is used.

    Definition Classes
    FieldConversions
  23. def defaultSpillThreshold: Int

    Keeps 100k tuples in memory by default before spilling. Turn this up as high as you can without getting an OOM error.

    This is ignored if there is a value set in the incoming jobConf on Hadoop

    Definition Classes
    Job
  24. val diffLoc: String

  25. val edgesLoc: String

  26. final def ensureUniqueFields(left: Fields, right: Fields, rightPipe: Pipe): (Fields, Pipe)

    Definition Classes
    FieldConversions
  27. implicit def enumValueToFields(x: Value): Fields

    Definition Classes
    FieldConversions
  28. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  30. implicit def fieldFields[T <: TraversableOnce[Field[_]]](f: T): RichFields

    Definition Classes
    FieldConversions
  31. implicit def fieldToFields(f: Field[_]): RichFields

    Definition Classes
    FieldConversions
  32. implicit def fields[T <: TraversableOnce[Symbol]](f: T): Fields

    Definition Classes
    FieldConversions
  33. implicit def fieldsToRichFields(fields: Fields): RichFields

    We can't set the field Manifests because cascading doesn't (yet) expose field type information in the Fields API.

    Definition Classes
    FieldConversions
  34. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  35. implicit val flowDef: FlowDef

    Attributes
    protected
    Definition Classes
    Job
  36. implicit def fromEnum[T <: Enumeration](enumeration: T): Fields

    Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.

    Definition Classes
    FieldConversions
  37. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  38. def getField(f: Fields, idx: Int): Fields

    Definition Classes
    FieldConversions
  39. def handleStats(statsData: CascadingStats): Unit

    Attributes
    protected
    Definition Classes
    Job
  40. def hasInts(f: Fields): Boolean

    Definition Classes
    FieldConversions
  41. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  42. implicit def intFields[T <: TraversableOnce[Int]](f: T): Fields

    Definition Classes
    FieldConversions
  43. implicit def intToFields(x: Int): Fields

    Definition Classes
    FieldConversions
  44. implicit def integerToFields(x: Integer): Fields

    Definition Classes
    FieldConversions
  45. def ioSerializations: List[Class[_ <: Serialization[_]]]

    These are user-defined serializations in addition to (but deduplicated against) the required serializations.

    Definition Classes
    Job
  46. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  47. implicit def iterableToRichPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): RichPipe

    Definition Classes
    Job
  48. val iterationsDir: String

  49. def keepAlive: Unit

    Use this if a map or reduce phase takes a while before emitting tuples.

    Definition Classes
    Job
  50. def listeners: List[FlowListener]

    Definition Classes
    Job
  51. def matrixFromTsv(input: String): Matrix[Int, Int, Double]

  52. val maxIterations: Int

  53. def measureConvergenceAndStore(): Unit

    Measures convergence by calculating the sum of the absolute differences between the previous and next vectors, then stores the result.

  54. implicit def mode: Mode

    Definition Classes
    Job
  55. val n: Int

  56. def name: String

    Definition Classes
    Job
  57. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  58. final def newSymbol(avoid: Set[Symbol], guess: Symbol, trial: Int = 0): Symbol

    Definition Classes
    FieldConversions
    Annotations
    @tailrec()
  59. def next: Option[Job]

    Recurse and iterate again iff we are under the max number of iterations and vector has not converged.

    Definition Classes
    WeightedPageRankFromMatrix → Job
  60. val nextVector: ColVector[Int, Double]

  61. val nextVectorLoc: String

  62. final def notify(): Unit

    Definition Classes
    AnyRef
  63. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  64. val onesVectorLoc: String

  65. implicit def parseAnySeqToFields[T <: TraversableOnce[Any]](anyf: T): Fields

    Useful to convert f : Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.

    Definition Classes
    FieldConversions
  66. implicit def pipeToRichPipe(pipe: Pipe): RichPipe

    You should never call this directly; it is here to make the DSL work. Just know that you can treat a Pipe as a RichPipe within a Job.

    Definition Classes
    Job
  67. val previousVector: ColVector[Int, Double]

  68. val previousVectorLoc: String

  69. def priorVector: ColVector[Int, Double]

    Loads the prior vector, or generates it from d and n on the first iteration.

  70. implicit def productToFields(f: Product): Fields

    Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail), so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.

    Definition Classes
    LowPriorityFieldConversions
  71. implicit def read(src: Source): Pipe

    This is implicit so that a Source can be used as the argument to a join or other method that accepts Pipe.

    Definition Classes
    Job
  72. val rootDir: String

  73. def run: Boolean

    Definition Classes
    Job
  74. implicit def scaldingConfig: Config

    This is here so that Mappable.toIterator can find an implicit config

    Attributes
    protected
    Definition Classes
    Job
  75. def skipStrategy: Option[FlowSkipStrategy]

    Definition Classes
    Job
  76. implicit def sourceToRichPipe(src: Source): RichPipe

    This implicit is to enable RichPipe methods directly on Source objects, such as map/flatMap, etc...

    Note that Mappable is a subclass of Source, and Mappable already has mapTo and flatMapTo BUT WITHOUT incoming fields used (see the Mappable trait). This creates some confusion when using these methods (this is an unfortunate mistake in our design that was not noticed until later). To remove ambiguity, explicitly call .read on any Source that you begin operating with a mapTo/flatMapTo.

    Definition Classes
    Job
  77. def stepListeners: List[FlowStepListener]

    Definition Classes
    Job
  78. def stepStrategy: Option[FlowStepStrategy[_]]

    Specify a callback to run before the start of each flow step.

    Defaults to what Config.getReducerEstimator specifies.

    Definition Classes
    Job
    See also

    ExecutionContext.buildFlow

  79. implicit def strFields[T <: TraversableOnce[String]](f: T): Fields

    Definition Classes
    FieldConversions
  80. implicit def stringToFields(x: String): Fields

    Definition Classes
    FieldConversions
  81. implicit def symbolToFields(x: Symbol): Fields

    '* means Fields.ALL, otherwise we take the .name

    Definition Classes
    FieldConversions
  82. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  83. def timeout[T](timeout: AbsoluteDuration)(t: ⇒ T): Option[T]

    Definition Classes
    Job
  84. implicit def toPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): Pipe

    Definition Classes
    Job
  85. def toString(): String

    Definition Classes
    AnyRef → Any
  86. implicit def tuple2ToFieldsPair[T, U](pair: (T, U))(implicit tf: (T) ⇒ Fields, uf: (U) ⇒ Fields): (Fields, Fields)

    Definition Classes
    FieldConversions
  87. implicit def unitToFields(u: Unit): Fields

    Definition Classes
    FieldConversions
  88. def validate: Unit

    Definition Classes
    Job
  89. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  90. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  91. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  92. def write(pipe: Pipe, src: Source): Unit

    This is only here for Java jobs, which cannot automatically access the implicit Pipe => RichPipe conversion that makes pipe.write( ) convenient.

    Definition Classes
    Job
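The members M_hat, priorVector, previousVector and nextVector listed above suggest the usual PageRank update, in which the next vector is the product of M_hat with the previous vector plus the prior vector. The following plain-Scala sketch shows one such step on sparse maps. It is an illustration under that assumption, not the job's Matrix API code; `PageRankStep`, `multiply`, `add` and `step` are hypothetical names.

```scala
object PageRankStep {
  type Vec = Map[Int, Double]
  type Mat = Map[(Int, Int), Double] // sparse entries keyed by (row, column)

  // Sparse matrix-vector product: result(row) = sum over col of m(row, col) * v(col).
  def multiply(m: Mat, v: Vec): Vec =
    m.toSeq
      .map { case ((row, col), w) => row -> w * v.getOrElse(col, 0.0) }
      .groupBy(_._1)
      .map { case (row, terms) => row -> terms.map(_._2).sum }

  // Element-wise sum of two sparse vectors; missing entries count as 0.0.
  def add(a: Vec, b: Vec): Vec =
    (a.keySet ++ b.keySet)
      .map(k => k -> (a.getOrElse(k, 0.0) + b.getOrElse(k, 0.0)))
      .toMap

  // One update (assumed form): nextVector = M_hat * previousVector + priorVector.
  def step(mHat: Mat, priorVector: Vec, previousVector: Vec): Vec =
    add(multiply(mHat, previousVector), priorVector)
}
```

As a small check, take two nodes that link only to each other with d = 0.5, so that mHat holds 0.5 at (0, 1) and (1, 0) and the prior vector holds (1 - d) / n = 0.25 for each node. Starting from the uniform vector (0.5, 0.5), one step returns the same vector, which is the fixed point expected for this symmetric graph.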
