Combines the config, flowDef, and the Mode to produce a Flow.
Copies this job. By default, this uses reflection and the single-argument Args constructor.
This is the exact config that is passed to the Cascading FlowConnector. By default:
- if there are no spill thresholds in mode.config, we replace them with defaultSpillThreshold
- we overwrite io.serializations with ioSerializations
- we overwrite cascading.tuple.element.comparator.default with defaultComparator
- we add some Scalding keys for debugging/logging
Tip: to add or overwrite more options, override this method, call super, and ++ your additional map.
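The override tip relies on Scala's Map ++ semantics, where entries on the right overwrite entries with the same key on the left. A minimal sketch, assuming made-up config contents (the real super.config comes from the Job):

```scala
// Hypothetical base config, standing in for what super.config would return.
def baseConfig: Map[AnyRef, AnyRef] =
  Map("io.serializations" -> "...", "mapred.reduce.tasks" -> "10")

// An override in the style described above: call super, then ++ your map.
// Keys on the right of ++ overwrite keys already present on the left.
def config: Map[AnyRef, AnyRef] =
  baseConfig ++ Map("mapred.reduce.tasks" -> "100", "my.custom.key" -> "on")
```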
This returns Map[AnyRef, AnyRef] for compatibility with older code
Override this to control how dates are parsed
Override this if you want to customize comparisons/hashing for your job; the config method uses this to overwrite the defaults before sending the config to Cascading. The comparator we use by default is needed to make joins in the Fields-API more robust to Long vs Int differences. If you only use the Typed-API, consider changing this to return None
Rather than give the full power of Cascading's selectors, we have a simpler set of rules encoded below:
1) If the input is non-definite (ALL, GROUP, ARGS, etc.), ALL is the output. Perhaps only fromFields=ALL will make sense.
2) If one of from or to is a strict superset of the other, SWAP is used.
3) If they are equal, REPLACE is used.
4) Otherwise, ALL is used.
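The four rules above can be sketched as a pure-Scala decision function. This is an illustration only: the real implementation works on Cascading Fields objects, and the names here (OutputSelector, defaultMode, fromDefinite) are assumptions:

```scala
// Hypothetical encoding of the selector rules using plain Sets of field names.
sealed trait OutputSelector
case object All extends OutputSelector
case object Swap extends OutputSelector
case object Replace extends OutputSelector

def defaultMode(from: Set[String], to: Set[String], fromDefinite: Boolean): OutputSelector =
  if (!fromDefinite) All                                  // rule 1: ALL, GROUP, ARGS, etc.
  else if (from == to) Replace                            // rule 3: equal field sets
  else if (from.subsetOf(to) || to.subsetOf(from)) Swap   // rule 2: strict superset
  else All                                                // rule 4: fallback
```

Checking equality before the subset test keeps rule 2 restricted to strict supersets, since subsetOf also holds for equal sets.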
Keep 100k tuples in memory by default before spilling; turn this up as high as you can without getting an OOM.
This is ignored if a value is already set in the incoming jobConf on Hadoop
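The "ignored if already set" behavior amounts to applying the default only when the incoming config has no value for the key. A sketch using a plain Map in place of a Hadoop JobConf (the key name is Cascading's spill-threshold property; the helper name and default value here are assumptions):

```scala
// Apply the default spill threshold only when the incoming conf lacks one.
val SpillThresholdKey = "cascading.spill.list.threshold"
val defaultSpillThreshold = 100 * 1000 // the 100k-tuples default described above

def withSpillDefault(conf: Map[String, String]): Map[String, String] =
  if (conf.contains(SpillThresholdKey)) conf // incoming value wins
  else conf + (SpillThresholdKey -> defaultSpillThreshold.toString)
```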
One iteration of pagerank. inputPagerank: <'src_id_input, 'mass_input>; returns <'src_id, 'mass_n, 'mass_input>
Here is a high-level view of the unweighted algorithm:
let N: number of nodes
    inputPagerank(N_i): prob of walking to node i
    d(N_j): N_j's out-degree
then
    pagerankNext(N_i) = \sum_{j points to i} inputPagerank(N_j) / d(N_j)
    deadPagerank = (1 - \sum_{i} pagerankNext(N_i)) / N
    randomPagerank(N_i) = userMass(N_i) * ALPHA + deadPagerank * (1 - ALPHA)
    pagerankOutput(N_i) = randomPagerank(N_i) + pagerankNext(N_i) * (1 - ALPHA)
For the weighted algorithm:
let w(N_j, N_i): weight from N_j to N_i
    tw(N_j): N_j's total out-weight
then
    pagerankNext(N_i) = \sum_{j points to i} inputPagerank(N_j) * w(N_j, N_i) / tw(N_j)
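The unweighted formulas can be checked with a small pure-Scala computation of one iteration. The graph, starting pagerank, ALPHA, and userMass below are made-up example inputs, not part of the job:

```scala
val ALPHA = 0.1 // random-jump probability
// adjacency: node -> nodes it points to
val out = Map(0 -> List(1, 2), 1 -> List(2), 2 -> List(0))
val nodes = out.keys.toList
val n = nodes.size
val inputPagerank = Map(0 -> 0.4, 1 -> 0.3, 2 -> 0.3)
val userMass = nodes.map(i => i -> 1.0 / n).toMap // uniform reset mass

// pagerankNext(i) = sum over j pointing to i of inputPagerank(j) / d(j)
val pagerankNext = nodes.map { i =>
  i -> out.toList.collect { case (j, dsts) if dsts.contains(i) =>
    inputPagerank(j) / dsts.size
  }.sum
}.toMap

val deadPagerank = (1.0 - pagerankNext.values.sum) / n
val output = nodes.map { i =>
  i -> (userMass(i) * ALPHA + deadPagerank * (1 - ALPHA) +
        pagerankNext(i) * (1 - ALPHA))
}.toMap
```

Since every node in this toy graph has out-edges, no mass dies, deadPagerank is 0, and the output masses still sum to 1.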
We can't set the field Manifests because cascading doesn't (yet) expose field type information in the Fields API.
Multi-entry fields. These are higher priority than the Product conversions so that List does not conflict with Product.
Read the pregenerated nodes file: <'src_id, 'dst_ids, 'weights, 'mass_prior>
The total number of nodes, a single-line file
These are user-defined serializations IN ADDITION to (but deduplicated against) the required serializations
Use this if a map or reduce phase takes a while before emitting tuples.
Test for convergence; if not converged yet, kick off the next iteration
Useful to convert f: Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
You should never call this directly; it is here to make the DSL work. Just know that you can treat a Pipe as a RichPipe within a Job
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail), so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
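The List caveat can be seen directly in plain Scala: a non-empty List is a :: (cons) cell, which is a Product2 of head and tail, so productIterator does not yield the list elements the way it does for a real TupleN:

```scala
// A non-empty List is ::(head, tail), i.e. a Product2, so productIterator
// yields exactly two items: the head and the entire tail.
val xs = List(1, 2, 3)
val parts = xs.asInstanceOf[Product].productIterator.toList // List(1, List(2, 3))

// A real TupleN behaves as expected: one item per element.
val t = (1, 2, 3)
val elems = t.productIterator.toList // List(1, 2, 3)
```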
This is implicit so that a Source can be used as the argument to a join or other method that accepts Pipe.
This is here so that Mappable.toIterator can find an implicit config
This implicit is to enable RichPipe methods directly on Source objects, such as map/flatMap, etc. Note that Mappable is a subclass of Source, and Mappable already has mapTo and flatMapTo BUT WITHOUT the incoming fields used (see the Mappable trait). This creates some confusion when using these methods (this is an unfortunate mistake in our design that was not noticed until later). To remove the ambiguity, explicitly call .read on any Source on which you begin operating with a mapTo/flatMapTo.
Specify a callback to run before the start of each flow step.
Defaults to what Config.getReducerEstimator specifies.
ExecutionContext.buildFlow
'* means Fields.ALL; otherwise we take the .name
This is only here for Java jobs, which cannot automatically access the implicit Pipe => RichPipe conversion that makes pipe.write( ) convenient
Weighted pagerank for the given graph: start from the given pagerank, perform one iteration, and test for convergence; if not converged yet, clone itself and start the next pagerank job with the updated pagerank as input.
This class is very similar to the PageRank class; the main differences are:
1. it supports weighted pagerank
2. the reset pagerank is pregenerated, possibly through a previous job
3. dead pagerank is evenly distributed
Options:
--pwd: working directory; will read/generate the following files there:
    numnodes: total number of nodes, a single-line file
    nodes: nodes file <'src_id, 'dst_ids, 'weights, 'mass_prior>
    pagerank: the pagerank file, e.g. pagerank_0, pagerank_1, etc.
    totaldiff: the current max pagerank delta
Optional arguments:
--weighted: do weighted pagerank; default false
--curiteration: the current iteration; default 0
--maxiterations: how many iterations to run; default 20
--jumpprob: probability of a random jump; default 0.1
--threshold: total difference before finishing early; default 0.001