com.twitter.cassovary.util

package io

Type Members

  1. class AdjacencyListGraphReader[T] extends GraphReaderFromDirectory[T]

    Reads in a multi-line adjacency list from multiple files in a directory, where ids are of type T. Does not check for duplicate edges or nodes.

    You can optionally specify which files in a directory to read. For example, you may have files starting with "part-" that you'd like to read. Only these will be read in if you specify that as the file prefix.

    In each file, a node and its neighbors are defined by a first line containing that node's id and its number of neighbors, followed by that number of ids on subsequent lines, one per line. For example, when ids are Ints:

    241 3
    2
    4
    1
    53 1
    241
    ...

    In this file, node 241 has 3 neighbors, namely 2, 4 and 1. Node 53 has 1 neighbor, 241.

    Similarly, when ids are Strings, the input file should follow this example:

    Alice 2
    Bob
    Chris
    Bob 1
    Chris
    Chris 1
    Bob
    ...

    In this file, Alice has 2 directed edges, to Bob and Chris; Bob has an edge to Chris; and Chris has an outgoing edge to Bob.
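    The layout described above can be parsed with a few lines of plain Scala. The following is only a sketch of the file format, not the reader's actual implementation: the object name AdjacencyListSketch, the method parse, and the parameter name idReader are hypothetical and chosen for illustration.

```scala
object AdjacencyListSketch {
  // Parses the "<id> <count>" header line followed by <count> neighbor lines.
  // idReader (hypothetical name) converts one token into an id of type T.
  def parse[T](lines: Iterator[String], idReader: String => T): List[(T, List[T])] = {
    val result = List.newBuilder[(T, List[T])]
    while (lines.hasNext) {
      val header = lines.next().trim
      if (header.nonEmpty) {
        val parts = header.split("\\s+")
        require(parts.length == 2, s"expected '<id> <count>', got: $header")
        val id = idReader(parts(0))
        val count = parts(1).toInt
        // List.fill evaluates its argument once per element, in order,
        // so this consumes exactly `count` neighbor lines.
        val neighbors = List.fill(count)(idReader(lines.next().trim))
        result += ((id, neighbors))
      }
    }
    result.result()
  }
}
```

    Like the reader described here, this sketch does not check for duplicate edges or nodes; it fails with an exception if a file ends before all announced neighbors have been read.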

  2. trait GraphReader[T] extends AnyRef

    Trait that classes should implement to read in graphs whose nodes have ids of type T.

    The reader class is required to implement iteratorSeq, a method which returns a sequence of functions that themselves return an Iterator over NodeIdEdgesMaxId (see its type signature below as well).

    It is also required to provide a nodeNumberer[T].

    NodeIdEdgesMaxId is a case class defined in ArrayBasedDirectedGraph that stores 1) the id of a node, 2) the ids of its neighbors, and 3) the maximum id of itself and its neighbors.

    One useful reference implementation is AdjacencyListGraphReader.
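    To make the shape of this contract concrete, here is a minimal sketch under stated assumptions. The types below are simplified stand-ins defined only for this example, not the library's own definitions: the case class mirrors the three fields described above, SimpleGraphReader is a hypothetical reduced trait that omits the node numberer, and InMemoryReader is an invented implementation that serves pre-built data instead of reading files.

```scala
object GraphReaderSketch {
  // Simplified stand-in for the NodeIdEdgesMaxId case class described above.
  final case class NodeIdEdgesMaxId(id: Int, edges: Array[Int], maxId: Int)

  // Hypothetical reduced contract: one iterator-producing function per input part.
  trait SimpleGraphReader {
    def iteratorSeq: Seq[() => Iterator[NodeIdEdgesMaxId]]
  }

  // Builds a record whose maxId is the maximum of the node id and its neighbor ids.
  def record(id: Int, edges: Array[Int]): NodeIdEdgesMaxId =
    NodeIdEdgesMaxId(id, edges, edges.foldLeft(id)((a, b) => math.max(a, b)))

  // Invented in-memory implementation: each inner Seq plays the role of one file.
  class InMemoryReader(parts: Seq[Seq[(Int, Array[Int])]]) extends SimpleGraphReader {
    def iteratorSeq: Seq[() => Iterator[NodeIdEdgesMaxId]] =
      parts.map { part =>
        () => part.iterator.map { case (id, edges) => record(id, edges) }
      }
  }
}
```

    Returning functions rather than iterators means each part can be opened lazily, and possibly more than once, by whoever consumes the sequence.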

  3. trait GraphReaderFromDirectory[T] extends GraphReader[T]

    A subtrait of GraphReader that reads the files in a given directory whose names start with a given prefix.
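    Selecting files by prefix can be sketched with the standard java.io.File API. The object and method names below are hypothetical, and the sorting is an assumption made only so the result is deterministic; the trait itself may select and order files differently.

```scala
import java.io.File

object PrefixFilterSketch {
  // Returns the names of the entries in `directory` that start with `prefix`, sorted.
  // File.list() returns null when the path is not a readable directory,
  // so the result is wrapped in Option to avoid a NullPointerException.
  def filesWithPrefix(directory: File, prefix: String): List[String] = {
    val names = Option(directory.list()).getOrElse(Array.empty[String])
    names.filter(_.startsWith(prefix)).sorted.toList
  }
}
```

    With this sketch, an empty prefix matches every entry in the directory.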

  4. class LabelsReader extends AnyRef

    Currently reads only node labels of type Int and uses only array-based labels. Assumes that the label files are named as follows:

    collPrefix_anything_labelName.txt

    That is, the file name starts with an identifier that marks the collection of labels to be read. Each line holds an id followed by a single space followed by the Int value of the label:

    <id> <labelValue>
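    The naming convention and line format can be illustrated as follows. This is a sketch under assumptions, not the class's real logic: all names are hypothetical, the label name is taken to be the text after the last underscore, node ids are assumed to be Ints usable as array indices, and maxId is assumed to be known in advance.

```scala
object LabelsSketch {
  // Extracts labelName from "collPrefix_anything_labelName.txt",
  // or returns None when the file does not belong to the collection.
  def labelName(fileName: String, collPrefix: String): Option[String] =
    if (fileName.startsWith(collPrefix + "_") && fileName.endsWith(".txt")) {
      val stem = fileName.dropRight(".txt".length)
      Some(stem.substring(stem.lastIndexOf('_') + 1))
    } else None

  // Parses one "<id> <labelValue>" line with a single-space separator.
  def parseLine(line: String): (Int, Int) = {
    val parts = line.split(" ")
    require(parts.length == 2, s"expected '<id> <labelValue>', got: $line")
    (parts(0).toInt, parts(1).toInt)
  }

  // Stores label values in an array indexed by node id; unlabeled ids keep 0.
  def toArrayLabel(lines: Iterator[String], maxId: Int): Array[Int] = {
    val labels = new Array[Int](maxId + 1)
    lines.map(parseLine).foreach { case (id, value) => labels(id) = value }
    labels
  }
}
```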

  5. class ListOfEdgesGraphReader[T] extends GraphReaderFromDirectory[T]

    Reads in a multi-line list of edges from multiple files in a directory, where node ids are of type T. Does not check for duplicate edges or nodes.

    You can optionally specify which files in a directory to read. For example, you may have files starting with "part-" that you'd like to read. Only these will be read in if you specify that as the file prefix.

    You should also specify a nodeNumberer and an idReader for reading node ids.

    For a default version for graphs with Int ids, see the ListOfEdgesGraphReader.forIntIds builder method.

    In each file, a directed edge is defined by a pair of ids of type T: from and to. For example, with String ids and a space separator, a file might contain:

    a b
    b d
    d c
    a e
    ...

    In this file, node a has two outgoing edges (to b and e), node b has an outgoing edge to node d and node d has an outgoing edge to node c.

    Note that using AdjacencyListGraphReader is recommended because of its efficiency.
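    The edge-list format above can be sketched in plain Scala as follows. This illustrates the file layout only and is not the reader's actual implementation: EdgeListSketch, parse, outEdges, and the separator parameter are hypothetical names, and idReader here is simply a function from a token to an id.

```scala
object EdgeListSketch {
  // Parses one "<from><separator><to>" pair per non-empty line.
  def parse[T](lines: Iterator[String], idReader: String => T, separator: Char = ' '): List[(T, T)] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val parts = line.split(separator)
      require(parts.length == 2, s"expected '<from><sep><to>', got: $line")
      (idReader(parts(0)), idReader(parts(1)))
    }.toList

  // Groups edges by source node, keeping targets in file order.
  // Duplicate edges are not removed, matching the behavior described above.
  def outEdges[T](edges: List[(T, T)]): Map[T, List[T]] =
    edges.groupBy(_._1).map { case (from, es) => from -> es.map(_._2) }
}
```

    Grouping by source node, as outEdges does, is the extra work an edge list requires before adjacency information is available; an adjacency list file already provides it per node.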

Value Members

  1. object AdjacencyListGraphReader

  2. object GraphWriter

    Utility for writing a graph object to a Writer output stream so that it can be read back in by a GraphReader.
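    As a rough illustration of the round trip, the sketch below writes nodes to a java.io.Writer in the multi-line adjacency list layout that AdjacencyListGraphReader reads. It assumes that layout as the output format and represents the graph as a plain sequence of (id, neighbors) pairs; GraphWriterSketch and writeAdjacencyList are hypothetical names, not members of this object.

```scala
import java.io.Writer

object GraphWriterSketch {
  // Writes each node as an "<id> <count>" line followed by one neighbor id per line.
  def writeAdjacencyList(nodes: Seq[(Int, Seq[Int])], writer: Writer): Unit = {
    nodes.foreach { case (id, neighbors) =>
      writer.write(s"$id ${neighbors.size}\n")
      neighbors.foreach(n => writer.write(s"$n\n"))
    }
    writer.flush()
  }
}
```

    Passing a java.io.StringWriter is a convenient way to inspect the output in tests before pointing the same code at a file.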

  3. object IoUtils

  4. object ListOfEdgesGraphReader
