package com.twitter.cassovary.util.io

Type Members

  1. class AdjacencyListGraphReader[T] extends GraphReaderFromDirectory[T]

    Reads in a multi-line adjacency list from multiple files in a directory, where ids are of type T. Does not check for duplicate edges or nodes.

    You can optionally specify which files in a directory to read. For example, you may have files starting with "part-" that you'd like to read. Only these will be read in if you specify that as the file prefix.

    In each file, a node and its neighbors are defined by a first line holding that node's id and its number of neighbors, followed by that many ids on subsequent lines. For example, when ids are Ints:

    241 3
    2
    4
    1
    53 1
    241
    ...

    In this file, node 241 has 3 neighbors, namely 2, 4 and 1. Node 53 has 1 neighbor, 241.

    Similarly, when ids are Strings, the input file should follow this example:

    Alice 2
    Bob
    Chris
    Bob 1
    Chris
    Chris 1
    Bob
    ...

    In this file, Alice has 2 directed edges, to Bob and Chris; Bob has an edge to Chris; and Chris has an outgoing edge to Bob.
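
    The multi-line adjacency format described above can be parsed along the following lines. This is a minimal, self-contained sketch and not the code of AdjacencyListGraphReader: the object AdjacencyListSketch, its parse method and the toId parameter are hypothetical names introduced for illustration, and the sketch assumes the header fields are separated by whitespace.

```scala
// Hypothetical helper, not part of Cassovary: parses the multi-line adjacency format.
object AdjacencyListSketch {
  // Each record is a header line "<id> <count>" followed by <count> lines, one neighbor id per line.
  // toId converts the textual id into the id type T (assumption: ids contain no whitespace).
  def parse[T](lines: Iterator[String], toId: String => T): Vector[(T, Vector[T])] = {
    val out = Vector.newBuilder[(T, Vector[T])]
    while (lines.hasNext) {
      val header = lines.next().trim
      if (header.nonEmpty) {
        val parts = header.split("\\s+")
        require(parts.length == 2, s"expected '<id> <count>', got: $header")
        val id = toId(parts(0))
        val count = parts(1).toInt
        // Vector.fill evaluates its argument once per element, so this consumes `count` lines in order.
        val neighbors = Vector.fill(count) {
          require(lines.hasNext, s"truncated neighbor list for node ${parts(0)}")
          toId(lines.next().trim)
        }
        out += ((id, neighbors))
      }
    }
    out.result()
  }
}
```

    Like the reader described above, this sketch does not check for duplicate edges or nodes; it simply returns the records in file order.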

  2. trait GraphReader[T] extends AnyRef

    Trait that classes should implement to read in graphs whose nodes have ids of type T.

    The reader class is required to implement iteratorSeq, a method which returns a sequence of functions that themselves return an Iterator over NodeIdEdgesMaxId (see its type signature below as well).

    It is also required to provide a nodeNumberer[T].

    NodeIdEdgesMaxId is a case class defined in ArrayBasedDirectedGraph that stores 1) the id of a node, 2) the ids of its neighbors, and 3) the maximum id of itself and its neighbors.

    One useful reference implementation is AdjacencyListGraphReader.
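
    The shape of iteratorSeq described above (a sequence of functions, each returning an iterator of node records) can be illustrated with stand-in types. The sketch below does not use Cassovary's classes: NodeEdgesSketch is a hypothetical stand-in for NodeIdEdgesMaxId whose field names are assumptions, ids are fixed to Int for brevity, and the in-memory "shards" stand in for input files.

```scala
// Hypothetical stand-in for NodeIdEdgesMaxId; the real class and its field names may differ.
final case class NodeEdgesSketch(id: Int, edges: Array[Int], maxId: Int)

object IteratorSeqSketch {
  // Builds one record: the node id, its neighbor ids, and the maximum over the node and its neighbors.
  def record(id: Int, edges: Array[Int]): NodeEdgesSketch =
    NodeEdgesSketch(id, edges, edges.foldLeft(id)((a, b) => math.max(a, b)))

  // One function per shard. Every call to a function creates a fresh iterator,
  // so in this sketch the same shard can be traversed more than once.
  def iteratorSeq(shards: Seq[Seq[(Int, Array[Int])]]): Seq[() => Iterator[NodeEdgesSketch]] =
    shards.map { shard =>
      () => shard.iterator.map { case (id, edges) => record(id, edges) }
    }
}
```

    Returning functions rather than iterators keeps each source lazy: nothing is read until a function is called.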

  3. trait GraphReaderFromDirectory[T] extends GraphReader[T]

    A subtrait of GraphReader that reads the files in a given directory whose names start with a given prefix.

  4. trait IntLongSource extends AnyRef

    Represents an arbitrarily large sequence of bytes which can be interpreted as ints or longs.

  5. class LabelsReader extends AnyRef

    Only reads node labels where the key is of type int. Label values can be of type int or string.

    ASSUMES that the label files are named as follows: collPrefix_anything_labelName_labelValueType.txt

    So the file name starts with an identifier that marks this collection of labels to be read. It also assumes that each line has an id followed by a single space followed by the int value of the label: <id> <labelValue>
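
    The naming and line conventions above can be sketched as follows. This is an illustration only, not the LabelsReader implementation: LabelFileSketch and its members are hypothetical names, the sketch assumes that labelName and labelValueType contain no underscore, and it leaves the label value as raw text instead of converting it.

```scala
// Hypothetical helper, not part of Cassovary.
object LabelFileSketch {
  final case class LabelFileName(collPrefix: String, labelName: String, valueType: String)

  // Splits "collPrefix_anything_labelName_labelValueType.txt" into its named parts.
  // Assumption: the last two underscore-separated tokens are the label name and value type.
  def parseFileName(name: String): Option[LabelFileName] =
    if (!name.endsWith(".txt")) None
    else {
      val parts = name.stripSuffix(".txt").split("_")
      if (parts.length < 4) None
      else Some(LabelFileName(parts.head, parts(parts.length - 2), parts.last))
    }

  // Splits "<id> <labelValue>" at the first space; the id must be an int.
  def parseLine(line: String): (Int, String) = {
    val space = line.indexOf(' ')
    require(space > 0, s"expected '<id> <labelValue>', got: $line")
    (line.substring(0, space).toInt, line.substring(space + 1))
  }
}
```

    For example, a hypothetical file named nodes_2014_age_int.txt would be read as collection prefix "nodes", label name "age" and value type "int".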

  6. class ListOfEdgesGraphReader[T] extends GraphReaderFromDirectory[T]

    Reads in a multi-line list of edges from multiple files in a directory. Each edge is on its own line and has the form source-id<separator>destination-id, where separator is a single character.

    One can optionally specify which files in a directory to read. For example, one may have files starting with "part-" that one would like to read, perhaps containing subgraphs of one single graph.

    One can optionally specify two additional operations during reading:
    - to remove duplicate edges
    - to sort the list of adjacent nodes

    For a default version for Int graphs, see the ListOfEdgesGraphReader.forIntIds builder method.

    In each file, a directed edge is defined by a pair of T: from and to. For example, using String ids with a space as the separator, consider reading this file:

    a b
    b d
    d c
    a e
    ...

    In this file, node a has two outgoing edges (to b and e), node b has an outgoing edge to node d and node d has an outgoing edge to node c.

    Note that AdjacencyListGraphReader is the recommended reader, because it is more efficient.
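
    The edge-list format and the two optional operations can be sketched as below. This is a self-contained illustration with String ids, not the ListOfEdgesGraphReader implementation: EdgeListSketch, parse, removeDuplicates and sortNeighbors are hypothetical names chosen for this example.

```scala
// Hypothetical helper, not part of Cassovary: groups "source<separator>destination" lines by source.
object EdgeListSketch {
  def parse(
      lines: Iterator[String],
      separator: Char,
      removeDuplicates: Boolean = false,
      sortNeighbors: Boolean = false
  ): Map[String, Vector[String]] = {
    val edges = lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val i = line.indexOf(separator)
      require(i > 0 && i < line.length - 1, s"malformed edge: $line")
      (line.substring(0, i), line.substring(i + 1))
    }.toVector
    // groupBy keeps the edges of each source in their original file order.
    edges.groupBy(_._1).map { case (source, outgoing) =>
      val all = outgoing.map(_._2)
      val deduped = if (removeDuplicates) all.distinct else all
      val ordered = if (sortNeighbors) deduped.sorted else deduped
      source -> ordered
    }
  }
}
```

    Nodes that appear only as destinations (c and e in the file above) have no outgoing edges and therefore no entry in the resulting map.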

  7. class MemoryMappedIntLongSource extends IntLongSource

    Wraps a sequence of FileChannels to enable random access on a memory mapped file of arbitrary size. Motivation: a single FileChannel.map call can map at most 2GB (Integer.MAX_VALUE bytes) at a time.
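
    The general idea of working around the per-mapping size limit can be sketched with standard java.nio calls. This is not the MemoryMappedIntLongSource implementation: ChunkedMappingSketch and ChunkedInts are hypothetical names, and the sketch uses one channel with several mapped regions rather than a sequence of channels. It assumes the file holds big-endian ints and that the chunk size is a multiple of 8, so that no int or long straddles two chunks.

```scala
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

// Hypothetical helper, not part of Cassovary.
object ChunkedMappingSketch {
  final class ChunkedInts(path: Path, chunkBytes: Int) {
    require(chunkBytes > 0 && chunkBytes % 8 == 0, "chunkBytes must be a positive multiple of 8")

    // Map the file as consecutive read-only regions of at most chunkBytes each.
    private val chunks: Vector[MappedByteBuffer] = {
      val channel = FileChannel.open(path, StandardOpenOption.READ)
      try {
        val size = channel.size()
        (0L until size by chunkBytes.toLong).map { start =>
          channel.map(FileChannel.MapMode.READ_ONLY, start, math.min(chunkBytes.toLong, size - start))
        }.toVector
      } finally channel.close() // an established mapping does not depend on the channel staying open
    }

    // Reads the index-th 4-byte int: pick the chunk, then the offset inside it.
    def getInt(index: Long): Int = {
      val byteOffset = index * 4L
      chunks((byteOffset / chunkBytes).toInt).getInt((byteOffset % chunkBytes).toInt)
    }
  }
}
```

    A small chunk size is useful for testing the chunk arithmetic; for real files the chunk size would be chosen close to the mapping limit.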

Value Members

  1. object AdjacencyListGraphReader

  2. object GraphWriter

    Utility class for writing a graph object to a Writer output stream, so that it can be read back in by a GraphReader.

  3. object IoUtils

  4. object ListOfEdgesGraphReader
