package com.twitter.summingbird.example

The "example" package contains all of the code and configuration necessary to run a basic Summingbird job locally. The job consumes tweets from the public streaming API and counts how many times each word appears in those tweets.

Clients can use the code in this example to build realtime word-count dashboards similar to the Google Books Ngram Viewer:

http://books.google.com/ngrams

# Code Structure

## Serialization.scala

Defines the serialization Injections that the Storm and Scalding platforms need so that data can move across network boundaries without corruption.
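The idea behind an Injection can be sketched in plain Scala: a conversion that always succeeds in one direction and may fail on the way back. The `SimpleInjection` trait and the `SimpleInjections` object below are hypothetical stand-ins written for illustration. They are not the types defined in Serialization.scala or in the Bijection library, and the byte layouts are assumptions of this sketch.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8
import scala.util.Try

// Hypothetical stand-in for the Injection idea (not the Bijection trait):
// apply always succeeds, invert may fail on bytes it does not recognize.
trait SimpleInjection[A, B] {
  def apply(a: A): B
  def invert(b: B): Try[A]
}

object SimpleInjections {
  // Keys (words) as UTF-8 bytes.
  val stringToBytes: SimpleInjection[String, Array[Byte]] =
    new SimpleInjection[String, Array[Byte]] {
      def apply(s: String): Array[Byte] = s.getBytes(UTF_8)
      def invert(bytes: Array[Byte]): Try[String] = Try(new String(bytes, UTF_8))
    }

  // Values (counts) as fixed-width 8-byte big-endian longs.
  val longToBytes: SimpleInjection[Long, Array[Byte]] =
    new SimpleInjection[Long, Array[Byte]] {
      def apply(n: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(n).array()
      def invert(bytes: Array[Byte]): Try[Long] = Try {
        require(bytes.length == 8, s"expected 8 bytes, got ${bytes.length}")
        ByteBuffer.wrap(bytes).getLong
      }
    }
}
```

Because both directions are fixed functions with an explicit byte layout, bytes written by one process can be read back by another as long as both use the same definitions.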

## Storage.scala

Defines a few helper methods that make it easy to create instances of MergeableStore backed by Memcache.
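To illustrate what "mergeable" means for a store, here is a minimal in-memory sketch. `InMemoryMergeableStore` is a hypothetical class invented for this example. The store used by the job is backed by Memcache and is asynchronous, and returning the previous value from `merge` is a choice made for this sketch.

```scala
import scala.collection.mutable

// Hypothetical, synchronous stand-in for a mergeable key-value store.
// `combine` plays the role of the value's associative "plus" operation.
final class InMemoryMergeableStore[K, V](combine: (V, V) => V) {
  private val data = mutable.Map.empty[K, V]

  def get(k: K): Option[V] = data.get(k)

  // Fold the delta into whatever is already stored under the key and
  // return the value that was present before the merge, if any.
  def merge(kv: (K, V)): Option[V] = {
    val (k, delta) = kv
    val before = data.get(k)
    data.update(k, before.fold(delta)(combine(_, delta)))
    before
  }
}
```

With `Long` counts and addition as the combining function, merging `"cat" -> 1L` and then `"cat" -> 2L` leaves `3L` stored under `"cat"`.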

## ExampleJob.scala

The actual Summingbird job, plus a couple of helper implicits (a batcher and a time extractor) necessary for running jobs in combined batch/realtime mode across Scalding and Storm.
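The computation such a job performs can be sketched with plain Scala collections: split each tweet into words, emit a `(word, 1)` pair per word, and sum the pairs by key. The `WordCountSketch` object and its `tokenize` helper are hypothetical names used only for illustration. This is not the code in ExampleJob.scala, and it leaves out batching and time extraction.

```scala
object WordCountSketch {
  // Hypothetical tokenizer: lower-case the text and split on runs of
  // non-letter characters, dropping empty fragments.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}]+").filter(_.nonEmpty).toSeq

  // Emit (word, 1L) for every word in every tweet, then sum per word.
  def wordCounts(tweets: Seq[String]): Map[String, Long] =
    tweets
      .flatMap(tweet => tokenize(tweet).map(word => word -> 1L))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }
}
```

The flat-map-then-sum shape is the part that carries over: because addition on counts is associative, partial sums can be computed separately and merged later.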

## StormRunner.scala

Configuration and execution of the Summingbird word-count job in Storm's local mode, plus some advice on how to test that Storm is populating the Memcache store with good counts.

# Have Fun!

Please send any questions on the code you see here to [email protected], or send me a tweet at @sritchie. Once you get this code compiling and running, you'll be cooking with gas!


Value Members

  1. object ExeStorm

  2. object Memcache

    TODO: Delete when https://github.com/twitter/storehaus/pull/121 is merged into Storehaus and Storehaus sees its next release. That pull request will make it easier to create Memcache store instances.

  3. object Serialization

    Serialization is often the most important (and hairy) configuration issue for any system that needs to store its data over the long term. Summingbird controls serialization through the "Injection" interface.

    By maintaining identical Injections from K and V to Array[Byte], one can guarantee that data written one day will be readable the next. This isn't the case with serialization engines like Kryo, where the serialization format depends on unstable parameters, such as the serializer registration order for the given Kryo instance.

  4. object StatusStreamer

  5. object StormRunner

    Contains code to execute the Summingbird WordCount job defined in ExampleJob.scala on a Storm cluster.
