com.twitter.summingbird.memory
We can always push all Also nodes all the way to the bottom of the dag: MergedProducer(AlsoProducer(t, a), b) == AlsoProducer(t, MergedProducer(a, b))
Unary(l, fn), where l == AlsoProducer(tail, r), can be changed to AlsoProducer(tail, fn(r))
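The two Also rewrites above can be sketched on a toy AST. Everything in this block (Node, Source, Mapped, Merged, Also, pushAlso) is a hypothetical stand-in invented for illustration, not Summingbird's Producer hierarchy, and only the two cases stated above are handled.

```scala
object AlsoPushDownExample {
  // Hypothetical miniature AST; these are NOT Summingbird's real types.
  sealed trait Node
  final case class Source(name: String) extends Node
  final case class Mapped(in: Node, fnName: String) extends Node // a unary node
  final case class Merged(left: Node, right: Node) extends Node
  final case class Also(ensure: Node, result: Node) extends Node

  // Merged(Also(t, a), b) => Also(t, Merged(a, b))
  def merge(l: Node, r: Node): Node = l match {
    case Also(t, a) => Also(t, merge(a, r))
    case _          => Merged(l, r)
  }

  // Unary(Also(tail, r), fn) => Also(tail, Unary(r, fn))
  def unary(in: Node, fnName: String): Node = in match {
    case Also(t, r) => Also(t, unary(r, fnName))
    case _          => Mapped(in, fnName)
  }

  // Rewrite children first, then rebuild each node through the rules above.
  def pushAlso(n: Node): Node = n match {
    case s: Source     => s
    case Mapped(in, f) => unary(pushAlso(in), f)
    case Merged(l, r)  => merge(pushAlso(l), pushAlso(r))
    case Also(t, r)    => Also(pushAlso(t), pushAlso(r))
  }

  val input: Node =
    Mapped(Merged(Also(Source("t"), Source("a")), Source("b")), "fn")

  // The Also node ends up as the outermost (tail) node of the graph.
  val rewritten: Node = pushAlso(input)
}
```

Here `rewritten` has the Also node at the tail, wrapping the mapped merge of `a` and `b`.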
(a.flatMap(f1) ++ a.flatMap(f2)) == a.flatMap { i => f1(i) ++ f2(i) }
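A minimal sketch of this identity, assuming a plain Scala List as a stand-in for a producer. Note that on an ordered List the two sides contain the same elements in a different order, so the comparison below is made on sorted copies. The values `a`, `f1`, and `f2` are made up for the example.

```scala
object MergeFanOutExample {
  // Assumption: a List stands in for a producer of Int events.
  val a: List[Int] = List(1, 2, 3)
  val f1: Int => List[Int] = i => List(i, i + 1)
  val f2: Int => List[Int] = i => List(i * 10)

  // Two flatMaps over the same input, merged afterwards.
  val merged: List[Int] = a.flatMap(f1) ++ a.flatMap(f2)

  // A single flatMap that applies both functions to each element.
  val fused: List[Int] = a.flatMap { i => f1(i) ++ f2(i) }

  // Same elements, but interleaved differently on an ordered List.
  val sameElements: Boolean = merged.sorted == fused.sorted
}
```

The fused form traverses `a` once instead of twice, which is the point of the rewrite.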
a.flatMap(fn).flatMap(fn2) can be written as a.flatMap(compose(fn, fn2))
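As an illustration, the composition can be checked on a plain Scala List. The `compose` helper below is a hypothetical definition written for this sketch (the rule only names it), and List again stands in for a producer.

```scala
object FlatMapComposeExample {
  // Hypothetical helper: run fn, then run fn2 on every result of fn.
  def compose[A, B, C](fn: A => List[B], fn2: B => List[C]): A => List[C] =
    a => fn(a).flatMap(fn2)

  val a: List[Int] = List(1, 2, 3)
  val fn: Int => List[Int] = i => List(i, -i)
  val fn2: Int => List[String] = i => if (i > 0) List(i.toString) else Nil

  // Two chained flatMap nodes.
  val twoSteps: List[String] = a.flatMap(fn).flatMap(fn2)

  // One flatMap node using the composed function.
  val composed: Int => List[String] = compose(fn, fn2)
  val oneStep: List[String] = a.flatMap(composed)
}
```

Both `twoSteps` and `oneStep` evaluate to `List("1", "2", "3")`.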
Combine flatMaps followed by optionMap into a single operation
In the other direction, you might not want to run optionMap together with flatMap: some platforms (Storm) can't easily control source parallelism, so we don't want to push big expansions up to the sources.
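A rough sketch of the fusion, assuming a List as the producer stand-in and modelling optionMap as a flatMap over the zero or one results of an Option. The functions `split` and `parse` are invented for the example.

```scala
object FlatMapThenOptionMapExample {
  val a: List[String] = List("1 2", "x 3")

  // Expanding step (the flatMap).
  val split: String => List[String] = s => s.split(" ").toList

  // Filtering/transforming step (the optionMap): keep only parsable ints.
  val parse: String => Option[Int] = s => scala.util.Try(s.toInt).toOption

  // flatMap followed by optionMap, as two separate nodes.
  val twoSteps: List[Int] = a.flatMap(split).flatMap(s => parse(s).toList)

  // The same work fused into a single flatMap node.
  val oneStep: List[Int] = a.flatMap(s => split(s).flatMap(t => parse(t).toList))
}
```

Both forms yield `List(1, 2, 3)`. The fused node does the expansion and the filtering in one place, which is why fusing moves the expansion to wherever that single node runs.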
If you can't optimize KeyFlatMaps, use this
(a ++ b).flatMap(fn) == (a.flatMap(fn) ++ b.flatMap(fn)) and (a ++ b).optionMap(fn) == (a.optionMap(fn) ++ b.optionMap(fn)). Since Merge is usually a no-op when combined with a grouping operation, it often pays to get merges as high in the graph as possible.
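The distributive identity can be checked on plain Lists, used here as an assumed stand-in for producers. The names `a`, `b`, `fn`, and `opt` are illustrative only, and optionMap is modelled as a flatMap over an Option's results.

```scala
object MergeDistributesExample {
  val a: List[Int] = List(1, 2)
  val b: List[Int] = List(3)
  val fn: Int => List[Int] = i => List.fill(i)(i)
  val opt: Int => Option[Int] = i => if (i % 2 == 1) Some(i * 100) else None

  // Merge first, then flatMap.
  val mergeThenMap: List[Int] = (a ++ b).flatMap(fn)
  // flatMap each side, then merge.
  val mapThenMerge: List[Int] = a.flatMap(fn) ++ b.flatMap(fn)

  // The same identity with an Option-returning function.
  val mergeThenOpt: List[Int] = (a ++ b).flatMap(i => opt(i).toList)
  val optThenMerge: List[Int] =
    a.flatMap(i => opt(i).toList) ++ b.flatMap(i => opt(i).toList)
}
```

Both flatMap forms give `List(1, 2, 2, 3, 3, 3)` and both Option forms give `List(100, 300)`.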
If you don't care to distinguish between optionMap and flatMap, you can use this rule
Identity keyed producer is just a trick to make Scala see methods on keyed types; it has no meaning at runtime.
Strip all the names. Names rightly belong to the irreducible parts of the input graph (functions, stores, sinks, sources, etc.), not to the AST that we generate and optimize along the way.
If you can't optimize ValueFlatMaps, use this
Create an ExpressionDag for the given node. This should be the final tail of the graph. You can apply optimizations to this Dag and then use the returned Id to evaluate it back to an optimized producer.
This makes a potentially unsound cast. Since this method is only used in converting from an AlsoProducer to a Literal[T, Prod] below, it is not actually dangerous, because we always use it in a safe position.
Optimize the given producer according to the rule
Convert a Producer graph into a Literal in the Dag rewriter. This is where the tedious work comes in.