This function is not type-safe for others to call, but it should never fail: by construction, we never call it with incorrect types. Stronger type safety would be preferable here, but it is unclear how to achieve it, and since this is an internal function, it is not clear that type safety would actually help anyone.
"Smaller" refers to the average number of values per key, not to total size (total size does not matter, although the two are clearly related).
Note that the type signature shows that the right side is (or may be) iterated over repeatedly, while the left side is not. You therefore want the side with fewer values per key on the right. If both sides are similar, there is no need to worry. If one side is a one-to-one mapping, that should be the "smaller" side.
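The asymmetry between the two sides can be sketched in plain Scala. This is only an illustration under assumptions: the joiner shape below (a one-pass Iterator on the left, a re-traversable Iterable on the right) is inferred from the description above, and the names innerJoin, clicks, and profile are hypothetical.

```scala
object JoinSideSketch {
  // Assumed joiner shape: left values arrive as a one-pass Iterator,
  // right values as an Iterable that can be traversed again and again.
  def innerJoin[K, V, W](key: K, left: Iterator[V], right: Iterable[W]): Iterator[(K, (V, W))] =
    left.flatMap { v =>
      // The right side is re-traversed once for every left value.
      right.iterator.map(w => (key, (v, w)))
    }

  // Many values per key on the left, a one-to-one mapping on the right.
  val clicks: Iterator[String] = Iterator("/home", "/cart", "/pay")
  val profile: Iterable[String] = List("alice")

  val joined: List[(Int, (String, String))] = innerJoin(1, clicks, profile).toList
}
```

Because the right side is walked once per left value, keeping the side with fewer values per key on the right keeps the repeated traversal cheap.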
This fully replicates this entire Grouped to the argument (mapside). This means that we never see the case where the key is absent in the pipe, so implementing a right join (from the pipe) is impossible. Note that there is no reduce phase in this operation. Also, unlike a cogroup, for a fixed key each joiner will NOT see all the tuples with that key, because the keys on the left are distributed across many machines. See HashJoin: http://docs.cascading.org/cascading/2.0/javadoc/cascading/pipe/HashJoin.html
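A minimal plain-Scala sketch of the replicated, map-side join described above. It does not use the Scalding or Cascading APIs; the names pipePartitions, grouped, and hashInnerJoin are hypothetical, and each inner List stands in for the slice of the pipe seen by one mapper.

```scala
object HashJoinSketch {
  // Hypothetical stand-in: each inner List is one mapper's slice of the pipe.
  val pipePartitions: List[List[(String, Int)]] = List(
    List("a" -> 1, "b" -> 2),
    List("a" -> 3)
  )

  // The grouped side, assumed small enough to copy whole to every mapper.
  val grouped: Map[String, List[String]] = Map(
    "a" -> List("x"),
    "c" -> List("z") // "c" never occurs in the pipe
  )

  // Map-side join: each partition consults its own copy of `grouped`.
  // There is no shuffle and no reduce phase.
  def hashInnerJoin(partition: List[(String, Int)]): List[(String, (Int, String))] =
    for {
      (k, v) <- partition
      w <- grouped.getOrElse(k, Nil)
    } yield (k, (v, w))

  val perMapper: List[List[(String, (Int, String))]] = pipePartitions.map(hashInnerJoin)
}
```

In this sketch the key "a" is handled by two different mappers, so no single joiner sees all of its tuples, and the key "c" never reaches a joiner because the join is driven only by keys present in the pipe.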
A HashJoinable has a single input into the cogroup.
If we can HashJoin, then we can CoGroup, but not vice versa; i.e., HashJoinable is a strict subset of CoGroupable (CoGrouped, for instance, is CoGroupable but not HashJoinable).
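The subset relationship can be sketched as a small trait hierarchy. These are simplified, hypothetical traits that only borrow the names used above; they are not the library's real definitions, and inputCount and hashJoinInputs are invented for illustration.

```scala
object JoinableSketch {
  // Hypothetical, simplified traits; not the library's real definitions.
  trait CoGroupable[K, R] { def inputCount: Int }

  // Every HashJoinable is a CoGroupable with exactly one input.
  trait HashJoinable[K, R] extends CoGroupable[K, R] {
    final def inputCount: Int = 1
  }

  // A plain grouped input: one source, so it can be hash-joined.
  final case class Grouped[K, R](name: String) extends HashJoinable[K, R]

  // The result of cogrouping two inputs: CoGroupable, but not HashJoinable.
  final case class CoGrouped[K, R](left: CoGroupable[K, _], right: CoGroupable[K, _])
      extends CoGroupable[K, R] {
    def inputCount: Int = left.inputCount + right.inputCount
  }

  // Accepts only single-input values.
  def hashJoinInputs[K, R](small: HashJoinable[K, R]): Int = small.inputCount

  val g1 = Grouped[String, Int]("g1")
  val g2 = Grouped[String, Long]("g2")
  val both = CoGrouped[String, (Int, Long)](g1, g2)
  // hashJoinInputs(both) would not compile: CoGrouped is not a HashJoinable.
}
```

Under these assumptions the compiler enforces the "strict subset" claim: anything accepted by hashJoinInputs can also be passed wherever a CoGroupable is expected, but not the other way around.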