Subclasses of Source MUST override this method.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here
Allows you to read a Tap on the submit node NOT FOR USE IN THE MAPPERS OR REDUCERS.
Allows you to read a Tap on the submit node NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed
write the pipe and return the input so it can be chained into the next operation
write the pipe and return the input so it can be chained into the next operation
Allows working with an iterable object defined in the job (on the submitter) to be used within a Job as you would a Pipe/RichPipe
These lists should probably be very tiny by Hadoop standards. If they are getting large, you should probably dump them to HDFS and use the normal mechanisms to address the data (a FileSource).