Class com.sparkfits.FitsRecordReader

class FitsRecordReader extends RecordReader[LongWritable, Seq[Row]]

Class to handle the relationship between executors and HDFS when reading a FITS file: File -> InputSplit -> RecordReader (this class) -> Mapper (executors). It extends the abstract class RecordReader from Hadoop. The idea is to describe how the FITS file is split into blocks and records in HDFS. First the file is split into physical blocks in HDFS, whose size is given by the Hadoop configuration (typically 128 MB). Then, inside a block, the data is sent to the executors record by record (logical split), each record being smaller than 128 MB. The purpose of this class is to describe this second step, that is, the split of blocks into records.

The data is first read in chunks of binary data, then converted to the correct type element by element, and finally grouped into rows.
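
As a rough, self-contained sketch of this chunk-to-rows step (not the actual implementation), the snippet below assumes a hypothetical record layout of one 32-bit integer followed by one 64-bit float, decodes a chunk element by element with a ByteBuffer (FITS binary data is big-endian, the default byte order of a ByteBuffer), and groups the decoded elements into Spark Rows. In the real reader the element types are driven by the FITS header.

import java.nio.ByteBuffer

import org.apache.spark.sql.Row

// Hypothetical record layout: one 32-bit integer followed by one 64-bit float.
// FITS binary data is big-endian, which is the default byte order of a ByteBuffer.
def chunkToRows(chunk: Array[Byte]): Seq[Row] = {
  val rowSizeBytes = 4 + 8
  val buf = ByteBuffer.wrap(chunk)
  (0 until chunk.length / rowSizeBytes).map { _ =>
    val id   = buf.getInt()     // element 1, converted to Int
    val flux = buf.getDouble()  // element 2, converted to Double
    Row(id, flux)               // elements grouped into a row
  }
}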

Linear Supertypes
RecordReader[LongWritable, Seq[Row]], Closeable, AutoCloseable, AnyRef, Any

Instance Constructors

  1. new FitsRecordReader()

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def close(): Unit

    Close the file after reading it.

    Definition Classes
    FitsRecordReader → RecordReader → Closeable → AutoCloseable
  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  11. def getCurrentKey(): LongWritable

    Get the current Key.

    returns

    (LongWritable) key.

    Definition Classes
    FitsRecordReader → RecordReader
  12. def getCurrentValue(): Seq[Row]

    Get the current Value.

    returns

    (Seq[Row]) Value is a list of heterogeneous lists. It will be converted to List[Row] later.

    Definition Classes
    FitsRecordReader → RecordReader
  13. def getProgress(): Float

    Fancy way of getting a progress bar. Useful to know whether you have time for a coffee and a cigarette before the next run.

    returns

    (Float) progression inside a block.

    Definition Classes
    FitsRecordReader → RecordReader
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
  15. def initialize(inputSplit: InputSplit, context: TaskAttemptContext): Unit

    Here an executor will come and ask for a block of data by calling initialize(). Hadoop will split the data into records, and those records will be sent. One then needs to know: the data file, the starting index of the split (byte index), the size of one record of data (bytes), and the ending index of the split (byte index).

    Typically, a record should not be bigger than 1 MB for the process to be efficient; otherwise you will trigger many garbage collector calls. The split bookkeeping is sketched after the member list below.

    inputSplit

    : (InputSplit) Represents the data to be processed by an individual Mapper.

    context

    : (TaskAttemptContext) Currently active context to access contextual information about running tasks.

    returns

    (Long) the current position of the pointer cursor in the file.

    Definition Classes
    FitsRecordReader → RecordReader
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. def nextKeyValue(): Boolean

    Here you describe how the records are made, and how the split data is sent. The full read loop (initialize, nextKeyValue, getCurrentKey/getCurrentValue, close) is sketched after the member list below.

    returns

    (Boolean) true if the Mapper has not reached the end of the split, false otherwise.

    Definition Classes
    FitsRecordReader → RecordReader
  19. final def notify(): Unit
    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  21. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  22. def toString(): String
    Definition Classes
    AnyRef → Any
  23. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
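
The record and split bookkeeping referred to in initialize() and getProgress() can be pictured with the minimal sketch below. It is not the actual implementation: the names (splitStart, splitEnd, recordLength, currentPosition) are illustrative assumptions, and the details of where records start inside a FITS file are ignored.

// Minimal sketch of the record/split bookkeeping described in initialize()
// and getProgress(). The field names are illustrative assumptions and do not
// come from the actual FitsRecordReader implementation.
class SplitCursor(splitStart: Long, splitEnd: Long, recordLength: Long) {

  private var currentPosition: Long = splitStart

  /** Advance the cursor by one record; the last record may be shorter. */
  def nextRecord(): Boolean = {
    if (currentPosition >= splitEnd) {
      false
    } else {
      currentPosition = math.min(currentPosition + recordLength, splitEnd)
      true
    }
  }

  /** Fraction of the split already consumed, as getProgress() would report it. */
  def progress: Float = {
    if (splitEnd == splitStart) 1.0f
    else (currentPosition - splitStart).toFloat / (splitEnd - splitStart).toFloat
  }
}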
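
For completeness, here is a hand-driven read loop over a single split, using only the members documented on this page plus standard Hadoop classes (FileSplit, TaskAttemptContextImpl). The path and split length are placeholders, and any Hadoop configuration keys the reader expects (for example, which HDU to read) are not shown; in normal use the reader is instantiated and driven by Hadoop/Spark through the corresponding FileInputFormat rather than called by hand.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptID
import org.apache.hadoop.mapreduce.lib.input.FileSplit
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

import com.sparkfits.FitsRecordReader

object ReadOneSplit {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Placeholder file and split boundaries (byte indices).
    val path  = new Path("hdfs:///path/to/file.fits")
    val split = new FileSplit(path, 0L, 128L * 1024 * 1024, Array.empty[String])
    val ctx   = new TaskAttemptContextImpl(conf, new TaskAttemptID())

    val reader = new FitsRecordReader()
    reader.initialize(split, ctx)
    try {
      // Standard RecordReader contract: advance, then read the key/value pair.
      while (reader.nextKeyValue()) {
        val key  = reader.getCurrentKey()
        val rows = reader.getCurrentValue()
        println(s"Record at key $key: ${rows.length} rows " +
          s"(${(reader.getProgress() * 100).toInt}% of the split read)")
      }
    } finally {
      reader.close()
    }
  }
}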

Inherited from RecordReader[LongWritable, Seq[Row]]

Inherited from Closeable

Inherited from AutoCloseable

Inherited from AnyRef

Inherited from Any
