Close the file after reading it.
Get the current Key.
Get the current Value.
(Seq[Row]) Value is a list of heterogeneous lists. It will be converted to List[Row] later.
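For illustration only (not the library's actual code), the conversion mentioned above could be as simple as:

```scala
import org.apache.spark.sql.Row

// Illustrative only: the value served to the Mapper is a Seq[Row]
// (a list of heterogeneous lists), later materialised as a List[Row].
val recordValue: Seq[Row] = Seq(Row(1, 3.14), Row(2, 2.71))
val converted: List[Row] = recordValue.toList
```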
Fancy way of getting a progress bar. Useful to know whether you have time for a coffee and a cigarette before the next run.
(Float) progression inside a block.
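As a hedged sketch, this progression can be computed from the split boundaries (the names splitStart, splitEnd, and currentPosition are illustrative, not necessarily the library's fields):

```scala
// Illustrative only: the fraction of the split already consumed, given the
// start/end byte indices of the split and the current position of the cursor.
def progression(splitStart: Long, splitEnd: Long, currentPosition: Long): Float =
  if (splitEnd == splitStart) 0.0f
  else math.min(1.0f, (currentPosition - splitStart).toFloat / (splitEnd - splitStart))
```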
Here an executor will come and ask for a block of data by calling initialize(). Hadoop will split the data into records, and those records will be sent. One then needs to know: the data file, the starting index of a split (byte index), the size of one record of data (bytes), and the ending index of a split (byte index). A sketch of this bookkeeping follows below.
Typically, a record must not be bigger than 1 MB for the process to be efficient. Otherwise you will trigger a lot of Garbage Collector calls!
(InputSplit) Represents the data to be processed by an individual Mapper.
(TaskAttemptContext) Currently active context to access contextual information about running tasks.
(Long) the current position of the cursor in the file.
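As a hedged illustration of what initialize() must gather, assuming the standard Hadoop FileSplit API (the function and variable names here are hypothetical):

```scala
import org.apache.hadoop.mapreduce.{InputSplit, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.FileSplit

// Illustrative only: extract the data file and the byte boundaries of the
// split. The record size itself would come from the FITS header (not shown).
def describeSplit(inputSplit: InputSplit, context: TaskAttemptContext): Unit = {
  val fileSplit = inputSplit.asInstanceOf[FileSplit]
  val file = fileSplit.getPath                     // the data file
  val splitStart = fileSplit.getStart              // starting index of the split (byte)
  val splitEnd = splitStart + fileSplit.getLength  // ending index of the split (byte)
  // One would then open an input stream on `file` and seek(splitStart),
  // which sets the cursor position described above.
}
```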
Here you describe how the records are made and how the split data is sent.
(Boolean) true if the Mapper has not reached the end of the split; false otherwise.
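A minimal sketch of the record-by-record reading logic behind nextKeyValue(); the names nextRecord, recordSize, splitEnd, and currentPosition are assumptions made for illustration, not the library's actual fields:

```scala
import java.io.DataInputStream

// Illustrative only: read one record worth of bytes and advance the cursor;
// return None once the Mapper has reached the end of the split.
def nextRecord(
    stream: DataInputStream,
    recordSize: Int,
    splitEnd: Long,
    currentPosition: Long): Option[(Array[Byte], Long)] = {
  if (currentPosition >= splitEnd) {
    None // end of the split: nextKeyValue() would return false here
  } else {
    val buffer = new Array[Byte](recordSize)
    stream.readFully(buffer) // one record of binary data
    Some((buffer, currentPosition + recordSize))
  }
}
```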
Class to handle the relationship between executors and HDFS when reading a FITS file: File -> InputSplit -> RecordReader (this class) -> Mapper (executors). It extends the abstract class RecordReader from Hadoop. The idea is to describe how the FITS file is split into blocks and records in HDFS. First, the file is split into physical blocks in HDFS, whose size is given by the Hadoop configuration (typically 128 MB). Then, inside a block, the data is sent to the executors record by record (logical split), each record being smaller than 128 MB. The purpose of this class is to describe this second step, that is, the splitting of blocks into records.
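To make that chain concrete, here is a hedged skeleton showing how a custom RecordReader plugs into a Hadoop FileInputFormat. The class names are made up for illustration, the method bodies are stubs, and the key type (LongWritable) is an assumption; only the value type (Seq[Row]) follows from the description above:

```scala
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.spark.sql.Row

// Illustrative skeleton only: the InputFormat hands each InputSplit to a
// fresh RecordReader, which then serves records to the Mapper.
class SketchFitsInputFormat extends FileInputFormat[LongWritable, Seq[Row]] {
  override def createRecordReader(
      split: InputSplit,
      context: TaskAttemptContext): RecordReader[LongWritable, Seq[Row]] =
    new SketchFitsRecordReader
}

class SketchFitsRecordReader extends RecordReader[LongWritable, Seq[Row]] {
  override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = ()
  override def nextKeyValue(): Boolean = false           // stub: no records served
  override def getCurrentKey(): LongWritable = new LongWritable(0L)
  override def getCurrentValue(): Seq[Row] = Seq.empty
  override def getProgress(): Float = 0.0f
  override def close(): Unit = ()                        // close the file here
}
```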
The data is first read as chunks of raw bytes, then converted to the correct type element by element, and finally grouped into rows.
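A minimal sketch of that three-step decoding, assuming big-endian binary data (as in FITS) and a made-up two-column record layout (one Int followed by one Double):

```scala
import java.nio.ByteBuffer
import org.apache.spark.sql.Row

// Illustrative only: decode a chunk of raw bytes into typed elements, then
// group them into Rows. The schema here is hypothetical.
def chunkToRows(chunk: Array[Byte]): Seq[Row] = {
  val recordSize = 4 + 8           // bytes per row: Int + Double
  val buf = ByteBuffer.wrap(chunk) // ByteBuffer is big-endian by default
  (0 until chunk.length / recordSize).map { _ =>
    val i = buf.getInt    // element-by-element conversion...
    val d = buf.getDouble
    Row(i, d)             // ...then grouped into a Row
  }
}
```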