FITS source implementation for Spark SQL.
Data Source API implementation for FITS. Note that for the moment, only FITS tables are supported. Support for FITS images will be added later on.
The interpreter session below shows how to use basic functionalities:
scala> val fn = "src/test/resources/test_file.fits"
scala> val df = spark.read
                     .format("com.astrolabsoftware.sparkfits")
                     .option("hdu", 1)
                     .option("verbose", true)
                     .load(fn)
+------ HEADER (HDU=1) ------+
XTENSION= BINTABLE / binary table extension
BITPIX = 8 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 34 / length of dimension 1
NAXIS2 = 20000 / length of dimension 2
PCOUNT = 0 / number of group parameters
GCOUNT = 1 / number of groups
TFIELDS = 5 / number of table fields
TTYPE1 = target
TFORM1 = 10A
TTYPE2 = RA
TFORM2 = E
TTYPE3 = Dec
TFORM3 = D
TTYPE4 = Index
TFORM4 = K
TTYPE5 = RunId
TFORM5 = J
END
+----------------------------+
df: org.apache.spark.sql.DataFrame = [target: string, RA: float ... 3 more fields]

scala> df.printSchema
root
 |-- target: string (nullable = true)
 |-- RA: float (nullable = true)
 |-- Dec: double (nullable = true)
 |-- Index: long (nullable = true)
 |-- RunId: integer (nullable = true)

scala> df.show(5)
+----------+---------+--------------------+-----+-----+
|    target|       RA|                 Dec|Index|RunId|
+----------+---------+--------------------+-----+-----+
|NGC0000000| 3.448297| -0.3387486324784641|    0|    1|
|NGC0000001| 4.493667| -1.4414990980543227|    1|    1|
|NGC0000002| 3.787274|  1.3298379564211742|    2|    1|
|NGC0000003| 3.423602|-0.29457151504987844|    3|    1|
|NGC0000004|2.6619017|  1.3957536426732444|    4|    1|
+----------+---------+--------------------+-----+-----+
only showing top 5 rows
Contains classes and methods to manipulate BinTable HDUs.
Contains classes and methods to manipulate Image HDUs.
This is the beginning of a FITS library in Scala. You will find a large number of methods to manipulate binary table HDUs. There is no support for image HDUs for the moment.
Object to handle the conversion from an HDU header to a DataFrame schema.
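As an illustration of that conversion, here is a minimal sketch (the object and method names are hypothetical, not the actual SparkFits API) mapping FITS binary-table TFORM codes, as seen in the header above, to Spark SQL type names:

```scala
object TformToSchema {
  // Illustrative mapping from FITS TFORM binary-table codes to Spark SQL
  // type names; the real converter builds a full StructType and handles
  // more codes than the ones shown here.
  def tformToTypeName(tform: String): String = tform.trim match {
    case t if t.nonEmpty && t.endsWith("A") => "string" // e.g. "10A": fixed-length ASCII string
    case "E" => "float"                                 // 32-bit floating point
    case "D" => "double"                                // 64-bit floating point
    case "K" => "long"                                  // 64-bit integer
    case "J" => "integer"                               // 32-bit integer
    case other => s"unsupported($other)"
  }
}
```

Applied to the five TFORM values of the header above (10A, E, D, K, J), this yields exactly the schema printed by `df.printSchema`: string, float, double, long, integer.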
Class to handle the relationship between executors and HDFS when reading a FITS file: File -> InputSplit -> RecordReader (this class) -> Mapper (executors). It extends the abstract class RecordReader from Hadoop. The idea is to describe how the FITS file is cut into blocks and splits in HDFS. First the file is split into physical blocks in HDFS, whose size is given by the Hadoop configuration (typically 128 MB). Then, inside a block, the data is sent to executors record by record (logical split), each record being smaller than 128 MB. The purpose of this class is to describe this second step, that is, the split of blocks into records.
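One detail of that second step can be sketched in code: since an HDFS block boundary generally falls in the middle of a table row, a record reader must align its logical records on whole rows. The helper below is a hypothetical illustration (not the actual SparkFits implementation), assuming we know the byte offset where the bintable data starts and the row size from the header:

```scala
object SplitAlign {
  // Hypothetical helper: given the byte offset where the bintable data
  // starts, the row size in bytes (NAXIS1), and the start offset of an
  // HDFS block, return the offset of the first complete row inside that
  // block, so that no row straddles two logical records.
  def firstRowStart(dataStart: Long, rowSize: Long, blockStart: Long): Long = {
    if (blockStart <= dataStart) dataStart
    else {
      val offset = blockStart - dataStart // bytes of table data before this block
      val rem = offset % rowSize
      if (rem == 0) blockStart            // block boundary happens to be row-aligned
      else blockStart + (rowSize - rem)   // skip the tail of the row begun in the previous block
    }
  }
}
```

For the example file above (rows of 34 bytes), a block starting mid-row would thus be advanced by a few bytes so its first record begins on a row boundary.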
The data is first read in chunks of binary data, then converted to the correct type element by element, and finally grouped into rows.
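The element-by-element conversion can be sketched as follows, assuming the table layout of the example header above (TFORM codes 10A, E, D, K, J, i.e. 10 + 4 + 8 + 8 + 4 = 34 bytes per row, matching NAXIS1 = 34). The object name is illustrative, not the actual SparkFits internals; note that FITS stores binary data big-endian:

```scala
import java.nio.{ByteBuffer, ByteOrder}

object RowDecoder {
  // Decode one 34-byte row (10A, E, D, K, J) into typed elements,
  // reading each field sequentially from a big-endian buffer.
  def decodeRow(bytes: Array[Byte]): (String, Float, Double, Long, Int) = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN)
    val name = new Array[Byte](10)
    buf.get(name) // 10A: fixed-length ASCII string
    (new String(name, "US-ASCII").trim, // target
     buf.getFloat,                      // E: RA
     buf.getDouble,                     // D: Dec
     buf.getLong,                       // K: Index
     buf.getInt)                        // J: RunId
  }
}
```

The tuples produced this way would then be grouped into Spark Rows to populate the DataFrame partitions.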