FITS source implementation for Spark SQL.
Data Source API implementation for FITS. Note that for the moment, only FITS tables are supported. Support for FITS images will be added later on.
The interpreter session below shows how to use basic functionalities:
scala> val fn = "src/test/resources/test_file.fits"
scala> val df = spark.read
                     .format("com.astrolabsoftware.sparkfits")
                     .option("hdu", 1)
                     .option("verbose", true)
                     .load(fn)
+------ HEADER (HDU=1) ------+
XTENSION= BINTABLE / binary table extension
BITPIX = 8 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 34 / length of dimension 1
NAXIS2 = 20000 / length of dimension 2
PCOUNT = 0 / number of group parameters
GCOUNT = 1 / number of groups
TFIELDS = 5 / number of table fields
TTYPE1 = target
TFORM1 = 10A
TTYPE2 = RA
TFORM2 = E
TTYPE3 = Dec
TFORM3 = D
TTYPE4 = Index
TFORM4 = K
TTYPE5 = RunId
TFORM5 = J
END
+----------------------------+
df: org.apache.spark.sql.DataFrame = [target: string, RA: float ... 3 more fields]

scala> df.printSchema
root
 |-- target: string (nullable = true)
 |-- RA: float (nullable = true)
 |-- Dec: double (nullable = true)
 |-- Index: long (nullable = true)
 |-- RunId: integer (nullable = true)

scala> df.show(5)
+----------+---------+--------------------+-----+-----+
|    target|       RA|                 Dec|Index|RunId|
+----------+---------+--------------------+-----+-----+
|NGC0000000| 3.448297| -0.3387486324784641|    0|    1|
|NGC0000001| 4.493667| -1.4414990980543227|    1|    1|
|NGC0000002| 3.787274|  1.3298379564211742|    2|    1|
|NGC0000003| 3.423602|-0.29457151504987844|    3|    1|
|NGC0000004|2.6619017|  1.3957536426732444|    4|    1|
+----------+---------+--------------------+-----+-----+
only showing top 5 rows
Contains classes and methods to manipulate BinTable HDUs.
Contains classes and methods to manipulate Image HDUs.
This is the beginning of a FITS library in Scala. You will find a large number of methods to manipulate binary table HDUs. There is no support for image HDUs for the moment.
Object to handle the conversion from an HDU header to a DataFrame schema.
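As an illustration of that conversion, here is a minimal sketch (the object and method names are hypothetical, not the actual SparkFits API) mapping FITS binary-table TFORM codes, as seen in the header above, to Spark SQL type names:

```scala
object TformToSchema {
  // Illustrative mapping from FITS TFORM binary-table codes to Spark SQL
  // type names; the real converter builds a full StructType and handles
  // more codes than the ones shown here.
  def tformToTypeName(tform: String): String = tform.trim match {
    case t if t.nonEmpty && t.endsWith("A") => "string" // e.g. "10A": fixed-length ASCII string
    case "E" => "float"                                 // 32-bit floating point
    case "D" => "double"                                // 64-bit floating point
    case "K" => "long"                                  // 64-bit integer
    case "J" => "integer"                               // 32-bit integer
    case other => s"unsupported($other)"
  }
}
```

Applied to the five TFORM values of the header above (10A, E, D, K, J), this yields exactly the schema printed by `df.printSchema`: string, float, double, long, integer.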
Class to handle the relationship between executors and HDFS when reading a FITS file: File -> InputSplit -> RecordReader (this class) -> Mapper (executors). It extends the abstract class RecordReader from Hadoop. The idea is to describe how the FITS file is cut into blocks and splits in HDFS. First the file is split into physical blocks in HDFS, whose size is given by the Hadoop configuration (typically 128 MB). Then, inside a block, the data is sent to executors record by record (logical split), each record being smaller than 128 MB. The purpose of this class is to describe this second step, that is, the split of blocks into records.
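One detail of that second step can be sketched in code: since an HDFS block boundary generally falls in the middle of a table row, a record reader must align its logical records on whole rows. The helper below is a hypothetical illustration (not the actual SparkFits implementation), assuming we know the byte offset where the bintable data starts and the row size from the header:

```scala
object SplitAlign {
  // Hypothetical helper: given the byte offset where the bintable data
  // starts, the row size in bytes (NAXIS1), and the start offset of an
  // HDFS block, return the offset of the first complete row inside that
  // block, so that no row straddles two logical records.
  def firstRowStart(dataStart: Long, rowSize: Long, blockStart: Long): Long = {
    if (blockStart <= dataStart) dataStart
    else {
      val offset = blockStart - dataStart // bytes of table data before this block
      val rem = offset % rowSize
      if (rem == 0) blockStart            // block boundary happens to be row-aligned
      else blockStart + (rowSize - rem)   // skip the tail of the row begun in the previous block
    }
  }
}
```

For the example file above (rows of 34 bytes), a block starting mid-row would thus be advanced by a few bytes so its first record begins on a row boundary.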
The data is first read in chunks of binary data, then converted to the correct type element by element, and finally grouped into rows.
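The element-by-element conversion can be sketched as follows, assuming the table layout of the example header above (TFORM codes 10A, E, D, K, J, i.e. 10 + 4 + 8 + 8 + 4 = 34 bytes per row, matching NAXIS1 = 34). The object name is illustrative, not the actual SparkFits internals; note that FITS stores binary data big-endian:

```scala
import java.nio.{ByteBuffer, ByteOrder}

object RowDecoder {
  // Decode one 34-byte row (10A, E, D, K, J) into typed elements,
  // reading each field sequentially from a big-endian buffer.
  def decodeRow(bytes: Array[Byte]): (String, Float, Double, Long, Int) = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN)
    val name = new Array[Byte](10)
    buf.get(name) // 10A: fixed-length ASCII string
    (new String(name, "US-ASCII").trim, // target
     buf.getFloat,                      // E: RA
     buf.getDouble,                     // D: Dec
     buf.getLong,                       // K: Index
     buf.getInt)                        // J: RunId
  }
}
```

The tuples produced this way would then be grouped into Spark Rows to populate the DataFrame partitions.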