Package com.astrolabsoftware.sparkfits

package sparkfits

Type Members

  1. class DefaultSource extends RelationProvider with SchemaRelationProvider with DataSourceRegister

    FITS source implementation for Spark SQL.

  2. class FitsRecordReader extends RecordReader[LongWritable, Seq[Row]]

    Class to handle the relationship between executors and HDFS when reading a FITS file: File -> InputSplit -> RecordReader (this class) -> Mapper (executors). It extends the abstract class RecordReader from Hadoop. The idea is to describe how the FITS file is split into blocks and records in HDFS. First the file is split into physical blocks in HDFS, whose size is given by the Hadoop configuration (typically 128 MB). Then, inside a block, the data is sent to executors record by record (logical splits), each smaller than the block size. The purpose of this class is to describe this second step, that is, the split of blocks into records.

    The data is first read in chunks of binary data, then converted to the correct type element by element, and finally grouped into rows.
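    The row-alignment constraint behind the logical split can be sketched in plain Scala. This is a hedged illustration only, not the class's actual code: `recordSizeBytes` is a hypothetical helper showing how a target record size is trimmed to a whole number of table rows (one row is NAXIS1 bytes in a bintable), so that no row straddles two records:

```scala
// Hypothetical sketch: trim a target record size down to a whole
// number of FITS table rows, so no row straddles two records.
// rowLength is NAXIS1 (bytes per row); maxRecordBytes is the target
// logical-record size (assumed >= rowLength).
def recordSizeBytes(rowLength: Int, maxRecordBytes: Int): Int =
  (maxRecordBytes / rowLength) * rowLength

// With the 34-byte rows of the example table shown further down and a
// 1 MB target, a record holds 30840 complete rows (1048560 bytes).
val bytesPerRecord = recordSizeBytes(34, 1024 * 1024)
```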

  3. class FitsRelation extends BaseRelation with TableScan

    Data Source API implementation for FITS. Note that for the moment we provide support only for FITS tables; support for FITS images will be added later.

    The interpreter session below shows how to use basic functionalities:

    scala> val fn = "src/test/resources/test_file.fits"
    scala> val df = spark.read
     .format("com.astrolabsoftware.sparkfits")
     .option("hdu", 1)
     .option("verbose", true)
     .load(fn)
    +------ HEADER (HDU=1) ------+
    XTENSION= BINTABLE           / binary table extension
    BITPIX  =                    8 / array data type
    NAXIS   =                    2 / number of array dimensions
    NAXIS1  =                   34 / length of dimension 1
    NAXIS2  =                20000 / length of dimension 2
    PCOUNT  =                    0 / number of group parameters
    GCOUNT  =                    1 / number of groups
    TFIELDS =                    5 / number of table fields
    TTYPE1  = target
    TFORM1  = 10A
    TTYPE2  = RA
    TFORM2  = E
    TTYPE3  = Dec
    TFORM3  = D
    TTYPE4  = Index
    TFORM4  = K
    TTYPE5  = RunId
    TFORM5  = J
    END
    +----------------------------+
    df: org.apache.spark.sql.DataFrame = [target: string, RA: float ... 3 more fields]
    
    scala> df.printSchema
    root
     |-- target: string (nullable = true)
     |-- RA: float (nullable = true)
     |-- Dec: double (nullable = true)
     |-- Index: long (nullable = true)
     |-- RunId: integer (nullable = true)
    
    scala> df.show(5)
    +----------+---------+--------------------+-----+-----+
    |    target|       RA|                 Dec|Index|RunId|
    +----------+---------+--------------------+-----+-----+
    |NGC0000000| 3.448297| -0.3387486324784641|    0|    1|
    |NGC0000001| 4.493667| -1.4414990980543227|    1|    1|
    |NGC0000002| 3.787274|  1.3298379564211742|    2|    1|
    |NGC0000003| 3.423602|-0.29457151504987844|    3|    1|
    |NGC0000004|2.6619017|  1.3957536426732444|    4|    1|
    +----------+---------+--------------------+-----+-----+
    only showing top 5 rows
  4. class ReadFitsJ extends AnyRef

Value Members

  1. object FitsHdu

  2. object FitsHduBintable

    Contains classes and methods to manipulate bintable HDUs.

  3. object FitsHduImage

    Contains classes and methods to manipulate image HDUs.

  4. object FitsLib

    This is the beginning of a FITS library in Scala. You will find a large number of methods to manipulate binary table HDUs. There is no support for image HDUs for the moment.
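    To give a flavour of the low-level manipulation such a library performs, here is a minimal, self-contained sketch (a hypothetical helper, not FitsLib's actual API) that extracts the keyword and value from one FITS header card, where the keyword occupies the first 8 characters and `= ` introduces the value field:

```scala
// Minimal sketch of FITS header-card parsing (hypothetical helper,
// not FitsLib's actual API). A header card is an 80-character record:
// keyword in columns 1-8, "= " in columns 9-10, then value / comment.
def parseCard(card: String): (String, String) = {
  val keyword = card.take(8).trim
  val rest =
    if (card.length > 10 && card.charAt(8) == '=') card.drop(10) else ""
  // Drop any inline comment introduced by " / " and trim the value.
  val value = rest.split(" / ").head.trim
  (keyword, value)
}

// Using a card from the header printed above:
val (key, value) =
  parseCard("NAXIS2  =                20000 / length of dimension 2")
// key = "NAXIS2", value = "20000"
```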

  5. object FitsSchema

    Object to handle the conversion from an HDU header to a DataFrame schema.
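    The core of that conversion is the mapping from bintable TFORM codes to Spark SQL types. The following self-contained sketch returns type names as strings so it runs without a Spark dependency; `tformToTypeName` is a hypothetical helper, not FitsSchema's actual API:

```scala
// Hypothetical sketch of the TFORMn -> Spark type mapping used when
// building a DataFrame schema from a bintable header. A TFORM value is
// an optional repeat count followed by a type code, e.g. "10A".
def tformToTypeName(tform: String): String =
  tform.trim.dropWhile(_.isDigit) match {
    case "L"   => "boolean"
    case "I"   => "short"   // 16-bit integer
    case "J"   => "integer" // 32-bit integer
    case "K"   => "long"    // 64-bit integer
    case "E"   => "float"   // 32-bit floating point
    case "D"   => "double"  // 64-bit floating point
    case "A"   => "string"  // character array
    case other => s"unsupported ($other)"
  }

// This reproduces the schema printed in the example above:
// 10A -> string, E -> float, D -> double, K -> long, J -> integer.
```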

  6. object ReadCSV

  7. object ReadFits

  8. object ReadImage
