Class com.sparkfits.fits.FitsContext

implicit class FitsContext extends Serializable

Adds a method, readfits, to SparkSession that allows reading FITS data. Note that, for the moment, only FITS tables are supported; FITS images will be added later.

The interpreter session below shows how to use basic functionalities:

scala> val fn = "src/test/resources/test_file.fits"
scala> val df = spark.readfits
 .option("datatype", "table")
 .option("HDU", 1)
 .option("printHDUHeader", true)
 .load(fn)
+------ HEADER (HDU=1) ------+
XTENSION= BINTABLE           / binary table extension
BITPIX  =                    8 / array data type
NAXIS   =                    2 / number of array dimensions
NAXIS1  =                   34 / length of dimension 1
NAXIS2  =                20000 / length of dimension 2
PCOUNT  =                    0 / number of group parameters
GCOUNT  =                    1 / number of groups
TFIELDS =                    5 / number of table fields
TTYPE1  = target
TFORM1  = 10A
TTYPE2  = RA
TFORM2  = E
TTYPE3  = Dec
TFORM3  = D
TTYPE4  = Index
TFORM4  = K
TTYPE5  = RunId
TFORM5  = J
END
+----------------------------+
df: org.apache.spark.sql.DataFrame = [target: string, RA: float ... 3 more fields]

scala> df.printSchema
root
 |-- target: string (nullable = true)
 |-- RA: float (nullable = true)
 |-- Dec: double (nullable = true)
 |-- Index: long (nullable = true)
 |-- RunId: integer (nullable = true)

scala> df.show(5)
+----------+---------+--------------------+-----+-----+
|    target|       RA|                 Dec|Index|RunId|
+----------+---------+--------------------+-----+-----+
|NGC0000000| 3.448297| -0.3387486324784641|    0|    1|
|NGC0000001| 4.493667| -1.4414990980543227|    1|    1|
|NGC0000002| 3.787274|  1.3298379564211742|    2|    1|
|NGC0000003| 3.423602|-0.29457151504987844|    3|    1|
|NGC0000004|2.6619017|  1.3957536426732444|    4|    1|
+----------+---------+--------------------+-----+-----+
only showing top 5 rows

Linear Supertypes

Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new FitsContext(spark: SparkSession)


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def checkSchema(listOfFitsFiles: List[String]): Unit

    Check that the schemas of the different FITS files to be added are the same. Throws an AssertionError if they are not.

    listOfFitsFiles

    : (List[String]) List of files as a list of String.
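
    Conceptually, the check reduces to comparing the per-file schemas against the first one. A minimal sketch of the idea, assuming a hypothetical helper `schemaOf` that returns the schema inferred from a file's HDU header (this is not the library's actual implementation):

    ```scala
    // Sketch only. `schemaOf` is a hypothetical helper returning the schema
    // inferred from a file's HDU header, e.g. as a String.
    def checkSchemaSketch(listOfFitsFiles: List[String])(schemaOf: String => String): Unit = {
      val schemas = listOfFitsFiles.map(schemaOf)
      // All schemas must match the first one; fail loudly otherwise.
      assert(schemas.forall(_ == schemas.head),
        "The input FITS files do not share the same schema!")
    }
    ```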

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val conf: Configuration

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def getListOfFiles(it: RemoteIterator[LocatedFileStatus], extensions: List[String] = List(".fits")): List[String]

    Recursively load all FITS files inside a directory.

    it

    : (RemoteIterator[LocatedFileStatus]) Iterator from a Hadoop Path containing information about files.

    extensions

    : (List[String]) List of accepted extensions. Currently only .fits is available. Default is List(".fits").

    returns

    List of files as a list of String.
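
    The core of the method is walking the Hadoop iterator and keeping the paths whose names carry an accepted extension. A hedged sketch using the standard Hadoop FS types (not necessarily the library's exact code):

    ```scala
    import org.apache.hadoop.fs.{LocatedFileStatus, RemoteIterator}

    // Sketch only: accumulate the paths whose name ends with an accepted extension.
    def listFitsFiles(it: RemoteIterator[LocatedFileStatus],
                      extensions: List[String] = List(".fits")): List[String] = {
      var files = List.empty[String]
      while (it.hasNext) {
        val path = it.next.getPath.toString
        if (extensions.exists(path.endsWith)) files = path :: files
      }
      files.reverse
    }
    ```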

  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def load(fns: List[String]): DataFrame

    Load the HDU data from several FITS files into a single DataFrame. The structure of the HDUs must be the same, that is, they must contain the same number of columns, with the same names and element types. The schema of the DataFrame is directly inferred from the header of the FITS HDU.

    fns

    : (List[String]) List of filenames with the same structure.

    returns

    (DataFrame) Always a single DataFrame, made from the HDU of one FITS file or from the same kind of HDU across several FITS files.
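
    For instance, to take the union of two files sharing the same HDU structure (file names hypothetical, option names as in the session example above):

    ```scala
    // Hypothetical file names; both files must share the same HDU structure.
    val fns = List("file://path/to/data/cat_run1.fits",
                   "file://path/to/data/cat_run2.fits")
    val df = spark.readfits
      .option("datatype", "table")
      .option("HDU", 1)
      .load(fns)  // one DataFrame, union of the HDU data of both files
    ```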

  16. def load(fn: String): DataFrame

    Create a DataFrame from the data of one HDU. The input can be either the path to one FITS file (path + filename), or the path to a directory containing FITS files. In the latter case, the code will load all FITS files listed inside this directory and take the union of the HDU data. Needless to say, the FITS files must have the same structure, otherwise the union is impossible. The input must be a String in Hadoop format:

    • (local) file://path/to/data
    • (HDFS) hdfs://<IP>:<PORT>//path/to/data

    The schema of the DataFrame is directly inferred from the header of the FITS HDU.

    fn

    : (String) Filename of the FITS file to be read, or a directory containing FITS files with the same HDU structure.

    returns

    (DataFrame) Always a single DataFrame, made from the HDU of one FITS file or from the same kind of HDU across several FITS files.
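
    For instance (paths hypothetical; <IP> and <PORT> to be filled in by the user):

    ```scala
    // Single file, local directory, and HDFS directory (hypothetical paths).
    val dfOne  = spark.readfits.option("datatype", "table").option("HDU", 1)
      .load("file://path/to/data/cat.fits")
    val dfDir  = spark.readfits.option("datatype", "table").option("HDU", 1)
      .load("file://path/to/data")
    val dfHdfs = spark.readfits.option("datatype", "table").option("HDU", 1)
      .load("hdfs://<IP>:<PORT>//path/to/data")
    ```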

  17. def loadOne(fn: String): DataFrame

    Load the BinaryTableHDU data contained in one HDU as a DataFrame. The schema of the DataFrame is directly inferred from the header of the FITS HDU.

    fn

    : (String) Path + filename of the FITS file to be read.

    returns

    : DataFrame made from one single HDU.

  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. def option(key: String, value: Double): FitsContext

    Adds an input option for reading the underlying data source. (key, Double)

    key

    : (String) Name of the option.

    value

    : (Double) Value of the option.

  22. def option(key: String, value: Long): FitsContext

    Adds an input option for reading the underlying data source. (key, Long)

    key

    : (String) Name of the option.

    value

    : (Long) Value of the option.

  23. def option(key: String, value: Boolean): FitsContext

    Adds an input option for reading the underlying data source. (key, Boolean)

    key

    : (String) Name of the option.

    value

    : (Boolean) Value of the option.

  24. def option(key: String, value: String): FitsContext

    Adds an input option for reading the underlying data source.

    In general you can set the following option(s):

    • option("HDU", <Int>)
    • option("datatype", <String>)
    • option("printHDUHeader", <Boolean>)

    Note that values passed as Boolean, Long, or Double will first be converted to String and decoded later on.

    key

    : (String) Name of the option.

    value

    : (String) Value of the option.
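
    Options can be chained, and, as noted above, non-String values are stored as Strings internally:

    ```scala
    val fc = spark.readfits
      .option("datatype", "table")     // String value
      .option("HDU", 1)                // numeric value, stored as "1"
      .option("printHDUHeader", true)  // Boolean value, stored as "true"
    // `fc` is a FitsContext; call .load(...) on it to get a DataFrame.
    ```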

  25. def readfits: FitsContext

    Replaces the usual Spark 2.X syntax, spark.read.format("fits"), with spark.readfits. This is a hack to avoid touching the DataFrameReader class, whose constructor is private... If you have a better idea, bug me!

    returns

    FitsContext

  26. def schema(schema: StructType): FitsContext

    Adds a schema to our data. It will overwrite the schema inferred from the HDU header. Useful if the header is corrupted.

    schema

    : (StructType) The schema for the data (StructType(List(StructField)))

    returns

    The FitsContext (to chain operations).
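
    A usage sketch, overriding the inferred schema with a manual one (column names and types taken from the session example above; path hypothetical):

    ```scala
    import org.apache.spark.sql.types._

    // Manual schema matching the example table shown earlier.
    val mySchema = StructType(List(
      StructField("target", StringType, nullable = true),
      StructField("RA", FloatType, nullable = true),
      StructField("Dec", DoubleType, nullable = true),
      StructField("Index", LongType, nullable = true),
      StructField("RunId", IntegerType, nullable = true)
    ))

    val df = spark.readfits
      .option("datatype", "table")
      .option("HDU", 1)
      .schema(mySchema)  // overrides the header-inferred schema
      .load("file://path/to/data/cat.fits")
    ```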

  27. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  28. def toString(): String

    Definition Classes
    AnyRef → Any
  29. var verbosity: Boolean

  30. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
