Construct an RDD[Point3D] from any data source registered in Spark.
For more information about available official connectors:
https://spark-packages.org/?q=tags%3A%22Data%20Sources%22
That currently includes: CSV, JSON, TXT, FITS, ROOT, HDF5, Avro, Parquet...
// Here is an example with a CSV file containing
// 3 spherical coordinates columns labeled Z_COSMO,RA,Dec.

// Filename
val fn = "path/to/file.csv"
// Spark datasource
val format = "csv"
// Options to pass to the DataFrameReader - optional
val options = Map("header" -> "true")
// Load the data as RDD[Point3D]
val rdd = new Point3DRDD(spark, fn, "Z_COSMO,RA,Dec", true, format, options)
: (SparkSession) The Spark session.
: (String) File name where the data is stored.
: (String) Comma-separated names of (x, y, z) columns. Example: "Z_COSMO,RA,Dec".
: (Boolean) If true, it assumes that the coordinates of the Point3D are (r, theta, phi). Otherwise, it assumes cartesian coordinates (x, y, z).
: (String) The name of the data source as registered in Spark. For example: "csv", "json", "parquet", ...
: (Map[String, String]) Options to pass to the DataFrameReader. Default is no options.
(RDD[Point3D])
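The same constructor works with any other registered format. Below is a minimal sketch, assuming a Parquet file containing the same three spherical coordinate columns; the path and column names are purely illustrative.

// A Parquet file is self-describing, so no reader options are needed here.
val fnParquet = "path/to/file.parquet"
// Load the data as RDD[Point3D], using the "parquet" data source.
val rddFromParquet = new Point3DRDD(
  spark, fnParquet, "Z_COSMO,RA,Dec", true, "parquet", Map.empty[String, String])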
Point3DRDDFromV2 version suitable for py4j.
Note that pyspark works with Python wrappers around the *Java* version
of Spark objects, not around the *Scala* version of Spark objects.
Therefore, on the Scala side we call the method
Point3DRDDFromV2PythonHelper, a modified version of Point3DRDDFromV2
whose options argument is a java.util.HashMap, so that it connects
smoothly to a dictionary on the Python side.
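As a rough illustration only (not the library's actual code), such a helper essentially converts the incoming java.util.HashMap into an immutable Scala Map before delegating; the signature below is an assumption, and Point3D / Point3DRDDFromV2 are the library entry points named above.

import scala.collection.JavaConverters._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of a py4j-friendly wrapper: convert the Java HashMap
// handed over by py4j (built from a Python dict) into a Scala Map, then
// delegate to the regular Scala entry point.
def Point3DRDDFromV2PythonHelper(
    spark: SparkSession, fn: String, colnames: String,
    isSpherical: Boolean, format: String,
    options: java.util.HashMap[String, String]): RDD[Point3D] = {
  Point3DRDDFromV2(spark, fn, colnames, isSpherical, format, options.asScala.toMap)
}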
Construct an RDD[ShellEnvelope] from any data source registered in Spark.
For more information about available official connectors:
https://spark-packages.org/?q=tags%3A%22Data%20Sources%22
That currently includes: CSV, JSON, TXT, FITS, ROOT, HDF5, Avro, Parquet...
// Here is an example with a CSV file containing
// 3 cartesian coordinates + 1 radius columns labeled x,y,z,radius.

// Filename
val fn = "path/to/file.csv"
// Spark datasource
val format = "csv"
// Options to pass to the DataFrameReader - optional
val options = Map("header" -> "true")
// Load the data as RDD[ShellEnvelope]
// (the columns are cartesian, so the spherical flag is set to false)
val rdd = new SphereRDD(spark, fn, "x,y,z,radius", false, format, options)
: (SparkSession) The Spark session.
: (String) File name where the data is stored. Extension must be explicitly written (.csv, .json, or .txt).
: (String) Comma-separated names of (x, y, z, r) columns to read. Example: "Z_COSMO,RA,Dec,Radius".
: (Boolean) If true, it assumes that the coordinates of the center of the ShellEnvelope are (r, theta, phi). Otherwise, it assumes cartesian coordinates (x, y, z). Default is false.
: (String) The name of the data source as registered in Spark. For example: "csv", "json", "parquet", ...
: (Map[String, String]) Options to pass to the DataFrameReader. Default is no options.
(RDD[ShellEnvelope])
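As another sketch, here is how the same loader might read sphere centers from a FITS file. This assumes the spark-fits connector is available on the classpath; the "fits" format name and the "hdu" option belong to that connector and are stated here as assumptions, and the path and column names are illustrative.

// Hedged example: loading cartesian centers + radius from a FITS table.
// The "fits" short name and the "hdu" option are assumed from the
// spark-fits connector; adapt them to the connector you actually use.
val fnFits = "path/to/file.fits"
val fitsOptions = Map("hdu" -> "1")
// Load the data as RDD[ShellEnvelope] (cartesian centers, so false)
val rddFromFits = new SphereRDD(spark, fnFits, "x,y,z,radius", false, "fits", fitsOptions)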
SphereRDDFromV2 version suitable for py4j.
Note that pyspark works with Python wrappers around the *Java* version
of Spark objects, not around the *Scala* version of Spark objects.
Therefore, on the Scala side we call the method
SphereRDDFromV2PythonHelper, a modified version of SphereRDDFromV2
whose options argument is a java.util.HashMap, so that it connects
smoothly to a dictionary on the Python side.
Put here routines to load data for a specific data format. Currently available: all Spark DataSource V2 compatible formats, i.e. CSV, JSON, TXT, Avro, Parquet, FITS, HDF5, ROOT (<= 6.11), ...
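These loaders presumably sit on top of Spark's standard DataFrameReader, which is what makes any registered data source usable. A minimal, generic sketch of that pattern is shown below; the format string and options are placeholders, not values from the library.

import org.apache.spark.sql.{DataFrame, SparkSession}

// Generic DataFrameReader pattern behind DataSource-compatible loading:
// any registered format is read with the same three calls.
def loadAsDataFrame(
    spark: SparkSession, fn: String, format: String,
    options: Map[String, String]): DataFrame = {
  spark.read
    .format(format)   // e.g. "csv", "json", "parquet", ...
    .options(options) // connector-specific options; may be empty
    .load(fn)
}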