com.astrolabsoftware.spark3d.spatial3DRDD
Constructor of Point3DRDD which is suitable for py4j.
It calls Point3DRDDFromV2PythonHelper instead of Point3DRDDFromV2.
All arguments are the same except options, which is a java.util.HashMap, and storageLevel, which is removed and set to StorageLevel.MEMORY_ONLY (the user cannot set the storage level in pyspark3d for the moment).
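The java.util.HashMap passed in from the Python side has to become a Scala Map[String, String] before it can be used as DataFrameReader options. A minimal sketch of that conversion follows, using only the Scala standard library. The object PyOptions and the method toScalaOptions are hypothetical names chosen for illustration; they are not taken from spark3d, and the actual helper may do this differently.

```scala
import java.util.{HashMap => JHashMap}
import scala.collection.JavaConverters._

object PyOptions {
  // Hypothetical helper (not part of spark3d): copy a java.util.HashMap,
  // such as one received through py4j, into an immutable Scala Map.
  def toScalaOptions(javaOptions: JHashMap[String, String]): Map[String, String] =
    javaOptions.asScala.toMap
}
```

On Scala 2.13 and later, scala.collection.JavaConverters is deprecated in favour of scala.jdk.CollectionConverters, which provides the same asScala extension method.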
Construct an RDD[Point3D] from any data source registered in Spark.
For more information about available official connectors:
https://spark-packages.org/?q=tags%3A%22Data%20Sources%22
We currently include: CSV, JSON, TXT, FITS, ROOT, HDF5, Avro, Parquet...
// Here is an example with a CSV file containing
// 3 spherical coordinates columns labeled Z_COSMO,RA,Dec.

// Filename
val fn = "path/to/file.csv"
// Spark datasource
val format = "csv"
// Options to pass to the DataFrameReader - optional
val options = Map("header" -> "true")

// Load the data as RDD[Point3D]
val rdd = new Point3DRDD(spark, fn, "Z_COSMO,RA,Dec", true, format, options)
: (SparkSession) The Spark session.
: (String) File name where the data is stored.
: (String) Comma-separated names of (x, y, z) columns. Example: "Z_COSMO,RA,Dec".
: (Boolean) If true, it assumes that the coordinates of the Point3D are (r, theta, phi). Otherwise, it assumes cartesian coordinates (x, y, z).
: (String) The name of the data source as registered in Spark. For example:
: (Map[String, String]) Options to pass to the DataFrameReader. Default is no options.
: (StorageLevel) Storage level for the raw RDD (unpartitioned). Default is StorageLevel.NONE. See https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence for more information.
(RDD[Point3D])
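To illustrate what the Boolean flag distinguishes, here is a small sketch of a spherical-to-cartesian conversion. It assumes the common physics convention in which theta is the polar angle measured from the z-axis and phi is the azimuth, both in radians. This section does not state which convention or units spark3d uses, so the object SphericalCoords and its formulae are an illustration only, not the library's implementation.

```scala
object SphericalCoords {
  // Assumed convention (not confirmed by this documentation):
  // theta = polar angle from the z-axis, phi = azimuth, both in radians.
  def toCartesian(r: Double, theta: Double, phi: Double): (Double, Double, Double) = {
    val x = r * math.sin(theta) * math.cos(phi)
    val y = r * math.sin(theta) * math.sin(phi)
    val z = r * math.cos(theta)
    (x, y, z)
  }
}
```

With this convention, a point at theta = 0 lies on the positive z-axis whatever the value of phi.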
Repartition an RDD[T] according to a custom partitioner.
: (SpatialPartitioner) Instance of SpatialPartitioner or any extension of it.
(RDD[T]) Repartitioned RDD[T].
RDD containing the initial data formatted as T.
Apply a spatial partitioning to this.rawRDD, and return an RDD[T] with the new partitioning. The available partitioning schemes are listed in utils/GridType. By default, the outgoing level of parallelism is the same as the incoming one (i.e. same number of partitions).
: (GridType) Type of partitioning to apply. See utils/GridType.
: (Int) Number of partitions for the partitioned RDD. By default (-1), the number of partitions is that of the raw RDD. You can force a different value by setting this parameter manually, but be aware of the shuffling this involves.
(RDD[T]) RDD whose elements are T (Point3D, Sphere, etc.)
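As a rough picture of what a spatial partitioning does, the sketch below assigns cartesian points to one of eight octants and groups them accordingly, so that points close together in space tend to end up in the same group. It is a plain-Scala toy with hypothetical names (OctantSketch, octantOf, partition). It does not use Spark, and it is not a claim about the schemes that utils/GridType provides or how they are implemented.

```scala
object OctantSketch {
  type Point = (Double, Double, Double)

  // Hypothetical stand-in for a spatial partitioner: encode the sign of
  // each coordinate as one bit, giving an octant index between 0 and 7.
  def octantOf(x: Double, y: Double, z: Double): Int = {
    val bx = if (x >= 0) 1 else 0
    val by = if (y >= 0) 2 else 0
    val bz = if (z >= 0) 4 else 0
    bx + by + bz
  }

  // Group points by octant index, much as a partitioner routes
  // elements to partitions by key.
  def partition(points: Seq[Point]): Map[Int, Seq[Point]] =
    points.groupBy { case (x, y, z) => octantOf(x, y, z) }
}
```

Here the number of groups is fixed at eight; a real partitioner would instead be driven by the requested number of partitions.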
Apply any Spatial Partitioner to this.rawRDD[T], and return an RDD[T] with the new partitioning.
: (SpatialPartitioner) Spatial partitioner as defined in utils.GridType
(RDD[T]) RDD whose elements are T (Point3D, Sphere, etc.)