Package com.acervera.osm4scala.spark

Type Members

  1. class InputStreamLengthLimit extends FilterInputStream with InputStreamSentinel

    Keep in mind that this is not a thread-safe implementation.
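    The class documentation gives no further detail, but the name and the `FilterInputStream` parent suggest a stream that stops returning bytes after a fixed limit. The sketch below assumes exactly that. `LengthLimitedInputStream` is a hypothetical name, and it does not mix in `InputStreamSentinel`, whose contract is not documented here. The mutable byte counter shows why such a class is not thread-safe.

```scala
import java.io.{ByteArrayInputStream, FilterInputStream, InputStream}

// Hypothetical sketch (assumed behaviour): a FilterInputStream that reports
// end-of-stream once `limit` bytes have been consumed. `remaining` is plain
// mutable state with no synchronisation, so concurrent readers would race.
class LengthLimitedInputStream(underlying: InputStream, limit: Long)
    extends FilterInputStream(underlying) {

  private var remaining: Long = limit

  override def read(): Int =
    if (remaining <= 0) -1
    else {
      val b = super.read()
      if (b >= 0) remaining -= 1
      b
    }

  override def read(buf: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) 0
    else if (remaining <= 0) -1
    else {
      // Never ask the wrapped stream for more than the bytes still allowed.
      val n = super.read(buf, off, math.min(len.toLong, remaining).toInt)
      if (n > 0) remaining -= n
      n
    }

  override def skip(n: Long): Long = {
    val skipped = super.skip(math.min(n, remaining))
    remaining -= skipped
    skipped
  }

  override def available(): Int =
    math.min(super.available().toLong, remaining).toInt

  // mark/reset would desynchronise the counter, so they are disabled.
  override def markSupported(): Boolean = false
}
```

    For example, wrapping `new ByteArrayInputStream(Array[Byte](1, 2, 3, 4, 5))` with a limit of 3 yields the bytes 1, 2 and 3 and then reports end-of-stream.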

  2. class OsmPbfFormat extends FileFormat with DataSourceRegister

    FileFormat implementation to read OSM Pbf files.

    It is implemented in three steps:

    1. Take every split of the file.
    2. Search for the first block of data in each split.
    3. Extract all entities that are present in the split, starting from the first block found and ending at the last block whose header is in the split.
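    The ownership rule in steps 2 and 3 can be illustrated in isolation. This minimal sketch assumes the block offsets are already known and sorted. In the real reader they are discovered by scanning each split, not passed in. `Split`, `SplitAssignment` and `blocksForSplit` are hypothetical names.

```scala
// Hypothetical model: a split is the byte range [start, start + length) of the
// file, and a block is identified by the offset of its 4-byte size prefix.
final case class Split(start: Long, length: Long) {
  def end: Long = start + length
}

object SplitAssignment {
  // Step 2: skip every block that starts before the split.
  // Step 3: keep blocks while their header starts inside the split; the last
  // one may extend past `split.end`, and it is still read by this split.
  def blocksForSplit(blockOffsets: Seq[Long], split: Split): Seq[Long] =
    blockOffsets.dropWhile(_ < split.start).takeWhile(_ < split.end)
}
```

    Because each block header lies in exactly one split, every block is read once, even when its data crosses a split boundary.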

    Example: it is used like any other Spark connector:

      scala> spark.read.format("osm.pbf").load("").select("id","latitude","longitude","tags").filter("type == 0 and size(tags) > 0").show(false)
      +------+------------------+------------------+-------------------------------------------------------+
      |id    |latitude          |longitude         |tags                                                   |
      +------+------------------+------------------+-------------------------------------------------------+
      |272629|39.01344329999999 |-77.03783400000007|{ref -> 31, highway -> motorway_junction}              |
      |278454|38.99543909999999 |-76.87885730000008|{noref -> yes, highway -> motorway_junction}           |
      |278495|39.09727710000001 |-76.8004246000001 |{noref -> yes, highway -> motorway_junction}           |
      |278499|39.10949280000001 |-76.7842974000001 |{noref -> yes, highway -> motorway_junction}           |
      |278665|39.13675140000001 |-76.75700080000009|{highway -> motorway_junction, noref -> yes}           |
      |278679|39.16433720000001 |-76.73629450000011|{noref -> yes, highway -> motorway_junction}           |
      |278702|39.20996720000001 |-76.68302190000011|{noref -> yes, highway -> motorway_junction}           |
      |281260|39.047928000000006|-77.15067590000012|{noref -> yes, highway -> motorway_junction}           |
      |281323|39.1811582        |-77.2515329000001 |{ref -> 15A, highway -> motorway_junction}             |
      |281359|39.152438         |-77.29614050000009|{highway -> traffic_signals}                           |
      |287905|38.843457699999995|-77.1106591000001 |{highway -> traffic_signals, traffic_signals -> signal}|
      |287913|38.85178319999999 |-77.13165830000011|{highway -> traffic_signals}                           |
      |287943|38.876419799999994|-77.05647270000011|{curve_geometry -> yes}                                |
      |390841|38.41534949999998 |-77.42648520000014|{ref -> 140, highway -> motorway_junction}             |
      |390920|38.333087899999974|-77.49828340000015|{ref -> 133, highway -> motorway_junction}             |
      |390955|38.29694369999998 |-77.50489810000018|{ref -> 130B, highway -> motorway_junction}            |
      |391002|38.24086209999997 |-77.50026980000017|{ref -> 126B, highway -> motorway_junction}            |
      |396346|37.97631839999997 |-77.49251700000009|{noref -> yes, highway -> motorway_junction}           |
      |396542|37.93273709999998 |-77.46779110000014|{ref -> 104, highway -> motorway_junction}             |
      |396693|37.84187889999999 |-77.45102540000016|{ref -> 98, highway -> motorway_junction}              |
      +------+------------------+------------------+-------------------------------------------------------+
      only showing top 20 rows

    Note

    The DataFrame schema is:

    root
     |-- id: long (nullable = true)
     |-- type: byte (nullable = true)
     |-- latitude: double (nullable = true)
     |-- longitude: double (nullable = true)
     |-- nodes: array (nullable = true)
     |    |-- element: long (containsNull = true)
     |-- relations: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: long (nullable = true)
     |    |    |-- relationType: byte (nullable = true)
     |    |    |-- role: string (nullable = true)
     |-- tags: map (nullable = true)
     |    |-- key: string
     |    |-- value: string (valueContainsNull = true)
     |-- info: struct (nullable = true)
     |    |-- version: integer (nullable = true)
     |    |-- timestamp: timestamp (nullable = true)
     |    |-- changeset: long (nullable = true)
     |    |-- userId: integer (nullable = true)
     |    |-- userName: string (nullable = true)
     |    |-- visible: boolean (nullable = true)
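
    One row of this schema can be mirrored in plain Scala to show which field belongs where. In this hypothetical sketch, nullable columns and nullable elements become `Option`. The entity-type codes are an assumption: the example query above only confirms that `type == 0` selects nodes. `OsmRow`, `RelationMember`, `Info` and `entityKind` are illustrative names, not the library's `OsmSqlEntity`.

```scala
import java.sql.Timestamp

// Hypothetical mirror of the `relations` array elements.
final case class RelationMember(id: Option[Long], relationType: Option[Byte], role: Option[String])

// Hypothetical mirror of the `info` struct.
final case class Info(
    version: Option[Int],
    timestamp: Option[Timestamp],
    changeset: Option[Long],
    userId: Option[Int],
    userName: Option[String],
    visible: Option[Boolean]
)

// Hypothetical mirror of one DataFrame row.
final case class OsmRow(
    id: Option[Long],
    `type`: Option[Byte],
    latitude: Option[Double],
    longitude: Option[Double],
    nodes: Option[Seq[Option[Long]]],
    relations: Option[Seq[Option[RelationMember]]],
    tags: Option[Map[String, Option[String]]],
    info: Option[Info]
) {
  // Assumed codes: 0 = node (confirmed by the example), 1 = way, 2 = relation.
  def entityKind: String = `type`.map(_.toInt) match {
    case Some(0) => "node"
    case Some(1) => "way"
    case Some(2) => "relation"
    case _       => "unknown"
  }
}
```
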
  3. class OsmPbfRowIterator extends Iterator[InternalRow]

Value Members

  1. object OSMDataFinder

    java.io.InputStream enricher class that adds the firstBlockIndex functionality.

    The idea is that the BlobHeader of every data block starts with the same pattern, so to find the first BlobHeader in a file chunk, we search for that pattern.

    Just before every BlobHeader there are 4 bytes holding the size of that BlobHeader. Unlike the pattern, this value is not fixed, so these 4 bytes are ignored during the search: a block starts 4 bytes before the position where the pattern is found.
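    The search described above can be sketched over an in-memory chunk. This is a minimal sketch, not the library's `firstBlockIndex`. `FirstBlockSearch` is a hypothetical name, and the sketch does not guard against the pattern appearing by chance inside compressed blob data, which a robust reader would have to rule out.

```scala
import java.nio.charset.StandardCharsets

object FirstBlockSearch {
  // 0x0A = protobuf tag of BlobHeader field 1 (`type`, length-delimited),
  // 0x07 = length of the string "OSMData" that follows it.
  val DataBlockPattern: Array[Byte] =
    Array[Byte](0x0A, 0x07) ++ "OSMData".getBytes(StandardCharsets.US_ASCII)

  // Returns the offset of the first data block in `chunk`, i.e. 4 bytes before
  // the first pattern match, or None when no complete block start is present.
  def firstBlockIndex(chunk: Array[Byte]): Option[Int] = {
    // Searching from index 4 guarantees the 4-byte size prefix is in the chunk.
    val idx = chunk.indexOfSlice(DataBlockPattern.toSeq, 4)
    if (idx < 0) None else Some(idx - 4)
  }
}
```
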

    Every block that contains data therefore has this structure:

    1. 4 bytes with the header size (a big-endian 32-bit integer).
    2. 9 bytes with the constant 0x0A, 0x07, "OSMData".
    3. The rest of the header.
    4. The Blob containing the data.
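    With this layout a reader can move from one block to the next: read the size prefix, check the pattern, then scan the remaining BlobHeader fields for `datasize`, the length of the Blob that follows. The minimal sketch below assumes the BlobHeader message from the OSM PBF specification (field 1 `type`, optional field 2 `indexdata`, field 3 `datasize`) and a file held entirely in memory. `BlockWalker` and `nextBlockOffset` are hypothetical names.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets
import java.util.Arrays

object BlockWalker {
  private val Pattern: Array[Byte] =
    Array[Byte](0x0A, 0x07) ++ "OSMData".getBytes(StandardCharsets.US_ASCII)

  // Decodes a protobuf base-128 varint at `start`; returns (value, position after it).
  private def readVarint(buf: Array[Byte], start: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var pos = start
    var more = true
    while (more) {
      val b = buf(pos) & 0xFF
      result |= (b & 0x7FL) << shift
      shift += 7
      pos += 1
      more = (b & 0x80) != 0
    }
    (result, pos)
  }

  // Given the offset of a data block, returns the offset of the block after it.
  def nextBlockOffset(file: Array[Byte], offset: Int): Int = {
    // 1. 4 bytes with the header size, big-endian.
    val headerSize = ByteBuffer.wrap(file, offset, 4).order(ByteOrder.BIG_ENDIAN).getInt()
    val headerStart = offset + 4
    val headerEnd = headerStart + headerSize
    // 2. The header must begin with the constant 0x0A, 0x07, "OSMData".
    require(Arrays.equals(file.slice(headerStart, headerStart + Pattern.length), Pattern), "not a data block")
    // 3. Walk the BlobHeader fields looking for field 3 (datasize, a varint).
    var pos = headerStart
    var dataSize = -1L
    while (pos < headerEnd) {
      val (tag, afterTag) = readVarint(file, pos)
      val field = (tag >>> 3).toInt
      val wireType = (tag & 0x7).toInt
      (field, wireType) match {
        case (3, 0) =>
          val (value, next) = readVarint(file, afterTag)
          dataSize = value
          pos = next
        case (_, 0) => // another varint field: skip it
          pos = readVarint(file, afterTag)._2
        case (_, 2) => // length-delimited field (type, indexdata): skip its bytes
          val (len, next) = readVarint(file, afterTag)
          pos = next + len.toInt
        case _ =>
          throw new IllegalArgumentException(s"unexpected field $field with wire type $wireType")
      }
    }
    require(dataSize >= 0, "BlobHeader without datasize")
    // 4. The Blob follows the header and is exactly `dataSize` bytes long.
    headerEnd + dataSize.toInt
  }
}
```

    Calling `nextBlockOffset` repeatedly, starting from the first block found and stopping once the offset reaches the end of the split, is one way to realise "ending at the last block whose header is in the split".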
    See also

    OSM PBF Format documentation

  2. object OsmPbfFormat

  3. object OsmPbfRowIterator

  4. object OsmSqlEntity
