Keep in mind that this is not a thread-safe implementation.
FileFormat implementation to read OSM Pbf files.
java.io.InputStream enricher class that adds the firstBlockIndex functionality.
The idea is that every BlobHeader starts with the same pattern. So, to find the first BlobHeader in a chunk file, we look for that pattern.
Just before every BlobHeader, there is a set of 4 bytes that contains the size of the next BlobHeader. Unlike the pattern, its value is not fixed, so these 4 bytes must be ignored when looking for the pattern.
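The search described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: the names BlobHeaderFinder, InputStreamEnricher, Pattern and LengthFieldSize are hypothetical, and the pattern bytes assume that a data BlobHeader begins with its protobuf type field set to "OSMData" (tag 0x0A, length 7). The sketch returns the offset of the 4-byte size field that precedes the first pattern found.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets

object BlobHeaderFinder {
  // Assumption: protobuf field 1 (tag 0x0A), length 7, then the ASCII bytes of "OSMData".
  val Pattern: Array[Byte] =
    Array[Byte](0x0A, 0x07) ++ "OSMData".getBytes(StandardCharsets.US_ASCII)

  // Size of the variable-valued length field that sits just before each BlobHeader.
  val LengthFieldSize = 4

  // Hypothetical enricher: adds firstBlockIndex to any InputStream.
  implicit class InputStreamEnricher(stream: InputStream) {

    /** Offset of the 4-byte size field preceding the first pattern found, if any. */
    def firstBlockIndex(): Option[Long] = {
      val window = new Array[Byte](Pattern.length) // last bytes read
      var read = 0L                                // bytes consumed so far
      var result: Option[Long] = None
      var next = stream.read()
      while (result.isEmpty && next != -1) {
        // Slide the window one byte to the left and append the new byte.
        System.arraycopy(window, 1, window, 0, window.length - 1)
        window(window.length - 1) = next.toByte
        read += 1
        val patternStart = read - Pattern.length
        // Only accept a match if the whole size field is inside the chunk.
        if (patternStart >= LengthFieldSize && java.util.Arrays.equals(window, Pattern))
          result = Some(patternStart - LengthFieldSize)
        else
          next = stream.read()
      }
      result
    }
  }

  def main(args: Array[String]): Unit = {
    // 3 bytes of leftover data, a 4-byte size field, then the pattern.
    val chunk = Array[Byte](1, 2, 3) ++ Array[Byte](0, 0, 0, 13) ++ Pattern
    println(new ByteArrayInputStream(chunk).firstBlockIndex())
  }
}
```

The sliding-window comparison is deliberately simple. A match whose size field would start before the beginning of the chunk is skipped, because that block cannot be read completely from this chunk.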
So this is the structure of every block that contains data:
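As a hedged sketch of that layout, assuming the usual OSM PBF framing (a 4-byte big-endian size, then the BlobHeader of that size, then the Blob itself), the size field and header bytes could be read as shown here. BlockFraming and readHeaderBytes are hypothetical names used only for illustration, and decoding the protobuf messages is left out.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object BlockFraming {
  // Assumed layout of one block:
  //   [ 4 bytes: BlobHeader size ][ BlobHeader bytes ][ Blob bytes ]

  /** Reads the 4-byte size field and returns the raw BlobHeader bytes that follow it. */
  def readHeaderBytes(in: DataInputStream): Array[Byte] = {
    val headerSize = in.readInt() // DataInputStream reads big-endian integers
    require(headerSize >= 0, s"Invalid BlobHeader size: $headerSize")
    val header = new Array[Byte](headerSize)
    in.readFully(header) // fails with EOFException if the block is truncated
    header
  }

  def main(args: Array[String]): Unit = {
    // Fake header containing only the assumed "OSMData" type field.
    val header = Array[Byte](0x0A, 0x07) ++ "OSMData".getBytes(StandardCharsets.US_ASCII)
    val framed = ByteBuffer.allocate(4 + header.length).putInt(header.length).put(header).array()
    val in = new DataInputStream(new ByteArrayInputStream(framed))
    println(readHeaderBytes(in).length)
  }
}
```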
FileFormat implementation to read OSM Pbf files.
Basically, it is implemented in three steps:
Its usage is like other Spark connectors:
The DataFrame schema used is: