it.agilelab.bigdata.wasp.consumers.spark.strategies.gdpr.hdfs
Deletes the keys listed in config.keysToDelete from the files in filesToFilter.
All files to filter are read and merged into a DataFrame, and then only the rows that do not match the
RawMatchingStrategy and PartitionPruningStrategy defined in config are written into a staging directory.
Once the new files have been successfully written to the staging directory, the original filesToFilter are deleted
and the newly written files are moved to the data directory.
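
The filter, stage, delete, move sequence described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: it uses the local filesystem through java.nio.file in place of HDFS and a Spark DataFrame, it treats the first comma-separated field of each line as the key in place of the RawMatchingStrategy and PartitionPruningStrategy, and the names StagedDeletionSketch, deleteKeys, stagingDir, dataDir and part-00000.csv are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._ // requires Scala 2.13 or later

object StagedDeletionSketch {

  /** Hypothetical helper: rewrites filesToFilter without the rows whose key is in keysToDelete.
    * Assumption: the key is the first comma-separated field of each line.
    */
  def deleteKeys(filesToFilter: Seq[Path],
                 keysToDelete: Set[String],
                 stagingDir: Path,
                 dataDir: Path): Path = {
    // 1. Read and merge all input files, keeping only the rows that do NOT match a key to delete.
    val kept: Seq[String] = filesToFilter
      .flatMap(f => Files.readAllLines(f, StandardCharsets.UTF_8).asScala)
      .filterNot(line => keysToDelete.contains(line.takeWhile(_ != ',')))

    // 2. Write the surviving rows into the staging directory first.
    Files.createDirectories(stagingDir)
    val staged = stagingDir.resolve("part-00000.csv")
    Files.write(staged, kept.asJava, StandardCharsets.UTF_8)

    // 3. Only after the staged write has succeeded, delete the original files.
    filesToFilter.foreach(f => Files.deleteIfExists(f))

    // 4. Move the newly written file into the data directory.
    Files.createDirectories(dataDir)
    Files.move(staged, dataDir.resolve(staged.getFileName), StandardCopyOption.REPLACE_EXISTING)
  }
}
```

Because the originals are removed only after the staged write returns, a failure while filtering or writing leaves the files in filesToFilter untouched.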