it.agilelab.bigdata.wasp.consumers.spark.strategies.gdpr.hdfs
Create a new directory inside backupParentDir, called "backup_{randomUUID}".
Each of the files inside filesToBackup is copied into this directory, preserving any HDFS partitioning. The new file path is created by removing the base directory (that is, dataPath) from the file path and replacing it with the path of the backup directory.
Example:
filesToBackup = ["/user/data/p1=a/p2=b/file.parquet"]
backupParentDir = "/user"
dataPath = "/user/data"
backupDir = "/user/backup_123"
filesToBackup: Files that should be copied into the backup directory
Returns: Path of the newly created backup directory
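The backup step described above can be sketched with the Hadoop FileSystem API. This is a minimal illustration, not the actual WASP implementation; the method name and parameters are assumptions based on the description:

```scala
import java.util.UUID
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Hypothetical sketch: copy each file under dataPath into a fresh
// "backup_{randomUUID}" directory, preserving partition subpaths.
def backupFiles(fs: FileSystem,
                filesToBackup: Seq[Path],
                backupParentDir: Path,
                dataPath: Path,
                conf: Configuration): Path = {
  val backupDir = new Path(backupParentDir, s"backup_${UUID.randomUUID()}")
  fs.mkdirs(backupDir)
  filesToBackup.foreach { file =>
    // "/user/data/p1=a/p2=b/file.parquet" -> "p1=a/p2=b/file.parquet"
    val relative = file.toUri.getPath
      .stripPrefix(dataPath.toUri.getPath)
      .stripPrefix("/")
    val target = new Path(backupDir, relative)
    fs.mkdirs(target.getParent)
    // deleteSource = false: the originals stay in place until deletion succeeds
    FileUtil.copy(fs, file, fs, target, false, conf)
  }
  backupDir
}
```

Copying (rather than moving) the files means the data directory is untouched if the backup fails partway through.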
Delete the entire backup directory.
backupDir: Path of the backup directory to delete
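Deleting the backup directory is a single recursive delete. A minimal sketch, again with an assumed method name:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: recursively remove the whole backup directory.
// Returns true if the delete succeeded.
def deleteBackupDir(fs: FileSystem, backupDir: Path): Boolean =
  fs.delete(backupDir, true) // recursive = true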
Restores the files backed up inside backupPath to dataPath.
Each of the files listed recursively inside backupPath is "moved back" to the data directory by replacing the prefix of the backup directory with the prefix of the data directory.
Example:
backupPath = "/user/backup_123"
backupParentDir = "/user"
dataPath = "/user/data"
- Listed backup file "/user/backup_123/p1=a/p2=b/file.parquet": this file is renamed by replacing the prefix "/user/backup_123" with "/user/data", yielding "/user/data/p1=a/p2=b/file.parquet".
backupPath: Path of the backup directory containing the files to restore
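The restore step can be sketched by listing the backup directory recursively and renaming each file back under the data directory. As above, this is an illustrative sketch with assumed names, not the actual WASP code:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: move every file under backupPath back under dataPath
// by swapping the backup prefix for the data prefix.
def restoreBackup(fs: FileSystem, backupPath: Path, dataPath: Path): Unit = {
  val files = fs.listFiles(backupPath, true) // recursive listing
  while (files.hasNext) {
    val file = files.next().getPath
    // "/user/backup_123/p1=a/p2=b/file.parquet" -> "p1=a/p2=b/file.parquet"
    val relative = file.toUri.getPath
      .stripPrefix(backupPath.toUri.getPath)
      .stripPrefix("/")
    val target = new Path(dataPath, relative)
    fs.mkdirs(target.getParent)
    fs.rename(file, target) // "moved back" to the data directory
  }
}
```

rename is a metadata-only move when source and target are on the same HDFS filesystem, so restoring is cheap compared to the initial copy.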
Handler for backup operations on HDFS files