Create the dataframe with its associated format
: dataset created without specifying a format
: file path
Get domain directory name
: file path
the domain directory name
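A minimal sketch of how the domain directory name could be derived from a file path. It assumes the domain directory is the immediate parent directory of the file, which the source does not confirm; `DomainDirectory` and `domainDirectoryName` are hypothetical names.

```scala
import java.nio.file.Paths

object DomainDirectory {
  // Assumption: the domain directory is the immediate parent of the file.
  // Returns None when the path has no named parent directory.
  def domainDirectoryName(filePath: String): Option[String] =
    Option(Paths.get(filePath).getParent)
      .flatMap(parent => Option(parent.getFileName))
      .map(_.toString)
}
```

For example, `DomainDirectory.domainDirectoryName("/data/sales/orders.csv")` yields `Some("sales")` on a Unix-style file system.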
Get the file format by using the first and last lines of the dataset. We use mapPartitionsWithIndex to retrieve this information so that the first line really is the first line of the file (and likewise for the last).
: dataset created without specifying a format
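The following sketch shows one way to retrieve the first and last lines with `mapPartitionsWithIndex`. Each non-empty partition reports its index together with its own first and last line, and the driver then keeps the first line of the lowest-indexed partition and the last line of the highest-indexed one. `FirstAndLastLine` and `firstAndLast` are hypothetical names, and the sketch assumes the partition order reflects the line order of the file.

```scala
import org.apache.spark.sql.Dataset

object FirstAndLastLine {
  // Returns (firstLine, lastLine), or None for an empty dataset.
  def firstAndLast(lines: Dataset[String]): Option[(String, String)] = {
    val perPartition = lines.rdd
      .mapPartitionsWithIndex { (index, it) =>
        if (it.hasNext) {
          // Walk the partition once, remembering its first and last line.
          val first = it.next()
          var last = first
          while (it.hasNext) last = it.next()
          Iterator((index, first, last))
        } else {
          // Empty partitions contribute nothing.
          Iterator.empty
        }
      }
      .collect()
      .sortBy(_._1) // order the summaries by partition index

    if (perPartition.isEmpty) None
    else Some((perPartition.head._2, perPartition.last._3))
  }
}
```

Only one small tuple per partition is sent to the driver, so the cost of the `collect` does not depend on the size of the file.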
Get schema pattern
: file path
the schema pattern
Get the file separator by taking the character that appears most often in 10 lines of the dataset
: dataset created without specifying a format
the file separator
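A rough sketch of the frequency-based separator detection described above, written in plain Scala over an in-memory sample. `SeparatorDetection` and `detectSeparator` are hypothetical names. The sketch also assumes that letters, digits, and spaces are never separators, a filter the source does not mention; without it, an ordinary letter would usually win the count.

```scala
object SeparatorDetection {
  // Returns the most frequent candidate character in the first 10 lines,
  // or None if the sample contains no candidate at all.
  def detectSeparator(sample: Seq[String]): Option[Char] = {
    val counts: Map[Char, Int] = sample
      .take(10)
      .flatMap(_.toSeq)
      // Assumption: letters, digits and spaces cannot be separators.
      .filterNot(c => c.isLetterOrDigit || c == ' ')
      .groupBy(identity)
      .map { case (c, occurrences) => c -> occurrences.size }

    if (counts.isEmpty) None else Some(counts.maxBy(_._2)._1)
  }
}
```

With a Spark `Dataset[String]`, the sample could be obtained with `take(10)`, which returns the rows to the driver as an array. Ties between equally frequent characters are resolved arbitrarily in this sketch.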
Forces every Spark job to implement its entry point within the "run" method
: Spark Session used for the job
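One way such a contract might look is a trait with a single abstract `run` method that takes the session. The trait name `SparkJob` and the example job `CountRowsJob` are hypothetical; the source only states that the entry point is a method called "run" that receives the Spark session.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical name: the source does not show the actual trait.
trait SparkJob {
  // Entry point that every job has to implement.
  def run(spark: SparkSession): Unit
}

// Illustrative job: counts the rows of a generated range.
object CountRowsJob extends SparkJob {
  override def run(spark: SparkSession): Unit = {
    val rows = spark.range(0, 100).count()
    println(s"rows: $rows")
  }
}
```

Because the session is passed in rather than created inside the job, a launcher or a test can decide how the session is configured.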
Read file without specifying the format
: file path
a dataset of strings containing the lines of the data file
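As an illustration, reading a file as raw lines can be done with Spark's `DataFrameReader.textFile`, which returns a `Dataset[String]` with one element per line. Whether the original code uses this call is an assumption; `RawFileReader` and `readRaw` are hypothetical names.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object RawFileReader {
  // Loads the file as plain text, one String per line, with no parsing.
  def readRaw(spark: SparkSession, path: String): Dataset[String] =
    spark.read.textFile(path)
}
```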
Infers the schema for a given data path, domain name, and schema name.