Create the DataFrame with its associated format.
Parameter: list of lines read from the file.
Get the domain directory name.
Parameter: file path.
Returns: the domain directory name.
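The description does not say which path component counts as the domain directory. As a minimal sketch, assuming it is the directory that directly contains the file, the lookup could be written as follows. The function name `domain_directory_name` is hypothetical and not taken from the source.

```python
from pathlib import PurePosixPath


def domain_directory_name(file_path: str) -> str:
    """Return the name of the directory that directly contains the file.

    Assumption: the "domain directory" is the immediate parent directory.
    Returns an empty string when the path has no parent directory.
    """
    return PurePosixPath(file_path).parent.name
```

If the domain is actually a different level of the path, only the expression inside the function needs to change.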
Get the file format from the first and last lines of the dataset. mapPartitionsWithIndex is used to retrieve these lines, so that the line treated as the first really is the first line of the file (and likewise for the last).
Parameter: list of lines read from the file.
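The first/last line retrieval can be sketched in plain Python. The function `partition_edges` below has the `(index, iterator)` shape that Spark's `mapPartitionsWithIndex` expects, and `first_and_last` combines the per-partition results on the driver. Both names are hypothetical, and the sketch assumes that partition indices follow the order of the file, as the description implies.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (partition index, first line of the partition, last line of the partition)
Edge = Tuple[int, str, str]


def partition_edges(index: int, lines: Iterator[str]) -> Iterator[Edge]:
    """Yield the first and last line of one partition, tagged with its index.

    Empty partitions yield nothing, so they cannot hide the real first
    or last line of the file.
    """
    first = None
    last = None
    for line in lines:
        if first is None:
            first = line
        last = line
    if first is not None:
        yield (index, first, last)


def first_and_last(edges: Iterable[Edge]) -> Optional[Tuple[str, str]]:
    """Pick the first line of the lowest-indexed partition and the last
    line of the highest-indexed one. Returns None for an empty dataset."""
    ordered = sorted(edges, key=lambda edge: edge[0])
    if not ordered:
        return None
    return ordered[0][1], ordered[-1][2]


# With Spark (not executed here), the edges would be collected with:
#   edges = rdd.mapPartitionsWithIndex(partition_edges).collect()
```

Tagging each result with its partition index is what makes the ordering explicit: the driver sorts by index rather than relying on the order in which results arrive.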
Get the schema pattern.
Parameter: file path.
Returns: the schema pattern.
Get the file separator by taking the character that appears most often in 10 lines of the dataset.
Parameter: list of lines read from the file.
Returns: the file separator.
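A minimal sketch of this frequency-based detection is shown below. It makes two assumptions that the description does not state: the sample is the first 10 lines, and letters, digits, and spaces are ignored so that ordinary data characters cannot win the count. The function name `guess_separator` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def guess_separator(lines: Iterable[str], sample_size: int = 10) -> Optional[str]:
    """Return the most frequent candidate character in the sampled lines.

    Assumption: alphanumeric characters and spaces are never separators.
    Returns None when the sample contains no candidate character.
    """
    counts = Counter()
    for position, line in enumerate(lines):
        if position >= sample_size:
            break
        # Drop the line terminator, then count every candidate character.
        counts.update(
            ch for ch in line.rstrip("\r\n") if not ch.isalnum() and ch != " "
        )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For example, `guess_separator(["id;name;city", "1;Ada;Paris"])` returns `";"`.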
Forces every Spark job to implement its entry point in the "run" method.
Parameter: Spark session used for the job.
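One common way to enforce such an entry point is an abstract base class. The sketch below illustrates the idea in Python. The class names `SparkJob` and `ExampleJob` are hypothetical, and the session is typed as `Any` so that the example does not depend on Spark being installed.

```python
from abc import ABC, abstractmethod
from typing import Any


class SparkJob(ABC):
    """Base class that every job extends; it cannot be instantiated directly."""

    @abstractmethod
    def run(self, spark: Any) -> None:
        """Entry point of the job; receives the Spark session."""


class ExampleJob(SparkJob):
    """Hypothetical job that only records the session it was started with."""

    def __init__(self) -> None:
        self.ran_with = None

    def run(self, spark: Any) -> None:
        self.ran_with = spark
```

A subclass that does not define `run` raises `TypeError` when it is instantiated, so a missing entry point is caught before the job starts.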
Read the file without specifying the format.
Parameter: file path.
Returns: a dataset of strings containing the contents of the data file.
Infer the schema for a given data path, domain name, and schema name.