Method to split the input DataFrame, by the unique values of the fileColumnName column, into multiple DataFrames and persist each DataFrame to its corresponding output file.
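The split-and-persist behaviour can be illustrated with a minimal sketch in plain Python, where a list of dicts stands in for DataFrame rows. This is not the library's Spark implementation. The helper names, the CSV output format, and the use of the column value as the output file name are all assumptions made for illustration.

```python
import csv
import os
from collections import defaultdict


def split_by_column(rows, file_column):
    """Group rows by the value found in file_column, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[file_column]].append(row)
    return dict(groups)


def persist_groups(groups, output_dir):
    """Write each group to output_dir/<column value> as CSV; return the paths.

    Assumption: the column value is used directly as the output file name.
    """
    paths = []
    for name, group in groups.items():
        path = os.path.join(output_dir, str(name))
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths.append(path)
    return paths


# Hypothetical sample data: "target" plays the role of fileColumnName.
rows = [
    {"target": "a.csv", "id": 1},
    {"target": "b.csv", "id": 2},
    {"target": "a.csv", "id": 3},
]
groups = split_by_column(rows, "target")
```

Calling persist_groups(groups, some_directory) then writes one file per distinct value of the column.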
Method to collect the values of the columnList columns from filterSourceDataFrame and pass them to the caller DataFrame to filter out values in the caller DataFrame.
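A rough sketch of this kind of value-based filtering, in plain Python with lists of dicts standing in for the two DataFrames. The description does not state whether matching rows are kept or dropped, so the hypothetical keep_matches flag below covers both readings. Nothing here is taken from the actual implementation.

```python
def filter_by_source(caller_rows, filter_source_rows, column_list, keep_matches=True):
    """Filter caller_rows using the column_list values collected from filter_source_rows.

    keep_matches is a hypothetical flag: True keeps rows whose values appear
    in the filter source, False drops them.
    """
    # Collect the distinct value combinations from the filter source.
    wanted = {tuple(row[c] for c in column_list) for row in filter_source_rows}
    return [
        row
        for row in caller_rows
        if (tuple(row[c] for c in column_list) in wanted) == keep_matches
    ]


# Hypothetical sample data.
caller = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}, {"id": 3, "v": "z"}]
source = [{"id": 1}, {"id": 3}]
kept = filter_by_source(caller, source, ["id"])
dropped = filter_by_source(caller, source, ["id"], keep_matches=False)
```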
Method which implements the logic of the Ab Initio Compare Records component. It works as follows:
1. It adds an incremental sequence number to both input DataFrames and joins them on this sequence number.
2. It compares all records of both input DataFrames and counts the mismatching records.
3. If the mismatch count exceeds the limit, it throws an error to terminate workflow execution. Otherwise, it returns a DataFrame with the mismatch count report.
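The three steps can be sketched in plain Python as follows, with lists of dicts standing in for the input DataFrames. The exception class, the report column name, and the choice of a full outer join (so that a row without a partner counts as a mismatch) are assumptions, not details confirmed by the description.

```python
class MismatchLimitExceeded(Exception):
    """Hypothetical error type raised when too many records differ."""


def compare_records(left_rows, right_rows, limit):
    # Step 1: attach an incremental sequence number to each side and join on it.
    left = dict(enumerate(left_rows))
    right = dict(enumerate(right_rows))

    # Step 2: compare records pairwise. Assumption: full outer join, so a
    # record with no partner on the other side counts as a mismatch.
    mismatches = 0
    for seq in sorted(left.keys() | right.keys()):
        if left.get(seq) != right.get(seq):
            mismatches += 1

    # Step 3: fail when the limit is exceeded, otherwise return the report.
    if mismatches > limit:
        raise MismatchLimitExceeded(
            f"{mismatches} mismatching records exceed the limit of {limit}"
        )
    return [{"mismatch_count": mismatches}]


report = compare_records([{"a": 1}, {"a": 2}], [{"a": 1}, {"a": 3}], limit=1)
```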
Method for the Deduplicate operation, where the rows kept in each group are either the first, the last or the unique-only rows. It first groups by all passed groupByColumns and then, depending on the typeToKeep value, performs further operations.
For both the first and last options, it adds a temporary row_number column that holds the row number within each group of rows grouped by groupByColumns. To find the first records, it keeps only the rows whose row_number is 1. To find the last records within each group, it also computes the row count of each group and keeps only the rows whose row_number equals the group count.
For the unique-only case, it adds a temporary count column that holds the number of rows within a window partition. It then keeps only the rows whose count is 1.
option selecting which rows to keep. Possible values are first, last and unique-only
columns to be used to group input records.
DataFrame with first or last or unique-only records in each grouping of input records.
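The row_number and count logic described for the Deduplicate operation can be modeled in plain Python as below. This is only a sketch: a list of dicts stands in for the DataFrame, input order stands in for the ordering inside each window partition, and the function name is hypothetical.

```python
from collections import defaultdict


def deduplicate(rows, group_by_columns, type_to_keep):
    """Keep the first, last, or unique-only rows of each group."""
    if type_to_keep not in ("first", "last", "unique-only"):
        raise ValueError(f"unsupported typeToKeep: {type_to_keep}")

    # Group rows by the values of the grouping columns.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_by_columns)].append(row)

    result = []
    for group in groups.values():
        count = len(group)  # temporary "count" value for the group
        for row_number, row in enumerate(group, start=1):  # temporary "row_number"
            if type_to_keep == "first" and row_number == 1:
                result.append(row)
            elif type_to_keep == "last" and row_number == count:
                result.append(row)
            elif type_to_keep == "unique-only" and count == 1:
                result.append(row)
    return result


# Hypothetical sample data.
rows = [{"k": "a", "v": 1}, {"k": "a", "v": 2}, {"k": "b", "v": 3}]
first = deduplicate(rows, ["k"], "first")
last = deduplicate(rows, ["k"], "last")
unique_only = deduplicate(rows, ["k"], "unique-only")
```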
Method to generate Ab Initio log output for any component. It takes as input an array of non-standard events emitted by the workflow component and serializes each event into a separate row. It also adds start and finish events, attaching count information to the finish event.
Method to read the passed DataFrame fileNameDF and read the content of the files named in it. It also merges the fileName column and a unique sequence id into the final DataFrame, which holds the file content for all passed file names.
Finally, it joins the DataFrame holding the file content with the input DataFrame and returns the joined DataFrame.
Method to pivot on the passed pivot columns. It splits records by the pivot columns, converting each input record into a series of separate output records: one output record for each data field of the original input record that is not in the pivot list. Each output record contains the name and value of a single data field from the original input record, along with the pivot columns.
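As an illustration of this record-splitting behaviour, here is a small sketch in plain Python with dicts standing in for records. The output column names "name" and "value" are assumptions, as is the function name.

```python
def unpivot(rows, pivot_columns, name_column="name", value_column="value"):
    """Emit one output record per non-pivot field of every input record."""
    out = []
    for row in rows:
        # The pivot columns are carried along on every output record.
        keys = {c: row[c] for c in pivot_columns}
        for field, value in row.items():
            if field not in pivot_columns:
                out.append({**keys, name_column: field, value_column: value})
    return out


# Hypothetical sample data: one input record with two non-pivot fields.
records = unpivot([{"id": 1, "x": 10, "y": 20}], ["id"])
```

Here the single input record yields two output records, one for field x and one for field y, each still carrying id.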
Method to provide the Ab Initio normalize functionality. It first replicates the rows of the input DataFrame multiple times, depending on the passed lengthExpression or finishedExpression. lengthExpression evaluates to a number, and each row of the input data is replicated that many times.
finishedExpression and finishedCondition are used to apply a filter condition to the input data; the result of this condition determines how many times each input row is duplicated.
tempWindowExpr is used to evaluate temp variables for the Normalize with Temp case, using window functions. These expressions are then used to compute the final values of the normalize output.
expression which evaluates to an integer value, used to duplicate input records.
expression to be used in filterCondition during its evaluation for duplication of records. finishedCondition: condition to be used to duplicate input records until the condition result is false.
to be used to rename finishedExpressions
columns to be selected after normalize operations.
window expressions to compute value of temp variables.
final normalize output for both with Temp and without Temp case.
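The lengthExpression case of normalize can be sketched in plain Python as shown below, with dicts standing in for rows and a Python callable standing in for the expression. The added index column and the function name are assumptions. The finishedExpression, finishedCondition and tempWindowExpr cases are not covered by this sketch.

```python
def normalize_by_length(rows, length_expression, index_column="index"):
    """Replicate each row length_expression(row) times.

    Each copy receives its position within the replication in index_column
    (an assumed column, added here for illustration).
    """
    out = []
    for row in rows:
        # A length of zero or less produces no output rows for this input row.
        for index in range(length_expression(row)):
            out.append({**row, index_column: index})
    return out


# Hypothetical sample data: the "n" field drives the replication count.
rows = [{"id": 1, "n": 2}, {"id": 2, "n": 0}, {"id": 3, "n": 1}]
normalized = normalize_by_length(rows, lambda row: row["n"])
```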
Method to read textual data from inputColumn, split it into multiple records via recordSeparator and then split each record into multiple columns via fieldSeparator. Finally, it maps the resulting data to the passed output columns.
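A minimal sketch of the two-level split, in plain Python operating on a single text value. Skipping empty records, padding short records with None and ignoring surplus fields are assumptions made here, not behaviour stated in the description.

```python
def split_text(text, record_separator, field_separator, output_columns):
    """Split text into records, each record into fields, and map to columns."""
    records = []
    for record in text.split(record_separator):
        if not record:
            # Assumption: empty records (e.g. after a trailing separator) are skipped.
            continue
        fields = record.split(field_separator)
        # Assumption: missing fields become None; surplus fields are dropped by zip.
        padded = fields + [None] * (len(output_columns) - len(fields))
        records.append(dict(zip(output_columns, padded)))
    return records


# Hypothetical sample data.
parsed = split_text("1,a\n2,b\n", "\n", ",", ["id", "name"])
```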
Method to sync the column names in the DataFrame with the column names passed as input.
Method to take the union of the current DataFrame with the passed otherDataFrame. It also rearranges the columns of otherDataFrame into the same order as the columns of the current DataFrame.
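Why the column rearrangement matters is easiest to see with positional rows. The sketch below uses plain Python, modeling a DataFrame as a list of column names plus a list of tuples. The function name and the error raised for mismatched column sets are assumptions.

```python
def union_by_column_order(columns, rows, other_columns, other_rows):
    """Union two positional tables after aligning the second to the first's column order."""
    if sorted(columns) != sorted(other_columns):
        raise ValueError("both tables must have the same set of columns")
    # Position of each current column inside the other table.
    positions = [other_columns.index(c) for c in columns]
    reordered = [tuple(row[p] for p in positions) for row in other_rows]
    return columns, rows + reordered


# Hypothetical sample data: the other table lists its columns in reverse order.
cols, data = union_by_column_order(
    ["id", "name"], [(1, "a")],
    ["name", "id"], [("b", 2)],
)
```

Without the reordering step, a purely positional union would place "b" under id and 2 under name.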
Adds a column with defined value, if it doesn't exist.
Column's name
New column's value
DataFrame with a new column if it doesn't exist already
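The conditional add can be sketched in plain Python as follows, again modeling a DataFrame as a list of column names plus a list of tuples. The function name is hypothetical.

```python
def add_column_if_missing(columns, rows, column_name, value):
    """Append column_name with a constant value unless it already exists."""
    if column_name in columns:
        # Existing column: the table is returned unchanged.
        return columns, rows
    return columns + [column_name], [row + (value,) for row in rows]


# Hypothetical sample data.
cols_added, rows_added = add_column_if_missing(["id"], [(1,), (2,)], "status", "new")
cols_same, rows_same = add_column_if_missing(["id"], [(1,), (2,)], "id", 0)
```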
Method to add a new unique sequence column to the DataFrame, where the sequence starts with startValue and the value in each row is incremented by incrementBy.
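The arithmetic behind such a sequence column is simple: row i (counting from zero) receives startValue + i * incrementBy. A minimal sketch in plain Python, with dicts standing in for rows; the function name and the default values of 1 are assumptions.

```python
def add_sequence_column(rows, column_name, start_value=1, increment_by=1):
    """Add column_name holding start_value, start_value + increment_by, ..."""
    return [
        {**row, column_name: start_value + i * increment_by}
        for i, row in enumerate(rows)
    ]


# Hypothetical sample data.
sequenced = add_sequence_column(
    [{"v": "a"}, {"v": "b"}, {"v": "c"}], "seq", start_value=10, increment_by=5
)
```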