Returns a pipeline which performs bagging-based sub-sampling of an Apache Spark RDD of T.
The sampling proportion, a value between 0 and 1.
The number of bags to generate.
Returns a pipeline which performs bagging-based sub-sampling of a stream of T.
The sampling proportion, a value between 0 and 1.
The number of bags to generate.
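The bagging pipes above can be illustrated on an in-memory collection. This is a minimal sketch, assuming each bag is drawn with replacement and holds round(proportion × n) records; the object and method names (`BaggingSketch`, `bagging`) and the fixed seed are hypothetical and are not the DynaML signature.

```scala
import scala.util.Random

object BaggingSketch {
  /** Hypothetical helper: draws `numBags` bags, each containing
    * round(proportion * n) elements sampled with replacement from `data`. */
  def bagging[T](data: IndexedSeq[T], proportion: Double, numBags: Int, seed: Long = 42L): Seq[IndexedSeq[T]] = {
    require(proportion > 0.0 && proportion <= 1.0, "proportion must lie in (0, 1]")
    require(data.nonEmpty, "cannot sample from an empty data set")
    val rng = new Random(seed)
    val bagSize = math.max(1, math.round(proportion * data.size).toInt)
    // Seq.fill re-evaluates its argument, so every bag is an independent draw
    Seq.fill(numBags)(IndexedSeq.fill(bagSize)(data(rng.nextInt(data.size))))
  }
}
```

For the RDD variant, Spark's `RDD.sample(withReplacement, fraction, seed)` could draw each bag instead, although the resulting bag sizes are only approximately fraction × n.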
Creates an Encoder which replicates a DenseVector instance n times.
Creates an Encoder which can split DenseVector instances into uniform splits and put them back together.
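Both encoders are invertible maps: replication is undone by taking any copy, and splitting is undone by concatenation. A minimal sketch of the split/join and replicate logic, assuming a Scala `Vector[Double]` in place of Breeze's `DenseVector` and plain functions in place of DynaML's Encoder type; all names here are hypothetical.

```scala
object VectorEncoderSketch {
  /** Encode: split a vector into `numSplits` equal-length pieces. */
  def split(v: Vector[Double], numSplits: Int): Seq[Vector[Double]] = {
    require(v.nonEmpty && numSplits > 0 && v.length % numSplits == 0,
      "vector length must be a positive multiple of the number of splits")
    v.grouped(v.length / numSplits).toSeq
  }

  /** Decode: concatenate the pieces back into a single vector. */
  def join(parts: Seq[Vector[Double]]): Vector[Double] = parts.flatten.toVector

  /** Encode by replication: n identical copies of the input vector. */
  def replicate(v: Vector[Double], n: Int): Seq[Vector[Double]] = Seq.fill(n)(v)
}
```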
Returns a pipe which takes a data set and calculates the mean and standard deviation of each dimension.
Set to true to obtain the standardized data, or false to obtain the original data, along with the GaussianScaler instances.
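A minimal sketch of the per-dimension statistics and the `standardize` flag described above. It assumes records are `Vector[Double]` and uses the sample (n − 1) standard deviation; whether DynaML divides by n or n − 1 is not stated here. The `GaussianScaler` case class below is a hypothetical stand-in, not DynaML's class.

```scala
object GaussianScalesSketch {
  /** Stand-in scaler: z = (x - mean) / sigma, with an exact inverse. */
  final case class GaussianScaler(mean: Vector[Double], sigma: Vector[Double]) {
    def scale(x: Vector[Double]): Vector[Double] =
      x.indices.map(i => (x(i) - mean(i)) / sigma(i)).toVector
    def inverse(z: Vector[Double]): Vector[Double] =
      z.indices.map(i => z(i) * sigma(i) + mean(i)).toVector
  }

  /** Per-dimension mean and sample standard deviation (n - 1 denominator). */
  def calculateScales(data: Seq[Vector[Double]]): GaussianScaler = {
    require(data.size > 1, "need at least two records")
    val n = data.size.toDouble
    val dim = data.head.length
    val mean = (0 until dim).map(i => data.map(x => x(i)).sum / n).toVector
    val sigma = (0 until dim).map { i =>
      math.sqrt(data.map(x => math.pow(x(i) - mean(i), 2)).sum / (n - 1))
    }.toVector
    GaussianScaler(mean, sigma)
  }

  /** Returns standardized or original data, always paired with the scaler. */
  def calculate(data: Seq[Vector[Double]], standardize: Boolean): (Seq[Vector[Double]], GaussianScaler) = {
    val scaler = calculateScales(data)
    (if (standardize) data.map(scaler.scale) else data, scaler)
  }
}
```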
Multivariate version of calculateGaussianScales.
Set to true to obtain the standardized data, or false to obtain the original data, along with the MVGaussianScaler instances.
Returns a pipe which takes a data set and mean-centers it.
Set to true to obtain the centered data, or false to obtain the original data, along with the MeanScaler instances.
Returns a pipe which takes a data set and calculates the minimum and maximum of each dimension.
Set to true to obtain the scaled data, or false to obtain the original data, along with the MinMaxScaler instances.
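The min-max pipe can be sketched the same way: record the per-dimension minimum and maximum, then map each coordinate to [0, 1]. This assumes `Vector[Double]` records and that max > min in every dimension; `MinMaxScaler` below is a hypothetical stand-in, not DynaML's class.

```scala
object MinMaxSketch {
  /** Stand-in scaler mapping each coordinate to [0, 1]; assumes max(i) > min(i). */
  final case class MinMaxScaler(min: Vector[Double], max: Vector[Double]) {
    def scale(x: Vector[Double]): Vector[Double] =
      x.indices.map(i => (x(i) - min(i)) / (max(i) - min(i))).toVector
    def inverse(z: Vector[Double]): Vector[Double] =
      z.indices.map(i => z(i) * (max(i) - min(i)) + min(i)).toVector
  }

  def calculateMinMax(data: Seq[Vector[Double]], standardize: Boolean): (Seq[Vector[Double]], MinMaxScaler) = {
    require(data.nonEmpty, "need at least one record")
    val dim = data.head.length
    val scaler = MinMaxScaler(
      (0 until dim).map(i => data.map(x => x(i)).min).toVector,
      (0 until dim).map(i => data.map(x => x(i)).max).toVector)
    // the scaler is returned in both cases so results can be mapped back later
    (if (standardize) data.map(scaler.scale) else data, scaler)
  }
}
```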
Returns a pipe which performs PCA on data features and Gaussian scaling on data targets.
Set to true to obtain the standardized data, or false to obtain the original data, along with the MVGaussianScaler instances.
Reads a CSV text file and stores it in an R data frame.
The name of the data frame variable.
The separator character used in the CSV file.
A DataPipe instance which takes a file name as input, returns a Renjin ListVector instance, and stores the data frame in the variable whose name is given by df.
In order to generate features for auto-regressive models, one needs to construct sliding windows in time. This function takes two parameters:
deltaT: the auto-regressive order.
timelag: the time lag after which the windowing is conducted.
For example, let deltaT = 2 and timelag = 1.
This pipe takes stream data of the form (t, Value_t)
and outputs a stream which looks like
(t, Vector(Value_t-2, Value_t-3))
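The windowing rule above can be sketched with a sliding window of size deltaT + timelag + 1: the first deltaT values of each window, in reverse order, are the lagged features for the window's last time stamp. This assumes the series is sorted by t and regularly sampled, and it follows the documented output form (t, Vector(...)); the names are hypothetical and a plain function stands in for a DataPipe.

```scala
object DeltaOperationSketch {
  /** For each t with enough history, returns
    * (t, Vector(v(t - timelag - 1), ..., v(t - timelag - deltaT))).
    * Assumes `series` is sorted by time and regularly sampled. */
  def deltaOperation(deltaT: Int, timelag: Int)(series: Seq[(Double, Double)]): Seq[(Double, Vector[Double])] = {
    require(deltaT > 0 && timelag >= 0, "deltaT must be positive, timelag non-negative")
    val window = deltaT + timelag + 1
    if (series.size < window) Seq.empty
    else series.sliding(window).map { w =>
      // w covers [t - deltaT - timelag, ..., t]; its first deltaT values are the lags
      (w.last._1, w.take(deltaT).map(_._2).reverse.toVector)
    }.toSeq
  }
}
```

With deltaT = 2 and timelag = 1, the window for t = 3 is the points at times 0 to 3, giving (3, Vector(Value_1, Value_0)), which matches the (t, Vector(Value_t-2, Value_t-3)) pattern.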
The vector ARX version of DynaMLPipe.deltaOperation
The vector version of DynaMLPipe.deltaOperation
Drop the first element of a Stream of String
Takes a base pipe and creates a parallel pipe by duplicating it.
The base data pipe.
An io.github.mandar2812.dynaml.pipes.ParallelPipe object.
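Duplicating a base pipe amounts to applying the same transformation to each component of a pair. A minimal sketch, assuming a plain Scala function on a Tuple2 in place of DynaML's ParallelPipe; the name `duplicate` is hypothetical. One plausible use is running identical preprocessing on a training set and a test set.

```scala
object DuplicatePipeSketch {
  /** Lifts a base transformation A => B to act component-wise on a pair. */
  def duplicate[A, B](base: A => B): ((A, A)) => (B, B) = {
    case (left, right) => (base(left), base(right))
  }
}
```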
This pipe assumes its input to be of the form "YYYY,Day,Hour,Value".
It takes as input a function (TFunc) which converts a Tuple3 into a single timestamp-like value.
The pipe processes its data source line by line and outputs a Tuple2 in the following format:
(Timestamp,Value)
Usage: DynaMLPipe.extractTimeSeries(TFunc)
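A minimal sketch of this line-by-line parsing, assuming every field is numeric and a plain function stands in for the pipe. The timestamp function used in the usage check (hours since the start of the year) is purely illustrative.

```scala
object ExtractTimeSeriesSketch {
  /** Parses lines "YYYY,Day,Hour,Value" into (timestamp, value), where
    * `tFunc` maps the (year, day, hour) Tuple3 to a single timestamp. */
  def extractTimeSeries(tFunc: ((Double, Double, Double)) => Double)(lines: Seq[String]): Seq[(Double, Double)] =
    lines.map { line =>
      val fields = line.split(",").map(_.trim.toDouble)
      require(fields.length == 4, s"expected 4 fields in: $line")
      (tFunc((fields(0), fields(1), fields(2))), fields(3))
    }
}
```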
This pipe is identical to DynaMLPipe.extractTimeSeries except for one key difference: it returns a Tuple2 of the form (Timestamp, FeatureVector), where FeatureVector is a Vector of values.
Extract a subset of columns from a Stream of comma-separated Strings, and replace any missing-value strings with the empty string.
Usage: DynaMLPipe.extractTrainingFeatures(List(1,2,3), Map(1 -> "N.A.", 2 -> "NA", 3 -> "na"))
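A minimal sketch of the column extraction and missing-value replacement in the usage line above. It assumes 0-based column indices and that a field counts as missing when it equals the marker registered for its column; both are assumptions, as is the plain-function form.

```scala
object ExtractFeaturesSketch {
  /** Keeps only `columns` (assumed 0-based) of each comma-separated line and
    * blanks a field if it equals that column's missing-value marker. */
  def extractTrainingFeatures(columns: List[Int], missing: Map[Int, String])(lines: Seq[String]): Seq[String] =
    lines.map { line =>
      val fields = line.split(",", -1) // limit -1 keeps trailing empty fields
      columns.map { c =>
        val v = fields(c).trim
        if (missing.get(c).contains(v)) "" else v
      }.mkString(",")
    }
}
```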
Data pipe which takes a file name/path as a String and returns a Stream of String.
Scale a data set which is stored as a Stream, returning the scaled data as well as a GaussianScaler instance which can be used to map the scaled values back to the original data.
Perform Gaussian normalization on a data stream which is a Tuple2 of the form:
(Stream(training data), Stream(test data))
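The important detail for a (training, test) pair is that the mean and standard deviation come from the training set only and are then applied to both halves, so no test information leaks into the scaling. A minimal sketch, assuming `Vector[Double]` records and the sample (n − 1) standard deviation; the names and the returned (mean, sigma) pair are hypothetical.

```scala
object TrainTestGaussianSketch {
  type Rec = Vector[Double]

  /** Standardizes both sets with per-dimension statistics of the TRAINING set only. */
  def gaussianTrainTest(data: (Seq[Rec], Seq[Rec])): ((Seq[Rec], Seq[Rec]), (Rec, Rec)) = {
    val (train, test) = data
    require(train.size > 1, "need at least two training records")
    val n = train.size.toDouble
    val dim = train.head.length
    val mean = Vector.tabulate(dim)(i => train.map(x => x(i)).sum / n)
    val sigma = Vector.tabulate(dim)(i =>
      math.sqrt(train.map(x => math.pow(x(i) - mean(i), 2)).sum / (n - 1)))
    def scale(x: Rec): Rec = Vector.tabulate(dim)(i => (x(i) - mean(i)) / sigma(i))
    ((train.map(scale), test.map(scale)), (mean, sigma))
  }
}
```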
Constructs a data pipe which performs discrete Haar wavelet transform on a (breeze) vector signal.
A trivial identity data pipe
Constructs a data pipe which performs inverse discrete Haar wavelet transform on a (breeze) vector signal.
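The forward and inverse Haar pipes can be sketched as a recursive average/difference scheme on signals whose length is a power of two. This sketch uses the orthonormal 1/√2 normalization and the coefficient layout [overall average, coarsest detail, ..., finest details]; DynaML's normalization and ordering may differ, and a Scala `Vector[Double]` stands in for a Breeze vector.

```scala
object HaarSketch {
  private val s = math.sqrt(2.0)

  private def isPowerOfTwo(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  /** Full multilevel orthonormal Haar transform. */
  def haar(x: Vector[Double]): Vector[Double] = {
    require(isPowerOfTwo(x.length), "length must be a power of two")
    if (x.length == 1) x
    else {
      val pairs = x.grouped(2).toVector
      val approx = pairs.map(p => (p(0) + p(1)) / s) // pairwise scaled averages
      val detail = pairs.map(p => (p(0) - p(1)) / s) // pairwise scaled differences
      haar(approx) ++ detail
    }
  }

  /** Inverse of `haar`: rebuilds each pair from its average and difference. */
  def invHaar(c: Vector[Double]): Vector[Double] = {
    require(isPowerOfTwo(c.length), "length must be a power of two")
    if (c.length == 1) c
    else {
      val half = c.length / 2
      val approx = invHaar(c.take(half))
      approx.zip(c.drop(half)).flatMap { case (a, d) => Vector((a + d) / s, (a - d) / s) }
    }
  }
}
```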
Scale a data set which is stored as a Stream, returning the scaled data as well as a MinMaxScaler instance which can be used to map the scaled values back to the original data.
Perform [0,1] scaling on a data stream which is a Tuple2 of the form:
(Stream(training data), Stream(test data))
Scale a data set which is stored as a Stream, returning the scaled data as well as a MVGaussianScaler instance which can be used to map the scaled values back to the original data.
Generate a numeric range by dividing an interval into bins.
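A minimal sketch of dividing an interval into equal-width bins. It assumes the range is returned as the numBins + 1 bin edges with both endpoints included; whether DynaML returns edges, bin starts, or midpoints is not stated here, and the name `binEdges` is hypothetical.

```scala
object NumericRangeSketch {
  /** Splits [lower, upper] into `numBins` equal-width bins and returns the
    * numBins + 1 edges; the last edge is pinned to `upper` to avoid rounding drift. */
  def binEdges(lower: Double, upper: Double, numBins: Int): Vector[Double] = {
    require(numBins > 0 && upper > lower, "need numBins > 0 and upper > lower")
    val width = (upper - lower) / numBins
    Vector.tabulate(numBins + 1)(i => if (i == numBins) upper else lower + i * width)
  }
}
```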
Transform a data set by performing PCA on its patterns.
Transform a data set consisting of features and targets. Perform PCA scaling of features and gaussian scaling of targets.
Create a linear model from an R data frame.
The name of the variable in which to store the model.
The name of the target variable.
A list of names denoting the input variables.
A DataPipe which takes as input a data frame variable name and returns a ListVector containing the linear model attributes. It also stores the model in the variable given by modelName in the ongoing R session.
Remove all records which contain missing values from a Stream of String. This pipe should be applied after DynaMLPipe.extractTrainingFeatures.
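Because DynaMLPipe.extractTrainingFeatures replaces missing-value markers with the empty string, a record with missing values can be detected as one containing an empty field. A minimal sketch under that assumption, with a plain function in place of the pipe; the name is hypothetical.

```scala
object RemoveMissingSketch {
  /** Drops every comma-separated record that has an empty field, i.e. a value
    * blanked out earlier by a missing-value replacement step. */
  def removeMissingLines(lines: Seq[String]): Seq[String] =
    lines.filterNot(_.split(",", -1).exists(_.trim.isEmpty)) // -1 keeps trailing empties
}
```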
Data pipe to replace all occurrences of a regular expression or string in a Stream of String with a specified replacement string.
Data pipe to replace all white spaces in a Stream of String with the comma character.
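Both replacement pipes reduce to `String.replaceAll` applied to every line. A minimal sketch, assuming a run of consecutive whitespace collapses into a single comma (an assumption, not confirmed by the text above); names are hypothetical. Note that `replaceAll` interprets its pattern as a regular expression, so a literal string containing metacharacters would need `java.util.regex.Pattern.quote`.

```scala
object ReplaceSketch {
  /** Replaces every match of `regex` in each line with `replacement`. */
  def replace(regex: String, replacement: String)(lines: Seq[String]): Seq[String] =
    lines.map(_.replaceAll(regex, replacement))

  /** Replaces each run of whitespace with a single comma (assumed behaviour). */
  def replaceWhiteSpaces(lines: Seq[String]): Seq[String] = replace("\\s+", ",")(lines)
}
```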
Take each line, which is a comma-separated string, and extract all but the last element into a feature vector, leaving the last element as the "target" value.
This pipe outputs data as a Stream of Tuple2 in the following form:
(Vector(features), value)
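A minimal sketch of the (features, target) split, assuming every field parses as a Double and a Scala `Vector[Double]` stands in for the feature vector type; the name is hypothetical.

```scala
object FeatureTargetSketch {
  /** Parses each comma-separated line into (features, target): all but the
    * last field form the feature vector, the last field is the target. */
  def splitFeaturesAndTargets(lines: Seq[String]): Seq[(Vector[Double], Double)] =
    lines.map { line =>
      val values = line.split(",").map(_.trim.toDouble).toVector
      require(values.length >= 2, s"need at least one feature and a target: $line")
      (values.init, values.last)
    }
}
```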
Extract a subset of the data into a Tuple2 which can be used as a (training, test) pair for model learning and evaluation.
Usage: DynaMLPipe.splitTrainingTest(num_training, num_test)
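A minimal sketch of the split, assuming the training set is the first num_training records and the test set is the next num_test records after it, so the two never overlap. This ordering is an assumption; DynaML may select the records differently.

```scala
object SplitSketch {
  /** First `numTraining` records for training, the following `numTest` for testing. */
  def splitTrainingTest[T](numTraining: Int, numTest: Int)(data: Seq[T]): (Seq[T], Seq[T]) = {
    require(numTraining >= 0 && numTest >= 0, "sizes must be non-negative")
    (data.take(numTraining), data.drop(numTraining).take(numTest))
  }
}
```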
Writes a Stream of String to a file.
Usage: DynaMLPipe.streamToFile("abc.csv")
Trim white spaces from each line in a Stream of String
Writes a Stream of AnyVal to a file.
Usage: DynaMLPipe.valuesToFile("abc.csv")
Perform Gaussian normalization on a data stream which is a Tuple2 of the form:
(Stream(training data), Stream(test data))