model to be tuned
description of the tuning session
Blueprint
the project the blueprint belongs to
a list of strings representing processes the blueprint uses
the blueprint ID of this blueprint - note that this is not an ObjectId
the model this blueprint will produce
(New in version v2.6) describes the category of the blueprint and indicates the kind of model this blueprint produces. Will be either “DataRobot” or “Scaleout DataRobot”.
the ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If null, no such constraints are enforced.
the ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If null, no such constraints are enforced.
whether this model supports enforcing monotonic constraints
– This field name is capitalized because the API response returns it capitalized.
ISO-8601 string with the time that this calendar was created
the name of this calendar. This will be the same as source if no name was specified
the filename of the uploaded calendar
the number of dates marked as having events in this calendar
the number of distinct eventTypes in this calendar
ISO-8601 date string of the earliest event seen in this calendar
projectIds of projects currently using this calendar
the role the requesting user has on this calendar
array of multiseries ID column names in calendar file. Currently only one multiseries ID column is supported.
DateTimeModel object
Date Time Partitioning used for a Time Series Project. This should not be constructed directly. See io.github.timsetsfire.datarobot.DateTimePartitioningMethod when setting up DateTime Partitioning for a project via UI.
– The ID of the project
(string) – The date column that will be used as a datetime partition column
(string) – The date format of the partition column
(boolean) – (New in version v2.8) A boolean value indicating whether a time series project should be created instead of a regular project which uses datetime partitioning.
(boolean) – (New in version v2.20) A boolean value indicating whether an unsupervised project should be created
(boolean) – (Deprecated in version v2.11) Renamed to defaultToKnownInAdvance. This parameter always has the same value as defaultToKnownInAdvance and will be removed in a future release.
(boolean) – (New in version v2.11) Indicates whether all features in a time series project default to being treated as known in advance, unless overridden by featureSettings. Features marked as known in advance must be specified into the future when making predictions. See the Time Series Overview for more context.
(boolean) – (New in version v2.17) Indicates whether all features in a time series project default to being treated as do-not-derive features, which excludes them from feature derivation.
– (New in version v2.8) Will only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the past relative to the forecast point the feature derivation window should begin. Will be a negative integer, if present.
– (New in version v2.8) Will only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer, if present.
– (New in version v2.8) Will only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the future relative to the forecast point the forecast window should start. Will be a non-negative integer, if present.
– (New in version v2.8) Will only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the future relative to the forecast point the forecast window should end. Will be a non-negative integer, if present.
– (New in version v2.14) Will only be specified for projects using time series. Indicates which unit is the basis for the feature derivation window and forecast window. Will be either the detected time unit or “ROW”.
– The default validation duration for all backtests. Will not be specified if the primary date/time feature in a time series project is irregular.
– The start date of the available training data for scoring the holdout
– The duration of the available training data for scoring the holdout
– The end date of the available training data for scoring the holdout
– The start date of the primary training data for scoring the holdout
– The duration of the primary training data for scoring the holdout
– The end date of the primary training data for scoring the holdout
– The start date of the gap between the training and holdout scoring data
– The duration of the gap between the training and holdout scoring data
– The end date of the gap between the training and holdout scoring data
– The start date of the holdout scoring data
– The duration of the holdout scoring data
– The end date of the holdout scoring data
– The number of backtests used
– Whether models created via the autopilot will use “rowCount” or “duration” as their dataSelectionMethod.
– An array of the configured backtests
– An array of available warnings about potential problems with the chosen partitioning that could cause issues during modeling, although the partitioning may be successfully submitted
– An array of per feature settings
– (New in version v2.14) Number of features that are marked as known in advance.
– (New in version v2.17) Number of features that are marked as “do not derive”.
(boolean) – (New in version v2.14) Indicates whether to use cross-series features.
(string) – (New in version v2.14) The aggregation type to apply when creating cross-series features. Optional, must be one of “total” or “average”.
(array) – (New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc.
(string) – (New in version v2.15) Optional, the ID of a calendar to use with this project.
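The sign rules stated above for the feature derivation window and the forecast window can be checked mechanically. The following is a minimal sketch, not part of any DataRobot client: the function name `check_windows` and its four parameter names are hypothetical, chosen only to mirror the four window descriptions, and the function checks nothing beyond the signs described in this section.

```python
def check_windows(fdw_start, fdw_end, fw_start, fw_end):
    """Return a list of violations of the window sign rules described above.

    All four arguments are hypothetical names for the window offsets,
    expressed in timeUnits relative to the forecast point.
    """
    problems = []
    if fdw_start >= 0:
        problems.append("feature derivation window start must be negative")
    if fdw_end > 0:
        problems.append("feature derivation window end must be non-positive")
    if fw_start < 0:
        problems.append("forecast window start must be non-negative")
    if fw_end < 0:
        problems.append("forecast window end must be non-negative")
    return problems
```

An empty list means the four offsets are consistent with the documented signs; it does not guarantee that the server will accept the partitioning.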
– The date column that will be used as a datetime partition column
(New in version v2.8) Optional, defaults to false. A boolean value indicating whether a time series project should be created instead of a regular project which uses datetime partitioning.
(New in version v2.20) Optional, defaults to false. A boolean value indicating whether an unsupervised project should be created.
(New in version v2.11) Optional, may only be specified for projects using time series. An array of column names identifying the multiseries ID column(s) to use to identify series within the data. Currently only one multiseries ID column may be specified. See the multiseries section of the docs for more context.
(Deprecated in version v2.11) Optional, renamed to defaultToKnownInAdvance, see below for more detail.
(New in version v2.11) Optional, for time series projects only. Sets whether all features default to being treated as known in advance features, which are features that are known into the future. Features marked as known in advance must be specified into the future when making predictions. The default is false, all features are not known in advance. Individual features can be set to a value different than the default using the featureSettings parameter. See the Time Series Overview for more context.
(New in version v2.17) Optional, for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different than the default by using the featureSettings parameter.
(New in version v2.8) Optional, may only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the past relative to the forecast point the feature derivation window should begin. Must be a negative integer, if specified.
(int) – (New in version v2.8) Optional, may only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the past relative to the forecast point the feature derivation window should end. Must be a non-positive integer, if specified.
(New in version v2.8) Optional, may only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the future relative to the forecast point the forecast window should start. Must be a non-negative integer, if specified.
(New in version v2.8) Optional, may only be specified for projects using time series. How many timeUnits of the datetimePartitionColumn into the future relative to the forecast point the forecast window should end. Must be a non-negative integer, if specified.
(New in version v2.14) Optional, may only be specified for projects using time series. Indicates which unit is the basis for the feature derivation window and forecast window. Valid options are the detected time unit or “ROW”. If omitted, the default value is the detected time unit.
Optional. A duration string representing the default validation duration for all backtests. If the primary date/time feature in a time series project is irregular, you cannot set a default validation length. Instead, set each duration individually.
(New in version v2.8) Optional. A boolean value indicating whether date partitioning should skip allocating a holdout fold. If omitted, the default value is false. When specifying disableHoldout: true, holdoutStartDate and holdoutDuration must not be set.
Optional. A datetime string representing the start date of the holdout fold. When specifying holdoutStartDate, one of holdoutEndDate or holdoutDuration must also be specified. This attribute cannot be specified when disableHoldout is true.
Optional. A datetime string representing the end date of the holdout fold. When specifying holdoutEndDate, holdoutStartDate must also be specified. This attribute cannot be specified when disableHoldout is true.
Optional. A duration string representing the duration of the holdout fold. When specifying holdoutDuration, holdoutStartDate must also be specified. This attribute cannot be specified when disableHoldout is true.
Optional, a duration string representing the duration of the gap between the training and the holdout data for the holdout model. For time series projects, defaults to the duration of the gap between the end of the feature derivation window and the beginning of the forecast window. For OTV projects, defaults to a zero duration (P0Y0M0D).
Optional, the number of backtests to use. If omitted, defaults to a positive value selected by the server based on the validation and gap durations.
– Optional, either “duration” or “rowCount”. Defaults to “duration”. Whether models created via the autopilot will use “rowCount” or “duration” as their dataSelectionMethod.
(New in version v2.9) Optional, defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Valid options are “always”, “never”, “auto”.
(New in version v2.9) Optional, defaults to “auto” for timeseries projects. Used to specify which differencing method to apply if the data is stationary. Valid options are “auto”, “simple”, “none”, “seasonal”. Parameter “periodicities” must be specified if “seasonal” is chosen.
Optional. An array specifying individual backtests. The index of the backtests specified should range from 0 to numberOfBacktests - 1.
(New in version v2.9) Optional, an array specifying per feature settings. Features can be left unspecified.
(New in version v2.9) Optional, a list of periodicities. If this is provided, parameter “differencing_method” will default to “seasonal” if it is not provided or is set to “auto”.
(New in version v2.14) Indicates whether to use cross-series features.
(New in version v2.14) The aggregation type to apply when creating cross-series features. Optional, must be one of “total” or “average”.
(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc. Must be used with multiseries and useCrossSeriesFeatures enabled.
– (New in version v2.15) Optional, the ID of the calendar to use with this project.
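The holdout parameters above depend on each other, which makes them easy to misconfigure. Here is a minimal sketch of those rules as a pre-submission check. It assumes the specification is held in a plain dictionary keyed by the parameter names quoted above (disableHoldout, holdoutStartDate, holdoutEndDate, holdoutDuration). The function `check_holdout` is hypothetical, and it reads "one of holdoutEndDate or holdoutDuration" as "at least one", which is an assumption.

```python
def check_holdout(spec):
    """Return violations of the holdout rules for a partitioning spec dict."""
    problems = []
    start = spec.get("holdoutStartDate")
    end = spec.get("holdoutEndDate")
    duration = spec.get("holdoutDuration")

    # With the holdout disabled, none of the holdout fields may be set.
    if spec.get("disableHoldout", False):
        for key in ("holdoutStartDate", "holdoutEndDate", "holdoutDuration"):
            if spec.get(key) is not None:
                problems.append(f"{key} cannot be set when disableHoldout is true")
        return problems

    # A start date needs an end date or a duration to define the fold.
    if start is not None and end is None and duration is None:
        problems.append("holdoutStartDate requires holdoutEndDate or holdoutDuration")
    # An end date or a duration is meaningless without a start date.
    if end is not None and start is None:
        problems.append("holdoutEndDate requires holdoutStartDate")
    if duration is not None and start is None:
        problems.append("holdoutDuration requires holdoutStartDate")
    return problems


# Illustrative values only; the date and duration strings are made up.
print(check_holdout({"holdoutStartDate": "2020-01-01T00:00:00Z",
                     "holdoutDuration": "P0Y1M0D"}))
```

The check covers only the cross-field rules listed in this section; it does not validate the date or duration strings themselves.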
Feature
(int) – the feature ID. (Note: Throughout the API, features are specified using their names, not this ID.)
(string) – feature name
(string) – the ID of the project the feature belongs to
(string) – feature type: ‘Numeric’, ‘Categorical’, etc.
(float) – numeric measure of the strength of relationship between the feature and target (independent of any model or other features)
(bool) – whether feature has too few values to be informative
(int) – number of unique values
(int) – number of missing values
(string) – (New in version v2.5) the date format string for how this feature was interpreted (or null if not a date feature). If not null, it will be compatible with https://docs.python.org/2/library/time.html#time.strftime.
(bool) – (New in version v2.8) whether this feature can be used as a datetime partitioning feature for time series projects. Only sufficiently regular date features can be selected as the datetime feature for time series projects. Always false for non-date features. Date features that cannot be used in datetime partitioning for a time series project may be eligible for an OTV project, which has less stringent requirements.
(string) – (New in version v2.8) why the feature is ineligible for time series projects, or “suitable” if it is eligible.
(string) – (New in version v2.8) the unit for the interval between values of this feature, e.g. DAY, MONTH, HOUR. When specifying windows for time series projects, the windows are expressed in terms of this unit. Only present for date features eligible for time series projects, and null otherwise.
(int) – (New in version v2.8) The minimum time step that can be used to specify time series windows. The units for this value are the timeUnit. When specifying windows for time series projects, all windows must have durations that are integer multiples of this number. Only present for date features that are eligible for time series projects and null otherwise.
– minimum value of the EDA sample of the feature.
– maximum value of the EDA sample of the feature.
– arithmetic mean of the EDA sample of the feature.
– median of the EDA sample of the feature.
– standard deviation of the EDA sample of the feature.
(string) – whether or not the feature has target leakage. ‘SKIPPED_DETECTION’ indicates leakage detection was not run on the feature, ‘FALSE’ indicates no leakage, ‘MODERATE_RISK’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage.
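Because the date format string of a date feature is described above as compatible with Python's strftime, it can be used directly with the standard library to parse or render values of that feature. The sketch below illustrates this under the assumption that the format string has already been read from the feature. The helper name `parse_feature_date` and the sample format "%Y-%m-%d" are illustrative, not taken from the API.

```python
from datetime import datetime


def parse_feature_date(value, date_format):
    """Parse a raw value using a feature's strftime-compatible format string."""
    if date_format is None:
        # A null format means the feature was not interpreted as a date.
        raise ValueError("feature is not a date feature")
    return datetime.strptime(value, date_format)


parsed = parse_feature_date("2020-01-31", "%Y-%m-%d")
print(parsed.strftime("%Y-%m-%d"))
```

Rendering the parsed value with the same format string returns the original text, which is a quick way to confirm that a format was interpreted as expected.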
FeatureSetting
(string) – The name of the feature being specified.
(boolean) – (New in version v2.11) Optional, for time series projects only. Sets whether the feature is known in advance, i.e., values for future dates are known at prediction time. If not specified, the feature uses the value from the defaultToKnownInAdvance flag.
(boolean) – (New in version v2.17) Optional, for time series projects only. Sets whether the feature is do-not-derive, i.e., is excluded from feature derivation. If not specified, the feature uses the value from the defaultToDoNotDerive flag.
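The fallback from a per-feature setting to the project-wide default can be illustrated with a short sketch. The field names are not shown in this section, so the dictionary keys `featureName`, `knownInAdvance` and `doNotDerive` below are assumptions, as is the helper `resolve_flags`. Only defaultToKnownInAdvance and defaultToDoNotDerive are named in the text.

```python
def resolve_flags(feature_name, feature_settings,
                  default_to_known_in_advance=False,
                  default_to_do_not_derive=False):
    """Resolve the effective flags for one feature.

    feature_settings is a list of dicts with assumed keys
    'featureName', 'knownInAdvance' and 'doNotDerive'.
    A missing or None value falls back to the project default.
    """
    known = None
    no_derive = None
    for setting in feature_settings:
        if setting.get("featureName") == feature_name:
            known = setting.get("knownInAdvance")
            no_derive = setting.get("doNotDerive")
            break
    return {
        "knownInAdvance": default_to_known_in_advance if known is None else known,
        "doNotDerive": default_to_do_not_derive if no_derive is None else no_derive,
    }


settings = [{"featureName": "holiday", "knownInAdvance": True}]
print(resolve_flags("holiday", settings))
print(resolve_flags("sales", settings))
```

Features left unspecified in the settings list simply take both defaults.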
featurelist
(string) – the ID of the featurelist
(string) – the ID of the project the featurelist belongs to
(string) – the name of the featurelist
(array) – a json array of names of features included in the featurelist
(int) – (New in version v2.13) the number of models that currently use this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
(string) – (New in version v2.13) a timestamp string specifying when the featurelist was created
(boolean) – (New in version v2.13) whether the featurelist was created manually by a user or by DataRobot automation
(string) – (New in version v2.13) a user-friendly description of the featurelist, which can be updated by users
Jobs
Model
ModelJob
Partitioning for a given project, returned as part of a response from the API
PredictJob
The recommender object will describe additional options specified if the project is used for a recommender problem. It will be of the following form:
if a recommender problem, the name of the column containing item ids, otherwise null
indicates whether the project is a recommender problem
if a recommender problem, the name of the column containing user ids, otherwise null
Factory for io.github.timsetsfire.datarobot.Project instances.
an upper bound on running time (in hours), such that models exceeding the bound will be excluded in subsequent autopilot runs
defaults to False, if specified used to cap the maximum response of a model
defaults to null, the random seed to be used if specified
the name of the weight column, if specified, otherwise null.
optional, the percentage threshold between 0.1 and 50 for specifying the Rate@Top% metric.
the list of names of the offset columns, if specified, otherwise null.
the name of the exposure column, if specified, otherwise null.
the name of the event count column, if specified, otherwise null.
whether the project uses smart downsampling to throw away excess rows of the majority class. Smart downsampled projects express all sample percents in terms of percent of minority rows (as opposed to percent of all rows).
the percentage between 0 and 100 of the majority rows that are kept, or null for projects without smart downsampling
the total number of the minority rows available for modeling, or null for projects without smart downsampling
the total number of the majority rows available for modeling, or null for projects without smart downsampling
Include additional, longer-running models that will be run by the autopilot and available to run manually
Specifies the behavior of Scaleout models for the project. This is one of disabled, repositoryOnly, autopilot
null or str, the ID of the featurelist specifying a set of features with a monotonically increasing relationship to the target. All blueprints generated in the project use this as their default monotonic constraint, but it can be overridden at model submission time.
null or str, the ID of the featurelist specifying a set of features with a monotonically decreasing relationship to the target. All blueprints generated in the project use this as their default monotonic constraint, but it can be overridden at model submission time.
boolean (defaults to False), whether the project only includes blueprints that support enforcing monotonic constraints
optional, defaults to True. Blend best models during Autopilot run.
optional, defaults to 0. Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
optional, defaults to False. Keep only models that can be converted to scorable java code during Autopilot run.
optional, defaults to True. Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
(array) – optional. For GAM models, specifies groups of columns for which pairwise interactions will be allowed. E.g. if set to [[“A”, “B”, “C”], [“C”, “D”]] then GAM models will allow interactions between columns AxB, BxC, AxC, CxD. All others (AxD, BxD) will not be considered. If not specified, all possible interactions will be considered by the model.
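To see which pairwise interactions a given set of column groups permits, the pairs can be enumerated group by group. The sketch below is an illustration only: `allowed_pairs` is a hypothetical helper, and the groups used are the ones implied by the interactions listed above (AxB, BxC, AxC, CxD), namely one group of A, B, C and one group of C, D.

```python
from itertools import combinations


def allowed_pairs(groups):
    """Return the set of column pairs that share at least one group.

    Each pair is an alphabetically ordered tuple, so ('A', 'B')
    stands for the interaction AxB.
    """
    pairs = set()
    for group in groups:
        # Every two distinct columns inside one group may interact.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


pairs = allowed_pairs([["A", "B", "C"], ["C", "D"]])
print(sorted(pairs))
```

Pairs such as AxD and BxD are absent from the result because those columns never appear together in a group.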
AdvancedOptions object