the dimensionality of the hyper-parameter tuning problem
the function that evaluates points in the space to real values
specifies the indices of discrete parameters and their numbers of discrete values
specifies the covariance kernel for hyper-parameters
the number of candidate points to draw at each iteration. Larger numbers give more precise results, but also incur higher computational cost.
whether to include observation noise in the evaluation function model
the random seed value
Discretize candidates with specified indices.
candidate with values in [0, 1]
Map that specifies the indices of discrete parameters and their numbers of discrete values
candidate with the specified discrete values
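A minimal sketch of what discretization with an index map might look like. The function name, the map's shape, and the snap-back-to-[0, 1] convention are all illustrative assumptions, not the library's actual API:

```python
import numpy as np

def discretize(candidate, discrete_map):
    """Snap selected dimensions of a [0, 1] candidate to discrete levels.

    discrete_map maps a dimension index to its number of discrete values.
    (Illustrative names; not the library's actual API.)
    """
    out = np.array(candidate, dtype=float)
    for idx, n_values in discrete_map.items():
        # Map [0, 1] onto {0, ..., n_values - 1}, then back into [0, 1]
        # so the candidate stays inside the unit hypercube.
        level = min(int(out[idx] * n_values), n_values - 1)
        out[idx] = level / max(n_values - 1, 1)
    return out
```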
Draw candidates from the distributions along each dimension in the space
the number of candidates to draw
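As a sketch, drawing candidates might look like the following, assuming uniform per-dimension distributions over the unit hypercube (the real routine may use other distributions; the names here are hypothetical):

```python
import numpy as np

def draw_candidates(n, dim, rng=None):
    """Draw n candidate points, one uniform value per dimension.

    Sketch only: assumes each dimension's distribution is Uniform(0, 1).
    """
    rng = np.random.default_rng(rng)
    return rng.uniform(size=(n, dim))
```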
Searches and returns n points in the space.
The number of points to find
The found points
Searches and returns n points in the space, given prior observations from past data sets.
The number of points to find
Observations made prior to searching, from past data sets (mean-centered)
The found points
Searches and returns n points in the space, given prior observations from this data set and past data sets.
The number of points to find
Observations made prior to searching, from this data set (not mean-centered)
Observations made prior to searching, from past data sets (mean-centered)
The found points
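Since observations from this data set arrive raw while past observations are already mean-centered, one plausible way to pool them is to center the current values first. This is an illustrative sketch of that assumption, not the library's documented behavior:

```python
import numpy as np

def combine_observations(current_y, prior_y):
    """Mean-center the current data set's values so they are comparable
    with already mean-centered prior observations, then pool them.

    Hypothetical helper; the library may combine the two differently.
    """
    centered = np.asarray(current_y, dtype=float) - np.mean(current_y)
    return np.concatenate([centered, np.asarray(prior_y, dtype=float)])
```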
Returns the last model trained during search
the last model
Produces the next candidate, given the last. In this case, we fit a Gaussian Process to the previous observations and use it to predict the values of uniformly drawn candidate points. The candidate with the best predicted evaluation is chosen.
the last candidate
the last observed value
the next candidate
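The fit-then-predict step described above can be sketched with scikit-learn's `GaussianProcessRegressor`. The function name, kernel choice, and candidate count here are illustrative assumptions, not the library's actual implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def next_candidate(observed_x, observed_y, n_candidates=1000, rng=None):
    """Fit a GP to past observations, then return the uniformly drawn
    candidate with the best (here: highest) predicted value.

    Sketch only; names and kernel are assumptions, not the library's API.
    """
    rng = np.random.default_rng(rng)
    dim = observed_x.shape[1]
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(observed_x, observed_y)          # posterior over the objective
    candidates = rng.uniform(size=(n_candidates, dim))
    predictions = gp.predict(candidates)
    return candidates[np.argmax(predictions)]
```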
Handler callback for each observation. In this case, we record the observed point and values.
the observed point in the space
the observed value
Handler callback for each observation in the prior data. In this case, we record the observed point and values.
the observed point in the space
the observed value
Selects the best candidate according to the predicted values, where "best" is determined by the given transformation. In the case of EI, we always search for the max; in the case of CB, we always search for the min.
matrix of candidates
predicted values for each candidate
prediction transformation function
the candidate with the best value
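One way the transformation-driven selection might look: maximize the transformed predictions, so the identity transform gives an EI-style max and a negation gives a confidence-bound-style min. The names and the argmax-over-transform convention are assumptions for illustration:

```python
import numpy as np

def select_best(candidates, predicted, transform=lambda v: v):
    """Return the candidate whose transformed prediction is largest.

    Pass the identity to maximize (EI-style); pass `lambda v: -v` to
    minimize (CB-style). Hypothetical helper, not the library's API.
    """
    scores = transform(np.asarray(predicted, dtype=float))
    return np.asarray(candidates)[np.argmax(scores)]
```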
Performs a guided random search of the given ranges, where the search is guided by a Gaussian Process estimated from evaluations of the actual evaluation function. Since we assume that the evaluation function is very costly (as it often is when each evaluation requires a full training / evaluation cycle of a machine learning model), it is worth spending what would otherwise be considered an expensive computation on the search itself in order to reduce the number of times we need to evaluate the function.
At a high level, the search routine proceeds as follows:
1) Assume a uniform prior over the evaluation function.
2) Receive a new observation, and use it along with any previous observations to train a new Gaussian Process regression model for the evaluation function. This approximation is the new posterior over the evaluation function.
3) Sample candidates uniformly, evaluate the posterior for each, and select the candidate with the highest predicted evaluation.
4) Evaluate the best candidate with the actual evaluation function to acquire a new observation.
5) Repeat from step 2.
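The loop above can be sketched end to end as follows. Every name here (`guided_search`, the kernel, the candidate count) is an illustrative assumption standing in for the library's actual implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def guided_search(evaluate, dim, n_iterations=10, n_candidates=500, seed=0):
    """Sketch of the guided random search loop described above.

    Draws uniformly, fits a GP posterior to all observations so far,
    and spends each real evaluation on the most promising candidate.
    """
    rng = np.random.default_rng(seed)
    xs = [rng.uniform(size=dim)]             # start from a uniform draw
    ys = [evaluate(xs[0])]
    for _ in range(n_iterations - 1):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.array(xs), np.array(ys))   # step 2: new posterior
        cands = rng.uniform(size=(n_candidates, dim))  # step 3: sample
        best = cands[np.argmax(gp.predict(cands))]     # best predicted
        xs.append(best)                      # step 4: real evaluation
        ys.append(evaluate(best))
    return np.array(xs), np.array(ys)
```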