org.platanios.tensorflow.api.ops.training
Decay value to use.
Optional count of the number of updates applied to the variables.
If true, the moving averages computed for values provided in computeForValues will be zero-debiased.
Name prefix to use for all created ops.
Returns the variable holding the average for value.
Returns the variable holding the average for variable.
Returns the name of the variable holding the average for value.
Returns the name of the variable holding the average for variable.
Computes moving averages of the provided values.

This method creates shadow variables for all elements of values. The shadow variables for each value are created with trainable = false, initialized to 0, optionally zero-debiased, and added to the Graph.Keys.MOVING_AVERAGE_VARIABLES and the Graph.Keys.GLOBAL_VARIABLES collections.

Values for which to compute moving averages.
Created op that updates all the shadow variables, as described above.
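The effect of zero-debiasing can be illustrated numerically. Because shadow variables for values start at 0, the raw moving average is biased toward zero during the first steps; a standard correction divides the raw average by 1 - decay^t (the library's exact debiasing mechanism may differ, so this is only a sketch of the idea, in plain Python arithmetic rather than the Scala API):

```python
decay = 0.99

# A stream of observed values whose moving average we track.
values = [5.0] * 10

shadow = 0.0  # Shadow variable for a value starts at zero.
for v in values:
    # The update rule from the documentation:
    # shadowVariable -= (1 - decay) * (shadowVariable - value)
    shadow -= (1 - decay) * (shadow - v)

# After only 10 steps the raw average is heavily biased toward 0.
raw = shadow                                     # ~0.478, far from 5.0
debiased = shadow / (1 - decay ** len(values))   # exactly 5.0 here
print(raw, debiased)
```

With a constant input the debiased estimate recovers the true value exactly, which is why debiasing matters for shadow variables that cannot be initialized to a meaningful starting point.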
Computes moving averages of the provided variables.

This method creates shadow variables for all elements of variables. The shadow variables for each variable are created with trainable = false, initialized to the variable's initial value, and added to the Graph.Keys.MOVING_AVERAGE_VARIABLES and the Graph.Keys.GLOBAL_VARIABLES collections.

Variables for which to compute moving averages.
Created op that updates all the shadow variables, as described above.
Maintains moving averages of variables by employing an exponential decay.
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
The computeForVariables(...) and computeForValues(...) methods add shadow copies of the provided variables and values, along with ops that maintain their moving averages in those shadow copies. They are used when building the training model. The ops that maintain the moving averages are typically run after each training step. The average(...) and averageName(...) methods provide access to the shadow variables and their names. They are useful when building an evaluation model, or when restoring a model from a checkpoint file, as they allow using the moving averages in place of the last trained values for evaluations.

The moving averages are computed using exponential decay. The decay value must be provided when creating an ExponentialMovingAverage object. The shadow variables are initialized with the same initial values as the corresponding variables, or with zeros in the case of values. When the ops used to maintain the moving averages are executed, each shadow variable is updated using the formula:

  shadowVariable -= (1 - decay) * (shadowVariable - value)
This is mathematically equivalent to the classic formula below, but the use of an assignSub op (the -= in the formula above) allows concurrent lock-free updates to the variables:

  shadowVariable = decay * shadowVariable + (1 - decay) * value
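The equivalence of the two update forms can be checked directly. A minimal sketch in plain Python arithmetic (independent of the library API):

```python
decay = 0.999
shadow, value = 0.7, 0.3

# assignSub form: shadow -= (1 - decay) * (shadow - value)
sub_form = shadow - (1 - decay) * (shadow - value)

# Classic form: shadow = decay * shadow + (1 - decay) * value
classic_form = decay * shadow + (1 - decay) * value

# Expanding the first form gives the second:
# shadow - (1 - decay) * (shadow - value)
#   = decay * shadow + (1 - decay) * value
assert abs(sub_form - classic_form) < 1e-12
```

The two forms compute the same number; the assignSub variant merely expresses the update as a single in-place subtraction, which is what permits the lock-free concurrent updates mentioned above.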
Reasonable values for decay are close to 1.0f, typically in the "multiple-nines" range: 0.999f, etc.

Example usage when creating a training model:
There are two ways to use the moving averages for evaluations:

  1. Build a model that uses the shadow variables instead of the original variables, obtained through the average(...) method, which returns the shadow variable for a given variable.
  2. Build a model normally, but load the checkpoint files used for evaluation using the shadow variable names, obtained through the averageName(...) method. Please refer to the Saver class documentation for more information on how to restore saved variables.

Example of restoring the shadow variable values:
The optional numUpdates parameter allows one to tweak the decay rate dynamically. It is typical to pass the count of training steps, usually kept in a variable that is incremented at each step, in which case the decay rate is lower at the start of training. This makes moving averages move faster. If passed, the actual decay rate used is defined as:

  min(decay, (1 + numUpdates) / (10 + numUpdates))
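The behavior of this adjustment is easy to see by evaluating the formula at a few step counts. A short sketch in plain Python (just the stated formula, not library output):

```python
decay = 0.999

def effective_decay(num_updates, decay=decay):
    # min(decay, (1 + numUpdates) / (10 + numUpdates))
    return min(decay, (1 + num_updates) / (10 + num_updates))

# Early in training the dynamic term dominates, so the averages
# track recent values quickly; later, the configured decay caps it.
print(effective_decay(0))      # 0.1
print(effective_decay(100))    # ~0.918
print(effective_decay(10000))  # 0.999 (capped by decay)
```

As numUpdates grows, (1 + numUpdates) / (10 + numUpdates) approaches 1, so the configured decay eventually takes over and the averages stabilize.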