Maintains moving averages of variables by employing an exponential decay.
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
The computeForVariables(...) and computeForValues(...) methods add shadow copies of the provided variables and values, along with ops that maintain their moving averages in those shadow copies. They are used when building the training model. The ops that maintain the moving averages are typically run after each training step. The average(...) and averageName(...) methods provide access to the shadow variables and their names. They are useful when building an evaluation model, or when restoring a model from a checkpoint file, as they allow using the moving averages in place of the last trained values for evaluations.
The moving averages are computed using exponential decay. The decay value must be provided when creating an ExponentialMovingAverage object. The shadow variables are initialized with the same initial values as their corresponding variables, or with zeros in the case of values. When the ops used to maintain the moving averages are executed, each shadow variable is updated using the formula:

  shadowVariable -= (1 - decay) * (shadowVariable - value)
This is mathematically equivalent to the classic formula below, but the use of an assignSub op (the -= in the formula above) allows concurrent lock-free updates to the variables:

  shadowVariable = decay * shadowVariable + (1 - decay) * value
Reasonable values for decay are close to 1.0f, typically in the "multiple-nines" range: 0.999f, etc.
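To see the equivalence of the two update formulas numerically, here is a minimal self-contained sketch (the variable names and values below are illustrative only, and are not part of the API):

```scala
object DecayEquivalence extends App {
  val decay = 0.999f
  val value = 1.0f
  val shadowVariable = 0.5f

  // Update using the assign-sub form (what the maintained ops actually do).
  var shadowSub = shadowVariable
  shadowSub -= (1 - decay) * (shadowSub - value)

  // Update using the classic weighted-average form.
  val shadowClassic = decay * shadowVariable + (1 - decay) * value

  // Both forms yield the same result (0.5005, up to floating-point rounding).
  println(shadowSub)
  println(shadowClassic)
}
```

The assign-sub form computes the same value, but expressing the update as a single in-place subtraction is what permits lock-free concurrent updates.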
Example usage when creating a training model:
// Create variables val v0 = tf.variable(...) val v1 = tf.variable(...) // Use the variables to build a training model ... // Create an op that applies the optimizer. This is what we usually would use as a training op. val optOp = opt.minimize(loss, variables = Set(v0, v1)) // Create an exponential moving average object. val ema = tf.train.ExponentialMovingAverage(decay = 0.999f) val trainOp = tf.createWith(controlDependencies = Set(optOp)) { // Create the shadow variables, and add ops used to maintain the moving averages of `v0` and `v1`. This also // creates an op that will update the moving averages after each training step. This is what we will use in // place of the usual training op. ema.computeForVariables(Set(v0, v1)) } // Train the model by running `trainOp`.
There are two ways to use the moving averages for evaluations:

  1. Build a model that uses the shadow variables instead of the original variables. For this, use the average(...) method, which returns the shadow variable for a given variable.
  2. Build a model normally, but load the checkpoint files to evaluate by using the shadow variable names. For this, use the averageName(...) method. Please refer to the Saver class documentation for more information on how to restore saved variables.

Example of restoring the shadow variable values:
  // Create a saver that loads variables from their saved shadow values.
  val shadowV0Name = ema.averageName(v0)
  val shadowV1Name = ema.averageName(v1)
  val saver = tf.saver(Map(shadowV0Name -> v0, shadowV1Name -> v1))
  saver.restore(...checkpoint filename...)
  // `v0` and `v1` now hold the moving average values.
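For the first approach, a sketch of building an evaluation model directly from the shadow variables might look as follows (this assumes average(...) returns the shadow variable for a trained variable, as described above; the surrounding setup is elided):

```scala
// Fetch the shadow variables holding the moving averages of `v0` and `v1`.
val averagedV0 = ema.average(v0)
val averagedV1 = ema.average(v1)

// Use the shadow variables, instead of `v0` and `v1`, to build the evaluation model.
...
```

This avoids the checkpoint round-trip entirely, at the cost of keeping both the trained variables and their shadow copies in the same graph.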
The optional numUpdates parameter allows one to tweak the decay rate dynamically. It is typical to pass the count of training steps, usually kept in a variable that is incremented at each step, in which case the decay rate is lower at the start of training. This makes moving averages move faster. If passed, the actual decay rate used is defined as:

  min(decay, (1 + numUpdates) / (10 + numUpdates))
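As a quick illustration of how this schedule behaves, here is a plain Scala sketch (independent of the API) showing that the effective decay rate starts low and approaches the configured value as training progresses:

```scala
object DynamicDecay extends App {
  val decay = 0.999

  def effectiveDecay(numUpdates: Long): Double =
    math.min(decay, (1.0 + numUpdates) / (10.0 + numUpdates))

  println(effectiveDecay(0L))      // 0.1: the averages move fast early in training.
  println(effectiveDecay(1000L))   // ~0.991: approaching the configured decay.
  println(effectiveDecay(100000L)) // 0.999: the configured decay takes over.
}
```

The min(...) ensures the configured decay acts as a ceiling, so the dynamic schedule only ever speeds averaging up, never slows it down.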