org.platanios.tensorflow.api.ops.training
Decay value to use.
Optional count of the number of updates applied to the variables.
If true, the moving averages computed for values provided in computeForValues will be zero-debiased.
Name prefix to use for all created ops.
Returns the variable holding the average for value.
Returns the variable holding the average for variable.
Returns the name of the variable holding the average for value.
Returns the name of the variable holding the average for variable.
Computes moving averages of the provided values.

This method creates shadow variables for all elements of values. The shadow variables for each value are created with trainable = false, initialized to 0, optionally zero-debiased, and added to the Graph.Keys.MOVING_AVERAGE_VARIABLES and the Graph.Keys.GLOBAL_VARIABLES collections.

Values for which to compute moving averages.
Created op that updates all the shadow variables, as described above.
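The effect of zero-debiasing can be illustrated numerically. Because shadow variables for values start at 0, the raw moving average is biased toward zero during the first steps; a standard correction divides the raw average by 1 - decay^t (the library's exact debiasing mechanism may differ, so this is only a sketch of the idea, in plain Python arithmetic rather than the Scala API):

```python
decay = 0.99

# A stream of observed values whose moving average we track.
values = [5.0] * 10

shadow = 0.0  # Shadow variable for a value starts at zero.
for v in values:
    # The update rule from the documentation:
    # shadowVariable -= (1 - decay) * (shadowVariable - value)
    shadow -= (1 - decay) * (shadow - v)

# After only 10 steps the raw average is heavily biased toward 0.
raw = shadow                                     # ~0.478, far from 5.0
debiased = shadow / (1 - decay ** len(values))   # exactly 5.0 here
print(raw, debiased)
```

With a constant input the debiased estimate recovers the true value exactly, which is why debiasing matters for shadow variables that cannot be initialized to a meaningful starting point.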
Computes moving averages of the provided variables.

This method creates shadow variables for all elements of variables. The shadow variables for each variable are created with trainable = false, initialized to the variable's initial value, and added to the Graph.Keys.MOVING_AVERAGE_VARIABLES and the Graph.Keys.GLOBAL_VARIABLES collections.

Variables for which to compute moving averages.
Created op that updates all the shadow variables, as described above.
Maintains moving averages of variables by employing an exponential decay.
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
The computeForVariables(...) and computeForValues(...) methods add shadow copies of the provided variables and values, along with ops that maintain their moving averages in those shadow copies. They are used when building the training model. The ops that maintain the moving averages are typically run after each training step. The average(...) and averageName(...) methods provide access to the shadow variables and their names. They are useful when building an evaluation model, or when restoring a model from a checkpoint file, as they allow using the moving averages in place of the last trained values for evaluations.

The moving averages are computed using exponential decay. The decay value must be provided when creating an ExponentialMovingAverage object. The shadow variables are initialized with the same initial values as the corresponding variables, or with zeros in the case of values. When the ops used to maintain the moving averages are executed, each shadow variable is updated using the formula:

  shadowVariable -= (1 - decay) * (shadowVariable - value)
This is mathematically equivalent to the classic formula below, but the use of an assignSub op (the -= in the formula above) allows concurrent lock-free updates to the variables:

  shadowVariable = decay * shadowVariable + (1 - decay) * value
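The equivalence of the two update forms can be checked directly. A minimal sketch in plain Python arithmetic (independent of the library API):

```python
decay = 0.999
shadow, value = 0.7, 0.3

# assignSub form: shadow -= (1 - decay) * (shadow - value)
sub_form = shadow - (1 - decay) * (shadow - value)

# Classic form: shadow = decay * shadow + (1 - decay) * value
classic_form = decay * shadow + (1 - decay) * value

# Expanding the first form gives the second:
# shadow - (1 - decay) * (shadow - value)
#   = decay * shadow + (1 - decay) * value
assert abs(sub_form - classic_form) < 1e-12
```

The two forms compute the same number; the assignSub variant merely expresses the update as a single in-place subtraction, which is what permits the lock-free concurrent updates mentioned above.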
Reasonable values for decay are close to 1.0f, typically in the "multiple-nines" range: 0.999f, etc.

Example usage when creating a training model:
There are two ways to use the moving averages for evaluations:

  1. Build a model that uses the shadow variables instead of the original variables, obtained through the average(...) method, which returns the shadow variable for a given variable.
  2. Build a model normally, but load the checkpoint files used for evaluation using the shadow variable names, obtained through the averageName(...) method. Please refer to the Saver class documentation for more information on how to restore saved variables.

Example of restoring the shadow variable values:
The optional numUpdates parameter allows one to tweak the decay rate dynamically. It is typical to pass the count of training steps, usually kept in a variable that is incremented at each step, in which case the decay rate is lower at the start of training. This makes moving averages move faster. If passed, the actual decay rate used is defined as:

  min(decay, (1 + numUpdates) / (10 + numUpdates))
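The behavior of this adjustment is easy to see by evaluating the formula at a few step counts. A short sketch in plain Python (just the stated formula, not library output):

```python
decay = 0.999

def effective_decay(num_updates, decay=decay):
    # min(decay, (1 + numUpdates) / (10 + numUpdates))
    return min(decay, (1 + num_updates) / (10 + num_updates))

# Early in training the dynamic term dominates, so the averages
# track recent values quickly; later, the configured decay caps it.
print(effective_decay(0))      # 0.1
print(effective_decay(100))    # ~0.918
print(effective_decay(10000))  # 0.999 (capped by decay)
```

As numUpdates grows, (1 + numUpdates) / (10 + numUpdates) approaches 1, so the configured decay eventually takes over and the averages stabilize.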