Package org.platanios.tensorflow.api.ops.training

package training

Linear Supertypes: AnyRef, Any

Type Members

  1. class ExponentialMovingAverage extends AnyRef


    Maintains moving averages of variables by employing an exponential decay.

    When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.

    The computeForVariables(...) and computeForValues(...) methods add shadow copies of the provided variables or values, along with ops that maintain their moving averages in those shadow copies. They are used when building the training model. The ops that maintain the moving averages are typically run after each training step. The average(...) and averageName(...) methods provide access to the shadow variables and their names. They are useful when building an evaluation model, or when restoring a model from a checkpoint file, and make it possible to use the moving averages in place of the last trained values for evaluations.
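
    As a brief, hedged sketch of the values variant (the exact signature of computeForValues(...) is assumed here by analogy with computeForVariables(...), which appears in the full training example below; ema denotes an ExponentialMovingAverage instance created as in that example):

    // A value produced by the model; a constant stands in here for illustration.
    val someValue = tf.constant(0.5f)
    // Create a zero-initialized shadow copy of `someValue`, along with an op that
    // maintains its moving average; run that op after each training step.
    val maintainValueAveragesOp = ema.computeForValues(Set(someValue))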

    The moving averages are computed using exponential decay. The decay value must be provided when creating an ExponentialMovingAverage object. The shadow variables are initialized with the same initial values as their corresponding variables, or with zeros in the case of values. When the ops used to maintain the moving averages are executed, each shadow variable is updated using the formula:

    shadowVariable -= (1 - decay) * (shadowVariable - value)

    This is mathematically equivalent to the classic formula below, but the use of an assignSub op (the -= in the formula) allows concurrent lock-free updates to the variables:

    shadowVariable = decay * shadowVariable + (1 - decay) * value

    Reasonable values for decay are close to 1.0f, typically in the "multiple-nines" range: 0.999f, etc.
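
    To see that the two update forms agree, here is a minimal plain-Scala check (no TensorFlow involved; the numbers are illustrative):

    val decay  = 0.999f
    val shadow = 1.0f  // shadow variable, initialized to the variable's initial value
    val value  = 0.5f  // current value of the tracked variable

    // Subtractive (lock-free) form.
    val subtractive = shadow - (1 - decay) * (shadow - value)
    // Classic weighted-average form.
    val classic = decay * shadow + (1 - decay) * value

    assert(math.abs(subtractive - classic) < 1e-6f)  // both yield 0.9995f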

    Example usage when creating a training model:

    // Create variables
    val v0 = tf.variable(...)
    val v1 = tf.variable(...)
    // Use the variables to build a training model
    ...
    // Create an op that applies the optimizer. This is what we would usually use as a training op.
    val optOp = opt.minimize(loss, variables = Set(v0, v1))
    
    // Create an exponential moving average object.
    val ema = tf.train.ExponentialMovingAverage(decay = 0.999f)
    
    val trainOp = tf.createWith(controlDependencies = Set(optOp)) {
      // Create the shadow variables, and add ops used to maintain the moving averages of `v0` and `v1`. This also
      // creates an op that will update the moving averages after each training step. This is what we will use in
      // place of the usual training op.
      ema.computeForVariables(Set(v0, v1))
    }
    
    // Train the model by running `trainOp`.

    There are two ways to use moving averages for evaluations:

    • Build a model that uses the shadow variables instead of the original variables. For this, use the average(...) method, which returns the shadow variable for a given variable (see the sketch after this list).
    • Build a model normally, but load the checkpoint files to evaluate by using the shadow variable names. For this, use the averageName(...) method. Please refer to the Saver class documentation for more information on how to restore saved variables.
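
    A minimal sketch of the first approach (reusing ema, v0, and v1 from the training example above; per the description above, average(...) returns the shadow variable for a given variable):

    // Fetch the shadow variables that track `v0` and `v1`.
    val avgV0 = ema.average(v0)
    val avgV1 = ema.average(v1)
    // Build the evaluation model using `avgV0` and `avgV1` in place of `v0` and
    // `v1`, so that predictions are computed from the moving-average parameters.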

    Example of restoring the shadow variable values:

    // Create a saver that loads variables from their saved shadow values.
    val shadowV0Name = ema.averageName(v0)
    val shadowV1Name = ema.averageName(v1)
    val saver = tf.saver(Map(shadowV0Name -> v0, shadowV1Name -> v1))
    saver.restore(...checkpoint filename...)
    // `v0` and `v1` now hold the moving average values.

    The optional numUpdates parameter allows one to tweak the decay rate dynamically. It is typical to pass the count of training steps, usually kept in a variable that is incremented at each step, in which case the decay rate is lower at the start of training. This makes the moving averages move faster early on. If passed, the actual decay rate used is defined as: min(decay, (1 + numUpdates) / (10 + numUpdates)).
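
    For example, the effective decay rate under this schedule can be computed as follows (a plain-Scala illustration of the formula above):

    def effectiveDecay(decay: Float, numUpdates: Long): Float =
      math.min(decay, (1.0f + numUpdates) / (10.0f + numUpdates))

    effectiveDecay(0.999f, 0L)      // 0.1    -- averages move quickly at the start
    effectiveDecay(0.999f, 100L)    // ~0.918
    effectiveDecay(0.999f, 10000L)  // 0.999  -- capped at the provided decay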

Value Members

  1. object ExponentialMovingAverage
  2. package distribute
  3. package optimizers
