# Layers (contrib)

[TOC]

Ops for building neural network layers, regularizers, summaries, etc.

## Higher level ops for building neural network layers.

This package provides several ops that take care of creating variables that are used internally in a consistent way and provide the building blocks for many common machine learning algorithms.

`tf.contrib.layers.convolution2d(*args, **kwargs)`

Adds a 2D convolution followed by an optional batch_norm layer.

`convolution2d` creates a variable called `weights`, representing the
convolutional kernel, that is convolved with the `inputs` to produce a
`Tensor` of activations. If a `normalizer_fn` is provided (such as
`batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is
`None` and a `biases_initializer` is provided, then a `biases` variable is
created and added to the activations. Finally, if `activation_fn` is not
`None`, it is applied to the activations as well.

##### Args:

*  `inputs`: a 4-D tensor of shape `[batch_size, height, width, channels]`.
*  `num_outputs`: integer, the number of output filters.
*  `kernel_size`: a list of length 2, `[kernel_height, kernel_width]`, of the filters. Can be an int if both values are the same.
*  `stride`: a list of length 2, `[stride_height, stride_width]`. Can be an int if both strides are the same. Note that presently both strides must have the same value.
*  `padding`: one of `VALID` or `SAME`.
*  `activation_fn`: activation function.
*  `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added.
*  `normalizer_params`: normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If `None`, biases are skipped.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, `scope` must be given.
*  `variables_collections`: optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
*  `outputs_collections`: collection to add the outputs to.
*  `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
*  `scope`: Optional scope for `variable_op_scope`.

##### Returns:

a tensor representing the output of the operation.
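The `kernel_size`, `stride` and `padding` arguments together determine the spatial size of the returned tensor. As a plain-Python illustration of the standard `SAME`/`VALID` padding arithmetic (the helper name is hypothetical, not part of the API):

```python
import math

def conv2d_output_size(height, width, kernel_size, stride, padding):
    """Spatial output size of a 2-D convolution (hypothetical helper).

    SAME pads the input so the output shrinks only by the stride;
    VALID uses no padding, so the kernel must fit entirely inside.
    """
    kh, kw = (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
    sh, sw = (stride, stride) if isinstance(stride, int) else stride
    if padding == 'SAME':
        return math.ceil(height / sh), math.ceil(width / sw)
    if padding == 'VALID':
        return math.ceil((height - kh + 1) / sh), math.ceil((width - kw + 1) / sw)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# A 3x3 kernel with stride 1 keeps 28x28 inputs at 28x28 under SAME
# padding, but shrinks them to 26x26 under VALID padding.
```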

`tf.contrib.layers.fully_connected(*args, **kwargs)`

Adds a fully connected layer.

`fully_connected` creates a variable called `weights`, representing a fully
connected weight matrix, which is multiplied by the `inputs` to produce a
`Tensor` of hidden units. If a `normalizer_fn` is provided (such as
`batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is
`None` and a `biases_initializer` is provided, then a `biases` variable is
created and added to the hidden units. Finally, if `activation_fn` is not
`None`, it is applied to the hidden units as well.

Note: if `inputs` have a rank greater than 2, then `inputs` is flattened
prior to the initial matrix multiply by `weights`.

##### Args:

*  `inputs`: A tensor of at least rank 2 with a known value for the last dimension, i.e. `[batch_size, depth]`, `[None, None, None, channels]`.
*  `num_outputs`: Integer, the number of output units in the layer.
*  `activation_fn`: activation function.
*  `normalizer_fn`: normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added.
*  `normalizer_params`: normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If `None`, biases are skipped.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: whether or not the layer and its variables should be reused. To be able to reuse the layer, `scope` must be given.
*  `variables_collections`: Optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
*  `outputs_collections`: collection to add the outputs to.
*  `trainable`: If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
*  `scope`: Optional scope for `variable_op_scope`.

##### Returns:

The tensor variable representing the result of the series of operations.

##### Raises:

*  `ValueError`: if x has rank less than 2 or if its last dimension is not set.

Aliases for `fully_connected` which set a default activation function are
available: `relu`, `relu6` and `linear`.

## Regularizers

Regularization can help prevent overfitting. These have the signature
`fn(weights)`. The loss is typically added to
`tf.GraphKeys.REGULARIZATION_LOSSES`.

`tf.contrib.layers.apply_regularization(regularizer, weights_list=None)`

Returns the summed penalty by applying `regularizer` to the `weights_list`.

Adding a regularization penalty over the layer weights and embedding weights can help prevent overfitting the training data. Regularization over layer biases is less common/useful, but assuming proper data preprocessing/mean subtraction, it usually shouldn't hurt much either.

##### Args:

*  `regularizer`: A function that takes a single `Tensor` argument and returns a scalar `Tensor` output.
*  `weights_list`: List of weights `Tensors` or `Variables` to apply `regularizer` over. Defaults to the `GraphKeys.WEIGHTS` collection if `None`.

##### Returns:

A scalar representing the overall regularization penalty.

##### Raises:

*  `ValueError`: If `regularizer` does not return a scalar output.

`tf.contrib.layers.l1_regularizer(scale)`

Returns a function that can be used to apply L1 regularization to weights.

L1 regularization encourages sparsity.

##### Args:

*  `scale`: A scalar multiplier `Tensor`. 0.0 disables the regularizer.

##### Returns:

A function with signature `l1(weights, name=None)` that applies L1
regularization.

##### Raises:

*  `ValueError`: If scale is outside of the range [0.0, 1.0] or if scale is not a float.

`tf.contrib.layers.l2_regularizer(scale)`

Returns a function that can be used to apply L2 regularization to weights.

Small values of L2 can help prevent overfitting the training data.

##### Args:

*  `scale`: A scalar multiplier `Tensor`. 0.0 disables the regularizer.

##### Returns:

A function with signature `l2(weights, name=None)` that applies L2
regularization.

##### Raises:

*  `ValueError`: If scale is outside of the range [0.0, 1.0] or if scale is not a float.

`tf.contrib.layers.sum_regularizer(regularizer_list)`

Returns a function that applies the sum of multiple regularizers.

##### Args:

*  `regularizer_list`: A list of regularizers to apply.

##### Returns:

A function with signature `sum_reg(weights, name=None)` that applies the
sum of all the input regularizers.
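Taken together, the regularizers above all share the `fn(weights)` signature and compose naturally. The following plain-NumPy sketch (hypothetical names, assuming the L2 penalty uses the `tf.nn.l2_loss`-style `sum(w**2) / 2`) illustrates how `apply_regularization` and `sum_regularizer` combine penalties:

```python
import numpy as np

def l1_regularizer_sketch(scale):
    """Stand-in for l1_regularizer: scale * sum(|w|)."""
    def l1(weights, name=None):
        return scale * np.abs(weights).sum()
    return l1

def l2_regularizer_sketch(scale):
    """Stand-in for l2_regularizer: scale * sum(w**2) / 2
    (assuming the tf.nn.l2_loss convention of a one-half factor)."""
    def l2(weights, name=None):
        return scale * np.square(weights).sum() / 2.0
    return l2

def sum_regularizer_sketch(regularizer_list):
    """Stand-in for sum_regularizer: sum of the member penalties."""
    def sum_reg(weights, name=None):
        return sum(r(weights) for r in regularizer_list)
    return sum_reg

def apply_regularization_sketch(regularizer, weights_list):
    """Stand-in for apply_regularization: summed penalty over each
    weight array in weights_list."""
    return sum(regularizer(w) for w in weights_list)

# For weights [1, -2] and [3, 0], L1 at scale 0.1 gives
# 0.1 * (|1| + |-2|) + 0.1 * (|3| + |0|) = 0.6.
weights_list = [np.array([1.0, -2.0]), np.array([3.0, 0.0])]
penalty = apply_regularization_sketch(l1_regularizer_sketch(0.1), weights_list)
```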

## Initializers

Initializers are used to initialize variables with sensible values given their size, data type, and purpose.

`tf.contrib.layers.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)`

Returns an initializer performing "Xavier" initialization for weights.

This function implements the weight initialization from:

Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International conference on artificial intelligence and statistics.

This initializer is designed to keep the scale of the gradients roughly the
same in all layers. For a uniform distribution this ends up being the range
`x = sqrt(6. / (in + out)); [-x, x]`, and for a normal distribution a
standard deviation of `sqrt(3. / (in + out))` is used.

##### Args:

*  `uniform`: Whether to use uniform or normally distributed random initialization.
*  `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
*  `dtype`: The data type. Only floating point types are supported.

##### Returns:

An initializer for a weight matrix.

`tf.contrib.layers.xavier_initializer_conv2d(uniform=True, seed=None, dtype=tf.float32)`

Returns an initializer performing "Xavier" initialization for weights.

This function implements the weight initialization from:

Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International conference on artificial intelligence and statistics.

This initializer is designed to keep the scale of the gradients roughly the
same in all layers. For a uniform distribution this ends up being the range
`x = sqrt(6. / (in + out)); [-x, x]`, and for a normal distribution a
standard deviation of `sqrt(3. / (in + out))` is used.

##### Args:

*  `uniform`: Whether to use uniform or normally distributed random initialization.
*  `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
*  `dtype`: The data type. Only floating point types are supported.

##### Returns:

An initializer for a weight matrix.

`tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN', uniform=False, seed=None, dtype=tf.float32)`

Returns an initializer that generates tensors without scaling variance.

When initializing a deep network, it is in principle advantageous to keep the
scale of the input variance constant, so it does not explode or diminish by
reaching the final layer. This initializer uses the following formula:

```
if mode='FAN_IN':    # Count only number of input connections.
  n = fan_in
elif mode='FAN_OUT': # Count only number of output connections.
  n = fan_out
elif mode='FAN_AVG': # Average number of input and output connections.
  n = (fan_in + fan_out) / 2.0

truncated_normal(shape, 0.0, stddev=sqrt(factor / n))
```

To get http://arxiv.org/pdf/1502.01852v1.pdf use (Default):

- factor=2.0, mode='FAN_IN', uniform=False

To get http://arxiv.org/abs/1408.5093 use:

- factor=1.0, mode='FAN_IN', uniform=True

To get http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf use:

- factor=1.0, mode='FAN_AVG', uniform=True

To get `xavier_initializer` use either:

- factor=1.0, mode='FAN_AVG', uniform=True
- factor=1.0, mode='FAN_AVG', uniform=False

##### Args:

*  `factor`: Float. A multiplicative factor.
*  `mode`: String. 'FAN_IN', 'FAN_OUT', 'FAN_AVG'.
*  `uniform`: Whether to use uniform or normally distributed random initialization.
*  `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
*  `dtype`: The data type. Only floating point types are supported.

##### Returns:

An initializer that generates tensors with unit variance.

##### Raises:

*  `ValueError`: if `dtype` is not a floating point type.
*  `TypeError`: if `mode` is not in ['FAN_IN', 'FAN_OUT', 'FAN_AVG'].
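The `stddev = sqrt(factor / n)` formula above is easy to sketch in plain Python (hypothetical helper name, not part of the API):

```python
import math

def variance_scaling_stddev(fan_in, fan_out, factor=2.0, mode='FAN_IN'):
    """Standard deviation used by variance scaling: sqrt(factor / n),
    where n is chosen according to `mode`."""
    if mode == 'FAN_IN':
        n = fan_in
    elif mode == 'FAN_OUT':
        n = fan_out
    elif mode == 'FAN_AVG':
        n = (fan_in + fan_out) / 2.0
    else:
        raise TypeError("mode must be in ['FAN_IN', 'FAN_OUT', 'FAN_AVG']")
    return math.sqrt(factor / n)

# With the defaults (factor=2.0, mode='FAN_IN') and 100 input
# connections, the stddev is sqrt(2 / 100), regardless of fan_out.
```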

## Optimization

Optimize weights given a loss.

`tf.contrib.layers.optimize_loss(loss, global_step, learning_rate, optimizer, gradient_noise_scale=None, gradient_multipliers=None, clip_gradients=None, moving_average_decay=0.9, learning_rate_decay_fn=None, update_ops=None, variables=None, name=None)`

Given loss and parameters for optimizer, returns a training op.

##### Args:

*  `loss`: Tensor, 0 dimensional.
*  `global_step`: Tensor, step counter for each update.
*  `learning_rate`: float or Tensor, magnitude of update per each training step.
*  `optimizer`: string, class or optimizer instance, used as trainer. A string should be the name of an optimizer, like 'SGD', 'Adam' or 'Adagrad' (full list in the OPTIMIZER_CLS_NAMES constant). A class should be a sub-class of tf.Optimizer that implements `compute_gradients` and `apply_gradients` functions. An optimizer instance should be an instantiation of a tf.Optimizer sub-class and have `compute_gradients` and `apply_gradients` functions.
*  `gradient_noise_scale`: float or None, adds 0-mean normal noise scaled by this value.
*  `gradient_multipliers`: dict of variables or variable names to floats. If present, gradients for the specified variables will be multiplied by the given constant.
*  `clip_gradients`: float or `None`, clips gradients by this value.
*  `moving_average_decay`: float or None, takes into account previous loss to make learning smoother due to outliers.
*  `learning_rate_decay_fn`: function, takes `learning_rate` and `global_step` `Tensor`s, returns a `Tensor`. Can be used to implement any learning rate decay function. For example: tf.train.exponential_decay.
*  `update_ops`: list of update `Operation`s to execute at each step. If `None`, uses elements of the UPDATE_OPS collection.
*  `variables`: list of variables to optimize, or `None` to use all trainable variables.
*  `name`: The name for this operation, used to scope operations and summaries.

##### Returns:

Training op.

##### Raises:

*  `ValueError`: if optimizer is wrong type.
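As an illustration of the `clip_gradients` step, assuming it rescales by the global (combined) L2 norm of all gradients, here is a plain-Python sketch operating on nested lists rather than Tensors (the helper name is hypothetical):

```python
import math

def clip_by_global_norm_sketch(gradients, clip_norm):
    """Rescale all gradients together so that their combined L2 norm
    is at most clip_norm; gradients below the threshold pass through
    unchanged. Each element of `gradients` is a list of floats."""
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    if global_norm <= clip_norm:
        return gradients
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in gradients]

# Two gradients [3.0] and [4.0] have global norm 5.0; clipping to 1.0
# rescales both by 0.2, preserving their relative directions.
```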

## Summaries

Helper functions to summarize specific variables or ops.

`tf.contrib.layers.summarize_activation(op)`

Summarize an activation.

This applies the given activation and adds useful summaries specific to the activation.

##### Args:

*  `op`: The tensor to summarize (assumed to be a layer activation).

##### Returns:

The summary op created to summarize `op`.

`tf.contrib.layers.summarize_tensor(tensor, tag=None)`

Summarize a tensor using a suitable summary type.

This function adds a summary op for `tensor`. The type of summary depends on
the shape of `tensor`. For scalars, a `scalar_summary` is created; for all
other tensors, a `histogram_summary` is used.

##### Args:

*  `tensor`: The tensor to summarize.
*  `tag`: The tag to use. If None, the tensor's op's name is used.

##### Returns:

The summary op created or None for string tensors.
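The dispatch rule just described can be sketched in plain Python (hypothetical helper name; this only mirrors the documented shape-based choice, not the library code):

```python
def summary_type_for(tensor_shape):
    """Pick the summary type for a tensor shape: scalars (rank 0) get a
    scalar_summary, all other shapes get a histogram_summary."""
    return 'scalar_summary' if len(tensor_shape) == 0 else 'histogram_summary'

# A rank-0 (scalar) tensor maps to scalar_summary; a [3, 4] matrix
# maps to histogram_summary.
```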

`tf.contrib.layers.summarize_tensors(tensors, summarizer=summarize_tensor)`

Summarize a set of tensors.

`tf.contrib.layers.summarize_collection(collection, name_filter=None, summarizer=summarize_tensor)`

Summarize a graph collection of tensors, possibly filtered by name.

The layers module defines convenience functions `summarize_variables`,
`summarize_weights` and `summarize_biases`, which set the `collection`
argument of `summarize_collection` to `VARIABLES`, `WEIGHTS` and `BIASES`,
respectively.

`tf.contrib.layers.summarize_activations(name_filter=None, summarizer=summarize_activation)`

Summarize activations, using `summarize_activation` to summarize.