Module API

Overview

The module API, defined in the module (or simply mod) package, provides an intermediate- and high-level interface for performing computation with a Symbol. Roughly speaking, a module is a machine that can execute a program defined by a Symbol.

The class module.Module is a commonly used module, which accepts a Symbol as the input:

data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
fc2  = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=10)
out  = mx.symbol.SoftmaxOutput(fc2, name='softmax')
mod = mx.mod.Module(out)  # create a module from the given Symbol

Assume there is a valid MXNet data iterator data. We can initialize the module:

mod.bind(data_shapes=data.provide_data,
         label_shapes=data.provide_label)  # allocate memory for the given input shapes
mod.init_params()  # initialize parameters with the default random initializer

Now the module is able to compute. We can call the high-level API to train and predict:

mod.fit(data, num_epoch=10, ...)  # train
mod.predict(new_data)  # predict on new data

or use the intermediate-level API to perform step-by-step computations:

mod.forward(data_batch)  # forward on the provided data batch
mod.backward()  # backward to calculate the gradients
mod.update()  # update parameters using the default optimizer

A detailed tutorial is available at http://mxnet.io/tutorials/python/module.html.

Note

The module API replaces the model API, which has been deprecated.

The module package provides several modules:

BaseModule The base class of a module.
Module Module is a basic module that wraps a Symbol.
SequentialModule A SequentialModule is a container module that can chain multiple modules together.
BucketingModule This module helps to deal efficiently with varying-length inputs.
PythonModule A convenient module class that implements many of the module APIs as empty functions.
PythonLossModule A convenient module class that implements many of the module APIs as empty functions.

We summarize the interface for each class in the following sections.

The BaseModule class

The BaseModule is the base class for all other module classes. It defines the interface each module class should provide.

Initialize memory

BaseModule.bind Bind the symbols to construct executors.

Get and set parameters

BaseModule.init_params Initialize the parameters and auxiliary states.
BaseModule.set_params Assign parameter and aux state values.
BaseModule.get_params Get parameters, which are potentially copies of the actual parameters used to do computation on the device.
BaseModule.save_params Save model parameters to file.
BaseModule.load_params Load model parameters from file.

Train and predict

BaseModule.fit Train the module parameters.
BaseModule.score Run prediction on eval_data and evaluate the performance according to eval_metric.
BaseModule.iter_predict Iterate over predictions.
BaseModule.predict Run prediction and collect the outputs.

Forward and backward

BaseModule.forward Forward computation.
BaseModule.backward Backward computation.
BaseModule.forward_backward A convenient function that calls both forward and backward.

Update parameters

BaseModule.init_optimizer Install and initialize optimizers.
BaseModule.update Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
BaseModule.update_metric Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Input and output

BaseModule.data_names A list of names for data required by this module.
BaseModule.output_names A list of names for the outputs of this module.
BaseModule.data_shapes A list of (name, shape) pairs specifying the data inputs to this module.
BaseModule.label_shapes A list of (name, shape) pairs specifying the label inputs to this module.
BaseModule.output_shapes A list of (name, shape) pairs specifying the outputs of this module.
BaseModule.get_outputs Get outputs of the previous forward computation.
BaseModule.get_input_grads Get the gradients to the inputs, computed in the previous backward computation.

Others

BaseModule.get_states Get states from all devices.
BaseModule.set_states Set value for states.
BaseModule.install_monitor Install monitor on all executors.
BaseModule.symbol Get the symbol associated with this module.

Other built-in modules

Besides the basic interface defined in BaseModule, each module class supports additional functionality. We summarize them in this section.

Class Module

Module.load Create a model from previously saved checkpoint.
Module.save_checkpoint Save current progress to checkpoint.
Module.reshape Reshape the module for new input shapes.
Module.borrow_optimizer Borrow optimizer from a shared module.
Module.save_optimizer_states Save optimizer (updater) state to file.
Module.load_optimizer_states Load optimizer (updater) state from file.

Class BucketingModule

BucketingModule.switch_bucket

Class SequentialModule

SequentialModule.add Add a module to the chain.

API Reference

class mxnet.module.BaseModule(logger=logging)

The base class of a module.

A module represents a computation component. Modules are designed so that they can be thought of as a computation “machine”. Each module can run forward, backward, update its parameters, etc. We aim to make the APIs easy to use, especially when the imperative API is needed to work with multiple modules (e.g. a stochastic depth network).

A module has several states:

  • Initial state: Memory is not allocated yet, so the module is not ready for computation.
  • Binded: Shapes for inputs, outputs, and parameters are all known, memory has been allocated, and the module is ready for computation.
  • Parameters initialized: For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
  • Optimizer installed: An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
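
The progression through these states can be illustrated with a toy class. This is only a sketch in plain Python: ToyModule and its error messages are hypothetical stand-ins rather than MXNet's implementation, and the methods merely flip flags to show which step depends on which.

```python
class ToyModule(object):
    """Hypothetical stand-in that tracks the states listed above."""

    def __init__(self):
        # Initial state: nothing is allocated yet.
        self.binded = False
        self.params_initialized = False
        self.optimizer_initialized = False

    def bind(self):
        # Binding fixes the shapes and allocates memory.
        self.binded = True

    def init_params(self):
        if not self.binded:
            raise RuntimeError("call bind() before init_params()")
        self.params_initialized = True

    def init_optimizer(self):
        if not (self.binded and self.params_initialized):
            raise RuntimeError("call bind() and init_params() first")
        self.optimizer_initialized = True

    def update(self):
        # Updating parameters only makes sense once an optimizer is installed.
        if not self.optimizer_initialized:
            raise RuntimeError("call init_optimizer() before update()")
        return "updated"


toy = ToyModule()
toy.bind()
toy.init_params()
toy.init_optimizer()
print(toy.update())
```

Calling the methods out of order (for example init_params() before bind()) raises an error in this sketch, which mirrors the ordering the state list implies.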

In order for a module to interact with others, it must be able to report the following information in its initial state (before binding):

  • data_names: list of type string indicating the names of the required input data.
  • output_names: list of type string indicating the names of the required outputs.

After binding, a module should be able to report the following richer information:

  • state information
    • binded: bool, indicates whether the memory buffers needed for computation have been allocated.
    • for_training: whether the module is bound for training.
    • params_initialized: bool, indicates whether the parameters of this module have been initialized.
    • optimizer_initialized: bool, indicates whether an optimizer is defined and initialized.
    • inputs_need_grad: bool, indicates whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
  • input/output information
    • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelism, the data arrays might not be of the same shape as viewed from the external world.
    • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
    • output_shapes: a list of (name, shape) for outputs of the module.
  • parameters (for modules with parameters)
    • get_params(): return a tuple (arg_params, aux_params). Each of those is a dictionary mapping names to NDArray. Those NDArray always live on CPU. The actual parameters used for computing might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters. Therefore, modifying the returned arrays might not affect the parameters used for computation; use set_params to assign new values.
    • set_params(arg_params, aux_params): assign parameters to the devices doing the computation.
    • init_params(...): a more flexible interface to assign or initialize the parameters.
  • setup
    • bind(): prepare environment for computation.
    • init_optimizer(): install optimizer for parameter updating.
  • computation
    • forward(data_batch): forward operation.
    • backward(out_grads=None): backward operation.
    • update(): update parameters according to installed optimizer.
    • get_outputs(): get outputs of the previous forward operation.
    • get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.
    • update_metric(metric, labels): update the performance metric using the results of the previous forward computation.
  • other properties (mostly for backward compatibility)
    • symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the current symbol being used. For other modules, this value might not be well defined.

When these intermediate-level APIs are implemented properly, the following high-level APIs are automatically available for a module:

  • fit: train the module parameters on a data set.
  • predict: run prediction on a data set and collect outputs.
  • score: run prediction on a data set and evaluate performance.

Examples

An example of creating an MXNet module:
>>> import mxnet as mx
>>> data = mx.symbol.Variable('data')
>>> fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
>>> fc2  = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
>>> act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
>>> fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
>>> out  = mx.symbol.SoftmaxOutput(fc3, name='softmax')
>>> mod = mx.mod.Module(out)

forward_backward(data_batch)

A convenient function that calls both forward and backward.

score(eval_data, eval_metric, num_batch=None, batch_end_callback=None, score_end_callback=None, reset=True, epoch=0)

Run prediction on eval_data and evaluate the performance according to eval_metric.

Parameters:
  • eval_data (DataIter) –
  • eval_metric (EvalMetric) –
  • num_batch (int) – Number of batches to run. Defaults to None, indicating run until the DataIter finishes.
  • batch_end_callback (function) – Could also be a list of functions.
  • reset (bool) – Defaults to True. Indicates whether we should reset eval_data before starting evaluating.
  • epoch (int) – Defaults to 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.

Examples

An example of using score for prediction:
>>> # Evaluate accuracy on val_dataiter
>>> metric = mx.metric.Accuracy()
>>> mod.score(val_dataiter, metric)

iter_predict(eval_data, num_batch=None, reset=True)

Iterate over predictions.

for pred, i_batch, batch in module.iter_predict(eval_data):
    # pred is a list of outputs from the module
    # i_batch is an integer
    # batch is the data batch from the data iterator
    pass

Parameters:
  • eval_data (DataIter) –
  • num_batch (int) – Default is None, indicating running all the batches in the data iterator.
  • reset (bool) – Default is True, indicating whether we should reset the data iterator before starting prediction.
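
The shape of this loop can be mimicked with a plain generator. In the sketch below, toy_iter_predict and the doubling "model" are hypothetical; it only illustrates the (pred, i_batch, batch) triples and the num_batch cutoff, not MXNet's executor logic.

```python
def toy_iter_predict(batches, forward, num_batch=None):
    """Yield (pred, i_batch, batch) per batch, stopping after num_batch batches."""
    for i_batch, batch in enumerate(batches):
        if num_batch is not None and i_batch == num_batch:
            break
        pred = [forward(batch)]  # a list of outputs, here with a single element
        yield pred, i_batch, batch


def double(batch):
    # Hypothetical "model": doubles every value in the batch.
    return [2 * x for x in batch]


batches = [[1, 2], [3, 4], [5, 6]]
for pred, i_batch, batch in toy_iter_predict(batches, double, num_batch=2):
    print((i_batch, batch, pred))
```

With num_batch=None the generator runs through every batch, matching the documented default.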
predict(eval_data, num_batch=None, merge_batches=True, reset=True, always_output_list=False)

Run prediction and collect the outputs.

When merge_batches is True (the default), the return value will be a list [out1, out2, out3], where each element is formed by concatenating the outputs for all the mini-batches. When always_output_list is False (the default), then in the case of a single output, out1 is returned instead of [out1].

When merge_batches is False, the return value will be a nested list like [[out1_batch1, out2_batch1], [out1_batch2], ...]. This mode is useful because in some cases (e.g. bucketing), the module does not necessarily produce the same number of outputs.

The objects in the results have type NDArray. If you need to work with a numpy array, just call .asnumpy() on each NDArray.
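
The two return layouts can be sketched with plain Python lists standing in for NDArray outputs. toy_merge is a hypothetical helper, and list concatenation stands in for concatenating arrays along the batch axis; it is not the library's code.

```python
def toy_merge(per_batch_outputs, always_output_list=False):
    """Merge [[out1_b1, out2_b1], [out1_b2, out2_b2], ...] into [out1, out2]."""
    num_outputs = len(per_batch_outputs[0])
    # Merging only makes sense if every batch produced the same number of outputs.
    if any(len(outs) != num_outputs for outs in per_batch_outputs):
        raise ValueError("cannot merge batches with differing output counts")
    merged = []
    for i in range(num_outputs):
        combined = []
        for outs in per_batch_outputs:
            combined.extend(outs[i])  # concatenate output i across batches
        merged.append(combined)
    if num_outputs == 1 and not always_output_list:
        return merged[0]  # single output: return out1 instead of [out1]
    return merged


unmerged = [[[1], [10]], [[2], [20]]]  # two batches, two outputs each
print(toy_merge(unmerged))
```

The ValueError branch reflects why the unmerged, nested form is useful when the number of outputs varies between batches.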

Parameters:
  • eval_data (DataIter) –
  • num_batch (int) – Defaults to None, indicating running all the batches in the data iterator.
  • merge_batches (bool) – Defaults to True, see above for return values.
  • reset (bool) – Defaults to True, indicating whether we should reset the data iterator before starting prediction.
  • always_output_list (bool) – Defaults to False, see above for return values.
Returns:

Prediction results.

Return type:

list of NDArray or list of list of NDArray

Examples

An example of using predict for prediction:
>>> # Predict on the first 10 batches of val_dataiter
>>> mod.predict(eval_data=val_dataiter, num_batch=10)

fit(train_data, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), eval_end_callback=None, eval_batch_end_callback=None, initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_rebind=False, force_init=False, begin_epoch=0, num_epoch=None, validation_metric=None, monitor=None)

Train the module parameters.

Parameters:
  • train_data (DataIter) –
  • eval_data (DataIter) – If not None, it is used as the validation set, and performance is evaluated on it after each epoch.
  • eval_metric (str or EvalMetric) – Defaults to ‘accuracy’. The performance measure displayed during training. Other possible predefined metrics are: ‘ce’ (CrossEntropy), ‘f1’, ‘mae’, ‘mse’, ‘rmse’, ‘top_k_accuracy’.
  • epoch_end_callback (function or list of functions) – Each callback will be called with the current epoch, symbol, arg_params and aux_params.
  • batch_end_callback (function or list of functions) – Each callback will be called with a BatchEndParam.
  • kvstore (str or KVStore) – Defaults to ‘local’.
  • optimizer (str or Optimizer) – Defaults to ‘sgd’
  • optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The parameters for the optimizer constructor. The default value is not a dict, just to avoid pylint warning on dangerous default values.
  • eval_end_callback (function or list of functions) – These will be called at the end of each full evaluation, with the metrics over the entire evaluation set.
  • eval_batch_end_callback (function or list of functions) – These will be called at the end of each minibatch during evaluation.
  • initializer (Initializer) – The initializer is called to initialize the module parameters when they are not already initialized.
  • arg_params (dict) – Defaults to None. If not None, should be existing parameters from a trained model or loaded from a checkpoint (a previously saved model). In this case, the values here are used to initialize the module parameters, unless they have already been initialized by the user via a call to init_params or fit. arg_params takes priority over initializer.
  • aux_params (dict) – Defaults to None. Similar to arg_params, except for auxiliary states.
  • allow_missing (bool) – Defaults to False. Indicates whether missing parameters are allowed when arg_params and aux_params are not None. If this is True, the missing parameters will be initialized via the initializer.
  • force_rebind (bool) – Defaults to False. Whether to force rebinding the executors if already bound.
  • force_init (bool) – Defaults to False. Indicates whether to force initialization even if the parameters are already initialized.
  • begin_epoch (int) – Defaults to 0. Indicates the starting epoch. Usually, if we are resuming from a checkpoint saved at a previous training phase at epoch N, then we should specify this value as N+1.
  • num_epoch (int) – Number of epochs to run training.

Examples

An example of using fit for training:
>>> # Assume the training and validation DataIter objects are ready
>>> mod.fit(train_data=train_dataiter, eval_data=val_dataiter,
...         optimizer_params={'learning_rate': 0.01, 'momentum': 0.9},
...         num_epoch=10)

data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels (either because it has no loss function or because it is not bound for training), this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Get parameters, which are potentially copies of the actual parameters used to do computation on the device.

Returns: A pair of dictionaries, each mapping parameter names to NDArray values.
Return type: (arg_params, aux_params)

Examples

An example of getting module parameters:
>>> print(mod.get_params())
({'fc2_weight': <NDArray 64x128 @cpu(0)>, 'fc1_weight': <NDArray 128x100 @cpu(0)>, 'fc3_bias': <NDArray 10 @cpu(0)>, 'fc3_weight': <NDArray 10x64 @cpu(0)>, 'fc2_bias': <NDArray 64 @cpu(0)>, 'fc1_bias': <NDArray 128 @cpu(0)>}, {})

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, force_init will force re-initialize even if already initialized.

Examples

An example of initializing module parameters:
>>> mod.init_params()

set_params(arg_params, aux_params, allow_missing=False, force_init=True)

Assign parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.
  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, will force re-initialize even if already initialized.

Examples

An example of setting module parameters:
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)

save_params(fname)

Save model parameters to file.

Parameters: fname (str) – Path to output param file.

Examples

An example of saving module parameters:
>>> mod.save_params('myfile')

load_params(fname)

Load model parameters from file.

Parameters: fname (str) – Path to input param file.

Examples

An example of loading module parameters:
>>> mod.load_params('myfile')

get_states(merge_multi_context=True)

Get states from all devices.

If merge_multi_context is True, returns output of form [out1, out2]. Otherwise, it returns output of the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All output elements are NDArray.

Parameters: merge_multi_context (bool) – Defaults to True. When data-parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.
Return type: A list of NDArray or a list of list of NDArray.

set_states(states=None, value=None)

Set value for states. Only one of states & value can be specified.

Parameters:
  • states (list of list of NDArray) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
  • value (number) – A single scalar value for all state arrays.

install_monitor(mon)

Install monitor on all executors.

prepare(data_batch)

Prepare the module for processing a data batch.

Usually involves switching bucket and reshaping.

Parameters: data_batch (DataBatch) –

forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.

Examples

An example of forward computation:
>>> from collections import namedtuple
>>> Batch = namedtuple('Batch', ['data'])
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
>>> mod.init_params()
>>> data1 = [mx.nd.ones([1, 10, 10])]
>>> mod.forward(Batch(data1))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 0.09999977  0.10000153  0.10000716  0.10000195  0.09999853  0.09999743
   0.10000272  0.10000113  0.09999088  0.09999888]]

backward(out_grads=None)

Backward computation.

Parameters: out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

Examples

An example of backward computation:
>>> mod.backward()
>>> print(mod.get_input_grads()[0].asnumpy())
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
     5.46342608e-06   8.44196393e-07]
     ...]]

get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation.

If merge_multi_context is True, the return value has the form [out1, out2]. Otherwise, it has the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.

Parameters: merge_multi_context (bool) – Defaults to True. When data-parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.
Returns: Output
Return type: list of NDArray or list of list of NDArray.
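
The difference between the merged and per-device layouts can be sketched as follows. This is a toy illustration with Python lists in place of NDArray; toy_merge_multi_context is a hypothetical helper, and extending a list stands in for concatenating device slices along the batch axis.

```python
def toy_merge_multi_context(outputs_per_device):
    """Turn [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]] into [out1, out2]."""
    merged = []
    for device_slices in outputs_per_device:
        whole = []
        for part in device_slices:
            whole.extend(part)  # stands in for concatenation along the batch axis
        merged.append(whole)
    return merged


# Two outputs, each split across two devices.
per_device = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
print(toy_merge_multi_context(per_device))
```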

Examples

An example of getting forward output:
>>> print(mod.get_outputs()[0].asnumpy())
[[ 0.09999977  0.10000153  0.10000716  0.10000195  0.09999853  0.09999743
   0.10000272  0.10000113  0.09999088  0.09999888]]

get_input_grads(merge_multi_context=True)

Get the gradients to the inputs, computed in the previous backward computation.

If merge_multi_context is True, the return value has the form [grad1, grad2]. Otherwise, it has the form [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.

Parameters: merge_multi_context (bool) – Defaults to True. When data-parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look as if they came from a single executor.
Returns: Input gradients.
Return type: list of NDArray or list of list of NDArray

Examples

An example of getting input gradients:
>>> print(mod.get_input_grads()[0].asnumpy())
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
     5.46342608e-06   8.44196393e-07]
     ...]]

update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

Examples

An example of updating module parameters:
>>> mod.init_optimizer(kvstore='local', optimizer='sgd',
...                    optimizer_params=(('learning_rate', 0.01), ))
>>> mod.backward()
>>> mod.update()
>>> print(mod.get_params()[0]['fc3_weight'].asnumpy())
[[  5.86930104e-03   5.28078526e-03  -8.88729654e-03  -1.08308345e-03
    6.13054074e-03   4.27560415e-03   1.53817423e-03   4.62131854e-03
    4.69872449e-03  -2.42400169e-03   9.94111411e-04   1.12386420e-03
    ...]]

update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
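
What "evaluate and accumulate" means can be shown with a toy running-accuracy metric. ToyAccuracy is a hypothetical stand-in for an EvalMetric, written in plain Python; the predictions are passed in directly here, whereas a module would take them from its last forward computation.

```python
class ToyAccuracy(object):
    """Hypothetical running-accuracy metric, standing in for an EvalMetric."""

    def __init__(self):
        self.num_correct = 0
        self.num_seen = 0

    def update(self, labels, preds):
        # Accumulate counts instead of overwriting them, so the metric
        # reflects every batch seen since it was created.
        for label, pred in zip(labels, preds):
            self.num_correct += int(label == pred)
            self.num_seen += 1

    def get(self):
        # Assumes update() has been called at least once.
        return self.num_correct / float(self.num_seen)


metric = ToyAccuracy()
metric.update(labels=[0, 1, 1], preds=[0, 1, 0])  # first batch: 2 of 3 correct
metric.update(labels=[1], preds=[1])              # second batch: 1 of 1 correct
print(metric.get())
```

After both calls the metric covers all four samples, not just the most recent batch.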

Examples

An example of updating evaluation metric:
>>> mod.forward(data_batch)
>>> mod.update_metric(metric, data_batch.label)

bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed, but it might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. If this is True, the executors are forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
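
The three grad_req values can be pictured with a toy gradient buffer. The function below is a hypothetical sketch of the intended semantics (overwrite, accumulate, or leave untouched) using Python lists; it is not how MXNet's executors are implemented.

```python
def apply_grad_req(grad_req, stored, new_grad):
    """Return the gradient buffer after one backward pass (toy semantics)."""
    if grad_req == 'write':
        return list(new_grad)  # overwrite the buffer with the new gradient
    if grad_req == 'add':
        return [s + g for s, g in zip(stored, new_grad)]  # accumulate
    if grad_req == 'null':
        return stored  # no gradient is requested; the buffer is left untouched
    raise ValueError("unknown grad_req: %r" % (grad_req,))


buf = [0.0, 0.0]
for grad in ([1.0, 2.0], [3.0, 4.0]):
    buf = apply_grad_req('add', buf, grad)
print(buf)
```

With 'write' the buffer would hold only the gradient of the last pass, which is why 'add' is the choice when gradients from several passes must be summed.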

Examples

An example of binding symbols:
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])

init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Defaults to ‘local’.
  • optimizer (str or Optimizer) – Defaults to ‘sgd’
  • optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Defaults to False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
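
The remark about the non-dictionary default can be illustrated in plain Python. In this sketch, toy_init_optimizer is hypothetical, and the extra 'rescale_grad' entry is only an example of a value a function might add internally; the point is that a tuple of pairs is an immutable default that converts cleanly to a dict.

```python
def toy_init_optimizer(optimizer_params=(('learning_rate', 0.01),)):
    """Accept either a dict or a tuple of (name, value) pairs."""
    params = dict(optimizer_params)  # copies a dict, converts a tuple of pairs
    params.setdefault('rescale_grad', 1.0)  # mutates only the local copy
    return params


print(toy_init_optimizer())
print(toy_init_optimizer({'learning_rate': 0.005, 'momentum': 0.9}))
```

Because the default is a tuple, repeated calls cannot leak state through it, which is the hazard pylint flags for mutable default arguments.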

Examples

An example of initializing optimizer:
>>> mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.005),))

symbol

Get the symbol associated with this module.

For module types other than Module (e.g. BucketingModule), this property might not be constant throughout the module's lifetime. Some modules might not even be associated with any symbols.

class mxnet.module.Module(symbol, data_names=('data', ), label_names=('softmax_label', ), logger=logging, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None)

Module is a basic module that wraps a Symbol. It is functionally the same as the FeedForward model, except under the module API.

Parameters:
  • symbol (Symbol) –
  • data_names (list of str) – Defaults to (‘data’) for a typical model used in image classification.
  • label_names (list of str) – Defaults to (‘softmax_label’) for a typical model used in image classification.
  • logger (Logger) – Defaults to logging.
  • context (Context or list of Context) – Defaults to mx.cpu().
  • work_load_list (list of number) – Default None, indicating uniform workload.
  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.
  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set by set_states().

static load(prefix, epoch, load_optimizer_states=False, **kwargs)

Create a model from previously saved checkpoint.

Parameters:
  • prefix (str) – path prefix of saved model files. You should have “prefix-symbol.json”, “prefix-xxxx.params”, and optionally “prefix-xxxx.states”, where xxxx is the epoch number.
  • epoch (int) – epoch to load.
  • load_optimizer_states (bool) – whether to load optimizer states. Checkpoint needs to have been made with save_optimizer_states=True.
  • data_names (list of str) – Default is (‘data’) for a typical model used in image classification.
  • label_names (list of str) – Default is (‘softmax_label’) for a typical model used in image classification.
  • logger (Logger) – Default is logging.
  • context (Context or list of Context) – Default is cpu().
  • work_load_list (list of number) – Default None, indicating uniform workload.
  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.
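
The file naming described for prefix can be sketched with a small helper. checkpoint_files is hypothetical, and the zero-padded four-digit epoch is an assumption read off the “xxxx” placeholder above; check the files your own checkpoints produce before relying on it.

```python
def checkpoint_files(prefix, epoch, with_optimizer_states=False):
    """List the file names expected for a checkpoint (assumes a 4-digit epoch)."""
    files = ['%s-symbol.json' % prefix, '%s-%04d.params' % (prefix, epoch)]
    if with_optimizer_states:
        # Only present if the checkpoint was saved with optimizer states.
        files.append('%s-%04d.states' % (prefix, epoch))
    return files


print(checkpoint_files('mymodel', 3, with_optimizer_states=True))
```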
save_checkpoint(prefix, epoch, save_optimizer_states=False)

Save current progress to checkpoint. Use mx.callback.module_checkpoint as epoch_end_callback to save during training.

Parameters:
  • prefix (str) – The file prefix to checkpoint to.
  • epoch (int) – The current epoch number.
  • save_optimizer_states (bool) – Whether to save optimizer states so that training can be continued later.

data_names

A list of names for data required by this module.

label_names

A list of names for labels required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Return type: A list of (name, shape) pairs.

label_shapes

Get label shapes.

Returns: A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).

output_shapes

Get output shapes.

Return type: A list of (name, shape) pairs.

get_params()

Get current parameters.

Returns: (arg_params, aux_params), each a dictionary mapping names to parameters (as NDArray).

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, will force re-initialize even if already initialized.

set_params(arg_params, aux_params, allow_missing=False, force_init=True)

Assign parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to NDArray.
  • aux_params (dict) – Dictionary of name to NDArray.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, will force re-initialize even if already initialized.

Examples

An example of setting module parameters:
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)

bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
reshape(data_shapes, label_shapes=None)

Reshape the module for new input shapes.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
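The tuple-of-pairs default for optimizer_params avoids a mutable default argument, which would be shared across calls. The pattern can be sketched as follows, assuming the value is converted with dict() inside the function. Here configure is a hypothetical stand-in rather than the MXNet function, and the rescale_grad entry is only an illustrative extra key.

```python
def configure(optimizer_params=(('learning_rate', 0.01),)):
    # dict() accepts either a dict or an iterable of (key, value) pairs
    # and always returns a fresh dictionary.
    params = dict(optimizer_params)
    # Mutating the local copy leaves both the default and the caller's
    # object untouched. The key and value here are illustrative only.
    params.setdefault('rescale_grad', 1.0)
    return params


user_params = {'learning_rate': 0.1, 'momentum': 0.9}
print(configure())             # built from the tuple default
print(configure(user_params))  # built from a caller-supplied dict
print(user_params)             # the caller's dict is unchanged
```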
borrow_optimizer(shared_module)

Borrow optimizer from a shared module. Used in bucketing, where exactly the same optimizer (esp. kvstore) is used.

Parameters:shared_module (Module) –
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation.

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArray might live on different devices.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:Output.
Return type:list of NDArray or list of list of NDArray
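The two result layouts can be illustrated with nested Python lists standing in for NDArray. This is a sketch under the assumption that merging concatenates the per-device slices of each output along the batch (first) axis. The helper merge_multi_context is a hypothetical function written for this illustration, not the library routine.

```python
def merge_multi_context(outputs):
    """Concatenate each output's per-device slices along the batch axis."""
    return [[row for device_slice in per_device for row in device_slice]
            for per_device in outputs]


# Layout when merge_multi_context is False:
# [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]
# Each device slice is a list of batch rows.
unmerged = [
    [[[0.1, 0.9], [0.8, 0.2]], [[0.5, 0.5]]],  # out1: 2 rows on dev1, 1 row on dev2
    [[[1.0], [2.0]], [[3.0]]],                 # out2: 2 rows on dev1, 1 row on dev2
]

# Layout when merge_multi_context is True: [out1, out2]
merged = merge_multi_context(unmerged)
print(merged[0])
print(merged[1])
```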
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:Input gradients
Return type:list of NDArray or list of list of NDArray
get_states(merge_multi_context=True)

Get states from all devices

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the states will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:States
Return type:list of NDArray or list of list of NDArray
set_states(states=None, value=None)

Set value for states. Only one of states & value can be specified.

Parameters:
  • states (list of list of NDArrays) – source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
  • value (number) – a single scalar value for all state arrays.
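The rule that exactly one of states and value is given can be sketched with lists of floats standing in for the state arrays. This is a minimal model, not the MXNet implementation. The function below is a hypothetical free-standing version that returns new state lists instead of writing into device memory.

```python
def set_states(current, states=None, value=None):
    """Return new state arrays from explicit states or from one scalar value."""
    if (states is None) == (value is None):
        raise ValueError("specify exactly one of states or value")
    if states is not None:
        # Copy the arrays given as [[state1_dev1, state1_dev2], ...].
        return [[list(arr) for arr in per_device] for per_device in states]
    # Fill every existing array with the scalar, keeping its length.
    return [[[value] * len(arr) for arr in per_device] for per_device in current]


# Two states, each held on two devices.
current = [[[0.0, 0.0], [0.0, 0.0]], [[0.0], [0.0]]]
print(set_states(current, value=1.0))
print(set_states(current, states=[[[1.0, 2.0], [3.0, 4.0]], [[5.0], [6.0]]]))
```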
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
save_optimizer_states(fname)

Save optimizer (updater) state to file

Parameters:fname (str) – Path to output states file.
load_optimizer_states(fname)

Load optimizer (updater) state from file

Parameters:fname (str) – Path to input states file.
install_monitor(mon)

Install monitor on all executors

class mxnet.module.BucketingModule(sym_gen, default_bucket_key=None, logger=logging, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None)

This module helps to deal efficiently with varying-length inputs.

Parameters:
  • sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names).
  • default_bucket_key (str (or any python object)) – The key for the default bucket.
  • logger (Logger) –
  • context (Context or list of Context) – Defaults to mx.cpu()
  • work_load_list (list of number) – Defaults to None, indicating uniform workload.
  • fixed_param_names (list of str) – Defaults to None, indicating no network parameters are fixed.
  • state_names (list of str) – States are similar to data and label, but not provided by data iterator. Instead they are initialized to 0 and can be set by set_states()
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Returns:A list of (name, shape) pairs.

label_shapes

Get label shapes.

Returns:A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).
output_shapes

Get output shapes.

Returns:A list of (name, shape) pairs.

get_params()

Get current parameters.

Returns:(arg_params, aux_params), each a dictionary mapping names to parameters (NDArray).
set_params(arg_params, aux_params, allow_missing=False, force_init=True)

Assign parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.
  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.

Examples

An example of setting module parameters:
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize parameters.

Parameters:
  • initializer (Initializer) –
  • arg_params (dict) – Defaults to None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Defaults to None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Defaults to False.
get_states(merge_multi_context=True)

Get states from all devices

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the states will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
set_states(states=None, value=None)

Set value for states. Only one of states & value can be specified.

Parameters:
  • states (list of list of NDArrays) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
  • value (number) – A single scalar value for all state arrays.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Binding for a BucketingModule means setting up the buckets and binding the executor for the default bucket key. Executors corresponding to other keys are bound afterwards with switch_bucket.

Parameters:
  • data_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
  • label_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
  • for_training (bool) – Default is True.
  • inputs_need_grad (bool) – Default is False.
  • force_rebind (bool) – Default is False.
  • shared_module (BucketingModule) – Default is None. This value is currently not used.
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
  • bucket_key (str (or any python object)) – Bucket key for binding. By default, the default_bucket_key is used.
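The three grad_req modes differ in what happens to a gradient buffer when a new gradient arrives. A minimal sketch of that behavior, using lists of floats in place of NDArray; apply_grad is a hypothetical helper for illustration, not part of MXNet:

```python
def apply_grad(buffer, new_grad, grad_req='write'):
    """Return the gradient buffer after a backward pass under grad_req."""
    if grad_req == 'write':
        return list(new_grad)                        # overwrite the old gradient
    if grad_req == 'add':
        return [old + new for old, new in zip(buffer, new_grad)]  # accumulate
    if grad_req == 'null':
        return list(buffer)                          # no gradient is stored
    raise ValueError("unknown grad_req: %r" % (grad_req,))


buffer = [1.0, 2.0]
new_grad = [0.5, 0.5]
for mode in ('write', 'add', 'null'):
    print(mode, apply_grad(buffer, new_grad, mode))
```

The 'add' mode is the one to pick when gradients from several backward passes should be summed before a single update.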
switch_bucket(bucket_key, data_shapes, label_shapes=None)

Switch to a different bucket. This will change self.curr_module.

Parameters:
  • bucket_key (str (or any python object)) – The key of the target bucket.
  • data_shapes (list of (str, tuple)) – Typically data_batch.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_batch.provide_label.
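The relationship between bind, switch_bucket, and sym_gen can be modeled as a lazily filled cache keyed by bucket key. The sketch below only illustrates creation on first use and reuse afterwards. BucketCache is a hypothetical class, the symbol is a placeholder string, and the real module additionally binds executors and shares parameters between buckets.

```python
generated = []  # records the keys for which sym_gen actually ran


def sym_gen(seq_len):
    """Return the (symbol, data_names, label_names) triple for a bucket key."""
    generated.append(seq_len)
    symbol = 'unrolled_rnn_len_%d' % seq_len  # placeholder for a real Symbol
    return symbol, ('data',), ('softmax_label',)


class BucketCache(object):
    def __init__(self, sym_gen, default_bucket_key):
        self._sym_gen = sym_gen
        self._buckets = {}
        self.curr_bucket_key = None
        # Set up the default bucket first, as bind does.
        self.switch_bucket(default_bucket_key)

    def switch_bucket(self, bucket_key):
        # Build the bucket only the first time its key is seen.
        if bucket_key not in self._buckets:
            self._buckets[bucket_key] = self._sym_gen(bucket_key)
        self.curr_bucket_key = bucket_key
        return self._buckets[bucket_key]


cache = BucketCache(sym_gen, default_bucket_key=20)
for key in (10, 10, 20):
    cache.switch_bucket(key)
print(generated)
```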
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Defaults to ‘local’.
  • optimizer (str or Optimizer) – Defaults to ‘sgd’
  • optimizer_params (dict) – Defaults to ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Defaults to False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
prepare(data_batch)

Prepare a data batch for forward.

Parameters:data_batch (DataBatch) –
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Defaults to None, in which case is_train is taken as self.for_training.
backward(out_grads=None)

Backward computation.

update()

Update parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

get_outputs(merge_multi_context=True)

Get outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
symbol

The symbol of the current bucket being used.

install_monitor(mon)

Install monitor on all executors

class mxnet.module.SequentialModule(logger=logging)

A SequentialModule is a container module that can chain multiple modules together.

Note that building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should be used only as a handy utility.

add(module, **kwargs)

Add a module to the chain.

Parameters:
  • module (BaseModule) – The new module to add.
  • kwargs (**keywords) –

    All the keyword arguments are saved as meta information for the added module. The currently known meta includes

    • take_labels: indicating whether the module expects to take labels when doing computation. Note that any module in the chain can take labels (not necessarily only the topmost one), and they all take the same labels passed from the original data batch for the SequentialModule.
Returns:

This function returns self to allow us to easily chain a series of add calls.

Return type:

self

Examples

An example of adding two modules to a chain:
>>> seq_mod = mx.mod.SequentialModule()
>>> seq_mod.add(mod1)
>>> seq_mod.add(mod2)
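Because add returns self, the calls can also be written as one chained expression, with keyword arguments stored as meta information per module. The pattern can be sketched in plain Python. Chain is a hypothetical stand-in for the container, and the strings stand in for real modules.

```python
class Chain(object):
    """Toy container that mimics the add-and-return-self pattern."""

    def __init__(self):
        self._modules = []
        self._metas = []

    def add(self, module, **kwargs):
        self._modules.append(module)
        self._metas.append(kwargs)  # keep the keyword arguments as meta information
        return self                 # returning self enables chaining


chain = Chain().add('mod1').add('mod2', take_labels=True)
print(chain._modules)
print(chain._metas)
```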
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Returns:A list of (name, shape) pairs. The data shapes of the first module are the data shapes of the SequentialModule.
Return type:list
label_shapes

Get label shapes.

Returns:A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).
Return type:list
output_shapes

Get output shapes.

Returns:A list of (name, shape) pairs. The output shapes of the last module are the output shapes of the SequentialModule.
Return type:list
get_params()

Get current parameters.

Returns:A pair of dictionaries, each mapping names to parameters (NDArray). Each is a merged dictionary of all the parameters in the modules.
Return type:(arg_params, aux_params)
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize parameters.

Parameters:
  • initializer (Initializer) –
  • arg_params (dict) – Default None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Default None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Default False.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. Currently shared module is not supported for SequentialModule.
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default (('learning_rate', 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Default is None, in which case is_train is taken as self.for_training.
backward(out_grads=None)

Backward computation.

update()

Update parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

get_outputs(merge_multi_context=True)

Get outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type:list of NDArray or list of list of NDArray
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
Return type:list of NDArray or list of list of NDArray
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
install_monitor(mon)

Install monitor on all executors.

class mxnet.module.PythonModule(data_names, label_names, output_names, logger=logging)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • data_names (list of str) – Names of the data expected by the module.
  • label_names (list of str) – Names of the labels expected by the module. Could be None if the module does not need labels.
  • output_names (list of str) – Names of the outputs.
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels, either because it is a module without a loss function or because it is not bound for training, then this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Get parameters. These are potentially copies of the actual parameters used to do computation on the device.

Returns:({}, {}), a pair of empty dicts. Subclasses should override this method if they contain parameters.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states. By default this function does nothing. Subclasses should override this method if they contain parameters.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, will force re-initialize even if already initialized.
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch. Currently we do nothing here. Subclasses should override this method if they contain parameters.

update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation. Subclasses should override this method if needed.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers. By default we do nothing. Subclasses should override this method if needed.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
class mxnet.module.PythonLossModule(name='pyloss', data_names=('data', ), label_names=('softmax_label', ), logger=logging, grad_func=None)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • name (str) – Name of the module. The outputs will be named [name + ‘_output’].
  • data_names (list of str) – Defaults to ['data']. Names of the data expected by this module. Should be a list of only one name.
  • label_names (list of str) – Default ['softmax_label']. Names of the labels expected by the module. Should be a list of only one name.
  • grad_func (function) – Optional. If not None, should be a function that takes scores and labels, both of type NDArray, and returns the gradients with respect to the scores according to this loss function. The return value could be a numpy array or an NDArray.
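As an illustration of the shape a grad_func takes, the sketch below computes the gradient of softmax cross-entropy with respect to the scores, which is softmax(scores) minus the one-hot label. It is written against plain Python lists so that it runs without MXNet. A real grad_func would receive NDArray arguments, and the name softmax_ce_grad is hypothetical.

```python
import math


def softmax_ce_grad(scores, labels):
    """Gradient of softmax cross-entropy with respect to the scores.

    scores: one list of raw class scores per sample.
    labels: one class index per sample.
    """
    grads = []
    for row, label in zip(scores, labels):
        shift = max(row)                            # subtract the max for stability
        exps = [math.exp(s - shift) for s in row]
        total = sum(exps)
        probs = [e / total for e in exps]           # softmax probabilities
        probs[int(label)] -= 1.0                    # subtract the one-hot label
        grads.append(probs)
    return grads


grads = softmax_ce_grad([[0.0, 0.0], [2.0, 1.0, 0.0]], [1, 0])
print(grads)
```

Each row of the result sums to approximately zero, which is a quick sanity check for this particular loss.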
forward(data_batch, is_train=None)

Forward computation. Here we do nothing but to keep a reference to the scores and the labels so that we can do backward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation. As an output loss module, we treat the inputs to this module as scores, and simply return them.

Parameters:merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computing.
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
get_input_grads(merge_multi_context=True)

Get the gradients to the inputs, computed in the previous backward computation.

Parameters:merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computation.
install_monitor(mon)

Install monitor on all executors.