Module Interface

The module API provides an intermediate- and high-level interface for performing computation with neural networks in MXNet. A module is an instance of a subclass of BaseModule. The most widely used module class is simply called Module, which wraps a Symbol and one or more Executors. For a full list of functions, see BaseModule. Each subclass of BaseModule may provide additional interface functions. In this topic, we provide some examples of common use cases. All of the module APIs live in the mxnet.module namespace, which is also aliased as mxnet.mod.

Preparing a Module for Computation

To construct a module, refer to the constructors for the specific module class. For example, the Module class takes a Symbol as input:

    import mxnet as mx

    # construct a simple MLP
    data = mx.symbol.Variable('data')
    fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
    fc2  = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
    act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
    fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
    out  = mx.symbol.SoftmaxOutput(fc3, name='softmax')
 
    # construct the module
    mod = mx.mod.Module(out)

You can also specify the data_names and label_names of your Symbol. Here we skip those parameters because our Symbol follows the naming conventions, so the defaults (data named data, and label named softmax_label) are fine. Another important parameter is context, which defaults to the CPU. You can specify a GPU context, or even a list of GPU contexts if you need data parallelization.
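For example, here is a minimal sketch of constructing the same module with explicit names and two GPU contexts (data_names, label_names, and context are constructor parameters of Module; the two-GPU setup is only an assumption for illustration):

    # hedged sketch: explicit input/label names and two GPUs for data parallelization
    mod = mx.mod.Module(out,
                        data_names=['data'],
                        label_names=['softmax_label'],
                        context=[mx.gpu(0), mx.gpu(1)])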

Before you can compute with a module, you need to call bind() to allocate the device memory and init_params() or set_params() to initialize the parameters.

    # train_dataiter is assumed to be a DataIter providing the training data and labels
    mod.bind(data_shapes=train_dataiter.provide_data,
             label_shapes=train_dataiter.provide_label)
    mod.init_params()

Now you can compute with the module using functions like forward(), backward(), etc. If you simply want to fit a module, you don’t need to call bind() and init_params() explicitly, because the fit() function automatically calls them if they are needed.
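For illustration, here is a minimal sketch of such an explicit training loop. It assumes the module has been bound and initialized as above, and that train_dataiter is the same DataIter:

    # install an optimizer, then run a few epochs with the intermediate-level API
    mod.init_optimizer(optimizer='sgd',
                       optimizer_params=(('learning_rate', 0.01),))
    metric = mx.metric.Accuracy()
    for epoch in range(5):
        train_dataiter.reset()
        metric.reset()
        for batch in train_dataiter:
            mod.forward(batch, is_train=True)       # compute outputs
            mod.update_metric(metric, batch.label)  # accumulate training accuracy
            mod.backward()                          # compute gradients
            mod.update()                            # let the optimizer update parameters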

Training, Predicting, and Evaluating

Modules provide high-level APIs for training, predicting, and evaluating. To fit a module, call the fit() function with some DataIters:

    mod = mx.mod.Module(out)
    mod.fit(train_dataiter, eval_data=eval_dataiter,
            optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
            num_epoch=n_epoch)

The interface is very similar to the old FeedForward class. You can pass in batch-end callbacks and epoch-end callbacks. To predict with a module, call predict() with a DataIter:

    preds = mod.predict(val_dataiter)

The module collects and returns all of the prediction results. For more details about the format of the return values, see the documentation for the predict() function.
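For example, with the single-output network above and the default merge_batches=True, preds is a single NDArray of shape (num_examples, 10). A minimal sketch of turning it into predicted class labels:

    # take the argmax over the class axis of the merged predictions
    pred_labels = preds.asnumpy().argmax(axis=1)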

When prediction results might be too large to fit in memory, use the iter_predict API:

    for preds, i_batch, batch in mod.iter_predict(val_dataiter):
        pred_label = preds[0].asnumpy().argmax(axis=1)
        label = batch.label[0].asnumpy().astype('int32')
        # do something...

If you need to evaluate on a test set and don’t need the prediction output, call the score() function with a DataIter and an EvalMetric:

    metric = mx.metric.Accuracy()
    mod.score(val_dataiter, metric)

This runs predictions on each batch in the provided DataIter and computes the evaluation score using the provided EvalMetric. The evaluation results are stored in metric so that you can query them later.
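For example, a minimal sketch of querying the accumulated result afterwards (assuming metric is the mx.metric.Accuracy() instance created above):

    # get() returns a (name, value) pair, e.g. ('accuracy', 0.97) depending on your data
    name, value = metric.get()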

Saving and Loading Module Parameters

To save the module parameters in each training epoch, use a checkpoint callback:

    model_prefix = 'mymodel'
    checkpoint = mx.callback.do_checkpoint(model_prefix)

    mod.fit(..., epoch_end_callback=checkpoint)

To load the saved module parameters, call the load_checkpoint function:

    sym, arg_params, aux_params = \
        mx.model.load_checkpoint(model_prefix, n_epoch_load)

    # assign parameters
    mod.set_params(arg_params, aux_params)

To resume training from a saved checkpoint, pass the loaded parameters directly to fit() instead of calling set_params(). fit() then starts from those parameters instead of initializing them randomly:

    mod.fit(..., arg_params=arg_params, aux_params=aux_params,
        begin_epoch=n_epoch_load)

Pass in begin_epoch so that fit() knows to resume from a saved epoch.
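Putting these steps together, a minimal sketch of resuming training (assuming model_prefix, n_epoch_load, train_dataiter, and n_epoch are defined as above):

    # load the checkpoint, rebuild the module, and continue training from it
    sym, arg_params, aux_params = \
        mx.model.load_checkpoint(model_prefix, n_epoch_load)
    mod = mx.mod.Module(sym)
    mod.fit(train_dataiter,
            arg_params=arg_params, aux_params=aux_params,
            begin_epoch=n_epoch_load, num_epoch=n_epoch)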

Module Interface API

BaseModule Interface API

BaseModule defines an API for modules.

class mxnet.module.base_module.BaseModule(logger=<module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>)

The base class of a module. A module represents a computation component. A module is designed to abstract a computation "machine" that can run forward and backward passes, update parameters, and so on. We aim to make the APIs easy to use, especially when the imperative API is needed to work with multiple modules (e.g. a stochastic depth network).

A module has several states:

  • Initial state. Memory is not allocated yet; the module is not ready for computation.
  • Binded. Shapes for inputs, outputs, and parameters are all known, memory has been allocated, and the module is ready for computation.
  • Parameters initialized. For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
  • Optimizer installed. An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).

In order for a module to interact with others, it should be able to report the following information in its raw state (before being bound):

  • data_names: a list of strings indicating the names of the required data.
  • output_names: a list of strings indicating the names of the required outputs.

It should also report the following richer information after being bound:

  • state information
    • binded: bool, indicating whether the memory buffers needed for computation have been allocated.
    • for_training: whether the module is bound for training (if bound).
    • params_initialized: bool, indicating whether the parameters of this module have been initialized.
    • optimizer_initialized: bool, indicating whether an optimizer is defined and initialized.
    • inputs_need_grad: bool, indicating whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
  • input/output information
    • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.
    • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
    • output_shapes: a list of (name, shape) for outputs of the module.
  • parameters (for modules with parameters)
    • get_params(): return a tuple (arg_params, aux_params). Each of those is a dictionary mapping names to NDArray. Those NDArray always live on the CPU, while the actual parameters used for computation might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters. Therefore, modifying the returned NDArray does not change the parameters used for computation; use set_params() to assign new values.
    • set_params(arg_params, aux_params): assign parameters to the devices doing the computation.
    • init_params(...): a more flexible interface to assign or initialize the parameters.
  • setup
    • bind(): prepare environment for computation.
    • init_optimizer(): install optimizer for parameter updating.
  • computation
    • forward(data_batch): forward operation.
    • backward(out_grads=None): backward operation.
    • update(): update parameters according to installed optimizer.
    • get_outputs(): get outputs of the previous forward operation.
    • get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.
    • update_metric(metric, labels): update the performance metric using the outputs of the previous forward computation.
  • other properties (mostly for backward compatibility)
    • symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the current symbol being used. For other modules, this value might not be well defined.

When those intermediate-level APIs are implemented properly, the following high-level APIs become automatically available for a module:

  • fit: train the module parameters on a data set
  • predict: run prediction on a data set and collect outputs
  • score: run prediction on a data set and evaluate performance

Examples

An example of creating an MXNet module:
>>> import mxnet as mx
>>> data = mx.symbol.Variable('data')
>>> fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
>>> fc2  = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
>>> act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
>>> fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
>>> out  = mx.symbol.SoftmaxOutput(fc3, name='softmax')
>>> mod = mx.mod.Module(out)
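Following the example above, a small sketch of inspecting the state flags described earlier (the module starts unbound, with no parameters initialized):
>>> print mod.binded, mod.params_initialized
False False
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
>>> mod.init_params()
>>> print mod.binded, mod.params_initialized
True True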
forward_backward(data_batch)

A convenient function that calls both forward and backward.
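A minimal usage sketch (data_batch is assumed to be a DataBatch from a DataIter); this is equivalent to calling forward(data_batch, is_train=True) followed by backward():
>>> mod.forward_backward(data_batch)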

score(eval_data, eval_metric, num_batch=None, batch_end_callback=None, score_end_callback=None, reset=True, epoch=0)

Run prediction on eval_data and evaluate the performance according to eval_metric.

Parameters:
  • eval_data (DataIter) –
  • eval_metric (EvalMetric) –
  • num_batch (int) – Number of batches to run. Default is None, indicating run until the DataIter finishes.
  • batch_end_callback (function) – Could also be a list of functions.
  • reset (bool) – Default True, indicating whether we should reset eval_data before starting the evaluation.
  • epoch (int) – Default 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.

Examples

An example of using score for evaluation:
>>> #Evaluate accuracy on val_dataiter
>>> metric = mx.metric.Accuracy()
>>> mod.score(val_dataiter, metric)
iter_predict(eval_data, num_batch=None, reset=True)

Iterate over predictions.

for pred, i_batch, batch in module.iter_predict(eval_data):
    # pred is a list of outputs from the module
    # i_batch is an integer
    # batch is the data batch from the data iterator
Parameters:
  • eval_data (DataIter) –
  • num_batch (int) – Default is None, indicating that all the batches in the data iterator will be run.
  • reset (bool) – Default is True, indicating whether we should reset the data iterator before starting prediction.
predict(eval_data, num_batch=None, merge_batches=True, reset=True, always_output_list=False)

Run prediction and collect the outputs.

Parameters:
  • eval_data (DataIter) –
  • num_batch (int) – Default is None, indicating that all the batches in the data iterator will be run.
  • merge_batches (bool) – Default is True; see the doc for the return values.
  • reset (bool) – Default is True, indicating whether we should reset the data iterator before starting prediction.
  • always_output_list (bool) – Default is False; see the doc for the return values.
Returns:

When merge_batches is True (the default), the return value is a list [out1, out2, out3], where each element is the concatenation of the outputs for all of the mini-batches. If, in addition, always_output_list is False (the default), then in the case of a single output, out1 is returned instead of [out1]. When merge_batches is False, the return value is a nested list like [[out1_batch1, out2_batch1], [out1_batch2], ...]. This mode is useful because in some cases (e.g. bucketing) the module does not necessarily produce the same number of outputs. The objects in the results are NDArrays. If you need to work with numpy arrays, call .asnumpy() on each NDArray.

Examples

An example of using predict for prediction:
>>> #Predict on the first 10 batches of val_dataiter
>>> mod.predict(eval_data=val_dataiter, num_batch=10)
fit(train_data, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), eval_end_callback=None, eval_batch_end_callback=None, initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_rebind=False, force_init=False, begin_epoch=0, num_epoch=None, validation_metric=None, monitor=None)

Train the module parameters.

Parameters:
  • train_data (DataIter) –
  • eval_data (DataIter) – If not None, will be used as validation set and evaluate the performance after each epoch.
  • eval_metric (str or EvalMetric) – Default ‘acc’. The performance measure displayed during training.
  • epoch_end_callback (function or list of function) – Each callback will be called with the current epoch, symbol, arg_params and aux_params.
  • batch_end_callback (function or list of function) – Each callback will be called with a BatchEndParam.
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The parameters for the optimizer constructor. The default value is not a dict, just to avoid pylint warning on dangerous default values.
  • eval_end_callback (function or list of function) – These will be called at the end of each full evaluation, with the metrics over the entire evaluation set.
  • eval_batch_end_callback (function or list of function) – These will be called at the end of each minibatch during evaluation
  • initializer (Initializer) – Will be called to initialize the module parameters if not already initialized.
  • arg_params (dict) – Default None. If not None, this should contain existing parameters from a trained model or loaded from a checkpoint (a previously saved model). In this case, the values here will be used to initialize the module parameters, unless they have already been initialized by the user via a call to init_params or fit. arg_params has higher priority than initializer.
  • aux_params (dict) – Default None. Similar to arg_params, except for auxiliary states.
  • allow_missing (bool) – Default False. Indicate whether we allow missing parameters when arg_params and aux_params are not None. If this is True, then the missing parameters will be initialized via the initializer.
  • force_rebind (bool) – Default False. Whether to force rebinding the executors if already binded.
  • force_init (bool) – Default False. Indicate whether we should force initialization even if the parameters are already initialized.
  • begin_epoch (int) – Default 0. Indicate the starting epoch. Usually, if we are resuming from a checkpoint saved at a previous training phase at epoch N, then we should specify this value as N+1.
  • num_epoch (int) – Number of epochs to run training.

Examples

An example of using fit for training:
>>> #Assume training dataIter and validation dataIter are ready
>>> mod.fit(train_data=train_dataiter, eval_data=val_dataiter,
            optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
            num_epoch=10)
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels (either it is a module without a loss function, or it is not bound for training), this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Get parameters; these are potentially copies of the actual parameters used for computation on the device.

Returns:(arg_params, aux_params), a pair of dictionaries mapping parameter names to NDArray values.

Examples

An example of getting module parameters:
>>> print mod.get_params()
({'fc2_weight': <NDArray 64x128 @cpu(0)>, 'fc1_weight': <NDArray 128x100 @cpu(0)>,
'fc3_bias': <NDArray 10 @cpu(0)>, 'fc3_weight': <NDArray 10x64 @cpu(0)>,
'fc2_bias': <NDArray 64 @cpu(0)>, 'fc1_bias': <NDArray 128 @cpu(0)>}, {})
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.

Examples

An example of initializing module parameters:
>>> mod.init_params()
set_params(arg_params, aux_params, allow_missing=False, force_init=True)

Assign parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.
  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.

Examples

An example of setting module parameters:
>>> sym, arg_params, aux_params = \
>>>     mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
save_params(fname)

Save model parameters to file.

Parameters:fname (str) – Path to output param file.

Examples

An example of saving module parameters:
>>> mod.save_params('myfile')
load_params(fname)

Load model parameters from file.

Parameters:fname (str) – Path to input param file.

Examples

An example of loading module parameters:
>>> mod.load_params('myfile')
install_monitor(mon)

Install a monitor on all executors.
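An example of installing a monitor (a hedged sketch; the interval value is arbitrary):
>>> mon = mx.mon.Monitor(interval=100)
>>> mod.install_monitor(mon)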

forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.

Examples

An example of forward computation:
>>> from collections import namedtuple
>>> Batch = namedtuple('Batch', ['data'])
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
>>> mod.init_params()
>>> data1 = [mx.nd.ones([1, 10, 10])]
>>> mod.forward(Batch(data1))
>>> print mod.get_outputs()[0].asnumpy()
[[ 0.09999977  0.10000153  0.10000716  0.10000195  0.09999853  0.09999743
   0.10000272  0.10000113  0.09999088  0.09999888]]
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

Examples

An example of backward computation:
>>> mod.backward()
>>> print mod.get_input_grads()[0].asnumpy()
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
     5.46342608e-06   8.44196393e-07]
     ...]]
get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, the result is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArray might live on different devices.

Examples

An example of getting forward output:
>>> print mod.get_outputs()[0].asnumpy()
[[ 0.09999977  0.10000153  0.10000716  0.10000195  0.09999853  0.09999743
   0.10000272  0.10000113  0.09999088  0.09999888]]
get_input_grads(merge_multi_context=True)

Get the gradients to the inputs, computed in the previous backward computation.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, the result is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArray might live on different devices.

Examples

An example of getting input gradients:
>>> print mod.get_input_grads()[0].asnumpy()
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
    5.46342608e-06   8.44196393e-07]
    ...]]
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

Examples

An example of updating module parameters:
>>> mod.init_optimizer(kvstore='local', optimizer='sgd',
>>>                    optimizer_params=(('learning_rate', 0.01), ))
>>> mod.backward()
>>> mod.update()
>>> print mod.get_params()[0]['fc3_weight'].asnumpy()
[[  5.86930104e-03   5.28078526e-03  -8.88729654e-03  -1.08308345e-03
    6.13054074e-03   4.27560415e-03   1.53817423e-03   4.62131854e-03
    4.69872449e-03  -2.42400169e-03   9.94111411e-04   1.12386420e-03
    ...]]
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.

Examples

An example of updating evaluation metric:
>>> mod.forward(data_batch)
>>> mod.update_metric(metric, data_batch.label)
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this set to True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).

Examples

An example of binding symbols:
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
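A hedged variant for training: bind with both data and label shapes taken from a hypothetical DataIter named train_iter, forcing a rebind in case the module is already bound:
>>> mod.bind(data_shapes=train_iter.provide_data,
>>>          label_shapes=train_iter.provide_label,
>>>          for_training=True, force_rebind=True)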
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.

Examples

An example of initializing the optimizer:
>>> mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.005),))
symbol

Get the symbol associated with this module.

Except for Module, for other types of modules (e.g. BucketingModule), this property might not be constant throughout the module's lifetime. Some modules might not even be associated with any symbol.

Built-in Modules API

A Module implements the BaseModule API by wrapping a Symbol and one or more Executors for data parallelization.

A BucketingModule implements the BaseModule API and allows multiple symbols to be used, depending on the bucket_key provided by each mini-batch of data.

SequentialModule is a container module that chains a number of modules together.

class mxnet.module.sequential_module.SequentialModule(logger=<module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>)

A SequentialModule is a container module that can chain multiple modules together. Note that building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should only be used as a handy utility.

add(module, **kwargs)

Add a module to the chain.

Parameters:
  • module (BaseModule) – The new module to add.
  • kwargs (**keywords) –

    All the keyword arguments are saved as meta information for the added module. The currently known meta includes

    • take_labels: indicating whether the module expects to take labels when doing computation. Note that any module in the chain can take labels (not necessarily only the topmost one), and they all take the same labels passed from the original data batch of the SequentialModule.
Returns:This function returns self to allow us to easily chain a series of add calls.

Examples

An example of adding two modules to a chain:
>>> seq_mod = mx.mod.SequentialModule()
>>> seq_mod.add(mod1)
>>> seq_mod.add(mod2)
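Since add returns self, the calls can also be chained; a hedged sketch where mod1 and mod2 are hypothetical modules and only the last one consumes labels:
>>> seq_mod = mx.mod.SequentialModule().add(mod1).add(mod2, take_labels=True)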
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Get data shapes.

Returns:A list of (name, shape) pairs. The data shapes of the first module are the data shapes of the SequentialModule.

label_shapes

Get label shapes.

Returns:A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).

output_shapes

Get output shapes.

Returns:A list of (name, shape) pairs. The output shapes of the last module are the output shapes of the SequentialModule.

get_params()

Get current parameters.

Returns:(arg_params, aux_params), each a dictionary mapping names to parameters (as NDArray). This is a merged dictionary of all the parameters in the chained modules.

init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize parameters.

Parameters:
  • initializer (Initializer) –
  • arg_params (dict) – Default None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Default None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Default False.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this set to True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. Currently shared module is not supported for SequentialModule.
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
forward(data_batch, is_train=None)

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Default is None, in which case is_train is taken as self.for_training.
backward(out_grads=None)

Backward computation.

update()

Update parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

get_outputs(merge_multi_context=True)

Get outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, the result is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are numpy arrays.
get_input_grads(merge_multi_context=True)

Get the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Default is True. In the case when data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns:If merge_multi_context is True, the result is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
install_monitor(mon)

Install a monitor on all executors.

Writing Modules in Python

Provides some handy classes that let users easily implement a simple computation module in Python.

class mxnet.module.python_module.PythonModule(data_names, label_names, output_names, logger=<module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • data_names (list of str) – Names of the data expected by the module.
  • label_names (list of str) – Names of the labels expected by the module. Could be None if the module does not need labels.
  • output_names (list of str) – Names of the outputs.
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels (either it is a module without a loss function, or it is not bound for training), this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Get parameters; these are potentially copies of the actual parameters used for computation on the device.

Returns:({}, {}), a pair of empty dicts. Subclasses should override this method if the module contains parameters.
init_params(initializer=<mxnet.initializer.Uniform object>, arg_params=None, aux_params=None, allow_missing=False, force_init=False)

Initialize the parameters and auxiliary states. By default this function does nothing. Subclasses should override this method if the module contains parameters.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If true, will force re-initialize even if already initialized.
update()

Update parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch. By default we do nothing here. Subclasses should override this method if the module contains parameters.

update_metric(eval_metric, labels)

Evaluate and accumulate evaluation metric on outputs of the last forward computation. Subclasses should override this method if needed.

Parameters:
  • eval_metric (EvalMetric) –
  • labels (list of NDArray) – Typically data_batch.label.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Bind the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically is data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically is data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this set to True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (default to ‘write’). Can be specified globally (str) or for each argument (list, dict).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Install and initialize optimizers. By default we do nothing. Subclasses should override this method if needed.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is not a dictionary, just to avoid pylint warning of dangerous default values.
  • force_init (bool) – Default False, indicating whether we should force re-initializing the optimizer in the case an optimizer is already installed.
class mxnet.module.python_module.PythonLossModule(name='pyloss', data_names=('data', ), label_names=('softmax_label', ), logger=<module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>, grad_func=None)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • name (str) – Name of the module. The outputs will be named [name + ‘_output’].
  • data_names (list of str) – Default [‘data’]. Names of the data expected by this module. Should be a list of only one name.
  • label_names (list of str) – Default [‘softmax_label’]. Names of the labels expected by the module. Should be a list of only one name.
  • grad_func (function) – Optional. If not None, should be a function that takes scores and labels, both of type NDArray, and return the gradients with respect to the scores according to this loss function. The return value could be a numpy array or an NDArray.
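A hedged sketch of supplying a custom grad_func (the helper ce_grad is hypothetical); it assumes the scores are softmax probabilities and the labels are integer class ids:
>>> import numpy as np
>>> def ce_grad(scores, labels):
>>>     # cross-entropy gradient w.r.t. softmax scores: p - one_hot(label)
>>>     p = scores.asnumpy()
>>>     y = labels.asnumpy().astype('int32')
>>>     p[np.arange(p.shape[0]), y] -= 1.0
>>>     return p  # a numpy array is accepted, per the grad_func description above
>>> loss_mod = mx.mod.PythonLossModule(grad_func=ce_grad)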
forward(data_batch, is_train=None)

Forward computation. Here we do nothing except keep a reference to the scores and the labels so that we can do the backward computation.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
get_outputs(merge_multi_context=True)

Get outputs of the previous forward computation. As an output loss module, we treat the inputs to this module as scores and simply return them.

Parameters:merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computing.
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
get_input_grads(merge_multi_context=True)

Get the gradients to the inputs, computed in the previous backward computation.

Parameters:merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computation.
install_monitor(mon)

Install a monitor on all executors.

Next Steps

  • See Model API for an alternative simple high-level interface for training neural networks.
  • See Symbolic API for the symbolic operations used to assemble neural networks from layers.
  • See IO Data Loading API for parsing and loading data.
  • See NDArray API for vector/matrix/tensor operations.
  • See KVStore API for multi-GPU and multi-host distributed training.