Model API

The model API provides a simplified way to train neural networks using common best practices. It is a thin wrapper built on top of the ndarray and symbolic modules that makes neural network training easy.

Topics:

  • Train the Model
  • Save the Model
  • Periodic Checkpointing
  • Use Multiple Devices
  • Initializer API Reference
  • Evaluation Metric API Reference
  • Optimizer API Reference
  • Model API Reference

Train the Model

To train a model, perform two steps: configure the model using the symbol parameter, then call mx.model.FeedForward.create to create the model. The following example creates a two-layer neural network.

    # configure a two-layer neural network
    data = mx.symbol.Variable('data')
    fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
    fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
    softmax = mx.symbol.SoftmaxOutput(fc2, name='sm')
    # create a model
    model = mx.model.FeedForward.create(
         softmax,
         X=data_set,
         num_epoch=num_epoch,
         learning_rate=0.01)

You can also create a model in the scikit-learn style: construct the model first, then call fit.

    # create a model using sklearn-style two-step way
    model = mx.model.FeedForward(
         softmax,
         num_epoch=num_epoch,
         learning_rate=0.01)

    model.fit(X=data_set)

For more information, see Model API Reference.

Save the Model

After training is done, save your work. You can pickle the model directly with Python, but we also provide save and load functions.

    # save a model to mymodel-symbol.json and mymodel-0100.params
    prefix = 'mymodel'
    iteration = 100
    model.save(prefix, iteration)

    # load model back
    model_loaded = mx.model.FeedForward.load(prefix, iteration)

The advantage of these save and load functions is that they are language agnostic. You should also be able to save to and load from cloud storage, such as Amazon S3 and HDFS, directly.
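
For example, assuming MXNet was built with Amazon S3 support, the same save and load calls should accept an S3 prefix (the bucket and path below are placeholders):

    # hypothetical S3 location; requires an MXNet build with S3 support
    prefix = 's3://my-bucket/models/mymodel'
    model.save(prefix, iteration)
    model_loaded = mx.model.FeedForward.load(prefix, iteration)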

Periodic Checkpointing

We recommend checkpointing your model after each iteration. To do this, add a checkpoint callback do_checkpoint(path) to the create call. The training process then automatically saves a checkpoint to the specified location after each iteration.

    prefix='models/chkpt'
    model = mx.model.FeedForward.create(
         softmax,
         X=data_set,
         iter_end_callback=mx.callback.do_checkpoint(prefix),
         ...)

You can load the model checkpoint later using FeedForward.load.
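
For example, to resume from the checkpoint written after a given epoch (the epoch number below is illustrative):

    # loads models/chkpt-symbol.json and models/chkpt-0002.params
    model = mx.model.FeedForward.load(prefix, 2)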

Use Multiple Devices

Set ctx to the list of devices that you want to train on.

    devices = [mx.gpu(i) for i in range(num_device)]
    model = mx.model.FeedForward.create(
         softmax,
         X=data_set,
         ctx=devices,
         ...)

Training occurs in parallel on the GPUs that you specify.

Initializer API Reference

Weight initializer.

class mxnet.initializer.InitDesc

Descriptor for the initialization pattern.

Parameters:
  • name (str) – Name of the variable.
  • attrs (dict of str to str) – Attributes of this variable taken from Symbol.attr_dict.
  • global_init (Initializer) – Global initializer to fall back to.
mxnet.initializer.register(klass)

Register an initializer to the initializer factory.

class mxnet.initializer.Initializer(**kwargs)

The base class of an initializer.

dumps()

Save the initializer to a string.

class mxnet.initializer.Load(param, default_init=None, verbose=False)

Initialize by loading data from file or dict.

Parameters:
  • param (str or dict of str->NDArray) – Parameter file or dict mapping name to NDArray.
  • default_init (Initializer) – Default initializer when name is not found in param.
  • verbose (bool) – Log source when initializing.
class mxnet.initializer.Mixed(patterns, initializers)

Initialize parameters using multiple initializers.

Parameters:
  • patterns (list of str) – List of regular expressions matching parameter names.
  • initializers (list of Initializer) – List of initializers corresponding to patterns.

Example

>>> # Given 'module', an instance of 'mxnet.module.Module', initialize biases to zero
... # and every other parameter to random values with uniform distribution.
...
>>> init = mx.initializer.Mixed(['bias', '.*'], [mx.init.Zero(), mx.init.Uniform(0.1)])
>>> module.init_params(init)
>>>
>>> for dictionary in module.get_params():
...     for key in dictionary:
...         print(key)
...         print(dictionary[key].asnumpy())
...
fullyconnected1_weight
[[ 0.0097627   0.01856892  0.04303787]]
fullyconnected1_bias
[ 0.]
class mxnet.initializer.Zero

Initialize the weight to 0.

class mxnet.initializer.One

Initialize the weight to 1.

class mxnet.initializer.Constant(value)

Initialize the weight to a scalar value.

class mxnet.initializer.Uniform(scale=0.07)

Initialize the weight with value uniformly sampled from [-scale, scale].

Parameters:scale (float, optional) – The scale of uniform distribution.
class mxnet.initializer.Normal(sigma=0.01)

Initialize the weight with value sampled according to normal(0, sigma).

Parameters:sigma (float, optional) – Standard deviation for gaussian distribution.
class mxnet.initializer.Orthogonal(scale=1.414, rand_type='uniform')

Initialize weight as orthogonal matrix.

This initializer implements Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, available at https://arxiv.org/abs/1312.6120.

Parameters:
  • scale (float, optional) – Scaling factor of weight.
  • rand_type (str, optional) – Use “uniform” or “normal” random number to initialize weight.
class mxnet.initializer.Xavier(rnd_type='uniform', factor_type='avg', magnitude=3)

Initialize the weight with Xavier or other similar schemes.

Parameters:
  • rnd_type (str, optional) – Random generator type; can be ‘gaussian’ or ‘uniform’.
  • factor_type (str, optional) – Can be avg, in, or out.
  • magnitude (float, optional) – Scale of random number range.
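
Example

A brief sketch of supplying an initializer to a model via the initializer argument; the hyperparameter values are placeholders, and softmax and num_epoch refer to the training example earlier in this document.

>>> init = mx.initializer.Xavier(rnd_type='gaussian', factor_type='in', magnitude=2)
>>> model = mx.model.FeedForward(softmax, num_epoch=num_epoch, initializer=init)
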
class mxnet.initializer.MSRAPrelu(factor_type='avg', slope=0.25)

Initialize the weight according to a MSRA paper.

This initializer implements Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, available at https://arxiv.org/abs/1502.01852.

Parameters:
  • factor_type (str, optional) – Can be avg, in, or out.
  • slope (float, optional) – Initial slope of any PReLU (or similar) nonlinearities.
class mxnet.initializer.Bilinear

Initialize weight for upsampling layers.

class mxnet.initializer.LSTMBias(forget_bias)

Initialize all biases of an LSTMCell to 0.0 except for the forget gate, whose bias is set to a custom value.

Parameters:
  • forget_bias (float) – Bias for the forget gate. Jozefowicz et al. 2015 recommends setting this to 1.0.
class mxnet.initializer.FusedRNN(init, num_hidden, num_layers, mode, bidirectional=False, forget_bias=1.0)

Initialize parameters for fused rnn layers.

Parameters:
  • init (Initializer) – Initializer applied to unpacked weights. Falls back to the global initializer if None.
  • num_hidden (int) – Should be the same as the argument passed to FusedRNNCell.
  • num_layers (int) – Should be the same as the argument passed to FusedRNNCell.
  • mode (str) – Should be the same as the argument passed to FusedRNNCell.
  • bidirectional (bool) – Should be the same as the argument passed to FusedRNNCell.
  • forget_bias (float) – Should be the same as the argument passed to FusedRNNCell.

Evaluation Metric API Reference

Online evaluation metric module.

mxnet.metric.check_label_shapes(labels, preds, shape=0)

Check to see if the two arrays are the same size.

class mxnet.metric.EvalMetric(name, num=None)

Base class of all evaluation metrics.

update(label, pred)

Update the internal evaluation.

Parameters:
  • labels (list of NDArray) – The labels of the data.
  • preds (list of NDArray) – Predicted values.
reset()

Clear the internal statistics to initial state.

get()

Get the current evaluation result.

Returns:
  • name (str) – Name of the metric.
  • value (float) – Value of the evaluation.
get_name_value()

Get zipped name and value pairs.

class mxnet.metric.CompositeEvalMetric(**kwargs)

Manage multiple evaluation metrics.

add(metric)

Add a child metric.

get_metric(index)

Get a child metric.
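
Example

A brief sketch of composing metrics; the choice of child metrics here is arbitrary.

>>> eval_metrics = mx.metric.CompositeEvalMetric()
>>> eval_metrics.add(mx.metric.Accuracy())
>>> eval_metrics.add(mx.metric.CrossEntropy())

The composite can then be passed wherever an eval_metric is expected, for example model.score(val_iter, eval_metric=eval_metrics).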

class mxnet.metric.Accuracy

Calculate accuracy.

class mxnet.metric.TopKAccuracy(**kwargs)

Calculate top k predictions accuracy.

class mxnet.metric.F1

Calculate the F1 score of a binary classification problem.

class mxnet.metric.Perplexity(ignore_label, axis=-1)

Calculate perplexity.

Parameters:
  • ignore_label (int or None) – Index of invalid label to ignore when counting. Usually should be -1. Include all entries if None.
  • axis (int (default -1)) – The axis from prediction that was used to compute softmax. By default use the last axis.
class mxnet.metric.MAE

Calculate Mean Absolute Error (MAE) loss.

class mxnet.metric.MSE

Calculate Mean Squared Error (MSE) loss.

class mxnet.metric.RMSE

Calculate Root Mean Squared Error (RMSE) loss.

class mxnet.metric.CrossEntropy(eps=1e-08)

Calculate Cross Entropy loss.

class mxnet.metric.Torch(name='torch')

Dummy metric for torch criterions.

class mxnet.metric.Caffe

Dummy metric for caffe criterions.

class mxnet.metric.CustomMetric(feval, name=None, allow_extra_outputs=False)

Custom evaluation metric that takes an NDArray function.

Parameters:
  • feval (callable(label, pred)) – Customized evaluation function.
  • name (str, optional) – The name of the metric.
  • allow_extra_outputs (bool) – If true, the prediction outputs can have extra outputs. This is useful in RNN, where the states are also produced in outputs for forwarding.
mxnet.metric.np(numpy_feval, name=None, allow_extra_outputs=False)

Create a customized metric from numpy function.

Parameters:
  • numpy_feval (callable(label, pred)) – Customized evaluation function. This will get called with the labels and predictions for a minibatch, each as NumPy arrays. This function should return a single float.
  • name (str, optional) – The name of the metric.
  • allow_extra_outputs (bool) – If true, the prediction outputs can have extra outputs. This is useful in RNN, where the states are also produced in outputs for forwarding.
mxnet.metric.create(metric, **kwargs)

Create an evaluation metric.

Parameters:metric (str or callable) – The name of the metric, or a function providing statistics given pred, label NDArray.
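
Example

A brief sketch of creating metrics; the zero_one_error function is a made-up example of a NumPy-based metric.

>>> def zero_one_error(label, pred):
...     # label and pred arrive as NumPy arrays for each minibatch
...     return (pred.argmax(axis=1) != label).mean()
...
>>> err_metric = mx.metric.np(zero_one_error, name='zero-one-error')
>>> acc_metric = mx.metric.create('acc')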

Optimizer API Reference

Weight updating functions.

class mxnet.optimizer.Optimizer(rescale_grad=1.0, param_idx2name=None, wd=0.0, clip_gradient=None, learning_rate=0.01, lr_scheduler=None, sym=None, begin_num_update=0)

The base class inherited by all optimizers.

Parameters:
  • rescale_grad (float, optional) – Multiply the gradient with rescale_grad before updating. Often choose to be 1.0/batch_size.
  • param_idx2name (dict from int to string, optional) – A dictionary that maps int index to string name.
  • clip_gradient (float, optional) – Clip the gradient by projecting onto the box [-clip_gradient, clip_gradient].
  • learning_rate (float, optional) – The initial learning rate.
  • lr_scheduler (LRScheduler, optional) – The learning rate scheduler.
  • wd (float, optional) – The weight decay (or L2 regularization) coefficient. Modifies objective by adding a penalty for having large weights.
  • sym (Symbol, optional) – The Symbol this optimizer is applying to.
  • begin_num_update (int, optional) – The initial number of updates
static register(klass)

Register a new optimizer.

Once an optimizer is registered, we can create an instance of this optimizer with create_optimizer later.

Examples

>>> @mx.optimizer.Optimizer.register
... class MyOptimizer(mx.optimizer.Optimizer):
...     pass
>>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
>>> print(type(optim))
<class '__main__.MyOptimizer'>
static create_optimizer(name, **kwargs)

Instantiate an optimizer with a given name and kwargs.

Notes

We can use the alias create for Optimizer.create_optimizer

Parameters:
  • name (str) – Name of the optimizer. Should be the name of a subclass of Optimizer. Case insensitive.
  • kwargs (dict) – Parameters for the optimizer.
Returns:

An instantiated optimizer.

Return type:

Optimizer

Examples

>>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
>>> type(sgd)
<class 'mxnet.optimizer.SGD'>
>>> adam = mx.optimizer.create('adam', learning_rate=.1)
>>> type(adam)
<class 'mxnet.optimizer.Adam'>
create_state(index, weight)

Create auxiliary state for a given weight

Some optimizers require additional states, e.g. momentum, in addition to gradients in order to update weights. This function creates such a state for a given weight, which will be used in update. This function is called only once for each weight.

Parameters:
  • index (int) – A unique index to identify the weight.
  • weight (NDArray) – The weight.
Returns:

state – The state associated with the weight.

Return type:

any obj

update(index, weight, grad, state)

Update the weight given the corresponding gradient and state.

Parameters:
  • index (int) – A unique index to identify the weight.
  • weight (NDArray) – The weight.
  • grad (NDArray) – The gradient of the objective with respect to this weight.
  • state (any obj) – The state associated with this weight.
set_lr_scale(args_lrscale)

[DEPRECATED] set lr scale. Use set_lr_mult instead.

set_lr_mult(args_lr_mult)

Set individual learning rate for each weight.

Parameters:args_lr_mult (dict of string/int to float) – Set the lr multiplier for name/index to float. Setting the multiplier by index is supported for backward compatibility, but we recommend using name and symbol.
set_wd_mult(args_wd_mult)

Set individual weight decay for each weight.

By default, the wd multiplier is 0 for all params whose name doesn’t end with _weight, if param_idx2name is provided.

Parameters:args_wd_mult (dict of string/int to float) – Set the wd multiplier for name/index to float. Setting the multiplier by index is supported for backward compatibility, but we recommend using name and symbol.
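
Example

A brief sketch of setting per-parameter multipliers by name; the parameter names are illustrative.

>>> opt = mx.optimizer.SGD(learning_rate=0.1)
>>> opt.set_lr_mult({'fc1_weight': 0.5})   # halve the learning rate for fc1_weight
>>> opt.set_wd_mult({'fc1_bias': 0.0})     # disable weight decay for fc1_bias
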
mxnet.optimizer.register(klass)

Register a new optimizer.

Once an optimizer is registered, we can create an instance of this optimizer with create_optimizer later.

Examples

>>> @mx.optimizer.Optimizer.register
... class MyOptimizer(mx.optimizer.Optimizer):
...     pass
>>> optim = mx.optimizer.Optimizer.create_optimizer('MyOptimizer')
>>> print(type(optim))
<class '__main__.MyOptimizer'>
class mxnet.optimizer.SGD(momentum=0.0, **kwargs)

The SGD optimizer with momentum and weight decay.

The optimizer updates the weight by:

    state = momentum * state + lr * rescale_grad * clip(grad, clip_gradient) + wd * weight
    weight = weight - state

This optimizer accepts the following parameters in addition to those accepted by Optimizer:

Parameters:momentum (float, optional) – The momentum value.
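
Example

A brief sketch of constructing an SGD instance and handing it to a model; the hyperparameter values are placeholders, and softmax and num_epoch refer to the training example earlier in this document.

>>> sgd = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, wd=0.0001,
...                        rescale_grad=1.0/128)
>>> model = mx.model.FeedForward(softmax, num_epoch=num_epoch, optimizer=sgd)
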
class mxnet.optimizer.DCASGD(momentum=0.0, lamda=0.04, **kwargs)

The DCASGD optimizer

This class implements the optimizer described in Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning, available at https://arxiv.org/abs/1609.08326

This optimizer accepts the following parameters in addition to those accepted by Optimizer:

Parameters:
  • momentum (float, optional) – The momentum value.
  • lamda (float, optional) – Scale DC value.
class mxnet.optimizer.NAG(**kwargs)

Nesterov accelerated SGD.

This optimizer updates each weight by:

    state = momentum * state + grad + wd * weight
    weight = weight - (lr * (grad + momentum * state))

This optimizer accepts the same arguments as SGD.

class mxnet.optimizer.SGLD(**kwargs)

Stochastic Gradient Riemannian Langevin Dynamics.

This class implements the optimizer described in the paper Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, available at https://papers.nips.cc/paper/4883-stochastic-gradient-riemannian-langevin-dynamics-on-the-probability-simplex.pdf

class mxnet.optimizer.ccSGD(*args, **kwargs)

[Deprecated] Same as sgd. Left here for backward compatibility.

class mxnet.optimizer.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, **kwargs)

The Adam optimizer.

This class implements the optimizer described in Adam: A Method for Stochastic Optimization, available at http://arxiv.org/abs/1412.6980

This optimizer accepts the following parameters in addition to those accepted by Optimizer:

Parameters:
  • beta1 (float, optional) – Exponential decay rate for the first moment estimates.
  • beta2 (float, optional) – Exponential decay rate for the second moment estimates.
  • epsilon (float, optional) – Small value to avoid division by 0.
class mxnet.optimizer.AdaGrad(eps=1e-07, **kwargs)

AdaGrad optimizer

This class implements the AdaGrad optimizer described in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, available at http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

This optimizer accepts the following parameters in addition to those accepted by Optimizer:

Parameters:eps (float, optional) – Small value to avoid division by 0.
class mxnet.optimizer.RMSProp(learning_rate=0.001, gamma1=0.9, gamma2=0.9, epsilon=1e-08, centered=False, clip_weights=None, **kwargs)

The RMSProp optimizer.

Two versions of RMSProp are implemented:

If centered=False, we follow http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf by Tieleman & Hinton, 2012.

If centered=True, we follow http://arxiv.org/pdf/1308.0850v5.pdf (38)-(45) by Alex Graves, 2013.

This optimizer accepts the following parameters in addition to those accepted by Optimizer:

Parameters:
  • gamma1 (float, optional) – Decay factor of moving average for gradient^2.
  • gamma2 (float, optional) – A “momentum” factor. Only used if centered=True.
  • epsilon (float, optional) – Small value to avoid division by 0.
  • centered (bool, optional) – Use Graves’ or Tieleman & Hinton’s version of RMSProp.
  • clip_weights (float, optional) – Clip weights into the range [-clip_weights, clip_weights].
class mxnet.optimizer.AdaDelta(rho=0.9, epsilon=1e-05, **kwargs)

The AdaDelta optimizer.

This class implements AdaDelta, an optimizer described in ADADELTA: An adaptive learning rate method, available at https://arxiv.org/abs/1212.5701

This optimizer accepts the following parameters in addition to those accepted by Optimizer:

Parameters:
  • rho (float) – Decay rate for both squared gradients and delta.
  • epsilon (float) – Small value to avoid division by 0.
class mxnet.optimizer.Ftrl(lamda1=0.01, learning_rate=0.1, beta=1, **kwargs)

Reference: Ad Click Prediction: a View from the Trenches

Parameters:
  • lamda1 (float, optional) – L1 regularization coefficient.
  • learning_rate (float, optional) – The initial learning rate.
  • beta (float, optional) – Per-coordinate learning rate correlation parameter. The per-coordinate learning rate is
    eta_{t,i} = learning_rate / (beta + sqrt(sum_{s=1}^{t} g_{s,i}^2))
mxnet.optimizer.create(name, **kwargs)

Instantiate an optimizer with a given name and kwargs.

Notes

We can use the alias create for Optimizer.create_optimizer

Parameters:
  • name (str) – Name of the optimizer. Should be the name of a subclass of Optimizer. Case insensitive.
  • kwargs (dict) – Parameters for the optimizer.
Returns:

An instantiated optimizer.

Return type:

Optimizer

Examples

>>> sgd = mx.optimizer.Optimizer.create_optimizer('sgd')
>>> type(sgd)
<class 'mxnet.optimizer.SGD'>
>>> adam = mx.optimizer.create('adam', learning_rate=.1)
>>> type(adam)
<class 'mxnet.optimizer.Adam'>
class mxnet.optimizer.Updater(optimizer)

Updater for kvstore.

set_states(states)

Set updater states.

get_states()

Get updater states.

mxnet.optimizer.get_updater(optimizer)

Return a closure of the updater needed for kvstore.

Parameters:optimizer (Optimizer) – The optimizer.
Returns:updater – The closure of the updater.
Return type:function
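
Example

A brief sketch of the returned closure, which takes (index, grad, weight) and updates the weight in place; the arrays are made up for illustration.

>>> opt = mx.optimizer.SGD(learning_rate=0.1)
>>> updater = mx.optimizer.get_updater(opt)
>>> weight = mx.nd.ones((2, 2))
>>> grad = mx.nd.ones((2, 2))
>>> updater(0, grad, weight)   # applies one SGD step to weight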

Model API Reference

MXNet model module.

mxnet.model.BatchEndParam

alias of BatchEndParams

mxnet.model.save_checkpoint(prefix, epoch, symbol, arg_params, aux_params)

Checkpoint the model data into file.

Parameters:
  • prefix (str) – Prefix of model name.
  • epoch (int) – The epoch number of the model.
  • symbol (Symbol) – The input Symbol.
  • arg_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s weights.
  • aux_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s auxiliary states.

Notes

  • prefix-symbol.json will be saved for symbol.
  • prefix-epoch.params will be saved for parameters.
mxnet.model.load_checkpoint(prefix, epoch)

Load model checkpoint from file.

Parameters:
  • prefix (str) – Prefix of model name.
  • epoch (int) – Epoch number of model we would like to load.
Returns:

  • symbol (Symbol) – The symbol configuration of computation network.
  • arg_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s weights.
  • aux_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s auxiliary states.

Notes

  • Symbol will be loaded from prefix-symbol.json.
  • Parameters will be loaded from prefix-epoch.params.
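
Example

A brief sketch of the round trip, assuming net is a trained Symbol and arg_params/aux_params are the corresponding parameter dicts; the prefix and epoch are placeholders.

>>> mx.model.save_checkpoint('mymodel', 10, net, arg_params, aux_params)
>>> sym, arg_params, aux_params = mx.model.load_checkpoint('mymodel', 10)
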
class mxnet.model.FeedForward(symbol, ctx=None, num_epoch=None, epoch_size=None, optimizer='sgd', initializer=<mxnet.initializer.Uniform object>, numpy_batch_size=128, arg_params=None, aux_params=None, allow_extra_params=False, begin_epoch=0, **kwargs)

Model class of MXNet for training and predicting feedforward nets. This class is designed for a single-data, single-output supervised network.

Parameters:
  • symbol (Symbol) – The symbol configuration of computation network.
  • ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi-GPU training, pass in a list of GPU contexts.
  • num_epoch (int, optional) – Training parameter, the number of training epochs.
  • epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size).
  • optimizer (str or Optimizer, optional) – Training parameter, name or optimizer object for training.
  • initializer (initializer function, optional) – Training parameter, the initialization scheme used.
  • numpy_batch_size (int, optional) – The batch size of training data. Only needed when input array is numpy.
  • arg_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of net’s weights.
  • aux_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of net’s auxiliary states.
  • allow_extra_params (boolean, optional) – Whether allow extra parameters that are not needed by symbol to be passed by aux_params and arg_params. If this is True, no error will be thrown when aux_params and arg_params contain more parameters than needed.
  • begin_epoch (int, optional) – The beginning training epoch.
  • kwargs (dict) – The additional keyword arguments passed to optimizer.
predict(X, num_batch=None, return_data=False, reset=True)

Run the prediction; only one device is used.

Parameters:
  • X (mxnet.DataIter) –
  • num_batch (int or None) – The number of batches to run. Goes through all batches if None.
Returns:

y – The predicted value of the output.

Return type:

numpy.ndarray or a list of numpy.ndarray if the network has multiple outputs.

score(X, eval_metric='acc', num_batch=None, batch_end_callback=None, reset=True)

Run the model given an input and calculate the score as assessed by an evaluation metric.

Parameters:
  • X (mxnet.DataIter) –
  • eval_metric (metric.EvalMetric or str) – The metric for calculating the score.
  • num_batch (int or None) – The number of batches to run. Goes through all batches if None.
Returns:

s – The final score.

Return type:

float
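
Example

A brief sketch, assuming val_iter is a validation DataIter that matches the model's input:

>>> probs = model.predict(val_iter)
>>> acc = model.score(val_iter, eval_metric='acc')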

fit(X, y=None, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', logger=None, work_load_list=None, monitor=None, eval_end_callback=<mxnet.callback.LogValidationMetricsCallback object>, eval_batch_end_callback=None)

Fit the model.

Parameters:
  • X (DataIter, or numpy.ndarray/NDArray) – Training data. If X is a DataIter, the name or (if name not available) the position of its outputs should match the corresponding variable names defined in the symbolic graph.
  • y (numpy.ndarray/NDArray, optional) – Training set label. If X is numpy.ndarray or NDArray, y is required to be set. While y can be 1D or 2D (with 2nd dimension as 1), its first dimension must be the same as X, i.e. the number of data points and labels should be equal.
  • eval_data (DataIter or numpy.ndarray/list/NDArray pair) – If eval_data is numpy.ndarray/list/NDArray pair, it should be (valid_data, valid_label).
  • eval_metric (metric.EvalMetric or str or callable) – The evaluation metric. This could be the name of evaluation metric or a custom evaluation function that returns statistics based on a minibatch.
  • epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at end of each epoch. This can be used to checkpoint model each epoch.
  • batch_end_callback (callable(epoch)) – A callback that is invoked at end of each batch for purposes of printing.
  • kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, ‘dist_async’. Defaults to ‘local’; often there is no need to change it for a single machine.
  • logger (logging logger, optional) – When not specified, default logger will be used.
  • work_load_list (list of float or int, optional) – The list of work load for different devices, in the same order as ctx.

Note

KVStore behavior:
  • ‘local’ – multiple devices on a single machine; automatically chooses the best type.
  • ‘dist_sync’ – multiple machines communicating via BSP.
  • ‘dist_async’ – multiple machines with asynchronous communication.
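
Example

A brief sketch of a typical call, assuming train_iter and val_iter are DataIter objects; the batch size and checkpoint prefix are placeholders.

>>> model.fit(X=train_iter,
...           eval_data=val_iter,
...           eval_metric='acc',
...           batch_end_callback=mx.callback.Speedometer(batch_size=128),
...           epoch_end_callback=mx.callback.do_checkpoint('models/chkpt'))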

save(prefix, epoch=None)

Checkpoint the model into a file. You can also use pickle if you only work in Python. The advantage of load and save (as compared to pickle) is that the resulting file can be loaded from other MXNet language bindings. You can also load/save directly from/to cloud storage (S3, HDFS).

Parameters:prefix (str) – Prefix of model name.

Notes

  • prefix-symbol.json will be saved for symbol.
  • prefix-epoch.params will be saved for parameters.
static load(prefix, epoch, ctx=None, **kwargs)

Load model checkpoint from file.

Parameters:
  • prefix (str) – Prefix of model name.
  • epoch (int) – Epoch number of model we would like to load.
  • ctx (Context or list of Context, optional) – The device context of training and prediction.
  • kwargs (dict) – Other parameters for model, including num_epoch, optimizer and numpy_batch_size.
Returns:

model – The loaded model that can be used for prediction.

Return type:

FeedForward

Notes

  • Symbol will be loaded from prefix-symbol.json.
  • Parameters will be loaded from prefix-epoch.params.
static create(symbol, X, y=None, ctx=None, num_epoch=None, epoch_size=None, optimizer='sgd', initializer=<mxnet.initializer.Uniform object>, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', logger=None, work_load_list=None, eval_end_callback=<mxnet.callback.LogValidationMetricsCallback object>, eval_batch_end_callback=None, **kwargs)

Functional style to create a model. This function is more consistent with functional languages such as R, where mutation is not allowed.

Parameters:
  • symbol (Symbol) – The symbol configuration of a computation network.
  • X (DataIter) – Training data.
  • y (numpy.ndarray, optional) – If X is a numpy.ndarray, y must be set.
  • ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi-GPU training, pass in a list of GPU contexts.
  • num_epoch (int, optional) – The number of training epochs.
  • epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size).
  • optimizer (str or Optimizer, optional) – The name of the chosen optimizer, or an optimizer object, used for training.
  • initializer (initializer function, optional) – The initialization scheme used.
  • eval_data (DataIter or numpy.ndarray pair) – If eval_data is a numpy.ndarray pair, it should be (valid_data, valid_label).
  • eval_metric (metric.EvalMetric or str or callable) – The evaluation metric. Can be the name of an evaluation metric or a custom evaluation function that returns statistics based on a minibatch.
  • epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at end of each epoch. This can be used to checkpoint model each epoch.
  • batch_end_callback (callable(epoch)) – A callback that is invoked at end of each batch for print purposes.
  • kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, ‘dist_async’. Defaults to ‘local’; often there is no need to change it for a single machine.
  • logger (logging logger, optional) – When not specified, default logger will be used.
  • work_load_list (list of float or int, optional) – The list of work load for different devices, in the same order as ctx.

Next Steps