MXNet Python Model API

The model API provides a simplified way to train neural networks using common best practices. It’s a thin wrapper built on top of the ndarray and symbolic modules that makes neural network training easy.

Train the Model

To train a model, perform two steps: configure the model using the symbol parameter, then call model.FeedForward.create to create the model. The following example creates a two-layer neural network.

    import mxnet as mx

    # configure a two-layer neural network
    data = mx.symbol.Variable('data')
    fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
    fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
    softmax = mx.symbol.SoftmaxOutput(fc2, name='sm')
    # create a model
    model = mx.model.FeedForward.create(
         softmax,
         X=data_set,
         num_epoch=num_epoch,
         learning_rate=0.01)

You can also use the scikit-learn style: construct the model first, then call the fit function to train it.

    # create a model in the scikit-learn two-step style
    model = mx.model.FeedForward(
         softmax,
         num_epoch=num_epoch,
         learning_rate=0.01)

    model.fit(X=data_set)

For more information, see Model API Reference.

Save the Model

After the job is done, save your work. To save the model, you can directly pickle it with Python. We also provide save and load functions.

    # save a model to mymodel-symbol.json and mymodel-0100.params
    prefix = 'mymodel'
    iteration = 100
    model.save(prefix, iteration)

    # load model back
    model_loaded = mx.model.FeedForward.load(prefix, iteration)

The advantage of these save and load functions is that they are language agnostic. You should be able to save and load directly into cloud storage, such as Amazon S3 and HDFS.
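As a rough illustration of the file layout named in the comment above, the following sketch builds the two file names that a save with a given prefix and iteration would produce. The helper `checkpoint_paths` is hypothetical and is not part of MXNet; the four-digit zero padding is inferred from the `mymodel-0100.params` example.

```python
def checkpoint_paths(prefix, iteration):
    """Return (symbol_file, params_file) for a checkpoint.

    Hypothetical helper: it only mirrors the naming shown in this
    document and does not call MXNet.
    """
    symbol_file = '%s-symbol.json' % prefix
    params_file = '%s-%04d.params' % (prefix, iteration)
    return symbol_file, params_file


print(checkpoint_paths('mymodel', 100))
```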

Periodic Checkpointing

We recommend checkpointing your model after each iteration. To do this, pass a checkpoint callback, do_checkpoint(path), to the training function. The training process then automatically writes a checkpoint to the specified location after each iteration.

    prefix='models/chkpt'
    model = mx.model.FeedForward.create(
         softmax,
         X=data_set,
         epoch_end_callback=mx.callback.do_checkpoint(prefix),
         ...)

You can load the model checkpoint later using FeedForward.load.
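To make the callback mechanism more concrete, here is a minimal sketch of how a checkpoint callback factory could work. It is plain Python and does not use MXNet: `make_checkpoint_callback` and `save_fn` are hypothetical names, and numbering the saved checkpoint as epoch + 1 is an assumption.

```python
def make_checkpoint_callback(prefix, save_fn):
    """Return a callback that checkpoints at the end of every epoch.

    `save_fn(prefix, epoch, symbol, arg_params, aux_params)` is a
    stand-in for whatever actually writes the files.
    """
    def _callback(epoch, symbol, arg_params, aux_params):
        # Assumption: training counts epochs from 0, so the saved
        # checkpoint is numbered epoch + 1.
        save_fn(prefix, epoch + 1, symbol, arg_params, aux_params)
    return _callback


saved = []
callback = make_checkpoint_callback(
    'models/chkpt',
    lambda prefix, epoch, *rest: saved.append((prefix, epoch)))
for epoch in range(3):  # pretend to train for three epochs
    callback(epoch, None, {}, {})
print(saved)
```

The training loop only needs to know the callback signature; where and how the files are written stays inside the closure.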

Use Multiple Devices

Set ctx to the list of devices that you want to train on.

    devices = [mx.gpu(i) for i in range(num_device)]
    model = mx.model.FeedForward.create(
         softmax,
         X=data_set,
         ctx=devices,
         ...)

Training occurs in parallel on the GPUs that you specify.

Initializer API Reference

Initialization helper for mxnet

class mxnet.initializer.Initializer

Base class for Initializer.

class mxnet.initializer.Load(param, default_init=None, verbose=False)

Initialize by loading pretrained param from file or dict

Parameters:
  • param (str or dict of str->NDArray) – param file or dict mapping name to NDArray.
  • default_init (Initializer) – default initializer when name is not found in param.
  • verbose (bool) – log source when initializing.
class mxnet.initializer.Mixed(patterns, initializers)

Initialize with mixed Initializer

Parameters:
  • patterns (list of str) – list of regular expression patterns to match parameter names.
  • initializers (list of Initializer) – list of Initializer corresponding to patterns
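The pairing of patterns and initializers can be pictured with the sketch below. It is plain Python, not MXNet code: `pick_initializer` is a hypothetical helper, the initializers are replaced by placeholder strings, and the rule that the first matching pattern wins is an assumption.

```python
import re


def pick_initializer(name, patterns, initializers):
    """Return the initializer paired with the first pattern matching name."""
    for pattern, init in zip(patterns, initializers):
        if re.match(pattern, name):
            return init
    raise ValueError('no pattern matches parameter %r' % name)


patterns = ['.*bias', '.*']
initializers = ['zero-init', 'uniform-init']  # placeholder labels
print(pick_initializer('fc1_bias', patterns, initializers))
print(pick_initializer('fc1_weight', patterns, initializers))
```

Under this assumption, a catch-all pattern such as `.*` should come last so that it only handles names no earlier pattern matched.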
class mxnet.initializer.Uniform(scale=0.07)

Initialize the weight with uniform [-scale, scale]

Parameters:scale (float, optional) – The scale of uniform distribution
class mxnet.initializer.Normal(sigma=0.01)

Initialize the weight with normal(0, sigma)

Parameters:sigma (float, optional) – Standard deviation for gaussian distribution.
class mxnet.initializer.Orthogonal(scale=1.414, rand_type='uniform')

Initialize weight as an orthogonal matrix.

Parameters:
  • scale (float, optional) – scaling factor of weight
  • rand_type (string, optional) – use “uniform” or “normal” random numbers to initialize weight

Reference: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, arXiv preprint.
class mxnet.initializer.Xavier(rnd_type='uniform', factor_type='avg', magnitude=3)

Initialize the weight with Xavier or similar initialization scheme.

Parameters:
  • rnd_type (str, optional) – Use `gaussian` or `uniform` to init
  • factor_type (str, optional) – Use `avg`, `in`, or `out` to init
  • magnitude (float, optional) – scale of random number range
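The sketch below shows one common way the three parameters combine into the scale of the random range. It is an illustration under stated assumptions, not the library's code: the weight shape is assumed to be (fan_out, fan_in, *kernel_dims), and `xavier_scale` is a hypothetical helper.

```python
import math


def xavier_scale(shape, factor_type='avg', magnitude=3):
    """Scale of the random range for a weight of the given shape.

    Assumes shape = (fan_out, fan_in, *kernel_dims).
    """
    receptive_field = 1
    for dim in shape[2:]:
        receptive_field *= dim
    fan_in = shape[1] * receptive_field
    fan_out = shape[0] * receptive_field
    if factor_type == 'avg':
        factor = (fan_in + fan_out) / 2.0
    elif factor_type == 'in':
        factor = float(fan_in)
    elif factor_type == 'out':
        factor = float(fan_out)
    else:
        raise ValueError('factor_type must be avg, in, or out')
    return math.sqrt(magnitude / factor)


# A 64 x 128 fully connected weight with the default settings.
print(xavier_scale((64, 128)))
```

With `rnd_type='uniform'` the weights would then be drawn from [-scale, scale]; with `gaussian` the scale would serve as the standard deviation.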
class mxnet.initializer.MSRAPrelu(factor_type='avg', slope=0.25)

Initialize the weight with the initialization scheme from Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Parameters:
  • factor_type (str, optional) – Use `avg`, `in`, or `out` to init
  • slope (float, optional) – initial slope of any PReLU (or similar) nonlinearities.
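For reference, the paper named above derives a Gaussian standard deviation of sqrt(2 / ((1 + slope^2) * fan)). The sketch below computes that value; `msra_prelu_std` is a hypothetical helper, and how `fan` is chosen from `factor_type` is left to the caller.

```python
import math


def msra_prelu_std(fan, slope=0.25):
    """Standard deviation sqrt(2 / ((1 + slope**2) * fan))."""
    return math.sqrt(2.0 / ((1.0 + slope ** 2) * fan))


# With slope=0 (plain ReLU) this reduces to sqrt(2 / fan).
print(msra_prelu_std(128, slope=0.0))
print(msra_prelu_std(128))
```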

Evaluation Metric API Reference

Online evaluation metric module.

mxnet.metric.check_label_shapes(labels, preds, shape=0)

Check to see if the two arrays are the same size.

class mxnet.metric.EvalMetric(name, num=None)

Base class of all evaluation metrics.

update(label, pred)

Update the internal evaluation.

Parameters:
  • labels (list of NDArray) – The labels of the data.
  • preds (list of NDArray) – Predicted values.
reset()

Clear the internal statistics to initial state.

get()

Get the current evaluation result.

Returns:
  • name (str) – Name of the metric.
  • value (float) – Value of the evaluation.
get_name_value()

Get zipped name and value pairs
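The update/reset/get cycle can be illustrated with a toy metric written in plain Python. `RunningAccuracy` is a hypothetical class that does not derive from EvalMetric; it uses Python lists in place of NDArray and only mimics the interface described above.

```python
class RunningAccuracy(object):
    """Toy metric with an update/reset/get interface."""

    def __init__(self, name='accuracy'):
        self.name = name
        self.reset()

    def reset(self):
        # Clear the accumulated statistics.
        self.num_correct = 0
        self.num_seen = 0

    def update(self, labels, preds):
        # labels: class indices; preds: one list of scores per example.
        for label, scores in zip(labels, preds):
            predicted = max(range(len(scores)), key=scores.__getitem__)
            self.num_correct += int(predicted == label)
            self.num_seen += 1

    def get(self):
        if self.num_seen == 0:
            return self.name, float('nan')
        return self.name, self.num_correct / float(self.num_seen)


metric = RunningAccuracy()
metric.update([0, 1, 1], [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(metric.get())
```

Calling `update` once per minibatch and `get` at the end of an epoch yields a running average; `reset` starts the next epoch from a clean state.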

class mxnet.metric.CompositeEvalMetric(**kwargs)

Manage multiple evaluation metrics.

add(metric)

Add a child metric.

get_metric(index)

Get a child metric.

class mxnet.metric.Accuracy

Calculate accuracy

class mxnet.metric.TopKAccuracy(**kwargs)

Calculate top-k prediction accuracy.
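A prediction counts as correct under top-k accuracy when the true label is among the k highest-scoring classes. The sketch below shows the idea in plain Python; `top_k_accuracy` is a hypothetical function operating on lists, not the metric class itself.

```python
def top_k_accuracy(labels, preds, k=1):
    """Fraction of examples whose label is among the k top-scored classes."""
    hits = 0
    for label, scores in zip(labels, preds):
        ranked = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)
        hits += int(label in ranked[:k])
    return hits / float(len(labels))


labels = [2, 0, 1]
preds = [[0.5, 0.3, 0.2], [0.3, 0.6, 0.1], [0.2, 0.7, 0.1]]
print(top_k_accuracy(labels, preds, k=1))
print(top_k_accuracy(labels, preds, k=2))
```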

class mxnet.metric.F1

Calculate the F1 score of a binary classification problem.

class mxnet.metric.MAE

Calculate Mean Absolute Error loss

class mxnet.metric.MSE

Calculate Mean Squared Error loss

class mxnet.metric.RMSE

Calculate Root Mean Squared Error loss
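The three regression metrics are closely related, as the following plain-Python sketch shows. The function names are illustrative only and operate on lists of floats rather than NDArray.

```python
import math


def mean_absolute_error(labels, preds):
    return sum(abs(l - p) for l, p in zip(labels, preds)) / float(len(labels))


def mean_squared_error(labels, preds):
    return sum((l - p) ** 2 for l, p in zip(labels, preds)) / float(len(labels))


def root_mean_squared_error(labels, preds):
    # RMSE is the square root of MSE.
    return math.sqrt(mean_squared_error(labels, preds))


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(root_mean_squared_error(y_true, y_pred))
```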

class mxnet.metric.CrossEntropy(eps=1e-08)

Calculate Cross Entropy loss
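The `eps` argument is most easily understood from the formula: it keeps the logarithm finite when the predicted probability of the true class is zero. The sketch below is a plain-Python illustration with a hypothetical function name; averaging over the examples is an assumption.

```python
import math


def cross_entropy(labels, preds, eps=1e-8):
    """Mean of -log(p[label] + eps) over all examples."""
    total = 0.0
    for label, probs in zip(labels, preds):
        total += -math.log(probs[label] + eps)
    return total / len(labels)


print(cross_entropy([0, 1], [[0.5, 0.5], [0.25, 0.75]]))
# A zero probability for the true class stays finite thanks to eps.
print(cross_entropy([0], [[0.0, 1.0]]))
```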

class mxnet.metric.Torch(name='torch')

Dummy metric for torch criterions

class mxnet.metric.Caffe

Dummy metric for caffe criterions

class mxnet.metric.CustomMetric(feval, name=None, allow_extra_outputs=False)

Custom evaluation metric that takes a NDArray function.

Parameters:
  • feval (callable(label, pred)) – Customized evaluation function.
  • name (str, optional) – The name of the metric
  • allow_extra_outputs (bool) – If true, the prediction outputs can have extra outputs. This is useful in RNN, where the states are also produced in outputs for forwarding.
mxnet.metric.np(numpy_feval, name=None, allow_extra_outputs=False)

Create a customized metric from numpy function.

Parameters:
  • numpy_feval (callable(label, pred)) – Customized evaluation function. This will get called with the labels and predictions for a minibatch, each as numpy arrays. This function should return a single float.
  • name (str, optional) – The name of the metric.
  • allow_extra_outputs (bool) – If true, the prediction outputs can have extra outputs. This is useful in RNN, where the states are also produced in outputs for forwarding.
mxnet.metric.create(metric, **kwargs)

Create an evaluation metric.

Parameters:metric (str or callable) – The name of the metric, or a function providing statistics given pred, label NDArray.

Optimizer API Reference

Common Optimization algorithms with regularizations.

class mxnet.optimizer.Optimizer(rescale_grad=1.0, param_idx2name=None, wd=0.0, clip_gradient=None, learning_rate=0.01, lr_scheduler=None, sym=None, begin_num_update=0)

Base class of all optimizers.

static register(klass)

Register optimizers to the optimizer factory

static create_optimizer(name, rescale_grad=1, **kwargs)

Create an optimizer with specified name.

Parameters:
  • name (str) – Name of required optimizer. Should be the name of a subclass of Optimizer. Case insensitive.
  • rescale_grad (float) – Rescaling factor on gradient.
  • kwargs (dict) – Parameters for optimizer
Returns:

opt – The result optimizer.

Return type:

Optimizer
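The register/create pair follows a factory pattern, which the sketch below reproduces in plain Python. It is not the MXNet code: `_REGISTRY` and `ToySGD` are hypothetical, and keying the registry on the lower-cased class name is an assumption made to get the case-insensitive lookup described above.

```python
_REGISTRY = {}


def register(klass):
    """Class decorator that records klass under its lower-cased name."""
    _REGISTRY[klass.__name__.lower()] = klass
    return klass


def create(name, rescale_grad=1, **kwargs):
    """Instantiate a registered optimizer by case-insensitive name."""
    key = name.lower()
    if key not in _REGISTRY:
        raise ValueError('cannot find optimizer %s' % name)
    return _REGISTRY[key](rescale_grad=rescale_grad, **kwargs)


@register
class ToySGD(object):
    def __init__(self, rescale_grad=1, learning_rate=0.01):
        self.rescale_grad = rescale_grad
        self.learning_rate = learning_rate


opt = create('TOYSGD', learning_rate=0.1)
print(type(opt).__name__, opt.learning_rate)
```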

create_state(index, weight)

Create additional optimizer state such as momentum. Override in implementations.

update(index, weight, grad, state)

Update the parameters. Override in implementations.

set_lr_scale(args_lrscale)

set_lr_scale is deprecated. Use set_lr_mult instead.

set_lr_mult(args_lr_mult)

Set individual learning rate multipliers for parameters.

Parameters:args_lr_mult (dict of string/int to float) – Sets the learning rate multiplier for each name/index to the given float. Setting a multiplier by index is supported for backward compatibility, but we recommend using name and symbol.
set_wd_mult(args_wd_mult)

Set individual weight decay multipliers for parameters. By default, the weight decay multiplier is 0 for all parameters whose names don’t end with _weight, if param_idx2name is provided.

Parameters:args_wd_mult (dict of string/int to float) – Sets the weight decay multiplier for each name/index to the given float. Setting a multiplier by index is supported for backward compatibility, but we recommend using name and symbol.
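The effect of the two multiplier tables can be sketched as follows. This is plain Python with a hypothetical helper, `effective_rates`; the default learning rate multiplier of 1 is an assumption, while the default weight decay multiplier follows the rule stated above.

```python
def effective_rates(name, learning_rate, wd, lr_mult=None, wd_mult=None):
    """Return (learning rate, weight decay) actually applied to `name`."""
    lr_mult = lr_mult or {}
    wd_mult = wd_mult or {}
    # Default weight decay multiplier: 1 for *_weight parameters, else 0.
    default_wd_mult = 1.0 if name.endswith('_weight') else 0.0
    lr = learning_rate * lr_mult.get(name, 1.0)
    decay = wd * wd_mult.get(name, default_wd_mult)
    return lr, decay


print(effective_rates('fc1_bias', 0.1, 1e-4))
print(effective_rates('fc1_weight', 0.1, 1e-4, lr_mult={'fc1_weight': 0.5}))
```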
mxnet.optimizer.register(klass)

Register optimizers to the optimizer factory

class mxnet.optimizer.SGD(momentum=0.0, **kwargs)

A very simple SGD optimizer with momentum and weight regularization.

Parameters:
  • learning_rate (float, optional) – learning_rate of SGD
  • momentum (float, optional) – momentum value
  • wd (float, optional) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
  • param_idx2name (dict of string/int to float, optional) – used to treat weight decay specially for parameters whose names end with bias, gamma, or beta
create_state(index, weight)

Create additional optimizer state such as momentum.

Parameters:weight (NDArray) – The weight data
update(index, weight, grad, state)

Update the parameters.

Parameters:
  • index (int) – A unique integer key used to index the parameters
  • weight (NDArray) – weight ndarray
  • grad (NDArray) – grad ndarray
  • state (NDArray or other objects returned by create_state) – The auxiliary state used in optimization.
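To show how the SGD parameters listed above interact, here is a scalar sketch of a momentum update. It is a common textbook formulation written in plain Python, not a copy of the library's update rule, and `sgd_momentum_step` is a hypothetical name; the order of rescaling, clipping, and weight decay is an assumption.

```python
def sgd_momentum_step(weight, grad, velocity, learning_rate=0.01,
                      momentum=0.0, wd=0.0, rescale_grad=1.0,
                      clip_gradient=None):
    """One SGD step on a scalar weight; returns (new_weight, new_velocity)."""
    grad = grad * rescale_grad
    if clip_gradient is not None:
        # Clip the gradient into [-clip_gradient, clip_gradient].
        grad = max(-clip_gradient, min(clip_gradient, grad))
    # The velocity plays the role of the optimizer state.
    velocity = momentum * velocity - learning_rate * (grad + wd * weight)
    return weight + velocity, velocity


w, v = 1.0, 0.0
for _ in range(2):
    w, v = sgd_momentum_step(w, 2.0, v, learning_rate=0.1, momentum=0.9)
    print(w, v)
```

With momentum set to 0 the velocity carries no history and the step reduces to plain gradient descent with L2 weight decay.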
class mxnet.optimizer.DCASGD(momentum=0.0, lamda=0.04, **kwargs)

DCASGD optimizer with momentum and weight regularization.

Implements the paper “Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning”.

Parameters:
  • learning_rate (float, optional) – learning_rate of SGD
  • momentum (float, optional) – momentum value
  • lamda (float, optional) – scale DC value
  • wd (float, optional) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
  • param_idx2name (dict of string/int to float, optional) – used to treat weight decay specially for parameters whose names end with bias, gamma, or beta
create_state(index, weight)

Create additional optimizer state such as momentum.

Parameters:weight (NDArray) – The weight data
update(index, weight, grad, state)

Update the parameters.

Parameters:
  • index (int) – A unique integer key used to index the parameters
  • weight (NDArray) – weight ndarray
  • grad (NDArray) – grad ndarray
  • state (NDArray or other objects returned by create_state) – The auxiliary state used in optimization.
class mxnet.optimizer.NAG(**kwargs)

SGD with Nesterov momentum. It is implemented according to https://github.com/torch/optim/blob/master/sgd.lua

update(index, weight, grad, state)

Update the parameters.

Parameters:
  • index (int) – A unique integer key used to index the parameters
  • weight (NDArray) – weight ndarray
  • grad (NDArray) – grad ndarray
  • state (NDArray or other objects returned by create_state) – The auxiliary state used in optimization.
class mxnet.optimizer.SGLD(**kwargs)

Stochastic Langevin Dynamics Updater to sample from a distribution.

Parameters:
  • learning_rate (float, optional) – learning_rate of SGD
  • wd (float, optional) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
  • param_idx2name (dict of string/int to float, optional) – used to treat weight decay specially for parameters whose names end with bias, gamma, or beta
create_state(index, weight)

Create additional optimizer state such as momentum.

Parameters:weight (NDArray) – The weight data
update(index, weight, grad, state)

Update the parameters.

Parameters:
  • index (int) – A unique integer key used to index the parameters
  • weight (NDArray) – weight ndarray
  • grad (NDArray) – grad ndarray
  • state (NDArray or other objects returned by create_state) – The auxiliary state used in optimization.
class mxnet.optimizer.ccSGD(*args, **kwargs)

[Deprecated] Same as sgd. Left here for backward compatibility.

class mxnet.optimizer.Adam(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, decay_factor=0.99999999, **kwargs)

Adam optimizer as described in [King2014].

[King2014]Diederik Kingma, Jimmy Ba, Adam: A Method for Stochastic Optimization, http://arxiv.org/abs/1412.6980

The code in this class was adapted from https://github.com/mila-udem/blocks/blob/master/blocks/algorithms/__init__.py#L765

Parameters:
  • learning_rate (float, optional) – Step size. Default value is set to 0.001.
  • beta1 (float, optional) – Exponential decay rate for the first moment estimates. Default value is set to 0.9.
  • beta2 (float, optional) – Exponential decay rate for the second moment estimates. Default value is set to 0.999.
  • epsilon (float, optional) – Default value is set to 1e-8.
  • decay_factor (float, optional) – Default value is set to 1 - 1e-8.
  • wd (float, optional) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
create_state(index, weight)

Create additional optimizer state: mean, variance

Parameters:weight (NDArray) – The weight data
update(index, weight, grad, state)

Update the parameters.

Parameters:
  • index (int) – A unique integer key used to index the parameters
  • weight (NDArray) – weight ndarray
  • grad (NDArray) – grad ndarray
  • state (NDArray or other objects returned by create_state) – The auxiliary state used in optimization.
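The roles of the mean and variance states can be seen in the following scalar sketch of the Adam update as given in the Kingma and Ba paper. It is plain Python, omits `decay_factor`, `wd`, `rescale_grad`, and `clip_gradient`, and is not the code of this class; `adam_step` is a hypothetical name.

```python
import math


def adam_step(weight, grad, mean, variance, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step on a scalar; t is the 1-based update count."""
    mean = beta1 * mean + (1.0 - beta1) * grad
    variance = beta2 * variance + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero-initialized moments.
    mean_hat = mean / (1.0 - beta1 ** t)
    variance_hat = variance / (1.0 - beta2 ** t)
    weight = weight - learning_rate * mean_hat / (math.sqrt(variance_hat) + epsilon)
    return weight, mean, variance


w, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
print(w, m, v)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly `learning_rate` regardless of the gradient's magnitude.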
class mxnet.optimizer.AdaGrad(eps=1e-07, **kwargs)

AdaGrad optimizer of Duchi et al., 2011.

This code follows the version in http://arxiv.org/pdf/1212.5701v1.pdf Eq(5) by Matthew D. Zeiler, 2012. AdaGrad will help the network to converge faster in some cases.

Parameters:
  • learning_rate (float, optional) – Step size. Default value is set to 0.05.
  • wd (float, optional) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • eps (float, optional) – A small float number that keeps the update numerically stable. Default value is set to 1e-7.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
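A scalar sketch of the AdaGrad rule may help here: the step is divided by the square root of the accumulated squared gradients, so frequently updated parameters take smaller steps over time. The code is plain Python with a hypothetical name, `adagrad_step`; placing `eps` inside the square root is an assumption, and weight decay and clipping are omitted.

```python
import math


def adagrad_step(weight, grad, history, learning_rate=0.05, eps=1e-7):
    """One AdaGrad step on a scalar; returns (new_weight, new_history)."""
    history = history + grad * grad
    weight = weight - learning_rate * grad / math.sqrt(history + eps)
    return weight, history


w, h = 1.0, 0.0
for _ in range(2):
    w, h = adagrad_step(w, 2.0, h)
    print(w, h)
```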
class mxnet.optimizer.RMSProp(gamma1=0.95, gamma2=0.9, **kwargs)

RMSProp optimizer of Tieleman & Hinton, 2012.

This code follows the version in http://arxiv.org/pdf/1308.0850v5.pdf Eq(38) - Eq(45) by Alex Graves, 2013.

Parameters:
  • learning_rate (float, optional) – Step size. Default value is set to 0.002.
  • gamma1 (float, optional) – decay factor of moving average for gradient, gradient^2. Default value is set to 0.95.
  • gamma2 (float, optional) – “momentum” factor. Default value is set to 0.9.
  • wd (float, optional) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
create_state(index, weight)

Create additional optimizer state: mean, variance

Parameters:weight (NDArray) – The weight data
update(index, weight, grad, state)

Update the parameters.

Parameters:
  • index (int) – A unique integer key used to index the parameters
  • weight (NDArray) – weight ndarray
  • grad (NDArray) – grad ndarray
  • state (NDArray or other objects returned by create_state) – The auxiliary state used in optimization.
class mxnet.optimizer.AdaDelta(rho=0.9, epsilon=1e-05, **kwargs)

AdaDelta optimizer as described in Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method.

http://arxiv.org/abs/1212.5701

Parameters:
  • rho (float) – Decay rate for both squared gradients and delta x
  • epsilon (float) – The constant as described in the paper
  • wd (float) – L2 regularization coefficient added to all the weights
  • rescale_grad (float, optional) – rescaling factor of gradient.
  • clip_gradient (float, optional) – clip gradient in range [-clip_gradient, clip_gradient]
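The sketch below illustrates how `rho` and `epsilon` enter the AdaDelta update described in the Zeiler paper. It is a scalar, plain-Python illustration with a hypothetical name, `adadelta_step`, and it omits weight decay, gradient rescaling, and clipping.

```python
import math


def adadelta_step(weight, grad, acc_grad, acc_delta, rho=0.9, epsilon=1e-5):
    """One AdaDelta step on a scalar.

    acc_grad and acc_delta are running averages of the squared
    gradients and of the squared updates (delta x).
    """
    acc_grad = rho * acc_grad + (1.0 - rho) * grad * grad
    delta = math.sqrt(acc_delta + epsilon) / math.sqrt(acc_grad + epsilon) * grad
    acc_delta = rho * acc_delta + (1.0 - rho) * delta * delta
    return weight - delta, acc_grad, acc_delta


w, ag, ad = adadelta_step(1.0, 1.0, 0.0, 0.0)
print(w, ag, ad)
```

Note that no learning rate appears: the ratio of the two running averages sets the step size.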
class mxnet.optimizer.Test(**kwargs)

For test use

create_state(index, weight)

Create a state to duplicate weight

update(index, weight, grad, state)

Performs w += rescale_grad * grad

mxnet.optimizer.create(name, rescale_grad=1, **kwargs)

Create an optimizer with specified name.

Parameters:
  • name (str) – Name of required optimizer. Should be the name of a subclass of Optimizer. Case insensitive.
  • rescale_grad (float) – Rescaling factor on gradient.
  • kwargs (dict) – Parameters for optimizer
Returns:

opt – The result optimizer.

Return type:

Optimizer

class mxnet.optimizer.Updater(optimizer)

Updater for kvstore.

set_states(states)

Set updater states.

get_states()

Get updater states.

mxnet.optimizer.get_updater(optimizer)

Return a closure of the updater needed for kvstore

Parameters:optimizer (Optimizer) – The optimizer
Returns:updater – The closure of the updater
Return type:function
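The closure idea can be sketched in a few lines of plain Python. The argument order `(index, grad, weight)` of the returned function and the lazy creation of per-index state are assumptions, and `ToyOptimizer` is a hypothetical stand-in that only offers the create_state and update methods described in this reference.

```python
def get_updater(optimizer):
    """Return a function that applies `optimizer` and remembers its states."""
    states = {}

    def updater(index, grad, weight):
        # Create the per-parameter state the first time an index is seen.
        if index not in states:
            states[index] = optimizer.create_state(index, weight)
        optimizer.update(index, weight, grad, states[index])

    return updater


class ToyOptimizer(object):
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def create_state(self, index, weight):
        return {'num_update': 0}

    def update(self, index, weight, grad, state):
        state['num_update'] += 1
        for i in range(len(weight)):
            weight[i] -= self.learning_rate * grad[i]  # in-place update


weight = [1.0, 2.0]
updater = get_updater(ToyOptimizer(learning_rate=0.5))
updater(0, [2.0, 2.0], weight)
print(weight)
```

Because the state dictionary lives in the closure, the caller only has to pass an index, a gradient, and a weight.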

Model API Reference

MXNet model module

mxnet.model.BatchEndParam

alias of BatchEndParams

mxnet.model.save_checkpoint(prefix, epoch, symbol, arg_params, aux_params)

Checkpoint the model data into file.

Parameters:
  • prefix (str) – Prefix of model name.
  • epoch (int) – The epoch number of the model.
  • symbol (Symbol) – The input symbol
  • arg_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s weights.
  • aux_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s auxiliary states.

Notes

  • prefix-symbol.json will be saved for symbol.
  • prefix-epoch.params will be saved for parameters.
mxnet.model.load_checkpoint(prefix, epoch)

Load model checkpoint from file.

Parameters:
  • prefix (str) – Prefix of model name.
  • epoch (int) – Epoch number of model we would like to load.
Returns:

  • symbol (Symbol) – The symbol configuration of computation network.
  • arg_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s weights.
  • aux_params (dict of str to NDArray) – Model parameter, dict of name to NDArray of net’s auxiliary states.

Notes

  • symbol will be loaded from prefix-symbol.json.
  • parameters will be loaded from prefix-epoch.params.
class mxnet.model.FeedForward(symbol, ctx=None, num_epoch=None, epoch_size=None, optimizer='sgd', initializer=<mxnet.initializer.Uniform object>, numpy_batch_size=128, arg_params=None, aux_params=None, allow_extra_params=False, begin_epoch=0, **kwargs)

Model class of MXNet for training and predicting feedforward nets. This class is designed for a single-data single output supervised network.

Parameters:
  • symbol (Symbol) – The symbol configuration of computation network.
  • ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi GPU training, pass in a list of gpu contexts.
  • num_epoch (int, optional) – Training parameter, number of training epochs.
  • epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size)
  • optimizer (str or Optimizer, optional) – Training parameter, name or optimizer object for training.
  • initializer (initializer function, optional) – Training parameter, the initialization scheme used.
  • numpy_batch_size (int, optional) – The batch size of training data. Only needed when input array is numpy.
  • arg_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of net’s weights.
  • aux_params (dict of str to NDArray, optional) – Model parameter, dict of name to NDArray of net’s auxiliary states.
  • allow_extra_params (boolean, optional) – Whether to allow extra parameters that are not needed by symbol to be passed in aux_params and arg_params. If this is True, no error is thrown when aux_params and arg_params contain more parameters than needed.
  • begin_epoch (int, optional) – The beginning training epoch.
  • kwargs (dict) – The additional keyword arguments passed to optimizer.
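As a small worked example of the default stated for epoch_size, the following sketch evaluates ceil(num_train_examples / batch_size). The helper name `default_epoch_size` is hypothetical.

```python
import math


def default_epoch_size(num_train_examples, batch_size):
    """Number of batches needed to see every training example once."""
    return int(math.ceil(num_train_examples / float(batch_size)))


# 1000 examples with the default numpy_batch_size of 128.
print(default_epoch_size(1000, 128))
```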
predict(X, num_batch=None, return_data=False, reset=True)

Run the prediction. This always uses only one device.

Parameters:
  • X (mxnet.DataIter) –
  • num_batch (int or None) – The number of batches to run. Goes through all batches if None.
Returns:

y – The predicted value of the output.

Return type:

numpy.ndarray or a list of numpy.ndarray if the network has multiple outputs.

score(X, eval_metric='acc', num_batch=None, batch_end_callback=None, reset=True)

Run the model on X and calculate the score with eval_metric

Parameters:
  • X (mxnet.DataIter) –
  • eval_metric (metric.metric) – The metric for calculating score
  • num_batch (int or None) – The number of batches to run. Goes through all batches if None.
Returns:

s – the final score

Return type:

float

fit(X, y=None, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', logger=None, work_load_list=None, monitor=None, eval_end_callback=<mxnet.callback.LogValidationMetricsCallback object>, eval_batch_end_callback=None)

Fit the model.

Parameters:
  • X (DataIter, or numpy.ndarray/NDArray) – Training data. If X is a DataIter, the names (or, if names are not available, the positions) of its outputs should match the corresponding variable names defined in the symbolic graph.
  • y (numpy.ndarray/NDArray, optional) – Training set label. If X is numpy.ndarray/NDArray, y must be set. y can be 1D or 2D (with the 2nd dimension equal to 1), but its 1st dimension must be the same as that of X; that is, the number of data points and labels should be equal.
  • eval_data (DataIter or numpy.ndarray/list/NDArray pair) – If eval_data is numpy.ndarray/list/NDArray pair, it should be (valid_data, valid_label).
  • eval_metric (metric.EvalMetric or str or callable) – The evaluation metric, the name of an evaluation metric, or a custom evaluation function that returns statistics based on a minibatch.
  • epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at the end of each epoch. This can be used to checkpoint the model each epoch.
  • batch_end_callback (callable(epoch)) – A callback that is invoked at the end of each batch, for printing purposes.
  • kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, or ‘dist_async’. Defaults to ‘local’; there is often no need to change it for a single machine.
  • logger (logging logger, optional) – When not specified, default logger will be used.
  • work_load_list (list of float or int, optional) – The list of work loads for different devices, in the same order as ctx

Note

KVStore behavior:

  • ‘local’ – multiple devices on a single machine; the best type is chosen automatically.
  • ‘dist_sync’ – multiple machines with BSP.
  • ‘dist_async’ – multiple machines with partially asynchronous updates.
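One way to picture work_load_list is as a set of proportions used to divide each batch among the devices in ctx. The sketch below is a plain-Python illustration under that assumption; `split_batch` is a hypothetical helper, and giving the remainder to the last device is a choice made here, not documented behavior.

```python
def split_batch(batch_size, work_load_list):
    """Return one slice of the batch per device, proportional to its load."""
    total = float(sum(work_load_list))
    slices = []
    begin = 0
    for i, load in enumerate(work_load_list):
        if i == len(work_load_list) - 1:
            end = batch_size  # last device takes whatever remains
        else:
            share = int(round(batch_size * load / total))
            end = min(batch_size, begin + share)
        slices.append(slice(begin, end))
        begin = end
    return slices


print(split_batch(128, [1, 1]))
print(split_batch(90, [1, 2]))
```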

save(prefix, epoch=None)

Checkpoint the model into file. You can also use pickle to do the job if you work only in Python. The advantage of load/save is that the file is language agnostic: a file saved using save can be loaded by other language bindings of MXNet. You can also load and save directly from cloud storage (S3, HDFS).

Parameters:prefix (str) – Prefix of model name.

Notes

  • prefix-symbol.json will be saved for symbol.
  • prefix-epoch.params will be saved for parameters.
static load(prefix, epoch, ctx=None, **kwargs)

Load model checkpoint from file.

Parameters:
  • prefix (str) – Prefix of model name.
  • epoch (int) – epoch number of model we would like to load.
  • ctx (Context or list of Context, optional) – The device context of training and prediction.
  • kwargs (dict) – other parameters for model, including num_epoch, optimizer and numpy_batch_size
Returns:

model – The loaded model that can be used for prediction.

Return type:

FeedForward

Notes

  • symbol will be loaded from prefix-symbol.json.
  • parameters will be loaded from prefix-epoch.params.
static create(symbol, X, y=None, ctx=None, num_epoch=None, epoch_size=None, optimizer='sgd', initializer=<mxnet.initializer.Uniform object>, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', logger=None, work_load_list=None, eval_end_callback=<mxnet.callback.LogValidationMetricsCallback object>, eval_batch_end_callback=None, **kwargs)

Functional style to create a model. This function will be more consistent with functional languages such as R, where mutation is not allowed.

Parameters:
  • symbol (Symbol) – The symbol configuration of computation network.
  • X (DataIter) – Training data
  • y (numpy.ndarray, optional) – If X is numpy.ndarray, y must be set.
  • ctx (Context or list of Context, optional) – The device context of training and prediction. To use multi GPU training, pass in a list of gpu contexts.
  • num_epoch (int, optional) – Training parameter, number of training epochs.
  • epoch_size (int, optional) – Number of batches in an epoch. By default, it is set to ceil(num_train_examples / batch_size)
  • optimizer (str or Optimizer, optional) – Training parameter, name or optimizer object for training.
  • initializer (initializer function, optional) – Training parameter, the initialization scheme used.
  • eval_data (DataIter or numpy.ndarray pair) – If eval_data is numpy.ndarray pair, it should be (valid_data, valid_label)
  • eval_metric (metric.EvalMetric or str or callable) – The evaluation metric, the name of an evaluation metric, or a custom evaluation function that returns statistics based on a minibatch.
  • epoch_end_callback (callable(epoch, symbol, arg_params, aux_states)) – A callback that is invoked at the end of each epoch. This can be used to checkpoint the model each epoch.
  • batch_end_callback (callable(epoch)) – A callback that is invoked at the end of each batch, for printing purposes.
  • kvstore (KVStore or str, optional) – The KVStore or a string kvstore type: ‘local’, ‘dist_sync’, or ‘dist_async’. Defaults to ‘local’; there is often no need to change it for a single machine.
  • logger (logging logger, optional) – When not specified, default logger will be used.
  • work_load_list (list of float or int, optional) – The list of work load for different devices, in the same order as ctx

Next Steps