Linear Regression

In this tutorial we’ll walk through how one can implement linear regression using MXNet APIs.

The function we are trying to learn is: y = x1 + 2x2, where (x1,x2) are input features and y is the corresponding label.


To complete this tutorial, we need MXNet and Jupyter Notebook. Jupyter can be installed with pip:

$ pip install jupyter

To begin, the following code imports the packages we need for this exercise.

import mxnet as mx
import numpy as np

Preparing the Data

In MXNet, data is input via Data Iterators. Here we will illustrate how to encode a dataset into an iterator that MXNet can use. The data used in the example is made up of 2D data points with corresponding numeric labels.

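# Training data: 100 points with two features each, drawn uniformly from [0, 1)
train_data = np.random.uniform(0, 1, [100, 2])
# Labels follow the target function y = x1 + 2 * x2
train_label = train_data[:, 0] + 2 * train_data[:, 1]
batch_size = 1

# Evaluation data, with labels computed from the same function
eval_data = np.array([[7,2],[6,10],[12,2]])
eval_label = np.array([11,26,16])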
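As a quick sanity check, the evaluation labels can be recomputed from the target function y = x1 + 2x2. The sketch below uses plain Python lists so that it runs without MXNet; `target` is a hypothetical helper name and is not part of the tutorial's code.

```python
# Hypothetical helper: the function the tutorial is trying to learn.
def target(x1, x2):
    return x1 + 2 * x2

eval_points = [[7, 2], [6, 10], [12, 2]]
eval_labels = [11, 26, 16]

# Recompute each label from its features and compare with the given labels.
recomputed = [target(x1, x2) for x1, x2 in eval_points]
print(recomputed == eval_labels)  # prints True
```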

Once we have the data ready, we need to put it into an iterator and specify parameters such as batch_size and shuffle. batch_size specifies the number of examples shown to the model each time we update its parameters and shuffle tells the iterator to randomize the order in which examples are shown to the model.

train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size, shuffle=True, label_name='lin_reg_label')
eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False, label_name='lin_reg_label')

In the above example, we have made use of NDArrayIter, which is useful for iterating over both numpy ndarrays and MXNet NDArrays. In general, MXNet provides different types of iterators, and you can choose one based on the type of data you are processing. Iterators are described in the MXNet API documentation.
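To make the roles of batch_size and shuffle concrete, here is a minimal plain-Python sketch of what a batching iterator does. `iterate_batches` is a hypothetical function written for illustration; it is not how NDArrayIter is implemented, and NDArrayIter has its own handling of a final partial batch that this sketch does not reproduce.

```python
import random

def iterate_batches(data, labels, batch_size, shuffle=False, seed=None):
    """Yield (data_batch, label_batch) pairs covering the dataset once."""
    indices = list(range(len(data)))
    if shuffle:
        # Randomize the order in which examples are visited.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [data[i] for i in chosen], [labels[i] for i in chosen]

data = [[7, 2], [6, 10], [12, 2]]
labels = [11, 26, 16]

# batch_size=2 without shuffling: batches follow the original order,
# and the last batch holds the single leftover example.
for batch_data, batch_labels in iterate_batches(data, labels, batch_size=2):
    print(batch_data, batch_labels)
```

With batch_size = 1, as in this tutorial, every batch contains exactly one example, so the model's parameters are updated after each data point.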

MXNet Classes

  1. IO: As we already saw, the IO class works on the data and carries out operations such as feeding data in batches and shuffling.
  2. Symbol: The actual MXNet neural network is composed using symbols. MXNet has different types of symbols, including variable placeholders for input data, neural network layers, and operators that manipulate NDArrays.
  3. Module: The module class in MXNet is used to define the overall computation. It is initialized with the model we want to train, the training inputs (data and labels) and some additional parameters such as learning rate and the optimization algorithm to use.

Defining the Model

MXNet uses Symbols for defining a model. Symbols are the building blocks and make up various components of the model. Symbols are used to define:

  1. Variables: A variable is a placeholder for future data. This symbol is used to define a spot which will be filled with training data/labels in the future when we commence training.
  2. Neural Network Layers: The layers of a network or any other type of model are also defined by Symbols. Such a symbol takes one or more previous symbols as inputs, performs some transformations on them, and creates one or more outputs. One such example is the FullyConnected symbol which specifies a fully connected layer of a neural network.
  3. Outputs: Output symbols are MXNet’s way of defining a loss. They are suffixed with the word “Output” (e.g. the SoftmaxOutput layer). You can also create your own loss function. Some examples of existing losses are: LinearRegressionOutput, which computes the l2-loss between its input symbol and the labels provided to it; and SoftmaxOutput, which computes the categorical cross-entropy.

The symbols described above, along with others, are chained together, with the output of one symbol serving as input to the next, to build the network topology. More information about the different types of symbols can be found in the MXNet API documentation.

X = mx.sym.Variable('data')                # placeholder for the input features
Y = mx.symbol.Variable('lin_reg_label')    # placeholder for the labels
fully_connected_layer = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1)
lro = mx.sym.LinearRegressionOutput(data=fully_connected_layer, label=Y, name="lro")

The above network uses the following layers:

  1. FullyConnected: The fully connected symbol represents a fully connected layer of a neural network (without any activation being applied), which, in essence, is just a linear regression on the input attributes. It takes the following parameters:
    • data: Input to the layer (specifies the symbol whose output should be fed here)
    • num_hidden: Number of hidden neurons in the layer, which is the same as the dimensionality of the layer’s output
  2. LinearRegressionOutput: Output layers in MXNet compute training loss, which is the measure of inaccuracy in the model’s predictions. The goal of training is to minimize the training loss. In our example, the LinearRegressionOutput layer computes the l2 loss between its input and the labels provided to it. The parameters to this layer are:
    • data: Input to this layer (specifies the symbol whose output should be fed here)
    • label: The training labels against which we will compare the input to the layer for calculation of l2 loss
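The two layers above can be sketched in plain Python to show what they compute for a single example. This is an illustration under simplifying assumptions, not MXNet's implementation: `fully_connected` and `squared_error` are hypothetical names, and any scaling constant MXNet applies to the loss is omitted.

```python
def fully_connected(x, weights, bias):
    """One output unit with no activation: weighted sum of the inputs plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def squared_error(prediction, label):
    """Squared difference between a prediction and its label."""
    return (prediction - label) ** 2

# With weights (1, 2) and bias 0 the layer reproduces y = x1 + 2*x2 exactly.
prediction = fully_connected([7, 2], weights=[1.0, 2.0], bias=0.0)
print(prediction, squared_error(prediction, 11))  # 11.0 0.0

# Any other weights give a nonzero loss, which training tries to reduce.
prediction = fully_connected([7, 2], weights=[0.5, 0.5], bias=0.0)
print(prediction, squared_error(prediction, 11))  # 4.5 42.25
```

Since num_hidden is 1, the layer has one weight per input feature plus one bias, which is exactly the parameter set of a linear regression on two features.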

Note on naming convention: the label variable’s name should be the same as the label_name parameter passed to your training data iterator. The default value of label_name is softmax_label, but we have changed it to lin_reg_label in this tutorial, as you can see in Y = mx.symbol.Variable('lin_reg_label') and train_iter = mx.io.NDArrayIter(..., label_name='lin_reg_label').

Finally, the network is passed to a Module, where we specify the symbol whose output needs to be minimized (in our case, lro) and the names of its label variables. The learning rate to be used during optimization and the number of epochs we want to train for are supplied later, when we call fit().

model = mx.mod.Module(
    symbol = lro,                     # network structure
    label_names = ['lin_reg_label']   # must match the label variable's name
)

We can visualize the network we created by plotting it:

mx.viz.plot_network(symbol=lro)


Training the Model

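Once we have defined the model structure, the next step is to train the parameters of the model to fit the training data. This is accomplished using the fit() function of the Module class.

model.fit(train_iter, eval_iter,
            optimizer_params={'learning_rate':0.005, 'momentum': 0.9},
            num_epoch=50,       # fit() requires an epoch count; 50 is an illustrative value
            eval_metric='mse',  # report mean squared error rather than the default accuracy
            batch_end_callback = mx.callback.Speedometer(batch_size, 2))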
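To show what this training step amounts to, the following plain-Python sketch runs stochastic gradient descent with momentum on the same kind of data, using the learning rate and momentum given above. It is a simplified stand-in rather than MXNet's code: the zero initialization, the epoch count of 50, the fixed random seed, and the gradient scaling are all assumptions made for illustration.

```python
import random

rng = random.Random(0)

# 100 training points with features in [0, 1) and labels y = x1 + 2*x2.
train_data = [[rng.random(), rng.random()] for _ in range(100)]
train_label = [x1 + 2 * x2 for x1, x2 in train_data]

weights, bias = [0.0, 0.0], 0.0   # assumed starting point
velocity = [0.0, 0.0, 0.0]        # momentum state for w1, w2 and the bias
learning_rate, momentum = 0.005, 0.9
num_epoch = 50                    # assumed value

order = list(range(len(train_data)))
for epoch in range(num_epoch):
    rng.shuffle(order)            # plays the role of shuffle=True
    for i in order:               # batch_size = 1: one update per example
        x, y = train_data[i], train_label[i]
        error = weights[0] * x[0] + weights[1] * x[1] + bias - y
        # Gradient of 0.5 * error**2 with respect to w1, w2 and the bias.
        grads = [error * x[0], error * x[1], error]
        velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
        weights = [weights[0] + velocity[0], weights[1] + velocity[1]]
        bias += velocity[2]

print([round(w, 2) for w in weights], round(bias, 2))
```

Because the labels contain no noise, the learned weights should end up close to 1 and 2 and the bias close to 0, which are the coefficients of the target function.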

Using a Trained Model (Testing and Inference)

Once we have a trained model, we can do a couple of things with it: we can use it for inference, or we can evaluate it on test data. Inference is shown below:

model.predict(eval_iter).asnumpy()

We can also evaluate our model according to some metric. In this example, we are evaluating our model’s mean squared error (MSE) on the evaluation data.

metric = mx.metric.MSE()
model.score(eval_iter, metric)

Let us add some noise to the evaluation labels and see how the MSE changes:

eval_data = np.array([[7,2],[6,10],[12,2]])
eval_label = np.array([11.1,26.1,16.1]) #Adding 0.1 to each of the values
eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False, label_name='lin_reg_label')
model.score(eval_iter, metric)
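The effect of the added noise can be worked out by hand. The sketch below assumes a hypothetical model that has learned y = x1 + 2x2 exactly, so its predictions equal the original labels; a real trained model will be slightly off, and its scores will differ accordingly. `mean_squared_error` is an illustrative helper, not the MXNet metric class.

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)

# Hypothetical predictions from a model that learned y = x1 + 2*x2 exactly.
predictions = [11.0, 26.0, 16.0]

clean_mse = mean_squared_error([11, 26, 16], predictions)
noisy_mse = mean_squared_error([11.1, 26.1, 16.1], predictions)

print(clean_mse)            # 0.0
print(round(noisy_mse, 6))  # 0.01
```

Shifting every label by 0.1 adds 0.1 squared, or 0.01, to each squared error, so for a perfect model the MSE rises from 0 to about 0.01.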

We can also create a custom metric and use it to evaluate a model. More information on metrics can be found in the API documentation.
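At its core, a custom metric is a function that maps labels and predictions to a single number. As one possible example, the sketch below computes the mean absolute error in plain Python. `mean_absolute_error` is a hypothetical name, and wiring such a function into MXNet (for example through mx.metric.CustomMetric) is not shown here and should be checked against the API documentation for your version.

```python
def mean_absolute_error(labels, predictions):
    """Average absolute difference between labels and predictions."""
    return sum(abs(y - p) for y, p in zip(labels, predictions)) / len(labels)

# Hypothetical predictions, each off by exactly 0.5.
print(mean_absolute_error([11, 26, 16], [11.5, 25.5, 16.5]))  # 0.5
```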