Symbol API

Overview

This document lists the routines of the symbolic expression package:

mxnet.symbol Symbolic configuration API of mxnet.

A symbol declares computation. It is composed of operators, such as simple matrix operations (e.g. “+”) or neural network layers (e.g. a convolution layer). We can bind data to a symbol to execute the computation.

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> c = 2 * a + b
>>> type(c)
<class 'mxnet.symbol.Symbol'>
>>> e = c.bind(mx.cpu(), {'a': mx.nd.array([1,2]), 'b':mx.nd.array([2,3])})
>>> y = e.forward()
>>> y
[<NDArray 2 @cpu(0)>]
>>> y[0].asnumpy()
array([ 4.,  7.], dtype=float32)

A detailed tutorial is available at http://mxnet.io/tutorials/python/symbol.html.

Note

Most operators provided in symbol are similar to those in ndarray, but note that symbol differs from ndarray in several aspects:

  • symbol adopts declarative programming. In other words, we need to first compose the computation, and then feed it with data to execute.
  • Most binary operators such as + and > do not support broadcasting. We need to call the broadcasting versions, such as broadcast_plus, explicitly (see the sketch below).
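
For example, a quick sketch of explicit broadcasting (shapes chosen for illustration, assuming mxnet is imported as mx):

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> c = mx.sym.broadcast_plus(a, b)
>>> ex = c.eval(ctx=mx.cpu(), a=mx.nd.ones((2, 3)), b=mx.nd.ones((1, 3)))
>>> ex[0].asnumpy().shape
(2, 3)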

In the rest of this document, we first overview the methods provided by the symbol.Symbol class, and then list other routines provided by the symbol package.

The Symbol class

Composition

Compose multiple symbols into a new one with an operator.

Symbol.__call__ Compose symbol on inputs.
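
For example, a small sketch of composing one network into another via __call__ (the names 'fc1', 'fc2' and 'composed' are only illustrative):

>>> net = mx.sym.Variable('data')
>>> net = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
>>> net2 = mx.sym.Variable('data2')
>>> net2 = mx.sym.FullyConnected(data=net2, name='fc2', num_hidden=128)
>>> composed = net2(data2=net, name='composed')
>>> composed
<Symbol composed>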

Arithmetic operations

Symbol.__add__ x.__add__(y) <=> x+y
Symbol.__sub__ x.__sub__(y) <=> x-y
Symbol.__rsub__ x.__rsub__(y) <=> y-x
Symbol.__neg__ x.__neg__() <=> -x
Symbol.__mul__ x.__mul__(y) <=> x*y
Symbol.__div__ x.__div__(y) <=> x/y
Symbol.__rdiv__ x.__rdiv__(y) <=> y/x
Symbol.__pow__ x.__pow__(y) <=> x**y

Comparison operators

Symbol.__lt__ x.__lt__(y) <=> x<y
Symbol.__le__ x.__le__(y) <=> x<=y
Symbol.__gt__ x.__gt__(y) <=> x>y
Symbol.__ge__ x.__ge__(y) <=> x>=y
Symbol.__eq__ x.__eq__(y) <=> x==y
Symbol.__ne__ x.__ne__(y) <=> x!=y

Query information

Symbol.name Get the name string of the symbol; this only works for non-grouped symbols.
Symbol.list_arguments List all the arguments in the symbol.
Symbol.list_outputs List all outputs in the symbol.
Symbol.list_auxiliary_states List all auxiliary states in the symbol.
Symbol.list_attr Get all attributes from the symbol.
Symbol.attr Get attribute string from the symbol.
Symbol.attr_dict Recursively get all attributes from the symbol and its children.

Get internal and output symbol

Symbol.__getitem__ x.__getitem__(i) <=> x[i]
Symbol.__iter__ Return all outputs in a list
Symbol.get_internals Get a new grouped symbol sgroup.
Symbol.get_children Get a new grouped symbol whose output contains the inputs to the output nodes of the original symbol.

Inference type and shape

Symbol.infer_type Given known types for some arguments, infers the types of all arguments and all outputs.
Symbol.infer_shape Given known shapes for some arguments, infers the shapes of all arguments and all outputs.
Symbol.infer_shape_partial Partially infer the shape.

Bind

Symbol.bind Bind current symbol to get an executor.
Symbol.simple_bind Bind current symbol to get an executor, allocate all the ndarrays needed.

Save

Symbol.save Save symbol into file.
Symbol.tojson Save symbol into a JSON string.
Symbol.debug_str Get a debug string.

Symbol creation routines

var Create a symbolic variable with specified name.
zeros Return a new symbol of given shape and type, filled with zeros.
ones Return a new symbol of given shape and type, filled with ones.
arange Return evenly spaced values within a given interval.

Symbol manipulation routines

Changing shape and type

cast Cast to a specified type, element-wise.
reshape Reshape array into a new shape.
flatten Flatten input into a 2-D array by collapsing the higher dimensions.
expand_dims Insert a new axis with size 1 into the array shape

Expanding elements

broadcast_to Broadcast an array to a new shape.
broadcast_axes Broadcast an array over particular axes.
repeat Repeat elements of an array.
tile Repeat the whole array multiple times.
pad Pad an array.

Rearranging elements

transpose Permute the dimensions of an array.
swapaxes Interchange two axes of an array.
flip Reverse the elements of an array along a given axis.

Joining and splitting symbols

concat Concatenate a list of arrays along a given axis.
split Split an array along a particular axis into multiple sub-arrays.

Indexing routines

slice Crop a continuous region from the array.
slice_axis Slice along a given axis.
take Take elements from an array along an axis.
batch_take Take elements from a data batch.
one_hot Returns a one-hot array.

Mathematical functions

Arithmetic operations

broadcast_add Add arguments, element-wise with broadcasting.
broadcast_sub Subtract arguments, element-wise with broadcasting.
broadcast_mul Multiply arguments, element-wise with broadcasting.
broadcast_div Divide arguments, element-wise with broadcasting.
negative Negate src
dot Dot product of two arrays.
batch_dot Batchwise dot product.
add_n Add all input arguments element-wise.

Trigonometric functions

sin Trigonometric sine, element-wise.
cos Cosine, element-wise.
tan Tangent, element-wise.
arcsin Inverse sine, element-wise.
arccos Inverse cosine, element-wise.
arctan Inverse tangent, element-wise.
hypot Given the “legs” of a right triangle, return its hypotenuse.
broadcast_hypot Given the “legs” of a right triangle, return its hypotenuse with broadcasting.
degrees Convert angles from radians to degrees.
radians Convert angles from degrees to radians.

Hyperbolic functions

sinh Hyperbolic sine, element-wise.
cosh Hyperbolic cosine, element-wise.
tanh Hyperbolic tangent element-wise.
arcsinh Inverse hyperbolic sine, element-wise.
arccosh Inverse hyperbolic cosine, element-wise.
arctanh Inverse hyperbolic tangent, element-wise.

Reduce functions

sum Compute the sum of array elements over given axes.
nansum Compute the sum of array elements over given axes with NaN ignored
prod Compute the product of array elements over given axes.
nanprod Compute the product of array elements over given axes with NaN ignored
mean Compute the mean of array elements over given axes.
max Compute the max of array elements over given axes.
min Compute the min of array elements over given axes.
norm Compute the L2 norm.

Rounding

round Round elements of the array to the nearest integer, element-wise.
rint Round elements of the array to the nearest integer, element-wise.
fix Round elements of the array to the nearest integer towards zero, element-wise.
floor Return the floor of the input, element-wise.
ceil Return the ceiling of the input, element-wise.

Exponents and logarithms

exp Calculate the exponential of the array, element-wise
expm1 Calculate exp(x) - 1
log Natural logarithm, element-wise.
log10 Calculate the base 10 logarithm of the array, element-wise.
log2 Calculate the base 2 logarithm of the array, element-wise.
log1p Calculate log(1 + x)

Powers

broadcast_power First array elements raised to powers from second array, element-wise with broadcasting.
sqrt Calculate the square-root of an array, element-wise.
rsqrt Calculate the inverse square-root of an array, element-wise.
square Calculate the square of an array, element-wise.

Logic functions

broadcast_equal Return (lhs == rhs), element-wise with broadcasting.
broadcast_not_equal Return (lhs != rhs), element-wise with broadcasting.
broadcast_greater Return (lhs > rhs), element-wise with broadcasting.
broadcast_greater_equal Return (lhs >= rhs), element-wise with broadcasting.
broadcast_lesser Return (lhs < rhs), element-wise with broadcasting.
broadcast_lesser_equal Return (lhs <= rhs), element-wise with broadcasting.

Random sampling

uniform Draw samples from a uniform distribution.
normal Draw random samples from a normal (Gaussian) distribution.
mxnet.random.seed Seed the random number generators in mxnet.

Sorting and searching

sort Return a sorted copy of an array.
topk Return the top k elements in an array.
argsort Returns the indices that can sort an array.
argmax Returns the indices of the maximum values along an axis.
argmin Returns the indices of the minimum values along an axis.

Miscellaneous

maximum maximum left and right
minimum minimum left and right
broadcast_maximum Element-wise maximum of array elements with broadcasting.
broadcast_minimum Element-wise minimum of array elements with broadcasting.
clip Clip (limit) the values in an array, elementwise
abs Returns the absolute value of array elements, element-wise.
sign Returns the indication sign of array elements, element-wise.
gamma The gamma function (extension of the factorial function), element-wise
gammaln Log of the absolute value of the gamma function, element-wise

Neural network

Basic

FullyConnected Apply a linear transformation: \(Y = XW^T + b\).
Convolution Compute N-D convolution on (N+2)-D input.
Activation Elementwise activation function.
BatchNorm Batch normalization.
Pooling Perform pooling on the input.
SoftmaxOutput Softmax with logit loss.
softmax Apply the softmax function to the input.
log_softmax Compute the log of the softmax of the input.

More

Correlation Apply correlation to inputs
Deconvolution Apply deconvolution to input then add a bias.
RNN Apply a recurrent layer to input.
Embedding Map integer index to vector representations (embeddings).
LeakyReLU Leaky ReLU activation.
InstanceNorm An operator that takes an n-dimensional input tensor (n > 2) and normalizes it over the spatial dimensions by subtracting the mean and dividing by the standard deviation.
L2Normalization Set the l2 norm of each instance to a constant.
LRN Apply local response normalization to the input.
ROIPooling Performs region-of-interest pooling on inputs.
SoftmaxActivation Apply softmax activation to input.
Dropout Apply dropout to input.
BilinearSampler Apply bilinear sampling to input feature map, which is the key of “[NIPS2015] Spatial Transformer Networks”.
GridGenerator Generate a sampling grid for bilinear sampling.
UpSampling Perform nearest-neighbor/bilinear upsampling on the inputs. This function supports a variable number of positional inputs.
SpatialTransformer Apply spatial transformer to input feature map.
LinearRegressionOutput Use linear regression for final output, this is used on final output of a net.
LogisticRegressionOutput Use Logistic regression for final output, this is used on final output of a net.
MAERegressionOutput Use mean absolute error regression for final output, this is used on final output of a net.
SVMOutput Support Vector Machine based transformation on input, backprop L2-SVM
softmax_cross_entropy Calculate cross_entropy(data, one_hot(label))
smooth_l1 Calculate Smooth L1 Loss(lhs, scalar)
IdentityAttachKLSparseReg Apply a sparse regularization to the output of a sigmoid activation function.
MakeLoss Get output from a symbol and pass 1 gradient back.
BlockGrad Get output from a symbol and pass 0 gradient back
Custom Custom operator implemented in frontend.

API Reference

Symbolic configuration API of mxnet.

class mxnet.symbol.Symbol(handle)

Symbol is the symbolic graph of mxnet.

name

Get the name string of the symbol; this only works for non-grouped symbols.

Returns:value – The name of this symbol; returns None for a grouped symbol.
Return type:str
attr(key)

Get attribute string from the symbol. This function only works for non-grouped symbols.

Parameters:key (str) – The key corresponding to the desired attribute.
Returns:value – The desired attribute value, returns None if attribute does not exist.
Return type:str
list_attr(recursive=False)

Get all attributes from the symbol.

Returns:ret – A dictionary mapping attribute keys to values.
Return type:dict of str to str
attr_dict()

Recursively get all attributes from the symbol and its children.

Returns:ret – There is a key in the returned dict for every child with a non-empty attribute set. For each symbol, the name of the symbol is its key in the dict and the corresponding value is that symbol’s attribute dictionary.
Return type:dict of str to dict
get_internals()

Get a new grouped symbol sgroup. The output of sgroup is a list of the outputs of all of the internal nodes.

Consider the following code:

>>> a = mxnet.sym.var('a')
>>> b = mxnet.sym.var('b')
>>> c = a + b
>>> d = c.get_internals()
>>> d
<Symbol Grouped>
>>> d.list_outputs()
['a', 'b', '_plus4_output']

Returns:sgroup – A symbol group containing all internal and leaf nodes of the computation graph used to compute the symbol
Return type:Symbol
get_children()

Get a new grouped symbol whose output contains inputs to output nodes of the original symbol

Returns:sgroup – The children of the head node. If the symbol has no inputs None will be returned.
Return type:Symbol or None
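
A small sketch of the expected behavior (assuming mxnet is imported as mx):

>>> x = mx.sym.var('x')
>>> y = mx.sym.var('y')
>>> z = x + y
>>> z.get_children()
<Symbol Grouped>
>>> z.get_children().list_outputs()
['x', 'y']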
list_arguments()

List all the arguments in the symbol.

>>> a = mxnet.sym.var('a')
>>> b = mxnet.sym.var('b')
>>> c = a + b
>>> c.list_arguments()
['a', 'b']
Returns:args – List containing the names of all the arguments required to compute the symbol.
Return type:list of string
list_outputs()

List all outputs in the symbol.

Returns:returns – List of all the outputs. For most symbols, this list contains only the name of this symbol. For symbol groups, this is a list with the names of all symbols in the group.
Return type:list of string
list_auxiliary_states()

List all auxiliary states in the symbol.

Returns:aux_states – List the names of the auxiliary states.
Return type:list of string

Notes

Auxiliary states are special states of symbols that do not correspond to an argument, and are not updated by gradient descent. Common examples of auxiliary states include the moving_mean and moving_variance in BatchNorm. Most operators do not have auxiliary states.
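
For example, a sketch with BatchNorm (assuming mxnet is imported as mx; the 'bn' prefix comes from the chosen name):

>>> data = mx.sym.Variable('data')
>>> bn = mx.sym.BatchNorm(data=data, name='bn')
>>> bn.list_arguments()
['data', 'bn_gamma', 'bn_beta']
>>> bn.list_auxiliary_states()
['bn_moving_mean', 'bn_moving_var']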

infer_type(*args, **kwargs)

Given known types for some arguments, infers the types of all arguments and all outputs.

You can pass in the known types in either positional way or keyword argument way. A tuple of Nones is returned if there is not enough information to deduce the missing types. Inconsistencies in the known types will cause an error to be raised.

>>> a = mxnet.sym.var('a')
>>> b = mxnet.sym.var('b')
>>> c = a + b
>>> c.infer_type(a='float32')
([numpy.float32, numpy.float32], [numpy.float32], [])
Parameters:
  • *args – Provide type of arguments in a positional way. Unknown type can be marked as None
  • **kwargs – Provide keyword arguments of known types.
Returns:

  • arg_types (list of numpy.dtype or None) – List of types of arguments. The order is in the same order as list_arguments()
  • out_types (list of numpy.dtype or None) – List of types of outputs. The order is in the same order as list_outputs()
  • aux_types (list of numpy.dtype or None) – List of types of auxiliary states. The order is the same as in list_auxiliary_states()

infer_shape(*args, **kwargs)

Given known shapes for some arguments, infers the shapes of all arguments and all outputs.

You can pass in the known shapes in either positional way or keyword argument way. A tuple of Nones is returned if there is not enough information to deduce the missing shapes. Inconsistencies in the known shapes will cause an error to be raised.

>>> a = mxnet.sym.var('a')
>>> b = mxnet.sym.var('b')
>>> c = a + b
>>> c.infer_shape(a=(3,3))
([(3L, 3L), (3L, 3L)], [(3L, 3L)], [])
Parameters:
  • *args – Provide shape of arguments in a positional way. Unknown shape can be marked as None
  • **kwargs – Provide keyword arguments of known shapes.
Returns:

  • arg_shapes (list of tuple or None) – List of shapes of arguments. The order is in the same order as list_arguments()
  • out_shapes (list of tuple or None) – List of shapes of outputs. The order is in the same order as list_outputs()
  • aux_shapes (list of tuple or None) – List of shapes of auxiliary states. The order is the same as in list_auxiliary_states()

infer_shape_partial(*args, **kwargs)

Partially infer the shape. The same as infer_shape, except that the partial results can be returned.

debug_str()

Get a debug string.

Returns:debug_str – Debug string of the symbol.
Return type:string
save(fname)

Save symbol into file.

You can also use pickle if you only work in Python. The advantage of load/save is that the file is language agnostic: a file saved with save can be loaded by the other language bindings of mxnet. You also get the benefit of being able to load/save directly from cloud storage such as S3 and HDFS.

Parameters:fname (str) –

The name of the file, examples:

  • s3://my-bucket/path/my-s3-symbol
  • hdfs://my-bucket/path/my-hdfs-symbol
  • /path-to/my-local-symbol

See also

symbol.load()
Used to load symbol from file.
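
Examples

A minimal save/load round trip (the file name symbol-c.json is just an illustration):

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> c = a + b
>>> c.save('symbol-c.json')
>>> c2 = mx.sym.load('symbol-c.json')
>>> c.tojson() == c2.tojson()
True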
tojson()

Save symbol into a JSON string.

See also

symbol.load_json()
Used to load symbol from JSON string.
simple_bind(ctx, grad_req='write', type_dict=None, group2ctx=None, **kwargs)

Bind current symbol to get an executor, allocate all the ndarrays needed. Allows specifying data types.

This function simplifies bind: the user only provides input shapes (and optionally types) as keyword arguments, and it automatically allocates the NDArrays for the arguments and auxiliary states that the user did not specify explicitly.

Parameters:
  • ctx (Context) – The device context the generated executor to run on.
  • grad_req ({'write', 'add', 'null'}, or list of str or dict of str to str, optional) – Specifies how to update the gradient in args_grad. 'write' means the gradient is written to the specified args_grad NDArray on every backward pass; 'add' means the gradient is added to the specified NDArray; 'null' means no action is taken and the gradient may not be calculated.
  • type_dict (dict of str->numpy.dtype) – Input type dictionary, name->dtype
  • group2ctx (dict of string to mx.Context) – The dict mapping the ctx_group attribute to the context assignment.
  • kwargs (dict of str->shape) – Input shape dictionary, name->shape
Returns:

executor – The generated Executor

Return type:

mxnet.Executor
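
Examples

A quick sketch (shapes and the name 'fc' are illustrative; simple_bind allocates the argument NDArrays for you):

>>> x = mx.sym.Variable('x')
>>> y = mx.sym.FullyConnected(data=x, num_hidden=4, name='fc')
>>> exe = y.simple_bind(ctx=mx.cpu(), x=(5, 3))
>>> sorted(exe.arg_dict.keys())
['fc_bias', 'fc_weight', 'x']
>>> outputs = exe.forward()
>>> outputs[0].shape
(5, 4)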

bind(ctx, args, args_grad=None, grad_req='write', aux_states=None, group2ctx=None, shared_exec=None)

Bind current symbol to get an executor.

Parameters:
  • ctx (Context) – The device context the generated executor to run on.
  • args (list of NDArray or dict of str to NDArray) –

    Input arguments to the symbol.

    • If type is list of NDArray, the position is in the same order of list_arguments.
    • If type is dict of str to NDArray, then it maps the name of arguments to the corresponding NDArray.
    • In either case, all the arguments must be provided.
  • args_grad (list of NDArray or dict of str to NDArray, optional) –

    When specified, args_grad provides NDArrays to hold the gradients computed in the backward pass.

    • If type is list of NDArray, the position is in the same order of list_arguments.
    • If type is dict of str to NDArray, then it maps the name of arguments to the corresponding NDArray.
    • When the type is dict of str to NDArray, users only need to provide the dict for needed argument gradient. Only the specified argument gradient will be calculated.
  • grad_req ({'write', 'add', 'null'}, or list of str or dict of str to str, optional) –

    Specifies how we should update the gradient to the args_grad.

    • 'write' means the gradient is written to the specified args_grad NDArray on every backward pass.
    • 'add' means the gradient is added to the specified NDArray.
    • 'null' means no action is taken; the gradient may not be calculated.
  • aux_states (list of NDArray, or dict of str to NDArray, optional) –

    Input auxiliary states to the symbol, only need to specify when list_auxiliary_states is not empty.

    • If type is list of NDArray, the position is in the same order of list_auxiliary_states
    • If type is dict of str to NDArray, then it maps the name of auxiliary_states to the corresponding NDArray,
    • In either case, all the auxiliary_states need to be provided.
  • group2ctx (dict of string to mx.Context) – The dict mapping the ctx_group attribute to the context assignment.
  • shared_exec (mx.executor.Executor) – Executor to share memory with. This is intended for runtime reshaping, variable length sequences, etc. The returned executor shares state with shared_exec, and should not be used in parallel with it.
Returns:

executor – The generated Executor

Return type:

Executor

Notes

Auxiliary states are special states of symbols that do not correspond to an argument and do not have gradients, but are still useful for specific operations. A common example of auxiliary states is the moving_mean and moving_variance in BatchNorm. Most operators do not have auxiliary states and this parameter can be safely ignored.

Users can skip computing some gradients by passing a dict for args_grad and specifying only the gradients they are interested in.
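
Examples

A small sketch of binding with a gradient buffer (values chosen for illustration):

>>> a = mx.sym.Variable('a')
>>> b = a * a
>>> a_data = mx.nd.array([1, 2, 3])
>>> a_grad = mx.nd.zeros((3,))
>>> exe = b.bind(ctx=mx.cpu(), args={'a': a_data}, args_grad={'a': a_grad})
>>> outputs = exe.forward(is_train=True)
>>> exe.backward(out_grads=mx.nd.ones((3,)))
>>> a_grad.asnumpy()
array([ 2.,  4.,  6.], dtype=float32)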

grad(wrt)

Get the autodiff of current symbol.

This function can only be used if current symbol is a loss function.

Parameters:wrt (array of str) – The arguments of the symbol with respect to which the gradients are taken.
Returns:grad – A gradient Symbol whose outputs are the corresponding gradients.
Return type:Symbol
eval(ctx=cpu(0), **kwargs)

Evaluate a symbol given arguments

The eval method combines a call to bind (which returns an executor) with a call to forward (an executor method). For the common use case, where you might repeatedly evaluate with the same arguments, eval is slow. In that case, you should call bind once and then repeatedly call forward. Eval offers a simpler syntax that is convenient for quick introspection.

Parameters:
  • ctx (Context) – The device context the generated executor to run on.
  • kwargs (list of NDArray or dict of str to NDArray) –

    Input arguments to the symbol.

    • If type is list of NDArray, the position is in the same order of list_arguments.
    • If type is dict of str to NDArray, then it maps the name of arguments to the corresponding NDArray.
    • In either case, all the arguments must be provided.
Returns:

result – A list of NDArrays corresponding to the values taken by each symbol when evaluated on the given args. When called on a single symbol (not a group), the result will be a list with one element.
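
Examples

A minimal sketch of eval (assuming the CPU context):

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> c = a + b
>>> out = c.eval(ctx=mx.cpu(), a=mx.nd.ones((2,)), b=mx.nd.ones((2,)))
>>> out[0].asnumpy()
array([ 2.,  2.], dtype=float32)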

mxnet.symbol.var(name, attr=None, shape=None, lr_mult=None, wd_mult=None, dtype=None, init=None)

Create a symbolic variable with specified name.

Parameters:
  • name (str) – Name of the variable.
  • attr (dict of string -> string) – Additional attributes to set on the variable.
  • shape (tuple) – The shape of a variable. If specified, this will be used during shape inference. If the user specified a different shape for this variable using a keyword argument when calling shape inference, this shape information will be ignored.
  • lr_mult (float) – The learning rate multiplier for this variable.
  • wd_mult (float) – Weight decay multiplier for this variable.
  • dtype (str or numpy.dtype) – The dtype for this variable. If not specified, this value will be inferred.
  • init (initializer (mxnet.init.*)) – Initializer for this variable to (optionally) override the default initializer
Returns:

variable – A symbol corresponding to an input to the computation graph.

Return type:

Symbol
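
Examples

A short sketch (attribute values such as lr_mult are only illustrative):

>>> w = mx.sym.var('w', shape=(3, 4), lr_mult=0.1, wd_mult=0.0)
>>> w.name
'w'
>>> w.list_arguments()
['w']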

mxnet.symbol.Variable(name, attr=None, shape=None, lr_mult=None, wd_mult=None, dtype=None, init=None)

Create a symbolic variable with specified name.

Parameters:
  • name (str) – Name of the variable.
  • attr (dict of string -> string) – Additional attributes to set on the variable.
  • shape (tuple) – The shape of a variable. If specified, this will be used during shape inference. If the user specified a different shape for this variable using a keyword argument when calling shape inference, this shape information will be ignored.
  • lr_mult (float) – The learning rate multiplier for this variable.
  • wd_mult (float) – Weight decay multiplier for this variable.
  • dtype (str or numpy.dtype) – The dtype for this variable. If not specified, this value will be inferred.
  • init (initializer (mxnet.init.*)) – Initializer for this variable to (optionally) override the default initializer
Returns:

variable – A symbol corresponding to an input to the computation graph.

Return type:

Symbol

mxnet.symbol.Group(symbols)

Creates a symbol that contains a collection of other symbols, grouped together.

Parameters:symbols (list) – List of symbols to be grouped.
Returns:sym – A group symbol.
Return type:Symbol
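
Examples

A minimal sketch:

>>> a = mx.sym.Variable('a')
>>> b = mx.sym.Variable('b')
>>> g = mx.sym.Group([a, b])
>>> g.list_outputs()
['a', 'b']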
mxnet.symbol.load(fname)

Load symbol from a JSON file.

You can also use pickle if you only work in Python. The advantage of load/save is that the file is language agnostic: a file saved with save can be loaded by the other language bindings of mxnet. You also get the benefit of being able to load/save directly from cloud storage such as S3 and HDFS.

Parameters:fname (str) –

The name of the file, examples:

  • s3://my-bucket/path/my-s3-symbol
  • hdfs://my-bucket/path/my-hdfs-symbol
  • /path-to/my-local-symbol
Returns:sym – The loaded symbol.
Return type:Symbol

See also

Symbol.save()
Used to save symbol into file.
mxnet.symbol.load_json(json_str)

Load symbol from json string.

Parameters:json_str (str) – A json string.
Returns:sym – The loaded symbol.
Return type:Symbol

See also

Symbol.tojson()
Used to save symbol into json string.
mxnet.symbol.pow(base, exp)

Raise base to the power of exp.

Parameters:
Returns:

result

Return type:

Symbol or Number

mxnet.symbol.maximum(left, right)

Element-wise maximum of left and right.

Parameters:
Returns:

result

Return type:

Symbol or Number

mxnet.symbol.minimum(left, right)

Element-wise minimum of left and right.

Parameters:
Returns:

result

Return type:

Symbol or Number

mxnet.symbol.hypot(left, right)

Given the “legs” of a right triangle, return its hypotenuse.

Parameters:
Returns:

result

Return type:

Symbol or Number
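
Examples

A sketch mixing symbols and scalars with these helpers (values chosen for illustration):

>>> x = mx.sym.Variable('x')
>>> y = mx.sym.pow(x, 2) + mx.sym.maximum(x, 0)
>>> out = y.eval(ctx=mx.cpu(), x=mx.nd.array([-1, 2]))
>>> out[0].asnumpy()
array([ 1.,  6.], dtype=float32)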

mxnet.symbol.zeros(shape, dtype=None, **kwargs)

Return a new symbol of given shape and type, filled with zeros.

Parameters:
  • shape (int or sequence of ints) – Shape of the new array.
  • dtype (str or numpy.dtype, optional) – The value type of the inner value, default to np.float32
Returns:

out – The created Symbol

Return type:

Symbol

mxnet.symbol.ones(shape, dtype=None, **kwargs)

Return a new symbol of given shape and type, filled with ones.

Parameters:
  • shape (int or sequence of ints) – Shape of the new array.
  • dtype (str or numpy.dtype, optional) – The value type of the inner value, default to np.float32
Returns:

out – The created Symbol

Return type:

Symbol

mxnet.symbol.arange(start, stop=None, step=1.0, repeat=1, name=None, dtype=None)

Return evenly spaced values within a given interval.

Parameters:
  • start (number) – Start of interval. The interval includes this value. The default start value is 0.
  • stop (number, optional) – End of interval. The interval does not include this value.
  • step (number, optional) – Spacing between values
  • repeat (int, optional) – The number of times each element is repeated. E.g. with repeat=3, the element a will be repeated three times --> a, a, a.
  • dtype (str or numpy.dtype, optional) – The value type of the inner value, default to np.float32
Returns:

out – The created Symbol

Return type:

Symbol
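
Examples

A quick sketch of the creation routines (these symbols take no arguments, so eval needs no inputs):

>>> z = mx.sym.zeros((2, 3)) + mx.sym.ones((2, 3))
>>> z.eval(ctx=mx.cpu())[0].asnumpy()
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]], dtype=float32)
>>> r = mx.sym.arange(0, 6, step=2)
>>> r.eval(ctx=mx.cpu())[0].asnumpy()
array([ 0.,  2.,  4.], dtype=float32)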

mxnet.symbol.Activation(*args, **kwargs)

Element-wise activation function. The activation operation is applied element-wise to each element of the input array. The following types are supported:

  • relu: Rectified Linear Unit, y = max(x, 0)
  • sigmoid: y = 1 / (1 + exp(-x))
  • tanh: Hyperbolic tangent, y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
  • softrelu: Soft ReLU, or SoftPlus, y = log(1 + exp(x))

Defined in src/operator/activation.cc:L76

Parameters:
  • data (Symbol) – Input data to activation function.
  • act_type ({'relu', 'sigmoid', 'softrelu', 'tanh'}, required) – Activation function to be applied.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

Examples

A one-hidden-layer MLP with ReLU activation:

>>> data = Variable('data')
>>> mlp = FullyConnected(data=data, num_hidden=128, name='proj')
>>> mlp = Activation(data=mlp, act_type='relu', name='activation')
>>> mlp = FullyConnected(data=mlp, num_hidden=10, name='mlp')
>>> mlp
<Symbol mlp>

ReLU activation

>>> test_suites = [
... ('relu', lambda x: numpy.maximum(x, 0)),
... ('sigmoid', lambda x: 1 / (1 + numpy.exp(-x))),
... ('tanh', lambda x: numpy.tanh(x)),
... ('softrelu', lambda x: numpy.log(1 + numpy.exp(x)))
... ]
>>> x = test_utils.random_arrays((2, 3, 4))
>>> for act_type, numpy_impl in test_suites:
... op = Activation(act_type=act_type, name='act')
... y = test_utils.simple_forward(op, act_data=x)
... y_np = numpy_impl(x)
... print('%s: %s' % (act_type, test_utils.almost_equal(y, y_np)))
relu: True
sigmoid: True
tanh: True
softrelu: True
mxnet.symbol.BatchNorm(*args, **kwargs)

Batch normalization.

Normalizes a data batch by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:

\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]

Then compute the normalized output, which has the same shape as input, as following:

\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]

Both mean and var return a scalar by treating the input as a vector.

Assume the input has size k on axis 1; then both gamma and beta have shape (k,). If output_mean_var is set to true, then it outputs both data_mean and data_var as well, which are needed for the backward pass.

Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var, which are k-length vectors. They are global statistics for the whole dataset, which are updated by:

moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
moving_var = moving_var * momentum + data_var * (1 - momentum)

If use_global_stats is set to be true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.

Both gamma and beta are learnable parameters. But if fix_gamma is true, then gamma is set to 1 and its gradient to 0.

Defined in src/operator/batch_norm.cc:L79

Parameters:
  • data (Symbol) – Input data to batch normalization
  • gamma (Symbol) – gamma array
  • beta (Symbol) – beta array
  • eps (float, optional, default=0.001) – Epsilon to prevent div 0
  • momentum (float, optional, default=0.9) – Momentum for moving average
  • fix_gamma (boolean, optional, default=True) – Fix gamma while training
  • use_global_stats (boolean, optional, default=False) – Whether to use global moving statistics instead of local batch statistics. This forces batch-norm into a scale-shift operator.
  • output_mean_var (boolean, optional, default=False) – Output the data mean and variance along with the normalized output.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.BilinearSampler(*args, **kwargs)

Apply bilinear sampling to input feature map, which is the key of “[NIPS2015] Spatial Transformer Networks”:

output[batch, channel, y_dst, x_dst] = G(data[batch, channel, y_src, x_src])
x_dst, y_dst enumerate all spatial locations in output
x_src = grid[batch, 0, y_dst, x_dst]
y_src = grid[batch, 1, y_dst, x_dst]
G() denotes the bilinear interpolation kernel

The out-of-boundary points will be padded with zeros (the boundary is defined to be [-1, 1]). The shape of the output will be (data.shape[0], data.shape[1], grid.shape[2], grid.shape[3]). The operator assumes that grid has been normalized. If you want to design a CustomOp to manipulate grid, please refer to GridGeneratorOp.

Parameters:
  • data (Symbol) – Input data to the BilinearsamplerOp.
  • grid (Symbol) – Input grid to the BilinearsamplerOp.grid has two channels: x_src, y_src
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.BlockGrad(*args, **kwargs)

Get output from a symbol and pass 0 gradient back

From:src/operator/tensor/elemwise_unary_op.cc:31

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Cast(*args, **kwargs)

Cast to a specified type, element-wise.

For example:

cast([1e20, 11.1], dtype='float16') = [inf, 11.09375]
cast([300, 11.1, 10.9, -1, -3], dtype='uint8') = [44, 11, 10, 255, 253]

Defined in src/operator/tensor/elemwise_unary_op.cc:L65

Parameters:
  • data (Symbol) – Source input
  • dtype ({'float16', 'float32', 'float64', 'int32', 'uint8'}, required) – Output data type.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
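
Examples

A small sketch (note that casting to an integer type truncates toward zero):

>>> x = mx.sym.Variable('x')
>>> y = mx.sym.Cast(data=x, dtype='int32')
>>> out = y.eval(ctx=mx.cpu(), x=mx.nd.array([1.9, 2.1, -1.5]))
>>> out[0].asnumpy()
array([ 1,  2, -1], dtype=int32)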

mxnet.symbol.Concat(*args, **kwargs)

Concatenate a list of arrays along a given axis.

The dimension sizes of the input arrays on the given axis should be the same.

For example:

x = [[1,1],[1,1]]
y = [[2,2],[2,2]]
z = [[3,3],[3,3],[3,3]]

Concat(x,y,z,dim=0) = [[ 1.,  1.],
                       [ 1.,  1.],
                       [ 2.,  2.],
                       [ 2.,  2.],
                       [ 3.,  3.],
                       [ 3.,  3.],
                       [ 3.,  3.]]

Concat(x,y,z,dim=1) = [[ 1.,  1.,  2.,  2.],
                       [ 1.,  1.,  2.,  2.]]

Defined in src/operator/concat.cc:L70. This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol[]) – List of tensors to concatenate
  • dim (int, optional, default='1') – the dimension along which to concatenate.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

Examples

Concat two (or more) inputs along a specific dimension:

>>> a = Variable('a')
>>> b = Variable('b')
>>> c = Concat(a, b, dim=1, name='my-concat')
>>> c
<Symbol my-concat>
>>> SymbolDoc.get_output_shape(c, a=(128, 10, 3, 3), b=(128, 15, 3, 3))
{'my-concat_output': (128L, 25L, 3L, 3L)}

Note the shape should be the same except on the dimension that is being concatenated.

mxnet.symbol.Convolution(*args, **kwargs)

Compute N-D convolution on (N+2)-D input.

In the simplest 2-D convolution, given input data with shape (batch_size, channel, height, width), the output is computed by

\[out[n,i,:,:] = bias[i] + \sum_{j=0}^{channel} data[n,j,:,:] \star weight[i,j,:,:]\]

where \(\star\) is the 2-D cross-correlation operator.

For general 2-D convolution, the shapes are

  • data: (batch_size, channel, height, width)
  • weight: (num_filter, channel, kernel[0], kernel[1])
  • bias: (num_filter,)
  • out: (batch_size, num_filter, out_height, out_width).

Define:

f(x,k,p,s,d) = floor((x+2*p-d*(k-1)-1)/s)+1

then we have:

out_height=f(height, kernel[0], pad[0], stride[0], dilate[0])
out_width=f(width, kernel[1], pad[1], stride[1], dilate[1])

If no_bias is set to be true, then the bias term is ignored.

The default data layout is NCHW, namely (batch_size, channel, height, width). We can choose other layouts such as NHWC.

If num_group is larger than 1, denoted by g, then split the input data evenly into g parts along the channel axis, and also evenly split weight along the first dimension. Next compute the convolution on the i-th part of the data with the i-th weight part. The output is obtained by concatenating all the g results.

To perform 1-D convolution, simply use 2-D convolution but set the last axis size to be 1 for both data and weight.

3-D convolution adds an additional depth dimension besides height and width. The shapes are

  • data: (batch_size, channel, depth, height, width)
  • weight: (num_filter, channel, kernel[0], kernel[1], kernel[2])
  • bias: (num_filter,)
  • out: (batch_size, num_filter, out_depth, out_height, out_width).

Both weight and bias are learnable parameters.

There are other options to tune the performance.

  • cudnn_tune: enabling this option leads to higher startup time but may give faster speed. Options are
    • off: no tuning
    • limited_workspace: run tests and pick the fastest algorithm that does not exceed the workspace limit.
    • fastest: pick the fastest algorithm and ignore workspace limit.
    • None (default): the behavior is determined by environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT. 0 for off, 1 for limited workspace (default), 2 for fastest.
  • workspace: A large number leads to more (GPU) memory usage but may improve the performance.

Defined in src/operator/convolution.cc:L150

Parameters:
  • data (Symbol) – Input data to the ConvolutionOp.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • kernel (Shape(tuple), required) – convolution kernel size: (h, w) or (d, h, w)
  • stride (Shape(tuple), optional, default=()) – convolution stride: (h, w) or (d, h, w)
  • dilate (Shape(tuple), optional, default=()) – convolution dilate: (h, w) or (d, h, w)
  • pad (Shape(tuple), optional, default=()) – pad for convolution: (h, w) or (d, h, w)
  • num_filter (int (non-negative), required) – convolution filter(channel) number
  • num_group (int (non-negative), optional, default=1) – Number of group partitions.
  • workspace (long (non-negative), optional, default=1024) – Maximum temporary workspace allowed for convolution (MB).
  • no_bias (boolean, optional, default=False) – Whether to disable bias parameter.
  • cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algo by running performance test.
  • cudnn_off (boolean, optional, default=False) – Turn off cudnn for this layer.
  • layout ({None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCW for 1d, NCHW for 2d and NCDHW for 3d.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
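
Examples

A shape-inference sketch for a 3x3 convolution with padding 1 (exact tuple formatting may differ between Python versions):

>>> data = mx.sym.Variable('data')
>>> conv = mx.sym.Convolution(data=data, kernel=(3, 3), pad=(1, 1), num_filter=8, name='conv')
>>> arg_shapes, out_shapes, _ = conv.infer_shape(data=(1, 3, 32, 32))
>>> out_shapes
[(1, 8, 32, 32)]
>>> arg_shapes
[(1, 3, 32, 32), (8, 3, 3, 3), (8,)]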

mxnet.symbol.Convolution_v1(*args, **kwargs)

Apply convolution to input then add a bias.

Parameters:
  • data (Symbol) – Input data to the ConvolutionV1Op.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • kernel (Shape(tuple), required) – convolution kernel size: (h, w) or (d, h, w)
  • stride (Shape(tuple), optional, default=()) – convolution stride: (h, w) or (d, h, w)
  • dilate (Shape(tuple), optional, default=()) – convolution dilate: (h, w) or (d, h, w)
  • pad (Shape(tuple), optional, default=()) – pad for convolution: (h, w) or (d, h, w)
  • num_filter (int (non-negative), required) – convolution filter(channel) number
  • num_group (int (non-negative), optional, default=1) – Number of group partitions. Equivalent to slicing input into num_group partitions, apply convolution on each, then concatenate the results
  • workspace (long (non-negative), optional, default=1024) – Maximum tmp workspace allowed for convolution (MB).
  • no_bias (boolean, optional, default=False) – Whether to disable bias parameter.
  • cudnn_tune ({None, 'fastest', 'limited_workspace', 'off'},optional, default='None') – Whether to pick convolution algo by running performance test. Leads to higher startup time but may give faster speed. Options are: ‘off’: no tuning ‘limited_workspace’: run test and pick the fastest algorithm that doesn’t exceed workspace limit. ‘fastest’: pick the fastest algorithm and ignore workspace limit. If set to None (default), behavior is determined by environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT: 0 for off, 1 for limited workspace (default), 2 for fastest.
  • cudnn_off (boolean, optional, default=False) – Turn off cudnn for this layer.
  • layout ({None, 'NCDHW', 'NCHW', 'NDHWC', 'NHWC'},optional, default='None') – Set layout for input, output and weight. Empty for default layout: NCHW for 2d and NCDHW for 3d.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Correlation(*args, **kwargs)

Apply correlation to inputs

Parameters:
  • data1 (Symbol) – Input data1 to the correlation.
  • data2 (Symbol) – Input data2 to the correlation.
  • kernel_size (int (non-negative), optional, default=1) – kernel size for Correlation must be an odd number
  • max_displacement (int (non-negative), optional, default=1) – Max displacement of Correlation
  • stride1 (int (non-negative), optional, default=1) – stride1 quantize data1 globally
  • stride2 (int (non-negative), optional, default=1) – stride2 quantize data2 within the neighborhood centered around data1
  • pad_size (int (non-negative), optional, default=0) – pad for Correlation
  • is_multiply (boolean, optional, default=True) – operation type is either multiplication or subtraction
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Crop(*args, **kwargs)

Crop the 2nd and 3rd dimensions of the input data to the size given by h_w, or to the width and height of the second input symbol. That is, with one input we need h_w to specify the crop height and width; otherwise the second input symbol’s size will be used. This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol or Symbol[]) – Tensor or List of Tensors, the second input will be used as crop_like shape reference
  • offset (Shape(tuple), optional, default=(0,0)) – crop offset coordinate: (y, x)
  • h_w (Shape(tuple), optional, default=(0,0)) – crop height and width: (h, w)
  • center_crop (boolean, optional, default=False) – If set to true, center cropping is used; otherwise it will crop using the shape of crop_like
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Custom(*args, **kwargs)

Custom operator implemented in frontend.

Parameters:
  • op_type (string) – Type of custom operator. Must be registered first.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
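
Examples

A minimal sketch of a frontend custom operator (the names my_sigmoid, MySigmoid and MySigmoidProp are illustrative; this assumes the mx.operator.CustomOp/CustomOpProp interface):

>>> import numpy as np
>>> class MySigmoid(mx.operator.CustomOp):
...     def forward(self, is_train, req, in_data, out_data, aux):
...         # compute y = 1 / (1 + exp(-x)) via numpy
...         x = in_data[0].asnumpy()
...         y = 1.0 / (1.0 + np.exp(-x))
...         self.assign(out_data[0], req[0], mx.nd.array(y))
...     def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
...         # dL/dx = dL/dy * y * (1 - y)
...         y = out_data[0].asnumpy()
...         dy = out_grad[0].asnumpy()
...         self.assign(in_grad[0], req[0], mx.nd.array(dy * y * (1.0 - y)))
>>> @mx.operator.register('my_sigmoid')
... class MySigmoidProp(mx.operator.CustomOpProp):
...     def __init__(self):
...         super(MySigmoidProp, self).__init__(need_top_grad=True)
...     def list_arguments(self):
...         return ['data']
...     def list_outputs(self):
...         return ['output']
...     def infer_shape(self, in_shape):
...         # output has the same shape as the input; no auxiliary states
...         return in_shape, [in_shape[0]], []
...     def create_operator(self, ctx, shapes, dtypes):
...         return MySigmoid()
>>> data = mx.sym.Variable('data')
>>> act = mx.sym.Custom(data=data, op_type='my_sigmoid', name='act')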

mxnet.symbol.Deconvolution(*args, **kwargs)

Apply deconvolution to input then add a bias.

Parameters:
  • data (Symbol) – Input data to the DeconvolutionOp.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • kernel (Shape(tuple), required) – deconvolution kernel size: (y, x)
  • stride (Shape(tuple), optional, default=(1,1)) – deconvolution stride: (y, x)
  • pad (Shape(tuple), optional, default=(0,0)) – pad for deconvolution: (y, x); a good value is (kernel-1)/2. If target_shape is set, pad will be ignored and computed automatically
  • adj (Shape(tuple), optional, default=(0,0)) – adjustment for output shape: (y, x); if target_shape is set, adj will be ignored and computed automatically
  • target_shape (Shape(tuple), optional, default=(0,0)) – output shape with target shape: (y, x)
  • num_filter (int (non-negative), required) – deconvolution filter(channel) number
  • num_group (int (non-negative), optional, default=1) – number of groups partition
  • workspace (long (non-negative), optional, default=512) – Tmp workspace for deconvolution (MB)
  • no_bias (boolean, optional, default=True) – Whether to disable bias parameter.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Dropout(*args, **kwargs)

Apply dropout to the input. During training, each element of the input is randomly set to zero with probability p, and then the whole tensor is rescaled by 1/(1-p) to keep the expectation the same as before applying dropout. At test time, this behaves as an identity map.

Parameters:
  • data (Symbol) – Input data to dropout.
  • p (float, optional, default=0.5) – Fraction of the input that gets dropped out at training time
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

Examples

Apply dropout, setting input elements to zero with probability 0.2:

>>> data = Variable('data')
>>> data_dp = Dropout(data=data, p=0.2)
>>> shape = (100, 100)  # take larger shapes to be more statistically stable
>>> x = numpy.ones(shape)
>>> op = Dropout(p=0.5, name='dp')
>>> # dropout is identity during testing
>>> y = test_utils.simple_forward(op, dp_data=x, is_train=False)
>>> test_utils.almost_equal(x, y, threshold=0)
True
>>> y = test_utils.simple_forward(op, dp_data=x, is_train=True)
>>> # expectation is (approximately) unchanged
>>> numpy.abs(x.mean() - y.mean()) < 0.1
True
>>> set(numpy.unique(y)) == set([0, 2])
True
mxnet.symbol.ElementWiseSum(*args, **kwargs)

Add all input arguments element-wise.

\[add\_n(a_1, a_2, ..., a_n) = a_1 + a_2 + ... + a_n\]

add_n is potentially more efficient than calling add repeatedly.

Defined in src/operator/tensor/elemwise_sum.cc:L63. This function supports a variable number of positional inputs.

Parameters:
  • args (Symbol[]) – Positional input arguments
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Embedding(*args, **kwargs)

Map integer index to vector representations (embeddings). Those embeddings are learnable parameters. For an input of shape (d1, ..., dK), the output shape is (d1, ..., dK, output_dim). All the input values should be integers in the range [0, input_dim).

From:src/operator/tensor/indexing_op.cc:19

Parameters:
  • data (Symbol) – Input data to the EmbeddingOp.
  • weight (Symbol) – Embedding weight matrix.
  • input_dim (int, required) – vocabulary size of the input indices.
  • output_dim (int, required) – dimension of the embedding vectors.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

Examples

Assume we want to map the 26 English alphabet letters to 16-dimensional vector representations.

>>> vocabulary_size = 26
>>> embed_dim = 16
>>> seq_len, batch_size = (10, 64)
>>> input = Variable('letters')
>>> op = Embedding(data=input, input_dim=vocabulary_size, output_dim=embed_dim,
...name='embed')
>>> SymbolDoc.get_output_shape(op, letters=(seq_len, batch_size))
{'embed_output': (10L, 64L, 16L)}
>>> vocab_size, embed_dim = (26, 16)
>>> batch_size = 12
>>> word_vecs = test_utils.random_arrays((vocab_size, embed_dim))
>>> op = Embedding(name='embed', input_dim=vocab_size, output_dim=embed_dim)
>>> x = numpy.random.choice(vocab_size, batch_size)
>>> y = test_utils.simple_forward(op, embed_data=x, embed_weight=word_vecs)
>>> y_np = word_vecs[x]
>>> test_utils.almost_equal(y, y_np)
True
mxnet.symbol.Flatten(*args, **kwargs)

Flatten input into a 2-D array by collapsing the higher dimensions.

Assume the input array has shape (d1, d2, ..., dk), then flatten reshapes the input array into shape (d1, d2*...*dk).

Defined in src/operator/tensor/matrix_op.cc:L101

Parameters:
  • data (Symbol) – Input data to reshape.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

Examples

Flatten is usually applied before FullyConnected, to reshape the 4D tensor produced by convolutional layers to 2D matrix:

>>> data = Variable('data')  # say this is 4D from some conv/pool
>>> flatten = Flatten(data=data, name='flat')  # now this is 2D
>>> SymbolDoc.get_output_shape(flatten, data=(2, 3, 4, 5))
{'flat_output': (2L, 60L)}
>>> test_dims = [(2, 3, 4, 5), (2, 3), (2,)]
>>> op = Flatten(name='flat')
>>> for dims in test_dims:
... x = test_utils.random_arrays(dims)
... y = test_utils.simple_forward(op, flat_data=x)
... y_np = x.reshape((dims[0], numpy.prod(dims[1:])))
... print('%s: %s' % (dims, test_utils.almost_equal(y, y_np)))
(2, 3, 4, 5): True
(2, 3): True
(2,): True
mxnet.symbol.FullyConnected(*args, **kwargs)

Apply a linear transformation: \(Y = XW^T + b\).

Shapes:

  • data: (batch_size, input_dim)
  • weight: (num_hidden, input_dim)
  • bias: (num_hidden,)
  • out: (batch_size, num_hidden)

The learnable parameters include both weight and bias.

If no_bias is set to be true, then the bias term is ignored.

Defined in src/operator/fully_connected.cc:L94

Parameters:
  • data (Symbol) – Input data.
  • weight (Symbol) – Weight matrix.
  • bias (Symbol) – Bias parameter.
  • num_hidden (int, required) – Number of hidden nodes of the output.
  • no_bias (boolean, optional, default=False) – Whether to disable bias parameter.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

Examples

Construct a fully connected operator with target dimension 512.

>>> data = Variable('data')  # or some constructed NN
>>> op = FullyConnected(data=data,
... num_hidden=512,
... name='FC1')
>>> op
<Symbol FC1>
>>> SymbolDoc.get_output_shape(op, data=(128, 100))
{'FC1_output': (128L, 512L)}

A simple 3-layer MLP with ReLU activation:

>>> net = Variable('data')
>>> for i, dim in enumerate([128, 64]):
... net = FullyConnected(data=net, num_hidden=dim, name='FC%d' % i)
... net = Activation(data=net, act_type='relu', name='ReLU%d' % i)
>>> # 10-class predictor (e.g. MNIST)
>>> net = FullyConnected(data=net, num_hidden=10, name='pred')
>>> net
<Symbol pred>
>>> dim_in, dim_out = (3, 4)
>>> x, w, b = test_utils.random_arrays((10, dim_in), (dim_out, dim_in), (dim_out,))
>>> op = FullyConnected(num_hidden=dim_out, name='FC')
>>> out = test_utils.simple_forward(op, FC_data=x, FC_weight=w, FC_bias=b)
>>> # numpy implementation of FullyConnected
>>> out_np = numpy.dot(x, w.T) + b
>>> test_utils.almost_equal(out, out_np)
True
mxnet.symbol.GridGenerator(*args, **kwargs)

Generate a sampling grid for bilinear sampling.

Parameters:
  • data (Symbol) – Input data to the GridGeneratorOp.
  • transform_type ({'affine', 'warp'}, required) – Transformation type. If the transformation type is affine, data is an affine matrix: (batch, 6). If the transformation type is warp, data is an optical flow: (batch, 2, h, w).
  • target_shape (Shape(tuple), optional, default=(0,0)) – If the transformation type is affine, the operator needs a target_shape: (H, W). If the transformation type is warp, the operator will ignore target_shape.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
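
Examples

A sketch of the typical pairing with BilinearSampler (shapes are illustrative):

>>> data = mx.sym.Variable('data')      # feature map: (batch, channel, H, W)
>>> affine = mx.sym.Variable('affine')  # affine parameters: (batch, 6)
>>> grid = mx.sym.GridGenerator(data=affine, transform_type='affine', target_shape=(28, 28))
>>> warped = mx.sym.BilinearSampler(data=data, grid=grid)
>>> warped.infer_shape(data=(2, 3, 32, 32), affine=(2, 6))[1]
[(2, 3, 28, 28)]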

mxnet.symbol.IdentityAttachKLSparseReg(*args, **kwargs)

Apply a sparse regularization to the output of a sigmoid activation function.

Parameters:
  • data (Symbol) – Input data.
  • sparseness_target (float, optional, default=0.1) – The sparseness target
  • penalty (float, optional, default=0.001) – The tradeoff parameter for the sparseness penalty
  • momentum (float, optional, default=0.9) – The momentum for running average
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.InstanceNorm(*args, **kwargs)

An operator that takes an n-dimensional input tensor (n > 2) and normalizes the input by subtracting the mean and dividing by the standard deviation calculated over the spatial dimensions. This is an implementation of the operator described in “Instance Normalization: The Missing Ingredient for Fast Stylization”, D. Ulyanov, A. Vedaldi, V. Lempitsky, 2016 (arXiv:1607.08022v2). This layer is similar to batch normalization, with two differences: first, the normalization is carried out per example (‘instance’), not over a batch. Second, the same normalization is applied both at test and train time. This operation is also known as ‘contrast normalization’.

Parameters:
  • data (Symbol) – A n-dimensional tensor (n > 2) of the form [batch, channel, spatial_dim1, spatial_dim2, ...].
  • gamma (Symbol) – A vector of length ‘channel’, which multiplies the normalized input.
  • beta (Symbol) – A vector of length ‘channel’, which is added to the product of the normalized input and the weight.
  • eps (float, optional, default=0.001) – Epsilon to prevent division by 0.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.L2Normalization(*args, **kwargs)

Set the l2 norm of each instance to a constant.

Parameters:
  • data (Symbol) – Input data to the L2NormalizationOp.
  • eps (float, optional, default=1e-10) – Epsilon to prevent div 0
  • mode ({'channel', 'instance', 'spatial'},optional, default='instance') – Normalization Mode. If set to instance, this operator will compute a norm for each instance in the batch; this is the default mode. If set to channel, this operator will compute a cross channel norm at each position of each instance. If set to spatial, this operator will compute a norm for each channel.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.LRN(*args, **kwargs)

Apply local response normalization to the input.

Parameters:
  • data (Symbol) – Input data to the LRN operator.
  • alpha (float, optional, default=0.0001) – value of the alpha variance scaling parameter in the normalization formula
  • beta (float, optional, default=0.75) – value of the beta power parameter in the normalization formula
  • knorm (float, optional, default=2) – value of the k parameter in normalization formula
  • nsize (int (non-negative), required) – normalization window width in elements.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.LeakyReLU(*args, **kwargs)

Leaky ReLU activation

The following types are supported:

  • elu: y = x > 0 ? x : slope * (exp(x)-1)
  • leaky: y = x > 0 ? x : slope * x
  • prelu: same as leaky but the slope is learnable.
  • rrelu: same as leaky but the slope is uniformly randomly chosen from [lower_bound, upper_bound) for training, while fixed to be (lower_bound+upper_bound)/2 for inference.

Defined in src/operator/leaky_relu.cc:L36

Parameters:
  • data (Symbol) – Input data to activation function.
  • act_type ({'elu', 'leaky', 'prelu', 'rrelu'},optional, default='leaky') – Activation function to be applied.
  • slope (float, optional, default=0.25) – Init slope for the activation. (For leaky and elu only)
  • lower_bound (float, optional, default=0.125) – Lower bound of random slope. (For rrelu only)
  • upper_bound (float, optional, default=0.334) – Upper bound of random slope. (For rrelu only)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.LinearRegressionOutput(*args, **kwargs)

Use linear regression for final output, this is used on final output of a net.

Parameters:
  • data (Symbol) – Input data to function.
  • label (Symbol) – Input label to function.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
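
Examples

A small regression-head sketch (layer names are illustrative):

>>> data = mx.sym.Variable('data')
>>> label = mx.sym.Variable('label')
>>> pred = mx.sym.FullyConnected(data=data, num_hidden=1, name='fc')
>>> out = mx.sym.LinearRegressionOutput(data=pred, label=label, name='lro')
>>> out.list_arguments()
['data', 'fc_weight', 'fc_bias', 'label']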

mxnet.symbol.LogisticRegressionOutput(*args, **kwargs)

Use Logistic regression for final output, this is used on final output of a net. Logistic regression is suitable for binary classification or probability prediction tasks.

Parameters:
  • data (Symbol) – Input data to function.
  • label (Symbol) – Input label to function.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.MAERegressionOutput(*args, **kwargs)

Use mean absolute error regression for final output, this is used on final output of a net.

Parameters:
  • data (Symbol) – Input data to function.
  • label (Symbol) – Input label to function.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.MakeLoss(*args, **kwargs)

Get output from a symbol and pass 1 gradient back. This is used as a terminal loss when unary and binary operators are used to compose a loss with no declared backward dependency.

Parameters:
  • data (Symbol) – Input data.
  • grad_scale (float, optional, default=1) – gradient scale as a supplement to unary and binary operators
  • valid_thresh (float, optional, default=0) – regard element valid when x > valid_thresh, this is used only in valid normalization mode.
  • normalization ({'batch', 'null', 'valid'}, optional, default='null') – If set to null, the op will not normalize the output gradient. If set to batch, the op will normalize the gradient by dividing by the batch size. If set to valid, the op will normalize the gradient by dividing by the number of samples marked as valid.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
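
Examples

A sketch of wrapping a hand-written loss (binary cross entropy here) so it becomes the head of the network:

>>> pred = mx.sym.Variable('pred')
>>> label = mx.sym.Variable('label')
>>> ce = -(label * mx.sym.log(pred) + (1 - label) * mx.sym.log(1 - pred))
>>> loss = mx.sym.MakeLoss(ce, name='ce_loss')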

mxnet.symbol.Pad(*args, **kwargs)

Pad an array.

Only supports 4-D and 5-D input array.

Defined in src/operator/pad.cc:L407

Parameters:
  • data (Symbol) – An n-dimensional input tensor.
  • mode ({'constant', 'edge'}, required) – Padding type to use. “constant” pads all values with a constant value, the value of which can be specified with the constant_value option. “edge” uses the boundary values of the array as padding.
  • pad_width (Shape(tuple), required) – A tuple of padding widths of length 2*r, where r is the rank of the input tensor, specifying number of values padded to the edges of each axis. (before_1, after_1, ... , before_N, after_N) unique pad widths for each axis. Equivalent to pad_width in numpy.pad, but flattened.
  • constant_value (double, optional, default=0) – This option is only used when mode is “constant”. This value will be used as the padding value. Defaults to 0 if not specified.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
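
As a minimal sketch (shapes and names here are only illustrative), padding one element of zeros on each side of both spatial axes of a 4-D tensor can be checked with infer_shape:

import mxnet as mx

x = mx.sym.Variable('x')
# pad the two spatial axes by one element on each side; the leading
# (batch, channel) axes get zero padding widths
y = mx.sym.Pad(data=x, mode='constant', constant_value=0,
               pad_width=(0, 0, 0, 0, 1, 1, 1, 1))

print(y.infer_shape(x=(1, 1, 2, 2))[1])   # [(1, 1, 4, 4)]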

mxnet.symbol.Pooling(*args, **kwargs)

Perform pooling on the input.

The shapes for 1-D pooling are

  • data: (batch_size, channel, width)

  • out: (batch_size, num_filter, out_width)

The shapes for 2-D pooling are

  • data: (batch_size, channel, height, width)

  • out: (batch_size, num_filter, out_height, out_width), with:

    out_height = f(height, kernel[0], pad[0], stride[0])
    out_width = f(width, kernel[1], pad[1], stride[1])
    

The definition of f depends on pooling_convention, which has two options:

  • valid (default):

    f(x, k, p, s) = floor((x + 2*p - k) / s) + 1
    
  • full, which is compatible with Caffe:

    f(x, k, p, s) = ceil((x + 2*p - k) / s) + 1
    

If global_pool is set to true, then global pooling is performed, i.e. the kernel is reset to (height, width).

Three pooling options are supported by pool_type:

  • avg: average pooling
  • max: max pooling
  • sum: sum pooling

1-D pooling is a special case of 2-D pooling with width=1 and kernel[1]=1.

For 3-D pooling, an additional depth dimension is added before height. Namely the input data will have shape (batch_size, channel, depth, height, width).

Defined in src/operator/pooling.cc:L126

Parameters:
  • data (Symbol) – Input data to the pooling operator.
  • global_pool (boolean, optional, default=False) – Ignore kernel size, do global pooling based on current input feature map.
  • cudnn_off (boolean, optional, default=False) – Turn off cudnn pooling and use MXNet pooling operator.
  • kernel (Shape(tuple), required) – pooling kernel size: (y, x) or (d, y, x)
  • pool_type ({'avg', 'max', 'sum'}, required) – Pooling type to be applied.
  • pooling_convention ({'full', 'valid'},optional, default='valid') – Pooling convention to be applied.
  • stride (Shape(tuple), optional, default=()) – stride: for pooling (y, x) or (d, y, x)
  • pad (Shape(tuple), optional, default=()) – pad for pooling: (y, x) or (d, y, x)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
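
For illustration, the two pooling conventions can be compared through infer_shape (the input shape below is an arbitrary example):

import mxnet as mx

x = mx.sym.Variable('x')
valid = mx.sym.Pooling(data=x, kernel=(3, 3), stride=(2, 2),
                       pool_type='max', pooling_convention='valid')
full = mx.sym.Pooling(data=x, kernel=(3, 3), stride=(2, 2),
                      pool_type='max', pooling_convention='full')

shape = (1, 16, 10, 10)
print(valid.infer_shape(x=shape)[1])   # [(1, 16, 4, 4)]  floor((10-3)/2)+1 = 4
print(full.infer_shape(x=shape)[1])    # [(1, 16, 5, 5)]  ceil((10-3)/2)+1  = 5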

mxnet.symbol.Pooling_v1(*args, **kwargs)

Perform pooling on the input.

The shapes for 2-D pooling are

  • data: (batch_size, channel, height, width)

  • out: (batch_size, num_filter, out_height, out_width), with:

    out_height = f(height, kernel[0], pad[0], stride[0])
    out_width = f(width, kernel[1], pad[1], stride[1])
    

The definition of f depends on pooling_convention, which has two options:

  • valid (default):

    f(x, k, p, s) = floor((x + 2*p - k) / s) + 1
    
  • full, which is compatible with Caffe:

    f(x, k, p, s) = ceil((x + 2*p - k) / s) + 1
    

If global_pool is set to true, then global pooling is performed, i.e. the kernel is reset to (height, width).

Three pooling options are supported by pool_type:

  • avg: average pooling
  • max: max pooling
  • sum: sum pooling

1-D pooling is a special case of 2-D pooling with width=1 and kernel[1]=1.

For 3-D pooling, an additional depth dimension is added before height. Namely the input data will have shape (batch_size, channel, depth, height, width).

Defined in src/operator/pooling_v1.cc:L84

Parameters:
  • data (Symbol) – Input data to the pooling operator.
  • global_pool (boolean, optional, default=False) – Ignore kernel size, do global pooling based on current input feature map.
  • kernel (Shape(tuple), required) – pooling kernel size: (y, x) or (d, y, x)
  • pool_type ({'avg', 'max', 'sum'}, required) – Pooling type to be applied.
  • pooling_convention ({'full', 'valid'},optional, default='valid') – Pooling convention to be applied.
  • stride (Shape(tuple), optional, default=()) – stride: for pooling (y, x) or (d, y, x)
  • pad (Shape(tuple), optional, default=()) – pad for pooling: (y, x) or (d, y, x)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.RNN(*args, **kwargs)

Apply a recurrent layer to input.

Parameters:
  • data (Symbol) – Input data to RNN
  • parameters (Symbol) – Vector of all RNN trainable parameters concatenated
  • state (Symbol) – initial hidden state of the RNN
  • state_cell (Symbol) – initial cell state for LSTM networks (only for LSTM)
  • state_size (int (non-negative), required) – size of the state for each layer
  • num_layers (int (non-negative), required) – number of stacked layers
  • bidirectional (boolean, optional, default=False) – whether to use bidirectional recurrent layers
  • mode ({'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required) – the type of RNN to compute
  • p (float, optional, default=0) – Dropout probability, fraction of the input that gets dropped out at training time
  • state_outputs (boolean, optional, default=False) – Whether to have the states as symbol outputs.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.ROIPooling(*args, **kwargs)

Performs region-of-interest pooling on inputs. Bounding box coordinates are resized by spatial_scale and the input feature maps are cropped accordingly. The cropped feature maps are pooled by max pooling to a fixed-size output indicated by pooled_size. batch_size will change to the number of region bounding boxes after ROIPooling.

Parameters:
  • data (Symbol) – Input data to the pooling operator, a 4D Feature maps
  • rois (Symbol) – Bounding box coordinates, a 2D array of [[batch_index, x1, y1, x2, y2]]. (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the designated region of interest. batch_index indicates the index of the corresponding image in the input data
  • pooled_size (Shape(tuple), required) – fix pooled size: (h, w)
  • spatial_scale (float, required) – Ratio of input feature map height (or width) to raw image height (or width). Equals the reciprocal of the total stride in the convolutional layers
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.Reshape(*args, **kwargs)

Reshape array into a new shape.

The shape is a tuple of int such as (2,3,4). The new shape should not change the array size. For example:

reshape([1,2,3,4], shape=(2,2)) = [[1,2], [3,4]]

In addition, we can use special codes, which are integers less than 1, on some shape dimensions. To infer the output shape, we start from an empty output tuple, then continuously pop dimensions from the original shape, starting at the beginning, and push the translated results into the output shape.

Each special code represents a way of translation.

  • 0 for copying one. Pop one input dimension and push into the output. For example:

    - input=(2,3,4), shape=(4,0,2), output=(4,3,2)
    - input=(2,3,4), shape=(2,0,0), output=(2,3,4)
    
  • -1 for inference. Push a placeholder into the output whose value will be inferred later:

    - input=(2,3,4), shape=(6,1,-1), output=(6,1,4)
    - input=(2,3,4), shape=(3,-1,8), output=(3,1,8)
    - input=(2,3,4), shape=(-1,), output=(24,)
    
  • -2 for copying all. Pop all remaining input dimensions and push them into the output:

    - input=(2,3,4), shape=(-2), output=(2,3,4)
    - input=(2,3,4), shape=(2,-2), output=(2,3,4)
    - input=(2,3,4), shape=(-2,1,1), output=(2,3,4,1,1)
    
  • -3 for merging two dimensions. Pop two input dimensions, compute the product and then push into the output:

    - input=(2,3,4), shape=(-3,4), output=(6,4)
    - input=(2,3,4), shape=(0,-3), output=(2,12)
    - input=(2,3,4), shape=(-3,-2), output=(6,4)
    
  • -4 for splitting one dimension into two. Pop one input dimension, split it according to the next two dimensions (which can contain one -1) specified after this code, then push the results into the output:

    - input=(2,3,4), shape=(-4,1,2,-2), output=(1,2,3,4)
    - input=(2,3,4), shape=(2,-4,-1,3,-2), output=(2,1,3,4)
    

If the argument reverse is set to true, the input shape is translated from right to left. For example, with input shape (10, 5, 4) and target shape (-1, 0), the output shape will be (50,4) if reverse=1, otherwise it will be (40,5).

Defined in src/operator/tensor/matrix_op.cc:L78

Parameters:
  • data (Symbol) – Input data to reshape.
  • target_shape (Shape(tuple), optional, default=(0,0)) – (Deprecated! Use shape instead.) Target new shape. One and only one dim can be 0, in which case it will be inferred from the rest of dims
  • keep_highest (boolean, optional, default=False) – (Deprecated! Use shape instead.) Whether to keep the highest dim unchanged. If set to true, the first dim in target_shape is ignored and always fixed to the first dim of the input
  • shape (Shape(tuple), optional, default=()) – The target shape
  • reverse (boolean, optional, default=False) – If true then translating the input shape from right to left
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
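
A small sketch of the special codes in symbolic form (the input shape is an arbitrary example); infer_shape confirms the translated output shape without binding any data:

import mxnet as mx

x = mx.sym.Variable('x')
# 0 copies the first input dimension, -1 infers the remaining one
y = mx.sym.Reshape(data=x, shape=(0, -1))

print(y.infer_shape(x=(2, 3, 4))[1])   # [(2, 12)]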

mxnet.symbol.SVMOutput(*args, **kwargs)

Support Vector Machine based transformation of the input; backpropagates the L2-SVM objective (or L1-SVM if use_linear is set).

Parameters:
  • data (Symbol) – Input data to svm.
  • label (Symbol) – Label data.
  • margin (float, optional, default=1) – Scale the DType(param_.margin) for activation size
  • regularization_coefficient (float, optional, default=1) – Scale the coefficient responsible for balancing coefficient size and error tradeoff
  • use_linear (boolean, optional, default=False) – If set true, uses L1-SVM objective function. Default uses L2-SVM objective
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SequenceLast(*args, **kwargs)

Takes the last element of a sequence. Takes an n-dimensional tensor of the form [max sequence length, batchsize, other dims] and returns an (n-1)-dimensional tensor of the form [batchsize, other dims]. This operator takes an optional input tensor sequence_length of positive ints of dimension [batchsize] when the use_sequence_length option is set to true. This allows the operator to handle variable-length sequences. If use_sequence_length is false, then each example in the batch is assumed to have the max sequence length.

Parameters:
  • data (Symbol) – n-dimensional input tensor of the form [max sequence length, batchsize, other dims]
  • sequence_length (Symbol) – vector of sequence lengths of size batchsize
  • use_sequence_length (boolean, optional, default=False) – If set to true, this layer takes in extra input sequence_length to specify variable length sequence
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SequenceMask(*args, **kwargs)

Sets all elements outside the sequence to a constant value. Takes an n-dimensional tensor of the form [max sequence length, batchsize, other dims] and returns a tensor of the same shape. This operator takes an optional input tensor sequence_length of positive ints of dimension [batchsize] when the use_sequence_length option is set to true. This allows the operator to handle variable-length sequences. If use_sequence_length is false, then each example in the batch is assumed to have the max sequence length, and this operator becomes the identity operator.

Parameters:
  • data (Symbol) – n-dimensional input tensor of the form [max sequence length, batchsize, other dims]
  • sequence_length (Symbol) – vector of sequence lengths of size batchsize
  • use_sequence_length (boolean, optional, default=False) – If set to true, this layer takes in extra input sequence_length to specify variable length sequence
  • value (float, optional, default=0) – The value to be used as a mask.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SequenceReverse(*args, **kwargs)

Reverses the elements of each sequence. Takes an n-dimensional tensor of the form [max sequence length, batchsize, other dims] and returns a tensor of the same shape. This operator takes an optional input tensor sequence_length of positive ints of dimension [batchsize] when the use_sequence_length option is set to true. This allows the operator to handle variable-length sequences. If use_sequence_length is false, then each example in the batch is assumed to have the max sequence length.

Parameters:
  • data (Symbol) – n-dimensional input tensor of the form [max sequence length, batchsize, other dims]
  • sequence_length (Symbol) – vector of sequence lengths of size batchsize
  • use_sequence_length (boolean, optional, default=False) – If set to true, this layer takes in extra input sequence_length to specify variable length sequence
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SliceChannel(*args, **kwargs)

Split an array along a particular axis into multiple sub-arrays.

Assume the input array has shape (d_0, ..., d_n) and we slice it into m (num_outputs=m) sub-arrays along axis k; then we will obtain a list of m arrays, each of which has shape (d_0, ..., d_k/m, ..., d_n).

For example:

x = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]  // 4x2 array

y = split(x, axis=0, num_outputs=4) // a list of 4 arrays
y[0] = [[ 1.,  2.]]  // 1x2 array

z = split(x, axis=0, num_outputs=2) // a list of 2 arrays
z[0] = [[ 1.,  2.],
        [ 3.,  4.]]

When the optional argument squeeze_axis=1 is set, the k-th dimension will be removed from the shape if it becomes 1:

y = split(x, axis=0, num_outputs=4, squeeze_axis=1)
y[0] = [ 1.,  2.]  // (2,) vector

Defined in src/operator/slice_channel.cc:L50

Parameters:
  • num_outputs (int, required) – Number of outputs to be sliced.
  • axis (int, optional, default='1') – Dimension along which to slice.
  • squeeze_axis (boolean, optional, default=False) – If true, the dimension will be squeezed. Also, input.shape[axis] must be the same as num_outputs when squeeze_axis is turned on.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
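
As a minimal sketch (the shape below is illustrative), slicing along axis 1 into two outputs halves that axis in each output:

import mxnet as mx

x = mx.sym.Variable('x')
parts = mx.sym.SliceChannel(data=x, num_outputs=2, axis=1)

print(parts.infer_shape(x=(4, 6, 8))[1])   # [(4, 3, 8), (4, 3, 8)]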

mxnet.symbol.Softmax(*args, **kwargs)

DEPRECATED: Perform a softmax transformation on input. Please use SoftmaxOutput

Parameters:
  • data (Symbol) – Input data to softmax.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • ignore_label (float, optional, default=-1) – the labels with value equals to ignore_label will be ignored during backward (only works if use_ignore is set to be true).
  • multi_output (boolean, optional, default=False) – If set to true, softmax will be applied on axis 1
  • use_ignore (boolean, optional, default=False) – If set to true, the ignore_label value will not contribute to the backward gradient
  • preserve_shape (boolean, optional, default=False) – If true, softmax will be applied on the last axis
  • normalization ({'batch', 'null', 'valid'},optional, default='null') – Normalize the gradient
  • out_grad (boolean, optional, default=False) – Apply weighting from output gradient
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SoftmaxActivation(*args, **kwargs)

Apply softmax activation to input. This is intended for internal layers. For output (loss layer) please use SoftmaxOutput. If mode=instance, this operator will compute a softmax for each instance in the batch; this is the default mode. If mode=channel, this operator will compute a num_channel-class softmax at each position of each instance; this can be used for fully convolutional network, image segmentation, etc.

Parameters:
  • data (Symbol) – Input data to activation function.
  • mode ({'channel', 'instance'},optional, default='instance') – Softmax Mode. If set to instance, this operator will compute a softmax for each instance in the batch; this is the default mode. If set to channel, this operator will compute a num_channel-class softmax at each position of each instance; this can be used for fully convolutional network, image segmentation, etc.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SoftmaxOutput(*args, **kwargs)

Softmax with logit loss.

In the forward pass, the softmax output is returned. Assume the input data has shape (n,k), then the output will have the same shape as the input, which is computed by

\[out[i,:] = softmax(data[i,:])\]

for \(i=0,...,n-1\), where

\[softmax(x) = \left[..., \frac{exp(x[j])}{exp(x[0])+...+exp(x[k-1])}, ...\right]\]

For a general N-D input array with shape \((d_1, ..., d_n)\), denote its size by \(s=d_1d_2...d_n\). The way softmax is computed varies:

  • preserve_shape is false (default). Reshape the input into a 2-D array with shape \((d_1, s/d_1)\) before computing the softmax, and then reshape the result back to the original shape.

  • preserve_shape is true. For all \(i_1, ..., i_{n-1}\), compute

    \[out[i_1, ..., i_{n-1}, :] = softmax(data[i_1, ..., i_{n-1},:])\]
  • multi_output is true. For all \(i_1, ..., i_{n-1}\), compute

    \[out[i_1, :, ..., i_{n-1}] = softmax(data[i_1, :, ..., i_{n-1}])\]

In the backward pass, the logit loss, also called cross-entropy loss, is added. The provided label can be an (N-1)-D label index array or an N-D label probability array.

Examples with a particular label can be ignored during backward by specifying ignore_label (also need use_ignore to be true).

A scale can be applied to the gradient by grad_scale, which is often used in multi-loss objective functions where we can give each loss a different weight. It also supports several ways to normalize the gradient via normalization:

  • null: do nothing
  • batch: divide by batch size (number of examples)
  • valid: divide by the number of examples which are not ignored.

Defined in src/operator/softmax_output.cc:L77

Parameters:
  • data (Symbol) – Input data.
  • label (Symbol) – Ground truth label.
  • grad_scale (float, optional, default=1) – Scale the gradient by a float factor
  • ignore_label (float, optional, default=-1) – the labels with value equals to ignore_label will be ignored during backward (only works if use_ignore is set to be true).
  • multi_output (boolean, optional, default=False) – If set to true, softmax will be applied on axis 1
  • use_ignore (boolean, optional, default=False) – If set to true, the ignore_label value will not contribute to the backward gradient
  • preserve_shape (boolean, optional, default=False) – If true, softmax will be applied on the last axis
  • normalization ({'batch', 'null', 'valid'},optional, default='null') – Normalize the gradient
  • out_grad (boolean, optional, default=False) – Apply weighting from output gradient
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
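
As a sketch of how this is typically used as the terminal node of a classifier (the layer name and num_hidden value below are illustrative, not prescribed by the API):

import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')
fc = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc')
net = mx.sym.SoftmaxOutput(data=fc, label=label, name='softmax')

# forward produces softmax probabilities; the cross-entropy gradient
# is only added during the backward pass
print(net.list_arguments())   # ['data', 'fc_weight', 'fc_bias', 'label']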

mxnet.symbol.SpatialTransformer(*args, **kwargs)

Apply spatial transformer to input feature map.

Parameters:
  • data (Symbol) – Input data to the SpatialTransformerOp.
  • loc (Symbol) – localisation net; the output dim should be 6 when transform_type is affine. You should initialize the weight and bias with the identity transform.
  • target_shape (Shape(tuple), optional, default=(0,0)) – output shape(h, w) of spatial transformer: (y, x)
  • transform_type ({'affine'}, required) – transformation type
  • sampler_type ({'bilinear'}, required) – sampling type
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.SwapAxis(*args, **kwargs)

Interchange two axes of an array.

Examples:

 x = [[1, 2, 3]]
 swapaxes(x, 0, 1) = [[ 1],
                      [ 2],
                      [ 3]]

 x = [[[ 0, 1],
       [ 2, 3]],
      [[ 4, 5],
       [ 6, 7]]]  // (2,2,2) array

swapaxes(x, 0, 2) = [[[ 0, 4],
                      [ 2, 6]],
                     [[ 1, 5],
                      [ 3, 7]]]

Defined in src/operator/swapaxis.cc:L55

Parameters:
  • data (Symbol) – Input array.
  • dim1 (int (non-negative), optional, default=0) – the first axis to be swapped.
  • dim2 (int (non-negative), optional, default=0) – the second axis to be swapped.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.UpSampling(*args, **kwargs)

Perform nearest-neighbour/bilinear upsampling of the inputs. This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol[]) – Array of tensors to upsample
  • scale (int (non-negative), required) – Up sampling scale
  • num_filter (int (non-negative), optional, default=0) – Input filter. Only used by bilinear sample_type.
  • sample_type ({'bilinear', 'nearest'}, required) – upsampling method
  • multi_input_mode ({'concat', 'sum'},optional, default='concat') – How to handle multiple input. concat means concatenate upsampled images along the channel dimension. sum means add all images together, only available for nearest neighbor upsampling.
  • workspace (long (non-negative), optional, default=512) – Tmp workspace for deconvolution (MB)
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.abs(*args, **kwargs)

Returns the absolute value of array elements, element-wise.

For example:
abs([-2, 0, 3]) = [2, 0, 3]

Defined in src/operator/tensor/elemwise_unary_op.cc:L95

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.adam_update(*args, **kwargs)

Updater function for adam optimizer

Parameters:
  • lr (float, required) – learning_rate
  • beta1 (float, optional, default=0.9) – beta1
  • beta2 (float, optional, default=0.999) – beta2
  • epsilon (float, optional, default=1e-08) – epsilon
  • wd (float, optional, default=0) – weight decay
  • rescale_grad (float, optional, default=1) – rescale gradient as grad = rescale_grad*grad.
  • clip_gradient (float, optional, default=-1) – If greater than 0, clip gradient to grad = max(min(grad, clip_gradient), -clip_gradient). Otherwise turned off.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.add_n(*args, **kwargs)

Add all input arguments element-wise.

\[add\_n(a_1, a_2, ..., a_n) = a_1 + a_2 + ... + a_n\]

add_n is potentially more efficient than calling add n times.

Defined in src/operator/tensor/elemwise_sum.cc:L63

This function supports a variable number of positional inputs.

Parameters:
  • args (Symbol[]) – Positional input arguments
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
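
A minimal sketch with three toy inputs (names and values are illustrative):

import mxnet as mx

a, b, c = (mx.sym.Variable(n) for n in ('a', 'b', 'c'))
s = mx.sym.add_n(a, b, c)

args = {n: mx.nd.array([1, 2]) for n in ('a', 'b', 'c')}
print(s.bind(mx.cpu(), args).forward()[0].asnumpy())   # [ 3.  6.]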

mxnet.symbol.arccos(*args, **kwargs)

Inverse cosine, element-wise.

The input should be in range \([-1, 1]\). The output is in the closed interval \([0, \pi]\)

\[arccos([-1, -.707, 0, .707, 1]) = [\pi, 3\pi/4, \pi/2, \pi/4, 0]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L354

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.arccosh(*args, **kwargs)

Inverse hyperbolic cosine, element-wise.

Defined in src/operator/tensor/elemwise_unary_op.cc:L460

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.arcsin(*args, **kwargs)

Inverse sine, element-wise.

The input should be in range \([-1, 1]\). The output is in the closed interval \([-\pi/2, \pi/2]\)

\[arcsin([-1, -.707, 0, .707, 1]) = [-\pi/2, -\pi/4, 0, \pi/4, \pi/2]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L337

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.arcsinh(*args, **kwargs)

Inverse hyperbolic sine, element-wise.

Defined in src/operator/tensor/elemwise_unary_op.cc:L450

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.arctan(*args, **kwargs)

Inverse tangent, element-wise.

The output is in the closed interval \([-\pi/2, \pi/2]\)

\[arctan([-1, 0, 1]) = [-\pi/4, 0, \pi/4]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L370

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.arctanh(*args, **kwargs)

Inverse hyperbolic tangent, element-wise.

Defined in src/operator/tensor/elemwise_unary_op.cc:L470

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.argmax(*args, **kwargs)

Returns the indices of the maximum values along an axis.

From:src/operator/tensor/broadcast_reduce_op_index.cc:11

Parameters:
  • data (Symbol) – The input
  • axis (int, optional, default='-1') – Empty or unsigned. The axis to perform the reduction. If left empty, a global reduction will be performed.
  • keepdims (boolean, optional, default=False) – If true, the axis which is reduced is left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.argmax_channel(*args, **kwargs)
Parameters:
  • src (Symbol) – Source input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.argmin(*args, **kwargs)

Returns the indices of the minimum values along an axis.

From:src/operator/tensor/broadcast_reduce_op_index.cc:16

Parameters:
  • data (Symbol) – The input
  • axis (int, optional, default='-1') – Empty or unsigned. The axis to perform the reduction. If left empty, a global reduction will be performed.
  • keepdims (boolean, optional, default=False) – If true, the axis which is reduced is left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.argsort(*args, **kwargs)

Returns the indices that can sort an array.

Examples:

x = [[ 0.3,  0.2,  0.4],
     [ 0.1,  0.3,  0.2]]

// sort along axis -1
argsort(x) = [[ 1.,  0.,  2.],
              [ 0.,  2.,  1.]]

// sort along axis 0
argsort(x, axis=0) = [[ 1.,  0.,  1.]
                      [ 0.,  1.,  0.]]

// flatten and then sort
argsort(x, axis=None) = [ 3.,  1.,  5.,  0.,  4.,  2.]

Defined in src/operator/tensor/ordering_op.cc:L146

Parameters:
  • src (Symbol) – Source input
  • axis (int or None, optional, default='-1') – Axis along which to sort the input tensor. If not given, the flattened array is used. Default is -1.
  • is_ascend (boolean, optional, default=True) – Whether sort in ascending or descending order.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.batch_dot(*args, **kwargs)

Batchwise dot product.

batch_dot is used to compute dot product of x and y when x and y are data in batch, namely 3D arrays in shape of (batch_size, :, :).

For example, given x with shape (batch_size, n, m) and y with shape (batch_size, m, k), the result array will have shape (batch_size, n, k), which is computed by:

batch_dot(x,y)[i,:,:] = dot(x[i,:,:], y[i,:,:])

Defined in src/operator/tensor/matrix_op.cc:L354

Parameters:
  • lhs (Symbol) – The first input
  • rhs (Symbol) – The second input
  • transpose_a (boolean, optional, default=False) – If true then transpose the first input before dot.
  • transpose_b (boolean, optional, default=False) – If true then transpose the second input before dot.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
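
The shape rule can be checked symbolically (the batch size and dimensions below are arbitrary examples):

import mxnet as mx

x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
z = mx.sym.batch_dot(x, y)

# (batch, n, m) dot (batch, m, k) -> (batch, n, k)
print(z.infer_shape(x=(8, 2, 5), y=(8, 5, 3))[1])   # [(8, 2, 3)]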

mxnet.symbol.batch_take(*args, **kwargs)

Take elements from a data batch.

Given a (d0, d1) input array and (d0,) indices, the output will be a (d0,) array computed by:

output[i] = input[i, indices[i]]

Examples:

x = [[ 1.,  2.],
     [ 3.,  4.],
     [ 5.,  6.]]

batch_take(x, [0,1,0]) = [ 1.  4.  5.]

Defined in src/operator/tensor/indexing_op.cc:L131

Parameters:
  • a (Symbol) – Input data array
  • indices (Symbol) – index array
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_add(*args, **kwargs)

Add arguments, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_basic.cc:L16

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
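
Since plain + on symbols does not broadcast, mismatched shapes must go through the broadcast_* variants; a minimal sketch (values illustrative):

import mxnet as mx

a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = mx.sym.broadcast_add(a, b)   # (2,3) + (1,3) broadcasts along axis 0

ex = c.bind(mx.cpu(), {'a': mx.nd.ones((2, 3)),
                       'b': mx.nd.array([[10, 20, 30]])})
print(ex.forward()[0].asnumpy())
# [[ 11.  21.  31.]
#  [ 11.  21.  31.]]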

mxnet.symbol.broadcast_axes(*args, **kwargs)

Broadcast an array over particular axes.

Broadcasting is allowed on axes with size 1, such as from (2,1,3,1) to (2,8,3,9). Elements will be duplicated on the broadcasted axes.

For example:

// given (1,2,1) shape x
x = [[[ 1.],
      [ 2.]]]

// broadcast on axis 2
broadcast_axis(x, axis=2, size=3) = [[[ 1.,  1.,  1.],
                                      [ 2.,  2.,  2.]]]
// broadcast on axes 0 and 2
broadcast_axis(x, axis=(0,2), size=(2,3)) = [[[ 1.,  1.,  1.],
                                              [ 2.,  2.,  2.]],
                                             [[ 1.,  1.,  1.],
                                              [ 2.,  2.,  2.]]]

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L137

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the broadcasting.
  • size (Shape(tuple), optional, default=()) – Target sizes of the broadcasting axes.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_axis(*args, **kwargs)

Broadcast an array over particular axes.

Broadcasting is allowed on axes with size 1, such as from (2,1,3,1) to (2,8,3,9). Elements will be duplicated on the broadcasted axes.

For example:

// given (1,2,1) shape x
x = [[[ 1.],
      [ 2.]]]

// broadcast on axis 2
broadcast_axis(x, axis=2, size=3) = [[[ 1.,  1.,  1.],
                                      [ 2.,  2.,  2.]]]
// broadcast on axes 0 and 2
broadcast_axis(x, axis=(0,2), size=(2,3)) = [[[ 1.,  1.,  1.],
                                              [ 2.,  2.,  2.]],
                                             [[ 1.,  1.,  1.],
                                              [ 2.,  2.,  2.]]]

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L137

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the broadcasting.
  • size (Shape(tuple), optional, default=()) – Target sizes of the broadcasting axes.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_div(*args, **kwargs)

Divide arguments, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_basic.cc:L79

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_equal(*args, **kwargs)

Return (lhs == rhs), element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L16

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_greater(*args, **kwargs)

Return (lhs > rhs), element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L30

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_greater_equal(*args, **kwargs)

Return (lhs >= rhs), element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L37

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_hypot(*args, **kwargs)

Given the “legs” of a right triangle, return its hypotenuse with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_extended.cc:L71

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_lesser(*args, **kwargs)

Return (lhs < rhs), element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L44

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_lesser_equal(*args, **kwargs)

Return (lhs <= rhs), element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L51

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_maximum(*args, **kwargs)

Element-wise maximum of array elements with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_extended.cc:L34

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_minimum(*args, **kwargs)

Element-wise minimum of array elements with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_extended.cc:L52

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_minus(*args, **kwargs)

Subtract arguments, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_basic.cc:L39

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_mul(*args, **kwargs)

Multiply arguments, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_basic.cc:L61

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_not_equal(*args, **kwargs)

Return (lhs != rhs), element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_logic.cc:L23

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_plus(*args, **kwargs)

Add arguments, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_basic.cc:L16

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_power(*args, **kwargs)

First array elements raised to powers from second array, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_extended.cc:L16

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_sub(*args, **kwargs)

Subtract arguments, element-wise with broadcasting.

Defined in src/operator/tensor/elemwise_binary_broadcast_op_basic.cc:L39

Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.broadcast_to(*args, **kwargs)

Broadcast an array to a new shape.

Broadcasting is allowed on axes with size 1, such as from (2,1,3,1) to (2,8,3,9). Elements will be duplicated on the broadcasted axes.

For example:

broadcast_to([[1,2,3]], shape=(2,3)) = [[ 1.,  2.,  3.],
                                        [ 1.,  2.,  3.]])

The dimensions that will not be changed can also use the special code 0 that means copy the original value. So with shape=(2,0) we will obtain the same results in the above example.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L158

Parameters:
  • data (Symbol) – The input
  • shape (Shape(tuple), optional, default=()) – The shape of the desired array. We can set the dim to zero if it’s same as the original. E.g A = broadcast_to(B, shape=(10, 0, 0)) has the same meaning as A = broadcast_axis(B, axis=0, size=10).
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.cast(*args, **kwargs)

Cast to a specified type, element-wise.

For example:

cast([1e20, 11.1], dtype='float16') = [inf, 11.09375]
cast([300, 11.1, 10.9, -1, -3], dtype='uint8') = [44, 11, 10, 255, 253]

Defined in src/operator/tensor/elemwise_unary_op.cc:L65

Parameters:
  • data (Symbol) – Source input
  • dtype ({'float16', 'float32', 'float64', 'int32', 'uint8'}, required) – Output data type.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
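
A small sketch (input values are illustrative and chosen to stay within the uint8 range):

import mxnet as mx

x = mx.sym.Variable('x')
y = mx.sym.cast(data=x, dtype='uint8')

ex = y.bind(mx.cpu(), {'x': mx.nd.array([0.9, 11.1, 254.7])})
print(ex.forward()[0].asnumpy())   # [  0  11 254], fractional parts truncated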

mxnet.symbol.ceil(*args, **kwargs)

Return the ceiling of the input, element-wise.

For example:
ceil([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-2., -1., 2., 2., 3.]

Defined in src/operator/tensor/elemwise_unary_op.cc:L132

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.choose_element_0index(*args, **kwargs)

Choose one element from each line (row for Python, column for R/Julia) in lhs according to the index indicated by rhs. This function assumes rhs uses 0-based indices.

Parameters:
  • lhs (NDArray) – Left operand to the function.
  • rhs (NDArray) – Right operand to the function.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.clip(*args, **kwargs)

Clip (limit) the values in an array, elementwise

Given an interval, values outside the interval are clipped to the interval edges. That is:

clip(x) = max(min(x, a_max), a_min)

Defined in src/operator/tensor/matrix_op.cc:L393

Parameters:
  • data (Symbol) – Source input
  • a_min (float, required) – Minimum value
  • a_max (float, required) – Maximum value
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.concat(*args, **kwargs)

Concatenate a list of arrays along a given axis.

The dimension sizes of the input arrays on the given axis should be the same.

For example:

x = [[1,1],[1,1]]
y = [[2,2],[2,2]]
z = [[3,3],[3,3],[3,3]]

Concat(x,y,z,dim=0) = [[ 1.,  1.],
                       [ 1.,  1.],
                       [ 2.,  2.],
                       [ 2.,  2.],
                       [ 3.,  3.],
                       [ 3.,  3.],
                       [ 3.,  3.]]

Concat(x,y,z,dim=1) = [[ 1.,  1.,  2.,  2.],
                       [ 1.,  1.,  2.,  2.]]

Defined in src/operator/concat.cc:L70

This function supports a variable number of positional inputs.

Parameters:
  • data (Symbol[]) – List of tensors to concatenate
  • dim (int, optional, default='1') – the dimension along which to concatenate.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.cos(*args, **kwargs)

Cosine, element-wise.

The input is in radians (\(2\pi\) rad equals 360 degrees).

\[cos([0, \pi/4, \pi/2]) = [1, 0.707, 0]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L304

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.cosh(*args, **kwargs)

Hyperbolic cosine, element-wise.

For example:
cosh(x) = 0.5 * (exp(x) + exp(-x))

Defined in src/operator/tensor/elemwise_unary_op.cc:L426

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.crop(*args, **kwargs)

Crop a continuous region from the array.

Assume the input array has n dimensions. Given begin=(b_1, ..., b_n) and end=(e_1, ..., e_n), crop will return a region with shape (e_1-b_1, ..., e_n-b_n). The result's k-th dimension contains elements from the k-th dimension of the input array within the half-open interval [b_k, e_k).

For example:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]

crop(x, begin=(0,1), end=(2,4)) = [[ 2.,  3.,  4.],
                                   [ 6.,  7.,  8.]]

Defined in src/operator/tensor/matrix_op.cc:L207

Parameters:
  • data (Symbol) – Source input
  • begin (Shape(tuple), required) – starting coordinates
  • end (Shape(tuple), required) – ending coordinates
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.degrees(*args, **kwargs)

Convert angles from radians to degrees.

\[degrees([0, \pi/2, \pi, 3\pi/2, 2\pi]) = [0, 90, 180, 270, 360]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L384

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.dot(*args, **kwargs)

Dot product of two arrays.

dot's behavior depends on the input array dimensions:

  • 1-D arrays: inner product of vectors

  • 2-D arrays: matrix multiplication

  • N-D arrays: a sum product over the last axis of the first input and the first axis of the second input

    For example, given 3-D x with shape (n,m,k) and y with shape (k,r,s), the result array will have shape (n,m,r,s). It is computed by:

    dot(x,y)[i,j,a,b] = sum(x[i,j,:]*y[:,a,b])
    

Defined in src/operator/tensor/matrix_op.cc:L318

Parameters:
  • lhs (Symbol) – The first input
  • rhs (Symbol) – The second input
  • transpose_a (boolean, optional, default=False) – If true then transpose the first input before dot.
  • transpose_b (boolean, optional, default=False) – If true then transpose the second input before dot.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
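
The N-D shape rule can be verified symbolically (the shapes below are arbitrary examples):

import mxnet as mx

x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
z = mx.sym.dot(x, y)

# (n,m,k) dot (k,r,s) -> (n,m,r,s): sum over the last axis of x
# and the first axis of y
print(z.infer_shape(x=(2, 3, 4), y=(4, 5, 6))[1])   # [(2, 3, 5, 6)]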

mxnet.symbol.elemwise_add(*args, **kwargs)
Parameters:
  • lhs (Symbol) – first input
  • rhs (Symbol) – second input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.exp(*args, **kwargs)

Calculate the exponential of the array, element-wise

For example:
exp(x) = e^x ≈ 2.718^x

Defined in src/operator/tensor/elemwise_unary_op.cc:L215

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.expand_dims(*args, **kwargs)

Insert a new axis with size 1 into the array shape

For example, given x with shape (2,3,4), then expand_dims(x, axis=1) will return a new array with shape (2,1,3,4).

Defined in src/operator/tensor/matrix_op.cc:L175

Parameters:
  • data (Symbol) – Source input
  • axis (int (non-negative), required) – Position (amongst axes) where new axis is to be inserted.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
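
A minimal sketch mirroring the example above (the shape is illustrative):

import mxnet as mx

x = mx.sym.Variable('x')
y = mx.sym.expand_dims(data=x, axis=1)

print(y.infer_shape(x=(2, 3, 4))[1])   # [(2, 1, 3, 4)]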

mxnet.symbol.expm1(*args, **kwargs)

Calculate exp(x) - 1

This function provides greater precision than exp(x) - 1 for small values of x.

Defined in src/operator/tensor/elemwise_unary_op.cc:L288

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.fill_element_0index(*args, **kwargs)

Fill one element of each line (row for Python, column for R/Julia) in lhs according to the index indicated by rhs and the values indicated by mhs. This function assumes rhs uses 0-based indices.

Parameters:
  • lhs (NDArray) – Left operand to the function.
  • mhs (NDArray) – Middle operand to the function.
  • rhs (NDArray) – Right operand to the function.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.fix(*args, **kwargs)

Round elements of the array to the nearest integer towards zero, element-wise.

For example:
fix([-2.1, -1.9, 1.9, 2.1]) = [-2., -1., 1., 2.]

Defined in src/operator/tensor/elemwise_unary_op.cc:L164

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.flatten(*args, **kwargs)

Flatten input into a 2-D array by collapsing the higher dimensions.

Assume the input array has shape (d1, d2, ..., dk), then flatten reshapes the input array into shape (d1, d2*...*dk).

Defined in src/operator/tensor/matrix_op.cc:L101

Parameters:
  • data (Symbol) – Input data to reshape.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.flip(*args, **kwargs)

Reverse the elements of an array along the given axis.

From:src/operator/tensor/matrix_op.cc:512

Parameters:
  • data (NDArray) – Input data array
  • axis (Shape(tuple), required) – The axis along which to reverse elements.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.floor(*args, **kwargs)

Return the floor of the input, element-wise.

For example:
floor([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-3., -2., 1., 1., 2.]

Defined in src/operator/tensor/elemwise_unary_op.cc:L141

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.gamma(*args, **kwargs)

The gamma function (extension of the factorial function), element-wise

From:src/operator/tensor/elemwise_unary_op.cc:479

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.gammaln(*args, **kwargs)

Log of the absolute value of the gamma function, element-wise

From:src/operator/tensor/elemwise_unary_op.cc:488

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.identity(*args, **kwargs)

Identity mapping, copy src to output

From:src/operator/tensor/elemwise_unary_op.cc:15

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.log(*args, **kwargs)

Natural logarithm, element-wise.

The natural logarithm is the logarithm in base e, so that log(exp(x)) = x

Defined in src/operator/tensor/elemwise_unary_op.cc:L225

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.log10(*args, **kwargs)

Calculate the base 10 logarithm of the array, element-wise.

10**log10(x) = x

Defined in src/operator/tensor/elemwise_unary_op.cc:L235

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.log1p(*args, **kwargs)

Calculate log(1 + x)

This function is more accurate than computing log(1 + x) directly for small values of x, where \(1+x\approx 1\)

Defined in src/operator/tensor/elemwise_unary_op.cc:L275

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.log2(*args, **kwargs)

Calculate the base 2 logarithm of the array, element-wise.

2**log2(x) = x

Defined in src/operator/tensor/elemwise_unary_op.cc:L245

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.log_softmax(*args, **kwargs)
Parameters:
  • data (Symbol) – The input
  • axis (int, optional, default='-1') – The axis along which to compute softmax. By default use the last axis
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.max(*args, **kwargs)

Compute the max of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), using axis=0 will result in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming input shape (n, m, k), with axis=(0,2) we obtain an array of shape (m,).

If keepdims = 1, then the result array will have the same number of dimensions as the input, while the reduced axes will have size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L98

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.max_axis(*args, **kwargs)

Compute the max of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), using axis=0 will result in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming input shape (n, m, k), with axis=(0,2) we obtain an array of shape (m,).

If keepdims = 1, then the result array will have the same number of dimensions as the input, while the reduced axes will have size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L98

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.mean(*args, **kwargs)

Compute the mean of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), using axis=0 will result in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming input shape (n, m, k), with axis=(0,2) we obtain an array of shape (m,).

If keepdims = 1, then the result array will have the same number of dimensions as the input, while the reduced axes will have size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L53

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
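
A brief sketch of reducing over multiple axes with keepdims (the input shape is an arbitrary example):

import mxnet as mx

x = mx.sym.Variable('x')
m = mx.sym.mean(data=x, axis=(0, 2), keepdims=True)

# reduced axes are kept with size 1 instead of being removed
print(m.infer_shape(x=(2, 3, 4))[1])   # [(1, 3, 1)]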

mxnet.symbol.min(*args, **kwargs)

Compute the min of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), using axis=0 will result in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming input shape (n, m, k), with axis=(0,2) we obtain an array of shape (m,).

If keepdims = 1, then the result array will have the same number of dimensions as the input, while the reduced axes will have size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L108

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.min_axis(*args, **kwargs)

Compute the min of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), using axis=0 will result in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming input shape (n, m, k), with axis=(0,2) we obtain an array of shape (m,).

If keepdims = 1, then the result array will have the same number of dimensions as the input, while the reduced axes will have size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L108

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.nanprod(*args, **kwargs)

Compute the product of array elements over given axes with NaN ignored

Refer to prod for more details.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L88

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.nansum(*args, **kwargs)

Compute the sum of array elements over given axes with NaN ignored

Refer to sum for more details.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L75

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.negative(*args, **kwargs)

Negate src

From:src/operator/tensor/elemwise_unary_op.cc:84

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.norm(*args, **kwargs)

Compute the L2 norm.

Flatten the input array and then compute the L2 norm.

Examples:

x = [[1, 2],
     [3, 4]]

norm(x) = [5.47722578]

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L182

Parameters:
  • src (Symbol) – Source input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.normal(*args, **kwargs)

Draw random samples from a normal (Gaussian) distribution.

Examples:

normal(loc=0, scale=1, shape=(2,2)) = [[ 1.89171135, -1.16881478],
                                       [-1.23474145,  1.55807114]]

Defined in src/operator/tensor/sample_op.cc:L35

Parameters:
  • loc (float, optional, default=0) – Mean of the distribution.
  • scale (float, optional, default=1) – Standard deviation of the distribution.
  • shape (Shape(tuple), optional, default=()) – The shape of the output
  • ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.
  • dtype ({'None', 'float16', 'float32', 'float64'},optional, default='None') – DType of the output. If output is given, set to the type of output. If output is not given and dtype is not defined (dtype=None), set to float32.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.one_hot(*args, **kwargs)

Returns a one-hot array.

The locations represented by indices take value on_value, while all other locations take value off_value.

Assume indices has shape (i0, i1), then the output will have shape (i0, i1, depth) and:

output[i,j,:] = off_value
output[i,j,indices[i,j]] = on_value

Examples:

one_hot([1,0,2,0], 3) = [[ 0.  1.  0.]
                         [ 1.  0.  0.]
                         [ 0.  0.  1.]
                         [ 1.  0.  0.]]

one_hot([1,0,2,0], 3, on_value=8, off_value=1,
        dtype='int32') = [[1 8 1]
                          [8 1 1]
                          [1 1 8]
                          [8 1 1]]

one_hot([[1,0],[1,0],[2,0]], 3) = [[[ 0.  1.  0.]
                                    [ 1.  0.  0.]]

                                   [[ 0.  1.  0.]
                                    [ 1.  0.  0.]]

                                   [[ 0.  0.  1.]
                                    [ 1.  0.  0.]]]

Defined in src/operator/tensor/indexing_op.cc:L177

Parameters:
  • indices (Symbol) – array of locations where to set on_value
  • depth (int, required) – The dimension size at dim = axis.
  • on_value (double, optional, default=1) – The value assigned to the locations represented by indices.
  • off_value (double, optional, default=0) – The value assigned to the locations not represented by indices.
  • dtype ({'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='float32') – DType of the output
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
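
Not part of the generated reference: a minimal usage sketch of one_hot under the bind/forward workflow described in the overview; the indices are taken from the first example above.

import mxnet as mx

indices = mx.sym.Variable('indices')
out = mx.sym.one_hot(indices, depth=3)                  # on_value=1, off_value=0, float32 by default
ex = out.bind(mx.cpu(), {'indices': mx.nd.array([1, 0, 2, 0])})
print(ex.forward()[0].asnumpy())                        # expected: the 4x3 matrix from the first example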

mxnet.symbol.pad(*args, **kwargs)

Pad an array.

Only supports 4-D and 5-D input array.

Defined in src/operator/pad.cc:L407

Parameters:
  • data (Symbol) – An n-dimensional input tensor.
  • mode ({'constant', 'edge'}, required) – Padding type to use. “constant” pads all values with a constant value, the value of which can be specified with the constant_value option. “edge” uses the boundary values of the array as padding.
  • pad_width (Shape(tuple), required) – A tuple of padding widths of length 2*r, where r is the rank of the input tensor, specifying number of values padded to the edges of each axis. (before_1, after_1, ... , before_N, after_N) unique pad widths for each axis. Equivalent to pad_width in numpy.pad, but flattened.
  • constant_value (double, optional, default=0) – This option is only used when mode is “constant”. This value will be used as the padding value. Defaults to 0 if not specified.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
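
Not part of the generated reference: a minimal usage sketch of pad. Note that the leading pad widths for the batch and channel axes are set to zero; the input shape is illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
# pad the last two axes of a 4-D (N, C, H, W) tensor by one element on each side
out = mx.sym.pad(data, mode='constant', constant_value=0,
                 pad_width=(0, 0, 0, 0, 1, 1, 1, 1))
ex = out.bind(mx.cpu(), {'data': mx.nd.ones((1, 1, 2, 2))})
print(ex.forward()[0].asnumpy().shape)                  # expected: (1, 1, 4, 4)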

mxnet.symbol.prod(*args, **kwargs)

Compute the product of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), axis=0 results in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming an input of shape (n, m, k), axis=(0,2) yields an array with shape (m,).

If keepdims is set, the result array has the same number of dimensions as the input, with the reduced axes kept as size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L62

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.radians(*args, **kwargs)

Convert angles from degrees to radians.

\[radians([0, 90, 180, 270, 360]) = [0, \pi/2, \pi, 3\pi/2, 2\pi]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L398

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.repeat(*args, **kwargs)

Repeat elements of an array.

By default, repeat flattens the input array into 1-D and then repeats the elements:

x = [[ 1, 2],
     [ 3, 4]]

repeat(x, repeats=2) = [ 1.,  1.,  2.,  2.,  3.,  3.,  4.,  4.]

We can also choose a particular axis along which to repeat; a negative axis is interpreted as counting from the end:

repeat(x, repeats=2, axis=1) = [[ 1.,  1.,  2.,  2.],
                                [ 3.,  3.,  4.,  4.]]

repeat(x, repeats=2, axis=-2) = [[ 1.,  2.],
                                 [ 1.,  2.],
                                 [ 3.,  4.],
                                 [ 3.,  4.]]

Defined in src/operator/tensor/matrix_op.cc:L432

Parameters:
  • data (Symbol) – Input data array
  • repeats (int, required) – The number of repetitions for each element.
  • axis (int or None, optional, default='None') – The axis along which to repeat values. Negative values are interpreted as counting from the end. By default, the flattened input array is used and a flat output array is returned.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
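
Not part of the generated reference: a minimal usage sketch of repeat under the bind/forward workflow described in the overview; input values are illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.repeat(data, repeats=2, axis=1)            # repeat each element twice along axis 1
ex = out.bind(mx.cpu(), {'data': mx.nd.array([[1, 2], [3, 4]])})
print(ex.forward()[0].asnumpy())                        # expected: [[1. 1. 2. 2.] [3. 3. 4. 4.]]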

mxnet.symbol.reshape(*args, **kwargs)

Reshape array into a new shape.

The shape is a tuple of int such as (2,3,4). The new shape should not change the array size. For example:

reshape([1,2,3,4], shape=(2,2)) = [[1,2], [3,4]]

In addition, we can use special codes, which are integers less than 1, for some shape dimensions. To infer the output shape, we start with an empty output tuple, continuously pop dimensions from the original shape beginning at the front, and push the translated results into the output shape.

Each special code represents one translation rule.

  • 0 for copying one. Pop one input dimension and push into the output. For example:

    - input=(2,3,4), shape=(4,0,2), output=(4,3,2)
    - input=(2,3,4), shape=(2,0,0), output=(2,3,4)
    
  • -1 for inference. Push a placeholder into the output whose value will be inferred later:

    - input=(2,3,4), shape=(6,1,-1), output=(6,1,4)
    - input=(2,3,4), shape=(3,-1,8), output=(3,1,8)
    - input=(2,3,4), shape=(-1,), output=(24,)
    
  • -2 for copying all. Pop all remaining input dimensions and push them into the output:

    - input=(2,3,4), shape=(-2,), output=(2,3,4)
    - input=(2,3,4), shape=(2,-2), output=(2,3,4)
    - input=(2,3,4), shape=(-2,1,1), output=(2,3,4,1,1)
    
  • -3 for merging two dimensions. Pop two input dimensions, compute the product and then push into the output:

    - input=(2,3,4), shape=(-3,4), output=(6,4)
    - input=(2,3,4), shape=(0,-3), output=(2,12)
    - input=(2,3,4), shape=(-3,-2), output=(6,4)
    
  • -4 for splitting a dimension into two. Pop one input dimension, split it according to the next two dimensions (which can contain one -1) specified after this code, then push the results into the output:

    - input=(2,3,4), shape=(-4,1,2,-2), output=(1,2,3,4)
    - input=(2,3,4), shape=(2,-4,-1,3,-2), output=(2,1,3,4)
    

If the argument reverse is set to true, the input shape is translated from right to left. For example, with input shape (10, 5, 4) and target shape (-1, 0), the output shape is (50,4) if reverse=1, and (40,5) otherwise.

Defined in src/operator/tensor/matrix_op.cc:L78

Parameters:
  • data (Symbol) – Input data to reshape.
  • target_shape (Shape(tuple), optional, default=(0,0)) – (Deprecated! Use shape instead.) Target new shape. One and only one dim can be 0, in which case it will be inferred from the rest of the dims.
  • keep_highest (boolean, optional, default=False) – (Deprecated! Use shape instead.) Whether to keep the highest dim unchanged. If set to true, the first dim in target_shape is ignored and always fixed as the input's first dim.
  • shape (Shape(tuple), optional, default=()) – The target shape
  • reverse (boolean, optional, default=False) – If true then translating the input shape from right to left
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
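
Not part of the generated reference: a minimal sketch of the special shape codes, using Symbol.infer_shape to check the resulting output shapes without binding data; the input shape (2, 3, 4) is illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.reshape(data, shape=(-3, 4))               # merge the first two dims: (2,3,4) -> (6,4)
_, out_shapes, _ = out.infer_shape(data=(2, 3, 4))
print(out_shapes)                                       # expected: [(6, 4)]

out = mx.sym.reshape(data, shape=(0, -1))               # keep dim 0, collapse the rest: (2,3,4) -> (2,12)
_, out_shapes, _ = out.infer_shape(data=(2, 3, 4))
print(out_shapes)                                       # expected: [(2, 12)]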

mxnet.symbol.reverse(*args, **kwargs)

Reverse the elements of an array along the given axes.

From:src/operator/tensor/matrix_op.cc:512

Parameters:
  • data (Symbol) – Input data array
  • axis (Shape(tuple), required) – The axes along which to reverse elements.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.rint(*args, **kwargs)

Round elements of the array to the nearest integer, element-wise.

For example:
rint([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-2., -2., 1., 2., 2.]

The difference from round is that rint returns n for an input of n.5, while round returns n+1 for n >= 0.

Defined in src/operator/tensor/elemwise_unary_op.cc:L154

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.rmsprop_update(*args, **kwargs)

Updater function for RMSProp optimizer. The RMSProp code follows the version in http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf Tieleman & Hinton, 2012.

Parameters:
  • lr (float, required) – learning_rate
  • gamma1 (float, optional, default=0.95) – gamma1
  • epsilon (float, optional, default=1e-08) – epsilon
  • wd (float, optional, default=0) – weight decay
  • rescale_grad (float, optional, default=1) – rescale gradient as grad = rescale_grad*grad.
  • clip_gradient (float, optional, default=-1) – If greater than 0, clip gradient to grad = max(min(grad, clip_gradient), -clip_gradient). Otherwise turned off.
  • clip_weights (float, optional, default=-1) – If greater than 0, clip weights to weights = max(min(weights, clip_weights), -clip_weights). Otherwise turned off.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.rmspropalex_update(*args, **kwargs)

Updater function for RMSPropAlex optimizer. The RMSPropAlex code follows the version in http://arxiv.org/pdf/1308.0850v5.pdf Eq(38) - Eq(45) by Alex Graves, 2013.

Parameters:
  • lr (float, required) – learning_rate
  • gamma1 (float, optional, default=0.95) – gamma1
  • gamma2 (float, optional, default=0.9) – gamma2
  • epsilon (float, optional, default=1e-08) – epsilon
  • wd (float, optional, default=0) – weight decay
  • rescale_grad (float, optional, default=1) – rescale gradient as grad = rescale_grad*grad.
  • clip_gradient (float, optional, default=-1) – If greater than 0, clip gradient to grad = max(min(grad, clip_gradient), -clip_gradient). Otherwise turned off.
  • clip_weights (float, optional, default=-1) – If greater than 0, clip weights to weights = max(min(weights, clip_weights), -clip_weights). Otherwise turned off.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.round(*args, **kwargs)

Round elements of the array to the nearest integer, element-wise.

For example:
round([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-2., -2., 2., 2., 2.]

Defined in src/operator/tensor/elemwise_unary_op.cc:L122

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.rsqrt(*args, **kwargs)

Calculate the inverse square-root of an array, element-wise.

\[rsqrt(x) = 1/\sqrt{x}\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L200

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sgd_mom_update(*args, **kwargs)

Updater function for the SGD optimizer with momentum.

Parameters:
  • lr (float, required) – learning_rate
  • momentum (float, optional, default=0) – momentum
  • wd (float, optional, default=0) – weight decay
  • rescale_grad (float, optional, default=1) – rescale gradient as grad = rescale_grad*grad.
  • clip_gradient (float, optional, default=-1) – If greater than 0, clip gradient to grad = max(min(grad, clip_gradient), -clip_gradient). Otherwise turned off.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sgd_update(*args, **kwargs)

Updater function for the SGD optimizer.

Parameters:
  • lr (float, required) – learning_rate
  • wd (float, optional, default=0) – weight decay
  • rescale_grad (float, optional, default=1) – rescale gradient as grad = rescale_grad*grad.
  • clip_gradient (float, optional, default=-1) – If greater than 0, clip gradient to grad = max(min(grad, clip_gradient), -clip_gradient). Otherwise turned off.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sign(*args, **kwargs)

Returns the sign of array elements, element-wise.

For example:
sign([-2, 0, 3]) = [-1, 0, 1]

Defined in src/operator/tensor/elemwise_unary_op.cc:L109

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sin(*args, **kwargs)

Trigonometric sine, element-wise.

The input is in radians (\(2\pi\) rad equals 360 degrees).

\[sin([0, \pi/4, \pi/2]) = [0, 0.707, 1]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L261

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sinh(*args, **kwargs)

Hyperbolic sine, element-wise.

For example:
sinh(x) = 0.5 * (exp(x) - exp(-x))

Defined in src/operator/tensor/elemwise_unary_op.cc:L412

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.slice(*args, **kwargs)

Crop a contiguous region from the array.

Assume the input array has n dimensions; given begin=(b_1, ..., b_n) and end=(e_1, ..., e_n), crop returns a region with shape (e_1-b_1, ..., e_n-b_n). The result’s k-th dimension contains elements from the k-th dimension of the input array within the half-open range [b_k, e_k).

For example:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]

crop(x, begin=(0,1), end=(2,4)) = [[ 2.,  3.,  4.],
                                   [ 6.,  7.,  8.]]

Defined in src/operator/tensor/matrix_op.cc:L207

Parameters:
  • data (Symbol) – Source input
  • begin (Shape(tuple), required) – starting coordinates
  • end (Shape(tuple), required) – ending coordinates
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
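
Not part of the generated reference: a minimal usage sketch of slice under the bind/forward workflow described in the overview, reproducing the example above.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.slice(data, begin=(0, 1), end=(2, 4))      # rows [0,2), columns [1,4)
ex = out.bind(mx.cpu(), {'data': mx.nd.array([[1, 2, 3, 4],
                                              [5, 6, 7, 8],
                                              [9, 10, 11, 12]])})
print(ex.forward()[0].asnumpy())                        # expected: [[2. 3. 4.] [6. 7. 8.]]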

mxnet.symbol.slice_axis(*args, **kwargs)

Slice along a given axis.

Examples:

x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]

slice_axis(x, axis=0, begin=1, end=3) = [[  5.,   6.,   7.,   8.],
                                         [  9.,  10.,  11.,  12.]]

slice_axis(x, axis=1, begin=0, end=2) = [[  1.,   2.],
                                         [  5.,   6.],
                                         [  9.,  10.]]

slice_axis(x, axis=1, begin=-3, end=-1) = [[  2.,   3.],
                                           [  6.,   7.],
                                           [ 10.,  11.]]

Defined in src/operator/tensor/matrix_op.cc:L285

Parameters:
  • data (Symbol) – Source input
  • axis (int, required) – The axis to be sliced. Negative axis means to count from the last to the first axis.
  • begin (int, required) – The beginning index to be sliced. Negative values are interpreted as counting from the backward.
  • end (int or None, required) – The end index to be sliced. The end can be None, in which case all the rest elements are used. Also, negative values are interpreted as counting from the backward.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
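
Not part of the generated reference: a minimal usage sketch of slice_axis under the bind/forward workflow described in the overview; input values are illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.slice_axis(data, axis=1, begin=0, end=2)   # keep the first two columns
ex = out.bind(mx.cpu(), {'data': mx.nd.array([[1, 2, 3, 4],
                                              [5, 6, 7, 8]])})
print(ex.forward()[0].asnumpy())                        # expected: [[1. 2.] [5. 6.]]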

mxnet.symbol.smooth_l1(*args, **kwargs)

Calculate Smooth L1 Loss(lhs, scalar)

From:src/operator/tensor/elemwise_binary_scalar_op_extended.cc:63

Parameters:
  • data (Symbol) – source input
  • scalar (float) – scalar input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.softmax(*args, **kwargs)

Compute the softmax of the input along the given axis.

Parameters:
  • data (Symbol) – The input
  • axis (int, optional, default='-1') – The axis along which to compute softmax. By default use the last axis
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.softmax_cross_entropy(*args, **kwargs)

Calculate cross_entropy(data, one_hot(label))

From:src/operator/loss_binary_op.cc:12

Parameters:
  • data (Symbol) – Input data
  • label (Symbol) – Input label
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sort(*args, **kwargs)

Return a sorted copy of an array.

Examples:

x = [[ 1, 4],
     [ 3, 1]]

// sort along the last axis
sort(x) = [[ 1.,  4.],
           [ 1.,  3.]]

// flatten and then sort
sort(x, axis=None) = [ 1.,  1.,  3.,  4.]

// sort along the first axis
sort(x, axis=0) = [[ 1.,  1.],
                   [ 3.,  4.]]

// in descending order
sort(x, is_ascend=0) = [[ 4.,  1.],
                        [ 3.,  1.]]

Defined in src/operator/tensor/ordering_op.cc:L99

Parameters:
  • src (Symbol) – Source input
  • axis (int or None, optional, default='-1') – Axis along which to sort the input tensor. If not given, the flattened array is used. Default is -1.
  • is_ascend (boolean, optional, default=True) – Whether sort in ascending or descending order.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
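
Not part of the generated reference: a minimal usage sketch of sort under the bind/forward workflow described in the overview, using the example input above.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.sort(data, axis=0, is_ascend=True)         # sort each column independently
ex = out.bind(mx.cpu(), {'data': mx.nd.array([[1, 4], [3, 1]])})
print(ex.forward()[0].asnumpy())                        # expected: [[1. 1.] [3. 4.]]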

mxnet.symbol.split(*args, **kwargs)

Split an array along a particular axis into multiple sub-arrays.

Assume the input array has shape (d_0, ..., d_n) and we slice it into m subarrays (num_outputs=m) along axis k; then we obtain a list of m arrays, each with shape (d_0, ..., d_k/m, ..., d_n).

For example:

x = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]  // 4x2 array

y = split(x, axis=0, num_outputs=4) // a list of 4 arrays
y[0] = [[ 1.,  2.]]  // 1x2 array

z = split(x, axis=0, num_outputs=2) // a list of 2 arrays
z[0] = [[ 1.,  2.],
        [ 3.,  4.]]

When the optional argument squeeze_axis=1 is set, the k-th dimension is removed from the shape if it becomes 1:

y = split(x, axis=0, num_outputs=4, squeeze_axis=1)
y[0] = [ 1.,  2.]  // (2,) vector

Defined in src/operator/slice_channel.cc:L50

Parameters:
  • data (Symbol) – The input array to be split.
  • num_outputs (int, required) – Number of outputs to be sliced.
  • axis (int, optional, default='1') – Dimension along which to slice.
  • squeeze_axis (boolean, optional, default=False) – If true, the dimension will be squeezed. Also, input.shape[axis] must be the same as num_outputs when squeeze_axis is turned on.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
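
Not part of the generated reference: a minimal usage sketch of split under the bind/forward workflow described in the overview; note that the grouped symbol yields one output per sub-array.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.split(data, axis=0, num_outputs=2)         # grouped symbol with two outputs
ex = out.bind(mx.cpu(), {'data': mx.nd.array([[1, 2], [3, 4], [5, 6], [7, 8]])})
parts = ex.forward()                                    # list with one NDArray per output
print(parts[0].asnumpy())                               # expected: [[1. 2.] [3. 4.]]
print(parts[1].asnumpy())                               # expected: [[5. 6.] [7. 8.]]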

mxnet.symbol.sqrt(*args, **kwargs)

Calculate the square-root of an array, element-wise.

\[sqrt(x) = \sqrt{x}\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L187

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.square(*args, **kwargs)

Calculate the square of an array, element-wise.

For example:
square(x) = x^2

Defined in src/operator/tensor/elemwise_unary_op.cc:L174

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.sum(*args, **kwargs)

Compute the sum of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), axis=0 results in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming an input of shape (n, m, k), axis=(0,2) yields an array with shape (m,).

If keepdims is set, the result array has the same number of dimensions as the input, with the reduced axes kept as size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L44

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
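
Not part of the generated reference: a minimal sketch of a tuple-of-axes reduction with sum, using Symbol.infer_shape to check the output shape; the input shape (2, 3, 4) is illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.sum(data, axis=(0, 2))                     # reduce over the first and last axes
_, out_shapes, _ = out.infer_shape(data=(2, 3, 4))
print(out_shapes)                                       # expected: [(3,)]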

mxnet.symbol.sum_axis(*args, **kwargs)

Compute the sum of array elements over given axes.

The argument axis specifies the axes to compute over:

  • (): compute over all elements into a scalar array with shape (1,). This is the default option.
  • int: compute along a particular axis. If the input has shape (n, m, k), axis=0 results in an array with shape (m, k).
  • tuple of int: compute over multiple axes. Again assuming an input of shape (n, m, k), axis=(0,2) yields an array with shape (m,).

If keepdims is set, the result array has the same number of dimensions as the input, with the reduced axes kept as size 1.

Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L44

Parameters:
  • data (Symbol) – The input
  • axis (Shape(tuple), optional, default=()) – The axes to perform the reduction.
  • keepdims (boolean, optional, default=False) – If true, the axes which are reduced are left in the result as dimension with size one.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.swapaxes(*args, **kwargs)

Interchange two axes of an array.

Examples:

 x = [[1, 2, 3]]
 swapaxes(x, 0, 1) = [[ 1],
                      [ 2],
                      [ 3]]

 x = [[[ 0, 1],
       [ 2, 3]],
      [[ 4, 5],
       [ 6, 7]]]  // (2,2,2) array

swapaxes(x, 0, 2) = [[[ 0, 4],
                      [ 2, 6]],
                     [[ 1, 5],
                      [ 3, 7]]]

Defined in src/operator/swapaxis.cc:L55

Parameters:
  • data (Symbol) – Input array.
  • dim1 (int (non-negative), optional, default=0) – the first axis to be swapped.
  • dim2 (int (non-negative), optional, default=0) – the second axis to be swapped.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.take(*args, **kwargs)

Take elements from an array along an axis.

Slice along a particular axis with the provided indices. E.g., given an input array with shape (d0, d1, d2) and indices with shape (i0, i1), then the output will have shape (i0, i1, d1, d2), with:

output[i,j,:,:] = input[indices[i,j],:,:]

Examples:

 x = [[ 1.,  2.],
      [ 3.,  4.],
      [ 5.,  6.]]

take(x, [[0,1],[1,2]]) = [[[ 1.,  2.],
                           [ 3.,  4.]],

                          [[ 3.,  4.],
                           [ 5.,  6.]]]

Note

Only slicing axis 0 is supported now.

Defined in src/operator/tensor/indexing_op.cc:L79

Parameters:
  • a (Symbol) – The source array.
  • indices (Symbol) – The indices of the values to extract.
  • axis (int, optional, default='0') – the axis of data tensor to be taken.
  • mode ({'clip', 'raise', 'wrap'},optional, default='raise') – specify how out-of-bound indices behave.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
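
Not part of the generated reference: a minimal usage sketch of take under the bind/forward workflow described in the overview, using the example input above.

import mxnet as mx

a = mx.sym.Variable('a')
indices = mx.sym.Variable('indices')
out = mx.sym.take(a, indices)                           # gather rows of a (axis 0)
ex = out.bind(mx.cpu(), {'a': mx.nd.array([[1, 2], [3, 4], [5, 6]]),
                         'indices': mx.nd.array([[0, 1], [1, 2]])})
print(ex.forward()[0].asnumpy().shape)                  # expected: (2, 2, 2)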

mxnet.symbol.tan(*args, **kwargs)

Tangent, element-wise.

The input is in radians (\(2\pi\) rad equals 360 degrees).

\[tan([0, \pi/4, \pi/2]) = [0, 1, -inf]\]

Defined in src/operator/tensor/elemwise_unary_op.cc:L320

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.tanh(*args, **kwargs)

Hyperbolic tangent element-wise.

For example:
tanh(x) = sinh(x) / cosh(x)

Defined in src/operator/tensor/elemwise_unary_op.cc:L440

Parameters:
  • data (Symbol) – The input
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.tile(*args, **kwargs)

Repeat the whole array multiple times.

If reps has length d and the input array has n dimensions, there are three cases:

  • n=d. Repeat i-th dimension of the input by reps[i] times:

    x = [[1, 2],
         [3, 4]]
    
    tile(x, reps=(2,3)) = [[ 1.,  2.,  1.,  2.,  1.,  2.],
                           [ 3.,  4.,  3.,  4.,  3.,  4.],
                           [ 1.,  2.,  1.,  2.,  1.,  2.],
                           [ 3.,  4.,  3.,  4.,  3.,  4.]]
    
  • n>d. reps is promoted to length n by pre-pending 1’s to it. Thus for an input of shape (2,2), reps=(2,) is treated as (1,2):

    tile(x, reps=(2,)) = [[ 1.,  2.,  1.,  2.],
                          [ 3.,  4.,  3.,  4.]]
    
  • n<d. The input is promoted to be d-dimensional by prepending new axes. So a shape (2,2) array is promoted to (1,2,2) for 3-D replication:

    tile(x, reps=(2,2,3)) = [[[ 1.,  2.,  1.,  2.,  1.,  2.],
                              [ 3.,  4.,  3.,  4.,  3.,  4.],
                              [ 1.,  2.,  1.,  2.,  1.,  2.],
                              [ 3.,  4.,  3.,  4.,  3.,  4.]],
    
                             [[ 1.,  2.,  1.,  2.,  1.,  2.],
                              [ 3.,  4.,  3.,  4.,  3.,  4.],
                              [ 1.,  2.,  1.,  2.,  1.,  2.],
                              [ 3.,  4.,  3.,  4.,  3.,  4.]]]
    

Defined in src/operator/tensor/matrix_op.cc:L489

Parameters:
  • data (Symbol) – Input data array
  • reps (Shape(tuple), required) – The number of times for repeating the tensor a. If reps has length d, the result will have dimension of max(d, a.ndim); If a.ndim < d, a is promoted to be d-dimensional by prepending new axes. If a.ndim > d, reps is promoted to a.ndim by pre-pending 1’s to it.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
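
Not part of the generated reference: a minimal sketch of tile, using Symbol.infer_shape to check the output shape; the input shape (2, 2) is illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.tile(data, reps=(2, 3))                    # tile rows twice and columns three times
_, out_shapes, _ = out.infer_shape(data=(2, 2))
print(out_shapes)                                       # expected: [(4, 6)]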

mxnet.symbol.topk(*args, **kwargs)

Return the top k elements in an array.

Examples:

x = [[ 0.3,  0.2,  0.4],
     [ 0.1,  0.3,  0.2]]

// return the index of the largest element on last axis
topk(x) = [[ 2.],
           [ 1.]]

// return the value of the top-2 elements on last axis
topk(x, ret_typ='value', k=2) = [[ 0.4,  0.3],
                                 [ 0.3,  0.2]]

// flatten and then return both index and value
topk(x, ret_typ='both', k=2, axis=None) = [ 0.4,  0.3], [ 2.,  0.]

Defined in src/operator/tensor/ordering_op.cc:L36

Parameters:
  • src (Symbol) – Source input
  • axis (int or None, optional, default='-1') – Axis along which to choose the top k indices. If not given, the flattened array is used. Default is -1.
  • k (int, optional, default='1') – Number of top elements to select; should always be smaller than or equal to the number of elements in the given axis. A global sort is performed if k < 1.
  • ret_typ ({'both', 'indices', 'mask', 'value'},optional, default='indices') – The return type. “value” means returning the top k values, “indices” means returning the indices of the top k values, “mask” means to return a mask array containing 0 and 1. 1 means the top k values. “both” means to return both value and indices.
  • is_ascend (boolean, optional, default=False) – Whether to choose k largest or k smallest. Top K largest elements will be chosen if set to false.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
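
Not part of the generated reference: a minimal usage sketch of topk under the bind/forward workflow described in the overview, using the example input above.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.topk(data, k=2, ret_typ='value')           # top-2 values along the last axis
ex = out.bind(mx.cpu(), {'data': mx.nd.array([[0.3, 0.2, 0.4],
                                              [0.1, 0.3, 0.2]])})
print(ex.forward()[0].asnumpy())                        # expected: [[0.4 0.3] [0.3 0.2]]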

mxnet.symbol.transpose(*args, **kwargs)

Permute the dimensions of an array.

Examples:

x = [[ 1, 2],
     [ 3, 4]]

transpose(x) = [[ 1.,  3.],
                [ 2.,  4.]]

x = [[[ 1.,  2.],
      [ 3.,  4.]],

     [[ 5.,  6.],
      [ 7.,  8.]]]

transpose(x) = [[[ 1.,  5.],
                 [ 3.,  7.]],

                [[ 2.,  6.],
                 [ 4.,  8.]]]

transpose(x, axes=(1,0,2)) = [[[ 1.,  2.],
                               [ 5.,  6.]],

                              [[ 3.,  4.],
                               [ 7.,  8.]]]

Defined in src/operator/tensor/matrix_op.cc:L142

Parameters:
  • data (Symbol) – Source input
  • axes (Shape(tuple), optional, default=()) – Target axis order. By default the axes will be inverted.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
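
Not part of the generated reference: a minimal sketch of transpose, using Symbol.infer_shape to show that the axes are reversed by default; the input shape (2, 3, 4) is illustrative.

import mxnet as mx

data = mx.sym.Variable('data')
out = mx.sym.transpose(data)                            # reverse all axes by default
_, out_shapes, _ = out.infer_shape(data=(2, 3, 4))
print(out_shapes)                                       # expected: [(4, 3, 2)]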

mxnet.symbol.uniform(*args, **kwargs)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high):

nd.uniform(low=0, high=1, shape=(2,2)) = [[ 0.60276335,  0.85794562],
                                          [ 0.54488319,  0.84725171]]

Defined in src/operator/tensor/sample_op.cc:L24

Parameters:
  • low (float, optional, default=0) – The lower bound of distribution
  • high (float, optional, default=1) – The upper bound of distribution
  • shape (Shape(tuple), optional, default=()) – The shape of the output
  • ctx (string, optional, default='') – Context of output, in format [cpu|gpu|cpu_pinned](n). Only used for imperative calls.
  • dtype ({'None', 'float16', 'float32', 'float64'},optional, default='None') – DType of the output. If output is given, set to the type of output. If output is not given and dtype is not defined (dtype=None), set to float32.
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol

mxnet.symbol.where(*args, **kwargs)

Given three arrays, condition, x, and y, return an array with elements taken from x or y depending on whether the corresponding elements in condition are true or false. x and y must have the same shape. If condition has the same shape as x, each element in the output array is taken from x if the corresponding element in condition is true, and from y otherwise. If condition does not have the same shape as x, it must be a 1-D array whose size is the same as x’s first dimension; in that case, each row of the output array is taken from x’s row if the corresponding element of condition is true, and from y’s row otherwise.

From:src/operator/tensor/control_flow_op.cc:21

Parameters:
  • condition (Symbol) – condition array
  • x (Symbol) – elements selected where condition is true
  • y (Symbol) – elements selected where condition is false
  • name (string, optional.) – Name of the resulting symbol.
Returns:

The result symbol.

Return type:

Symbol
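
Not part of the generated reference: a minimal usage sketch of where under the bind/forward workflow described in the overview; input values are illustrative.

import mxnet as mx

cond = mx.sym.Variable('cond')
x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
out = mx.sym.where(cond, x, y)                          # pick from x where cond is true, else from y
ex = out.bind(mx.cpu(), {'cond': mx.nd.array([[1, 0], [0, 1]]),
                         'x': mx.nd.array([[1, 2], [3, 4]]),
                         'y': mx.nd.array([[5, 6], [7, 8]])})
print(ex.forward()[0].asnumpy())                        # expected: [[1. 6.] [7. 4.]]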