Data Loading API

Overview

This document summarizes the supported data formats and iterator APIs for reading data, including:

mxnet.io Data iterators for common data formats.
mxnet.recordio Read and write for the RecordIO data format.
mxnet.image Read individual image files and perform augmentations.

First, let’s see how to create a data iterator. The following iterator can be used to train a symbol whose input data variable is named data and whose input label variable is named softmax_label. The iterator also provides information about the batch, including the names and shapes of the data and label.

>>> nd_iter = mx.io.NDArrayIter(data={'data':mx.nd.ones((100,10))},
...                             label={'softmax_label':mx.nd.ones((100,))},
...                             batch_size=25)
>>> print(nd_iter.provide_data)
[DataDesc[data,(25, 10L),<type 'numpy.float32'>,NCHW]]
>>> print(nd_iter.provide_label)
[DataDesc[softmax_label,(25,),<type 'numpy.float32'>,NCHW]]

Let’s see a complete example of how to use a data iterator in model training.

>>> data = mx.sym.Variable('data')
>>> label = mx.sym.Variable('softmax_label')
>>> fullc = mx.sym.FullyConnected(data=data, num_hidden=1)
>>> loss = mx.sym.SoftmaxOutput(data=fullc, label=label)
>>> mod = mx.mod.Module(loss, data_names=['data'], label_names=['softmax_label'])
>>> mod.bind(data_shapes=nd_iter.provide_data, label_shapes=nd_iter.provide_label)
>>> mod.fit(nd_iter, num_epoch=2)

A detailed tutorial is available at Iterators - Loading data.

Data iterators

io.NDArrayIter Returns an iterator for mx.nd.NDArray or numpy.ndarray.
io.CSVIter Returns the CSV file iterator.
io.ImageRecordIter Iterates on image RecordIO files.
io.ImageRecordUInt8Iter Iterating on image RecordIO files.
io.MNISTIter Iterating on the MNIST dataset.
recordio.MXRecordIO Reads/writes RecordIO data format, supporting sequential read and write.
recordio.MXIndexedRecordIO Reads/writes RecordIO data format, supporting random access.
image.ImageIter Image data iterator with a large number of augmentation choices.

Helper classes and functions

Data structures and other iterators provided in the mxnet.io package.

io.DataDesc DataDesc is used to store name, shape, type and layout information of the data or the label.
io.DataBatch A data batch.
io.DataIter The base class for an MXNet data iterator.
io.ResizeIter Resize a data iterator to a given number of batches.
io.PrefetchingIter Performs pre-fetch for other data iterators.
io.MXDataIter A Python wrapper around a C++ data iterator.

A list of image modification functions provided by mxnet.image.

image.imdecode Decode an image to an NDArray.
image.scale_down Scales down crop size if it’s larger than image size.
image.resize_short Resizes shorter edge to size.
image.fixed_crop Crop src at fixed location, and (optionally) resize it to size.
image.random_crop Randomly crop src with size (width, height).
image.center_crop Crops the image src to the given size by trimming on all four sides and preserving the center of the image.
image.color_normalize Normalize src with mean and std.
image.random_size_crop Randomly crop src with size.
image.ResizeAug Make an augmenter that resizes the shorter edge to size.
image.RandomCropAug Make a random crop augmenter.
image.RandomSizedCropAug Make an augmenter that randomly crops with random resizing and random aspect ratio jitter.
image.CenterCropAug Make a center crop augmenter.
image.RandomOrderAug Apply a list of augmenters in random order.
image.ColorJitterAug Apply random brightness, contrast and saturation jitter in random order.
image.LightingAug Add PCA based noise.
image.ColorNormalizeAug Mean and std normalization.
image.HorizontalFlipAug Random horizontal flipping.
image.CastAug Cast to float32.
image.CreateAugmenter Creates an augmenter list.

Functions to read and write RecordIO files.

recordio.pack Pack a string into MXImageRecord.
recordio.unpack Unpack a MXImageRecord to string.
recordio.unpack_img Unpack a MXImageRecord to image.
recordio.pack_img Pack an image into MXImageRecord.

Develop a new iterator

Writing a new data iterator in Python is straightforward. Most MXNet training/inference programs accept an iterable object with provide_data and provide_label properties. This tutorial explains how to write an iterator from scratch.

The following example demonstrates how to combine multiple data iterators into a single one. It can be used for multi-modality training, such as image captioning, in which images are read by ImageRecordIter while documents are read by CSVIter.

class MultiIter:
    def __init__(self, iter_list):
        self.iters = iter_list
    def next(self):
        batches = [i.next() for i in self.iters]
        return DataBatch(data=[d for b in batches for d in b.data],
                         label=[l for b in batches for l in b.label])
    def reset(self):
        for i in self.iters:
            i.reset()
    @property
    def provide_data(self):
        return [d for i in self.iters for d in i.provide_data]
    @property
    def provide_label(self):
        return [l for i in self.iters for l in i.provide_label]

train_iter = MultiIter([mx.io.ImageRecordIter('image.rec'), mx.io.CSVIter('txt.csv')])
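The combination logic above does not depend on MXNet itself; a pure-Python sketch with hypothetical stub iterators (StubIter is not part of mxnet) shows how the per-iterator lists are flattened into one:

```python
# Pure-Python sketch of the combination logic, using hypothetical stub
# iterators instead of real MXNet ones (StubIter is not part of mxnet).
class StubIter:
    def __init__(self, name):
        self.name = name
    def next(self):
        # Stands in for DataBatch.data: a list of arrays per iterator.
        return [self.name + '_batch']

def combine(iter_list):
    batches = [i.next() for i in iter_list]
    # Flatten the per-iterator lists into one flat list, in order.
    return [d for b in batches for d in b]

print(combine([StubIter('image'), StubIter('text')]))  # ['image_batch', 'text_batch']
```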

Parsing input and performing other pre-processing such as augmentation may be expensive. If performance is critical, we can implement a data iterator in C++. Refer to src/io for examples.

Change batch layout

By default, the backend engine treats the first dimension of each data and label variable in data iterators as the batch size (i.e. NCHW or NT layout). To override the axis for batch size, the provide_data (and provide_label, if there is a label) properties should include the layouts. This is especially useful for RNNs, since TNC layouts are often more efficient. For example:

@property
def provide_data(self):
    return [DataDesc(name='seq_var', shape=(seq_length, batch_size), layout='TN')]

The backend engine will recognize the index of N in the layout as the axis for batch size.
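Locating the batch axis in a layout string reduces to finding the letter N; a minimal sketch of that logic, assuming a plain string search (mirroring the documented behavior of DataDesc.get_batch_axis):

```python
def batch_axis(layout):
    # The batch axis is the position of 'N' in the layout string.
    # str.find returns -1 when 'N' is absent, which matches the documented
    # "copy the whole array to each device" case.
    return layout.find('N')

print(batch_axis('NCHW'))  # 0
print(batch_axis('TNC'))   # 1
print(batch_axis('TN'))    # 1
```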

API Reference

Data iterators for common data formats.

class mxnet.io.DataDesc

DataDesc is used to store name, shape, type and layout information of the data or the label.

The layout describes how the axes in shape should be interpreted. For example, for image data, setting layout=NCHW indicates that the first axis is the number of examples in the batch (N), C is the number of channels, H is the height, and W is the width of the image.

For sequential data, by default the layout is set to NTC, where N is the number of examples in the batch, T is the temporal axis representing time, and C is the number of channels.

Parameters:
  • cls (DataDesc) – The class.
  • name (str) – Data name.
  • shape (tuple of int) – Data shape.
  • dtype (np.dtype, optional) – Data type.
  • layout (str, optional) – Data layout.
static get_batch_axis(layout)

Get the dimension that corresponds to the batch size.

When data parallelism is used, the data will be automatically split and concatenated along the batch-size dimension. Axis can be -1, which means the whole array will be copied for each data-parallelism device.

Parameters:layout (str) – layout string. For example, “NCHW”.
Returns:An axis indicating the batch_size dimension.
Return type:int
static get_list(shapes, types)

Get DataDesc list from attribute lists.

Parameters:
  • shapes (a tuple of (name, shape)) –
  • types (a tuple of (name, type)) –
class mxnet.io.DataBatch(data, label=None, pad=None, index=None, bucket_key=None, provide_data=None, provide_label=None)

A data batch.

MXNet’s data iterator returns a batch of data for each next call. This data contains batch_size number of examples.

If the input data consists of images, then the shape of these images depends on the layout attribute of the DataDesc objects in the provide_data parameter.

If layout is set to ‘NCHW’, images should be stored in a 4-D array of shape (batch_size, num_channel, height, width). If layout is set to ‘NHWC’, images should be stored in a 4-D array of shape (batch_size, height, width, num_channel). The channels are often in RGB order.
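The two layouts store the same pixels with the channel axis in different positions; a small NumPy sketch of converting a hypothetical batch between them:

```python
import numpy as np

# A hypothetical batch of 4 RGB images, each 32x48 pixels, in NCHW layout.
nchw = np.zeros((4, 3, 32, 48), dtype=np.float32)

# NCHW -> NHWC: move the channel axis (position 1) to the end.
nhwc = nchw.transpose(0, 2, 3, 1)
print(nhwc.shape)  # (4, 32, 48, 3)
```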

Parameters:
  • data (list of NDArray, each array containing batch_size examples.) – A list of input data.
  • label (list of NDArray, each array often containing a 1-dimensional array. optional) – A list of input labels.
  • pad (int, optional) – The number of examples padded at the end of a batch. It is used when the total number of examples read is not divisible by the batch_size. These extra padded examples are ignored in prediction.
  • index (numpy.array, optional) – The example indices in this batch.
  • bucket_key (int, optional) – The bucket key, used for bucketing module.
  • provide_data (list of DataDesc, optional) – A list of DataDesc objects. DataDesc is used to store name, shape, type and layout information of the data. The i-th element describes the name and shape of data[i].
  • provide_label (list of DataDesc, optional) – A list of DataDesc objects. DataDesc is used to store name, shape, type and layout information of the label. The i-th element describes the name and shape of label[i].
class mxnet.io.DataIter(batch_size=0)

The base class for an MXNet data iterator.

All I/O in MXNet is handled by specializations of this class. Data iterators in MXNet are similar to standard iterators in Python. On each call to next they return a DataBatch which represents the next batch of data. When there is no more data to return, a StopIteration exception is raised.

Parameters:batch_size (int, optional) – The batch size, namely the number of items in the batch.

See also

NDArrayIter
Data-iterator for MXNet NDArray or numpy-ndarray objects.
CSVIter
Data-iterator for csv data.
ImageIter
Data-iterator for images.
reset()

Reset the iterator to the beginning of the data.

next()

Get next data batch from iterator.

Returns:The data of next batch.
Return type:DataBatch
Raises:StopIteration – If the end of the data is reached.
iter_next()

Move to the next batch.

Returns:Whether the move is successful.
Return type:boolean
getdata()

Get data of current batch.

Returns:The data of the current batch.
Return type:list of NDArray
getlabel()

Get label of the current batch.

Returns:The label of the current batch.
Return type:list of NDArray
getindex()

Get index of the current batch.

Returns:index – The indices of examples in the current batch.
Return type:numpy.array
getpad()

Get the number of padding examples in the current batch.

Returns:Number of padding examples in the current batch.
Return type:int
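The methods above compose in a fixed pattern: next() checks iter_next() and then assembles the batch from getdata(), getlabel(), and getpad(). A pure-Python sketch of that pattern, using a hypothetical SimpleListIter that is not part of mxnet:

```python
# Pure-Python sketch of the DataIter method pattern described above,
# using a hypothetical SimpleListIter (not part of mxnet).
class SimpleListIter:
    def __init__(self, samples, batch_size):
        self.samples = samples      # list of (data, label) pairs
        self.batch_size = batch_size
        self.cursor = 0
    def reset(self):
        self.cursor = 0
    def iter_next(self):
        return self.cursor < len(self.samples)
    def getdata(self):
        return [d for d, _ in self.samples[self.cursor:self.cursor + self.batch_size]]
    def getlabel(self):
        return [l for _, l in self.samples[self.cursor:self.cursor + self.batch_size]]
    def getpad(self):
        # Number of missing examples in a final, short batch.
        return max(0, self.cursor + self.batch_size - len(self.samples))
    def next(self):
        if not self.iter_next():
            raise StopIteration
        batch = (self.getdata(), self.getlabel(), self.getpad())
        self.cursor += self.batch_size
        return batch

it = SimpleListIter([(i, i % 2) for i in range(5)], batch_size=2)
print(it.next())  # ([0, 1], [0, 1], 0)
print(it.next())  # ([2, 3], [0, 1], 0)
print(it.next())  # ([4], [0], 1)
```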
class mxnet.io.ResizeIter(data_iter, size, reset_internal=True)

Resize a data iterator to a given number of batches.

Parameters:
  • data_iter (DataIter) – The data iterator to be resized.
  • size (int) – The number of batches per epoch to resize to.
  • reset_internal (bool) – Whether to reset internal iterator on ResizeIter.reset.

Examples

>>> nd_iter = mx.io.NDArrayIter(mx.nd.ones((100,10)), batch_size=25)
>>> resize_iter = mx.io.ResizeIter(nd_iter, 2)
>>> for batch in resize_iter:
...     print(batch.data)
[<NDArray 25x10 @cpu(0)>]
[<NDArray 25x10 @cpu(0)>]
class mxnet.io.PrefetchingIter(iters, rename_data=None, rename_label=None)

Performs pre-fetch for other data iterators.

This iterator will create another thread to perform iter_next and then store the data in memory. It potentially accelerates the data read, at the cost of more memory usage.

Parameters:
  • iters (DataIter or list of DataIter) – The data iterators to be pre-fetched.
  • rename_data (None or list of dict) – The i-th element is a renaming map for the i-th iter, in the form of {‘original_name’ : ‘new_name’}. Should have one entry for each entry in iter[i].provide_data.
  • rename_label (None or list of dict) – Similar to rename_data.

Examples

>>> iter1 = mx.io.NDArrayIter({'data':mx.nd.ones((100,10))}, batch_size=25)
>>> iter2 = mx.io.NDArrayIter({'data':mx.nd.ones((100,10))}, batch_size=25)
>>> piter = mx.io.PrefetchingIter([iter1, iter2],
...                               rename_data=[{'data': 'data_1'}, {'data': 'data_2'}])
>>> print(piter.provide_data)
[DataDesc[data_1,(25, 10L),<type 'numpy.float32'>,NCHW],
 DataDesc[data_2,(25, 10L),<type 'numpy.float32'>,NCHW]]
class mxnet.io.NDArrayIter(data, label=None, batch_size=1, shuffle=False, last_batch_handle='pad', data_name='data', label_name='softmax_label')

Returns an iterator for mx.nd.NDArray or numpy.ndarray.

>>> data = np.arange(40).reshape((10,2,2))
>>> labels = np.ones([10, 1])
>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard')
>>> for batch in dataiter:
...     print(batch.data[0].asnumpy())
...     batch.data[0].shape
...
[[[ 36.  37.]
  [ 38.  39.]]
 [[ 16.  17.]
  [ 18.  19.]]
 [[ 12.  13.]
  [ 14.  15.]]]
(3L, 2L, 2L)
[[[ 32.  33.]
  [ 34.  35.]]
 [[  4.   5.]
  [  6.   7.]]
 [[ 24.  25.]
  [ 26.  27.]]]
(3L, 2L, 2L)
[[[  8.   9.]
  [ 10.  11.]]
 [[ 20.  21.]
  [ 22.  23.]]
 [[ 28.  29.]
  [ 30.  31.]]]
(3L, 2L, 2L)
>>> dataiter.provide_data # Returns a list of `DataDesc`
[DataDesc[data,(3, 2L, 2L),<type 'numpy.float32'>,NCHW]]
>>> dataiter.provide_label # Returns a list of `DataDesc`
[DataDesc[softmax_label,(3, 1L),<type 'numpy.float32'>,NCHW]]

In the above example, the data is shuffled because the shuffle parameter is set to True, and the remaining examples are discarded because the last_batch_handle parameter is set to ‘discard’.

Usage of last_batch_handle parameter:

>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='pad')
>>> batchidx = 0
>>> for batch in dataiter:
...     batchidx += 1
...
>>> batchidx  # Padding is added once the examples run out, so ceil(10/3) = 4 batches are created.
4
>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard')
>>> batchidx = 0
>>> for batch in dataiter:
...     batchidx += 1
...
>>> batchidx # Remaining examples are discarded, so floor(10/3) = 3 batches are created.
3
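The batch counts above follow directly from ceiling versus floor division (a minimal arithmetic check, independent of mxnet):

```python
import math

num_examples, batch_size = 10, 3
# 'pad': the final short batch is padded up to batch_size, so every
# example is seen and ceil(10/3) = 4 batches are produced.
print(math.ceil(num_examples / batch_size))  # 4
# 'discard': the final short batch is dropped, leaving floor(10/3) = 3.
print(num_examples // batch_size)            # 3
```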

NDArrayIter also supports multiple inputs and labels.

>>> data = {'data1':np.zeros(shape=(10,2,2)), 'data2':np.zeros(shape=(20,2,2))}
>>> label = {'label1':np.zeros(shape=(10,1)), 'label2':np.zeros(shape=(20,1))}
>>> dataiter = mx.io.NDArrayIter(data, label, 3, True, last_batch_handle='discard')
Parameters:
  • data (array or list of array or dict of string to array) – The input data.
  • label (array or list of array or dict of string to array, optional) – The input label.
  • batch_size (int) – Batch size of data.
  • shuffle (bool, optional) – Whether to shuffle the data.
  • last_batch_handle (str, optional) – How to handle the last batch. This parameter can be ‘pad’, ‘discard’ or ‘roll_over’. ‘roll_over’ is intended for training and can cause problems if used for prediction.
  • data_name (str, optional) – The data name.
  • label_name (str, optional) – The label name.
provide_data

The name and shape of data provided by this iterator.

provide_label

The name and shape of label provided by this iterator.

hard_reset()

Ignore roll over data and set to start.

class mxnet.io.MXDataIter(handle, data_name='data', label_name='softmax_label', **_)

A Python wrapper around a C++ data iterator.

This iterator is the Python wrapper for all native C++ data iterators, such as CSVIter, ImageRecordIter, MNISTIter, etc. When initializing CSVIter, for example, you will get an MXDataIter instance to use in your Python code. Calls to next, reset, etc. are delegated to the underlying C++ data iterator.

Usually you don’t need to interact with MXDataIter directly unless you are implementing your own data iterators in C++. To do that, please refer to examples under the src/io folder.

Parameters:
  • handle (DataIterHandle, required) – The handle to the underlying C++ Data Iterator.
  • data_name (str, optional) – Data name. Default to “data”.
  • label_name (str, optional) – Label name. Default to “softmax_label”.

See also

src/io : The underlying C++ data iterator implementation, e.g., CSVIter.

mxnet.io.CSVIter(*args, **kwargs)

Returns the CSV file iterator.

In this function, the data_shape parameter is used to set the shape of each line of the input data. If a row in an input file is 1,2,3,4,5,6 and data_shape is (3,2), that row will be reshaped, yielding the array [[1,2],[3,4],[5,6]] of shape (3,2).

By default, the CSVIter has the round_batch parameter set to True. So, if batch_size is 3 and the CSV file has 4 rows in total, 2 extra examples are consumed in the first round to fill the last batch. If the reset function is called after the first round, it is ignored and the remaining examples are returned in the second round.

If one wants all the instances in the second round after calling reset, make sure to set round_batch to False.

If data_csv = 'data/' is set, then all the files in this directory will be read.

Examples:

// Contents of CSV file ``data/data.csv``.
1,2,3
2,3,4
3,4,5
4,5,6

// Creates a `CSVIter` with `batch_size`=2 and default `round_batch`=True.
CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
batch_size = 2)

// Two batches read from the above iterator are as follows:
[[ 1.  2.  3.]
[ 2.  3.  4.]]
[[ 3.  4.  5.]
[ 4.  5.  6.]]

// Creates a `CSVIter` with default `round_batch` set to True.
CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
batch_size = 3)

// Two batches read from the above iterator in the first pass are as follows:
[[1.  2.  3.]
[2.  3.  4.]
[3.  4.  5.]]

[[4.  5.  6.]
[2.  3.  4.]
[3.  4.  5.]]

// Now, `reset` method is called.
CSVIter.reset()

// Batch read from the above iterator in the second pass is as follows:
[[ 3.  4.  5.]
[ 4.  5.  6.]
[ 1.  2.  3.]]

// Creates a `CSVIter` with `round_batch`=False.
CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
batch_size = 3, round_batch=False)

// Contents of two batches read from the above iterator in both passes, after calling
// `reset` method before second pass, is as follows:
[[1.  2.  3.]
[2.  3.  4.]
[3.  4.  5.]]

[[4.  5.  6.]
[2.  3.  4.]
[3.  4.  5.]]

Defined in src/io/iter_csv.cc:L202

Parameters:
  • data_csv (string, required) – The input CSV file or a directory path.
  • data_shape (Shape(tuple), required) – The shape of one example.
  • label_csv (string, optional, default='NULL') – The input CSV file or a directory path. If NULL, all labels will be returned as 0.
  • label_shape (Shape(tuple), optional, default=(1,)) – The shape of one label.
  • batch_size (int (non-negative), required) – Batch size.
  • round_batch (boolean, optional, default=True) – Whether to use round robin to handle overflow batch or not.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
Returns:

The result iterator.

Return type:

MXDataIter

mxnet.io.ImageDetRecordIter(*args, **kwargs)

Create iterator for image detection dataset packed in recordio.

Parameters:
  • path_imglist (string, optional, default='') – Dataset Param: Path to image list.
  • path_imgrec (string, optional, default='./data/imgrec.rec') – Dataset Param: Path to image record file.
  • aug_seq (string, optional, default='det_aug_default') – Augmentation Param: the augmenter names to represent the sequence of augmenters to be applied, separated by comma. Additional keyword parameters will be seen by these augmenters. Make sure you don’t use normal augmenters for detection tasks.
  • label_width (int, optional, default='-1') – Dataset Param: How many labels for an image, -1 for variable label size.
  • data_shape (Shape(tuple), required) – Dataset Param: Shape of each instance generated by the DataIter.
  • preprocess_threads (int, optional, default='4') – Backend Param: Number of thread to do preprocessing.
  • verbose (boolean, optional, default=True) – Auxiliary Param: Whether to output parser information.
  • num_parts (int, optional, default='1') – Partition the data into multiple parts.
  • part_index (int, optional, default='0') – The index of the part to read.
  • shuffle_chunk_size (long (non-negative), optional, default=0) – The size (MB) of the shuffle chunk; used with shuffle=True, it can enable global shuffling.
  • shuffle_chunk_seed (int, optional, default='0') – The seed for chunk shuffling.
  • label_pad_width (int, optional, default='0') – Pad output label width if set larger than 0; -1 for auto estimate.
  • label_pad_value (float, optional, default=-1) – Label padding value if padding is enabled.
  • shuffle (boolean, optional, default=False) – Augmentation Param: Whether to shuffle data.
  • seed (int, optional, default='0') – Augmentation Param: Random Seed.
  • batch_size (int (non-negative), required) – Batch size.
  • round_batch (boolean, optional, default=True) – Whether to use round robin to handle overflow batch or not.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
  • resize (int, optional, default='-1') – Augmentation Param: scale shorter edge to size before applying other augmentations, -1 to disable.
  • rand_crop_prob (float, optional, default=0) – Augmentation Param: Probability of random cropping, <= 0 to disable
  • min_crop_scales (, optional, default=(0,)) – Augmentation Param: Min crop scales.
  • max_crop_scales (, optional, default=(1,)) – Augmentation Param: Max crop scales.
  • min_crop_aspect_ratios (, optional, default=(1,)) – Augmentation Param: Min crop aspect ratios.
  • max_crop_aspect_ratios (, optional, default=(1,)) – Augmentation Param: Max crop aspect ratios.
  • min_crop_overlaps (, optional, default=(0,)) – Augmentation Param: Minimum crop IOU between crop_box and ground-truths.
  • max_crop_overlaps (, optional, default=(1,)) – Augmentation Param: Maximum crop IOU between crop_box and ground-truth.
  • min_crop_sample_coverages (, optional, default=(0,)) – Augmentation Param: Minimum ratio of intersect/crop_area between crop box and ground-truths.
  • max_crop_sample_coverages (, optional, default=(1,)) – Augmentation Param: Maximum ratio of intersect/crop_area between crop box and ground-truths.
  • min_crop_object_coverages (, optional, default=(0,)) – Augmentation Param: Minimum ratio of intersect/gt_area between crop box and ground-truths.
  • max_crop_object_coverages (, optional, default=(1,)) – Augmentation Param: Maximum ratio of intersect/gt_area between crop box and ground-truths.
  • num_crop_sampler (int, optional, default='1') – Augmentation Param: Number of crop samplers.
  • crop_emit_mode ({'center', 'overlap'},optional, default='center') – Augmentation Param: Emission mode for invalid ground-truths after crop. center: emit if the centroid of the object is out of the crop region; overlap: emit if the overlap is less than emit_overlap_thresh.
  • emit_overlap_thresh (float, optional, default=0.3) – Augmentation Param: Emit overlap thresh for emit mode overlap only.
  • max_crop_trials (Shape(tuple), optional, default=(25,)) – Augmentation Param: Skip cropping if the failed crop trial count exceeds this number.
  • rand_pad_prob (float, optional, default=0) – Augmentation Param: Probability for random padding.
  • max_pad_scale (float, optional, default=1) – Augmentation Param: Maximum padding scale.
  • max_random_hue (int, optional, default='0') – Augmentation Param: Maximum random value of H channel in HSL color space.
  • random_hue_prob (float, optional, default=0) – Augmentation Param: Probability to apply random hue.
  • max_random_saturation (int, optional, default='0') – Augmentation Param: Maximum random value of S channel in HSL color space.
  • random_saturation_prob (float, optional, default=0) – Augmentation Param: Probability to apply random saturation.
  • max_random_illumination (int, optional, default='0') – Augmentation Param: Maximum random value of L channel in HSL color space.
  • random_illumination_prob (float, optional, default=0) – Augmentation Param: Probability to apply random illumination.
  • max_random_contrast (float, optional, default=0) – Augmentation Param: Maximum random value of delta contrast.
  • random_contrast_prob (float, optional, default=0) – Augmentation Param: Probability to apply random contrast.
  • rand_mirror_prob (float, optional, default=0) – Augmentation Param: Probability to apply horizontal flip (mirror).
  • fill_value (int, optional, default='127') – Augmentation Param: Filled color value while padding.
  • inter_method (int, optional, default='1') – Augmentation Param: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.
  • resize_mode ({'fit', 'force', 'shrink'},optional, default='force') – Augmentation Param: How image data fit in data_shape. force: force reshape to data_shape regardless of aspect ratio; shrink: ensure each side fit in data_shape, preserve aspect ratio; fit: fit image to data_shape, preserve ratio, will upscale if applicable.
  • mean_img (string, optional, default='') – Augmentation Param: Mean Image to be subtracted.
  • mean_r (float, optional, default=0) – Augmentation Param: Mean value on R channel.
  • mean_g (float, optional, default=0) – Augmentation Param: Mean value on G channel.
  • mean_b (float, optional, default=0) – Augmentation Param: Mean value on B channel.
  • mean_a (float, optional, default=0) – Augmentation Param: Mean value on Alpha channel.
  • std_r (float, optional, default=0) – Augmentation Param: Standard deviation on R channel.
  • std_g (float, optional, default=0) – Augmentation Param: Standard deviation on G channel.
  • std_b (float, optional, default=0) – Augmentation Param: Standard deviation on B channel.
  • std_a (float, optional, default=0) – Augmentation Param: Standard deviation on Alpha channel.
  • scale (float, optional, default=1) – Augmentation Param: Scale in color space.
Returns:

The result iterator.

Return type:

MXDataIter

mxnet.io.ImageRecordIter(*args, **kwargs)

Iterates on image RecordIO files

Reads batches of images from .rec RecordIO files. One can use the im2rec.py tool (in tools/) to pack raw image files into RecordIO files. This iterator is less flexible to customize but is fast and has bindings in many languages. To iterate over raw images directly, use ImageIter instead (in Python).

Example:

data_iter = mx.io.ImageRecordIter(
  path_imgrec="./sample.rec", # The target record file.
  data_shape=(3, 227, 227), # Output data shape; 227x227 region will be cropped from the original image.
  batch_size=4, # Number of items per batch.
  resize=256 # Resize the shorter edge to 256 before cropping.
  # You can specify more augmentation options. Use help(mx.io.ImageRecordIter) to see all the options.
  )
# You can now use the data_iter to access batches of images.
batch = data_iter.next() # first batch.
images = batch.data[0] # This will contain 4 (=batch_size) images each of 3x227x227.
# process the images
...
data_iter.reset() # To restart the iterator from the beginning.

Defined in src/io/iter_image_recordio_2.cc:L583

Parameters:
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.
  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.
  • aug_seq (string, optional, default='aug_default') – The augmenter names to represent the sequence of augmenters to be applied, separated by comma. Additional keyword parameters will be seen by these augmenters.
  • label_width (int, optional, default='1') – The number of labels per image.
  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.
  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.
  • verbose (boolean, optional, default=True) – Whether or not to output verbose information.
  • num_parts (int, optional, default='1') – Virtually partition the data into these many parts.
  • part_index (int, optional, default='0') – The i-th virtual partition to be read.
  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.
  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling
  • shuffle (boolean, optional, default=False) – Whether to shuffle data randomly or not.
  • seed (int, optional, default='0') – The random seed.
  • batch_size (int (non-negative), required) – Batch size.
  • round_batch (boolean, optional, default=True) – Whether to use round robin to handle overflow batch or not.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
  • resize (int, optional, default='-1') – Down scale the shorter edge to a new size before applying other augmentations.
  • rand_crop (boolean, optional, default=False) – If or not randomly crop the image
  • max_rotate_angle (int, optional, default='0') – Rotate by a random degree in [-v, v]
  • max_aspect_ratio (float, optional, default=0) – Change the aspect (namely width/height) to a random value in [1 - max_aspect_ratio, 1 + max_aspect_ratio]
  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio]
  • max_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • min_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • max_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • min_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied
  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied
  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.
  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.
  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.
  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overwrites the max_rotate_angle option.
  • fill_value (int, optional, default='255') – Set the padding pixels' value to fill_value.
  • inter_method (int, optional, default='1') – The interpolation method: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.
  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels
  • mirror (boolean, optional, default=False) – Whether to mirror the image or not. If true, images are flipped along the horizontal axis.
  • rand_mirror (boolean, optional, default=False) – Whether to randomly mirror images or not. If true, 50% of the images will be randomly mirrored (flipped along the horizontal axis)
  • mean_img (string, optional, default='') – Filename of the mean image.
  • mean_r (float, optional, default=0) – The mean value to be subtracted on the R channel
  • mean_g (float, optional, default=0) – The mean value to be subtracted on the G channel
  • mean_b (float, optional, default=0) – The mean value to be subtracted on the B channel
  • mean_a (float, optional, default=0) – The mean value to be subtracted on the alpha channel
  • scale (float, optional, default=1) – Multiply the image with a scale value.
  • max_random_contrast (float, optional, default=0) – Change the contrast with a value randomly chosen from [-max_random_contrast, max_random_contrast]
  • max_random_illumination (float, optional, default=0) – Change the illumination with a value randomly chosen from [-max_random_illumination, max_random_illumination]
Returns:

The result iterator.

Return type:

MXDataIter

mxnet.io.ImageRecordIter_v1(*args, **kwargs)

Iterating on image RecordIO files

Reads batches of images from RecordIO files with a rich set of data augmentation options.

One can use tools/im2rec.py to pack individual image files into RecordIO files.

Defined in src/io/iter_image_recordio.cc:L328

Parameters:
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.
  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.
  • aug_seq (string, optional, default='aug_default') – The augmenter names representing the sequence of augmenters to be applied, separated by commas. Additional keyword parameters will be seen by these augmenters.
  • label_width (int, optional, default='1') – The number of labels per image.
  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.
  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.
  • verbose (boolean, optional, default=True) – Whether or not to output verbose information.
  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.
  • part_index (int, optional, default='0') – The i-th virtual partition to be read.
  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.
  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling
  • shuffle (boolean, optional, default=False) – Whether to shuffle data randomly or not.
  • seed (int, optional, default='0') – The random seed.
  • batch_size (int (non-negative), required) – Batch size.
  • round_batch (boolean, optional, default=True) – Whether to use round robin to handle overflow batch or not.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.
  • rand_crop (boolean, optional, default=False) – Whether or not to randomly crop the image.
  • max_rotate_angle (int, optional, default='0') – Rotate by a random degree in [-v, v]
  • max_aspect_ratio (float, optional, default=0) – Change the aspect (namely width/height) to a random value in [1 - max_aspect_ratio, 1 + max_aspect_ratio]
  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio]
  • max_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • min_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • max_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • min_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied
  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied
  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.
  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.
  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.
  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overwrites the max_rotate_angle option.
  • fill_value (int, optional, default='255') – Set the padding pixels' value to fill_value.
  • inter_method (int, optional, default='1') – The interpolation method: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.
  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels
  • mirror (boolean, optional, default=False) – Whether to mirror the image or not. If true, images are flipped along the horizontal axis.
  • rand_mirror (boolean, optional, default=False) – Whether to randomly mirror images or not. If true, 50% of the images will be randomly mirrored (flipped along the horizontal axis)
  • mean_img (string, optional, default='') – Filename of the mean image.
  • mean_r (float, optional, default=0) – The mean value to be subtracted on the R channel
  • mean_g (float, optional, default=0) – The mean value to be subtracted on the G channel
  • mean_b (float, optional, default=0) – The mean value to be subtracted on the B channel
  • mean_a (float, optional, default=0) – The mean value to be subtracted on the alpha channel
  • scale (float, optional, default=1) – Multiply the image with a scale value.
  • max_random_contrast (float, optional, default=0) – Change the contrast with a value randomly chosen from [-max_random_contrast, max_random_contrast]
  • max_random_illumination (float, optional, default=0) – Change the illumination with a value randomly chosen from [-max_random_illumination, max_random_illumination]
Returns:

The result iterator.

Return type:

MXDataIter

mxnet.io.ImageRecordUInt8Iter(*args, **kwargs)

Iterating on image RecordIO files

This iterator is identical to ImageRecordIter except for using uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio_2.cc:L600

Parameters:
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.
  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.
  • aug_seq (string, optional, default='aug_default') – The augmenter names representing the sequence of augmenters to be applied, separated by commas. Additional keyword parameters will be seen by these augmenters.
  • label_width (int, optional, default='1') – The number of labels per image.
  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.
  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.
  • verbose (boolean, optional, default=True) – Whether or not to output verbose information.
  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.
  • part_index (int, optional, default='0') – The i-th virtual partition to be read.
  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.
  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling
  • shuffle (boolean, optional, default=False) – Whether to shuffle data randomly or not.
  • seed (int, optional, default='0') – The random seed.
  • batch_size (int (non-negative), required) – Batch size.
  • round_batch (boolean, optional, default=True) – Whether to use round robin to handle overflow batch or not.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.
  • rand_crop (boolean, optional, default=False) – Whether or not to randomly crop the image.
  • max_rotate_angle (int, optional, default='0') – Rotate by a random degree in [-v, v]
  • max_aspect_ratio (float, optional, default=0) – Change the aspect (namely width/height) to a random value in [1 - max_aspect_ratio, 1 + max_aspect_ratio]
  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio]
  • max_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • min_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • max_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • min_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied
  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied
  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.
  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.
  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.
  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overwrites the max_rotate_angle option.
  • fill_value (int, optional, default='255') – Set the padding pixels' value to fill_value.
  • inter_method (int, optional, default='1') – The interpolation method: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.
  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels
Returns:

The result iterator.

Return type:

MXDataIter

mxnet.io.ImageRecordUInt8Iter_v1(*args, **kwargs)

Iterating on image RecordIO files

This iterator is identical to ImageRecordIter except for using uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio.cc:L349

Parameters:
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.
  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.
  • aug_seq (string, optional, default='aug_default') – The augmenter names representing the sequence of augmenters to be applied, separated by commas. Additional keyword parameters will be seen by these augmenters.
  • label_width (int, optional, default='1') – The number of labels per image.
  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.
  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.
  • verbose (boolean, optional, default=True) – Whether or not to output verbose information.
  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.
  • part_index (int, optional, default='0') – The i-th virtual partition to be read.
  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.
  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling
  • shuffle (boolean, optional, default=False) – Whether to shuffle data randomly or not.
  • seed (int, optional, default='0') – The random seed.
  • batch_size (int (non-negative), required) – Batch size.
  • round_batch (boolean, optional, default=True) – Whether to use round robin to handle overflow batch or not.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.
  • rand_crop (boolean, optional, default=False) – Whether or not to randomly crop the image.
  • max_rotate_angle (int, optional, default='0') – Rotate by a random degree in [-v, v]
  • max_aspect_ratio (float, optional, default=0) – Change the aspect (namely width/height) to a random value in [1 - max_aspect_ratio, 1 + max_aspect_ratio]
  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio]
  • max_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • min_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]
  • max_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • min_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]
  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied
  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied
  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.
  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.
  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.
  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overwrites the max_rotate_angle option.
  • fill_value (int, optional, default='255') – Set the padding pixels' value to fill_value.
  • inter_method (int, optional, default='1') – The interpolation method: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.
  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels
Returns:

The result iterator.

Return type:

MXDataIter

mxnet.io.MNISTIter(*args, **kwargs)

Iterating on the MNIST dataset.

One can download the dataset from http://yann.lecun.com/exdb/mnist/

Defined in src/io/iter_mnist.cc:L246

Parameters:
  • image (string, optional, default='./train-images-idx3-ubyte') – Dataset Param: Mnist image path.
  • label (string, optional, default='./train-labels-idx1-ubyte') – Dataset Param: Mnist label path.
  • batch_size (int, optional, default='128') – Batch Param: Batch Size.
  • shuffle (boolean, optional, default=True) – Augmentation Param: Whether to shuffle data.
  • flat (boolean, optional, default=False) – Augmentation Param: Whether to flatten the data into 1D.
  • seed (int, optional, default='0') – Augmentation Param: Random Seed.
  • silent (boolean, optional, default=False) – Auxiliary Param: Whether to print out data info.
  • num_parts (int, optional, default='1') – Partition the data into multiple parts.
  • part_index (int, optional, default='0') – The index of the part to be read.
  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.
  • dtype ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change.
Returns:

The result iterator.

Return type:

MXDataIter
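Several iterators above accept num_parts and part_index for distributed reading, where each worker is handed one virtual partition of the data. As an illustration only (not the library's exact internal scheme), such a partition can be computed like this:

```python
def partition(n_examples, num_parts, part_index):
    """Return the example indices a given worker should read.

    A sketch of the num_parts / part_index idea; the library's
    internal partitioning may differ in detail.
    """
    per_part = n_examples // num_parts
    start = per_part * part_index
    # let the last part pick up any remainder
    end = n_examples if part_index == num_parts - 1 else start + per_part
    return list(range(start, end))
```

With num_parts=3, each of the three workers reads a disjoint slice of the data, which together cover every example exactly once.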

Read individual image files and perform augmentations.

mxnet.image.imdecode(buf, **kwargs)

Decode an image to an NDArray.

Note: imdecode uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for imdecode to work.

Parameters:
  • buf (str/bytes or numpy.ndarray) – Binary image data as string or numpy ndarray.
  • flag (int, optional, default=1) – 1 for three channel color output. 0 for grayscale output.
  • to_rgb (int, optional, default=1) – 1 for RGB formatted output (MXNet default). 0 for BGR formatted output (OpenCV default).
  • out (NDArray, optional) – Output buffer. Use None for automatic allocation.
Returns:

An NDArray containing the image.

Return type:

NDArray

Example

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 224x224x3 @cpu(0)>

Set flag parameter to 0 to get grayscale output

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image, flag=0)
>>> image
<NDArray 224x224x1 @cpu(0)>

Set to_rgb parameter to 0 to get output in OpenCV format (BGR)

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image, to_rgb=0)
>>> image
<NDArray 224x224x3 @cpu(0)>
mxnet.image.scale_down(src_size, size)

Scales down crop size if it’s larger than image size.

If width/height of the crop is larger than the width/height of the image, sets the width/height to the width/height of the image.

Parameters:
  • src_size (tuple of int) – Size of the image in (width, height) format.
  • size (tuple of int) – Size of the crop in (width, height) format.
Returns:

A tuple containing the scaled crop size in (width, height) format.

Return type:

tuple of int

Example

>>> src_size = (640,480)
>>> size = (720,120)
>>> new_size = mx.img.scale_down(src_size, size)
>>> new_size
(640, 106)
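The arithmetic behind this example can be sketched in plain Python (a simplified stand-in, not the library routine): the crop is shrunk, preserving its aspect ratio, until it fits inside the image.

```python
def scale_down(src_size, size):
    # Both sizes are (width, height).
    w, h = size
    sw, sh = src_size
    if sh < h:
        # crop too tall: clamp height, shrink width proportionally
        w, h = w * sh // h, sh
    if sw < w:
        # crop too wide: clamp width, shrink height proportionally
        w, h = sw, h * sw // w
    return (w, h)
```

For a (720, 120) crop in a (640, 480) image, the width is clamped to 640 and the height scales to 120 * 640 // 720 = 106, matching the example above.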
mxnet.image.resize_short(src, size, interp=2)

Resizes shorter edge to size.

Note: resize_short uses OpenCV (not the CV2 Python library). MXNet must have been built with OpenCV for resize_short to work.

Resizes the original image by setting the shorter edge to size and setting the longer edge accordingly. Resizing function is called from OpenCV.

Parameters:
  • src (NDArray) – The original image.
  • size (int) – The length to be set for the shorter edge.
  • interp (int, optional, default=2) – Interpolation method used for resizing. Uses OpenCV convention: Nearest - 0, Bilinear - 1, Bicubic - 2, Area - 3.
Returns:

An NDArray containing the resized image.

Return type:

NDArray

Example

>>> with open("flower.jpeg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.img.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> size = 640
>>> new_image = mx.img.resize_short(image, size)
>>> new_image
<NDArray 640x960x3 @cpu(0)>
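The output size follows from setting the shorter edge to size and scaling the longer edge proportionally; a pure-Python sketch of that size computation (an illustration, not the library code):

```python
def resize_short_dims(height, width, size):
    # Set the shorter edge to `size`; scale the longer edge to match.
    if height > width:
        return size * height // width, size
    return size, size * width // height
```

For a 2321x3482 image and size=640, the shorter edge (2321) becomes 640 and the longer edge scales to 640 * 3482 // 2321 = 960.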
mxnet.image.fixed_crop(src, x0, y0, w, h, size=None, interp=2)

Crop src at fixed location, and (optionally) resize it to size.

mxnet.image.random_crop(src, size, interp=2)

Randomly crop src with size (width, height). Upsample result if src is smaller than size.

Parameters:
  • src (NDArray) – Source image.
  • size (tuple of int) – Size of the crop formatted as (width, height). If the size is larger than the image, then the source image is upsampled to size and returned.
  • interp (int, optional, default=2) – Interpolation method to be used in case the size is larger (default: bicubic). Uses OpenCV convention for the parameters: Nearest - 0, Bilinear - 1, Bicubic - 2, Area - 3. See the OpenCV resize function for more details.
Returns:

  • NDArray – An NDArray containing the cropped image.
  • Tuple – A tuple (x, y, width, height) where (x, y) is top-left position of the crop in the original image and (width, height) are the dimensions of the cropped image.

Example

>>> im = mx.nd.array(cv2.imread("flower.jpg"))
>>> cropped_im, rect  = mx.image.random_crop(im, (100, 100))
>>> print(cropped_im)
<NDArray 100x100x3 @cpu(0)>
>>> print(rect)
(20, 21, 100, 100)
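The rect returned above is the randomly chosen crop window; choosing it can be sketched in plain Python (illustration only, assuming the crop fits inside the image — the real API upsamples otherwise):

```python
import random

def random_crop_box(src_size, crop_size, rng=None):
    # Both sizes are (width, height); returns (x0, y0, w, h).
    rng = rng or random.Random()
    W, H = src_size
    w, h = crop_size
    x0 = rng.randint(0, W - w)  # random top-left x within valid range
    y0 = rng.randint(0, H - h)  # random top-left y within valid range
    return x0, y0, w, h
```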
mxnet.image.center_crop(src, size, interp=2)

Crops the image src to the given size by trimming on all four sides and preserving the center of the image. Upsamples if src is smaller than size.

Note

This requires MXNet to be compiled with USE_OPENCV.

Parameters:
  • src (NDArray) – Binary source image data.
  • size (list or tuple of int) – The desired output image size.
  • interp (int, optional, default=2) –

    The type of interpolation used to resize the image, following the OpenCV convention:

    0: Nearest Neighbors interpolation.

    1: Bilinear interpolation.

    2: Bicubic interpolation over a 4x4 pixel neighborhood (used by default).

    3: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results; when the image is zoomed, it is similar to the Nearest Neighbors method.

    4: Lanczos interpolation over an 8x8 pixel neighborhood.

    When shrinking an image, it will generally look best with Area-based interpolation, whereas when enlarging an image, it will generally look best with Bicubic (slow) or Bilinear (faster but still looks OK).

Returns:

  • NDArray – The cropped image.
  • Tuple – (x, y, width, height) where x, y are the positions of the crop in the original image and width, height the dimensions of the crop.

Example

>>> with open("flower.jpg", 'rb') as fp:
...     str_image = fp.read()
...
>>> image = mx.image.imdecode(str_image)
>>> image
<NDArray 2321x3482x3 @cpu(0)>
>>> cropped_image, (x, y, width, height) = mx.image.center_crop(image, (1000, 500))
>>> cropped_image
<NDArray 500x1000x3 @cpu(0)>
>>> x, y, width, height
(1241, 910, 1000, 500)
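The returned coordinates in the example follow from simple centering arithmetic, which can be sketched in plain Python (a stand-in, not the library routine):

```python
def center_crop_box(src_size, crop_size):
    # Both sizes are (width, height); returns (x0, y0, w, h).
    W, H = src_size
    w, h = crop_size
    x0 = (W - w) // 2  # equal trim on left and right
    y0 = (H - h) // 2  # equal trim on top and bottom
    return x0, y0, w, h
```

For the 3482x2321 image above and a (1000, 500) crop, this gives x = (3482 - 1000) // 2 = 1241 and y = (2321 - 500) // 2 = 910.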
mxnet.image.color_normalize(src, mean, std=None)

Normalize src with mean and std.
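Per-channel normalization computes (value - mean) / std; a minimal pure-Python sketch of that rule (the real function operates on NDArrays):

```python
def color_normalize(pixel, mean, std=None):
    # Subtract the per-channel mean; divide by per-channel std if given.
    out = [p - m for p, m in zip(pixel, mean)]
    if std is not None:
        out = [o / s for o, s in zip(out, std)]
    return out
```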

mxnet.image.random_size_crop(src, size, min_area, ratio, interp=2)

Randomly crop src with size. Randomize area and aspect ratio.

mxnet.image.ResizeAug(size, interp=2)

Make an augmenter that resizes the shorter edge to size.

mxnet.image.RandomCropAug(size, interp=2)

Make a random crop augmenter.

mxnet.image.RandomSizedCropAug(size, min_area, ratio, interp=2)

Make an augmenter that randomly crops with random resizing and random aspect ratio jitter.

mxnet.image.CenterCropAug(size, interp=2)

Make center crop augmenter.

mxnet.image.RandomOrderAug(ts)

Apply a list of augmenters in random order.

mxnet.image.ColorJitterAug(brightness, contrast, saturation)

Apply random brightness, contrast and saturation jitter in random order.

mxnet.image.LightingAug(alphastd, eigval, eigvec)

Add PCA based noise.

mxnet.image.ColorNormalizeAug(mean, std)

Mean and std normalization.

mxnet.image.HorizontalFlipAug(p)

Random horizontal flipping.

mxnet.image.CastAug()

Cast to float32

mxnet.image.CreateAugmenter(data_shape, resize=0, rand_crop=False, rand_resize=False, rand_mirror=False, mean=None, std=None, brightness=0, contrast=0, saturation=0, pca_noise=0, inter_method=2)

Creates an augmenter list.
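The augmenters above compose into a pipeline that the image iterator applies to each sample in turn; the composition idea can be sketched with plain callables (an illustration, not the library implementation):

```python
def apply_augmenters(datum, augmenters):
    # Each augmenter transforms the datum and passes it on.
    for aug in augmenters:
        datum = aug(datum)
    return datum

# A toy two-step pipeline: cast to float, then scale into [0, 1].
toy_pipeline = [float, lambda x: x / 255.0]
```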

class mxnet.image.ImageIter(batch_size, data_shape, label_width=1, path_imgrec=None, path_imglist=None, path_root=None, path_imgidx=None, shuffle=False, part_index=0, num_parts=1, aug_list=None, imglist=None, data_name='data', label_name='softmax_label', **kwargs)

Image data iterator with a large number of augmentation choices. This iterator supports reading from both .rec files and raw image files.

To load input images from .rec files, use path_imgrec parameter and to load from raw image files, use path_imglist and path_root parameters.

To use data partition (for distributed training) or shuffling, specify path_imgidx parameter.

Parameters:
  • batch_size (int) – Number of examples per batch.
  • data_shape (tuple) – Data shape in (channels, height, width) format. For now, only RGB image with 3 channels is supported.
  • label_width (int, optional) – Number of labels per example. The default label width is 1.
  • path_imgrec (str) – Path to image record file (.rec). Created with tools/im2rec.py or bin/im2rec.
  • path_imglist (str) – Path to image list (.lst). Created with tools/im2rec.py or with custom script. Format: Tab separated record of index, one or more labels and relative_path_from_root.
  • imglist (list) – A list of images with their label(s). Each item is a list [label, imgpath], where label is a float or a list of floats.
  • path_root (str) – Root folder of image files.
  • path_imgidx (str) – Path to image index file. Needed for partition and shuffling when using .rec source.
  • shuffle (bool) – Whether to shuffle all images at the start of each iteration or not. Can be slow for HDD.
  • part_index (int) – Partition index.
  • num_parts (int) – Total number of partitions.
  • data_name (str) – Data name for provided symbols.
  • label_name (str) – Label name for provided symbols.
  • kwargs – More arguments for creating augmenter. See mx.image.CreateAugmenter.
reset()

Resets the iterator to the beginning of the data.

next_sample()

Helper function for reading the next sample.

next()

Returns the next batch of data.

check_data_shape(data_shape)

Checks if the input data shape is valid

check_valid_image(data)

Checks if the input data is valid

imdecode(s)

Decodes a string or byte string to an NDArray. See mx.img.imdecode for more details.

read_image(fname)

Reads an input image fname and returns the raw image bytes.

>>> dataIter.read_image('Face.jpg') # returns raw image bytes.
augmentation_transform(data)

Transforms input data with specified augmentation.

postprocess_data(datum)

Final postprocessing step before image is loaded into the batch.

Read and write for the RecordIO data format.

class mxnet.recordio.MXRecordIO(uri, flag)

Reads/writes RecordIO data format, supporting sequential read and write.

>>> record = mx.recordio.MXRecordIO('tmp.rec', 'w')
>>> for i in range(5):
...    record.write('record_%d'%i)
>>> record.close()
>>> record = mx.recordio.MXRecordIO('tmp.rec', 'r')
>>> for i in range(5):
...    item = record.read()
...    print(item)
record_0
record_1
record_2
record_3
record_4
>>> record.close()
Parameters:
  • uri (string) – Path to the record file.
  • flag (string) – ‘w’ for write or ‘r’ for read.
open()

Opens the record file.

close()

Closes the record file.

reset()

Resets the pointer to first item.

If the record is opened with ‘w’, this function will truncate the file to empty.

>>> record = mx.recordio.MXRecordIO('tmp.rec', 'r')
>>> for i in range(2):
...    item = record.read()
...    print(item)
record_0
record_1
>>> record.reset()  # Pointer is reset.
>>> print(record.read()) # Started reading from start again.
record_0
>>> record.close()
write(buf)

Inserts a string buffer as a record.

>>> record = mx.recordio.MXRecordIO('tmp.rec', 'w')
>>> for i in range(5):
...    record.write('record_%d'%i)
>>> record.close()
Parameters:buf (string (python2), bytes (python3)) – Buffer to write.
read()

Returns record as a string.

>>> record = mx.recordio.MXRecordIO('tmp.rec', 'r')
>>> for i in range(5):
...    item = record.read()
...    print(item)
record_0
record_1
record_2
record_3
record_4
>>> record.close()
Returns:buf – Buffer read.
Return type:string
class mxnet.recordio.MXIndexedRecordIO(idx_path, uri, flag, key_type=int)

Reads/writes RecordIO data format, supporting random access.

>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
>>> for i in range(5):
...     record.write_idx(i, 'record_%d'%i)
>>> record.close()
>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'r')
>>> record.read_idx(3)
record_3
Parameters:
  • idx_path (str) – Path to the index file.
  • uri (str) – Path to the record file. Only supports seekable file types.
  • flag (str) – ‘w’ for write or ‘r’ for read.
  • key_type (type) – Data type for keys.
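The index file simply maps each key to the byte offset of its record, which is what makes random access possible. A toy in-memory illustration of that idea (not the actual RecordIO file format):

```python
import io

def write_indexed(records):
    # Append length-prefixed records; remember each key's byte offset.
    buf, index = io.BytesIO(), {}
    for key, payload in records:
        index[key] = buf.tell()
        data = payload.encode()
        buf.write(len(data).to_bytes(4, 'little'))
        buf.write(data)
    return buf.getvalue(), index

def read_indexed(blob, index, key):
    # Seek straight to the record via the index, then read it.
    pos = index[key]
    n = int.from_bytes(blob[pos:pos + 4], 'little')
    return blob[pos + 4:pos + 4 + n].decode()
```

This mirrors the read_idx/write_idx usage shown below: writes go to the end of the stream, while reads jump directly to the indexed offset.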
close()

Closes the record file.

seek(idx)

Sets the current read pointer position.

This function is internally called by read_idx(idx) to find the current reader pointer position. It doesn’t return anything.

tell()

Returns the current position of the write head.

>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
>>> print(record.tell())
0
>>> for i in range(5):
...     record.write_idx(i, 'record_%d'%i)
...     print(record.tell())
16
32
48
64
80
read_idx(idx)

Returns the record at given index.

>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
>>> for i in range(5):
...     record.write_idx(i, 'record_%d'%i)
>>> record.close()
>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'r')
>>> record.read_idx(3)
record_3
write_idx(idx, buf)

Inserts input record at given index.

>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
>>> for i in range(5):
...     record.write_idx(i, 'record_%d'%i)
>>> record.close()
Parameters:
  • idx (int) – Index of the record.
  • buf – Record to write.
mxnet.recordio.IRHeader

An alias for HEADER. Used to store metadata (e.g. labels) accompanying a record. See mxnet.recordio.pack and mxnet.recordio.pack_img for example uses.

Parameters:
  • flag (int) – Available for convenience, can be set arbitrarily.
  • label (float or an array of float) – Typically used to store label(s) for a record.
  • id (int) – Usually a unique id representing the record.
  • id2 (int) – Higher order bits of the unique id, should be set to 0 (in most cases).

alias of HEADER
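Since IRHeader is just an alias for a namedtuple, its shape can be illustrated with a stand-in (the real alias lives in mxnet.recordio; this is a local sketch):

```python
from collections import namedtuple

# Stand-in mirroring the four IRHeader fields described above.
HEADER = namedtuple('HEADER', ['flag', 'label', 'id', 'id2'])

# A header for a record with label 4 and id 2574.
header = HEADER(flag=0, label=4.0, id=2574, id2=0)
```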

mxnet.recordio.pack(header, s)

Pack a string into MXImageRecord.

Parameters:
  • header (IRHeader) – Header of the image record. header.label can be a number or an array. See more detail in IRHeader.
  • s (str) – Raw image string to be packed.
Returns:

s – The packed string.

Return type:

str

Examples

>>> label = 4 # label can also be a 1-D array, for example: label = [1,2,3]
>>> id = 2574
>>> header = mx.recordio.IRHeader(0, label, id, 0)
>>> with open(path, 'rb') as file:
...     s = file.read()
>>> packed_s = mx.recordio.pack(header, s)
mxnet.recordio.unpack(s)

Unpack a MXImageRecord to string.

Parameters:s (str) – String buffer from MXRecordIO.read.
Returns:
  • header (IRHeader) – Header of the image record.
  • s (str) – Unpacked string.

Examples

>>> record = mx.recordio.MXRecordIO('test.rec', 'r')
>>> item = record.read()
>>> header, s = mx.recordio.unpack(item)
>>> header
HEADER(flag=0, label=14.0, id=20129312, id2=0)
mxnet.recordio.unpack_img(s, iscolor=-1)

Unpack a MXImageRecord to image.

Parameters:
  • s (str) – String buffer from MXRecordIO.read.
  • iscolor (int) – Image format option for cv2.imdecode.
Returns:

  • header (IRHeader) – Header of the image record.
  • img (numpy.ndarray) – Unpacked image.

Examples

>>> record = mx.recordio.MXRecordIO('test.rec', 'r')
>>> item = record.read()
>>> header, img = mx.recordio.unpack_img(item)
>>> header
HEADER(flag=0, label=14.0, id=20129312, id2=0)
>>> img
array([[[ 23,  27,  45],
        [ 28,  32,  50],
        ...,
        [ 36,  40,  59],
        [ 35,  39,  58]],
       ...,
       [[ 91,  92, 113],
        [ 97,  98, 119],
        ...,
        [168, 169, 167],
        [166, 167, 165]]], dtype=uint8)
mxnet.recordio.pack_img(header, img, quality=95, img_fmt='.jpg')

Pack an image into MXImageRecord.

Parameters:
  • header (IRHeader) – Header of the image record. header.label can be a number or an array. See more detail in IRHeader.
  • img (numpy.ndarray) – Image to be packed.
  • quality (int) – Quality for JPEG encoding in range 1-100, or compression for PNG encoding in range 1-9.
  • img_fmt (str) – Encoding of the image (.jpg for JPEG, .png for PNG).
Returns:

s – The packed string.

Return type:

str

Examples

>>> label = 4 # label can also be a 1-D array, for example: label = [1,2,3]
>>> id = 2574
>>> header = mx.recordio.IRHeader(0, label, id, 0)
>>> img = cv2.imread('test.jpg')
>>> packed_s = mx.recordio.pack_img(header, img)