Fine-tune with Pre-trained Models

In practice, the dataset we work with is relatively small, so we usually do not train a neural network from scratch, that is, starting from randomly initialized parameters. Instead, it is common to train a neural network on a large-scale dataset and then use it either as an initialization or as a fixed feature extractor. In predict.ipynb we explained how to do feature extraction; this tutorial focuses on how to use a pre-trained model to fine-tune a new network.

The idea of fine-tuning is that we take a pre-trained model and replace its last fully-connected layer with a new one, which outputs the desired number of classes and is initialized with random values. Then we train as usual, except that we often use a smaller learning rate, since the parameters may already be close to the final result.

As an example, we will use models pre-trained on the ImageNet dataset to fine-tune on the smaller caltech-256 dataset. Note that the same approach applies to other datasets as well, even for quite different applications such as face identification.

We will show that, even with a simple hyper-parameter setting, we can match and even outperform state-of-the-art results on caltech-256.

Network      Accuracy
Resnet-50    77.4%
Resnet-152   86.4%

Prepare data

We follow the standard protocol of sampling 60 images from each class as the training set, and using the rest as the validation set. We resize images into 256x256 size and pack them into rec files. The script to prepare the data is as follows.

# download and extract the raw images
wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar
tar -xf 256_ObjectCategories.tar

# move 60 randomly chosen images per class into the training directory;
# the images left in 256_ObjectCategories form the validation set
mkdir -p caltech_256_train_60
for i in 256_ObjectCategories/*; do
    c=`basename $i`
    mkdir -p caltech_256_train_60/$c
    for j in `ls $i/*.jpg | shuf | head -n 60`; do
        mv $j caltech_256_train_60/$c/
    done
done

# generate the image lists, then pack the resized images into rec files
python ~/mxnet/tools/im2rec.py --list True --recursive True caltech-256-60-train caltech_256_train_60/
python ~/mxnet/tools/im2rec.py --list True --recursive True caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-val 256_ObjectCategories/
python ~/mxnet/tools/im2rec.py --resize 256 --quality 90 --num-thread 16 caltech-256-60-train caltech_256_train_60/

The following code downloads the pre-generated rec files. It may take a few minutes.

import os
import urllib.request

def download(url):
    # save the file under its basename, skipping the download if it already exists
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')

Next we define a function that returns the data iterators.

import mxnet as mx

def get_iterators(batch_size, data_shape=(3, 224, 224)):
    train = mx.io.ImageRecordIter(
        path_imgrec         = './caltech-256-60-train.rec',
        data_name           = 'data',
        label_name          = 'softmax_label',
        batch_size          = batch_size,
        data_shape          = data_shape,
        shuffle             = True,
        rand_crop           = True,   # random crop for data augmentation
        rand_mirror         = True)   # random horizontal flip
    val = mx.io.ImageRecordIter(
        path_imgrec         = './caltech-256-60-val.rec',
        data_name           = 'data',
        label_name          = 'softmax_label',
        batch_size          = batch_size,
        data_shape          = data_shape,
        rand_crop           = False,  # no augmentation for validation
        rand_mirror         = False)
    return (train, val)
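
As a quick sanity check (not part of the training flow), we can pull one batch from the training iterator and confirm its shape matches data_shape. A minimal sketch:

# fetch one batch and inspect its contents
train, val = get_iterators(batch_size=16)
batch = train.next()
print(batch.data[0].shape)    # expect (16, 3, 224, 224)
print(batch.label[0].shape)   # expect (16,)
train.reset()                 # rewind so training starts from the beginning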

We then download a pre-trained 50-layer ResNet model and load it into memory.

Note: if load_checkpoint reports an error, remove the downloaded files and run get_model again.

def get_model(prefix, epoch):
    # a checkpoint consists of a symbol file (network definition)
    # and a params file (trained weights)
    download(prefix+'-symbol.json')
    download(prefix+'-%04d.params' % (epoch,))

get_model('http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
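
To choose the layer at which to cut the network (the layer_name argument used below), it helps to look at the names of the network's internal outputs. A small inspection sketch, assuming the loaded sym from above:

# list the last few internal layer outputs; 'flatten0_output' is the
# output we will keep when replacing the classifier below
print(sym.get_internals().list_outputs()[-5:])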

Train

We first define a function which replaces the last fully-connected layer of a given network.

def get_fine_tune_model(symbol, arg_params, num_classes, layer_name='flatten0'):
    """
    symbol: the pre-trained network symbol
    arg_params: the argument parameters of the pre-trained model
    num_classes: the number of classes for the fine-tune dataset
    layer_name: the name of the layer before the last fully-connected layer
    """
    all_layers = symbol.get_internals()
    net = all_layers[layer_name+'_output']
    # append a new fully-connected layer and softmax output with the
    # desired number of classes
    net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
    net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
    # keep all pre-trained weights except those of the replaced fc1 layer
    new_args = dict({k:arg_params[k] for k in arg_params if 'fc1' not in k})
    return (net, new_args)

Now we create a module. We first call init_params to randomly initialize all parameters, then use set_params to overwrite every parameter except those of the last fully-connected layer with the pre-trained values. Passing allow_missing=True lets the randomly initialized fc1 weights stay in place, since they have no counterpart in the pre-trained model.

import logging
head = '%(asctime)-15s %(message)s'
logging.basicConfig(level=logging.DEBUG, format=head)

def fit(symbol, arg_params, aux_params, train, val, batch_size, num_gpus):
    devs = [mx.gpu(i) for i in range(num_gpus)]
    mod = mx.mod.Module(symbol=symbol, context=devs)
    mod.bind(data_shapes=train.provide_data, label_shapes=train.provide_label)
    # randomly initialize all parameters ...
    mod.init_params(initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2))
    # ... then overwrite all but the new fc1 layer with the pre-trained weights
    mod.set_params(arg_params, aux_params, allow_missing=True)
    mod.fit(train, val,
        num_epoch=8,
        batch_end_callback = mx.callback.Speedometer(batch_size, 10),
        kvstore='device',
        optimizer='sgd',
        optimizer_params={'learning_rate':0.01},
        eval_metric='acc')
    metric = mx.metric.Accuracy()
    return mod.score(val, metric)
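
We keep a constant learning rate of 0.01 here for simplicity. If you want to decay the learning rate during fine-tuning, one option is MXNet's FactorScheduler; a minimal sketch (the step and factor values below are illustrative, not tuned):

# decay the learning rate by 0.5 every 1000 updates (illustrative values)
lr_sched = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)
optimizer_params = {
    'learning_rate': 0.01,
    'momentum': 0.9,
    'lr_scheduler': lr_sched,
}
# pass this dict to mod.fit in place of the constant-rate version above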

Then we can start training. We use an AWS EC2 g2.8xlarge instance, which has 8 GPUs.

# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
num_classes = 256
batch_per_gpu = 16
num_gpus = 8

(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)

batch_size = batch_per_gpu * num_gpus
(train, val) = get_iterators(batch_size)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.77, "Low validation accuracy."
2016-10-22 18:24:16,695 Already binded, ignoring bind()
2016-10-22 18:24:22,361 Epoch[0] Batch [10] Speed: 325.98 samples/sec   Train-accuracy=0.004261
2016-10-22 18:24:26,205 Epoch[0] Batch [20] Speed: 333.06 samples/sec   Train-accuracy=0.011719
2016-10-22 18:24:30,072 Epoch[0] Batch [30] Speed: 331.06 samples/sec   Train-accuracy=0.021094
2016-10-22 18:24:33,954 Epoch[0] Batch [40] Speed: 329.84 samples/sec   Train-accuracy=0.020313
2016-10-22 18:24:37,811 Epoch[0] Batch [50] Speed: 331.93 samples/sec   Train-accuracy=0.023438
2016-10-22 18:24:41,668 Epoch[0] Batch [60] Speed: 331.93 samples/sec   Train-accuracy=0.032813
2016-10-22 18:24:45,557 Epoch[0] Batch [70] Speed: 329.22 samples/sec   Train-accuracy=0.049219
2016-10-22 18:24:49,424 Epoch[0] Batch [80] Speed: 331.12 samples/sec   Train-accuracy=0.071875
2016-10-22 18:24:53,323 Epoch[0] Batch [90] Speed: 328.36 samples/sec   Train-accuracy=0.084375
2016-10-22 18:24:57,203 Epoch[0] Batch [100]    Speed: 329.95 samples/sec   Train-accuracy=0.115625
2016-10-22 18:25:01,091 Epoch[0] Batch [110]    Speed: 329.33 samples/sec   Train-accuracy=0.153906
2016-10-22 18:25:05,000 Epoch[0] Batch [120]    Speed: 327.49 samples/sec   Train-accuracy=0.187500
2016-10-22 18:25:05,001 Epoch[0] Train-accuracy=nan
2016-10-22 18:25:05,002 Epoch[0] Time cost=48.301
2016-10-22 18:25:24,502 Epoch[0] Validation-accuracy=0.297072
2016-10-22 18:25:28,564 Epoch[1] Batch [10] Speed: 330.58 samples/sec   Train-accuracy=0.240767
2016-10-22 18:25:32,426 Epoch[1] Batch [20] Speed: 331.53 samples/sec   Train-accuracy=0.265625
2016-10-22 18:25:36,289 Epoch[1] Batch [30] Speed: 331.41 samples/sec   Train-accuracy=0.287500
2016-10-22 18:25:40,173 Epoch[1] Batch [40] Speed: 329.64 samples/sec   Train-accuracy=0.314063
2016-10-22 18:25:44,032 Epoch[1] Batch [50] Speed: 331.80 samples/sec   Train-accuracy=0.361719
2016-10-22 18:25:47,876 Epoch[1] Batch [60] Speed: 333.07 samples/sec   Train-accuracy=0.347656
2016-10-22 18:25:51,741 Epoch[1] Batch [70] Speed: 331.30 samples/sec   Train-accuracy=0.410156
2016-10-22 18:25:55,603 Epoch[1] Batch [80] Speed: 331.50 samples/sec   Train-accuracy=0.417187
2016-10-22 18:25:59,460 Epoch[1] Batch [90] Speed: 331.88 samples/sec   Train-accuracy=0.425781
2016-10-22 18:26:03,304 Epoch[1] Batch [100]    Speed: 333.11 samples/sec   Train-accuracy=0.419531
2016-10-22 18:26:07,196 Epoch[1] Batch [110]    Speed: 328.97 samples/sec   Train-accuracy=0.496875
2016-10-22 18:26:10,665 Epoch[1] Train-accuracy=0.488715
2016-10-22 18:26:10,666 Epoch[1] Time cost=46.163
2016-10-22 18:26:29,719 Epoch[1] Validation-accuracy=0.556066
2016-10-22 18:26:33,883 Epoch[2] Batch [10] Speed: 325.12 samples/sec   Train-accuracy=0.514915
2016-10-22 18:26:37,757 Epoch[2] Batch [20] Speed: 330.50 samples/sec   Train-accuracy=0.524219
2016-10-22 18:26:41,684 Epoch[2] Batch [30] Speed: 325.98 samples/sec   Train-accuracy=0.536719
2016-10-22 18:26:45,562 Epoch[2] Batch [40] Speed: 330.21 samples/sec   Train-accuracy=0.514844
2016-10-22 18:26:49,448 Epoch[2] Batch [50] Speed: 329.44 samples/sec   Train-accuracy=0.564844
2016-10-22 18:26:53,338 Epoch[2] Batch [60] Speed: 329.16 samples/sec   Train-accuracy=0.534375
2016-10-22 18:26:57,230 Epoch[2] Batch [70] Speed: 328.99 samples/sec   Train-accuracy=0.576562
2016-10-22 18:27:01,128 Epoch[2] Batch [80] Speed: 328.42 samples/sec   Train-accuracy=0.604688
2016-10-22 18:27:04,990 Epoch[2] Batch [90] Speed: 331.54 samples/sec   Train-accuracy=0.582812
2016-10-22 18:27:08,874 Epoch[2] Batch [100]    Speed: 329.63 samples/sec   Train-accuracy=0.572656
2016-10-22 18:27:12,737 Epoch[2] Batch [110]    Speed: 331.45 samples/sec   Train-accuracy=0.625781
2016-10-22 18:27:16,591 Epoch[2] Batch [120]    Speed: 332.20 samples/sec   Train-accuracy=0.603125
2016-10-22 18:27:16,597 Epoch[2] Train-accuracy=nan
2016-10-22 18:27:16,598 Epoch[2] Time cost=46.878
2016-10-22 18:27:34,905 Epoch[2] Validation-accuracy=0.651947
2016-10-22 18:27:38,961 Epoch[3] Batch [10] Speed: 330.53 samples/sec   Train-accuracy=0.636364
2016-10-22 18:27:42,811 Epoch[3] Batch [20] Speed: 332.56 samples/sec   Train-accuracy=0.634375
2016-10-22 18:27:46,675 Epoch[3] Batch [30] Speed: 331.38 samples/sec   Train-accuracy=0.629687
2016-10-22 18:27:50,545 Epoch[3] Batch [40] Speed: 330.79 samples/sec   Train-accuracy=0.641406
2016-10-22 18:27:54,423 Epoch[3] Batch [50] Speed: 330.16 samples/sec   Train-accuracy=0.665625
2016-10-22 18:27:58,273 Epoch[3] Batch [60] Speed: 332.54 samples/sec   Train-accuracy=0.638281
2016-10-22 18:28:02,131 Epoch[3] Batch [70] Speed: 331.93 samples/sec   Train-accuracy=0.671875
2016-10-22 18:28:05,988 Epoch[3] Batch [80] Speed: 331.88 samples/sec   Train-accuracy=0.691406
2016-10-22 18:28:09,870 Epoch[3] Batch [90] Speed: 329.84 samples/sec   Train-accuracy=0.670312
2016-10-22 18:28:13,742 Epoch[3] Batch [100]    Speed: 330.65 samples/sec   Train-accuracy=0.660156
2016-10-22 18:28:17,636 Epoch[3] Batch [110]    Speed: 328.77 samples/sec   Train-accuracy=0.681250
2016-10-22 18:28:21,097 Epoch[3] Train-accuracy=0.684028
2016-10-22 18:28:21,098 Epoch[3] Time cost=46.192
2016-10-22 18:28:40,464 Epoch[3] Validation-accuracy=0.701943
2016-10-22 18:28:44,610 Epoch[4] Batch [10] Speed: 327.03 samples/sec   Train-accuracy=0.708807
2016-10-22 18:28:48,480 Epoch[4] Batch [20] Speed: 330.86 samples/sec   Train-accuracy=0.708594
2016-10-22 18:28:52,371 Epoch[4] Batch [30] Speed: 329.02 samples/sec   Train-accuracy=0.713281
2016-10-22 18:28:56,234 Epoch[4] Batch [40] Speed: 331.46 samples/sec   Train-accuracy=0.700781
2016-10-22 18:29:00,129 Epoch[4] Batch [50] Speed: 328.65 samples/sec   Train-accuracy=0.712500
2016-10-22 18:29:04,006 Epoch[4] Batch [60] Speed: 330.30 samples/sec   Train-accuracy=0.697656
2016-10-22 18:29:07,865 Epoch[4] Batch [70] Speed: 331.74 samples/sec   Train-accuracy=0.717969
2016-10-22 18:29:11,737 Epoch[4] Batch [80] Speed: 330.61 samples/sec   Train-accuracy=0.737500
2016-10-22 18:29:15,592 Epoch[4] Batch [90] Speed: 332.19 samples/sec   Train-accuracy=0.714844
2016-10-22 18:29:19,435 Epoch[4] Batch [100]    Speed: 333.15 samples/sec   Train-accuracy=0.696875
2016-10-22 18:29:23,287 Epoch[4] Batch [110]    Speed: 332.35 samples/sec   Train-accuracy=0.734375
2016-10-22 18:29:27,136 Epoch[4] Batch [120]    Speed: 332.61 samples/sec   Train-accuracy=0.726562
2016-10-22 18:29:27,137 Epoch[4] Train-accuracy=nan
2016-10-22 18:29:27,138 Epoch[4] Time cost=46.673
2016-10-22 18:29:45,791 Epoch[4] Validation-accuracy=0.736935
2016-10-22 18:29:49,873 Epoch[5] Batch [10] Speed: 332.48 samples/sec   Train-accuracy=0.749290
2016-10-22 18:29:53,765 Epoch[5] Batch [20] Speed: 328.95 samples/sec   Train-accuracy=0.732031
2016-10-22 18:29:57,648 Epoch[5] Batch [30] Speed: 329.67 samples/sec   Train-accuracy=0.736719
2016-10-22 18:30:01,540 Epoch[5] Batch [40] Speed: 329.42 samples/sec   Train-accuracy=0.722656
2016-10-22 18:30:05,433 Epoch[5] Batch [50] Speed: 328.82 samples/sec   Train-accuracy=0.751563
2016-10-22 18:30:09,309 Epoch[5] Batch [60] Speed: 330.37 samples/sec   Train-accuracy=0.736719
2016-10-22 18:30:13,198 Epoch[5] Batch [70] Speed: 329.27 samples/sec   Train-accuracy=0.771875
2016-10-22 18:30:17,084 Epoch[5] Batch [80] Speed: 329.47 samples/sec   Train-accuracy=0.762500
2016-10-22 18:30:20,958 Epoch[5] Batch [90] Speed: 330.43 samples/sec   Train-accuracy=0.742969
2016-10-22 18:30:24,858 Epoch[5] Batch [100]    Speed: 328.32 samples/sec   Train-accuracy=0.770312
2016-10-22 18:30:28,734 Epoch[5] Batch [110]    Speed: 330.27 samples/sec   Train-accuracy=0.781250
2016-10-22 18:30:32,217 Epoch[5] Train-accuracy=0.757812
2016-10-22 18:30:32,218 Epoch[5] Time cost=46.426
2016-10-22 18:30:51,745 Epoch[5] Validation-accuracy=0.752450
2016-10-22 18:30:55,887 Epoch[6] Batch [10] Speed: 326.48 samples/sec   Train-accuracy=0.754261
2016-10-22 18:30:59,754 Epoch[6] Batch [20] Speed: 331.16 samples/sec   Train-accuracy=0.768750
2016-10-22 18:31:03,612 Epoch[6] Batch [30] Speed: 331.83 samples/sec   Train-accuracy=0.774219
2016-10-22 18:31:07,472 Epoch[6] Batch [40] Speed: 331.66 samples/sec   Train-accuracy=0.751563
2016-10-22 18:31:11,326 Epoch[6] Batch [50] Speed: 332.21 samples/sec   Train-accuracy=0.777344
2016-10-22 18:31:15,194 Epoch[6] Batch [60] Speed: 331.01 samples/sec   Train-accuracy=0.762500
2016-10-22 18:31:19,062 Epoch[6] Batch [70] Speed: 331.03 samples/sec   Train-accuracy=0.801562
2016-10-22 18:31:22,938 Epoch[6] Batch [80] Speed: 330.32 samples/sec   Train-accuracy=0.788281
2016-10-22 18:31:26,802 Epoch[6] Batch [90] Speed: 331.37 samples/sec   Train-accuracy=0.773438
2016-10-22 18:31:30,656 Epoch[6] Batch [100]    Speed: 332.24 samples/sec   Train-accuracy=0.777344
2016-10-22 18:31:34,555 Epoch[6] Batch [110]    Speed: 328.36 samples/sec   Train-accuracy=0.791406
2016-10-22 18:31:38,412 Epoch[6] Batch [120]    Speed: 331.89 samples/sec   Train-accuracy=0.791406
2016-10-22 18:31:38,413 Epoch[6] Train-accuracy=nan
2016-10-22 18:31:38,414 Epoch[6] Time cost=46.668
2016-10-22 18:31:57,459 Epoch[6] Validation-accuracy=0.768382
2016-10-22 18:32:01,634 Epoch[7] Batch [10] Speed: 324.04 samples/sec   Train-accuracy=0.789773
2016-10-22 18:32:05,542 Epoch[7] Batch [20] Speed: 327.57 samples/sec   Train-accuracy=0.794531
2016-10-22 18:32:09,411 Epoch[7] Batch [30] Speed: 330.90 samples/sec   Train-accuracy=0.788281
2016-10-22 18:32:13,311 Epoch[7] Batch [40] Speed: 328.36 samples/sec   Train-accuracy=0.778906
2016-10-22 18:32:17,190 Epoch[7] Batch [50] Speed: 330.00 samples/sec   Train-accuracy=0.803125
2016-10-22 18:32:21,075 Epoch[7] Batch [60] Speed: 329.54 samples/sec   Train-accuracy=0.780469
2016-10-22 18:32:24,934 Epoch[7] Batch [70] Speed: 331.78 samples/sec   Train-accuracy=0.779687
2016-10-22 18:32:28,803 Epoch[7] Batch [80] Speed: 330.92 samples/sec   Train-accuracy=0.821875
2016-10-22 18:32:32,662 Epoch[7] Batch [90] Speed: 331.79 samples/sec   Train-accuracy=0.783594
2016-10-22 18:32:36,515 Epoch[7] Batch [100]    Speed: 332.32 samples/sec   Train-accuracy=0.802344
2016-10-22 18:32:40,393 Epoch[7] Batch [110]    Speed: 330.16 samples/sec   Train-accuracy=0.800000
2016-10-22 18:32:43,832 Epoch[7] Train-accuracy=0.782118
2016-10-22 18:32:43,833 Epoch[7] Time cost=46.373
2016-10-22 18:33:01,994 Epoch[7] Validation-accuracy=0.774422

As can be seen, after only 8 epochs we reach 78% validation accuracy. This matches state-of-the-art results trained on caltech-256 alone, e.g. with VGG.

Next we try another pre-trained model. It is trained on the complete ImageNet dataset, which is about 10x larger than the 1,000-class subset, using a 3x deeper ResNet architecture.

# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
get_model('http://data.mxnet.io/models/imagenet-11k/resnet-152/resnet-152', 0)
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
(new_sym, new_args) = get_fine_tune_model(sym, arg_params, num_classes)
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus)
assert mod_score > 0.86, "Low validation accuracy."
2016-10-22 18:35:42,274 Already binded, ignoring bind()
2016-10-22 18:35:55,659 Epoch[0] Batch [10] Speed: 139.63 samples/sec   Train-accuracy=0.070312
2016-10-22 18:36:04,814 Epoch[0] Batch [20] Speed: 139.83 samples/sec   Train-accuracy=0.349219
2016-10-22 18:36:13,991 Epoch[0] Batch [30] Speed: 139.49 samples/sec   Train-accuracy=0.585156
2016-10-22 18:36:23,163 Epoch[0] Batch [40] Speed: 139.57 samples/sec   Train-accuracy=0.642188
2016-10-22 18:36:32,309 Epoch[0] Batch [50] Speed: 139.97 samples/sec   Train-accuracy=0.728906
2016-10-22 18:36:41,426 Epoch[0] Batch [60] Speed: 140.41 samples/sec   Train-accuracy=0.760156
2016-10-22 18:36:50,531 Epoch[0] Batch [70] Speed: 140.60 samples/sec   Train-accuracy=0.778906
2016-10-22 18:36:59,631 Epoch[0] Batch [80] Speed: 140.68 samples/sec   Train-accuracy=0.786719
2016-10-22 18:37:08,742 Epoch[0] Batch [90] Speed: 140.51 samples/sec   Train-accuracy=0.797656
2016-10-22 18:37:17,857 Epoch[0] Batch [100]    Speed: 140.45 samples/sec   Train-accuracy=0.823438
2016-10-22 18:37:26,969 Epoch[0] Batch [110]    Speed: 140.50 samples/sec   Train-accuracy=0.827344
2016-10-22 18:37:36,094 Epoch[0] Batch [120]    Speed: 140.29 samples/sec   Train-accuracy=0.829688
2016-10-22 18:37:36,095 Epoch[0] Train-accuracy=nan
2016-10-22 18:37:36,096 Epoch[0] Time cost=113.804
2016-10-22 18:38:08,728 Epoch[0] Validation-accuracy=0.829780
2016-10-22 18:38:18,228 Epoch[1] Batch [10] Speed: 139.92 samples/sec   Train-accuracy=0.862926
2016-10-22 18:38:27,365 Epoch[1] Batch [20] Speed: 140.10 samples/sec   Train-accuracy=0.867969
2016-10-22 18:38:36,476 Epoch[1] Batch [30] Speed: 140.52 samples/sec   Train-accuracy=0.884375
2016-10-22 18:38:45,581 Epoch[1] Batch [40] Speed: 140.60 samples/sec   Train-accuracy=0.856250
2016-10-22 18:38:54,671 Epoch[1] Batch [50] Speed: 140.84 samples/sec   Train-accuracy=0.888281
2016-10-22 18:39:03,774 Epoch[1] Batch [60] Speed: 140.62 samples/sec   Train-accuracy=0.891406
2016-10-22 18:39:12,893 Epoch[1] Batch [70] Speed: 140.38 samples/sec   Train-accuracy=0.893750
2016-10-22 18:39:22,016 Epoch[1] Batch [80] Speed: 140.33 samples/sec   Train-accuracy=0.911719
2016-10-22 18:39:31,173 Epoch[1] Batch [90] Speed: 139.79 samples/sec   Train-accuracy=0.893750
2016-10-22 18:39:40,341 Epoch[1] Batch [100]    Speed: 139.65 samples/sec   Train-accuracy=0.885938
2016-10-22 18:39:49,522 Epoch[1] Batch [110]    Speed: 139.45 samples/sec   Train-accuracy=0.901563
2016-10-22 18:39:57,750 Epoch[1] Train-accuracy=0.907986
2016-10-22 18:39:57,751 Epoch[1] Time cost=109.022
2016-10-22 18:40:30,649 Epoch[1] Validation-accuracy=0.848608
2016-10-22 18:40:40,134 Epoch[2] Batch [10] Speed: 140.33 samples/sec   Train-accuracy=0.921875
2016-10-22 18:40:49,247 Epoch[2] Batch [20] Speed: 140.47 samples/sec   Train-accuracy=0.911719
2016-10-22 18:40:58,367 Epoch[2] Batch [30] Speed: 140.37 samples/sec   Train-accuracy=0.914844
2016-10-22 18:41:07,515 Epoch[2] Batch [40] Speed: 139.93 samples/sec   Train-accuracy=0.913281
2016-10-22 18:41:16,659 Epoch[2] Batch [50] Speed: 140.01 samples/sec   Train-accuracy=0.929688
2016-10-22 18:41:25,826 Epoch[2] Batch [60] Speed: 139.64 samples/sec   Train-accuracy=0.940625
2016-10-22 18:41:35,015 Epoch[2] Batch [70] Speed: 139.31 samples/sec   Train-accuracy=0.927344
2016-10-22 18:41:44,178 Epoch[2] Batch [80] Speed: 139.72 samples/sec   Train-accuracy=0.940625
2016-10-22 18:41:53,316 Epoch[2] Batch [90] Speed: 140.09 samples/sec   Train-accuracy=0.928125
2016-10-22 18:42:02,413 Epoch[2] Batch [100]    Speed: 140.72 samples/sec   Train-accuracy=0.948438
2016-10-22 18:42:11,522 Epoch[2] Batch [110]    Speed: 140.53 samples/sec   Train-accuracy=0.925781
2016-10-22 18:42:20,624 Epoch[2] Batch [120]    Speed: 140.66 samples/sec   Train-accuracy=0.928906
2016-10-22 18:42:20,625 Epoch[2] Train-accuracy=nan
2016-10-22 18:42:20,626 Epoch[2] Time cost=109.976
2016-10-22 18:42:53,414 Epoch[2] Validation-accuracy=0.853269
2016-10-22 18:43:02,925 Epoch[3] Batch [10] Speed: 139.86 samples/sec   Train-accuracy=0.941051
2016-10-22 18:43:12,095 Epoch[3] Batch [20] Speed: 139.60 samples/sec   Train-accuracy=0.935156
2016-10-22 18:43:21,270 Epoch[3] Batch [30] Speed: 139.52 samples/sec   Train-accuracy=0.939844
2016-10-22 18:43:30,434 Epoch[3] Batch [40] Speed: 139.70 samples/sec   Train-accuracy=0.945312
2016-10-22 18:43:39,557 Epoch[3] Batch [50] Speed: 140.31 samples/sec   Train-accuracy=0.946094
2016-10-22 18:43:48,680 Epoch[3] Batch [60] Speed: 140.33 samples/sec   Train-accuracy=0.937500
2016-10-22 18:43:57,775 Epoch[3] Batch [70] Speed: 140.75 samples/sec   Train-accuracy=0.951562
2016-10-22 18:44:06,899 Epoch[3] Batch [80] Speed: 140.31 samples/sec   Train-accuracy=0.956250
2016-10-22 18:44:16,000 Epoch[3] Batch [90] Speed: 140.67 samples/sec   Train-accuracy=0.942969
2016-10-22 18:44:25,110 Epoch[3] Batch [100]    Speed: 140.52 samples/sec   Train-accuracy=0.958594
2016-10-22 18:44:34,225 Epoch[3] Batch [110]    Speed: 140.46 samples/sec   Train-accuracy=0.946875
2016-10-22 18:44:42,448 Epoch[3] Train-accuracy=0.952257
2016-10-22 18:44:42,450 Epoch[3] Time cost=109.035
2016-10-22 18:45:15,423 Epoch[3] Validation-accuracy=0.857587
2016-10-22 18:45:24,921 Epoch[4] Batch [10] Speed: 139.90 samples/sec   Train-accuracy=0.965199
2016-10-22 18:45:34,041 Epoch[4] Batch [20] Speed: 140.37 samples/sec   Train-accuracy=0.964844
2016-10-22 18:45:43,172 Epoch[4] Batch [30] Speed: 140.20 samples/sec   Train-accuracy=0.968750
2016-10-22 18:45:52,287 Epoch[4] Batch [40] Speed: 140.45 samples/sec   Train-accuracy=0.955469
2016-10-22 18:46:01,418 Epoch[4] Batch [50] Speed: 140.20 samples/sec   Train-accuracy=0.971094
2016-10-22 18:46:10,534 Epoch[4] Batch [60] Speed: 140.43 samples/sec   Train-accuracy=0.954688
2016-10-22 18:46:19,664 Epoch[4] Batch [70] Speed: 140.21 samples/sec   Train-accuracy=0.964063
2016-10-22 18:46:28,811 Epoch[4] Batch [80] Speed: 139.96 samples/sec   Train-accuracy=0.969531
2016-10-22 18:46:37,986 Epoch[4] Batch [90] Speed: 139.53 samples/sec   Train-accuracy=0.961719
2016-10-22 18:46:47,150 Epoch[4] Batch [100]    Speed: 139.70 samples/sec   Train-accuracy=0.966406
2016-10-22 18:46:56,307 Epoch[4] Batch [110]    Speed: 139.79 samples/sec   Train-accuracy=0.966406
2016-10-22 18:47:05,456 Epoch[4] Batch [120]    Speed: 139.94 samples/sec   Train-accuracy=0.966406
2016-10-22 18:47:05,457 Epoch[4] Train-accuracy=nan
2016-10-22 18:47:05,457 Epoch[4] Time cost=110.033
2016-10-22 18:47:38,303 Epoch[4] Validation-accuracy=0.862329
2016-10-22 18:47:47,779 Epoch[5] Batch [10] Speed: 140.25 samples/sec   Train-accuracy=0.971591
2016-10-22 18:47:56,897 Epoch[5] Batch [20] Speed: 140.40 samples/sec   Train-accuracy=0.970313
2016-10-22 18:48:06,006 Epoch[5] Batch [30] Speed: 140.53 samples/sec   Train-accuracy=0.976562
2016-10-22 18:48:15,150 Epoch[5] Batch [40] Speed: 140.01 samples/sec   Train-accuracy=0.967187
2016-10-22 18:48:24,320 Epoch[5] Batch [50] Speed: 139.60 samples/sec   Train-accuracy=0.975781
2016-10-22 18:48:33,515 Epoch[5] Batch [60] Speed: 139.22 samples/sec   Train-accuracy=0.971094
2016-10-22 18:48:42,707 Epoch[5] Batch [70] Speed: 139.26 samples/sec   Train-accuracy=0.971875
2016-10-22 18:48:51,857 Epoch[5] Batch [80] Speed: 139.92 samples/sec   Train-accuracy=0.988281
2016-10-22 18:49:00,980 Epoch[5] Batch [90] Speed: 140.32 samples/sec   Train-accuracy=0.969531
2016-10-22 18:49:10,092 Epoch[5] Batch [100]    Speed: 140.49 samples/sec   Train-accuracy=0.984375
2016-10-22 18:49:19,205 Epoch[5] Batch [110]    Speed: 140.49 samples/sec   Train-accuracy=0.978125
2016-10-22 18:49:27,399 Epoch[5] Train-accuracy=0.968750
2016-10-22 18:49:27,400 Epoch[5] Time cost=109.095
2016-10-22 18:50:00,339 Epoch[5] Validation-accuracy=0.864102
2016-10-22 18:50:09,861 Epoch[6] Batch [10] Speed: 139.72 samples/sec   Train-accuracy=0.978693
2016-10-22 18:50:19,028 Epoch[6] Batch [20] Speed: 139.65 samples/sec   Train-accuracy=0.976562
2016-10-22 18:50:28,206 Epoch[6] Batch [30] Speed: 139.48 samples/sec   Train-accuracy=0.975000
2016-10-22 18:50:37,343 Epoch[6] Batch [40] Speed: 140.11 samples/sec   Train-accuracy=0.976562
2016-10-22 18:50:46,475 Epoch[6] Batch [50] Speed: 140.18 samples/sec   Train-accuracy=0.971094
2016-10-22 18:50:55,613 Epoch[6] Batch [60] Speed: 140.10 samples/sec   Train-accuracy=0.976562
2016-10-22 18:51:04,717 Epoch[6] Batch [70] Speed: 140.60 samples/sec   Train-accuracy=0.978906
2016-10-22 18:51:13,821 Epoch[6] Batch [80] Speed: 140.63 samples/sec   Train-accuracy=0.977344
2016-10-22 18:51:22,932 Epoch[6] Batch [90] Speed: 140.50 samples/sec   Train-accuracy=0.971875
2016-10-22 18:51:32,039 Epoch[6] Batch [100]    Speed: 140.56 samples/sec   Train-accuracy=0.980469
2016-10-22 18:51:41,172 Epoch[6] Batch [110]    Speed: 140.17 samples/sec   Train-accuracy=0.978906
2016-10-22 18:51:50,312 Epoch[6] Batch [120]    Speed: 140.06 samples/sec   Train-accuracy=0.978906
2016-10-22 18:51:50,314 Epoch[6] Train-accuracy=nan
2016-10-22 18:51:50,314 Epoch[6] Time cost=109.974
2016-10-22 18:52:23,287 Epoch[6] Validation-accuracy=0.864738
2016-10-22 18:52:32,798 Epoch[7] Batch [10] Speed: 139.84 samples/sec   Train-accuracy=0.982244
2016-10-22 18:52:41,881 Epoch[7] Batch [20] Speed: 140.94 samples/sec   Train-accuracy=0.980469
2016-10-22 18:52:50,982 Epoch[7] Batch [30] Speed: 140.67 samples/sec   Train-accuracy=0.978906
2016-10-22 18:53:00,086 Epoch[7] Batch [40] Speed: 140.61 samples/sec   Train-accuracy=0.980469
2016-10-22 18:53:09,208 Epoch[7] Batch [50] Speed: 140.35 samples/sec   Train-accuracy=0.975000
2016-10-22 18:53:18,342 Epoch[7] Batch [60] Speed: 140.15 samples/sec   Train-accuracy=0.970313
2016-10-22 18:53:27,490 Epoch[7] Batch [70] Speed: 139.94 samples/sec   Train-accuracy=0.978125
2016-10-22 18:53:36,623 Epoch[7] Batch [80] Speed: 140.15 samples/sec   Train-accuracy=0.989844
2016-10-22 18:53:45,795 Epoch[7] Batch [90] Speed: 139.58 samples/sec   Train-accuracy=0.976562
2016-10-22 18:53:54,958 Epoch[7] Batch [100]    Speed: 139.70 samples/sec   Train-accuracy=0.981250
2016-10-22 18:54:04,143 Epoch[7] Batch [110]    Speed: 139.39 samples/sec   Train-accuracy=0.974219
2016-10-22 18:54:12,364 Epoch[7] Train-accuracy=0.976562
2016-10-22 18:54:12,365 Epoch[7] Time cost=109.077
2016-10-22 18:54:45,259 Epoch[7] Validation-accuracy=0.863905

As can be seen, it reaches 83% validation accuracy after just a single epoch. After 8 epochs, the validation accuracy increases to 86.4%.
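
Having fine-tuned the model, you will usually want to persist it for later prediction or feature extraction. One way is to write a checkpoint after every epoch via an epoch_end_callback; a minimal sketch, assuming fit is modified accordingly (the 'resnet-152-caltech256' prefix is an illustrative name):

# inside fit, pass an epoch_end_callback so a checkpoint is written per epoch
checkpoint = mx.callback.do_checkpoint('resnet-152-caltech256')
# mod.fit(train, val, ..., epoch_end_callback=checkpoint)

# the saved model can then be reloaded just like the pre-trained ones:
# sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152-caltech256', 8)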