Deep Learning in a Single File for Smart Devices

Deep learning (DL) systems are complex and often have a number of dependencies. Porting a DL library to different platforms, especially smart devices, can be painful. A simple solution is to provide a light interface and put all required code into a single file with minimal dependencies. This topic explains how to amalgamate the code into a single file and demonstrates how to run image object recognition on mobile devices.

Amalgamation: Making the Whole System a Single File

The idea of amalgamation comes from SQLite and other projects, which pack all code into a single source file. To create the library, you only need to compile that single file. This simplifies porting to various platforms. Thanks to Jack Deng, MXNet provides an amalgamation script that collects all code needed for prediction with trained DL models into a single .cc file of approximately 30K lines. The only dependency is a BLAS library.
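
To make the idea concrete, the following is a minimal sketch of what an amalgamation step does. It is not MXNet's actual script. It assumes a hypothetical in-memory source tree (`SourceTree`) and inlines quoted `#include "..."` lines depth-first, emitting each file at most once. The names `ParseLocalInclude`, `Expand`, and `Amalgamate` are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical in-memory source tree: file name -> file contents.
using SourceTree = std::map<std::string, std::string>;

// If `line` is a quoted include such as `#include "foo.h"`, store the file
// name in `name` and return true. System includes (<...>) are not matched.
bool ParseLocalInclude(const std::string& line, std::string* name) {
  const std::string prefix = "#include \"";
  if (line.compare(0, prefix.size(), prefix) != 0) return false;
  const std::size_t end = line.find('"', prefix.size());
  if (end == std::string::npos) return false;
  *name = line.substr(prefix.size(), end - prefix.size());
  return true;
}

// Append `file` to `out`, inlining local includes depth-first.
// Each file is emitted at most once, which also stops include cycles
// from recursing forever.
void Expand(const SourceTree& tree, const std::string& file,
            std::set<std::string>* visited, std::ostringstream* out) {
  if (!visited->insert(file).second) return;  // already emitted
  const auto it = tree.find(file);
  if (it == tree.end()) return;
  std::istringstream in(it->second);
  std::string line, dep;
  while (std::getline(in, line)) {
    if (ParseLocalInclude(line, &dep) && tree.count(dep) != 0) {
      Expand(tree, dep, visited, out);  // replace the include by its contents
    } else {
      *out << line << '\n';  // keep code and external includes unchanged
    }
  }
}

// Produce the single-file source that starts from `root`.
std::string Amalgamate(const SourceTree& tree, const std::string& root) {
  assert(tree.count(root) == 1 && "root file must exist in the tree");
  std::set<std::string> visited;
  std::ostringstream out;
  Expand(tree, root, &visited, &out);
  return out.str();
}
```

Calling `Amalgamate(tree, "main.cc")` returns one string in which every local header appears before its first use. A real script also has to deal with include guards, conditional compilation, and platform-specific files, which this sketch leaves out.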

We’ve also created a minimal version, with the BLAS dependency removed. You can compile the single file into JavaScript by using emscripten.

The compiled library can be used from any other programming language. The .h file contains a light prediction API, so porting to another language with a C foreign function interface requires little effort. Examples are available on GitHub.
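
To illustrate what a light, C-callable prediction interface can look like, here is a minimal sketch. It is not the MXNet API: every name (`DemoPredHandle`, `DemoPredCreate`, and so on) is hypothetical, and the "model" is a single dot product standing in for a real network. What it shows is the general pattern: an opaque handle, plain C types, and integer status codes, so that any language with a C foreign function interface can call it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

namespace demo {
// Hypothetical stand-in for a real model: one dot product with fixed weights.
struct Predictor {
  std::vector<float> weights;
  std::vector<float> input;
  float output = 0.0f;
};
}  // namespace demo

extern "C" {

typedef void* DemoPredHandle;  // opaque to callers

// Every function returns 0 on success and -1 on failure, so no C++ type
// or exception crosses the language boundary.
int DemoPredCreate(const float* weights, std::size_t size, DemoPredHandle* out) {
  if (weights == nullptr || out == nullptr) return -1;
  try {
    std::unique_ptr<demo::Predictor> p(new demo::Predictor());
    p->weights.assign(weights, weights + size);
    *out = p.release();  // ownership passes to the caller until DemoPredFree
  } catch (...) {
    return -1;
  }
  return 0;
}

int DemoPredSetInput(DemoPredHandle handle, const float* data, std::size_t size) {
  if (handle == nullptr || data == nullptr) return -1;
  demo::Predictor* p = static_cast<demo::Predictor*>(handle);
  if (size != p->weights.size()) return -1;  // shape mismatch
  try {
    p->input.assign(data, data + size);
  } catch (...) {
    return -1;
  }
  return 0;
}

int DemoPredForward(DemoPredHandle handle) {
  if (handle == nullptr) return -1;
  demo::Predictor* p = static_cast<demo::Predictor*>(handle);
  if (p->input.size() != p->weights.size()) return -1;  // input not set
  float sum = 0.0f;
  for (std::size_t i = 0; i < p->weights.size(); ++i) {
    sum += p->weights[i] * p->input[i];
  }
  p->output = sum;
  return 0;
}

int DemoPredGetOutput(DemoPredHandle handle, float* out) {
  if (handle == nullptr || out == nullptr) return -1;
  *out = static_cast<demo::Predictor*>(handle)->output;
  return 0;
}

int DemoPredFree(DemoPredHandle handle) {
  delete static_cast<demo::Predictor*>(handle);
  return 0;
}

}  // extern "C"
```

A caller follows the sequence create, set input, forward, get output, free. Because the handle is just a pointer and all arguments are C types, a binding in Java (through JNI) or Swift only has to declare these five functions.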

If you plan to amalgamate your system, there are a few guidelines you need to observe when building the project:

  • Minimize the dependency on other libraries.
  • Use namespaces to encapsulate the types and operators.
  • Avoid directives such as using namespace xyz at global scope.
  • Avoid cyclic include dependencies.
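
The namespace guidelines matter because amalgamation places code from many files into one translation unit, where every global name is visible to everything that follows. The sketch below uses two hypothetical modules, `demo_io` and `demo_op`, that define the same identifiers. Because each is wrapped in its own namespace and neither uses a global `using namespace`, they can sit in the same file without conflict.

```cpp
#include <cassert>
#include <string>

// Hypothetical module A, as it might appear in its own source file.
namespace demo_io {
inline std::string Name() { return "io"; }
inline int Version() { return 1; }
}  // namespace demo_io

// Hypothetical module B reuses the same identifiers. After amalgamation both
// modules share one file, and the namespaces keep the definitions apart.
namespace demo_op {
inline std::string Name() { return "op"; }
inline int Version() { return 2; }

// Refer to the other module with a qualified name instead of a global
// `using namespace demo_io;`, which would make Name() ambiguous here.
inline std::string Describe() {
  return demo_io::Name() + "+" + Name();
}
}  // namespace demo_op
```

Here `demo_op::Describe()` returns `"io+op"`. Had either module declared `Name()` at global scope, or pulled the other namespace in globally, the combined file could fail to compile with redefinition or ambiguity errors.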

Image Recognition Demo on Mobile Devices

With amalgamation, deploying the system on smart devices (such as Android or iOS) is simple. But there are two additional considerations:

  • The model should be small enough to fit into the device’s memory.
  • The model shouldn’t be too expensive to run given the relatively low computational power of these devices.
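
Both considerations can be estimated before deploying anything. The following back-of-the-envelope sketch assumes 32-bit float parameters and counts only weights and biases, plus multiply-accumulate operations for a convolution. The struct and function names are hypothetical, and the layer sizes in the usage note are illustrative, not taken from the models discussed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a convolution layer, used only for estimates.
struct ConvLayer {
  std::uint64_t in_channels;
  std::uint64_t out_channels;
  std::uint64_t kernel_h;
  std::uint64_t kernel_w;
};

// Weights plus one bias per output channel.
std::uint64_t ConvParams(const ConvLayer& c) {
  return c.out_channels * c.in_channels * c.kernel_h * c.kernel_w +
         c.out_channels;
}

// Multiply-accumulate operations needed to produce an out_h x out_w output.
std::uint64_t ConvMacs(const ConvLayer& c, std::uint64_t out_h,
                       std::uint64_t out_w) {
  assert(out_h > 0 && out_w > 0);
  return out_h * out_w * c.out_channels * c.in_channels * c.kernel_h *
         c.kernel_w;
}

// Weight matrix plus one bias per output unit.
std::uint64_t FullyConnectedParams(std::uint64_t in, std::uint64_t out) {
  return in * out + out;
}

// Storage needed for the parameters, assuming 32-bit floats.
std::uint64_t ParamBytes(std::uint64_t params) {
  return params * sizeof(float);
}

bool FitsBudget(std::uint64_t params, std::uint64_t budget_bytes) {
  return ParamBytes(params) <= budget_bytes;
}
```

For example, a 7x7 convolution from 3 to 64 channels has 9,472 parameters and needs about 118 million multiply-accumulates for a 112x112 output, while a 1024-to-1000 fully connected layer has 1,025,000 parameters, or roughly 4.1 MB. Summing such figures over all layers gives a first estimate of whether a model fits a device's memory and compute budget.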

Let’s use image recognition as an example to show how to get such a model. We start with the state-of-the-art Inception model and train it on the ImageNet dataset, using multiple servers with GTX 980 cards. The resulting model fits into memory, but it’s too expensive to run. We remove some layers, but now the results are poor.

Finally, we show an Android example, thanks to Leliana, that demonstrates how to run the model on a device.

By using amalgamation, we can easily port the prediction library to mobile devices with almost no dependencies. Compiling on a smart platform is no longer a painful task. After compiling the library for the target platform, the last step is to call the C API from the target language (Java or Swift).

Besides this pre-trained Inception-BatchNorm network, we’ve provided two other pre-trained models.

We tested our models on a Nexus 5:

.. list-table::
  :header-rows: 1

  * - Model
    - Top-1 validation accuracy on ILSVRC2012
    - Time
    - App size
    - Runtime temporary memory
  * - FastPoorNet
    - around 52%, similar to the 2011 winner
    - 1s
    - <10MB
    - <5MB
  * - Sub InceptionBN
    - around 64%, similar to the 2013 winner
    - 2.7s
    - <40MB
    - <10MB
  * - InceptionBN
    - around 70%
    - 4s-5s
    - <60MB
    - 10MB

These models are for demonstration only. They aren't fine-tuned for mobile devices, and there is considerable room for improvement. We believe making a lightweight, portable, and fast deep learning library is fun and interesting, and we hope you enjoy using the library.

## Source Code

## Demo APK Download

- FastPoorNet

- SubInception