Resources:
- Official Base Page: https://developer.nvidia.com/tensorrt
- Official Python API Documentation: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/index.html
- Webinar: Introduction to TensorRT. http://on-demand.gputechconf.com/gtcdc/2017/video/DC7172
- My code involving TensorRT
Introduction
You are most probably familiar with deep learning frameworks like TensorFlow, PyTorch, MXNet, etc. These frameworks are general-purpose tools geared towards learning a model from data. They are great for research prototyping but not tailored for deployment. A lot of open-source code is available that builds various applications on top of these frameworks. However, when it comes to deploying the trained models, they can be a sub-optimal solution, even though an overwhelming majority of people will probably keep using them for deployment for at least the next 2-3 years. In theory it is possible to install tensorflow-gpu on my device (Nvidia TX2), but so far I have not been able to compile TensorFlow on aarch64. I was suggested to use TensorRT instead, which comes pre-installed on the TX2.
Nvidia has come up with TensorRT, a high-performance runtime inference engine that extracts maximum GPU performance, from server GPUs down to embedded GPUs like the Jetson. In particular, they developed libnvinfer, a CUDA-based library geared for scalable inference. I am trying to get TensorRT working on the DJI Manifold 2 (Nvidia TX2). It is claimed that TensorRT is roughly 10x-40x faster than running the TensorFlow models directly. For a quick overview of what TensorRT is, I recommend the official 30-minute webinar from Nvidia [Link].

Running the Official Samples
A good first step is to get the official samples working correctly. On my device (DJI Manifold-2G aka Nvidia TX2), they were found at `/usr/src/tensorrt/`. Copy this whole folder to your home directory. These samples demonstrate the C++ API of TensorRT.
$ cd /usr/src/tensorrt/
$ ls
bin/ data/ samples/
$ cp -r /usr/src/tensorrt $HOME
$ cd $HOME/tensorrt/samples/
$ make all
This hopefully should compile all the samples. The executables are generated in the bin directory. Now if you go to the bin directory and, say, try to execute sample_mnist, you will see the program crash.
ERROR: cudnnEngine.cpp (56) - Cuda Error in initializeCommonContext: 4
ERROR: cudnnEngine.cpp (56) - Cuda Error in initializeCommonContext: 4
sample_mnist: sampleMNIST.cpp:63: void caffeToGIEModel(const string&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, unsigned int, nvinfer1::IHostMemory*&): Assertion `engine' failed.
Aborted (core dumped)
On looking at the code, it is easy to see that the program assumes you are in the corresponding data directory. The program also crashes when not run with sudo. I haven’t figured out the ‘why’, but if any reader has info on this, please do comment (see the reader update at the end of this post). Something like the following works:
dji@manifold2:~/tensorrt_officialsamples/bin$ cd ../data/mnist/
dji@manifold2:~/tensorrt_officialsamples/data/mnist$ sudo ../../bin/sample_mnist
---------------------------
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@%=#@@@@@%=%@@@@@@@@@@
@@@@@@@ %@@@@@@@@@
@@@@@@@ %@@@@@@@@@
@@@@@@@#:-#-. %@@@@@@@@@
@@@@@@@@@@@@# #@@@@@@@@@@
@@@@@@@@@@@@@ #@@@@@@@@@@
@@@@@@@@@@@@@: :@@@@@@@@@@@
@@@@@@@@@%+== *%%%%%%%%%@@
@@@@@@@@% -@
@@@@@@@@@#+. .:-%@@
@@@@@@@@@@@* :-###@@@@@@
@@@@@@@@@@@* -%@@@@@@@@@@@
@@@@@@@@@@@* *@@@@@@@@@@@@
@@@@@@@@@@@* @@@@@@@@@@@@@
@@@@@@@@@@@* #@@@@@@@@@@@@
@@@@@@@@@@@* *@@@@@@@@@@@@
@@@@@@@@@@@* *@@@@@@@@@@@@
@@@@@@@@@@@* @@@@@@@@@@@@@
@@@@@@@@@@@* @@@@@@@@@@@@@
@@@@@@@@@@@@+=#@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
0:
1:
2:
3:
4:
5:
6:
7: **********
8:
9:
If you get something like this, congratulations, your TensorRT installation is working correctly. I highly recommend reading the code of sample_mnist. It demonstrates the toy MNIST digit-image classification example, deployed using TensorRT’s C++ API. To get to know how it works, read here. Many of you could be like me and more interested in using the Python API; read on to know about my experience with it.
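To give a flavour of what goes on inside sample_mnist, here is a minimal sketch (written from memory, not the actual sample source) of the typical TensorRT 5.x C++ build-and-infer flow with the Caffe parser. The file names and buffer sizes are placeholders; the real sample reads its model files from the data directory.

// sketch: build a TensorRT engine from a Caffe model and run one inference
// (link against -lnvinfer -lnvparsers -lcudart)
#include <cassert>
#include <iostream>
#include <cuda_runtime_api.h>
#include <NvInfer.h>
#include <NvCaffeParser.h>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity != Severity::kINFO) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    // 1. Create a builder and an empty network definition
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();

    // 2. Populate the network from the Caffe prototxt and weights (placeholder file names)
    nvcaffeparser1::ICaffeParser* parser = nvcaffeparser1::createCaffeParser();
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser->parse("mnist.prototxt", "mnist.caffemodel", *network, nvinfer1::DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));        // output blob of the MNIST net

    // 3. Build the optimized inference engine
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(16 << 20);           // 16 MB of scratch space
    nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
    assert(engine != nullptr);

    // 4. Run inference: one device buffer per binding (input image, output scores)
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    void* buffers[2];
    const int inputIndex  = engine->getBindingIndex("data");
    const int outputIndex = engine->getBindingIndex("prob");
    cudaMalloc(&buffers[inputIndex],  28 * 28 * sizeof(float));   // 1x28x28 grayscale digit
    cudaMalloc(&buffers[outputIndex], 10 * sizeof(float));        // 10 class scores
    // ... cudaMemcpy the image into buffers[inputIndex] ...
    context->execute(/*batchSize=*/1, buffers);
    // ... cudaMemcpy the scores out of buffers[outputIndex] ...

    // 5. Cleanup
    cudaFree(buffers[inputIndex]); cudaFree(buffers[outputIndex]);
    context->destroy(); engine->destroy(); parser->destroy();
    network->destroy(); builder->destroy();
    return 0;
}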
TensorRT Python API
For Jetson devices, python-tensorrt is available with JetPack 4.2. See here for info. So for my device, which runs an older JetPack, as of May 2019 C++ is the only way to get TensorRT model deployment.
TensorRT C++ API
While there are several ways to specify a network in TensorRT, my desired usage is to deploy my pretrained Keras model with TensorRT. If you are familiar with Keras, then you know that a model can be built with the Sequential API or the Functional API; in both cases the model is of the type keras.models.Model. Yet another way is to load a pretrained model from a .h5 file or from a .json file. In TensorRT there is a UFF parser, which can load a .uff file. UFF is Nvidia’s network and weights definition file format. One can convert a Keras model to UFF through TensorFlow’s intermediate .pb (proto-binary) format:
keras —> .pb —> .uff —> load with UFFParser on TX2
Step-1: Keras Model to TensorFlow Proto-binary (.pb)
Using tensorflow and keras it is possible to produce a .pb file. See details in the next step.
Step-2: Proto-binary (.pb) to Nvidia’s .uff
https://github.com/mpkuse/cartwheel_train/blob/master/test_kerasmodel_to_pb.py
You could also see this for a minimalist demo.
To run this script, you need the full TensorFlow (at least tf1.12) as well as TensorRT (I used 5.1) on your x86 computer. Note that installing full TensorFlow is not recommended on the TX2 device. Alternatively, you may use my Docker image, which can already run this script. Simply clone the cartwheel_train repo and run the script. Make sure to adjust the paths before running it. You need to note down the input and output tensor names which this script outputs; these are needed for the UFF parser.
$(host) mkdir $HOME/docker_ws
$(host) docker run --runtime=nvidia -it -v $HOME/docker_ws:/app mpkuse/kusevisionkit:tfgpu-1.12-tensorrt-5.1 bash
$(docker) cd /app
$(docker) git clone https://github.com/mpkuse/cartwheel_train
$(docker) cd cartwheel_train #make sure you adjust the path in the script before executing
$(docker) python test_kerasmodel_to_pb.py
Load model_json_fname: models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//model.json
Load JSON file: models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//model.json
Load Weights: models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//core_model.1000.keras
**Converted output node names are: [u'net_vlad_layer_1/l2_normalize_1']**
Saved the graph definition in ascii format at models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//output_model.pbtxt
Saved the freezed graph at models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss//output_model.pb
Finished....OK!
Now do:
$(docker) cd models.keras/May2019/centeredinput-m1to1-240x320x1__mobilenet-conv_pw_7_relu__K16__allpairloss/
$(docker) convert-to-uff output_model.pbtxt # this will produce the .uff file.
Step-3: TensorRT Load .uff with UFFParser C++ API
I adapted a standalone example from the official samples. It loads a pretrained MNIST model in uff format, and it works on the TX2.
https://github.com/mpkuse/tx2_whole_image_desc_server/tree/master/standalone
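For reference, the core of that standalone example boils down to something like the sketch below (TensorRT 5.x UFF parser API). The output tensor name is the one printed by the conversion script above; the input name input_1 and the 1x240x320 (CHW) input dimensions are assumptions based on Keras defaults and the model name in the logs, so substitute the values your own script prints.

// sketch: load a .uff network with the TensorRT C++ UFF parser and build an engine
#include <cassert>
#include <iostream>
#include <NvInfer.h>
#include <NvUffParser.h>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity != Severity::kINFO) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();

    // Register the input/output tensors noted down from test_kerasmodel_to_pb.py, then parse.
    // "input_1" and the 1x240x320 shape are placeholders; use your script's actual values.
    nvuffparser::IUffParser* parser = nvuffparser::createUffParser();
    parser->registerInput("input_1", nvinfer1::Dims3(1, 240, 320), nvuffparser::UffInputOrder::kNCHW);
    parser->registerOutput("net_vlad_layer_1/l2_normalize_1");
    bool parsed = parser->parse("output_model.uff", *network, nvinfer1::DataType::kFLOAT);
    assert(parsed && "UFF parsing failed");

    // Build the engine; it can also be serialized to disk and reloaded later on the TX2.
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 26);   // 64 MB of scratch space
    nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
    assert(engine != nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... allocate device buffers with cudaMalloc, copy the input image in,
    //     call context->execute(1, buffers) and copy the descriptor back out ...

    context->destroy(); engine->destroy(); parser->destroy();
    network->destroy(); builder->destroy();
    return 0;
}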
Update: a reader left the following comment regarding the sudo issue mentioned above.
Hi there,
We’ve had the same problem running TensorRT programs without sudo. In our case, it appears that some permissions were not set correctly, I guess because we didn’t do a proper flash of the OS using Jetpack.
Running the following command should fix this. (Found out by running the program using strace -f.)
sudo chown -R nvidia:nvidia /home/nvidia/.nv/
We also had some issues running TensorRT from another user (not nvidia); I believe we resolved that by adding that user to the ‘video’ group.
Hope this helps you!