How to train a Deep Neural Network using only TensorFlow C++


As you may know, the core of TensorFlow (TF) is built using C++, yet a lot of conveniences are only available in the Python API.

When I wrote my last blog post, my goal was to implement the same basic Deep Neural Network (DNN) using only the TF C++ API, and then using only cuDNN. As I began with the TF C++ version, I realized that a lot of things were missing, even for my simple DNN.

Keep in mind that training a network that uses exotic operations will probably not be possible: the most likely error you will face is a missing gradient operation. I am currently working on migrating the gradient operations from Python to C++, so if you have a specific need, write a comment and I'll be happy to help.

In this blog post we will build a Deep Neural Network, the one described here, and try to predict the price of a BMW Serie 1 from its age, its mileage in kilometers and its type of fuel, using TensorFlow from C++ only. The goal is to complete the existing guide on using TF in C++ with the missing details about the training part. There are currently no Optimizers in C++, so you'll see that the training code is less elegant, but they will be added at some point in the future.

This post completes this guide from Google, so you should read it first. All the code written here is available on GitHub.

Setup

We will run our C++ code from inside the TensorFlow source tree. We could try to use a pre-compiled library instead, but some people would surely run into trouble due to the specificities of their environment. Building TensorFlow from scratch avoids these problems and also ensures that we are using the latest version of the API.

You only need to install the bazel build tool; follow these instructions for your OS. On macOS, using brew is enough:

brew install bazel

As we will build from source, we also need the TensorFlow sources:

git clone https://github.com/tensorflow/tensorflow.git /path/tensorflow
cd /path/tensorflow

Then you have to configure your installation: choose whether to use GPU support, and so on. To do that, run the configure script:

cd /path/tensorflow
./configure

Now we will create the files that will hold the code of our model, and build TF for the first time. Be aware that the first build will take quite some time (10-15 min).
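
If you want to trigger that long first build right away, you can pre-build the C++ API targets that our model will depend on (these are the bazel targets we will declare as dependencies in a moment):

cd /path/tensorflow
bazel build -c opt //tensorflow/cc:cc_ops //tensorflow/cc:client_session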

The non-core C++ TF code lives in tensorflow/cc; this is where we will create our model files. We also need a BUILD file so that bazel can build model.cc.

mkdir /path/tensorflow/tensorflow/cc/model
cd /path/tensorflow/tensorflow/cc/model
touch model.cc
touch BUILD

We add the bazel build instructions to the BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "model",
    srcs = [
        "model.cc",
    ],
    deps = [
        "//tensorflow/cc:gradients",
        "//tensorflow/cc:grad_ops",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow"
    ],
)

Basically, it will build a model binary from model.cc. We are now ready to write our model.
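
Before writing the real model, you can check that the whole toolchain works with a minimal model.cc. This is only a sketch: it builds a one-node graph, runs it in a session and logs the resulting tensor, using the same headers we will need later.

#include <vector>

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

using namespace tensorflow;
using namespace tensorflow::ops;

int main() {
  // one-node graph: a single float constant
  Scope scope = Scope::NewRootScope();
  auto c = Const(scope, 42.0f);

  // run the graph in a session and fetch the constant
  ClientSession session(scope);
  std::vector<Tensor> outputs;
  TF_CHECK_OK(session.Run({c}, &outputs));
  LOG(INFO) << outputs[0].DebugString();
  return 0;
}

If bazel builds and runs it without errors, the setup is done.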

Reading the data

If you remember, the data were scraped from leboncoin.fr, a French website, then cleaned and normalized before being saved into a CSV file. Our first task is to read these data. The metadata used to normalize them are saved as the first line of the CSV file; we will need them to reconstruct a price from the output of our network. I created a data_set.h and a data_set.cc file to keep the code clean. They produce a flat array of floats (conceptually two-dimensional) from the CSV file that we will use to feed our network. I paste the code here, but it is mostly plumbing, so don't lose too much time reading it.

data_set.h

#pragma once

#include <string>
#include <vector>
#include <initializer_list>

using namespace std;

// Meta data used to normalize the data set. Useful to
// go back and forth between normalized data.
class DataSetMetaData {
friend class DataSet;
private:
  float mean_km;
  float std_km;
  float mean_age;
  float std_age;
  float min_price;
  float max_price;
};

enum class Fuel {
    DIESEL,
    GAZOLINE
};

class DataSet {
public:
  // Construct a data set from the given csv file path.
  DataSet(string path) {
    ReadCSVFile(path);
  }

  // getters
  vector<float>& x() { return x_; }
  vector<float>& y() { return y_; }

  // read the given csv file and complete x_ and y_
  void ReadCSVFile(string path);

  // convert one csv line to a vector of float
  vector<float> ReadCSVLine(string line);

  // normalize a human input using the data set metadata
  initializer_list<float> input(float km, Fuel fuel, float age);

  // convert a price outputted by the DNN to a human price
  float output(float price);
private:
  DataSetMetaData data_set_metadata;
  vector<float> x_;
  vector<float> y_;
};

data_set.cc

#include <vector>
#include <fstream>
#include <sstream>
#include <iostream>
#include "data_set.h"

using namespace std;

void DataSet::ReadCSVFile(string path) {
  ifstream file(path);
  stringstream buffer;
  buffer << file.rdbuf();
  string line;
  vector<string> lines;
  while(getline(buffer, line, '\n')) {
    lines.push_back(line);
  }

  // the first line contains the metadata
  vector<float> metadata = ReadCSVLine(lines[0]);

  data_set_metadata.mean_km = metadata[0];
  data_set_metadata.std_km = metadata[1];
  data_set_metadata.mean_age = metadata[2];
  data_set_metadata.std_age = metadata[3];
  data_set_metadata.min_price = metadata[4];
  data_set_metadata.max_price = metadata[5];
  
  // the remaining lines contain the features for each car
  for (size_t i = 1; i < lines.size(); ++i) {
    vector<float> features = ReadCSVLine(lines[i]);
    x_.insert(x_.end(), features.begin(), features.begin() + 3);
    y_.push_back(features[3]);
  }
}

vector<float> DataSet::ReadCSVLine(string line) {
  vector<float> line_data;
  std::stringstream lineStream(line);
  std::string cell;
  while(std::getline(lineStream, cell, ','))
  {
    line_data.push_back(stof(cell));
  }
  return line_data;
}

initializer_list<float> DataSet::input(float km, Fuel fuel, float age) {
  km = (km - data_set_metadata.mean_km) / data_set_metadata.std_km;
  age = (age - data_set_metadata.mean_age) / data_set_metadata.std_age;
  float f = fuel == Fuel::DIESEL ? -1.f : 1.f;
  return {km, f, age};
}

float DataSet::output(float price) {
  return price * (data_set_metadata.max_price - data_set_metadata.min_price) + data_set_metadata.min_price;
}
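
As a quick illustration of how these two files are meant to be used (the CSV path below is a placeholder), a minimal sketch:

#include <iostream>
#include "data_set.h"

int main() {
  DataSet data_set("/path/normalized_car_features.csv");
  // x() is flat: three consecutive floats per car
  std::cout << data_set.x().size() / 3 << " cars loaded" << std::endl;
  // output() maps a network output in [0, 1] back to a price in euros
  std::cout << data_set.output(0.5f) << " euros" << std::endl;
  return 0;
}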

We must also add these two files to our bazel BUILD file.

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

tf_cc_binary(
    name = "model",
    srcs = [
        "model.cc",
        "data_set.h",
        "data_set.cc"
    ],
    deps = [
        "//tensorflow/cc:gradients",
        "//tensorflow/cc:grad_ops",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow"
    ],
)
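
At this point everything should compile; you can check it before going further:

cd /path/tensorflow
bazel build -c opt //tensorflow/cc/model:model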

Building the model

The first step is to read the CSV file into two tensors: x for the input and y for the expected results. We use the DataSet class defined previously. You can download the CSV data set here.

DataSet data_set("/path/normalized_car_features.csv");
Tensor x_data(DataTypeToEnum<float>::v(), 
              TensorShape{static_cast<int>(data_set.x().size())/3, 3});
copy_n(data_set.x().begin(), data_set.x().size(),
       x_data.flat<float>().data());

Tensor y_data(DataTypeToEnum<float>::v(), 
              TensorShape{static_cast<int>(data_set.y().size()), 1});
copy_n(data_set.y().begin(), data_set.y().size(), 
       y_data.flat<float>().data());

To define a Tensor, we need its type and its shape. In the data_set object, the x data are stored flat, which is why we divide the size by 3 (each car has 3 features). Then we use std::copy_n to copy the data from our data_set object into the underlying data structure of the Tensor (an Eigen::TensorMap). We now have our data as TensorFlow data structures; let's build our model.

You can easily debug a Tensor using:

LOG(INFO) << x_data.DebugString();

A particularity of the C++ API is that you need a Scope object to hold the state of the graph construction; this object is then passed to each operation.

Scope scope = Scope::NewRootScope();

We'll have two placeholders: x, which will contain the car features, and y, the corresponding price for each car.

auto x = Placeholder(scope, DT_FLOAT);
auto y = Placeholder(scope, DT_FLOAT);

Our network has two hidden layers, so we'll have three weight matrices and three biases. Whereas in Python this is done under the hood, in C++ you have to define a Variable and then an Assign node in order to give an initial value to that variable. We initialize our variables using RandomNormal, which gives us random values from a normal distribution.

// weights init
auto w1 = Variable(scope, {3, 3}, DT_FLOAT);
auto assign_w1 = Assign(scope, w1, RandomNormal(scope, {3, 3}, DT_FLOAT));

auto w2 = Variable(scope, {3, 2}, DT_FLOAT);
auto assign_w2 = Assign(scope, w2, RandomNormal(scope, {3, 2}, DT_FLOAT));

auto w3 = Variable(scope, {2, 1}, DT_FLOAT);
auto assign_w3 = Assign(scope, w3, RandomNormal(scope, {2, 1}, DT_FLOAT));

// bias init
auto b1 = Variable(scope, {1, 3}, DT_FLOAT);
auto assign_b1 = Assign(scope, b1, RandomNormal(scope, {1, 3}, DT_FLOAT));

auto b2 = Variable(scope, {1, 2}, DT_FLOAT);
auto assign_b2 = Assign(scope, b2, RandomNormal(scope, {1, 2}, DT_FLOAT));

auto b3 = Variable(scope, {1, 1}, DT_FLOAT);
auto assign_b3 = Assign(scope, b3, RandomNormal(scope, {1, 1}, DT_FLOAT));

Then we build our three layers, using Tanh as the activation function.

// layers
auto layer_1 = Tanh(scope, Add(scope, MatMul(scope, x, w1), b1));
auto layer_2 = Tanh(scope, Add(scope, MatMul(scope, layer_1, w2), b2));
auto layer_3 = Tanh(scope, Add(scope, MatMul(scope, layer_2, w3), b3));

We add L2 regularization.

// regularization
auto regularization = AddN(scope,
                         initializer_list<Input>{L2Loss(scope, w1),
                                                 L2Loss(scope, w2),
                                                 L2Loss(scope, w3)});

Finally, we compute the loss, the mean squared difference between our predictions and the real prices y, and add the regularization term to it.

// loss calculation
auto loss = Add(scope,
                ReduceMean(scope, Square(scope, Sub(scope, layer_3, y)), {0, 1}),
                Mul(scope, Cast(scope, 0.01,  DT_FLOAT), regularization));

At this point the forward propagation is done and we are ready to start the backpropagation part. The first step is to add to the graph the gradients of the forward operations, using a single function call.

// add the gradients operations to the graph
std::vector<Output> grad_outputs;
TF_CHECK_OK(AddSymbolicGradients(scope, {loss}, {w1, w2, w3, b1, b2, b3}, &grad_outputs));

All the operations necessary to compute the gradients of loss with respect to each variable are added to the graph. We passed an empty grad_outputs vector; AddSymbolicGradients fills it with the nodes that, when used in a TensorFlow session, will give us the gradient for each variable: grad_outputs[0] is the gradient of loss w.r.t. w1, grad_outputs[1] the gradient of loss w.r.t. w2, and so on, following the order {w1, w2, w3, b1, b2, b3} of the variables passed to AddSymbolicGradients.

Now we have a list of nodes in grad_outputs. When used in a TensorFlow session, each node computes the gradient of loss w.r.t. one variable. We use them to update the variables: there will be one line per variable, applying gradient descent, the simplest kind of update.

// update the weights and bias using gradient descent
auto apply_w1 = ApplyGradientDescent(scope, w1, Cast(scope, 0.01,  DT_FLOAT), {grad_outputs[0]});
auto apply_w2 = ApplyGradientDescent(scope, w2, Cast(scope, 0.01,  DT_FLOAT), {grad_outputs[1]});
auto apply_w3 = ApplyGradientDescent(scope, w3, Cast(scope, 0.01,  DT_FLOAT), {grad_outputs[2]});
auto apply_b1 = ApplyGradientDescent(scope, b1, Cast(scope, 0.01,  DT_FLOAT), {grad_outputs[3]});
auto apply_b2 = ApplyGradientDescent(scope, b2, Cast(scope, 0.01,  DT_FLOAT), {grad_outputs[4]});
auto apply_b3 = ApplyGradientDescent(scope, b3, Cast(scope, 0.01,  DT_FLOAT), {grad_outputs[5]});

The Cast operation simply produces the learning rate parameter as a float tensor, 0.01 in our case.
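
Any scalar float Input would work there; as a sketch, a Const node is arguably clearer and can be shared between the six update nodes:

auto learning_rate = Const(scope, 0.01f);
auto apply_w1 = ApplyGradientDescent(scope, w1, learning_rate, {grad_outputs[0]});
// ... and the same learning_rate node for the five other variables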

Our network is ready to be launched in a Session. In Python, the minimize function of the Optimizers API basically wraps the computation and application of the gradients in a single call. That is what I did in PR #11377, and probably what will be done when the Optimizer API is ported to C++.
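
To give an idea, here is a sketch of what such a hand-rolled helper could look like; this is not an actual TensorFlow API, just the two calls we made above wrapped in a function:

// sketch: mimic Python's minimize() by adding the gradient nodes and
// one ApplyGradientDescent node per variable to the graph
std::vector<Output> MinimizeWithGradientDescent(const Scope& scope, Output loss,
                                                const std::vector<Output>& vars,
                                                float learning_rate) {
  std::vector<Output> grads;
  TF_CHECK_OK(AddSymbolicGradients(scope, {loss}, vars, &grads));
  std::vector<Output> updates;
  for (size_t i = 0; i < vars.size(); ++i) {
    updates.push_back(ApplyGradientDescent(scope, vars[i],
                                           Const(scope, learning_rate),
                                           {grads[i]}));
  }
  // running the returned nodes performs one gradient descent step
  return updates;
}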

We initialize a ClientSession and a vector of Tensor named outputs that will receive the outputs of our network.

ClientSession session(scope);
std::vector<Tensor> outputs;

Then we initialize our variables. In Python a call to tf.global_variables_initializer() is enough, because a list of all the variables is kept during graph construction; in C++ we must list them ourselves. Each RandomNormal output is assigned to its variable by the corresponding Assign node.

// init the weights and biases by running the assigns nodes once
TF_CHECK_OK(session.Run({assign_w1, assign_w2, assign_w3, assign_b1, assign_b2, assign_b3}, nullptr));

At this point we can loop over the training steps, 5000 in our case. First we run the forward propagation using the loss node; its output is the loss of the network. Every 100 steps we log the loss value, since a decreasing loss is a mandatory attribute of a working network. Then we compute the gradients and update the variables: because the gradient nodes were used as inputs to the ApplyGradientDescent nodes, running the apply_* nodes first computes the gradients and then applies them to the correct variables.

// training steps
for (int i = 0; i < 5000; ++i) {
  TF_CHECK_OK(session.Run({{x, x_data}, {y, y_data}}, {loss}, &outputs));
  if (i % 100 == 0) {
    std::cout << "Loss after " << i << " steps " << outputs[0].scalar<float>() << std::endl;
  }
  // nullptr because the output from the run is useless
  TF_CHECK_OK(session.Run({{x, x_data}, {y, y_data}}, {apply_w1, apply_w2, apply_w3, apply_b1, apply_b2, apply_b3, layer_3}, nullptr));
}

At this point our network is trained and we can try to predict the price of a car, also known as inference. We will try to predict the price of a 7-year-old BMW Serie 1 with 110,000 kilometers on the clock and a Diesel engine. To do that we run the layer_3 node with the car data as the x input; this is basically a forward propagation step. Because we previously trained the network for 5000 steps, the weights have learned values and the result will not be random.

We can't use the car attributes directly, because our network learned from normalized attributes; they must go through the same normalization process. Our DataSet class has an input method that takes care of this, using the data set metadata loaded while reading the CSV.

// prediction using the trained neural net
TF_CHECK_OK(session.Run({{x, {data_set.input(110000.f, Fuel::DIESEL, 7.f)}}}, {layer_3}, &outputs));
cout << "DNN output: " << *outputs[0].scalar<float>().data() << endl;
std::cout << "Price predicted " << data_set.output(*outputs[0].scalar<float>().data()) << " euros" << std::endl;

Our network produces a value between 0 and 1; the data_set output method takes care of converting that value back to a human-readable price using the data set metadata. The model can be run with bazel run -c opt //tensorflow/cc/model:model; if TensorFlow was compiled recently, the incremental build is fast and you will quickly see this kind of output:

Loss after 0 steps 0.317394
Loss after 100 steps 0.0503757
Loss after 200 steps 0.0487724
Loss after 300 steps 0.047366
Loss after 400 steps 0.0460944
Loss after 500 steps 0.0449263
Loss after 600 steps 0.0438395
Loss after 700 steps 0.0428183
Loss after 800 steps 0.041851
Loss after 900 steps 0.040929
Loss after 1000 steps 0.0400459
Loss after 1100 steps 0.0391964
Loss after 1200 steps 0.0383768
Loss after 1300 steps 0.0375839
Loss after 1400 steps 0.0368152
Loss after 1500 steps 0.0360687
Loss after 1600 steps 0.0353427
Loss after 1700 steps 0.0346358
Loss after 1800 steps 0.0339468
Loss after 1900 steps 0.0332748
Loss after 2000 steps 0.0326189
Loss after 2100 steps 0.0319783
Loss after 2200 steps 0.0313524
Loss after 2300 steps 0.0307407
Loss after 2400 steps 0.0301426
Loss after 2500 steps 0.0295577
Loss after 2600 steps 0.0289855
Loss after 2700 steps 0.0284258
Loss after 2800 steps 0.0278781
Loss after 2900 steps 0.0273422
Loss after 3000 steps 0.0268178
Loss after 3100 steps 0.0263046
Loss after 3200 steps 0.0258023
Loss after 3300 steps 0.0253108
Loss after 3400 steps 0.0248298
Loss after 3500 steps 0.0243591
Loss after 3600 steps 0.0238985
Loss after 3700 steps 0.0234478
Loss after 3800 steps 0.0230068
Loss after 3900 steps 0.0225755
Loss after 4000 steps 0.0221534
Loss after 4100 steps 0.0217407
Loss after 4200 steps 0.0213369
Loss after 4300 steps 0.0209421
Loss after 4400 steps 0.020556
Loss after 4500 steps 0.0201784
Loss after 4600 steps 0.0198093
Loss after 4700 steps 0.0194484
Loss after 4800 steps 0.0190956
Loss after 4900 steps 0.0187508
DNN output: 0.0969611
Price predicted 13377.7 euros

It shows a predicted price of 13377.7 euros for our car. Running the model several times will give different results, sometimes very different ones, like 8000 € vs. 17000 €. This is explained by the fact that we describe a car with only three attributes and that our network has a really simple architecture.

As I said before, the C++ API is a work in progress, and we can expect easier-to-use methods in the future. If you see a way to improve this post or spot a mistake, please leave a comment.