Build Tensorflow from source in Google Colab

Filip Drapejkowski
Apr 27, 2021

Spoiler alert: the build takes many hours. I tried to complete it every day for a week with various JavaScript hacks, but Colab always died before it finished. No error messages though.

So this article is unfinished, but I keep it for potential future reference.

I figured it would be nice to understand TensorFlow in depth, and building it from scratch could be a nice starting point. So here comes a journal of my struggles with the config and compilation. Feel free to use the resulting notebook (colab | github) and only read the stuff below if you find yourself stuck at some point.

Let’s first get rid of the existing installation, so that at least we know whether we failed or succeeded after we’re done:

!pip uninstall tensorflow -y
import tensorflow as tf
print(tf.__version__)
# ModuleNotFoundError: No module named 'tensorflow'
# We want this error ^, it's cool.
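The same check can be done without provoking an exception. A minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed is made up for illustration and is not part of the notebook:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# After a successful uninstall this should report False.
# Caveat: if tensorflow was already imported in this session, restart the runtime first,
# because an already-loaded module is still reported as present.
print("tensorflow installed:", is_installed("tensorflow"))
```

This only looks the module up and does not import it, so it stays cheap even when the package is still there.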

Ok. Let’s get tensorflow sources and try to build them according to their docs:

!git clone https://github.com/tensorflow/tensorflow.git
%cd tensorflow
!ls
!./configure
# Cannot find bazel. Please install bazel.

Ok, we need their custom build system — bazel. Let’s follow the docs:
https://docs.bazel.build/versions/master/install-ubuntu.html

!sudo apt install apt-transport-https curl gnupg
!curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg
!sudo mv bazel.gpg /etc/apt/trusted.gpg.d/
!echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
!sudo apt update && sudo apt install bazel
!./configure

Cool, now the configure script runs! Nevertheless Jupyter (hence Colab as well) hates environment variables, because each bash command prefixed with ! essentially runs in a new shell. So whatever this script sets might be ignored by our environment. Well, let’s just ignore it for now, try to run the build, and see what’s missing.
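To make the new-shell problem concrete, here is a small sketch that uses subprocess as a stand-in for ! cells (an assumption: a ! command behaves like a fresh shell per call, as described above). The variable name TF_BUILD_DEMO_VAR is invented for the demo:

```python
import os
import subprocess

# Start from a clean slate so the demo does not depend on the surrounding environment.
os.environ.pop("TF_BUILD_DEMO_VAR", None)

# Each call spawns a fresh shell: a variable exported in one call is gone in the next.
subprocess.run("export TF_BUILD_DEMO_VAR=from_shell", shell=True, check=True)
second = subprocess.run("echo ${TF_BUILD_DEMO_VAR:-unset}", shell=True, check=True,
                        capture_output=True, text=True)
lost_value = second.stdout.strip()

# A variable set on the Python process itself is inherited by every later child shell.
os.environ["TF_BUILD_DEMO_VAR"] = "from_python"
third = subprocess.run("echo ${TF_BUILD_DEMO_VAR:-unset}", shell=True, check=True,
                       capture_output=True, text=True)
kept_value = third.stdout.strip()

print(lost_value, kept_value)
```

So if a later step does need a variable, setting it through os.environ (or the %env magic) is the safer route than export inside a ! command.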

!bazel build --config=cuda [--config=option] //tensorflow/tools/pip_package:build_pip_package
#ERROR: The project you're trying to build requires Bazel 3.7.2 (specified in /content/tensorflow/.bazelversion), but it wasn't found in /usr/bin.

Ok, one step back. We have bazel 4.0.0, but tf wants specifically 3.7.2.
Btw this will probably change in the future, so you should read the error message and copy the version from there.

!sudo apt install bazel-3.7.2
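Instead of copying the version out of the error message by hand, it can be read from the .bazelversion file that the error mentions. A rough sketch, assuming the file holds the version on its first non-comment line; the function name is hypothetical, and the demo writes a throwaway file rather than touching a real checkout:

```python
import tempfile
from pathlib import Path


def required_bazel_package(repo_root: str) -> str:
    """Build the versioned apt package name from the repo's .bazelversion file."""
    text = Path(repo_root, ".bazelversion").read_text()
    lines = [line.strip() for line in text.splitlines()]
    # Skip blank lines and comment lines, in case the file carries a note.
    version = next(line for line in lines if line and not line.startswith("#"))
    return f"bazel-{version}"


# Demo on a temporary directory standing in for /content/tensorflow.
with tempfile.TemporaryDirectory() as demo_repo:
    Path(demo_repo, ".bazelversion").write_text("3.7.2\n")
    demo_package = required_bazel_package(demo_repo)

print(demo_package)
```

In the notebook the result would be passed to apt install in place of the hard-coded bazel-3.7.2.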

*Note: for some weird reason installing 4.0.0 and downgrading works OK, but if I get 3.7.2 directly it’s not added to path.
After rerunning bazel build:

#ERROR: Skipping '[--config=option]': no such target '//:[--config=option]': target '[--config=option]' not declared in package '' defined by /content/tensorflow/BUILD
#ERROR: no such target '//:[--config=option]': target '[--config=option]' not declared in package '' defined by /content/tensorflow/BUILD

Ok, it seems that !./configure does much more than set environment variables. It’s interactive, so I’m pasting the answers I gave it (your mileage may vary, although Colab should give us some uniformity).

/usr/bin/python3
/usr/lib/python3/dist-packages
N
y
y
(empty for CUDA 10)
(empty for cuDNN 7)
(empty for TensorRT 6)
(empty for default nvidia nccl)
(empty for default CUDA libs and header paths)
#fails with: Could not find any cuda.h matching version 'CUDA 11' in any subdirectory
# let's try to find out our cuda version(s):
!nvcc --version
# Cuda compilation tools, release 11.0, V11.0.221
# Build cuda_11.0_bu.TC445_37.28845127_0
# Then tried CUDA 11, cuda-11.0, cuda-10.1, CUDA 10.1 etc., but all failed
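Rather than guessing version strings, the release number can be pulled out of the nvcc output. A small sketch that parses the two lines shown above; the function name is made up, and the regular expression assumes the "release X.Y" wording stays the same:

```python
import re
from typing import Optional


def cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'X.Y' release number from `nvcc --version` output, if present."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Output copied from the notebook; in Colab it could be captured with
# subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
sample = """Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0"""

detected = cuda_release(sample)
print(detected)
```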

Ok, let’s look for cuda.h manually:

!find / -name cuda.h
# results:
/root/.cache/bazel/_bazel_root/889612a75a81b3d8b4ed860522ba4e34/external/tf_runtime/third_party/rules_cuda/test/cuda.h
/usr/local/lib/python3.7/dist-packages/pystan/stan/lib/stan_math/lib/boost_1.69.0/boost/predef/language/cuda.h
/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/usr/local/lib/python2.7/dist-packages/tensorflow_core/include/tensorflow/core/platform/cuda.h
/usr/local/lib/python2.7/dist-packages/pystan/stan/lib/stan_math/lib/boost_1.69.0/boost/predef/language/cuda.h
/usr/local/lib/python2.7/dist-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/usr/local/cuda-10.0/targets/x86_64-linux/include/cuda.h
/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cuda.h
/usr/src/linux-headers-4.15.0-140/include/uapi/linux/cuda.h
/usr/src/linux-headers-4.15.0-140/include/linux/cuda.h
/usr/lib/R/site-library/BH/include/boost/predef/language/cuda.h
/usr/include/hwloc/cuda.h
/usr/include/linux/cuda.h
/content/tensorflow/tensorflow/core/platform/cuda.h
/tensorflow-1.15.2/python2.7/tensorflow_core/include/tensorflow/core/platform/cuda.h
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/platform/cuda.h
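Most of these hits are unrelated headers that just happen to be called cuda.h. A sketch of filtering the list down to actual toolkit installs, assuming they live under /usr/local/cuda-<version>/ as they do in the listing above (this layout is not guaranteed on other machines); the function name is hypothetical:

```python
import re


def cuda_toolkit_includes(paths):
    """Map CUDA version -> include directory for hits under /usr/local/cuda-<version>/."""
    pattern = re.compile(r"^/usr/local/cuda-(\d+\.\d+)/.*/include/cuda\.h$")
    found = {}
    for path in paths:
        match = pattern.match(path)
        if match:
            # Drop the trailing file name to keep only the directory.
            found[match.group(1)] = path.rsplit("/", 1)[0]
    return found


# A few of the paths from the find output above.
hits = [
    "/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include/torch/cuda.h",
    "/usr/local/cuda-10.0/targets/x86_64-linux/include/cuda.h",
    "/usr/local/cuda-11.0/targets/x86_64-linux/include/cuda.h",
    "/usr/include/linux/cuda.h",
]

toolkits = cuda_toolkit_includes(hits)
print(toolkits)
```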

Tried “/usr/local/cuda-10.0/targets/x86_64-linux/include/” when ./configure asks for the list of paths; the error is now:

Could not find any libcudart.so.10* in any subdirectory

Tried

sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0

But it seems that by default on colab:

cuda-11-0 is already the newest version (11.0.3-1).
libcudnn8-dev is already the newest version (8.0.4.30-1+cuda11.0).
libcudnn8 is already the newest version (8.0.4.30-1+cuda11.0).
0 upgraded, 0 newly installed, 0 to remove and 76 not upgraded.

Ok. The only tutorials I could find used only CUDA support, with no TensorRT or cuDNN. The CUDA compute capability they used was 6.1. And this actually works (which is kinda sad, because I don’t know what’s wrong with the previous setup)!

So on ./configure you just click on the text field and hit Enter except for 2 questions:

Do you wish to build TensorFlow with CUDA support? [y/N]: y
Please specify a list of comma-separated CUDA compute capabilities you want to build with. (...): 6.1
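Since only two answers differ from the defaults, the prompts might be avoidable altogether. A sketch under a loud assumption: that the configure script reads answers from environment variables named TF_NEED_CUDA and TF_CUDA_COMPUTE_CAPABILITIES (check configure.py in your checkout; this was not verified against the commit used here). The helper name is hypothetical:

```python
import os


def configure_env(compute_capabilities: str = "6.1") -> dict:
    """Copy the current environment and add the two non-default answers."""
    env = dict(os.environ)
    # Assumed variable names; confirm them in configure.py before relying on this.
    env["TF_NEED_CUDA"] = "1"
    env["TF_CUDA_COMPUTE_CAPABILITIES"] = compute_capabilities
    return env


env = configure_env()
print(env["TF_NEED_CUDA"], env["TF_CUDA_COMPUTE_CAPABILITIES"])

# In the notebook, from inside the tensorflow checkout (not run here):
# import subprocess
# subprocess.run(["./configure"], env=env, input="\n" * 50, text=True, check=True)
# The newlines on stdin are meant to accept the default for any remaining prompt.
```

Passing env explicitly also sidesteps the problem of variables getting lost between ! commands.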

Then configure finishes without crashing. But the bazel build still crashes with:

ERROR: Skipping '[--config=option]': no such target '//:[--config=option]': target '[--config=option]' not declared in package '' defined by /content/tensorflow/BUILD
WARNING: Download from https://mirror.bazel.build/github.com/bazelbuild/rules_cc/archive/40548a2974f1aea06215272d9c2b47a14a24e556.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
ERROR: no such target '//:[--config=option]': target '[--config=option]' not declared in package '' defined by /content/tensorflow/BUILD

Btw, a side note: they are apparently using GLOG for error/debug/info messages. Which is cool, we know GLOG from caffe (it originates in Google though).

Ok, lol, the ‘[--config=option]’ stuff was just a placeholder in the example in the tutorial (mixed with the real value ‘cuda’ though).

Let’s build with:

!bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
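To avoid pasting a docs placeholder again, the command can be assembled from a list of config names. A minimal sketch; the helper and its validation rule are invented for illustration, and only the cuda config and the pip package target come from the command above:

```python
import re

PIP_PACKAGE_TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_command(configs, target=PIP_PACKAGE_TARGET):
    """Assemble a bazel build command, rejecting doc-style placeholders."""
    for name in configs:
        # Brackets, spaces or '=' mean something like "[--config=option]" slipped in.
        if not re.fullmatch(r"[A-Za-z0-9_.-]+", name):
            raise ValueError(f"{name!r} looks like a placeholder, not a config name")
    return ["bazel", "build", *[f"--config={name}" for name in configs], target]


command = bazel_build_command(["cuda"])
print(" ".join(command))
```

The printed line can be pasted after a ! in a cell, or the list can be handed to subprocess.run directly.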

And the real build begins and doesn’t crash for a few minutes, which is promising! 4h later it’s still running…
