Theano with GPU for Deep Neural Networks on python with OSX

Install Theano and CUDA Toolkit 7.5 on OSX

Following these steps, you will end up with the following DNN environment on Python 2.7 and OS X 10.11.4:

CUDA Toolkit 7.5

The NVIDIA® CUDA® Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API reference, and other documentation to help you get started quickly accelerating your application with GPUs.

cuDNN 5 Release Candidate

The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.

Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:

  • tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
  • transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU (float32 only).
  • efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
  • dynamic C code generation – Evaluate expressions faster.
  • extensive unit-testing and self-verification – Detect and diagnose many types of errors.
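
To make the log(1+x) item in the list above concrete, here is a small sketch in plain Python (standard library only, not Theano itself) of the rounding problem that such a stability optimization avoids. The value of x is an arbitrary choice for illustration.

```python
import math

x = 1e-20  # arbitrary tiny value, chosen for illustration

# Naive evaluation: 1 + x rounds to exactly 1.0 in double precision,
# so the logarithm collapses to 0.0 and the information in x is lost.
naive = math.log(1 + x)

# Stable evaluation: log1p computes log(1 + x) without forming 1 + x.
stable = math.log1p(x)

print("naive  log(1 + x) = %r" % naive)
print("stable log1p(x)   = %r" % stable)
```

As far as I understand it, Theano performs a rewrite of this kind on the expression graph for you, so you can write log(1 + x) and still get the stable result.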

Installation

1- Make sure pip is using the right version of python

$ python --version
Python 2.7.11 :: Anaconda 2.5.0 (x86_64)
$ pip --version
pip 8.1.1 from /Users/RLO/anaconda/envs/py27/lib/python2.7/site-packages (python 2.7)
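
The same comparison can be scripted. The sketch below is only an illustration: pip_python_version is a hypothetical helper (not part of pip), and it assumes that `pip --version` ends with a "(python X.Y)" suffix as in the output shown above.

```python
import re
import sys

def pip_python_version(pip_version_output):
    """Return the (major, minor) Python version reported by `pip --version`."""
    match = re.search(r"\(python (\d+)\.(\d+)\)", pip_version_output)
    if match is None:
        raise ValueError("unrecognized pip output: %r" % pip_version_output)
    return int(match.group(1)), int(match.group(2))

# Sample output copied from the step above (joined onto one line).
sample = ("pip 8.1.1 from /Users/RLO/anaconda/envs/py27/lib/python2.7/"
          "site-packages (python 2.7)")

pip_version = pip_python_version(sample)
interpreter_version = tuple(sys.version_info[:2])

print("pip is bound to Python %d.%d" % pip_version)
print("this interpreter is Python %d.%d" % interpreter_version)
print("match" if pip_version == interpreter_version else "mismatch")
```

To use it on your machine, replace the sample string with the text that `pip --version` prints in your own shell.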

2- Install nose:

$ pip install nose

2.1 – If you later see this error in the Theano tests:

ERROR: Failure: ImportError (No module named nose_parameterized)

you should install nose-parameterized with the same pip as above:

$ pip install nose-parameterized
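
Since the Theano test run is long, it can be worth checking up front that both test dependencies are importable. A minimal sketch, where missing_modules is a hypothetical helper and the module names are taken from the pip command and the error message above:

```python
import importlib

def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# An empty list means both test dependencies are installed.
print(missing_modules(["nose", "nose_parameterized"]))
```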

3- Install the latest version of Theano:

$ pip install --upgrade --no-deps git+https://github.com/Theano/Theano.git

4- Install the GPU-related packages (a two-step process)

4.1 – Install CUDA Toolkit 7.5

Download and Install the package of CUDA Toolkit 7.5 from the official link.

4.2- Install cuDNN

Next, we have to register with NVIDIA to be able to download cuDNN, the GPU-accelerated library of primitives for deep neural networks described above.

Manual Step: After downloading, please uncompress the package and copy the header file and libraries to include and lib under the root directory of CUDA Toolkit (e.g. /usr/local/cuda), respectively.
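
After copying, a quick check that the files landed in the right place can save a failed test run later. The sketch below rests on assumptions: missing_cudnn_files is a hypothetical helper, and the file names (cudnn.h and libcudnn*.dylib) are what I would expect the OS X package to contain, so adjust them if your download differs.

```python
import glob
import os

def missing_cudnn_files(cuda_root):
    """List the cuDNN files that are expected under cuda_root but absent."""
    missing = []
    header = os.path.join(cuda_root, "include", "cudnn.h")
    if not os.path.isfile(header):
        missing.append(header)
    # Assumption: the OS X package ships the library as libcudnn*.dylib.
    pattern = os.path.join(cuda_root, "lib", "libcudnn*.dylib")
    if not glob.glob(pattern):
        missing.append(pattern)
    return missing

if __name__ == "__main__":
    # /usr/local/cuda is the example root directory mentioned above.
    for path in missing_cudnn_files("/usr/local/cuda"):
        print("missing: %s" % path)
```

If nothing is printed, both the header and at least one library file were found.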

5- Set environment variables for CUDA and cuDNN

There are two ways to add the environment variables for CUDA and cuDNN, pointing paths to /Developer/NVIDIA/CUDA-7.5 or to /usr/local/cuda. I ended up using the first option.

You can either edit ~/.bash_profile by hand or append the lines from the command line; I prefer to edit the file directly.

# add CUDA tools to command path
export CUDA_ROOT=/Developer/NVIDIA/CUDA-7.5
export PATH=$CUDA_ROOT/bin:${PATH}
export CPATH=$CUDA_ROOT/include:${CPATH}
export LIBRARY_PATH=$CUDA_ROOT/lib:$LIBRARY_PATH
export DYLD_LIBRARY_PATH=$CUDA_ROOT/lib:$DYLD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDA_ROOT/lib:$LD_LIBRARY_PATH

Reload your bash_profile so that the settings take effect right away

$ source ~/.bash_profile
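
Once the variables are in effect in your shell, you can confirm from Python that each one contains the CUDA directory. This is a sketch under assumptions: unset_cuda_paths and the EXPECTED table are hypothetical names, and the fallback root is the /Developer/NVIDIA/CUDA-7.5 path used in this post.

```python
import os

# Variables exported in ~/.bash_profile and the CUDA subdirectory each needs.
EXPECTED = {
    "PATH": "bin",
    "CPATH": "include",
    "LIBRARY_PATH": "lib",
    "DYLD_LIBRARY_PATH": "lib",
}

def unset_cuda_paths(environ, cuda_root):
    """Return the names of variables that do not contain the CUDA directory."""
    problems = []
    for name, subdir in sorted(EXPECTED.items()):
        wanted = os.path.join(cuda_root, subdir)
        entries = environ.get(name, "").split(os.pathsep)
        if wanted not in entries:
            problems.append(name)
    return problems

cuda_root = os.environ.get("CUDA_ROOT", "/Developer/NVIDIA/CUDA-7.5")
print("variables missing the CUDA paths: %s"
      % (unset_cuda_paths(os.environ, cuda_root) or "none"))
```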

6- Optional: Install PyCUDA

PyCUDA lets you access NVIDIA's CUDA parallel computation API from Python. It is one of several existing wrappers of the CUDA API.

$ pip install pycuda

7- Test Theano

To make sure that Theano is using cuDNN and the GPU and that everything works correctly, run the test suite with the following command (the first run can take a long time, up to a couple of hours):

$ python -c "import theano; theano.test(verbose=3)"

You should see “Using gpu device” in the first lines.

Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN 5004)

Note: Depending on your video card (built-in or external GPU), you may see errors when Theano tries to allocate memory.

If that happens, open the Energy Saver preferences and turn off automatic graphics switching so that the NVIDIA card stays on.

8- Testing usage

Now we can run some test code to see if Theano works as expected.

python code

test.py:

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# A plain Elemwise node means exp ran on the CPU; on the GPU it becomes GpuElemwise
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Let’s run this code on CPU and GPU, separately.

CPU case:

$ THEANO_FLAGS='device=cpu' python test.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 14.474722 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu

GPU case:

$ THEANO_FLAGS='device=gpu' python test.py
Using gpu device 0: GeForce GTX 660M (CNMeM is disabled)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.517552 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu

Please note that Theano has to generate and compile C++/CUDA code the first time the script is executed on the GPU, which takes extra time. The results shown above therefore come from the second execution.

Finally, it can be observed that the runtime is greatly reduced when the GPU is used: 0.52 seconds against 14.47 seconds on the CPU, roughly 28 times faster : )
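
The THEANO_FLAGS variable used in the two runs above can also be set from inside a script, as long as that happens before Theano is imported. A minimal sketch, where build_theano_flags is a hypothetical helper and not part of Theano:

```python
import os

def build_theano_flags(**flags):
    """Join keyword arguments into the comma-separated THEANO_FLAGS format."""
    return ",".join("%s=%s" % item for item in sorted(flags.items()))

# Theano reads THEANO_FLAGS when it is imported, so set the variable first.
os.environ["THEANO_FLAGS"] = build_theano_flags(device="cpu", floatX="float32")
print(os.environ["THEANO_FLAGS"])

# Only import Theano after this point, for example:
# import theano
```

This keeps the device choice next to the code that depends on it, at the cost of having to edit the script to switch between CPU and GPU.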

ipython notebook

You can test the usage of GPU with the following ipython notebook

9- Optional (not necessary in most cases): Configure Theano and GPU usage

Theano does not create any configuration file by itself, but it has default values for all its configuration flags. You only need such a file if you want to modify the default values.

You can create the file ~/.theanorc in your home directory to modify the default Theano settings for GPU usage.

See this page for more details: http://deeplearning.net/software/theano/library/config.html

For example, my ~/.theanorc is the following:

[global]
floatX = float32
device = gpu
force_device = True
optimizer_including=cudnn

[nvcc]
fastmath = True
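
The file uses the familiar INI layout, so it can be inspected with Python's standard config parser. The sketch below is an illustration only: read_theanorc is a hypothetical helper, and it parses a scratch copy of the example above rather than your real ~/.theanorc.

```python
import os
import tempfile

try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2.7

SAMPLE = """[global]
floatX = float32
device = gpu
force_device = True
optimizer_including=cudnn

[nvcc]
fastmath = True
"""

def read_theanorc(path):
    """Return {section: {option: value}} for a .theanorc-style file."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read(path)
    return dict((s, dict(parser.items(s))) for s in parser.sections())

# Write the sample to a scratch file so the real ~/.theanorc is untouched.
handle, scratch = tempfile.mkstemp(suffix=".theanorc")
with os.fdopen(handle, "w") as f:
    f.write(SAMPLE)

settings = read_theanorc(scratch)
os.remove(scratch)
print(settings["global"]["device"])
print(settings["nvcc"]["fastmath"])
```

Pointing read_theanorc at os.path.expanduser("~/.theanorc") instead shows what Theano will find on your own machine. All values come back as strings.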

 
