CUDA Programming

Prepared by Pengfei Zhang (pfzhang@cse.cuhk.edu.hk)

In this tutorial, we introduce the CUDA compilation environment on the servers of the CSE department, taking GPU18 as an example.

Connect to GPU18 / GPU19

GPU18 and GPU19 are servers provided by the CSE department with GPUs installed. You can connect to them just like you connect to linux1~linux15. The following commands are provided in case you forget.

# if you are using cse vpn
ssh cse_account@gpu18

# if you are not using cse vpn, you can connect through gateway.
ssh cse_account@gw.cse.cuhk.edu.hk
# inside gateway machine, type
ssh cse_account@gpu18

The default shell on gpu18 and gpu19 should be bash. You can check it with the command echo "$SHELL"; if it is not bash, you can switch by simply typing bash.

echo "$SHELL"
> bash

Run a demo

First, you probably want to know the hardware configuration of the GPU server, especially the information about the GPUs. We can use the following command to see it:

nvidia-smi
[Figure: nvidia-smi output]

GPU18 is equipped with four GeForce GTX 1080 Ti GPUs. Though the GTX 1080 Ti is not the latest model, its computation resources are still very powerful: it has 3584 Pascal CUDA cores, which deliver 10.6 TFLOPS of single-precision performance, and it carries 11 GB of GDDR5X memory.

For more information about the GPUs installed on gpu18 and gpu19, you can use the commands nvidia-smi -a and nvidia-smi -h.

Now, it is time to introduce how to set up the software environment on the Linux server and fully utilize the GPU computation power. We have prepared a demo that you can git clone anywhere in your home directory on GPU18. Since the server is behind a proxy, you have to set the network proxies as follows.
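The proxy address below is an assumption; substitute the address the department actually provides if it differs:

# hypothetical proxy address; replace with the one provided by the department
export http_proxy=http://proxy.cse.cuhk.edu.hk:8000/
export https_proxy=http://proxy.cse.cuhk.edu.hk:8000/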

Then git clone the demo repo:
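The repository URL below is a placeholder; use the GPUDemo repository URL given in the course materials:

# <repo-url> is a placeholder for the GPUDemo repository URL
git clone <repo-url>
cd GPUDemo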

In this repo, there is an executable file named matrixMulCUBLAS. Before running it, we have to run the following command to change its file permission.
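# add the execute permission to the file
chmod +x matrixMulCUBLAS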

If you are not familiar with Linux file permissions, we recommend doing the File System lab of our CSCI-3150 course.

By the name of the executable file, it is easy to tell that it evaluates a matrix multiplication operation. The suffix CUBLAS denotes that it invokes the highly tuned cuBLAS library at runtime. Before running the program, we should set an environment variable called LD_LIBRARY_PATH to tell the program where to find the shared libraries it needs at runtime. Notice that you may have set this environment variable on other servers before.
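A typical setting, assuming the default CUDA installation on GPU18 (nvcc lives under /usr/local/cuda/bin/, so the runtime libraries are usually under /usr/local/cuda/lib64/):

# /usr/local/cuda/lib64 is the usual library path for a default CUDA installation
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH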

First, we can use the help option to show the usage information.
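./matrixMulCUBLAS -help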

The -device option sets the ID of the GPU on which you want to run the program. By default, matrixMulCUBLAS uses GPU 0; the ID is overridden by whatever is provided on the command line, so if you specify -device=1, GPU 1 will be chosen. Notice that there might be many other users running GPU programs on the server, so it is necessary to know which GPUs are currently available. The command introduced at the beginning, nvidia-smi, can help. For more information about this tool, you can refer to the official documentation.

[Figure: nvidia-smi output]

The above figure shows that all four GPUs are idle according to the Memory-Usage and Volatile GPU-Util columns. Thus, in this lab, we set -device=1 and also specify the dimensions of the input matrices A and B with -wA=1024 -hA=2048 -wB=2048 -hB=1024:
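./matrixMulCUBLAS -device=1 -wA=1024 -hA=2048 -wB=2048 -hB=1024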

The output shows that this program achieves 8398.15 GFlop/s of single-precision performance on GPU 1, almost hitting the peak performance.

Compile a GPU program

Now, we will introduce how to compile a GPU program on our Linux server. In the GPUDemo repo, there is a source file named vectorAdd.cu, which performs element-wise addition of two vectors.

In order to compile this source file, a dedicated compiler named nvcc is used. nvcc is provided by the GPU vendor NVIDIA and is installed in the directory /usr/local/cuda/bin/ on GPU18. However, this directory is not included in the PATH environment variable. In Linux, PATH specifies the set of directories where executable programs are located. Thus, we have to invoke nvcc by its absolute path to compile vectorAdd.cu:
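# compile vectorAdd.cu into an executable named vectorAdd
/usr/local/cuda/bin/nvcc vectorAdd.cu -o vectorAdd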

We strongly suggest using the export command to add the directory /usr/local/cuda/bin/ to the environment variable PATH:
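export PATH=/usr/local/cuda/bin:$PATH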

Then, you can simply use nvcc instead of /usr/local/cuda/bin/nvcc to run the compiler. This will also benefit you when you want to use other tools installed in this directory, such as nvprof. After obtaining the executable file vectorAdd, just run it and have fun:
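nvcc vectorAdd.cu -o vectorAdd
./vectorAdd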

Kill your zombie jobs

Sometimes a job running on the GPU may become a zombie because of bugs or other reasons. If so, you should kill the zombie job to avoid it occupying too many resources. To kill your jobs, first check them and get the process ID (PID) with nvidia-smi:

[Figure: nvidia-smi showing your jobs]

We can see from the above figure that I have a job with PID=5777 running on GPU 0. To kill it, you can use the following command:
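# replace 5777 with the PID of your own job
kill -9 5777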

Notice that killing your zombie jobs is important: leaving them running may cause our servers to go down.

Set environment variables in bashrc

We have configured several environment variables using export in the sections above. However, whenever you log out and ssh to the GPU machine again, you need to set those environment variables again. To avoid this, you can set them in ~/.bashrc so that they are loaded automatically whenever you use bash.

Add the following lines to ~/.bashrc:
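A minimal sketch, assuming the default CUDA installation paths used above (adjust if your setup differs):

# make nvcc and other CUDA tools available
export PATH=/usr/local/cuda/bin:$PATH
# typical library path for a default CUDA installation
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH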

CUDA Programming

Since CUDA has become so popular in recent years, there are numerous materials discussing it. I highly recommend reading the nice CUDA programming tutorials by Mark Harris. He introduces many CUDA code optimization tricks, which will help you a lot in Asgn1b.

Among those tutorials, you must learn the following:
