CUDA Programming
Prepared by Pengfei Zhang (pfzhang@cse.cuhk.edu.hk)
In this tutorial, we introduce the CUDA compilation environment on the CSE department servers, taking GPU18 as an example.
GPU18 and GPU19 are servers provided by the CSE department with GPUs installed. You can connect to them just as you connect to linux1~linux15. The following commands are provided in case you forget.
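For example, the connection looks like the following (the hostname here follows the usual CSE naming convention; check the department's server list if it differs, and substitute your own username):

```shell
# Connect to gpu18 from a campus network.
ssh your_username@gpu18.cse.cuhk.edu.hk
```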
The default shell of gpu18 and gpu19 should be bash; you can check this with the command echo "$SHELL". If it is not, you can switch to bash by running bash.
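Concretely, the check looks like this:

```shell
echo "$SHELL"   # prints your login shell, e.g. /bin/bash
# If it prints something else, start a bash session:
#   bash
```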
First, you probably want to know the hardware configuration of the GPU server, especially the GPUs themselves. We can use the following command to see the GPU information:
For more information about the GPUs installed on gpu18 and gpu19, you can use the commands nvidia-smi -a and nvidia-smi -h.
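For example:

```shell
# Summary view: one row per GPU, plus the processes currently using them.
nvidia-smi
# Detailed per-GPU attributes (clocks, memory, driver and CUDA versions, ...):
nvidia-smi -a
```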
Now it is time to introduce how to set up the software environment on the Linux server and fully utilize the GPU computation power. We prepared a demo, and you can git clone it anywhere under your home directory on GPU18. Since the server is behind a proxy, you have to set the network proxies as follows:
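A sketch of the proxy setup; the host and port below are placeholders, so replace them with the proxy address given by the department:

```shell
# Hypothetical proxy address -- substitute the department's actual proxy.
export http_proxy=http://proxy.cse.cuhk.edu.hk:8000
export https_proxy=$http_proxy
```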
Then git clone the demo repo:
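The clone step looks like the following; the repository URL here is a placeholder, so use the one given in the course materials:

```shell
cd ~                                               # anywhere under your home directory works
git clone https://github.com/example/GPUDemo.git   # placeholder URL
cd GPUDemo
```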
In this repo, there is an executable file named matrixMulCUBLAS. Before running it, we have to change its file permissions with the following command.
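For example:

```shell
# Add the execute permission for the owner (u+x); a freshly cloned or
# copied file may not carry it.
chmod u+x matrixMulCUBLAS
```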
First, we can use the help option to show the help information.
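For example (CUDA sample binaries accept a help flag; the exact spelling may be --help or -help):

```shell
./matrixMulCUBLAS --help
```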
The above figure shows that all four GPUs are idle according to the 'Memory-Usage' and 'Volatile GPU-Util' columns. Thus, in this lab, we set -device=1 and also specify the dimensions of the input matrices A and B with -wA=1024 -hA=2048 -wB=2048 -hB=1024.
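Putting the options together, the run looks like this:

```shell
# Run on GPU 1; A is 2048x1024 (hA x wA) and B is 1024x2048 (hB x wB).
./matrixMulCUBLAS -device=1 -wA=1024 -hA=2048 -wB=2048 -hB=1024
```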
The above figure shows that this program achieves 8398.15 GFlop/s single precision performance on GPU 1. It almost hits its peak performance.
In order to compile this source file, another compiler named nvcc is used. nvcc is provided by the GPU vendor NVIDIA, and it is installed in the directory /usr/local/cuda/bin/ on GPU18. However, this directory is not included in the PATH environment variable. In Linux, PATH specifies the set of directories where executable programs are located. Thus, we have to use the absolute path of nvcc on the command line to execute it and compile vectorAdd.cu.
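For example (the output name vectorAdd matches the executable referred to later in this tutorial):

```shell
# Invoke nvcc by its full path, since /usr/local/cuda/bin is not in PATH.
/usr/local/cuda/bin/nvcc vectorAdd.cu -o vectorAdd
```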
We strongly suggest using the export command to add the directory /usr/local/cuda/bin/ to the PATH environment variable.
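For example:

```shell
# Prepend the CUDA toolkit's bin directory to PATH for the current session.
export PATH=/usr/local/cuda/bin:$PATH
```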
Then, you can simply use nvcc instead of /usr/local/cuda/bin/nvcc to run the compiler. This will also benefit you when you want to use other tools installed in this directory, such as nvprof. After obtaining the executable file vectorAdd, just run it and have fun.
Sometimes a job running on the GPU may become a zombie because of bugs or other reasons. If so, you should kill the zombie job to avoid occupying too many resources. To kill your jobs, first check them and get the process ID with nvidia-smi.
We can see from the above figure that I have a job with PID=5777 running on GPU-0. To kill it, you can use the following command
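Assuming the PID reported by nvidia-smi is 5777, as in the figure:

```shell
# Force-kill the zombie process by its PID. You can only kill your own
# processes; try a plain `kill 5777` first if you prefer a graceful stop.
kill -9 5777
```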
Notice that killing your zombie jobs is important, as leaving them running may bring our servers down.
We have configured several environment variables using export in the sections above. However, whenever you log out and ssh to the GPU machine again, you need to set those environment variables again. To avoid this, you can set them in ~/.bashrc so that they are loaded automatically when you use bash.
Add the following lines to ~/.bashrc:
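For example, using the paths from earlier in this tutorial (the lib64 path is the usual CUDA library location; adjust it if the demo's instructions differ):

```shell
# Appended to ~/.bashrc so every new bash session picks them up.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```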
Among those tutorials, you must learn the following:
GPU18 is equipped with 4 GPUs. Though the GTX 1080 Ti is not the latest model, its computation resources are still very powerful: it has 3584 Pascal CUDA cores delivering 10.6 TFlops of single-precision performance, and 11 GB of GDDR5X memory.
If you are not familiar with Linux file permissions, we recommend you do our tutorial on the File System.
From the name of the executable file, it is easy to tell that it evaluates a matrix multiplication operation. The suffix CUBLAS denotes that it invokes the highly-tuned cuBLAS library at runtime. Before running the program, we should set an environment variable called LD_LIBRARY_PATH to tell the program where to find the shared libraries needed at runtime. Note that you may have set this environment variable on other servers before.
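A sketch of this step, assuming the standard CUDA library location (the demo's own instructions may give the exact path to use):

```shell
# Tell the dynamic linker where to find the cuBLAS shared library at runtime.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```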
The -device option is used to set the ID of the GPU on which you want to run the program. By default, matrixMulCUBLAS uses GPU 0; otherwise, it overrides the GPU ID based on what is provided on the command line. For example, if you specify -device=1, then GPU 1 will be chosen. Since there may be many other users running GPU programs on the server, it is necessary to know which GPU is currently available. The command introduced at the beginning, nvidia-smi, can help. For more information about this tool, you can refer to its official documentation.
Now, we will introduce how to compile a GPU program on our Linux server. In the GPUDemo repo, there is a source file named vectorAdd.cu. It performs an addition between two vectors.
As CUDA has become so popular in recent years, there are numerous materials discussing it. I highly recommend reading the nice blog posts by Mark Harris. He introduces many CUDA code-optimization tricks, which will help you a lot in Asgn1b.