Week 1

In week 1, we explored setting up PyCUDA on Google Colab and discussed some CUDA concepts like threads, blocks, and grids. We also wrote some basic CUDA C++ code and Python code to invoke it on a GPU.
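To make the thread/block/grid idea concrete, here is a minimal pure-Python sketch of 1-D CUDA indexing that runs without a GPU. The helper names `blocks_needed` and `global_indices`, and the sizes 10 and 4, are hypothetical and chosen only for illustration. It models how each launched thread derives its global index, and why some threads in the last block can fall past the end of the data.

```python
# Pure-Python model of CUDA's 1-D indexing; no GPU required.
# blocks_needed and global_indices are illustrative helpers, not PyCUDA functions.

def blocks_needed(n, block_size):
    """Smallest number of blocks whose threads cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def global_indices(n, block_size):
    """Yield (block_idx, thread_idx, global_idx) for every launched thread."""
    grid_size = blocks_needed(n, block_size)
    for block_idx in range(grid_size):
        for thread_idx in range(block_size):
            # Same formula a kernel uses: blockIdx.x * blockDim.x + threadIdx.x
            yield block_idx, thread_idx, block_idx * block_size + thread_idx


n, block_size = 10, 4  # assumed toy sizes
launched = list(global_indices(n, block_size))
active = [g for _, _, g in launched if g < n]   # threads that have an element to process
idle = [g for _, _, g in launched if g >= n]    # threads past the end of the data

print(f"blocks: {blocks_needed(n, block_size)}, threads launched: {len(launched)}")
print(f"active global indices: {active}")
print(f"idle global indices:   {idle}")
```

Because the grid is a whole number of blocks, the number of launched threads is usually larger than the number of elements, so a kernel typically needs to know the array length to skip the extra threads.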

The code for week 1 can be found in this Google Colab notebook; you can also open your own (click "new notebook"). Make sure that you are using the "T4 GPU" runtime when running the cells (you will find it under runtime --> change runtime type). This code compares the runtime of multiplying N pairs of numbers with and without a GPU. N is set to 1000; to make things interesting, try bigger values (1000000 or even 100000000).

!pip install pycuda

import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import
from pycuda.compiler import SourceModule
import numpy as np
from time import perf_counter

# Compile the CUDA C++ kernel: each thread multiplies one pair of elements.
module = SourceModule('''
__global__ void multiply(float *dest, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may have threads past the end of the arrays
        dest[i] = a[i] * b[i];
    }
}
''')
multiply = module.get_function("multiply")

N = 1000
a = np.random.randn(N).astype(np.float32)
b = np.random.randn(N).astype(np.float32)
dest = np.zeros_like(a)

# The timed region includes copying a and b to the GPU and dest back.
t0 = perf_counter()
multiply(cuda.Out(dest), cuda.In(a), cuda.In(b), np.int32(N), block=(1024,1,1), grid=((N//1024)+1,1,1))
t1 = perf_counter()
print(f"Multiplied {N} pairs of numbers")
print(f"GPU computation time: {t1 - t0}")

# CPU baseline: multiply the same pairs one at a time in a Python loop.
dest2 = np.zeros_like(a)
t0 = perf_counter()
for i in range(N):
    dest2[i] = a[i] * b[i]
t1 = perf_counter()
print(f"CPU computation time: {t1 - t0}")
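The CPU timing above measures an element-by-element Python loop, which is much slower than what the CPU can do with vectorized code. As a rough sketch of a second CPU baseline, the snippet below times the same loop against NumPy's vectorized `a * b`. The size `N = 100_000`, the fixed seed, and the variable names `loop_result` and `vector_result` are assumptions made for this example; actual timings depend on the machine.

```python
import numpy as np
from time import perf_counter

N = 100_000  # assumed size for this sketch; adjust freely
rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
a = rng.standard_normal(N).astype(np.float32)
b = rng.standard_normal(N).astype(np.float32)

# Baseline 1: explicit Python loop, one multiplication per iteration.
loop_result = np.zeros_like(a)
t0 = perf_counter()
for i in range(N):
    loop_result[i] = a[i] * b[i]
t1 = perf_counter()

# Baseline 2: NumPy's vectorized elementwise product, computed in compiled code.
t2 = perf_counter()
vector_result = a * b
t3 = perf_counter()

print(f"Python loop time:      {t1 - t0}")
print(f"NumPy vectorized time: {t3 - t2}")
print(f"Results match: {np.array_equal(loop_result, vector_result)}")
```

When comparing against the GPU, the vectorized time is usually the more meaningful CPU number, since the loop time mostly reflects Python interpreter overhead rather than the cost of the multiplications.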
