
FACULTY OF ENGINEERING AND TECHNOLOGY

BACHELOR OF TECHNOLOGY

HIGH PERFORMANCE COMPUTING


(203105430)

SEMESTER VI

Computer Science & Engineering Department

Laboratory Manual
CERTIFICATE

This is to certify that Mr. Divy Patel with enrollment
no. 210303105288 has successfully completed his laboratory
experiments in High Performance Computing (203105430)
in the Department of Computer Science & Engineering during
the academic year 2023-24.

Date of Submission: ......................... Staff In charge: ...........................

Head Of Department: ...........................................



TABLE OF CONTENT

Sr. No | Experiment Title | Date of Start | Date of Completion | Page No (From - To) | Sign | Marks (out of 10)
1. Study the facilities provided by Google Colab.
2. Demonstrate basic Linux Commands.
3. Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using C++.
4. Write a program on an unloaded cluster for several different numbers of nodes and record the time taken in each case. Draw a graph of execution time against the number of nodes.
5. Write a program to check task distribution using Gprof.
6. Use Intel V-Tune Performance Analyzer for Profiling.
7. Analyze the code using Nvidia Profilers.
8. Write a program to perform load distribution on GPU using CUDA.
9. Write a simple CUDA program to print “Hello World!”
10. Write a CUDA program to add two arrays.
PRACTICAL- 1
Aim: Study the facilities provided by Google Colab.
What is Google Colab?
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows
anybody to write and execute arbitrary Python code through the browser and is especially well
suited to machine learning, data analysis, and education. More technically, Colab is a hosted
Jupyter notebook service that requires no setup to use, while providing free-of-charge access to
computing resources including GPUs.
As a programmer, you can perform the following using Google Colab.
• Write and execute code in Python
• Document your code that supports mathematical equations
• Create/Upload/Share notebooks
• Import/Save notebooks from/to Google Drive
• Import/Publish notebooks from GitHub
• Import external datasets e.g., from Kaggle
• Integrate PyTorch, TensorFlow, Keras, OpenCV
• Free Cloud service with free GPU
Google Colab makes data science, deep learning, neural networks, and machine learning
accessible to individual researchers who cannot afford costly computational infrastructure.

Why should you choose Google Colab?

Google Colaboratory is a cloud-based tool. You can start coding ML and data
science models using nothing more than a Chrome browser. Colab is free of charge with limited
resources. However, you should not expect to store your artificial intelligence or machine
learning models indefinitely on Colab’s free infrastructure. If you already know Jupyter, there is
no learning curve on Google Colaboratory. It offers free access to GPUs and
TPUs for extensive data science and machine learning models, and it comes with popular
data science libraries pre-installed. Coders can easily share a code notebook with collaborators for
real-time coding. Since Google hosts the notebook on Google Cloud, you do not need to worry
about version control and storage of your code documents. It easily integrates with GitHub. You
can train AI using images, and you can also train models on audio and text. Researchers can also
run TensorFlow programs on Colab.

• Features: Google Colab, also known as Google Colaboratory, is a cloud-based platform
that provides a free Jupyter Notebook environment with integrated support for Python
programming.
• Free Cloud Computing: Google Colab offers free access to powerful GPUs and TPUs
(Tensor Processing Units) for running computationally intensive tasks.
• Jupyter Notebook Integration: Colab provides a Jupyter Notebook interface that
allows you to create and execute code cells, write markdown text, and visualize data.
• Python Support: Colab supports the Python programming language, allowing you to
write, execute, and debug Python code seamlessly.
• Code Snippets: You can easily create, reuse, and share code snippets in Colab
notebooks, making it convenient for collaborative coding.
• Markdown Support: Colab supports Markdown formatting, allowing you to write rich
text documentation, and create headings, lists, tables, and more within your notebook.
• GPU and TPU Support: Colab provides free access to GPU and TPU accelerators,
enabling faster computation for machine learning and deep learning tasks.
Free Colab users get free access to GPU and TPU runtimes for up to 12
hours. The GPU runtime comes with an Intel Xeon CPU @ 2.20 GHz, 13 GB RAM, a
Tesla K80 accelerator, and 12 GB GDDR5 VRAM. The TPU runtime consists of an
Intel Xeon CPU @ 2.30 GHz, 13 GB RAM, and a cloud TPU with 180 teraflops of
computational power. With Colab Pro or Pro+, you can provision more CPUs, TPUs,
and GPUs for more than 12 hours.
• Interactive Data Visualization: Colab supports various data visualization libraries like
Matplotlib, Seaborn, and Plotly, allowing you to create interactive plots and charts.
• Integrated Libraries: Colab comes pre-installed with many popular Python libraries
such as NumPy, Pandas, TensorFlow, and PyTorch, making it easy to leverage their
functionality.
• File Sharing and Collaboration: You can easily share Colab notebooks with others,
allowing for real-time collaboration and version control.
• Code Execution in the Cloud: With Colab, you can execute your code in the cloud,
which means you don't have to worry about your local machine's resources or
configurations.
• Notebook Version History: Colab automatically saves the version history of your
notebooks, allowing you to revert to previous versions if needed.
• GPU Memory Management: Colab provides tools to monitor and manage GPU
memory usage, helping you optimize your code for efficient memory utilization.
• Code Snippets and Examples: Colab provides a vast collection of code snippets and
examples for various tasks, including machine learning, data analysis, and visualization.
• Integrated Documentation: Colab integrates documentation for Python, TensorFlow,
and other libraries, making it easy to access reference materials while coding.
• Cloud Storage Integration: Colab seamlessly integrates with Google Drive, allowing
you to save, load, and sync your notebooks with your Google Drive account.
• Notebook Sharing: Sharing a Python code notebook has never been easier than with Colab.
• Special Library Installation: Colab lets you install non-Colaboratory libraries (AWS S3,
GCP, SQL, MySQL, etc.) that are unavailable in the code snippets. All you need to do is
add a one-liner with one of the following command prefixes:
!pip install (example: !pip install matplotlib-venn)
!apt-get install (example: !apt-get -qq install -y libfluidsynth1)
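
As an illustration of the Cloud Storage Integration mentioned above, a notebook cell can mount Google Drive directly (a minimal sketch; /content/drive is Colab’s conventional mount point):

from google.colab import drive

# Mount Google Drive into the Colab filesystem; Colab prompts for authorization.
drive.mount('/content/drive')

# Drive files are then accessible under /content/drive/MyDrive.
!ls /content/drive/MyDrive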

Conclusion:
Google Colab offers a user-friendly Jupyter Notebook interface, high-performance
cloud-based hardware resources (CPUs, GPUs, TPUs), and extensive support for Python
libraries and frameworks. It enhances productivity and facilitates tasks such as data
exploration, machine learning, and deep learning.

PRACTICAL- 2
Aim: Demonstrate basic Linux Commands.

The Linux kernel, an open-source operating system that resembles Unix, was initially
released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a
Linux distribution, which includes the kernel, system software, and supporting libraries,
some of which are provided by the GNU Project.

Features of Linux:
• Linux is an open-source operating system, which means that anybody is free to read, alter,
and distribute its source code. The Linux community benefits from collaboration, openness,
and innovation as a result.
• Linux is renowned for its dependability and stability. Its sturdy build enables it to run
continuously for long periods of time without suffering performance degradation or
frequent rebooting.
• Security is a top priority when developing Linux. It includes strong permission systems, an
integrated firewall, and regular security upgrades to quickly fix vulnerabilities. Linux's open-
source nature also enables community review and quick bug patches.
• Linux is capable of running on a wide range of devices, including servers, embedded systems,
desktop and laptop computers, and Internet of Things (IoT) gadgets. It has strong hardware
support and includes drivers for a wide range of hardware elements.
• Linux has a robust command-line interface that gives users granular control and the option to
script operations for automation. System configuration, software management, and
administration are all made possible using the CLI.
• Linux offers a high degree of freedom and enables users to customize different elements of
their operating system. Users can customize their Linux distribution to meet their unique
needs by selecting the desktop environment and software components.

1. pwd: Print Working Directory
pwd prints the full pathname of the current working directory.
$ pwd

2. cd: Change Directory

It allows you to change your working directory. You use it to move around within the hierarchy
of your file system.
To change into the “Desktop” directory inside “documents”, write the following.
$ cd documents/Desktop

3. cd ..
Move up one directory.
If you are in the work directory inside documents and want to go back to documents, write cd ..
You will end up in /documents.

4. ls: List all the files and directories

Lists all files and folders in the current directory in column format.
Using various options:
• ls -l
Lists the names of the files in the current directory, their permissions, the number of
subdirectories in directories listed, the size of each file, and the date of last modification.
• ls -a
Lists all files, including hidden files.

5. cat
cat stands for "concatenate". It reads data from files and outputs their contents. It is the simplest
way to display the contents of a file at the command line.
• Print the contents of files mytext.txt and yourtext.txt:
cat mytext.txt yourtext.txt
• Print the CPU information:
cat /proc/cpuinfo
• Print the memory information:
cat /proc/meminfo

6. mkdir
If the specified directory does not already exist, mkdir creates it. More than one directory may
be specified when calling mkdir.
Create a directory named Mihir:
mkdir Mihir

7. cp: Copy file

The cp command is used to make copies of files and directories.
The following creates a copy of the file Mihir.txt in the current working directory. The copy
will be named new.txt and will be located in the working directory.
cp Mihir.txt new.txt

8. rmdir
The rmdir command is used to remove an empty directory. To delete a directory along with all
files and directories within it, use rm with the -r and -f options instead. Here, -r is for
recursive and -f is for forcefully.
rm -rf mydir

9. echo
Displays text on the screen.
Print a message on the screen:
echo "Hello, Myself Mihir Vaghasiya."

10. clear
Used to clear the screen.
Clear the entire screen:
clear

11. mv
Used to move files and directories from the command line. Also used for renaming files.
mv new.txt newer.txt

12. locate
The locate command is used to locate files on a Linux system, just like the search feature in
Windows. This command is useful when you don't know where a file is saved or the actual
name of the file.
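For example, to find the file copied earlier with cp (a minimal sketch; locate reads a file-name database, so you may first need to refresh it with updatedb):
$ sudo updatedb
$ locate new.txt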

PRACTICAL- 3
Aim: Using Divide and Conquer Strategies design a class for Concurrent Quick
Sort using C++.

What is the Divide and Conquer strategy?


• The Divide and Conquer strategy is an approach used to solve complex
problems by breaking them down into smaller, more manageable sub-problems.
It consists of three steps:
• Divide: The original problem is divided into smaller sub-problems that are similar
to the original problem but of reduced size. This division is often done
recursively until the sub-problems become simple enough to be solved directly.
• Conquer: Each sub-problem is solved independently. This step involves applying
the same divide and conquer strategy to the sub-problems until they are small
enough to be solved straightforwardly.
• Combine: The solutions to the sub-problems are combined or
merged to obtain the solution to the original problem.
✓ In simpler terms, Divide and Conquer is like breaking a big problem into smaller
parts, solving each part individually, and then combining the solutions to get the
final answer. By breaking down a problem into smaller and more manageable
pieces, it becomes easier to solve and understand. This strategy is widely used in
various algorithms and problem-solving techniques to efficiently tackle complex
tasks.

What is Quick Sort?


• Quicksort is a sorting algorithm that follows the Divide and Conquer strategy to
sort a list of elements. It works by selecting a pivot element from the list and
partitioning the other elements into two sub-arrays, according to whether they
are less than or greater than the pivot. The process is then repeated recursively
for the sub-arrays until the entire list is sorted.
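For instance, given the list [7, 2, 9, 4] with the last element 4 as the pivot: the elements
less than 4 form [2] and the elements greater than 4 form [7, 9], so one partitioning step
yields [2] + [4] + [7, 9]. Recursing on the two sub-arrays then produces [2, 4, 7, 9].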

Code:
#include <iostream>
#include <vector>
#include <utility> // for std::swap
using namespace std;
int partition(int arr[], int low, int high)
{
int pivot = arr[high];
int i = (low - 1);
for (int j = low; j <= high - 1; j++)
{
if (arr[j] < pivot)
{
i++;
swap(arr[i], arr[j]);
}
}
swap(arr[i + 1], arr[high]);
return (i + 1);
}
void quickSort(int arr[], int low, int high)
{
if (low < high)
{
int pi = partition(arr, low, high);
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}
int main()
{
int n;
cout << "Enter the size of the array: ";
cin >> n;
vector<int> arr(n); // vector instead of a variable-length array, which is not standard C++
cout << "Enter the elements of the array:\n";
for (int i = 0; i < n; i++)
cin >> arr[i];
quickSort(arr.data(), 0, n - 1);
cout << "Sorted array: ";
for (int i = 0; i < n; i++)
cout << arr[i] << " ";
return 0;
}
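
The listing above is a plain sequential quicksort. The aim calls for a concurrent sort wrapped in a class; a minimal sketch of one way to do this with std::async is given below (the class name ConcurrentQuickSort and the threshold value are illustrative assumptions, not part of the manual; compile with g++ -std=c++11 -pthread):

#include <iostream>
#include <vector>
#include <future>
#include <utility>

class ConcurrentQuickSort {
    // Below this partition size, recurse sequentially: spawning a thread
    // for tiny partitions costs more than it saves.
    static const int kThreshold = 1000;

    static int partition(std::vector<int>& arr, int low, int high) {
        int pivot = arr[high];
        int i = low - 1;
        for (int j = low; j < high; j++)
            if (arr[j] < pivot) std::swap(arr[++i], arr[j]);
        std::swap(arr[i + 1], arr[high]);
        return i + 1;
    }

public:
    static void sort(std::vector<int>& arr, int low, int high) {
        if (low >= high) return;
        int pi = partition(arr, low, high);
        if (high - low > kThreshold) {
            // Sort the left partition on another thread while this thread
            // sorts the right one; the index ranges are disjoint, so no
            // locking is needed.
            auto left = std::async(std::launch::async,
                                   [&arr, low, pi] { sort(arr, low, pi - 1); });
            sort(arr, pi + 1, high);
            left.wait();
        } else {
            sort(arr, low, pi - 1);
            sort(arr, pi + 1, high);
        }
    }
};

int main() {
    std::vector<int> v = {9, 3, 7, 1, 8, 2, 5};
    ConcurrentQuickSort::sort(v, 0, (int)v.size() - 1);
    for (int x : v) std::cout << x << " ";
    std::cout << "\n";
    return 0;
}

The threshold keeps the number of spawned threads proportional to the useful work; a suitable value depends on the input size and the hardware.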

Output:

PRACTICAL- 4
Aim: Write a program on an unloaded cluster for several different numbers of
nodes and record the time taken in each case. Draw a graph of execution time
against the number of nodes.

What is an HPC Cluster?


An HPC cluster, or high-performance computing cluster, is a
combination of specialized hardware, including a group of large and powerful
computers, and a distributed processing software framework configured to handle
massive amounts of data at high speeds with parallel performance and high
availability.

How do you build an HPC cluster?


While building an HPC cluster is fairly straightforward, it requires an
organization to understand the level of compute power needed on a daily basis to
determine the setup.
• Build a compute node: Configure a head node by installing tools for monitoring and
resource management as well as high-speed interconnect drivers/software.
• Configure IP addresses: For peak efficiency, HPC clusters contain a high-speed
interconnect network that uses a dedicated IP subnet.
• Configure jobs as CMU user groups: As workloads arrive in the queue, you will need a script
to dynamically create CMU user groups for each currently running job.

Key components of an HPC cluster:


Compute hardware:
Compute hardware includes servers, storage, and a dedicated network. Typically, you
will need to provision at least three servers that function as primary, worker, and
client nodes. With such a limited setup, you’ll need to invest in high-end servers with
ample processors and storage for more compute capacity in each.
Software:
The software layer includes the tools you intend to use to monitor, provision, and
manage your HPC cluster. Software stacks also comprise libraries, compilers, debuggers,
and file systems to execute cluster management functions.
Facilities:
To house your HPC cluster, you need actual physical floor space to hold and support
the weight of racks of servers, which can include up to 72 blade-style servers and five
top-of-rack switches weighing in at up to 1,800 pounds.

Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import time

X = [[1,2],[1,4],[1,0],
[4,2],[4,0],[4,4],
[4,5],[0,2],[5,5]]

nodes=[1,2,3,4,5]
time_taken=[]
for n in nodes:
start_time=time.time()
kmeans=KMeans(n_clusters=n)
kmeans.fit(X)
end_time=time.time()
time_taken.append(end_time - start_time)
plt.plot(nodes, time_taken)
plt.xlabel('Number of Nodes')
plt.ylabel('Time Taken')
plt.title('Time Taken VS Number of Nodes')
plt.show()
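
Note that the listing above varies the number of KMeans clusters rather than actual compute nodes. On a single machine, a closer stand-in for the number of nodes is the number of worker processes; the following is a minimal sketch under that assumption (the function work and the workload sizes are illustrative, not from the manual):

import time
from multiprocessing import Pool

import matplotlib.pyplot as plt

def work(n):
    # A CPU-bound dummy task standing in for a cluster job.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [200_000] * 16            # fixed total workload
    node_counts = [1, 2, 3, 4, 5]
    times = []
    for p in node_counts:
        start = time.time()
        with Pool(processes=p) as pool:
            pool.map(work, tasks)     # distribute the tasks over p workers
        times.append(time.time() - start)
    plt.plot(node_counts, times, marker="o")
    plt.xlabel('Number of Workers')
    plt.ylabel('Time Taken')
    plt.title('Time Taken VS Number of Workers')
    plt.show()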

Output:

PRACTICAL- 5
Aim: Write a program to check task distribution using Gprof.

Theory:

What is Profiling?

Profiling involves instrumenting your code to collect data about
its execution. This data can then be analysed to identify the following:

• Time spent in different functions and code blocks

• Frequency of function calls

• Memory allocation and usage patterns

• CPU and other hardware resource utilization

%%writefile func1.c
//func1.c
//user-defined functions
#include <stdio.h>
int sum(int a, int b)
{
return a+b;
}
//driver code
int main()
{
int a = 30, b = 40;
//function call
int res = sum(a ,b);
printf("Sum is %d", res);
return 0;
}

!gcc -Wall -pg func1.c -o func1
!ls

!./func1

!gprof func1 gmon.out > func.txt


!cat func.txt

%%writefile test_gprof.c
//test_gprof.c
#include<stdio.h>
void new_func1(void);
void func1(void)
{
printf("\n Inside func1 \n");
int i=0;
for(;i<0xffffffff;i++);
new_func1();
return;
}

static void func2(void)


{
printf("\n Inside func2\n");
int i=0;
for(;i<0xffffffaa;i++);
return;
}

int main(void)
{
printf("\n Inside main()\n");
int i=0;
for(;i<0xffffff;i++);
func1();
func2();
return 0;
}

%%writefile test_gprof_new.c
//test_gprof_new.c
#include<stdio.h>
void new_func1(void)
{
printf("\n Inside new_func1()\n");
int i = 0;

for(;i<0xffffffee;i++);

return;
}
int main()
{
printf("\n Inside main...");
new_func1();
return 0;
}

#Profiling enabled during compilation


!gcc -Wall -pg test_gprof.c test_gprof_new.c -o test_gprof
!ls

#Execute the code


!./test_gprof

#Run the gprof tool
!gprof test_gprof gmon.out > analysis.txt
!ls

!cat analysis.txt
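
The flat profile in analysis.txt lists one row per function with columns along these lines (layout shown for orientation only; the actual values depend on the run and the machine):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name

Here, %time is the share of total running time spent in the function, calls is the number of invocations, and the two ms/call columns give the average time per call spent in the function itself and inclusive of its children.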

PRACTICAL- 6
Aim: Use Intel V-Tune Performance Analyzer for Profiling.

Intel® VTune™ Profiler is a performance analysis tool for serial and multithreaded applications.
Use VTune Profiler to analyze your choice of algorithm. Identify potential benefits for your
application from available hardware resources.

New in Intel® VTune™ Profiler

• GPU Accelerators
o Stall Factor Information in GPU Profiling Results
o Metric Groups for Multiple GPUs
o Updated Metrics for Multiple GPUs
o Support for Unified Shared Memory extension of OpenCL™ API
o Support for DirectML API
• Application Performance Snapshot
o Updated Metrics for Multiple GPUs
o Histograms in Metric Tooltips
• Input and Output Analysis
• VTune Profiler Server
• Managed Code Targets
• Language Support
• Operating System Support

Install Intel® VTune™ Profiler


• Download and install Intel® VTune™ Profiler on your system to gather performance data,
either on your native system or on a remote system. You can install the application on Linux*,
Windows*, or macOS* host systems, but you can collect performance data on remote
Windows or Linux target systems only.

System Requirements
To verify hardware and software requirements for your VTune Profiler download, see
Intel® VTune™ Profiler System Requirements.
Installation Information
Whether you downloaded Intel® VTune™ Profiler as a standalone component or with
the Intel® oneAPI Base Toolkit, the default path for your <install-dir> is:

System Requirements
VTune Profiler Server System
• 64-bit Linux* or Windows* OS
• Same system requirements and supported operating system distributions as specified for
VTune Profiler command line tool in the Release Notes
Client System
• Chrome, Firefox or Safari (recent versions)
VTune Profiler Server is tested with the latest versions of supported browsers at the time of each
release.
Target System
• 32- or 64-bit Linux or Windows OS
• Same system requirements and supported operating system distributions as specified for
VTune Profiler target systems in the Release Notes

Set Environment Variables


To set up environment variables for VTune Profiler, run the setvars script:
Linux* OS: source <install-dir>/setvars.sh
Windows* OS: <install-dir>\setvars.bat
When you run this script, it displays the product name and the build number. You can now
use the vtune and vtune-gui commands.

Open VTune Profiler from the GUI
On Windows* OS, use the Search menu or locate VTune Profiler from the Start menu to run the
standalone GUI client.
For the version of VTune Profiler that is integrated into Microsoft* Visual Studio* IDE on
Windows OS, do one
of the following:
• Select Intel VTune Profiler from the Tools menu of Visual Studio.
• Click the Configure Analysis with VTune Profiler toolbar button.
On a macOS* system, start Intel VTune Profiler version from the Launchpad.

Open VTune Profiler from the Command Line


To launch the VTune Profiler from the command line, run the following scripts from the
<install-dir>/bin64 directory:
• vtune-gui for the standalone graphical interface
• vtune for the command line interface
To open a specific VTune Profiler project or a result file, enter: > vtune-gui <path> where <path>
is one of the following:
• full path to a result file (*.vtune)
• full path to a project file (*.vtuneproj)
• full path to a project directory. If the project file does not exist in the directory, the
New Project dialog box opens and prompts you to create a new project in the given
directory.
For example, to open the matrix project in the VTune Profiler GUI on Linux, run: vtune-gui
/root/intel/vtune/projects/matrix/matrix.vtuneproj
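
For instance, a hotspots collection from the command line could look like this (a sketch; ./matrix stands in for your own application and r000hs is an arbitrary result directory name):
vtune -collect hotspots -result-dir r000hs ./matrix
vtune -report summary -result-dir r000hs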

PRACTICAL- 7
Aim: Analyze the code using Nvidia-Profilers.
Nvidia Profiler
Introduction:
Nvidia Profiler is a powerful tool used to analyze the performance of applications running on
Nvidia GPUs. It provides developers with valuable insights into the execution behavior of their
code, helping them identify performance bottlenecks and optimize their applications for better
GPU utilization. The theory behind Nvidia Profiler revolves around understanding the GPU's
architecture, the concepts of parallelism, memory hierarchy, and various performance metrics.

Types of Nvidia Profilers:


1. Nvidia Visual Profiler (NVVP):
The Nvidia Visual Profiler is a graphical user interface (GUI) tool that enables developers to
profile and analyze the performance of CUDA applications. It provides a range of visualizations
and metrics to understand how the application utilizes the GPU resources, including kernel
execution time, memory access patterns, occupancy, and more. NVVP offers an intuitive way to
identify performance bottlenecks and optimize CUDA code.

2. Nvidia Command Line Profiler (nvprof):


Nvprof is a command-line tool that allows developers to profile CUDA applications from the
terminal or command prompt. It provides various profiling metrics and outputs the results in a
textual format. Developers can use nvprof to quickly gather performance data and integrate it into
scripts or automated workflows for batch profiling.

3. Nvidia Nsight Systems:


Nsight Systems is a powerful system-wide performance analysis tool provided by Nvidia. It
allows developers to analyze the performance of CPU and GPU activities in a unified timeline
view. Nsight Systems provides insights into the interaction between the CPU and GPU, helping to
identify potential bottlenecks due to data transfer or synchronization.

4. Nvidia Nsight Compute:
Nsight Compute is a profiler dedicated to analyzing the performance of CUDA kernels at the
level of low-level instructions and hardware operations. It offers detailed metrics related to
instruction-level execution, memory transactions, and cache behavior, providing a deep
understanding of how the GPU executes specific kernels.

5. Nvidia Nsight Graphics:


Nsight Graphics is a profiler designed specifically for DirectX and Vulkan-based applications. It
helps game developers and graphics programmers analyze GPU performance in rendering
workloads, shaders, and graphics API calls. It provides insights into GPU utilization, frame
pacing, and rendering pipeline efficiency.

6. Nvidia Nsight for AI:


This profiler is focused on deep learning workloads and AI applications. It provides performance
analysis and insights into GPU utilization, memory usage, and data transfer efficiency for deep
learning frameworks like TensorFlow, PyTorch, and others.

Basic code to add two numbers:


%%writefile cudabasic.cu
#include<stdio.h>
#include<cuda.h>
#include<cuda_runtime_api.h>
__global__ void add(int *a, int *b, int *c)
{
*c = *a + *b;
}
int main()
{
int a, b, c;
int *d_a, *d_b, *d_c;
int size = sizeof(int);
cudaMalloc((void **)&d_a, size);
cudaMalloc((void **)&d_b, size);
cudaMalloc((void **)&d_c, size);
a = 2;
b = 7;
cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);
add<<<1,1>>>(d_a, d_b, d_c);
cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
cudaFree(d_a), cudaFree(d_b), cudaFree(d_c);
printf("%d", c);
return 0;
}

Applying Nvidia Profiler to this code


!nvcc -o cudabasic cudabasic.cu
!./cudabasic

!nvprof ./cudabasic

PRACTICAL- 8
Aim: Write a program to perform load distribution on GPU using CUDA.
Performing load distribution on a GPU using CUDA in Google Colab involves several
steps. CUDA is a parallel computing platform and application programming interface (API)
developed by NVIDIA for general-purpose GPU programming. Google Colab provides free
access to GPU resources, making it an excellent platform to experiment with CUDA.
Here are the steps to create a simple CUDA program in Google Colab:
1. Access Google Colab
2. Create a New Notebook
3. Set GPU as the Runtime Type
4. Install Required Libraries
If you need to install any libraries, you can do so using pip. For CUDA programming in
Python, this example uses the numba library (pre-installed in Colab, but installable with):
Code :
!pip install numba

5. Import Required Libraries:


Import the necessary Python libraries, including numba and numpy for this example.
6. Write CUDA Kernel Code:
Write the CUDA kernel code in a code cell. This is the part of the code that will run
on the GPU. Here's a simple example that adds two arrays element- wise:

Code :

from numba import cuda

import numpy as np

@cuda.jit

def add_arrays(a, b, result):

idx = cuda.grid(1)

if idx < a.size:

result[idx] = a[idx] + b[idx]

7. Allocate Memory on GPU:


In another cell, allocate memory on the GPU for the input
arrays and the result array using cuda.to_device:

Code:

a = np.array([1, 2, 3, 4, 5])

b = np.array([10, 20, 30, 40, 50])

result = np.empty_like(a)

d_a = cuda.to_device(a)

d_b = cuda.to_device(b)

d_result = cuda.to_device(result)

8. Configure GPU Grid and Block:


Configure the GPU grid and block dimensions to control how the GPU threads are organized:

threads_per_block = 256

blocks_per_grid = (a.size + threads_per_block - 1) // threads_per_block

9. Launch the CUDA Kernel:


Call the CUDA kernel function with the configured grid and block dimensions:
add_arrays[blocks_per_grid, threads_per_block](d_a, d_b, d_result)

10. Copy the Result Back to CPU:


Copy the result array from the GPU back to the CPU:
d_result.copy_to_host(result)

11. Print the Result:


Finally, print the result to verify that the GPU computation was successful:
print(result)
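
Putting steps 5 through 11 together, a single runnable cell looks like this (the same code as above, consolidated):

import numpy as np
from numba import cuda

@cuda.jit
def add_arrays(a, b, result):
    idx = cuda.grid(1)  # global thread index across the 1-D grid
    if idx < a.size:
        result[idx] = a[idx] + b[idx]

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
result = np.empty_like(a)

d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_result = cuda.to_device(result)

threads_per_block = 256
# Ceiling division so every element gets a thread.
blocks_per_grid = (a.size + threads_per_block - 1) // threads_per_block

add_arrays[blocks_per_grid, threads_per_block](d_a, d_b, d_result)
d_result.copy_to_host(result)
print(result)  # expected: [11 22 33 44 55]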

Output:

PRACTICAL- 9
Aim: Write a simple CUDA program to print “Hello World!”.
Code:
%%writefile hello.cu
#include <stdio.h>

__global__ void hello()
{
printf("Hello World !!!, Myself Mihir Vaghasiya\n");
}

int main()
{
hello<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}

!nvcc hello.cu -o hello

!./hello

Output:

PRACTICAL- 10
Aim: Write a CUDA program to add two arrays.
Code:
%%writefile addtwoarr.cu
//addtwoarr.cu
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
__global__ void addKernel(int* c, const int* a, const int* b, int size)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i<size)
{
c[i]=a[i]+b[i];
}
}

void addWithCuda(int* c, const int* a, const int* b, int size)


{
int* dev_a = nullptr;
int* dev_b = nullptr;
int* dev_c = nullptr;
cudaMalloc((void**)&dev_c, size * sizeof(int));
cudaMalloc((void**)&dev_a, size * sizeof(int));
cudaMalloc((void**)&dev_b, size * sizeof(int));

cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);


cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
addKernel<<<2, (size + 1) / 2>>>(dev_c, dev_a, dev_b, size);

cudaDeviceSynchronize();

cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);

cudaFree(dev_c);
cudaFree(dev_a);
cudaFree(dev_b);
}
int main(int argc, char** argv)
{
const int arraySize = 5;
const int a[arraySize] = {1, 2, 3, 4, 5};
const int b[arraySize] = {10, 20, 30, 40, 50};
int c[arraySize] = { 0 };
addWithCuda(c, a, b, arraySize);
printf("{1, 2, 3, 4, 5} + {10, 20, 30, 40, 50} = {%d, %d, %d, %d}\n",c[0], c[1], c[2], c[3], c[4]);
cudaDeviceReset();
return 0;
}

!nvcc addtwoarr.cu -o addtwoarr


!./addtwoarr

