
FACULTY OF ENGINEERING AND TECHNOLOGY

BACHELOR OF TECHNOLOGY

HIGH PERFORMANCE COMPUTING


(203105430)

SEMESTER VI

Computer Science & Engineering Department

Laboratory Manual
CERTIFICATE

This is to certify that Mr. Divy Patel with enrollment
no. 210303105288 has successfully completed his laboratory
experiments in High Performance Computing (203105430)
in the Department of Computer Science & Engineering during
the academic year 2023-24.

Date of Submission: ......................... Staff In charge: ...........................

Head Of Department: ...........................................



TABLE OF CONTENT

Sr. No | Experiment Title | Date of Start | Date of Completion | Page No (From - To) | Sign | Marks (out of 10)
1. Study the facilities provided by Google Colab.
2. Demonstrate basic Linux Commands.
3. Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using C++.
4. Write a program on an unloaded cluster for several different numbers of nodes and record the time taken in each case. Draw a graph of execution time against the number of nodes.
5. Write a program to check task distribution using Gprof.
6. Use Intel V-Tune Performance Analyzer for Profiling.
7. Analyze the code using Nvidia Profilers.
8. Write a program to perform load distribution on GPU using CUDA.
9. Write a simple CUDA program to print “Hello World!”
10. Write a CUDA program to add two arrays.
PRACTICAL- 1
Aim: Study the facilities provided by Google Colab.
What is Google Colab?
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows
anybody to write and execute arbitrary Python code through the browser and is especially well
suited to machine learning, data analysis, and education. More technically, Colab is a hosted
Jupyter notebook service that requires no setup to use, while providing free-of-charge access to
computing resources including GPUs.
As a programmer, you can perform the following using Google Colab.
• Write and execute code in Python
• Document your code that supports mathematical equations
• Create/Upload/Share notebooks
• Import/Save notebooks from/to Google Drive
• Import/Publish notebooks from GitHub
• Import external datasets e.g., from Kaggle
• Integrate PyTorch, TensorFlow, Keras, OpenCV
• Free Cloud service with free GPU
Google Colab makes data science, deep learning, neural networks, and machine learning
accessible to individual researchers who cannot afford costly computational infrastructure.

Why should you choose Google Colab?

Google Colaboratory is a cloud-based tool. You can start coding ML and data
science models using nothing more than a Chrome browser. Colab is free of charge with limited
resources. However, you should not expect to store your artificial intelligence or machine
learning models indefinitely on Colab’s free infrastructure. If you already know Jupyter, there is
no learning curve on Google Colaboratory. It offers free access to GPUs and
TPUs for extensive data science and machine learning models, and it comes with popular
data science libraries pre-installed. Coders can easily share a code notebook with collaborators for
real-time coding. Since Google hosts the notebook on Google Cloud, you do not need to worry
about version control and storage of your code documents. It easily integrates with GitHub. You
can train AI using images, and you can also train models on audio and text. Researchers can also
run TensorFlow programs on Colab.

• Features: Google Colab, also known as Google Colaboratory, is a cloud-based platform
that provides a free Jupyter Notebook environment with integrated support for Python
programming.
• Free Cloud Computing: Google Colab offers free access to powerful GPUs and TPUs
(Tensor Processing Units) for running computationally intensive tasks.
• Jupyter Notebook Integration: Colab provides a Jupyter Notebook interface that
allows you to create and execute code cells, write markdown text, and visualize data.
• Python Support: Colab supports the Python programming language, allowing you to
write, execute, and debug Python code seamlessly.
• Code Snippets: You can easily create, reuse, and share code snippets in Colab
notebooks, making it convenient for collaborative coding.
• Markdown Support: Colab supports Markdown formatting, allowing you to write rich
text documentation, and create headings, lists, tables, and more within your notebook.
• GPU and TPU Support: Colab provides free access to GPU and TPU accelerators,
enabling faster computation for machine learning and deep learning tasks.
Free Colab users get free access to GPU and TPU runtimes for up to 12
hours. The GPU runtime comes with an Intel Xeon CPU @ 2.20 GHz, 13 GB RAM, a
Tesla K80 accelerator, and 12 GB GDDR5 VRAM. The TPU runtime consists of an
Intel Xeon CPU @ 2.30 GHz, 13 GB RAM, and a cloud TPU with 180 teraflops of
computational power. With Colab Pro or Pro+, you can provision more CPUs, TPUs,
and GPUs for more than 12 hours.
• Interactive Data Visualization: Colab supports various data visualization libraries like
Matplotlib, Seaborn, and Plotly, allowing you to create interactive plots and charts.
• Integrated Libraries: Colab comes pre-installed with many popular Python libraries
such as NumPy, Pandas, TensorFlow, and PyTorch, making it easy to leverage their
functionality.
• File Sharing and Collaboration: You can easily share Colab notebooks with others,
allowing for real-time collaboration and version control.
• Code Execution in the Cloud: With Colab, you can execute your code in the cloud,
which means you don't have to worry about your local machine's resources or
configurations.
• Notebook Version History: Colab automatically saves the version history of your
notebooks, allowing you to revert to previous versions if needed.
• GPU Memory Management: Colab provides tools to monitor and manage GPU
memory usage, helping you optimize your code for efficient memory utilization.
• Code Snippets and Examples: Colab provides a vast collection of code snippets and
examples for various tasks, including machine learning, data analysis, and visualization.
• Integrated Documentation: Colab integrates documentation for Python, TensorFlow,
and other libraries, making it easy to access reference materials while coding.
• Cloud Storage Integration: Colab seamlessly integrates with Google Drive, allowing
you to save, load, and sync your notebooks with your Google Drive account.
• Notebook Sharing: Sharing a Python code notebook has never been easier than with Colab.
• Special Library Installation: Colab lets you install non-Colaboratory libraries (AWS S3,
GCP, SQL, MySQL, etc.) that are unavailable in the code snippets. All you need to do is
add a one-liner with one of the following command prefixes:
!pip install (example: !pip install matplotlib-venn)
!apt-get install (example: !apt-get -qq install -y libfluidsynth1)
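
As an illustration of the Cloud Storage Integration mentioned above, a notebook cell can mount Google Drive directly (a minimal sketch; /content/drive is Colab’s conventional mount point):

from google.colab import drive

# Mount Google Drive into the Colab filesystem; Colab prompts for authorization.
drive.mount('/content/drive')

# Drive files are then accessible under /content/drive/MyDrive.
!ls /content/drive/MyDrive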

Conclusion:
Google Colab offers a user-friendly Jupyter Notebook interface, high-performance
cloud-based hardware resources (CPUs, GPUs, TPUs), and extensive support for Python
libraries and frameworks. It enhances productivity and facilitates tasks such as data
exploration, machine learning, and deep learning.

PRACTICAL- 2
Aim: Demonstrate basic Linux Commands.

The Linux kernel, an open-source operating system that resembles Unix, was initially
released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a
Linux distribution, which includes the kernel, system software, and supporting libraries,
some of which are provided by the GNU Project.

Features of Linux:
• Linux is an open-source operating system, which means that anybody is free to read, alter,
and distribute its source code. The Linux community benefits from collaboration, openness,
and innovation as a result.
• Linux is renowned for its dependability and stability. Its sturdy build enables it to run
continuously for long periods of time without suffering performance degradation or
frequent rebooting.
• Security is a top priority when developing Linux. It includes strong permission systems, an
integrated firewall, and regular security upgrades to quickly fix vulnerabilities. Linux's open-
source nature also enables community review and quick bug patches.
• Linux is capable of running on a wide range of devices, including servers, embedded systems,
desktop and laptop computers, and Internet of Things (IoT) gadgets. It has strong hardware
support and includes drivers for a wide range of hardware elements.
• Linux has a robust command-line interface that gives users granular control and the option to
script operations for automation. System configuration, software management, and
administration are all made possible using the CLI.
• Linux offers a high degree of freedom and enables users to customize different elements of
their operating system. Users can customize their Linux distribution to meet their unique
needs by selecting the desktop environment and software components.

1. pwd: Print Working Directory
pwd prints the full pathname of the current working directory.
$ pwd

2. cd: Change Directory

It allows you to change your working directory. You use it to move around within the hierarchy
of your file system.
To change into the “Desktop” directory inside “documents”, write the following.
$ cd documents/Desktop

3. cd ..
Move up one directory.
If you are in the work directory inside documents and want to go back to documents, write cd ..
You will end up in /documents.

4. ls: List all the files and directories

Lists all files and folders in the current directory in column format.
Using various options:
• ls -l
Lists the names of the files in the current directory, their permissions, the number of
subdirectories in directories listed, the size of each file, and the date of last modification.
• ls -a
Lists all files, including hidden files.

5. cat
cat stands for "concatenate". It reads data from files and outputs their contents. It is the simplest
way to display the contents of a file at the command line.
• Print the contents of files mytext.txt and yourtext.txt:
cat mytext.txt yourtext.txt
• Print the CPU information:
cat /proc/cpuinfo
• Print the memory information:
cat /proc/meminfo

6. mkdir
If the specified directory does not already exist, mkdir creates it. More than one directory may
be specified when calling mkdir.
Create a directory named Mihir:
mkdir Mihir

7. cp: Copy file

The cp command is used to make copies of files and directories.
The following creates a copy of the file Mihir.txt in the current working directory. The copy
will be named new.txt and will be located in the working directory.
cp Mihir.txt new.txt

8. rmdir
The rmdir command is used to remove an empty directory. To delete a directory along with all
files and directories within it, use rm with the -r and -f options instead. Here, -r is for
recursive and -f is for forcefully.
rm -rf mydir

9. echo
Displays text on the screen.
Print a message on the screen:
echo "Hello, Myself Mihir Vaghasiya."

10. clear
Used to clear the screen.
Clear the entire screen:
clear

11. mv
Used to move files and directories from the command line. Also used for renaming files.
mv new.txt newer.txt

12. locate
The locate command is used to locate files on a Linux system, just like the search feature in
Windows. This command is useful when you don't know where a file is saved or the actual
name of the file.
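For example, to find the file copied earlier with cp (a minimal sketch; locate reads a file-name database, so you may first need to refresh it with updatedb):
$ sudo updatedb
$ locate new.txt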

PRACTICAL- 3
Aim: Using Divide and Conquer Strategies design a class for Concurrent Quick
Sort using C++.

What is the Divide and Conquer strategy?


• The Divide and Conquer strategy is an approach used to solve complex
problems by breaking them down into smaller, more manageable sub-problems.
It consists of three steps:
• Divide: The original problem is divided into smaller sub-problems that are similar
to the original problem but of reduced size. This division is often done
recursively until the sub-problems become simple enough to be solved directly.
• Conquer: Each sub-problem is solved independently. This step involves applying
the same divide and conquer strategy to the sub-problems until they are small
enough to be solved straightforwardly.
• Combine: The solutions to the sub-problems are combined or
merged to obtain the solution to the original problem.
✓ In simpler terms, Divide and Conquer is like breaking a big problem into smaller
parts, solving each part individually, and then combining the solutions to get the
final answer. By breaking down a problem into smaller and more manageable
pieces, it becomes easier to solve and understand. This strategy is widely used in
various algorithms and problem-solving techniques to efficiently tackle complex
tasks.

What is Quick Sort?


• Quicksort is a sorting algorithm that follows the Divide and Conquer strategy to
sort a list of elements. It works by selecting a pivot element from the list and
partitioning the other elements into two sub-arrays, according to whether they
are less than or greater than the pivot. The process is then repeated recursively
for the sub-arrays until the entire list is sorted.
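For instance, given the list [7, 2, 9, 4] with the last element 4 as the pivot: the elements
less than 4 form [2] and the elements greater than 4 form [7, 9], so one partitioning step
yields [2] + [4] + [7, 9]. Recursing on the two sub-arrays then produces [2, 4, 7, 9].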

Code:
#include <iostream>
#include <vector>
#include <utility> // for std::swap
using namespace std;
int partition(int arr[], int low, int high)
{
int pivot = arr[high];
int i = (low - 1);
for (int j = low; j <= high - 1; j++)
{
if (arr[j] < pivot)
{
i++;
swap(arr[i], arr[j]);
}
}
swap(arr[i + 1], arr[high]);
return (i + 1);
}
void quickSort(int arr[], int low, int high)
{
if (low < high)
{
int pi = partition(arr, low, high);
quickSort(arr, low, pi - 1);
quickSort(arr, pi + 1, high);
}
}
int main()
{
int n;
cout << "Enter the size of the array: ";
cin >> n;
vector<int> arr(n); // vector instead of a variable-length array, which is not standard C++
cout << "Enter the elements of the array:\n";
for (int i = 0; i < n; i++)
cin >> arr[i];
quickSort(arr.data(), 0, n - 1);
cout << "Sorted array: ";
for (int i = 0; i < n; i++)
cout << arr[i] << " ";
return 0;
}
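
The listing above is a plain sequential quicksort. The aim calls for a concurrent sort wrapped in a class; a minimal sketch of one way to do this with std::async is given below (the class name ConcurrentQuickSort and the threshold value are illustrative assumptions, not part of the manual; compile with g++ -std=c++11 -pthread):

#include <iostream>
#include <vector>
#include <future>
#include <utility>

class ConcurrentQuickSort {
    // Below this partition size, recurse sequentially: spawning a thread
    // for tiny partitions costs more than it saves.
    static const int kThreshold = 1000;

    static int partition(std::vector<int>& arr, int low, int high) {
        int pivot = arr[high];
        int i = low - 1;
        for (int j = low; j < high; j++)
            if (arr[j] < pivot) std::swap(arr[++i], arr[j]);
        std::swap(arr[i + 1], arr[high]);
        return i + 1;
    }

public:
    static void sort(std::vector<int>& arr, int low, int high) {
        if (low >= high) return;
        int pi = partition(arr, low, high);
        if (high - low > kThreshold) {
            // Sort the left partition on another thread while this thread
            // sorts the right one; the index ranges are disjoint, so no
            // locking is needed.
            auto left = std::async(std::launch::async,
                                   [&arr, low, pi] { sort(arr, low, pi - 1); });
            sort(arr, pi + 1, high);
            left.wait();
        } else {
            sort(arr, low, pi - 1);
            sort(arr, pi + 1, high);
        }
    }
};

int main() {
    std::vector<int> v = {9, 3, 7, 1, 8, 2, 5};
    ConcurrentQuickSort::sort(v, 0, (int)v.size() - 1);
    for (int x : v) std::cout << x << " ";
    std::cout << "\n";
    return 0;
}

The threshold keeps the number of spawned threads proportional to the useful work; a suitable value depends on the input size and the hardware.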

Output:

PRACTICAL- 4
Aim: Write a program on an unloaded cluster for several different numbers of
nodes and record the time taken in each case. Draw a graph of execution time
against the number of nodes.

What is an HPC Cluster?


An HPC cluster, or high-performance computing cluster, is a
combination of specialized hardware, including a group of large and powerful
computers, and a distributed processing software framework configured to handle
massive amounts of data at high speeds with parallel performance and high
availability.

How do you build an HPC cluster?


While building an HPC cluster is fairly straightforward, it requires an
organization to understand the level of compute power needed on a daily basis to
determine the setup.
• Build a compute node: Configure a head node by installing tools for monitoring and
resource management as well as high-speed interconnect drivers/software.
• Configure IP addresses: For peak efficiency, HPC clusters contain a high-speed
interconnect network that uses a dedicated IP subnet.
• Configure jobs as CMU user groups: As workloads arrive in the queue, you will need a script
to dynamically create CMU user groups for each currently running job.

Key components of an HPC cluster:


Compute hardware:
Compute hardware includes servers, storage, and a dedicated network. Typically, you
will need to provision at least three servers that function as primary, worker, and
client nodes. With such a limited setup, you’ll need to invest in high-end servers with
ample processors and storage for more compute capacity in each.
Software:
The software layer includes the tools you intend to use to monitor, provision, and
manage your HPC cluster. Software stacks also comprise libraries, compilers, debuggers,
and file systems to execute cluster management functions.
Facilities:
To house your HPC cluster, you need actual physical floor space to hold and support
the weight of racks of servers, which can include up to 72 blade-style servers and five
top-of-rack switches weighing in at up to 1,800 pounds.

Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import time

X = [[1,2],[1,4],[1,0],
[4,2],[4,0],[4,4],
[4,5],[0,2],[5,5]]

nodes=[1,2,3,4,5]
time_taken=[]
for n in nodes:
start_time=time.time()
kmeans=KMeans(n_clusters=n)
kmeans.fit(X)
end_time=time.time()
time_taken.append(end_time - start_time)
plt.plot(nodes, time_taken)
plt.xlabel('Number of Nodes')
plt.ylabel('Time Taken')
plt.title('Time Taken VS Number of Nodes')
plt.show()
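
Note that the listing above varies the number of KMeans clusters rather than actual compute nodes. On a single machine, a closer stand-in for the number of nodes is the number of worker processes; the following is a minimal sketch under that assumption (the function work and the workload sizes are illustrative, not from the manual):

import time
from multiprocessing import Pool

import matplotlib.pyplot as plt

def work(n):
    # A CPU-bound dummy task standing in for a cluster job.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [200_000] * 16            # fixed total workload
    node_counts = [1, 2, 3, 4, 5]
    times = []
    for p in node_counts:
        start = time.time()
        with Pool(processes=p) as pool:
            pool.map(work, tasks)     # distribute the tasks over p workers
        times.append(time.time() - start)
    plt.plot(node_counts, times, marker="o")
    plt.xlabel('Number of Workers')
    plt.ylabel('Time Taken')
    plt.title('Time Taken VS Number of Workers')
    plt.show()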

Output:

PRACTICAL- 5
Aim: Write a program to check task distribution using Gprof.

Theory:

What is Profiling?

Profiling involves instrumenting your code to collect data about
its execution. This data can then be analysed to identify the following:

• Time spent in different functions and code blocks

• Frequency of function calls

• Memory allocation and usage patterns

• CPU and other hardware resource utilization

%%writefile func1.c
//func1.c
//user-defined functions
#include <stdio.h>
int sum(int a, int b)
{
return a+b;
}
//driver code
int main()
{
int a = 30, b = 40;
//function call
int res = sum(a ,b);
printf("Sum is %d", res);
return 0;
}

!gcc -Wall -pg func1.c -o func1
!ls

!./func1

!gprof func1 gmon.out > func.txt


!cat func.txt

%%writefile test_gprof.c
//test_gprof.c
#include<stdio.h>
void new_func1(void);
void func1(void)
{
printf("\n Inside func1 \n");
int i=0;
for(;i<0xffffffff;i++);
new_func1();
return;
}

static void func2(void)


{
printf("\n Inside func2\n");
int i=0;
for(;i<0xffffffaa;i++);
return;
}

int main(void)
{
printf("\n Inside main()\n");
int i=0;
for(;i<0xffffff;i++);
func1();
func2();
return 0;
}

%%writefile test_gprof_new.c
//test_gprof_new.c
#include<stdio.h>
void new_func1(void)
{
printf("\n Inside new_func1()\n");
int i = 0;

for(;i<0xffffffee;i++);

return;
}
int main()
{
printf("\n Inside main...");
new_func1();
return 0;
}

#Profiling enabled during compilation


!gcc -Wall -pg test_gprof.c test_gprof_new.c -o test_gprof
!ls

#Execute the code


!./test_gprof

#Run the gprof tool
!gprof test_gprof gmon.out > analysis.txt
!ls

!cat analysis.txt
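
The flat profile in analysis.txt lists one row per function with columns along these lines (layout shown for orientation only; the actual values depend on the run and the machine):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name

Here, %time is the share of total running time spent in the function, calls is the number of invocations, and the two ms/call columns give the average time per call spent in the function itself and inclusive of its children.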

PRACTICAL- 6
Aim: Use Intel V-Tune Performance Analyzer for Profiling.

Intel® VTune™ Profiler is a performance analysis tool for serial and multithreaded applications.
Use VTune Profiler to analyze your choice of algorithm. Identify potential benefits for your
application from available hardware resources.

New in Intel® VTune™ Profiler

• GPU Accelerators
o Stall Factor Information in GPU Profiling Results
o Metric Groups for Multiple GPUs
o Updated Metrics for Multiple GPUs
o Support for Unified Shared Memory extension of OpenCL™ API
o Support for DirectML API
• Application Performance Snapshot
o Updated Metrics for Multiple GPUs
o Histograms in Metric Tooltips
• Input and Output Analysis
• VTune Profiler Server
• Managed Code Targets
• Language Support
• Operating System Support

Install Intel® VTune™ Profiler


• Download and install Intel® VTune™ Profiler on your system to gather performance data,
either on your native system or on a remote system. You can install the application on Linux*,
Windows*, or macOS* host systems, but you can collect performance data on remote
Windows or Linux target systems only.

System Requirements
To verify hardware and software requirements for your VTune Profiler download, see
Intel® VTune™ Profiler System Requirements.
Installation Information
Whether you downloaded Intel® VTune™ Profiler as a standalone component or with
the Intel® oneAPI Base Toolkit, the default path for your <install-dir> is:

System Requirements
VTune Profiler Server System
• 64-bit Linux* or Windows* OS
• Same system requirements and supported operating system distributions as specified for
VTune Profiler command line tool in the Release Notes
Client System
• Chrome, Firefox or Safari (recent versions)
VTune Profiler Server is tested with the latest versions of supported browsers at the time of each
release.
Target System
• 32- or 64-bit Linux or Windows OS
• Same system requirements and supported operating system distributions as specified for
VTune Profiler target systems in the Release Notes

Set Environment Variables


To set up environment variables for VTune Profiler, run the setvars script:
Linux* OS: source <install-dir>/setvars.sh
Windows* OS: <install-dir>\setvars.bat
When you run this script, it displays the product name and the build number. You can now
use the vtune and vtune-gui commands.

Open VTune Profiler from the GUI
On Windows* OS, use the Search menu or locate VTune Profiler from the Start menu to run the
standalone GUI client.
For the version of VTune Profiler that is integrated into Microsoft* Visual Studio* IDE on
Windows OS, do one
of the following:
• Select Intel VTune Profiler from the Tools menu of Visual Studio.
• Click the Configure Analysis with VTune Profiler toolbar button.
On a macOS* system, start Intel VTune Profiler version from the Launchpad.

Open VTune Profiler from the Command Line


To launch the VTune Profiler from the command line, run the following scripts from the
<install-dir>/bin64 directory:
• vtune-gui for the standalone graphical interface
• vtune for the command line interface
To open a specific VTune Profiler project or a result file, enter: > vtune-gui <path> where <path>
is one of the following:
• full path to a result file (*.vtune)
• full path to a project file (*.vtuneproj)
• full path to a project directory. If the project file does not exist in the directory, the
New Project dialog box opens and prompts you to create a new project in the given
directory.
For example, to open the matrix project in the VTune Profiler GUI on Linux, run: vtune-gui
/root/intel/vtune/projects/matrix/matrix.vtuneproj
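
For instance, a hotspots collection from the command line could look like this (a sketch; ./matrix stands in for your own application and r000hs is an arbitrary result directory name):
vtune -collect hotspots -result-dir r000hs ./matrix
vtune -report summary -result-dir r000hs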

PRACTICAL- 7
Aim: Analyze the code using Nvidia-Profilers.
Nvidia Profiler
Introduction:
Nvidia Profiler is a powerful tool used to analyze the performance of applications running on
Nvidia GPUs. It provides developers with valuable insights into the execution behavior of their
code, helping them identify performance bottlenecks and optimize their applications for better
GPU utilization. The theory behind Nvidia Profiler revolves around understanding the GPU's
architecture, the concepts of parallelism, memory hierarchy, and various performance metrics.

Types of Nvidia Profilers:


1. Nvidia Visual Profiler (NVVP):
The Nvidia Visual Profiler is a graphical user interface (GUI) tool that enables developers to
profile and analyze the performance of CUDA applications. It provides a range of visualizations
and metrics to understand how the application utilizes the GPU resources, including kernel
execution time, memory access patterns, occupancy, and more. NVVP offers an intuitive way to
identify performance bottlenecks and optimize CUDA code.

2. Nvidia Command Line Profiler (nvprof):


Nvprof is a command-line tool that allows developers to profile CUDA applications from the
terminal or command prompt. It provides various profiling metrics and outputs the results in a
textual format. Developers can use nvprof to quickly gather performance data and integrate it into
scripts or automated workflows for batch profiling.

3. Nvidia Nsight Systems:


Nsight Systems is a powerful system-wide performance analysis tool provided by Nvidia. It
allows developers to analyze the performance of CPU and GPU activities in a unified timeline
view. Nsight Systems provides insights into the interaction between the CPU and GPU, helping to
identify potential bottlenecks due to data transfer or synchronization.

4. Nvidia Nsight Compute:
Nsight Compute is a profiler dedicated to analyzing the performance of CUDA kernels at the
level of low-level instructions and hardware operations. It offers detailed metrics related to
instruction-level execution, memory transactions, and cache behavior, providing a deep
understanding of how the GPU executes specific kernels.

5. Nvidia Nsight Graphics:


Nsight Graphics is a profiler designed specifically for DirectX and Vulkan-based applications. It
helps game developers and graphics programmers analyze GPU performance in rendering
workloads, shaders, and graphics API calls. It provides insights into GPU utilization, frame
pacing, and rendering pipeline efficiency.

6. Nvidia Nsight for AI:


This profiler is focused on deep learning workloads and AI applications. It provides performance
analysis and insights into GPU utilization, memory usage, and data transfer efficiency for deep
learning frameworks like TensorFlow, PyTorch, and others.

Basic code to add two numbers:


%%writefile cudabasic.cu
#include<stdio.h>
#include<cuda.h>
#include<cuda_runtime_api.h>
__global__ void add(int *a, int *b, int *c)
{
*c = *a + *b;
}
int main()
{
int a, b, c;
int *d_a, *d_b, *d_c;
int size = sizeof(int);
cudaMalloc((void **)&d_a, size);
cudaMalloc((void **)&d_b, size);
cudaMalloc((void **)&d_c, size);
a = 2;
b = 7;
cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);
add<<<1,1>>>(d_a, d_b, d_c);
cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);
cudaFree(d_a), cudaFree(d_b), cudaFree(d_c);
printf("%d", c);
return 0;
}

Applying Nvidia Profiler to this code


!nvcc -o cudabasic cudabasic.cu
!./cudabasic

!nvprof ./cudabasic

PRACTICAL- 8
Aim: Write a program to perform load distribution on GPU using CUDA.
Performing load distribution on a GPU using CUDA in Google Colab involves several
steps. CUDA is a parallel computing platform and application programming interface (API)
developed by NVIDIA for general-purpose GPU programming. Google Colab provides free
access to GPU resources, making it an excellent platform to experiment with CUDA.
Here are the steps to create a simple CUDA program in Google Colab:
1. Access Google Colab
2. Create a New Notebook
3. Set GPU as the Runtime Type
4. Install Required Libraries
If you need to install any libraries, you can do so using pip. For CUDA programming in
Python, this example uses the numba library (pre-installed in Colab, but installable with):
Code :
!pip install numba

5. Import Required Libraries:


Import the necessary Python libraries, including numba and numpy for this example.
6. Write CUDA Kernel Code:
Write the CUDA kernel code in a code cell. This is the part of the code that will run
on the GPU. Here's a simple example that adds two arrays element- wise:

Code :

from numba import cuda

import numpy as np

@cuda.jit

def add_arrays(a, b, result):

idx = cuda.grid(1)

if idx < a.size:

result[idx] = a[idx] + b[idx]

7. Allocate Memory on GPU:


In another cell, allocate memory on the GPU for the input
arrays and the result array using cuda.to_device:

Code:

a = np.array([1, 2, 3, 4, 5])

b = np.array([10, 20, 30, 40, 50])

result = np.empty_like(a)

d_a = cuda.to_device(a)

d_b = cuda.to_device(b)

d_result = cuda.to_device(result)

8. Configure GPU Grid and Block:


Configure the GPU grid and block dimensions to control how the GPU threads are organized:

threads_per_block = 256

blocks_per_grid = (a.size + threads_per_block - 1) // threads_per_block

9. Launch the CUDA Kernel:


Call the CUDA kernel function with the configured grid and block dimensions:
add_arrays[blocks_per_grid, threads_per_block](d_a, d_b, d_result)

10. Copy the Result Back to CPU:


Copy the result array from the GPU back to the CPU:
d_result.copy_to_host(result)

11. Print the Result:


Finally, print the result to verify that the GPU computation was successful:
print(result)
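
Putting steps 5 through 11 together, a single runnable cell looks like this (the same code as above, consolidated):

import numpy as np
from numba import cuda

@cuda.jit
def add_arrays(a, b, result):
    idx = cuda.grid(1)  # global thread index across the 1-D grid
    if idx < a.size:
        result[idx] = a[idx] + b[idx]

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
result = np.empty_like(a)

d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_result = cuda.to_device(result)

threads_per_block = 256
# Ceiling division so every element gets a thread.
blocks_per_grid = (a.size + threads_per_block - 1) // threads_per_block

add_arrays[blocks_per_grid, threads_per_block](d_a, d_b, d_result)
d_result.copy_to_host(result)
print(result)  # expected: [11 22 33 44 55]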

Output:

PRACTICAL- 9
Aim: Write a simple CUDA program to print “Hello World!”.
Code:
%%writefile hello.cu
#include <stdio.h>

__global__ void hello()
{
printf("Hello World !!!, Myself Mihir Vaghasiya\n");
}

int main()
{
hello<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}

!nvcc hello.cu -o hello

!./hello

Output:

PRACTICAL- 10
Aim: Write a CUDA program to add two arrays.
Code:
%%writefile addtwoarr.cu
//addtwoarr.cu
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
__global__ void addKernel(int* c, const int* a, const int* b, int size)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i<size)
{
c[i]=a[i]+b[i];
}
}

void addWithCuda(int* c, const int* a, const int* b, int size)


{
int* dev_a = nullptr;
int* dev_b = nullptr;
int* dev_c = nullptr;
cudaMalloc((void**)&dev_c, size * sizeof(int));
cudaMalloc((void**)&dev_a, size * sizeof(int));
cudaMalloc((void**)&dev_b, size * sizeof(int));

cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);


cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
addKernel<<<2, (size + 1) / 2>>>(dev_c, dev_a, dev_b, size);

cudaDeviceSynchronize();

cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);

cudaFree(dev_c);
cudaFree(dev_a);
cudaFree(dev_b);
}
int main(int argc, char** argv)
{
const int arraySize = 5;
const int a[arraySize] = {1, 2, 3, 4, 5};
const int b[arraySize] = {10, 20, 30, 40, 50};
int c[arraySize] = { 0 };
addWithCuda(c, a, b, arraySize);
printf("{1, 2, 3, 4, 5} + {10, 20, 30, 40, 50} = {%d, %d, %d, %d}\n",c[0], c[1], c[2], c[3], c[4]);
cudaDeviceReset();
return 0;
}

!nvcc addtwoarr.cu -o addtwoarr


!./addtwoarr

