Introduction to Data and Memory Intensive Computing
Gordon Summer Institute & Cyberinfrastructure Summer Institute for Geoscientists 8/8/2011
Robert Sinkovits
Gordon Applications Lead
San Diego Supercomputer Center
Overview
Data Intensive Computing
Memory Intensive Computing
Blurring boundary between data/memory
How Gordon can help
Flash memory
Parallel file systems
vSMP
Allocations and operations
SDSC expertise, XSEDE AUSS
Early success stories
Data intensive computing
Data intensive problems can be characterized by the sizes of the input, output, and intermediate data sets
Can also be classified according to patterns of data access (e.g. random vs. sequential, small vs. large reads/writes)
Performance can be improved through changes to hardware, systems software, file systems, or the user application
Data mining and certain types of visualization applications often require processing large amounts of raw data, but can end up producing fairly small amounts of output. In some cases, the result can be a single number
See Leetaru presentation on Thursday morning
[Figure: relative sizes of large data collections, spanning GB to PB scales: human genomics, particle physics at the Large Hadron Collider, annual email traffic (excluding spam), the World Wide Web, estimated on-line RAM in Google, Wikipedia, personal digital photos, the Internet Archive, 200 of London's traffic cams, the 2004 Walmart transaction DB, a typical oil company, the Merck bio research DB, UPMC Hospitals imaging data, one day of instant messaging in 2002, the MIT Babytalk speech experiment, and the TeraShake earthquake model of the LA basin. Source: Phillip Gibbons, Intel Research Pittsburgh, 2008]
Simulations involving integration of ODEs (e.g. molecular dynamics) or PDEs (e.g. CFD, structural mechanics, weather and climate modeling) may involve modest amounts of input data, but end up generating large amounts of output: 4D data sets proportional to problem size x number of time steps
Many problems in domains such as graph algorithms, de novo sequence assembly, and quantum chemistry require intermediate files that are disproportionately large relative to the size of the input/output files
See Pearce presentation on Tuesday morning
Pfeiffer presentation on Wed morning
Generic compute node / terminology
Core → processor (socket) → node (board)
Peak = nodes × (processors/node) × (cores/processor) × (flops/cycle) × clock speed
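To make the formula concrete, here is a small C calculation. The numbers below are purely illustrative assumptions for the example, not the specifications of Gordon or any other system in this talk:

#include <stdio.h>

int main(void) {
    /* Illustrative values only; not the specifications of any particular system */
    double nodes           = 1024;   /* compute nodes                              */
    double procs_per_node  = 2;      /* processors (sockets) per node              */
    double cores_per_proc  = 8;      /* cores per processor                        */
    double flops_per_cycle = 8;      /* floating-point operations per cycle per core */
    double clock_ghz       = 2.5;    /* clock speed in GHz                         */

    /* Peak = nodes x (processors/node) x (cores/processor) x (flops/cycle) x clock speed */
    double peak_tflops = nodes * procs_per_node * cores_per_proc
                       * flops_per_cycle * clock_ghz / 1000.0;   /* Gflops -> Tflops */

    printf("Theoretical peak: %.1f TFLOPS\n", peak_tflops);
    return 0;
}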
Memory intensive computing
No uniform definition for memory intensive computing, but here we use it to refer to problems that require more shared memory than is available on standard compute hardware
machine            peak (TF)   nodes   mem (TB)   mem/node (GB)
Kraken                1174      9408      147     16
Ranger                 579      3936      123     32
Lonestar4              302      1888       44     24
Gordon (1/1/12)       >200      1024       64     64 (512, 1024, 2048, ...)
Athena                 166      4152       16     4
Trestles               100       324       20     64
Steele                  66       893       28     16-32
Condor Pool             60      1750       27     0.5-32
Lincoln                 48       192        3     16
Blacklight              37       256       32     128 (16384)
Most HPC systems are designed for distributed memory applications. Data structures are decomposed into chunks that are assigned to distinct compute nodes, each with its own local memory
Why not just use a distributed memory model?
Data structures
Many important/interesting problems do not have data structures that map well to distributed memory: graphs, trees, unstructured grids
Programmer effort
In some cases, the burden to develop a distributed memory application (e.g. using MPI) is too great to justify the effort
Efficiency of implementation
Sometimes the communications overhead for a distributed memory implementation is too high and results in poor performance
OpenMP
Thread-based parallelism that employs a fork-join model. Straightforward to use and requires minimal code modification. Parallelism is expressed through pragmas or directives that are ignored unless the appropriate compiler flags are set. Allows for incremental parallelization of code; ideal for loop-level parallelism
#pragma omp parallel for \
        reduction(+: sum) \
        schedule(static, 10)
for (i = 0; i < n; i++) {
    a[i] = b[i] + c[i];
    sum += a[i];
}
!$OMP parallel do
!$OMP& reduction(+: sum)
!$OMP& schedule(static, 10)
do i=1,n
a(i) = b(i) + c(i)
sum = sum + a(i)
enddo
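A brief usage note: these loops only run in parallel when OpenMP is enabled at compile time (e.g. -fopenmp for the GNU compilers or -openmp for the Intel compilers of this era); the thread count is then typically set at run time through the OMP_NUM_THREADS environment variable.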
In my opinion, this is THE best place to learn more about OpenMP
https://computing.llnl.gov/tutorials/openMP/
Memory intensive problem: conformational sampling
Generation of molecular conformations from low-order probability distribution functions (PDFs) makes it possible to calculate thermodynamic quantities that are not accessible from MD
Somani, Killian, and Gilson, J Chem Phys 130 (2009)
Somani and Gilson, J Chem Phys 134 (2011)
Conformational sampling: data structures and memory access
Data structures for N degrees of freedom and M bins:
Singlet PDFs: N blocks of size M
Doublet PDFs: N(N-1)/2 blocks of size M²
Triplet PDFs: N(N-1)(N-2)/6 blocks of size M³
Typically N ~ 50-200, M = 30
Sample rows (pencils) from 2D (3D) arrays, with a different access pattern for each conformation. Convenient to have the entire problem in a single shared memory. For N=200, M=30, required memory ~ 130 GB
[Figure: doublet sampling with N=6, M=5, showing the rows sampled for conformations 1, 2, and 3]
Gilson, Somani, and Sinkovits: Dash/Gordon collaboration
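As a rough check on that estimate, the short C sketch below tallies the singlet, doublet, and triplet PDF storage for N = 200 and M = 30, assuming 4 bytes per histogram bin (the bytes-per-bin value is an assumption made for this example, not something stated above):

#include <stdio.h>

int main(void) {
    double N = 200.0, M = 30.0;            /* degrees of freedom, bins per PDF dimension */
    double bytes_per_bin = 4.0;            /* assumed: one single-precision value per bin */

    double singlet = N * M;                              /* N blocks of size M            */
    double doublet = N * (N - 1) / 2.0 * M * M;          /* N(N-1)/2 blocks of size M^2   */
    double triplet = N * (N - 1) * (N - 2) / 6.0
                   * M * M * M;                          /* N(N-1)(N-2)/6 blocks of size M^3 */

    double total_gb = (singlet + doublet + triplet) * bytes_per_bin
                    / (1024.0 * 1024.0 * 1024.0);        /* bytes -> GB (2^30) */

    printf("Approximate PDF storage: %.0f GB\n", total_gb);
    return 0;
}

With these assumptions the total comes out near 132 GB, consistent with the ~130 GB quoted above.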
Memory intensive problem: subset removal algorithm
Transient objects detected in the night sky by the Large Synoptic Survey Telescope (LSST)
Detections grouped into tracks that may represent partial orbits of asteroids or other near-earth objects
As candidate tracks are constructed, some tracks turn out to be wholly contained within others; a subset removal algorithm is used to detect and delete these tracks
In addition to avoiding duplication, this reduces the computational load in later steps of the pipeline
See Myers presentation on Wed morning
Subset removal data structure
Detections are stored in a red-black tree to minimize access time. Tracks associated with each detection are also stored as trees, resulting in a tree-of-trees data structure.
The subset removal algorithm is most efficient if the entire data structure is stored in shared memory. For realistic problems, the memory footprint is ~ 100 GB
See Myers presentation on Wed morning
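To make the tree-of-trees layout concrete, here is a minimal C sketch using the POSIX tsearch/tfind balanced-tree routines. This is not the MOPS implementation, and the record layout and identifiers below are hypothetical:

#include <stdio.h>
#include <stdlib.h>
#include <search.h>   /* POSIX tsearch/tfind: balanced binary search trees */

/* Hypothetical record: a detection plus the set of tracks that contain it */
typedef struct {
    long  det_id;      /* detection identifier            */
    void *track_root;  /* inner tree of track identifiers */
} detection_t;

static int cmp_long(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

static int cmp_detection(const void *a, const void *b) {
    const detection_t *x = a, *y = b;
    return (x->det_id > y->det_id) - (x->det_id < y->det_id);
}

int main(void) {
    void *det_root = NULL;                    /* outer tree: all detections */

    /* Insert detection 42 and attach tracks 7 and 9 to it */
    detection_t *d = malloc(sizeof *d);
    d->det_id = 42;
    d->track_root = NULL;
    tsearch(d, &det_root, cmp_detection);     /* outer insert */

    static long t1 = 7, t2 = 9;
    tsearch(&t1, &d->track_root, cmp_long);   /* inner inserts */
    tsearch(&t2, &d->track_root, cmp_long);

    /* Look up detection 42 and test whether track 9 contains it */
    detection_t key = { .det_id = 42 };
    detection_t **hit = tfind(&key, &det_root, cmp_detection);
    long probe = 9;
    printf("track 9 contains detection 42: %s\n",
           (hit && tfind(&probe, &(*hit)->track_root, cmp_long)) ? "yes" : "no");
    return 0;
}

The outer tree is keyed on detection ID, and each detection carries an inner tree of the tracks that contain it.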
Blurring the boundary between data and memory intensive computing
/scratch does not necessarily have to be a hard disk (HDD)
To improve performance, scratch files can be written to flash drives: O(10²) lower latency than HDD
To do even better, files can be written to DRAM: O(10³-10⁵) lower latency than HDD
See Strande presentation on Mon afternoon
and Tatineni presentations on Tuesday morning
How Gordon can help you solve data/memory intensive problems
Flash memory
80 GB local to each compute node (1024 nodes)
4.8 TB served from each I/O node (64 nodes)
Parallel file systems
Lustre file system with 4 PB capacity
100 GB/s aggregate bandwidth into I/O nodes
vSMP
Aggregate memory - 512, 1024, 2048 GB
Allocations and operations
Dedicated long-term access to I/O nodes
Interactive queues for visualization
Software
Advanced User Support Services
See Strande presentation on Monday afternoon
Wilkins-Diehr presentation on Thursday morning
For data intensive applications, the main advantage of flash is the low latency
Performance of the memory subsystem has not kept up with gains in processor speed
As a result, latencies to access data from hard disk are O(10,000,000) cycles
Flash memory fills this gap and provides O(100) lower latency
As new non-volatile memories are developed, they can fill the role of flash
Using flash in the memory hierarchy: parallel streamline visualization
Camp et al., accepted to IEEE Symp. on Large-Scale Data Analysis and Visualization (LDAV 2011)
See Camp presentation on Tuesday morning
Introduction to vSMP
[Diagram: N servers, each running its own OS, aggregated by vSMP into 1 VM running 1 OS]
Virtualization software for aggregating multiple off-the-shelf systems into a single virtual machine, providing improved usability and higher performance
Partitioning: a subset of the physical resource; multiple virtual machines, each running its own app and OS on top of a hypervisor or VMM
Aggregation: concatenation of physical resources; a single virtual machine running one app and OS across multiple hypervisors/VMMs
See Paikowsky presentation on Wed afternoon
A vSMP node is configured from 16 compute nodes and one I/O node
To the user, it logically appears as a single, large SMP node
Overview of a vSMP node
/proc/cpuinfo indicates 128 processors (16 nodes x 8 cores/node = 128)
Top shows 663 GB memory (16 nodes x 48 GB/node = 768 GB)
Difference due to vSMP overhead
Gordon Software
chemistry
adf
amber
gamess
gaussian
gromacs
lammps
namd
nwchem
distributed computing
globus
Hadoop
MapReduce
visualization
idl
NCL
paraview
tecplot
visit
VTK
genomics
abyss
blast
hmmer
soapdenovo
velvet
data mining
IntelligentMiner
RapidMiner
RATTLE
Weka
compilers/languages
gcc, intel, pgi
MATLAB, Octave, R
PGAS (UPC)
DB2, PostgreSQL
libraries
ATLAS
BLACS
fftw
HDF5
Hypre
SPRNG
superLU
* Partial list of software to be installed, open to user requests
From I/O bound to compute bound: breadth-first search
[Chart: MR-BFS serial performance on a graph with 134,217,726 nodes; I/O and non-I/O time in seconds for SSDs vs. HDDs]
Implementation of the breadth-first search (BFS) graph algorithm developed by Munagala and Ranade
Benchmark problem: BFS on a graph containing 134 million nodes
Use of flash drives reduced I/O time by a factor of 6.5. As expected, there was no measurable impact on non-I/O operations
Problem converted from I/O bound to compute bound
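The actual MR-BFS code, with its external-memory sort/scan passes, is not reproduced here; the toy C sketch below (a hypothetical stand-in) only illustrates the access pattern that makes this class of algorithm I/O-bound: each level's frontier is streamed to and from scratch files, so frontier reads and writes inherit the latency of whatever device backs /scratch.

#include <stdio.h>

#define NV 8                              /* toy graph with 8 vertices       */
static const int adj[NV][NV] = {          /* adjacency matrix (undirected)   */
    {0,1,1,0,0,0,0,0}, {1,0,0,1,0,0,0,0}, {1,0,0,1,1,0,0,0}, {0,1,1,0,0,1,0,0},
    {0,0,1,0,0,0,1,0}, {0,0,0,1,0,0,0,1}, {0,0,0,0,1,0,0,1}, {0,0,0,0,0,1,1,0}
};

int main(void) {
    int level[NV];
    for (int v = 0; v < NV; v++) level[v] = -1;

    /* The current frontier lives in a scratch file: a stand-in for /scratch
     * backed by HDD, SSD, or a RAM file system                              */
    FILE *cur = tmpfile();
    int root = 0;
    level[root] = 0;
    fwrite(&root, sizeof root, 1, cur);

    for (int depth = 0; ; depth++) {
        FILE *next = tmpfile();           /* next frontier, also on scratch  */
        long added = 0;
        int u;
        rewind(cur);
        while (fread(&u, sizeof u, 1, cur) == 1) {       /* stream frontier in  */
            for (int v = 0; v < NV; v++)
                if (adj[u][v] && level[v] < 0) {
                    level[v] = depth + 1;
                    fwrite(&v, sizeof v, 1, next);       /* stream frontier out */
                    added++;
                }
        }
        fclose(cur);
        cur = next;
        if (added == 0) break;            /* no unvisited neighbors: done    */
    }
    fclose(cur);

    for (int v = 0; v < NV; v++) printf("vertex %d: level %d\n", v, level[v]);
    return 0;
}

Moving the scratch files from HDD to flash speeds up only the frontier I/O, which is consistent with the benchmark result above: I/O time drops sharply while the in-memory (non-I/O) work is unchanged.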
Using flash to improve serial performance: LIDAR
[Chart: run time in seconds for a 100 GB load (with and without FastParse) and 100 GB count(*) queries (cold and warm) using SSDs vs. HDDs]
Remote sensing technology used to map geographic features with high resolution
Benchmark problem: load 100 GB of data into a single table, then count rows, using a DB2 database instance
Flash drives 1.5x (load) to 2.4x (count) faster than hard disks
Using flash to improve concurrency: LIDAR
[Chart: run time in seconds for 1, 4, and 8 concurrent queries using SSDs vs. HDDs]
Remote sensing technology used to map geographic features with high resolution
Comparison of run times for concurrent LIDAR queries obtained with flash drives (SSD) and hard drives (HDD) using the Alaska Denali-Totschunda data collection
Impact of SSDs was modest for a single query, but significant when executing multiple simultaneous queries
vSMP case study: MOPS (subset removal)
[Chart: relative speed vs. number of cores (1-32) for MOPS subset removal on 79,684,646 tracks, comparing three vSMP configurations (3.5.175.22 dyn, 3.5.175.17 dyn, 3.5.175.17 stat) with a PDAF node; results at higher thread counts to be shown Wednesday]
Total memory usage ~ 100 GB (3 boards)
See Myers presentation on Wed morning
Sets of detections collected using the Large Synoptic Survey Telescope are grouped into tracks representing potential asteroid orbits
Subset removal algorithm used to identify and eliminate those tracks that are wholly contained within other tracks
7.3x speedup on 8 cores
Better performance and scaling (up to 8 threads) than the physical large-memory PDAF node
Summary
Gordon can be used to solve data and memory intensive problems that cannot be handled by even the largest HPC systems
Already having great success stories with Dash, but things will only get better with improved processors, interconnect, flash drives, and vSMP software
SDSC provides much more than cycles - our expertise can help you make the most of Gordon and enable transitions from desktop to supercomputing