
Multiprocessors and Thread-Level Parallelism

With an MIMD, each processor is executing its own instruction stream. In many cases, each processor executes a different process. A process is a segment of code that may be run independently; the state of the process contains all the information necessary to execute that program on a processor. In a multiprogrammed environment, where the processors may be running independent tasks, each process is typically independent of the others.

It is also useful to be able to have multiple processors executing a single program and sharing the code and most of their address space. When multiple processes share code and data in this way, they are often called threads. To take advantage of an MIMD multiprocessor with n processors, we must usually have at least n threads or processes to execute. The independent threads within a single process are typically identified by the programmer or created by the compiler. The threads may also come from large-scale, independent processes scheduled and manipulated by the operating system.

Existing MIMD multiprocessors fall into two classes, distinguished by how their memory is organized. The first group, which we call centralized shared-memory architectures, shares a single centralized memory: with large caches, a single memory, possibly with multiple banks, can satisfy the memory demands of a small number of processors. Because there is a single main memory that has a symmetric relationship to all processors and a uniform access time from any processor, these multiprocessors are most often called symmetric (shared-memory) multiprocessors (SMPs), and this style of architecture is sometimes called uniform memory access (UMA). The second group consists of multiprocessors with physically distributed memory, an organization needed to support larger processor counts.
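As a minimal sketch of keeping n processors busy (assuming a POSIX system with pthreads; the worker function and the value of N are illustrative), the following program creates one thread per processor, all sharing the program's code and address space:

#include <pthread.h>
#include <stdio.h>

#define N 4                      /* assume a 4-processor machine */

static long partial[N];          /* shared data: every thread sees one address space */

static void *worker(void *arg) {
    long id = (long)arg;
    long sum = 0;
    for (long i = id; i < 1000000; i += N)   /* each thread takes a strided slice */
        sum += i;
    partial[id] = sum;           /* communicate the result through a shared store */
    return NULL;
}

int main(void) {
    pthread_t t[N];
    for (long i = 0; i < N; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    long total = 0;
    for (long i = 0; i < N; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];
    }
    printf("total = %ld\n", total);
    return 0;
}

Because the threads share the partial array, no explicit data transfer is needed; this is exactly the shared code and data the text describes.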

For a multiprocessor with a shared address space, that address space can be used to communicate data implicitly via load and store operations, hence the name shared memory for such multiprocessors. For a multiprocessor with multiple address spaces, communication of data is done by explicitly passing messages among the processors. Therefore, these multiprocessors are often called message-passing multiprocessors.
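As a minimal sketch of this implicit communication (assuming C11 atomics and pthreads; the variable names are illustrative), one thread passes a value to another purely through loads and stores to the shared address space:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;              /* the data being communicated */
static atomic_int ready = 0;     /* flag that publishes the data */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                         /* ordinary store */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                 /* spin until published */
    printf("received %d\n", payload);                     /* ordinary load */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

On a message-passing multiprocessor, the same exchange would instead be an explicit send and receive pair between the two processors.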

Symmetric Shared-Memory Architectures


The use of large, multilevel caches can substantially reduce the memory bandwidth demands of a processor. If the main memory bandwidth demands of a single processor are reduced, multiple processors may be able to share the same memory. Symmetric shared-memory machines usually support the caching of both shared and private data. Private data are used by a single processor, while shared data are used by multiple processors, which in essence communicate through reads and writes of the shared data. When a private item is cached, its location is migrated to the cache, reducing the average access time as well as the required memory bandwidth. Since no other processor uses the data, the program behavior is identical to that in a uniprocessor. When shared data are cached, the shared value may be replicated in multiple caches. In addition to the reduction in access latency and required memory bandwidth, this replication also reduces the contention that may exist for shared data items being read by multiple processors simultaneously. Caching of shared data, however, introduces a new problem: cache coherence.

What Is Multiprocessor Cache Coherence?


Caching shared data introduces a new problem: the view of memory held by two different processors can contain two different values for the same location. This difficulty is generally referred to as the cache coherence problem. The protocols that maintain coherence for multiple processors are called cache coherence protocols.
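As a minimal illustration (assuming two processors A and B with write-through caches, and a memory location X that initially holds the value 1):

Time   Event                          Cache A   Cache B   Memory for X
0                                     -         -         1
1      Processor A reads X            1         -         1
2      Processor B reads X            1         1         1
3      Processor A stores 0 into X    0         1         0

After time 3, processor B's cache still holds the stale value 1 for X, even though memory and processor A's cache hold 0; without a coherence mechanism, B would keep reading the old value.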
Two classes of protocols, which use different techniques to track the sharing status, are in use:

1. Directory based: The sharing status of a block of physical memory is kept in just one location, called the directory. Directory-based coherence has slightly higher implementation overhead than snooping, but it can scale to larger processor counts. (A minimal sketch of a directory entry follows this list.)

2. Snooping: Every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, but no centralized state is kept. The caches are all accessible via some broadcast medium (a bus or switch), and all cache controllers monitor, or snoop on, the medium to determine whether they have a copy of a block that is requested on a bus or switch access.
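As a minimal sketch of a directory entry (assuming at most 64 processors and one entry per memory block; the names and the printed message are illustrative stand-ins for hardware, not any particular machine's protocol):

#include <stdint.h>
#include <stdio.h>

#define NPROC 64                 /* assume at most 64 processors */

enum dir_state {
    DIR_UNCACHED,                /* no cache holds the block */
    DIR_SHARED,                  /* one or more caches hold clean copies */
    DIR_MODIFIED                 /* exactly one cache holds a dirty copy */
};

struct dir_entry {
    enum dir_state state;
    uint64_t sharers;            /* bit i set if processor i caches the block */
};

/* Hypothetical stand-in for the hardware message that invalidates
   processor p's copy of the block. */
static void send_invalidate(int p) {
    printf("invalidate sent to processor %d\n", p);
}

/* On a write request from processor p, invalidate every other sharer
   and record p as the exclusive owner. */
static void dir_handle_write(struct dir_entry *e, int p) {
    for (int i = 0; i < NPROC; i++)
        if (i != p && (e->sharers & (1ULL << i)))
            send_invalidate(i);
    e->sharers = 1ULL << p;
    e->state = DIR_MODIFIED;
}

int main(void) {
    struct dir_entry e = { DIR_SHARED, (1ULL << 0) | (1ULL << 2) };
    dir_handle_write(&e, 0);     /* processor 0 writes: processor 2 is invalidated */
    return 0;
}

Because the sharing status lives in this one entry, no broadcast is needed: the directory sends invalidations only to the processors its bit vector names, which is what lets the scheme scale.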

Snooping Protocols
There are two ways to maintain the coherence requirement described in the prior subsection. One method is to ensure that a processor has exclusive access to a data item before it writes that item. This style of protocol is called a write invalidate protocol because it invalidates other copies on a write. It is by far the most common protocol, both for snooping and for directory schemes. Exclusive access ensures that no other readable or writable copies of an item exist when the write occurs: All other cached copies of the item are invalidated.

The alternative to an invalidate protocol is to update all the cached copies of a data item when that item is written. This type of protocol is called a write update or write broadcast protocol. Because a write update protocol must broadcast all writes to shared cache lines, it consumes considerably more bandwidth. For this reason, recent multiprocessors have opted to implement write invalidate protocols, and we will focus only on invalidate protocols for the rest of the chapter.

Basic Implementation Techniques

The key to implementing an invalidate protocol in a small-scale multiprocessor is the use of the bus, or another broadcast medium, to perform invalidates. To perform an invalidate, the processor simply acquires bus access and broadcasts the address to be invalidated on the bus. All processors continuously snoop on the bus, watching the addresses: each processor checks whether the address on the bus is in its cache and, if so, invalidates the corresponding data.

When a write to a block that is shared occurs, the writing processor must acquire bus access to broadcast its invalidation. If two processors attempt to write shared blocks at the same time, their attempts to broadcast an invalidate operation are serialized when they arbitrate for the bus. The first processor to obtain bus access will cause any other copies of the block it is writing to be invalidated. If the processors were attempting to write the same block, the serialization enforced by the bus also serializes their writes. One implication of this scheme is that a write to a shared data item cannot actually complete until it obtains bus access. All coherence schemes require some method of serializing accesses to the same cache block, either by serializing access to the communication medium or to another shared structure.

In addition to invalidating outstanding copies of a cache block that is being written, we also need to locate a data item when a cache miss occurs. In a write-through cache it is easy to find the most recent value of a data item, since all written data are always sent to the memory, from which the most recent value can always be fetched. For a write-back cache, the problem of finding the most recent data value is harder, since the most recent value of a data item can be in a cache rather than in memory. Happily, write-back caches can use the same snooping scheme both for cache misses and for writes: each processor snoops every address placed on the bus. If a processor finds that it has a dirty copy of the requested cache block, it provides that cache block in response to the read request and causes the memory access to be aborted. The normal cache tags can be used to implement the process of snooping, and the valid bit for each block makes invalidation easy to implement. Read misses, whether generated by an invalidation or by some other event, are also straightforward, since they simply rely on the snooping capability. For writes we'd like to know whether any other copies of the block are cached because, if there are none, the write need not be placed on the bus in a write-back cache. Not sending the write reduces both the time taken by the write and the required bandwidth.
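As a minimal sketch of the snooping actions just described for a write-back cache (the structures and functions are illustrative stand-ins for hardware, not a real API; note how only the normal tag, valid, and dirty bits are needed):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cache_line {
    uint64_t tag;
    bool valid;
    bool dirty;
};

/* Every controller runs this for every invalidate seen on the bus:
   if we hold the block, drop our copy. */
static void snoop_invalidate(struct cache_line *line, uint64_t bus_tag) {
    if (line->valid && line->tag == bus_tag)
        line->valid = false;
}

/* Every controller runs this for every read miss seen on the bus: if we
   hold the only dirty copy, we must supply the block and abort the
   memory access. Returns true when this cache is the supplier. */
static bool snoop_read_miss(const struct cache_line *line, uint64_t bus_tag) {
    return line->valid && line->dirty && line->tag == bus_tag;
}

int main(void) {
    struct cache_line l = { 0x2a, true, true };
    if (snoop_read_miss(&l, 0x2a))
        printf("dirty copy supplied; memory access aborted\n");
    snoop_invalidate(&l, 0x2a);  /* another processor writes the block */
    printf("valid after invalidate: %d\n", l.valid);
    return 0;
}

The serialization the text describes does not appear in this sketch because the bus provides it for free: only one controller at a time wins arbitration and gets to broadcast.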

To track whether or not a cache block is shared, we can add an extra state bit associated with each cache block, just as we have a valid bit and a dirty bit. By adding a bit indicating whether the block is shared, we can decide whether a write must generate an invalidate. When a write to a block in the shared state occurs, the cache generates an invalidation on the bus and marks the block as exclusive. No further invalidations will be sent by that processor for that block. The processor with the sole copy of a cache block is normally called the owner of the cache block. When the invalidation is sent, the state of the owner's cache block is changed from shared to unshared (or exclusive). If another processor later requests this cache block, the state must be made shared again. Since our snooping cache also sees any misses, it knows when the exclusive cache block has been requested by another processor and the state should be made shared.
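A minimal sketch of this extra state, with the transitions just described (the three states and the bus_invalidate stand-in are illustrative; real protocols add more states):

#include <stdio.h>

enum line_state { INVALID, SHARED, EXCLUSIVE };

struct line {
    unsigned long tag;
    enum line_state state;
};

/* Hypothetical stand-in for acquiring the bus and broadcasting the
   address of the block being invalidated. */
static void bus_invalidate(unsigned long tag) {
    printf("invalidate broadcast for tag %#lx\n", tag);
}

/* Processor-side write hit. Only the first write to a shared block
   generates bus traffic; after that the block is owned exclusively. */
static void on_write(struct line *l) {
    if (l->state == SHARED) {
        bus_invalidate(l->tag);
        l->state = EXCLUSIVE;    /* this cache is now the owner */
    }
    /* if already EXCLUSIVE, the write completes locally */
}

/* Snooped read miss from another processor for a block we own:
   the block must become shared again. */
static void on_snooped_read_miss(struct line *l) {
    if (l->state == EXCLUSIVE)
        l->state = SHARED;
}

int main(void) {
    struct line l = { 0x40, SHARED };
    on_write(&l);               /* broadcasts one invalidation, becomes EXCLUSIVE */
    on_write(&l);               /* silent: no further bus traffic for this block */
    on_snooped_read_miss(&l);   /* another processor's miss: back to SHARED */
    return 0;
}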
