Cache Simulation Project Task Guide


Practical Exercise – Project Task

Basic Laboratory: Computer Architecture

Cache Simulation and Analysis Project Task – Cache Associativity

1. Organizational Information

On the following pages, you will find the tasks for your project assignment for the practical
exercise.

Unlike the homework assignments, no modules are provided for this project. Discuss within
your group what level of abstraction is reasonable for the implementation, and collectively
finalize the design of the modules. Parts of the task that require C code should be written
according to the C17 standard; parts that require C++ code should be written according to
the C++14 standard. The respective standard libraries are part of the language
specifications and may also be used. SystemC version 2.3.3 should be used.

The submission deadline is Sunday, July 21, 2024, 23:59 CET. Submissions should be made
via Git to the project repository set up for your group on Artemis.

We wish you much success and enjoyment in working on your task!

Best regards, The Practical Exercise Management


2. Cache Simulation and Analysis

In modern high-performance computer systems, the speed of memory and processors plays
a crucial role in overall performance. A central problem is the discrepancy between the
speed of memory accesses and the processing speed of the processors. This problem is
often referred to as the Von Neumann bottleneck: system performance is constrained by the
limited bandwidth and the relatively slow data transfer rates between the processor and
main memory.

Cache Memory

To mitigate the Von Neumann bottleneck, modern computer systems use various strategies,
including the use of smaller, faster intermediate storage: caches. Caches are special
memory areas that store frequently needed data and instructions, significantly reducing
access times. They act as a buffer between the fast processor and the relatively slow main
memory. The applications of caches extend beyond merely storing data in a program. Other
use cases include:

● Instruction Cache: The program itself is also data that is crucial for execution: not
only the operands but also the operations themselves must be loaded from memory.
Therefore, it makes sense to cache instructions as well.
● Translation Lookaside Buffer (TLB): Modern multi-user systems use virtual
memory to isolate programs from each other, among other things. Virtual memory
addresses of applications are mapped to physical memory addresses. To keep this
abstraction transparent to the program, the system must translate the virtual address
into a physical address with every memory access. The translation table itself resides
in main memory and accessing it takes many cycles. Here, the Translation Lookaside
Buffer (TLB) comes into play, a special cache that speeds up address translation.
The TLB stores the results of previous address translations, making frequently used
memory addresses more quickly accessible.

Cache Architecture

The architecture of caches plays a central role in their performance. Important aspects
include latency, associativity, cache lines, and the size of cache lines.

● Latency: Latency refers to the delay that occurs when data is retrieved from the
cache. The lower the latency, the faster the needed data can be provided, improving
the overall system performance. Cache latencies depend, among other things, on the
associativity of the cache and the size of the cache. A distinction is made between
the latency of a hit (Hit-Latency) and the latency of a miss (Miss-Latency).
● Associativity: Associativity describes how flexibly a cache can place data. A fully
associative cache can store a block at any location, while a direct-mapped cache maps
each block to exactly one location. Higher associativity can improve the cache hit
rate but also increases complexity and potentially the access time.
● Cache Lines: The cache is divided into several small blocks called cache lines. Each
cache line stores a specific data block from main memory. When the processor
accesses data, it checks if the data is present in one of the cache lines.
● Line Size: The size of the cache lines is also crucial. Larger cache lines can store
more data at once, which can be advantageous for sequential memory accesses.
However, this can also lead to more unnecessary memory accesses if the processor
frequently accesses non-sequential memory addresses.

3. Tasks

As part of your task, you will investigate the differences between a direct-mapped and a
four-way associative cache. Your tasks for the SystemC system design final project are
divided into research and implementation (practical) areas. Use your research results to
convince the audience of the relevance of the topic and to substantiate the correctness of
the implementation in your presentation. Provide a brief overview of the research results in
the Readme.md file. The answers to the implementation tasks will be reflected in your code.
The implementation, along with the presentation, forms the main part of the project. Use
abstraction wisely to simulate the system (memory - cache - CPU). Document the personal
contribution of each group member in the Readme.md file.

3.1 Theoretical Part

Research typical sizes for direct-mapped and four-way associative caches as well as main
memory and cache latencies in modern processors. Examine the memory access behavior
of a memory-intensive algorithm. Create CSV files that exemplify the memory accesses of
the algorithm. Use your implementation to observe the behavior of the cache architectures
concerning access times. Use the CSV files to test the simulation and briefly discuss the
results. Document your results in the Readme.md file. Provide at least one case study of the
memory access pattern of your chosen algorithm.

3.2 Practical Part

To investigate the performance of caches, you should implement a cache simulation in
SystemC with C++ and a framework program in C. The simulation will be given parameters
describing the size and behavior of the cache and a list of memory accesses. Design an
appropriate structure/module architecture for your implementation.

Framework Program

The framework program should be implemented in C. Your framework program must be able
to process the following options and arguments when called. Your framework program
should exhibit similar characteristics to Linux command-line programs when processing
command-line arguments. For example, the order and spaces between options and
arguments are largely irrelevant. If possible, the program should define reasonable default
values so that not all options need to be set each time. Your framework program should
correctly handle and provide meaningful error messages for invalid or inappropriate options
in the context of your implementation. We recommend using getopt_long to parse the
command-line parameters. getopt_long also accepts options that are not fully spelled out.
Whether you want to use this behavior for your program is up to you.

● -c <number>/--cycles <number> — The number of cycles to simulate.
● --directmapped — Simulates a direct-mapped cache.
● --fourway — Simulates a four-way associative cache.
● --cacheline-size <number> — The size of a cache line in bytes.
● --cachelines <number> — The number of cache lines.
● --cache-latency <number> — The cache latency in cycles. Latency is
independent of hit or miss and read or write operations.
● --memory-latency <number> — The main memory latency in cycles. Read and
write operations have the same latency.
● --tf=<filename> — Output file for a trace file with all signals. If this option is not
set, no trace file should be created.
● <filename> — Positional argument: The input file containing the data to be
processed.
● -h/--help — Outputs a description of all program options and usage examples,
and then terminates the program.

You may implement additional options, such as predefined test cases. However, your
program must be usable with only the above options.

Input File

The input file is in CSV format. Each line represents one request. The first column indicates
whether it is a read or a write access. The second column contains the address being
accessed. The third column contains the value to be written if it is a write access; for a
read access, the third column must be empty. Address and value may be given in decimal or
hexadecimal representation. If a file is read that does not conform to this format, your
implementation should terminate with a meaningful error message on stderr and a brief
usage explanation.

Example input file:
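A file in this format could, for instance, look as follows; the letters W/R as access markers and the trailing empty value column for reads are conventions your group must pin down and handle consistently:

```csv
W,0x100,42
R,0x100,
W,256,0xFF
R,0x104,
```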


3.3 Simulation

Implement the simulation in C++ as a single entry-point function with the following parameters and behavior:

The parameter cycles specifies the number of cycles to simulate. The parameter
directMapped indicates whether to simulate a direct-mapped cache. A value of 0
simulates a four-way associative cache, while all other values simulate a direct-mapped
cache. The parameters cacheLines and cacheLineSize specify the number of cache
lines and the size of a cache line in bytes. Your program must only support powers of two as
cache line sizes. The parameters memoryLatency and cacheLatency specify the latency
of the main memory and the cache in cycles. The cache should be write-through, meaning
write accesses are directly updated in the main memory. This is necessary as the simple
input format does not support operations that enforce writing to the main memory. Such
operations are necessary in practice, for example, to implement memory-mapped I/O. The
number of requests is stored in numRequests. The requests are stored in the array
requests. The parameter tracefile specifies the filename in which the trace file should
be stored. If tracefile is NULL, no trace file should be created.

The function performs a simulation of the system with the given arguments and returns a
struct with the results. The requests should be processed in the order they are passed in the
array requests. A new request should only be processed when the previous request is
completed. For read accesses, the read value should be written into the data field of the
request. The structs Request and Result are defined as follows:

The field we indicates whether it is a read or write access (0 for read access, 1 for write
access). The field data contains the value to be written if it is a write access. The field addr
contains the address being accessed. The data bus and address bus are 32 bits wide, as
can be seen from the struct. The memory is byte-addressable.
The function returns a struct with the results of the simulation. The cycles field should
specify the number of cycles needed to process all requests. If not all requests could be
processed within the given cycles, cycles should have the value SIZE_MAX. The fields
misses and hits should indicate the number of misses and hits, respectively. The
primitiveGateCount field is an estimate of the number of gates needed to implement the
system in hardware. This estimate should only count gates directly needed for the
accelerator, including the gates necessary for data storage and replacement strategy; main
memory and simulation logic should not be counted. The estimate does not need to be
exact. For example, storing one bit can be estimated to require 4 gates, and adding two
32-bit numbers can be estimated to require approximately 150 primitive gates.

You can assume that the passed Request structs are valid. All checks for invalid inputs
should already be performed in the framework program. Ensure that this function is callable
from C program code.

4. Overview of Files to be Submitted

● Readme.md - Documentation of the theoretical results and a brief section on each
group member's personal contribution.
● Makefile - Makefile for compiling an executable file.
● src/ - Directory for the source code.
○ C code for the framework program.
○ C++ code for the cache simulation in SystemC.
● examples/ - Directory for CSV case studies.
● slides/ - Directory for the presentation slides and other materials for the
presentation. During the presentation, only the file slides.pdf will be displayed on
the projector.
○ slides.pdf - The presentation slides in PDF format.

You may create additional files and directories as needed for your implementation.
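A minimal Makefile sketch is given below, assuming a SystemC 2.3.3 installation under $(SYSTEMC_HOME) and the hypothetical source files src/main.c and src/simulation.cpp; adapt the names, paths, and flags (in particular the SystemC library directory) to your actual setup.

```make
# Sketch only: SYSTEMC_HOME, source file names, and library paths are
# placeholders. Some installations use lib-linux64 instead of lib.
SYSTEMC_HOME ?= /usr/local/systemc
CC  = gcc
CXX = g++

all: cache_sim

main.o: src/main.c
	$(CC) -std=c17 -c $< -o $@

simulation.o: src/simulation.cpp
	$(CXX) -std=c++14 -I$(SYSTEMC_HOME)/include -c $< -o $@

cache_sim: main.o simulation.o
	$(CXX) $^ -L$(SYSTEMC_HOME)/lib -lsystemc -o $@

clean:
	rm -f *.o cache_sim
```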
