
EMBEDDED SYSTEMS
(SECOND EDITION)

D P KOTHARI
Ex-Visiting Professor
Royal Melbourne Institute of Technology
Melbourne, AUSTRALIA

SHRIRAM K VASUDEVAN
Technical Manager - Learning and Development
Amrita University, Coimbatore, INDIA

SUNDARAM R M D
Technical Leader, Wipro Technologies, INDIA

MURALI N
Lecturer, Nizwa College of Technology, OMAN

New Academic Science Limited


27 Old Gloucester Street, London, WC1N 3AX, UK
www.newacademicscience.co.uk
Copyright © 2015 by New Academic Science Limited
27 Old Gloucester Street, London, WC1N 3AX, UK
www.newacademicscience.co.uk • e-mail: [email protected]

ISBN : 978 1 781830 83 3

All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm,
xerography, or any other means, or incorporated into any information retrieval system, electronic or
mechanical, without the written permission of the copyright owner.

British Library Cataloguing in Publication Data


A Catalogue record for this book is available from the British Library

Every effort has been made to make the book error free. However, the author and publisher make no
warranty of any kind, expressed or implied, with regard to the documentation contained in this book.
Dedications
Kothari, D.P.—To sons-in-law Pankaj and Rahul
Shriram K Vasudevan—To Parents and Sister
Sundaram R M D—To Mom and Dad
Murali N.—To Friends and Parents
Preface

Embedded Systems, present almost everywhere, occupy an indispensable
place in the market. We, the consumers, live with Embedded Systems all the
way, be it watches, mobile phones, refrigerators, cars, music systems and what
not.
Even the medical field is fully supported by modern equipment which, too,
is built on embedded systems. Embedded Systems occupy a vital place in the
military as well, where most weapons come under this category. The automobile
industry would be handicapped without Embedded Systems.
In this book, every topic has been supported with practical examples. In
addition, the programming concepts have been fully supported with simple and
elegant C code that has been executed on Linux as well.
After every chapter, the reader is presented with a set of interesting quiz
questions, which will make the reader think for sure. In short, it will be a good
and friendly learning experience for the reader.
We have covered the basics of Embedded Systems in Chapter-1, followed
by the building blocks (components) of the system. The book then moves to
the design methodologies and modeling of Embedded Systems in Chapter-3.
A layered approach is followed in building an Embedded System; this
approach is discussed in Chapter-4. Chapters-5 and 6 cover the basics
of operating systems and programming with C in Linux. Chapter-7 is on networks
for Embedded Systems. Microcontrollers are then discussed in the next two
chapters, which span the 8051 to the latest ARM controllers. A practical example
is also discussed in depth in Chapter-11, after the coding guidelines in
Chapter-10.

We wish to thank all the good hearts who have helped us in this project. In
particular, we wish to thank Subashri V, Sriram Karthik, Sivaraman R, and
Sunandhini M for their immense help and support in bringing the book to a good
shape.

D.P. Kothari
Shriram K Vasudevan
Sundaram R M D
Murali N.
Contents
Preface vii
1. Embedded Systems—An Introduction 1—10
1.1 Basic Idea on System 1
1.2 Embedded Systems—Definitions 1
1.3 Characteristics of Embedded Systems—An Overview
with Examples 2
1.4 Challenges in Designing an Embedded System 6
1.5 Categorization of Embedded Systems 7
1.6 Examples of Embedded Systems 8
1.7 Quiz 9

2. Components of Embedded Systems 10—30


2.1 Understanding of Microprocessor and Microcontroller 11
2.2 Functional Building Blocks of Embedded Systems 12
2.3 Processor and Controller 13
2.4 Memory, Ports and Communication Devices 14
2.4.1 Memory 15
2.4.2 Ports 16
2.4.3 Communication Devices 16
2.5 CISC vs. RISC Processors 17
2.6 General Purpose Processor and DSP Processor 18
2.7 Direct Memory Access 19
2.8 Cache memory and its types 22
2.9 Co-design of Hardware and Software 23

2.10 System on Chip 24


2.11 Tools for Embedded Systems 25
2.12 Quiz 29

3. Design Methodologies, Life Cycle and Modeling of Embedded Systems 31—53
3.1 Software Life Cycle 31
3.2 Embedded Life Cycle 33
3.3 Modeling of Embedded Systems 38
3.4 Simulation and Emulation 50
3.4.1 Simulation 50
3.4.2 Emulation 51
3.5 Quiz 53

4. Layers of an Embedded System 54—61


4.1 Introduction 54
4.2 Need for Layering 55
4.2.1 The Hardware Layer 55
4.2.2 The System Software Layer (or Simply, the OS layer) 57
4.2.3 The Middleware 59
4.2.4 The Application Layer 60
4.3 Quiz 61
5. Real Time Operating Systems (RTOS)
— An Introduction 62—72
5.1 What is an Operating System? 62
5.2 How is Resource Management Carried out? 63
5.3 What is Kernel? 64
5.3.1 Kernel Components 66
5.4 Why RTOS is Needed? 69
5.5 What is Real Time? 69
5.6 Quiz 72

6. Real Time Operating Systems — A Detailed Overview 73—134
6.1 LINUX—An Introduction 74

6.1.1 Comparison of UNIX and LINUX 74


6.1.2 File System Architecture Details 75
6.1.3 Types of File Systems in UNIX/LINUX 76
6.1.4 Basic UNIX Commands 77
6.1.5 Proc and File Descriptor Table 78
6.2 RTOS Concepts 81
6.2.1 Task 81
6.2.2 Task States 82
6.2.3 Task Transitions 83
6.2.4 Task Scheduling 84
6.3 Inter Process Communication (IPC) Methodologies 98
6.3.1 Pipe 98
6.3.2 Named Pipe 102
6.3.3 Message Queue 106
6.3.4 Shared Memory 112
6.3.5 Task and Resource Synchronization 117
6.4 Memory Management 123
6.5 Cache Memory 126
6.6 Dynamic Memory Allocation 128
6.7 Fragmentation 130
6.8 Virtual Memory 131
6.9 Context Switching 132
6.10 Quiz 134

7. Networks for Embedded Systems 135—157


7.1 Serial Communication Basics 135
7.1.1 RS-232 Model 137
7.1.2 I2C Model 139
7.2 CAN and CAN OPEN 140
7.3 SPI and SCI 143
7.3.1 SPI 143
7.3.2 SCI 145
7.4 USB 146

7.5 IEEE 1394—Apple Fire Wire 148


7.6 HDLC—An Insight 150
7.7 Parallel Communication Basics 151
7.7.1 PCI Interface 152
7.7.2 PCI-X Interface 154
7.8 Device Drivers—An Introduction 154
7.8.1 Serial Port Device Driver 155
7.8.2 Parallel Port Device Driver 155
7.9 Quiz 157

8. An Overview and Architectural Analysis of 8051 Microcontroller 158—195
8.1 Introduction 159
8.2 Microcontroller Resources 165
8.3 Internal and External Memory 175
8.4 Memory Organization 179
8.5 Timer or Counter 180
8.6 Input and Output Ports 181
8.7 Interrupts—An Insight 183
8.8 Assembly Language Programming 186
8.9 Quiz 194

9. Advanced Architectures 196—211


9.1 Basic Introduction to Processors 196
9.2 ARM Architecture 197
9.2.1 Different Versions of ARM Processor 197
9.2.2 ARM Internals—Core Block Diagram 198
9.2.3 ARM—Register Set 199
9.2.4 ARM—Instruction Set 199
9.2.5 ARM Programming Model and Data Types 199
9.2.6 C Assignments in ARM—A Few Examples 200
9.3 SHARC Architecture 201
9.3.1 SHARC Working Principle 201
9.3.2 SHARC Addressing Modes 202

9.3.3 SHARC—C Assignments with Examples 203


9.4 ARM vs. SHARC 203
9.5 Blackfin Processors 204
9.5.1 Core Features 204
9.5.2 Memory and DMA 205
9.5.3 Microcontroller Features 206
9.5.4 Peripherals 206
9.6 TI-DSP Processors 207
9.7 Assembly Language Programming on Hardware Processors 208
9.8 Quiz 211

10. Coding Guidelines 212—227


10.1 Coding Standards—The Definition 212
10.2 The Purpose 213
10.3 The Limitations 213
10.4 Common Programming Standards 214
10.4.1 Modularization 214
10.4.2 Data Typing, Declarations, Variables and
Other Objects 215
10.4.3 Names 216
10.4.4 Organizing Control Structures 217
10.4.5 Program Layout 220
10.4.6 Comments and (Program) Documentation 224
10.5 Project Dependent Standards 226
10.6 Summary 227

11. Embedded Systems — Application, Design and Coding Methodology 228—240
11.1 Embedded System—Design 228
11.2 Designer's Perspective 229
11.3 Requirements Specifications 234
11.4 Implementation of the Proposed System 234
11.5 Quiz 239
1
Embedded Systems—An Introduction

Learning Outcomes
R Basic Idea on System
R Definition of Embedded Systems
R Characteristics of Embedded Systems
R Challenges in Designing an Embedded System
R Categorization of Embedded Systems
R Examples of Embedded Systems
R Recap
R Quiz

Embedded Systems are available everywhere in this modern world. This chapter
will touch on all basic aspects of understanding an Embedded System.

1.1 BASIC IDEA ON SYSTEM


What is a System? Simply put, a system accepts some input, analyzes it, and
then either gives us the output it is meant for or drives the next piece of
machinery connected to it.

1.2 EMBEDDED SYSTEMS—DEFINITIONS


The definition of Embedded Systems can now be taken up. Many definitions
have been given all over the world; if the internet is surfed for a definition of
an embedded system, an enormous number of results will be displayed. A few
of them are picked here to add clarity, and finally an embedded system can be
defined in a smooth way.
According to Wayne Wolf,
"An Embedded System is a computing system other than desktop
computers." This looks pretty simple, and other definitions are as follows:
• An embedded system is the one that has computer hardware with software
embedded in it as one of its most important components.
• It is a device that includes a programmable computer but is not itself
intended to be a general purpose computer.
An Embedded System can be well defined by taking a couple of classical examples.
First, an air conditioner is taken up for understanding. What does an air
conditioner do? The temperature is set as per requirement, say 20°C.
There may be variations in the external temperature, and that will also reflect
in the room where the air conditioner is fitted. But however the external
temperature varies, the AC machine provides the user with a cool atmosphere
(i.e., 20°C) inside the room, as per requirement. What is the action taken?
Consider a second example, the pacemaker. Its work is to trigger the
heart beat if at all the heart is getting into trouble. How is this done?
The answer to both questions is the same.
Looking into the following definition of an Embedded System, one can get
the answer for the above quoted cases.
• An electronic controller built into the application continuously monitors
the process variables and ensures that the Process Variable (PV) does
not change; in the event of a change, the controller generates a counteracting
signal applied to the application so that the deviated PV is brought back to its
normal operating value. This defines embedded systems clearly!
So here it is made very clear. In the air conditioner, temperature is the process
variable. A controller inside will keep on monitoring the process variable. If at all
the room temperature changes due to variation in the external temperature, the
controller will generate a counteracting signal and the PV (temperature) will be
brought back to the required range.
In the second case, the controller inside a pacemaker will keep monitoring the
heart beat count. If it gets too low, a counteracting action will immediately be
taken to boost the heart.
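The monitor-and-counteract idea above can be sketched in C (the book's examples run under Linux). This is only an illustrative sketch: the function name, set point and tolerance are hypothetical, not taken from any real controller.

```c
/* Decide the counteracting action for a deviated process variable (PV):
 * -1 means drive the PV down (e.g., cool), +1 means drive it up (e.g.,
 * heat), 0 means the PV is within the tolerance band and no action is
 * needed. All names and values here are hypothetical. */
int control_action(int pv, int set_point, int tolerance)
{
    int error = pv - set_point;
    if (error > tolerance)
        return -1;  /* PV too high: generate a "lower it" counter-signal */
    if (error < -tolerance)
        return 1;   /* PV too low: generate a "raise it" counter-signal */
    return 0;       /* PV at its normal operating value: no action */
}
```

In the air conditioner example, such a function would be called in an endless loop with the measured room temperature as `pv` and 20 as `set_point`.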
Food for the brain! Is a laptop an embedded system? This question will be
answered shortly!

1.3 CHARACTERISTICS OF EMBEDDED SYSTEMS—AN OVERVIEW WITH EXAMPLES
When provided with a system, it should be identified whether it is an Embedded
System. Certain common characteristics hold for all Embedded Systems. Being
able to recognize these characteristics, embedded systems can be spotted easily.
• Single Functioned
• Tightly Constrained
• Real Time and Reactive
• Complex Algorithms
• User Interface
• Multi Rate
Each of the above characteristics is discussed below in detail.
1. Single Functioned
An Embedded System executes a specific function repeatedly, i.e., a dedicated
function. As an example, an air conditioner cools the room. Cooling is its
dedicated functionality and it cannot be used for any other purpose; an AC can't
be used for making calls. Likewise, a mobile phone is an Embedded System that
can be used to make and receive calls, and it can't be used for controlling room
temperature.
Consider the list of embedded systems that are being used every day.
1. Pager
2. Microwave oven
3. Mobile phone
4. ATMs
5. Car braking systems
6. Automobile cruise controllers
7. Pace makers
8. Modem
9. Network cards and many more

Fig. 1.1: Few applications of embedded systems (all single functioned)


From the examples quoted, the single-functioned behaviour of Embedded
Systems is understood.
Is a laptop an Embedded System? No, since it can be used for different
purposes. It can play media and at the same time a laptop can be used as
a gaming machine. The next day it can be used for typing data. So it is
multifunctional and it can't be an Embedded System.

2. Tightly Constrained
Whatever system is being designed, it has constraints. Embedded Systems
are also tightly constrained in many aspects. A few aspects are analyzed
here.
1. Manufacturing Cost
2. Performance
3. Size
4. Power
The above four parameters decide the success of an Embedded System.
Consider buying a mobile phone as an example. If the mobile phone costs
lakhs of rupees, will it be bought? (Instead, a landline phone would be preferred.)
Next scenario: if the mobile phone that is bought takes half an hour to make a call,
and if it also hangs frequently, will it be opted for? (No way!) Third point: if the
phone weighs 3 kg, will it be preferred? Finally, coming to the power criterion:
almost all Embedded Systems are battery operated. And a phone is mobile as well! So
it should be capable of retaining its charge for some reasonable amount of
time. Else the battery will drain faster and one has to keep the charger handy all the
time. So it is very important to keep these constraints in mind when designing an
embedded system.

3. Real Time and Reactive


What is real time? A nice question to start with!
A definition can be given through an example here. Take an instance of travel in
a BMW car (a great feeling it would be). The braking system is an embedded system.
Unfortunately, a lorry is coming towards the car, and the driver applies the
brake. What is the action required? The car should stop immediately. This is
real time and reactive behaviour. The brake may be applied at any point in
time, and the vehicle should stop at the instant the brake is applied. It is
never known when the brake will have to be applied, so the system should be
ready to accept the input at any time and should be ready to process it.
So, keeping the above example in mind, Real Time can be defined as logical
correctness of the operation within a deterministic deadline (the vehicle should
be stopped immediately, i.e., the operation must be logically correct and
completed within a deterministic deadline).
A few examples can be spotted for the real time and reactive behaviour of an
Embedded System:
(a) A pacemaker's action
(b) A flight's landing gear control
(c) An ECG machine's output
And so on; many examples could be cited here!

4. Complex Algorithms
The processor inside an embedded system may have to perform operations that are
complex in nature. An example is the digital camera. It is used to take colour
photographs, motion pictures, black and white pictures, etc. It needs to pull in
lots of complex algorithms to perform all the above mentioned operations.
So, as a point to conclude, many Embedded Systems have complex
algorithms running inside them.

5. User Interface
Here too, the concept can be explained with an example. NOKIA mobile phones
were a very big hit in the market, right? Why? What was the reason? Was it
because other mobiles did not perform well? No is the answer. Nokia had an
excellent and simple user interface. Calls could be made and received very
easily, and typing an SMS was also easy. So it was received by people very well.
So designing a system with an easy and comfortable interface is most
important. It should also have the options required for the operation of the device.
An example is the ATM machine; it has got comfortable interfaces and options. Keep
this in mind and design the system.

6. Multi Rate
Embedded Systems may need to control and drive certain operations at one rate and
certain other operations at a different rate. An example is the digital camera. It is
used to take still pictures, and it is also capable of shooting video. So it
has to be capable of driving the first operation at a speed different from the
second one.
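The multi-rate behaviour can be sketched as two periodic tasks driven from one tick counter, a common pattern in simple embedded schedulers. The task names and the rates chosen here (every tick vs. every tenth tick) are hypothetical, purely for illustration.

```c
/* Two jobs driven at different rates from a single tick counter: the
 * "video" job runs every tick, the "still image" job every 10th tick.
 * The counters only record how often each job ran, so the sketch can
 * be exercised on a host machine instead of real hardware. */
static int video_runs;
static int still_runs;

static void video_task(void) { video_runs++; }   /* fast-rate work */
static void still_task(void) { still_runs++; }   /* slow-rate work */

void run_ticks(int n)
{
    for (int tick = 1; tick <= n; tick++) {
        video_task();                /* runs at the base rate */
        if (tick % 10 == 0)
            still_task();            /* runs at one tenth the rate */
    }
}
```

Calling `run_ticks(100)` would run the video job 100 times but the still-image job only 10 times, which is exactly the "one rate for this operation, another rate for that one" behaviour described above.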
A Small Recap: Please do not forget the definition of Real Time; down the line it
will be needed.

1.4 CHALLENGES IN DESIGNING AN EMBEDDED SYSTEM


The first and foremost problem in designing an Embedded System is the very
limited availability of tools and debuggers.
Other than the point quoted, there are several other challenges.

Fig. 1.2: Challenges in embedded system design (Meeting Deadlines, Hardware Selection, Upgradability, Will it work?)

Figure 1.2 diagrammatically represents the challenges. The reader will be exposed
to all these challenges with some relevant examples.

1. Meeting Deadlines
How can the deadline that is meant for the product be met? Meeting deadlines
accurately will need high speed hardware. Increasing the hardware components
with quality would increase the cost of the product. This is the first challenge in
front of designers.

2. Hardware Selection
Embedded Systems never have the luxury of much hardware. Taking memory
into consideration first, Embedded Systems will have very little inbuilt memory.
Adding more memory increases the cost factor, so keep only as much memory
as needed. The system can have an expansion slot, so that a user who is
willing to expand the memory can do so.
Coming to processor selection, if a very high speed processor is selected, it
would end up draining the battery at the earliest. But speed can't be compromised
either. So select a processor that perfectly fits the requirement. Too high speed
a processor would cost more and can drain the battery as well.
3. Is it upgradable and maintainable?
Assume that a mobile phone has been designed and released in the market.
But after reaching the people, the product was found to have problems in one or
two aspects. The developer would know the problem and it can be fixed. But
how will the fix reach the phones that have already reached the public?
So the product must support upgradation of its software versions. Keep in
mind that the product should be upgradable with the same hardware!
Secondly, when developing software for embedded systems, maintainability
should be kept in mind. The code should not be written in such a way that
only the developer who developed it can understand it. It should be understandable
to other engineers also, so that they can understand and fix bugs in the code,
if need be.
4. Will it work?
A nice question, isn't it? Yeah. Please ensure that the system that has been
designed is really working fine. How can it be ensured? Through rigorous testing
it is possible; testing needs to proceed in many ways. First can be unit
testing, the next stage is sanity testing and the third stage can be regression testing.
Also, even after the product has entered the market, it has to be constantly
monitored. If any customer complaint arises, that bug has to be looked into and
has to be fixed. And more importantly, the fix should not introduce any new bugs.
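The unit-testing stage mentioned above can be as simple as exercising one small function with known inputs and expected outputs. The `clamp()` helper below is a hypothetical unit under test, not code from any real product.

```c
/* A tiny unit under test: clamp a value into the range [lo, hi]. In a
 * unit test, such a function is checked in isolation with inputs whose
 * correct outputs are known in advance, before system integration. */
int clamp(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

A regression suite keeps such checks and reruns them after every bug fix, which is how a fix is prevented from silently introducing a new bug.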
Let’s now get to know about the categorization of the embedded systems!
1.5 CATEGORIZATION OF EMBEDDED SYSTEMS
Embedded Systems can be categorized based on the complexity of building, cost
factors, purpose of the system, tools and other related environment availability,
etc. Keeping these points in mind, Table 1.1 has been framed to deal with the
categories. Broadly, one can classify an embedded system into one of these:
1. Small Scale Embedded Systems,
2. Medium Scale Embedded Systems, and
3. Sophisticated Embedded Systems.
Table 1.1: Categorization of Embedded Systems

Processor
— Small Scale: an 8-bit or 16-bit processor (it can't do computationally intensive processing).
— Medium Scale: a 16-bit or 32-bit processor (think of slightly complex and intensive processing with this processor).
— Sophisticated: PLA, PAL, FPGA and ASIC fall in this category (these are high-end design elements that can be used to do wonders).

Hardware Complexity
— Small Scale: very little complexity will be visualized here.
— Medium Scale: we will have to face more complexity in terms of peripheral additions, interfaces, etc.
— Sophisticated: highly complex (designers need enormous expertise to proceed with this).

Software Complexity
— Small Scale: no complexity in the coding, because the device is not meant for performing complex functionalities.
— Medium Scale: there will be complexity added up; this will have a few functions and the code might be lengthy.
— Sophisticated: yeah, most complex. The designer needs to be a master to work on the code.

Power
— Small Scale: battery operated.
— Medium Scale: battery operated.
— Sophisticated: can be battery or live source, based on the application.

Tools Availability
— Small Scale: can be programmed in a simple environment, so not much research on tools is required here.
— Medium Scale: here we will have to use a debugger, compiler, IDE etc., as the task gets slightly cumbersome.
— Sophisticated: the designer needs a sophisticated environment here.

Examples
— Small Scale: a calculator can be the simplest example; a stepper motor controller can be added to the list.
— Medium Scale: washing machine, microwave oven, and vending machine.
— Sophisticated: flight landing gear systems, car braking systems, military applications, robots.

1.6 EXAMPLES OF EMBEDDED SYSTEMS


Coming to examples, everything around us can be taken! Yeah, each and
every thing these days is an embedded system. Let's take it this way,
from the time of getting up in the morning:
1. Digital water heater! (After the bath, on to breakfast!) Here temperature
is the process variable and it is the input to be set by the user. The controller
will take care of the controlling action and of the heating process.
2. Microwave oven (coffee after breakfast). Here again temperature will
be the process variable. The same controlling action will be taken.
3. Braking system, music player, tyre pressure monitoring system, airbags,
power windows and GPS (many more embedded systems are there
inside the car; only 1% is quoted. The workplace has been reached now. The car
braking system can be an instance; it shows the real time and reactive behaviour.
The brake could be applied at any point in time, but still it would stop the car.)
4. All gym equipment, from treadmills to cycling equipment, are embedded
systems.

5. Video games, all digital gaming machines, iPod, MP3/MP4 players, Xbox
and what not!

POINTS TO REMEMBER
1. An Embedded System is single functioned and AC can be remembered
as a simple example.
2. Embedded Systems are Real time systems which are reactive in nature.
3. Many design challenges are associated with making an embedded
system, including cost, power etc.
4. Embedded Systems are classified into three major divisions—Small scale,
medium scale and sophisticated Embedded Systems.

Review Questions
1. What is an Embedded System? Give an example.
2. Embedded Systems are quoted as single functioned systems. Justification
is required in this case.
3. Define Real time.
4. Give an example of a real time and reactive Embedded System.
5. What are the major categories of the Embedded Systems? Give an example
for each division.
6. Is LCD projector an Embedded System? Please justify.

1.7 QUIZ
1. Pick odd one out! (Embedded is the clue)
(a) Laptop (b) Projector
(c) Mobile phone (d) MP3 player
2. Some of the important characteristics expected in consumer electronics
products are ...............
(a) Recovering from failures
(b) Low cost
(c) Performance
(d) Low unit cost, low power consumption and smaller product size
3. One of the most important characteristics of Embedded Systems for the
automotive industry (with respect to the occupants of the vehicle) is
................
(a) Recovering from failures
(b) Low cost
(c) Safety
(d) Low unit cost, low power consumption and smaller product size
4. .................. manages resources like CPU time and memory of an
Embedded System.
(This is a general question, try to answer this.)
5. Embedded System can do multiprocessing — True or False.
6. What is Real time? Please give an example.
7. Give an example of multi rate characteristic of an Embedded System.
8. Embedded Systems are almost battery operated—True or False.
9. Mobile phone is not an Embedded System—True or False.

Answers for Quiz


1. Laptop
2. Low unit cost, low power consumption and smaller product size
3. Safety
4. Operating system
5. False
6. Braking, Pace maker’s behaviour can be a good example here.
7. A digital camera can work on a black and white image, colour image,
video and audio. Processing is different in all these which is referred as
multi rate.
8. True
9. False (Maybe in the future, mobile phones will do multiple things and
break this question).
2
Components of Embedded Systems

Learning Outcomes
R Understanding of Microprocessor and Microcontroller
R Functional Building Blocks of Embedded Systems
R Processor and Controller
R Memory, Ports and Communication Devices
R CISC vs. RISC Processors
R General Purpose Processor and DSP Processor
R Direct Memory Access—In-depth Analysis
R Cache Memory and its types
R Co-design of Hardware and Software
R System on Chip
R Tools for Embedded Systems
R Recap
R Quiz

2.1 UNDERSTANDING OF MICROPROCESSOR AND MICROCONTROLLER
A microprocessor incorporates most or all of the functions of a computer’s
Central Processing Unit (CPU) on a single integrated circuit. A microcontroller
is a small computer on a single integrated circuit containing a processor core,
memory, and programmable input/output peripherals.
Microprocessors generally require external components to implement
program memory, RAM memory and input/output. Intel’s 80186, 80188, and
80386 are examples of microprocessors. Microcontrollers incorporate program
memory, RAM memory and input/output resources internal to the chip.
Microchip’s ‘PIC’ series and Atmel’s ‘AVR’ series are examples of
microcontrollers.

2.2 FUNCTIONAL BUILDING BLOCKS OF EMBEDDED SYSTEMS
Embed means to fix (something) firmly and deeply or enclose something closely
in a surrounding mass. Figure 2.1 shows a simple embedded system.
The various blocks that build an embedded system include ports, RAM, switch
controllers, a reset switch, LAN, PCMCIA card, JTAG connector, etc.

(Blocks shown: 2-port WAN, serial console (host), USB, printer port, power connector, switch controller, MCU, JTAG, PCMCIA card, boot flash, extension slot, 4-port LAN, LEDs, SDRAM, reset switch, flash)
Fig. 2.1: A simple embedded system

An Embedded System has a different set of issues to deal with than desktop
software does. These include:
1. Handling situations that don't arise with desktop software, like doing several
things at once and responding to external events (button presses, sensor
readings, etc.).
2. Coping with all unusual conditions without human intervention, including
meeting strict processing deadlines, and never failing.
3. Being able to log issues when things go out of control, which helps in
debugging the issue later.
Embedded Systems were initially hardwired systems; they then became
microprocessor-based, then moved on to being microcontroller-based and
built on specialized (application) processors, and finally we have reached the
System on a Chip (SoC).

(Blocks shown: microprocessor, memory system and system input/output, linked by the address bus and data bus, with digital inputs and digital outputs)
Fig. 2.2: Basic building blocks of an embedded processor


The memory system forms a major block in an embedded system. This includes RAM,
ROM, cache memory and hard drive memory. There should be provision to
provide input to the system by means of cables, SD cards, etc., and a proper output
mechanism is also required to test the system's functionality. The basic building
blocks of an Embedded System are illustrated in Fig. 2.2.
Required features for an Embedded System include:
R Throughput
R Response
R Low power
R Reliability
R Safety
R Maintainability
R Cost, size and weight

2.3 PROCESSOR AND CONTROLLER


Microprocessors generally require external components to implement program
memory, RAM memory and input/output. Intel’s 80186, 80188, and 80386 are
examples of microprocessors.
Microcontrollers incorporate program memory, RAM memory and input/
output resources internal to the chip. Microchip’s PIC series and Atmel’s AVR
series are examples of microcontrollers. Figure 2.3 depicts a PIC 18F8720
microcontroller in an 80-pin TQFP package.
It is not unusual to see these terms used interchangeably. In simple terms,
Microprocessor = CPU
Microcontroller = CPU, peripherals and memory

Peripherals = ports, clock, timers, UART, ADCs, LCD drivers, DACs
and others.
Memory = EEPROM, SRAM, Flash, etc.
About 55% of all CPUs sold in the world are 8-bit microcontrollers and
microprocessors.
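Firmware usually reaches the peripherals listed above through memory-mapped registers accessed via volatile pointers. The functions below are a generic sketch of that pattern; the register layout, and any address used with them, would come from a specific microcontroller's datasheet, not from this book.

```c
#include <stdint.h>

/* Set or clear one bit of a memory-mapped output register. The pointer
 * is volatile so the compiler performs every access exactly as written,
 * which matters when the "memory" is really a peripheral register whose
 * reads and writes have side effects. */
void pin_set(volatile uint32_t *reg, unsigned pin)
{
    *reg |= (uint32_t)1u << pin;
}

void pin_clear(volatile uint32_t *reg, unsigned pin)
{
    *reg &= ~((uint32_t)1u << pin);
}

/* On real hardware the register address comes from the datasheet, e.g.
 * (hypothetical address, for illustration only):
 *   #define GPIO_OUT (*(volatile uint32_t *)0x40020000u)
 *   pin_set(&GPIO_OUT, 3);
 */
```

Here the register is passed as a parameter so the sketch can also be exercised against an ordinary variable on a host machine.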

Fig. 2.3: A PIC 18F8720 microcontroller in an 80-pin TQFP package

A typical home in a developed country is likely to have only four general purpose
microprocessors but around three dozen microcontrollers. A typical mid range
automobile has as many as 30 or more microcontrollers. They can also be found
in many electrical devices such as washing machines, microwave ovens, and
telephones.

2.4 MEMORY, PORTS AND COMMUNICATION DEVICES


Computer data storage, often called storage or memory, refers to computer
components and recording media that retain digital data used for computing for
some interval of time. Computer data storage provides one of the core functions
of the modern computer, that of information retention. It is one of the fundamental
components of all modern computers, and coupled with a central processing
unit (CPU, a processor), implements the basic computer model used since the
1940s.
A register file is an array of processor registers in a Central Processing
Unit (CPU). Modern integrated circuit-based register files are usually
implemented by way of fast static RAMs with multiple ports. Such RAMs are
distinguished by having dedicated read and write ports, whereas ordinary
multiported SRAMs will usually read and write through the same ports. We will
discuss memory, ports and communication devices in detail below.

2.4.1 Memory
Many types of memory devices are available for use in modern computer systems.
The RAM family includes two important memory devices: static RAM
(SRAM) and dynamic RAM (DRAM) as shown in Fig. 2.4. The primary
difference between them is the lifetime of the data they store. SRAM retains its
contents as long as electrical power is applied to the chip. If the power is turned
off or lost temporarily, its contents will be lost forever. DRAM, on the other
hand, has an extremely short data lifetime, typically about four milliseconds.
This is true even when power is applied constantly.
Memories in the ROM family are distinguished by the methods used to
write new data to them, and the number of times they can be rewritten. This
classification reflects the evolution of ROM devices from hardwired to
programmable to erasable-and-programmable. A common feature of all these
devices is their ability to retain data and programs forever, even during a power
failure.
In practice, almost all computers use a variety of memory types, organized
in a storage hierarchy around the CPU, as a trade-off between performance
and cost. Generally, the lower a storage sits in the hierarchy, the lower its bandwidth
and the greater its access latency from the CPU. This traditional division of
storage into primary, secondary, tertiary and off-line storage is also guided by cost
per bit.

Fig. 2.4: Writable volatile random access memory

As memory technology has matured in recent years, the line between RAM
and ROM has blurred. Several types of memory now combine features of
both. These devices belong to neither group and can be collectively referred
to as hybrid memory devices. Hybrid memories can be read and written as
desired, like RAM, but maintain their contents without electrical power, just like
ROM. Two of the hybrid devices, EEPROM and flash, are descendants of
ROM devices. These are typically used to store code. The third hybrid, NVRAM,
is a modified version of SRAM. NVRAM usually holds persistent data.
2.4.2 Ports
Data can be sent either serially, one bit after another through a single wire, or in
parallel, multiple bits at a time, through several parallel wires. Most famously,
these different paradigms are visible in the form of the common PC ports “serial
port” and “parallel port”.
Early parallel transmission schemes were often much faster than serial
schemes, but at the added cost and complexity of the hardware. Serial data
transmission is much more common in new communication protocols because it
reduces the I/O pin count and hence the cost. Common serial protocols include
SPI and I2C. Perhaps surprisingly, serial methods can transmit at much
higher clock rates per wire, which tends to outweigh the primary
advantage of parallel transmission.
Parallel transmission protocols are now mainly reserved for applications
like a CPU bus or between IC devices that are physically very close to each
other, usually measured in just a few centimetres. Serial protocols are used for
longer distance communication systems, ranging from shared external devices
like a digital camera to global networks or even interplanetary communication
for space probes; however some recent CPU bus architectures are even using
serial methodologies as well.
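As a toy illustration of serial transfer (no particular protocol is intended; the eight-entry array simply stands in for the wire over eight clock ticks):

```c
#include <stdint.h>

/* "Transmit" one byte serially, MSB first: one bit per clock tick
 * onto a single wire, modeled here as an eight-entry array. */
void serial_send(uint8_t byte, uint8_t wire[8])
{
    for (int i = 0; i < 8; i++)
        wire[i] = (byte >> (7 - i)) & 1;
}

/* The receiver shifts the bits back in to reassemble the byte. */
uint8_t serial_receive(const uint8_t wire[8])
{
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++)
        byte = (uint8_t)((byte << 1) | wire[i]);
    return byte;
}
```

A parallel port would instead present all eight bits at once on eight wires, trading pin count for transfer time.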

2.4.3 Communication Devices


Embedded Systems talk with the outside world via peripherals, such as:
• Serial Communication Interfaces (SCI): RS-232, RS-422, RS-485, etc.
• Synchronous Serial Communication Interfaces: I2C, SPI, PCI, etc.
• Universal Serial Bus (USB), Multi Media Cards (SD Cards, Compact
Flash, etc.)
• Networks: Ethernet, Controller Area Network, etc.
• Timers: PLL(s), Capture/Compare and Time Processing Units.
• Discrete IO: aka General Purpose Input/Output (GPIO).
• Analog to Digital/Digital to Analog (ADC/DAC).
• Debugging: JTAG, ISP, ICSP, BDM Port, BITP and DP9 ports.
Some commonly used communication devices include I2C, SPI and PCI. I2C is
a multi-master, low-bandwidth, short distance, serial communication bus protocol.
Nowadays it is not only used on single boards, but also to attach low-speed
peripheral devices and components to a motherboard, embedded system, or
cell-phone, as the new versions provide lots of advanced features and much
higher speed. The features like simplicity and flexibility make this bus attractive
for consumer and automotive electronics.

2.5 CISC VS. RISC PROCESSORS


Reduced Instruction Set Computing or RISC is a CPU design strategy
based on the insight that simplified instructions can provide higher performance
if this simplicity enables much faster execution of each instruction. A computer
based on this strategy is a Reduced Instruction Set Computer. There are many
proposals for precise definitions, but the term is slowly being replaced by the
more descriptive load-store architecture. Well known RISC families include
DEC Alpha, AMD 29k, ARC, ARM, Atmel AVR, MIPS, PA-RISC, Power
(including Power PC) and SPARC.
A Complex Instruction Set Computer or CISC is a computer in which a
single instruction can execute several low-level operations (such as a load from
memory, an arithmetic operation, and a memory store) and/or is capable of
multi-step operations or addressing modes within a single instruction. The term
was retroactively coined in contrast to Reduced Instruction Set Computer
(RISC).
Before the RISC philosophy became prominent, many computer architects
tried to bridge the so called semantic gap, i.e., to design instruction sets that
directly supported high-level programming constructs such as procedure calls,
loop control, and complex addressing modes, allowing data structure and array
accesses to be combined into single instructions. Instructions are also typically
highly encoded in order to further enhance the code density. The compact nature
of such instruction sets results in smaller program sizes and fewer (slow) main
memory accesses, which at the time resulted in a tremendous savings on the
cost of computer memory and disc storage, as well as faster execution. It also
meant good programming productivity even in assembly language, as high level
languages such as FORTRAN or Algol were not always available or appropriate.

RISC → CISC Migration
(a) Determined by VLSI technology.
(b) Software cost goes up constantly. To be convenient for programmers.
(c) To shorten the semantic gap between HLL and architecture without
advanced compilers.
(d) To reduce the program length because memory was expensive.
(e) VAX 11/780 reached the climax with > 300 instructions and >20 addressing
modes.
CISC → RISC Migration
(a) Things changed: HLL, Advanced Compiler, Memory size, etc.
(b) Finding: 25% instructions used in 95% time.
(c) Size: usually <100 instructions and <5 addressing modes.
(d) Other properties: fixed instruction format, register based, hardware control,
etc.
(e) Gains: CPI is smaller, Clock cycle shorter, Hardware simpler, Pipeline
easier.
(f) Cheaper: Programmability becomes poor, but people use HLL instead of
IS.

2.6 GENERAL PURPOSE PROCESSOR AND DSP PROCESSOR


The essential difference between a DSP and a microprocessor is that a DSP
processor has features designed to support high-performance, repetitive,
numerically intensive tasks. In contrast, general purpose processors or
microcontrollers are either not specialized for a specific kind of applications or
they are designed for control-oriented applications.
Most general purpose microprocessors and operating systems can execute
DSP algorithms successfully, but are not suitable for use in portable devices
such as mobile phones and PDAs because of power supply and space constraints.
A specialized digital signal processor, however, will tend to provide a lower-cost
solution, with better performance, lower latency, and no requirements for
specialized cooling or large batteries. Features that accelerate performance in
DSP applications include:
(a) Single-cycle multiply-accumulate capability; high-performance DSPs often
have two multipliers that enable two multiply-accumulate operations per
instruction cycle; some DSPs have four or more multipliers.
(b) Specialized addressing modes, for example, pre- and post-modification of
address pointers, circular addressing, and bit-reversed addressing.
(c) Most DSPs provide various configurations of on-chip memory and
peripherals tailored for DSP applications. DSPs generally feature multiple-
access memory architectures that enable DSPs to complete several
accesses to memory in a single instruction cycle.
(d) Specialized execution control. Usually, DSP processors provide a loop
instruction that allows tight loops to be repeated without spending any
instruction cycles for updating and testing loop counter or for jumping
back to the top of the loop.
(e) DSP processors are known for their irregular instruction sets, which
generally allow several operations to be encoded in a single instruction.
For example, a processor that uses 32-bit instructions may encode two
additions, two multiplications, and four 16-bit data moves into a single
instruction. In general, DSP processor instruction sets allow a data move
to be performed in parallel with an arithmetic operation. GPPs/MCUs, in
contrast, usually specify a single operation per instruction.
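The multiply-accumulate pattern behind points (a) to (e) can be sketched in plain C as a FIR filter inner loop; the function and data widths below are illustrative, but on a DSP each iteration would typically map to a single MAC instruction executed under zero-overhead loop hardware:

```c
#include <stdint.h>

/* Dot product of filter coefficients and input samples: the classic
 * DSP inner loop. The 32-bit accumulator mirrors the wide accumulator
 * found in DSP MAC units, which avoids overflow on 16-bit products. */
int32_t fir_sample(const int16_t *coeff, const int16_t *sample, int taps)
{
    int32_t acc = 0;
    for (int i = 0; i < taps; i++)
        acc += (int32_t)coeff[i] * sample[i];  /* one MAC per tap */
    return acc;
}
```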
While the above differences traditionally distinguish DSPs from GPPs/MCUs,
in practice it is not important what kind of processor you choose. What is really
important is to choose the processor that is best suited for your application; if a
GPP/MCU is better suited for your DSP application than a DSP processor, the
processor of choice is the GPP/MCU. It is also worth noting that the difference
between DSPs and GPPs/MCUs is fading: many GPPs/MCUs now include
DSP features, and DSPs are increasingly adding microcontroller features.

2.7 DIRECT MEMORY ACCESS


Direct Memory Access (DMA) is a feature of modern computers and
microprocessors that allows certain hardware subsystems within the computer
to access system memory for reading and/or writing independently of the central
processing unit. Many hardware systems use DMA including disk drive
controllers, graphic cards, network cards and sound cards. DMA is also used
for intra-chip data transfer in multi-core processors, especially in multiprocessor
systems-on-chip, where each processing element is equipped with a local memory
(often called scratchpad memory) and DMA is used for transferring data between
the local memory and the main memory.
Computers that have DMA channels can transfer data to and from devices
with much less CPU overhead than computers without a DMA channel. Similarly,
a processing element inside a multi-core processor can transfer data to and
from its local memory without occupying its processor time, allowing
computation and data transfer to proceed concurrently.
Without DMA, using Programmed Input/Output (PIO) mode for
communication with peripheral devices, or load/store instructions in the case of
multicore chips, the CPU is typically fully occupied for the entire duration of the
read or write operation, and is thus unavailable to perform other work. With
DMA, the CPU would initiate the transfer, do other operations while the transfer
is in progress, and receive an interrupt from the DMA controller once the operation
has been done. This is especially useful in real time computing applications
where not stalling behind concurrent operations is critical. Another and related
application area is the various forms of stream processing, where it is essential
to have data processing and transfer in parallel in order to achieve sufficient
throughput.

[Figure: the CPU, memory, DMA controller and I/O device share the address,
data and control buses; the Bus Request (BR) and Bus Grant (BG) lines let the
DMA controller take over the buses from the CPU.]

Fig. 2.5: Embedded system with DMA

The DMA controller includes several registers: an address register, a count
register, a control register and a status register, as illustrated in Fig. 2.5. The DMA
Address Register contains the memory address to be used in the data transfer.
The CPU treats this register as one or more output ports. The DMA Count Register,
also called the Word Count Register, contains the number of bytes of data to be
transferred. Like the DMA Address Register, it too is treated as an output port
by the CPU.
The DMA Control Register accepts commands from the CPU. It is also
treated as an output port by the CPU. Most DMA controllers also have a Status
Register which can be accessed to get necessary status information.
To initiate a DMA transfer, the CPU loads the address of the first memory
location of the memory block (to be read or written from) into the DMA Address
register. It does this via an I/O output instruction, such as the OTPT instruction
for the relatively simple CPU. It then writes the number of bytes to be transferred
into the DMA Count register in the same manner. Finally, it writes one or more
commands to the DMA Control register.
These commands may specify transfer options such as the DMA transfer
mode, but should always specify the direction of the transfer, either from I/O to
memory or from memory to I/O. The last command causes the DMA controller
to initiate the transfer. The controller then sets BR to 1 and, once BG becomes
1, seizes control of the system buses.
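The programming sequence just described can be sketched in C. The register layout, bit assignments and helper name below are hypothetical (a real controller's datasheet defines its own); on actual hardware the structure would be declared volatile and placed at the controller's memory-mapped base address:

```c
#include <stdint.h>

/* Hypothetical register block for the DMA controller described above. */
struct dma_regs {
    uint32_t address;   /* DMA Address Register: first memory location  */
    uint32_t count;     /* DMA Count (Word Count) Register: byte count  */
    uint32_t control;   /* DMA Control Register: direction + start bits */
    uint32_t status;    /* DMA Status Register: read for completion     */
};

#define DMA_DIR_MEM_TO_IO (1u << 0)   /* transfer direction (assumed bit) */
#define DMA_START         (1u << 1)   /* last command: initiate transfer  */

/* Program the controller in the order the text gives: address first,
 * then count, then the control command whose START bit kicks off the
 * transfer (the controller then raises BR and waits for BG). */
void dma_start_transfer(struct dma_regs *dma, uint32_t addr, uint32_t nbytes)
{
    dma->address = addr;
    dma->count   = nbytes;
    dma->control = DMA_DIR_MEM_TO_IO | DMA_START;
}
```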
Modes vary by how the DMA controller determines when to transfer data,
but the actual data transfer process is the same for all the modes.
DMA Modes are:
(a) BURST Mode
(b) Cycle Stealing Mode
(c) Transparent Mode
Choice of mode is based on the application and software. Different applications
require different modes to be supported. Let’s take a look at the different
modes in general.
BURST Mode
1. Sometimes called Block Transfer Mode.
2. An entire block of data is transferred in one contiguous sequence. Once
the DMA controller is granted access to the system buses by the CPU, it
transfers all bytes of data in the data block before releasing control of the
system buses back to the CPU.
3. This mode is useful for loading programs or data files into memory, but it
does render the CPU inactive for relatively long periods of time.
CYCLE STEALING Mode
1. Viable alternative for systems in which the CPU should not be disabled
for the length of time needed for Burst transfer modes.
2. DMA controller obtains access to the system buses as in burst mode,
using BR and BG signals. However, it transfers one byte of data and then
de-asserts BR, returning control of the system buses to the CPU. It
continually issues requests via BR, transferring one byte of data per request,
until it has transferred its entire block of data.
3. By continually obtaining and releasing control of the system buses, the
DMA controller essentially interleaves instruction and data transfers. The
CPU processes an instruction, then the DMA controller transfers a data
value, and so on.
4. The data block is not transferred as quickly as in burst mode, but the CPU
is not idled for as long as in that mode.
5. Useful for controllers monitoring data in real time.
TRANSPARENT Mode
1. This requires the most time to transfer a block of data, yet it is also the
most efficient in terms of overall system performance.
2. The DMA controller only transfers data when the CPU is performing
operations that do not use the system buses. For example, the relatively
simple CPU has several states that move or process data solely within
the CPU:
NOP1: (No operation)
LDAC5: AC ← DR
JUMP3: PC ← DR,TR
CLAC1: AC ← 0, Z ←1
3. Primary advantage is that CPU never stops executing its programs and
DMA transfer is free in terms of time.
4. Disadvantage is that the hardware needed to determine when the CPU is
not using the system buses can be quite complex and relatively expensive.

2.8 CACHE MEMORY AND ITS TYPES


A CPU cache is a cache used by the central processing unit of a computer to
reduce the average time to access memory. The cache is a smaller, faster
memory which stores copies of the data from the most frequently used main
memory locations. As long as most memory accesses are to cached memory
locations, the average latency of memory accesses will be closer to the cache
latency than to the latency of main memory.
When the processor needs to read from or write to a location in main
memory, it first checks whether a copy of that data is in the cache. If so, the
processor immediately reads from or writes to the cache, which is much faster
than reading from or writing to main memory.
Most modern desktop and server CPUs have at least three independent
caches: an instruction cache to speed up executable instruction fetch, a data
cache to speed up data fetch and store, and a translation lookaside buffer (TLB)
used to speed up virtual-to-physical address translation for both executable
instructions and data.
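How much closer can be quantified with the usual average memory access time formula; the helper below and the figures in the note are illustrative, not taken from the text:

```c
/* Average memory access time:
 *   AMAT = hit_rate * t_cache + (1 - hit_rate) * t_main
 * Times are in arbitrary units (say, nanoseconds). */
double avg_access_time(double hit_rate, double t_cache_ns, double t_main_ns)
{
    return hit_rate * t_cache_ns + (1.0 - hit_rate) * t_main_ns;
}
```

For example, with a 95% hit rate, a 1 ns cache and a 100 ns main memory, the average access time works out to about 5.95 ns, far closer to the cache latency than to main memory's.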
Types of Cache Memory
Direct Mapping
This is the simplest of the Cache mapping schemes. Each block of main memory
is mapped only to one possible cache line. The mapping is expressed as: Cache
line number = (main memory block number) modulo (number of lines in the
cache).
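The formula translates directly into C:

```c
/* Direct mapping: a main-memory block can live in exactly one line. */
unsigned cache_line(unsigned block, unsigned num_lines)
{
    return block % num_lines;   /* line = block mod (lines in cache) */
}
```

For instance, in an 8-line cache, main-memory block 25 always maps to line 1.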
Associative Mapping
This method overcomes the disadvantage of direct mapping. Each memory
block can be loaded into any line of the cache. Memory address is just interpreted
as a tag and a word field. Tag field uniquely identifies a block of main memory.
Set Associative Mapping


This combines the two previous schemes (direct and associative mapping) and
has the advantages of both. The cache is divided into v sets, each of which
consists of k lines.
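Under set-associative mapping the block number splits into a set index and a tag; a minimal sketch, assuming the set is chosen by a simple modulus as in the direct-mapped case:

```c
/* With v sets of k lines each, a block may occupy any of the k lines
 * of set (block mod v); the remaining bits form the tag. */
unsigned cache_set(unsigned block, unsigned v_sets)
{
    return block % v_sets;      /* which set the block belongs to */
}

unsigned cache_tag(unsigned block, unsigned v_sets)
{
    return block / v_sets;      /* identifies the block within that set */
}
```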

2.9 CO-DESIGN OF HARDWARE AND SOFTWARE


Software/Hardware co-design can be defined as the simultaneous design of
both hardware and software to implement a desired function. Successful co-
design goes hand in hand with co-verification, which is the simultaneous
verification of both software and hardware and of the extent to which they fit
the desired function.
There are many traditional barriers to effective co-design and co-
verification, such as organizational structures, old-fashioned paradigms of
other companies in the same market, or concepts that were developed in the
past and worked well back then. Suppliers often lack an integrated view of the
design process, too. What we need are tools that better estimate the constraints
between the boundaries before iterating through a difficult flow.
By using simulation models, we can find conflicts between top-down
constraints, which come from design requirements and bottom-up constraints,
which come from physical data. Bottom-up constraints for software can only
be realized in a hardware context because the abstraction-level of software is
higher than that of hardware on which it is executed.
Hardware-software co-design has existed for several decades. To ensure system
capability, designers had to face the realities of combining digital computing with
software algorithms. To verify the interaction between the two, hardware
prototypes had to be built. By the 1990s, however, this no longer sufficed, because
co-design was turning from a good idea into an economic necessity.
Predictions for the future point to greater embedded software content in
hardware systems than ever before. So something has to be done to speed up
and improve traditional software-hardware co-design. Developments in this matter
direct to:
1. Top-down system level co-design and co-synthesis work at universities.
2. Major advances made by EDA (Electronic Design Automation) companies
in high speed emulation systems.
Co-design focuses on the areas of system specification, architectural design,
hardware-software partitioning and iteration between hardware and software
as the design progresses. Finally, co-design is complemented by hardware-software
integration and testing. Design re-use is being applied more often, too.
Previous and current generation ICs are finding their way into new designs
as embedded cores in a mix-and-match fashion. This requires greater
convergence of methodologies for co-design and co-verification and places high
demands on system-on-a-chip density. That is why this concept remained elusive
for many years, until recently. In the future, the need for tools that estimate the
impact of design changes earlier in the design process will increase.
To create a system-level design, the following steps should be taken:
1. Specification capture: Decomposing functionality into pieces by creating
a conceptual model of the system. The result is a functional specification,
which lacks any implementation detail.
2. Exploration: Exploration of design alternatives and estimating their quality
to find the best suitable one.
3. Specification: The specification as noted in 1 is now refined into a new
description reflecting the decisions made during exploration as noted in 2.
4. Software and hardware: For each of the components an implementation
is created, using software and hardware design techniques.
5. Physical design: Manufacturing data is generated for each component.
Once the steps above have been carried out successfully, an embedded system
design methodology from product conceptualization to manufacturing is roughly
defined. This hierarchical modeling methodology enables high productivity,
preserving consistency through all levels and thus avoiding unnecessary iteration,
which makes the process more efficient and faster.

2.10 SYSTEM ON CHIP


System on chip refers to integrating all components of a computer or other
electronic system into a single integrated circuit called chip. It may contain
digital, analog, mixed-signal, and often radio-frequency functions—all on a single
chip substrate. A typical application is in the area of Embedded Systems.
The contrast with a microcontroller is one of degree. Microcontrollers
typically have fewer than 100 KB of RAM and often really are single-chip systems,
whereas the term SOC is typically used for more powerful processors, capable
of running software such as Windows or Linux, which need external memory
chips (flash, RAM) to be useful, and which are used with various external
peripherals.
In short, for larger systems System-on-a-chip is hyperbole, indicating
technical direction more than reality: increasing chip integration to reduce
manufacturing costs and to enable smaller systems. Many interesting systems
are too complex to fit on just one chip built with a process optimized for just one
of the system’s tasks.
A typical SOC consists of:
• One microcontroller, microprocessor or DSP core(s).
• Memory blocks including a selection of ROM, RAM, EEPROM and flash.
• Timing sources including oscillators and phase-locked loops.
• Peripherals including counter-timers, real-time timers and power-on reset.
• External interfaces including industry standards such as USB, UART and
SPI.
• Analog interfaces including ADCs and DACs.
These blocks are connected by either a proprietary or industry-standard bus
such as the AMBA bus from ARM. DMA controllers route data directly between
external interfaces and memory, bypassing the processor core and thereby
increasing the data throughput of the SOC.

2.11 TOOLS FOR EMBEDDED SYSTEMS


Most Embedded Systems run hardware diagnostics to check the health of the
hardware. Diagnostics are also used to confirm a fault that might have been
detected during normal operation. Since diagnostics verify the reliability of the
embedded device, their basic hardware access code can be used as a reference
for writing device driver code.
Diagnostic Types
(a) Power on Self Tests (POST)—Can test internal working of the board
immediately after it is powered up.
(b) Out of Service Tests—The board to be tested has to be configured in “out
of service” mode; its interfaces with neighbouring boards are then verified.
(c) In-Service Monitoring—Checks health of the system when the system is
actually running in normal mode.
Break Point—Debug from RAM or ROM
Software Breakpoints—created by inserting a 2-byte TRAP instruction which
diverts normal program flow to the debugger. Software breakpoints are useless
with ROM memory since a TRAP cannot be inserted.
Hardware Breakpoints—use comparators to detect accesses to a location;
no code memory contents are modified. These logic circuits watch every bus
cycle, stopping execution when the address at which you’ve set the breakpoint
occurs. Only hardware breakpoints can be used if the execution is from ROM
systems.
Compilers
A compiler is a computer program (or set of programs) that transforms source
code written in a programming language (the source language) into another
computer language (the target language, often having a binary form known as
object code). The most common reason for wanting to transform source code
is to create an executable program.
Linker
In computer science, a linker or link editor is a program that takes one or more
objects generated by a compiler and combines them into a single executable
program as shown in Fig. 2.6. In IBM mainframe environments such as OS/360
this program is known as a linkage editor.

[Figure: object files (obj) and libraries (lib) are fed into the linker, which
produces shared libraries (dll) and an executable (exe).]

Fig. 2.6: Linking process

On Unix variants, the term loader is often used as a synonym for linker. Other
terminology was in use, too. For example, on SINTRAN III, the process
performed by a linker (assembling object files into a program) was called loading
(as in loading executable code onto a file). Because this usage blurs the distinction
between the compile-time process and the run-time process, this article will use
linking for the former and loading for the latter. However, in some operating
systems the same program handles both the jobs of linking and loading a program;
see dynamic linking.
Simulators and Emulators
A simulator is software that duplicates some processor in almost all possible
ways. An emulator is a piece of computer hardware or software used with one
device to enable it to emulate another; i.e., an emulator is hardware which
duplicates the features and functions of a real system, so that it can behave like
the actual system.
Usually, emulators and simulators are used for testing new architectures
and also for training on complex systems. The most famous example of a
simulator is the flight simulator, which simulates the functionalities of an aircraft.
A hardware emulator is an emulator which takes the form of a hardware
device. Examples include the DOS-compatible card installed in some old-world
Macintoshes, like the Centris 610 or Performa 630, that allowed them to run PC
programs, and FPGA-based hardware emulators. The advantages of using an
emulator are listed below.
• Emulators maintain the original look, feel, and behaviour of the digital
object, which is just as important as the digital data itself.
• Despite the original cost of developing an emulator, it may prove to be the
more cost efficient solution over time.
• Reduces labour hours, because rather than continuing an ongoing task of
continual data migration for every digital object, once the library of past
and present operating systems and application software is established in
an emulator, these same technologies are used for every document using
those platforms.
ISS and Debuggers
The basic instruction simulation technique is to first execute the monitoring
program, passing the name of the target program as an additional input parameter.
GDB is one debugger that has a compiled-in ISS. The target program
is then loaded into memory, but control is never passed to the code.
The machine code instructions are treated as an input stream which can be
monitored and executed. The debugger can work along with other memory
protection tools, which protect against accidental or deliberate buffer
overflow.
For test and debugging purposes, the monitoring program can provide
facilities to view and alter registers, memory, and the restart location, or obtain a
mini core dump. A core dump records the state of the working program at a
specific time, generally when the program has terminated abnormally, and prints
symbolic program names with current data values. Instruction simulation provides
the opportunity to detect errors before execution proceeds, which means that the
conditions are still exactly as they were and have not been destroyed by the error.
We can debug
(a) either an executable or
(b) a core file or
(c) a running process
A watch point is a special breakpoint that stops your program when the
value of an expression changes. We can use a watch point to stop execution
whenever the value of an expression changes, without having to predict a
particular place where this may happen.
Watch points may be implemented in software or hardware. GDB does
software watch pointing by single-stepping your program and testing the variable’s
value each time, which is hundreds of times slower than normal execution.
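The single-step-and-test mechanism can be mimicked in miniature: run a “program” one step at a time and re-read the watched variable after every step. Everything here (the step functions, the demo program) is a made-up illustration of the idea, not GDB’s implementation:

```c
typedef void (*step_fn)(int *var);

static void step_noop(int *v) { (void)v; }   /* does not touch the variable */
static void step_incr(int *v) { (*v)++; }    /* writes the watched variable */

/* "Single-step" the program one unit at a time, re-checking the watched
 * variable after each step; return the index of the first step that
 * changed it, or -1 if it never changed. */
int run_with_watchpoint(step_fn *steps, int nsteps, int *var)
{
    int old = *var;
    for (int i = 0; i < nsteps; i++) {
        steps[i](var);
        if (*var != old)
            return i;               /* value changed: stop here */
        old = *var;
    }
    return -1;
}

/* Demo: a three-step "program" where only step 1 writes the variable. */
int demo(void)
{
    step_fn prog[] = { step_noop, step_incr, step_noop };
    int x = 0;
    return run_with_watchpoint(prog, 3, &x);
}
```

In the demo, only the second step writes the variable, so the watchpoint “fires” at step index 1.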
Causes for Core Dump
A “segmentation fault” is often caused by trying to access a memory location
that is not in the address space of the process.
A “bus error” is typically caused by trying to access an object with an improperly
aligned address.
An “illegal instruction” typically occurs when execution branches into data.
This sometimes happens when the stack is overwritten.
An “arithmetic exception” is typically caused by integer division by zero.
Debugging Deadlocks/Race Conditions
Necessary and Sufficient Conditions:
1. Serially reusable resources—the processes involved share resources which
they use under mutual exclusion.
2. Incremental acquisition—processes hold on to resources already allocated
to them while waiting to acquire additional resources.
3. No pre-emption—once acquired by a process, resources cannot be “pre-
empted” (forcibly withdrawn) but are only released voluntarily.
4. Wait-for cycle—a circular chain (or cycle) of processes exists such that
each process holds a resource which its successor in the cycle is waiting
to acquire.
Stack Overflow
A common cause of stack overflow is a stack size that is not sufficient to hold
the local variables and passed parameters. Another cause can be large buffers
used as local variables. To avoid this scenario:
1. Determine the right stack size required for each task and allocate stack
from the heap of the main task.
2. While creating the task, assign the above allocated stack.
3. Fill the top of the stack with the known pattern “0xdeadfeed”.
4. Assign a separate task to monitor the stacks of each task periodically.
5. If the top of the stack is hit, the known pattern would have been overwritten.
6. To detect stack overflow quickly, the stack size can be reduced temporarily.
7. Gain knowledge about the memories available in the system and utilize
them gracefully.
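Steps 3 to 5 amount to a canary check. A sketch, with an ordinary array standing in for the guard region at the top of a task’s stack (the pattern value follows step 3; the function names are illustrative):

```c
#include <stdint.h>

#define STACK_CANARY 0xDEADFEEDu   /* the known pattern from step 3 */

/* Step 3: fill the guard region at the top of the stack. */
void stack_paint(uint32_t *guard, int words)
{
    for (int i = 0; i < words; i++)
        guard[i] = STACK_CANARY;
}

/* Steps 4-5: the monitor task checks each stack periodically; any
 * overwritten canary word means the stack grew into the guard region. */
int stack_overflowed(const uint32_t *guard, int words)
{
    for (int i = 0; i < words; i++)
        if (guard[i] != STACK_CANARY)
            return 1;   /* overflow detected */
    return 0;           /* canary intact */
}
```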

POINTS TO REMEMBER
1. During the 1960s, computer processors were often constructed out of
small and medium-scale ICs containing from tens to a few hundred
transistors. The integration of a whole CPU onto a single chip greatly
reduced the cost of processing power.
2. Microcontrollers are “embedded” inside some other device (often a
consumer product) so that they can control the features or actions of
the product. Another name for a microcontroller, therefore, is “embedded
controller”.
3. The difference between RISC and CISC chips is getting smaller and
smaller. What counts is how fast a chip can execute the instructions it is
given and how well it runs existing software.
4. A DMA transfer copies a block of memory from one device to another.
While the CPU initiates the transfer by issuing a DMA command, it
does not execute it. For so-called “third party” DMA, as is normally
used with the ISA bus, the transfer is performed by a DMA controller
which is typically part of the motherboard chipset.
5. SOC designs usually consume less power and have a lower cost and
higher reliability than the multi-chip systems that they replace. And with
fewer packages in the system, assembly costs are reduced as well.
6. Generally, high-level programming languages, such as Java, make
debugging easier, because they have features such as exception handling
that make real sources of erratic behaviour easier to spot. In programming
languages such as C or assembly, bugs may cause silent problems such
as memory corruption, and it is often difficult to see where the initial
problem happened. In those cases, memory debugger tools may be
needed.

2.12 QUIZ
1. Hardware is tangible, but software is intangible – right or wrong?
2. Which type of memory is most closely connected to the processor?
(a) Main memory
(b) Secondary memory
(c) Disk memory

3. How is it possible that both programs and data can be stored on the same
floppy disk?
(a) A floppy disk has two sides, one for data and one for programs.
(b) Programs and data are both software, and both can be stored on any
memory device.
(c) A floppy disk has to be formatted for one or for the other.
4. Which one do you prefer while developing an application – CISC or RISC?
– Justify your choice.
(a) CISC
(b) RISC
5. What are the two general types of programs?
(a) Entertainment and Productivity
(b) Microsoft and IBM
(c) System software and Application software
6. Why doesn't a processor contain a huge cache?
(a) Costly
(b) Spacious
(c) Not available in the market
(d) No use
7. What tool is used for debugging software on hardware?
(a) JTAG
(b) BTAG

Answers for Quiz


1. Right
2. (a)
3. (b)
4. (b) (Parallelism)
5. (c)
6. (a)
7. (a)
3 Design Methodologies, Life Cycle and Modeling of Embedded Systems

Learning Outcomes
R Software Life Cycle
R Embedded Life Cycle
• Waterfall model
• Spiral model
• Consecutive refinement model
• Rapid Application Development (RAD) Model
R Modeling of Embedded Systems
• UML (Unified Modeling Language)
• FSM (Finite State Machine) and
• Petri net modeling
R Simulation and Emulation
R Recap
R Quiz

3.1 SOFTWARE LIFE CYCLE


Before one looks at the life cycle of an embedded system, it is good to understand the basic software life cycle. Before a project is started in any software organization, there is a sequence of steps to be pursued. In other words, a good start will lead to a good finish, and following this sequence results in better-quality deliverables:
1. Requirement collection and analysis
2. Design (breakdown into modules)
3. Coding
4. Integration (amalgamate of all the modules)

5. Testing (does it really work?)


The above steps are sequential and are represented in flowchart form in Fig. 3.1. Each step is explained briefly below.

Requirement collection and analysis
↓
Design (Divide and Conquer approach)
↓
Coding (Implementation of design)
↓
Integration (Modules integration)
↓
Testing

Fig. 3.1: Block diagrammatic representation of software life cycle

The life cycle starts with requirement collection. First, the designer has to be clear on what exactly the product is meant for. Requirements have to be collected with great care, as missing one of them would compromise the final product. If the product is designed for customer usage, it is better to analyze similar products already in the market, so that the shortcomings of the existing products can be clearly addressed in the new product. A better start will lead to a better finish, as always.
Coming to the second phase, a product might have to perform many operations. For instance, consider a cellular phone. It should be capable of making and receiving calls, and it would be good if it also had an FM radio, a digital camera and an MP3 player. These are all different areas and require different expertise to develop. It is wise to break the requirements down into small modules and assign them to different teams. Here a divide and conquer policy is followed and the work is assigned to the right people with the right expertise. This also makes it easier for managers to manage the teams. Most importantly, as many people work on the product, the total duration required to build it is reduced in a big way.

The third and most important activity is coding. Developers write the code for the functionalities they are assigned. The language (C, C++, Java, etc.) can be chosen based on the application. Code has to be written in a modular way. The code should also have enough comments, and there should be enough scope for upgrading it. It is also better to avoid needless inclusion of header files.
Integration comes next. Here all the small modules (assuming a mobile phone again, the modules could be radio, call handling, messaging, music player, etc.) have to be integrated to get the final end product. During integration, great care should be taken to ensure that all the functionalities are integrated in such a way that none of them is affected. After integration, testing has to be done at various levels.
The final phase of the cycle is testing. The product has to be tested for its functionality. Initially, the developer will do some minimal level of testing, referred to as unit testing. But there will be dedicated testers who can test the product better at various levels. Software and hardware have to be tested for stability, and there should be a set of test cases that are run to verify the working of the product. More importantly, even after the product has been released, it has to be constantly monitored for performance. There may be complaints raised by consumers; until the product gets stabilized in the market, testing has to be done constantly.

3.2 EMBEDDED LIFE CYCLE


Keeping the above in mind, one can move on to the embedded system design
life cycle. There are lots of methodologies that can be followed for designing an
embedded system.
1. Waterfall model
2. Spiral model
3. Consecutive refinement model
4. Rapid Application Development (RAD) model
All the above methodologies are discussed in this chapter.
1. Waterfall Model
The block diagrammatic representation of Waterfall model is shown in
Fig. 3.2. The lines quoted in the backward direction have to be forgotten initially.
Explanation of it would be given below.
The flow remains the same as the normal software life cycle. But there is a very big drawback in this conventional approach: no attention is given at each step to whether the requirements have been met successfully. For example, if the 3rd phase is being carried out, after completion of that phase there is no scope in this model for checking with the previous couple of phases whether the target has been achieved accurately.
But considering the lines drawn in the backward direction, the above problem can be addressed easily. After the completion of phase 2, one should go back and check with phase 1 whether the requirements quoted there have been met in phase 2. So even if one of the requirements is missed, it can be spotted and corrected immediately, instead of checking everything at the end, which would require a lot of human effort and money.

Requirement
collection and analysis

Design
(Divide and Conquer approach)

Coding (Implementation
of design)

Integration
(Modules integration)

Testing

Fig. 3.2: Block diagrammatic representation of waterfall model


Advantages
1. Simple method.
2. Flow is easy even for a new entrant to understand the process.
Disadvantages
1. Without backtracking, the model seems highly impractical.
2. With backtracking, one has to spend a lot of time in each and every phase.
2. Spiral Model
This is another approach one can follow. The diagrammatic representation is presented in Fig. 3.3.

[Spiral figure: each turn of the spiral passes through Testing & Requirement Analysis]

Fig. 3.3: Block diagrammatic representation of spiral model
(As the spiral gets larger, the time required also increases)

In this model, each and every phase has to undergo testing and be checked against the requirements after its completion. So each phase takes a lot of time, and the spiral gets larger and larger as it moves on (an indication of more time being spent in that phase), because testing takes more time at each phase. After the initial system is made, the designer and testing team should focus on the functionality of the system and ensure that it works fine and meets the requirements. If the testers find problems, they have to be fixed and full-fledged testing carried out again. After confirmation, the final product can be produced and released to the market.
Advantages
1. Highly realistic and chances for errors are less.
Disadvantages
1. Though it is a realistic model, it takes a lot of time, which may prevent the product from reaching the market on time.
3. Consecutive Refinement Model
This model is represented in Fig. 3.4.

Requirement collection and analysis → Design (Divide and Conquer approach) → Coding (Implementation of design) → Integration (Modules integration) → Testing
(the same five phases repeated: first time - initial system; then slightly refined system - better refined system)

Fig. 3.4: Block diagrammatic representation of consecutive refinement model

When a designer is provided with requirements, the requirements and the system's expected behaviour may not be entirely clear to the designer. With a few uncertainties in mind, the designer builds the initial system, and it may not live up to expectations.

But if the designer is given the chance to work on the same system again, there is some comfort for the designer. If this is done consecutively, the designer gains a lot of proficiency with the flow. The first, initial system may have a few bugs; next, a polished system emerges with most of the bugs fixed; further refinement makes the product very stable. The biggest advantage is that the designer comes to understand the product very well.
Advantages
1. Very precise and highly realistic.
2. Product will be bug free.
Disadvantages
1. Highly time-intensive.
2. Each phase of iteration is rigid with no overlaps.
3. Costly system architecture and design issues may arise.
4. Rapid Application Development Model
RAD is, in essence, the “try before you buy” approach to software development.
The theory is that end users can produce better feedback when examining a live

system, as opposed to working strictly with documentation. RAD-based


development cycles have resulted in a lower level of rejection when the
application is placed into production, but this success most often comes at the
expense of dramatic overruns in project costs and schedule.
The RAD approach was made possible by significant advances in software development environments, as in Fig. 3.5, that allow rapid generation and change of screens and other user interface features.
with the screens online, as if in a production environment. This leaves little to
the imagination, and a significant number of errors are caught using this process.

Fig. 3.5: Block diagrammatic representation of Rapid Application Development model

The downside to RAD is the propensity of the end user to force scope creep into the development effort. Since it seems so easy for the developer to produce the basic screen, it must be just as easy to add a widget or two. In most
RAD life cycle failures, the end users and developers were caught in an unending
cycle of enhancements, with the users asking for more and more and the
developers trying to satisfy them. The participants lost sight of the goal of
producing a basic, useful system in favour of the siren song of glittering perfection.
The advantages of this model include minimizing feature creep by developing in short intervals, resulting in miniature software projects and releasing the product in mini-increments. The disadvantage is that a short iteration may not add enough functionality, leading to significant delays in the final iterations. Since Agile emphasizes real-time communication (preferably face-to-face), it is problematic to utilize for large multi-team distributed system development. Agile methods produce very little written documentation and require a significant amount of post-project documentation.

3.3 MODELING OF EMBEDDED SYSTEMS


Modeling a system before making it a product is essential. For instance, if a designer is given the challenge of making an elevator, the designer should first model it. The elevator needs to be checked for its functionality: whether all the required options work fine, whether the elevator door stays closed while it is in motion, how the inputs are handled, how priority is handled, and so on. All of this should be modeled first so that the designer is in a comfort zone when the product is made. In short, precise modeling helps the designer to understand the system better and increases confidence. Many modeling techniques are available, and it is up to the designer to choose among them. A few modeling techniques are discussed here. A typical embedded system looks like the one in Fig. 3.6.

Fig. 3.6: A typical reactive real-time embedded system architecture

The order of discussion would be:
1. UML (Unified Modeling Language)
2. FSM (Finite State Machine) and
3. Petri net modeling

1. UML (Unified Modeling Language)


This is a globally followed approach in the industry for modeling a system. UML is extensively used in software architectures. Why is it called Unified? The reason is its applicability to many designs and processes.
The Unified Modeling Language (UML) is used to specify, visualize, modify,
construct and document the artifacts of an object-oriented software intensive
system under development. The standard is managed and maintained by Object
Management Group (OMG). One can understand the complete history of UML
and its success stories by going through the site www.uml.org. Here the reader can learn the primitive and basic elements of UML, which can be listed as follows:
A. Class diagram
B. Object diagram
C. Package diagram
D. Stereotype diagram
E. State diagram and
F. Deployment diagram
Every diagram mentioned in the above list will be discussed in detail below.
A. Class diagram
Class diagram describes the structure of a system by showing the system’s
classes, their attributes, and the way the classes are related to each other. A
simple class diagram is represented in the following Fig. 3.7. From this one can
visualize how a class diagram would be represented. A class diagram would
have a class name, attributes and behaviours of a system.

BANK ACCOUNT  ← Class name
ACCOUNT HOLDER NAME: MR. XYZ
BALANCE AMOUNT: (in $) 10,000
DEPOSIT: (in $)
WITHDRAWAL LIMIT: (in $) 5,000
(the lines below the class name are the attributes of the class)

Fig. 3.7: Class diagram
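A class diagram such as Fig. 3.7 translates directly into code: the class name becomes a type, the attributes become fields, and the behaviours become functions. A hedged sketch in C (the names and the 5,000 withdrawal limit mirror the figure; the functions themselves are illustrative assumptions):

```c
#include <stdbool.h>

/* The class "BANK ACCOUNT" from Fig. 3.7 rendered as a C type. */
typedef struct {
    const char *holder_name;    /* ACCOUNT HOLDER NAME        */
    long        balance;        /* BALANCE AMOUNT (in $)      */
    long        withdraw_limit; /* WITHDRAWAL LIMIT (in $)    */
} BankAccount;

/* Behaviour: DEPOSIT */
void account_deposit(BankAccount *a, long amount)
{
    if (amount > 0)
        a->balance += amount;
}

/* Behaviour: withdrawal, honouring the per-transaction limit. */
bool account_withdraw(BankAccount *a, long amount)
{
    if (amount <= 0 || amount > a->withdraw_limit || amount > a->balance)
        return false;
    a->balance -= amount;
    return true;
}
```

An object diagram (Fig. 3.9) then corresponds to one concrete instance of this type, e.g. `BankAccount peter = { "MR. XYZ", 10000, 5000 };`.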


For writing comments in UML, the following notation as in Fig. 3.8 has to be
followed.

THIS IS A CLASS DIAGRAM.


AND COMMENT HAS TO BE
GIVEN IN THIS WAY

Fig. 3.8: Comment in UML


B. Object diagram
Object diagram is a depiction that shows a complete or partial view of the
structure of a modeled system at a specific time. Or to define it in a simpler
way, it is an instance of a class. Figure 3.9 represents the way to draw object
diagram.

PETER : BANK ACCOUNT  ← Object name : Class name
ACCOUNT HOLDER NAME: MR. XYZ
BALANCE AMOUNT: (in $) 10,000
DEPOSIT: (in $)
WITHDRAWAL LIMIT: (in $) 5,000
(the lines below the object name are the attributes of the class)

Fig. 3.9: Object in UML

C. Package diagram
Package describes how a system is split up into logical subsystems by showing
the dependencies among these groupings. Its representation is given in Fig.
3.10.
PACKAGE A
CLASS 1

CLASS 2

CLASS 3

Fig. 3.10: Package in UML



D. Stereotype diagram
It is a collection of elements that can be frequently used or invoked; the timer/counter is a good example. It can be compared to functions in C programming. The same is represented diagrammatically in Fig. 3.11.

TIMER/COUNTER

Fig. 3.11: Stereotype in UML


E. State diagram
It will clearly specify the states and state transitions of a system. One such
example is given in Fig. 3.12.

Pause

System Running System Paused

Un Pause

Fig. 3.12: State diagram in UML

F. Deployment diagram
It will clearly depict the hardware used in system implementations and the
execution environments and artifacts deployed on the hardware of the system.
The schematic diagram is shown in Fig. 3.13 below.

User → Browser → Web server (Presentation layer / Web interface, Log file) → Database server (Database interface, MySQL database)

Fig. 3.13: Deployment diagram in UML



The next methodology that can be used to model an embedded system is the Finite State Machine (FSM) approach, discussed below.

2. FSM (Finite State Machine)


A finite state machine is a simple modeling approach that is very useful for understanding a system, however complex the system may be. In simple words, it is a model composed of a finite number of states, the possible transitions between those states, the actions caused by the transitions, and an indication of when each action is taken. FSM modeling basically starts with one of the available states, which can be marked START in the diagram. The machine goes through a series of transitions and may finally end in any of the available states. An example will help the reader understand the concept better, but before getting into it, one needs to know a few basic terminologies of FSM.
A current state is determined by past states of the system. As such, it can
be said to record information about the past, i.e., it reflects the input changes
from the system start to the present moment. The number and name of the
states typically depend on the different possible states of the memory, e.g., if the
memory is three bits long, there are 8 possible states. A transition indicates a
state change and is described by a condition that would need to be fulfilled to
enable the transition. An action is a description of an activity that is to be
performed at a given moment. There are several action types:
R Entry action—which is performed when entering the state.
R Exit action—which is performed when exiting from the state.
R Input action—which is performed depending on present state and input
conditions.
R Transition action—which is performed when performing a certain transition.
The current state is basically determined by past states. In short or to be
more precise, it records the changes from the starting point to present state.
Number of states is basically governed by number of bits of the memory, for an
example, if memory is 4 bits long, there would be 16 possible states.
Other than state, there are few other terms as well such as Transition and
action. A transition is an indication of a state change that has happened as a
result of meeting a condition. An action is simply a description of an activity that
has to be performed at a given point in time. Actions can be performed at entry,
exit, transition and input.
There are two different groups of state machines: Acceptors/Recognizers and
Transducers.

An acceptor (or recognizer) produces a binary output, saying either yes or no to indicate whether the input is accepted by the machine or not. All states of such an FSM are said to be either accepting or not accepting; when all the input has been processed, if the current state is an accepting state the input is accepted, otherwise it is rejected. A transducer, in contrast, generates output based on the current state and input. The concept can be better understood by referring to Fig. 3.14: an FSM basically uses the combination of CURRENT STATE and INPUT to determine the next state. The following is the simplest representation for understanding an FSM.

                Current state
Condition ↓     State A     State B     State C
Condition X       ...         ...         ...
Condition Y       ...       State C       ...
Condition Z       ...         ...         ...

Fig. 3.14: FSM principles
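A state-transition table like Fig. 3.14 maps directly onto a two-dimensional lookup array: indexing by the current state and the input condition yields the next state. A small illustrative sketch in C (only the (State B, Condition Y) → State C cell comes from the figure; the other cells, shown as "..." there, are filled with arbitrary example values):

```c
enum state     { STATE_A, STATE_B, STATE_C, NUM_STATES };
enum condition { COND_X, COND_Y, COND_Z, NUM_CONDS };

/* next_state[current][condition]: the transition table of Fig. 3.14. */
static const enum state next_state[NUM_STATES][NUM_CONDS] = {
    /*           X        Y        Z     */
    /* A */  { STATE_B, STATE_A, STATE_C },
    /* B */  { STATE_B, STATE_C, STATE_A },  /* (B, Y) -> C, as in the figure */
    /* C */  { STATE_A, STATE_C, STATE_B },
};

/* One FSM step: combine CURRENT STATE and INPUT to get the next state. */
enum state fsm_step(enum state current, enum condition input)
{
    return next_state[current][input];
}
```

The table-driven style keeps the whole behaviour in one data structure, which is easy to review against the specification.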

An Elevator has been taken as an example here and it is represented in FSM


format. Figure 3.15 shows the same.

An elevator's functionality, represented in the form of an FSM (assume the elevator is deployed across 3 floors):

State 0 / idle:
  Button press > current floor → Motion UP
  Button press = current floor → Open door
  Button press < current floor → Motion DOWN
(The door should be kept open for 15 seconds; there will be a timer for checking this functionality.)

Fig. 3.15: FSM example (elevator)

Assume that an elevator has been deployed in a three-floored building (including the ground floor). The elevator is initially in the idle state on any one of the three floors (it will be on the floor where it was last used, so mostly the ground floor). The elevator is an embedded system that only moves up and down based on the requests given by the user. Assume the elevator is on the ground floor, and a user gets in and requests to move to the 2nd floor. Taking the above diagram as reference, one can understand the scenario.
44 Embedded Systems

From the idle state, after receiving the input from the user, the controller designed for the elevator compares the input with the current floor number. If it is greater, it is obvious that the elevator has to move up. Great care should be taken that the door does not open while the lift is in motion, so disabling the door-open function during motion is much appreciated. Next, as in the figure, after reaching the floor the door has to be kept open for some time (15 seconds) so that the user can step out, and it stays open until a new request comes in. In the meantime, if there is an input from a different user for bringing the elevator to the ground floor, it follows the same mechanism as discussed above. Most importantly, the direction of the lift should not change when there is no request for moving up or down! Now the reader can take a close look at the elevator FSM representation; it will be easier to understand the diagram.
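The elevator FSM of Fig. 3.15 can be sketched as a switch-based state machine in C. This is a simplified single-request model under stated assumptions: the 15-second door timer is reduced to a tick counter, one step equals one floor of travel, and all names are illustrative:

```c
enum elev_state { IDLE, MOVING_UP, MOVING_DOWN, DOOR_OPEN };

struct elevator {
    enum elev_state state;
    int current_floor;   /* 0..2 for a 3-floor building */
    int target_floor;    /* the pending button press    */
    int door_ticks;      /* stands in for the 15-second door timer */
};

/* One FSM step: combine the current state with the pending request. */
void elevator_step(struct elevator *e)
{
    switch (e->state) {
    case IDLE:
        if (e->target_floor > e->current_floor)      e->state = MOVING_UP;
        else if (e->target_floor < e->current_floor) e->state = MOVING_DOWN;
        else { e->state = DOOR_OPEN; e->door_ticks = 15; }
        break;
    case MOVING_UP:      /* door-open function stays disabled while moving */
        e->current_floor++;
        if (e->current_floor == e->target_floor) {
            e->state = DOOR_OPEN; e->door_ticks = 15;
        }
        break;
    case MOVING_DOWN:
        e->current_floor--;
        if (e->current_floor == e->target_floor) {
            e->state = DOOR_OPEN; e->door_ticks = 15;
        }
        break;
    case DOOR_OPEN:      /* keep the door open until the timer expires */
        if (--e->door_ticks == 0)
            e->state = IDLE;
        break;
    }
}
```

Note how the safety rule from the text falls out of the structure: the door can only open in the DOOR_OPEN state, never while the machine is in MOVING_UP or MOVING_DOWN.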
The different applications of Finite State Machine in hardware and software
are discussed in lengths below.
Hardware Applications
In a digital circuit, an FSM may be built using a programmable logic device, a
programmable logic controller, logic gates and flipflops or relays. More
specifically, a hardware implementation requires a register to store state variables,
a block of combinational logic which determines the state transition, and a second
block of combinational logic that determines the output of an FSM. One of the
classic hardware implementations is the Richard’s controller.
Mealy and Moore machines produce logic with asynchronous output, because there is a propagation delay between the flipflop and the output; this causes slower operating frequencies in the FSM. A Mealy or Moore machine can be converted to an FSM whose output comes directly from a flipflop, which makes the FSM run at higher frequencies. This kind of FSM is sometimes called a Medvedev FSM. A counter is the simplest form of this kind of FSM.
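The counter remark can be made concrete: in a Medvedev-style FSM the state register itself is the output, so modelling it only needs a next-state function. A sketch in C (the 3-bit width is an illustrative choice, echoing the earlier point that 3 bits of memory give 8 possible states):

```c
#include <stdint.h>

/* A 3-bit counter viewed as a Medvedev FSM: the state register IS the
 * output, so there is no combinational logic between the flipflops and
 * the output pins. 3 bits of state -> 2^3 = 8 possible states. */
typedef uint8_t counter_state;

counter_state counter_next(counter_state s)
{
    return (counter_state)((s + 1u) & 0x07u); /* wrap around after 8 states */
}
```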
Software Applications
The following concepts are commonly used to build software applications with
finite state machines:
• Automata-based programming
• Event driven FSM
• Virtual FSM (VFSM)
A question may now arise in the reader’s mind that which methodology is
to be preferred for modeling. It is purely based on one’s comfort and expertise
with the method.

3. Petri net Modeling


A Petri net is primarily used for studying the dynamic concurrent behaviour of network-based systems where there is a discrete flow. Petri nets have widespread applications in academic, industrial and other areas as well.
Before jumping in depth into Petri net modeling, the basic units of Petri
nets should be known well. The basic elements are Places and Transitions. An
arc is the third term being used, which exists only from a transition to place or
place to a transition. Figure 3.16 has the representations of arc, transition and
places respectively.

p t p

Fig. 3.16: Place, transition and arc

In the above figure and throughout, places are represented by circles, transitions by bars, arcs by arrows and tokens by dots.
Properties of Petri nets
Before using Petri nets for modeling, it is necessary to know their properties.
(a) Sequential Execution
A simple example makes this property clear: in the alphabet, the letter C comes only after A and B are encountered. The same is the case here: transition t2 can fire only after t1 has completed its firing. The order of precedence is vital here. Figure 3.17 shows this.

p1 t1 p2 t2 p3

Fig. 3.17: Sequential execution

(b) Synchronization
This property is simple: a transition t1 is enabled only when there is at least one token at each of its input places.
(c) Merging
When several tokens arrive from several places for service at the same transition, merging happens. Figure 3.18 shows this.

t1

Fig. 3.18: Merging

(d) Concurrency
This is one of the most important properties of Petri nets, easily understood from Fig. 3.19.

t1

t2

Fig. 3.19: Concurrency

In the case shown, t1 and t2 are concurrent. This allows Petri nets to model distributed control systems with multiple processes executing concurrently in time.
(e) Conflict
Again, a diagrammatic representation makes this property easier to explain. From Fig. 3.20, it can be clearly seen that t1 and t2 are both ready to fire, but the firing of either one disables the other transition. This deadlock-like situation is referred to as conflict.

t1

t2

t1

t2

Fig. 3.20: Conflict

Since the properties are now well known, the reader can be exposed to a real-time implementation scenario. A chocolate vending machine is taken as an example in Fig. 3.21. The user wishes to get a 15C chocolate bar from the machine and deposits 5C + 5C + 5C, which is represented in the following figures.

(A Petri net with places 0C, 5C, 10C, 15C and 20C; the token starts at place 0C. The "Deposit 5C" transitions advance the marking by 5C, the "Deposit 10C" transitions by 10C, and the "Take 15C bar" / "Take 20C bar" transitions fire from the 15C and 20C places, returning the token to 0C.)

Fig. 3.21: Initial setup for chocolate vending machine



(Same net as Fig. 3.21; the token has moved from place 0C to place 5C.)

Fig. 3.22: User deposits first 5C coin

(Same net; after the second coin the token has moved from place 5C to place 10C.)

Fig. 3.23: User deposits second 5C coin



(Same net; after the third coin the token is at place 15C, and the "Take 15C bar" transition is now enabled.)

Fig. 3.24: User deposits third 5C coin

(Consolidated view of the whole deposit sequence on the same net.)

Fig. 3.25: Consolidated representations

Finally, the user gets the 15C chocolate. All the steps of the sequence are represented in the snapshots above. The advantage of Petri net modeling is that it gives a visual picture and is very simple to understand.
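The vending-machine net can also be simulated in software with token counts per place and a firing rule: a transition is enabled when its input place holds a token, and firing moves the token to the output place. A minimal sketch in C covering just the 5C-deposit chain of Fig. 3.21 (transitions with a single input and output place; names are illustrative):

```c
#include <stdbool.h>

enum place { P_0C, P_5C, P_10C, P_15C, P_20C, NUM_PLACES };

/* marking[p] counts the tokens currently at place p. */
static int marking[NUM_PLACES] = { 1, 0, 0, 0, 0 }; /* token starts at 0C */

struct transition { enum place in; enum place out; };

/* Fire a transition if it is enabled (a token is present at its input). */
bool fire(struct transition t)
{
    if (marking[t.in] == 0)
        return false;        /* not enabled */
    marking[t.in]--;
    marking[t.out]++;
    return true;
}

/* The "Deposit 5C" chain of Fig. 3.21 */
static const struct transition dep5_from_0  = { P_0C,  P_5C  };
static const struct transition dep5_from_5  = { P_5C,  P_10C };
static const struct transition dep5_from_10 = { P_10C, P_15C };
/* "Take 15C bar": consumes the 15C token, returning the token to 0C */
static const struct transition take_15c_bar = { P_15C, P_0C  };
```

A general Petri net would allow multiple input and output places per transition (the synchronization property), but the single-arc version above is enough to replay the 5C + 5C + 5C sequence of Figs. 3.22-3.25.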

3.4 SIMULATION AND EMULATION

3.4.1 Simulation
Simulation is the imitation or replication of some real thing or process. In short, it is a model that can stand as a substitute for the real system. One can test the simulated model from all angles by providing sample inputs and observing the outputs. As the real system can be very expensive, this is the most commonly used way of testing and learning about a system. The term simulation is used in many contexts, such as simulation of a technology, safety engineering, and training and education.
One area where simulation is used extensively is aviation. Before being handed a real aircraft, a pilot is first trained in a flight simulator. It gives the trainee many options to learn from, and in time it also builds the confidence needed to handle the aircraft in real life. Most importantly, a pilot trainee cannot practice in a real aircraft, as it would be very risky and also expensive. Microsoft has a very famous flight simulator with many options that help trainees get familiar with the system. Another classical example is testing an elevator.
An elevator before final implementation has to be tested with all possible
combinations and designer needs to ensure that request first raised has to be
responded first. Also designer should make sure all the security aspects are
fulfilled. Door of the elevator should never be opened when the elevator is
moving. All aspects of these kinds have to be tested and safety of the users
must be ensured. Simulation will help in testing all these strategies.
In short, simulation is used when a real system cannot be engaged: it may not be easily accessible, it might be risky to use, it might not be ready, or it may not be available at all.
Coming to the embedded systems side, simulators play a vital role. A programmer will not always be provided with the actual microcontroller, so a simulator helps by simulating the microcontroller and its functions. A simple and famous simulator that almost all programmers use is the Keil simulator. Some simulators go a step further by including even the peripherals in the simulation. One hard fact to accept is that, irrespective of the speed of the PC on which the simulator is installed, no simulator is capable of simulating a microcontroller's behaviour in real time. Simulating external events is an even tougher task. So one can conclude that a simulator is best suited for testing algorithms.
Advantages
1. First and foremost, it is not risky in any aspect to use it. Even if user
commits a mistake, it would never kill or injure the user.

2. Varieties of simulators are readily available in the market. So setting up


an environment for a system is not a tough task. Only thing is that certain
simulators might be expensive.
3. All possible combinations can be tested with simulator, which may not be
possible with real system.
Disadvantages
1. Not all combinations and possibilities (especially real-time behaviour and external events) can be tested with simulators.
2. Certain simulators are very expensive, which may not be affordable.
3. However strong the simulator is, physical testing is required.
4. Sometimes, setting up a simulator may take time.

3.4.2 Emulation
Having discussed on the demerits associated with simulator, emulator comes in.
An Emulator duplicates (emulates) the functions of one system using another
system, the second system will behave as the previous one. This is actually in
contrast to simulation which can concern an abstract model of system getting
simulated. Emulator to a better extent has overcome the problems related to
simulation. It can be faster as well as perfect. An emulator is a piece of hardware
which has behaviours similar to that of a real microcontroller with all its functionality
being inbuilt. A microcontroller’s behaviour can be emulated in real time.
The In-Circuit Emulator (ICE) is the most commonly available emulator for embedded
systems and is very handy in debugging embedded software. The programmer can use
an ICE to load code into the embedded system, run it, step through it (by placing
breakpoints in the code), and view and change the data used by the system software.
An emulator provides an interactive user interface through which the programmer
can investigate and control the embedded system. A source-level debugger with a
graphical window interface, communicating through an emulator with the target
embedded system, makes debugging easier and more comfortable. The most common
problem faced in embedded systems is that they give little indication of software
failures; this issue can be fixed to a great extent with an ICE. An ICE helps the
programmer test the code in small pieces and eventually isolate the buggy area of
the code. An ICE provides execution breakpoints, memory display, memory monitoring
and peripheral control. An ICE can also be programmed to look for a specific
condition and identify it as a fault.
Advantages
1. Faster compared to simulation.
2. No need to set up a special environment to use an emulator.

Disadvantages
1. Emulators are faster than simulators, but not as fast as the real system.
2. Certain emulators are expensive, which may put them out of the user's reach.

POINTS TO REMEMBER
1. The software life cycle is a sequence of five steps: 1. Requirement
collection and analysis, 2. Design, 3. Coding, 4. Integration and 5. Testing.
2. Divide and conquer is the best approach to build a system.
3. Testing has to be carried out extensively before delivering the product
to market.
4. An embedded system can be built based on any of the following life cycle
models: the Waterfall model, Spiral model, Consecutive refinement model
and Rapid Application Development (RAD) model.
5. The most followed and simplest model for building an embedded system is
the Waterfall model (provided it supports backtracking).
6. Modeling a system before making a prototype reduces the chances of bugs
in the product and also increases the designer's confidence.
7. Embedded systems can be modeled using UML or FSM or Petri nets.
Selection of the methodology is purely based on user’s convenience
and knowledge about the modeling technique preferred.
8. Basic components in UML are Class diagram, Object diagram, Package
diagram, Stereotype diagram, State diagram and Deployment diagram.
9. Petri nets have few important properties as Sequential Execution,
Synchronization, Merging, Concurrency and Conflict.
10. Simulation and emulation help the designer exercise the proposed
system and test it with a wide range of inputs. An example would be an
aircraft landing gear system.

Review Questions
1. Which model would you opt for when building an embedded system? Justify
your selection.
2. What is the advantage associated with Spiral Modeling?
3. Why should someone go for a modeling approach before implementation?
4. Can FSM be deployed only to model Embedded Systems? Justify.
5. Modeling will be useful for understanding the system requirements well.
Are there any other benefits with modeling?

6. How is state represented in UML?


7. What are the disadvantages associated with petri net modeling?
8. Differentiate simulator and emulator.
9. Emulator or simulator—Which one would be preferred?

3.5 QUIZ
1. For designing an embedded system, which of the following models is
preferred?
(a) Waterfall model (b) Spiral model
(c) RAD model (d) Consecutive refinement model
2. The most important phase in the software life cycle is
(a) Integration (b) Design
(c) Testing (d) Coding
3. A very important advantage associated with simulation is
(a) Improved safety and reliability (b) Reduced cost factor
(c) Reduced defects (d) Reduced setting up time
4. A major problem with simulation is
(a) Simulators are sometimes expensive
(b) Setting up a simulator may take time
(c) Cannot test with all possible inputs
(d) Though simulated, needs a physical system to test completely.

Answers for Quiz


1. (a)
2. (b)
3. (a)
4. (c)
4
Layers of an
Embedded System

Learning Outcomes
R Basic notion about layering
R To get insight about middleware
R To study the basics of all the layers
R Recap
R Quiz

4.1 INTRODUCTION
Those who have studied Computer Networking would be very familiar with the
concept of layering. The famous ISO-OSI layering is the reference model for
most networks. Similarly, each and every embedded system that we have
seen as an example in the first chapter follows a basic structure, whose
pictorial depiction is given in Fig. 4.1.

[Figure: a stack of three layers — an Application Software Layer (optional)
containing the application layer; a System Software Layer (optional) containing
middleware and abstraction layers 1 and 2; and a Hardware Layer containing the
embedded controller board]

Fig. 4.1: The structure of an embedded system

As one can see from the above diagram, an embedded system is generally
composed of three layers:
1. An optional application software layer.
2. An optional system software layer.
3. A mandatory hardware layer.
Let us briefly discuss each of these layers in the following sections.
4.2 NEED FOR LAYERING
It is just a way of divide and conquer! When a big program is divided into
modules, it is easier to understand and easier to troubleshoot at tough times.
Likewise, we follow the layered approach, which enhances our understanding of
how the system works.
This is mainly because the various modules (elements) within this type of
structure are usually functionally independent, even though they may interact
with one another to a high degree. Separating these elements into layers improves
the structural organization of the system without the risk of oversimplifying
complex interactions or overlooking required functionality.
The hardware layer contains all the major physical components located on
an embedded board, whereas the system and application software layers contain
all of the software located on and being processed by the embedded system.
4.2.1 The Hardware Layer
• In embedded devices, all the electronic hardware resides on a board,
also referred to as a Printed Wiring Board (PWB) or Printed Circuit
Board (PCB).
• PCBs are often made of thin sheets of fiberglass.
• The electrical path of the circuit is printed in copper, which carries the
electrical signals between the various components connected on the board.
• All electronic components that make up the circuit are connected to this
board, either by soldering, plugging into a socket, or some other connection
mechanism.
• All of the hardware on an embedded board is located in the hardware
layer of the Embedded Systems Model.
Pictorial representation of an embedded system model is given in Fig. 4.2.

Fig. 4.2: An embedded system model



The major hardware components of most boards can be classified into five
major categories:
1. Central Processing Unit (CPU)—the master processor
2. Memory—where the system’s software is stored
3. Input Device(s)—input slave processors and relative electrical
components
4. Output Device(s)—output slave processors and relative electrical
components
5. Data Pathway/Bus—interconnects the other components, providing
a “highway” for data to travel on from one component to another,
including any wires, bus bridges, and/or bus controllers.
These five categories are based upon the major elements defined by the Von
Neumann model, a tool that can be used to understand any electronic device's
hardware architecture. The Von Neumann model is a result of the published
work of John Von Neumann in 1945, which defined the requirements of a
general-purpose electronic computer. Since embedded systems are a type of
computer system, this model can be applied as a means of understanding
embedded systems hardware.
The way the buses connect the above-mentioned components can be seen
in the following figure.

[Figure: an embedded system board on which the five system components are
commonly connected via buses. The master processor controls the usage and
manipulation of data; memory stores data from the CPU or input devices until
the CPU or an output device requests it; input devices bring data into the
embedded system, and output devices take data out of it]

Fig. 4.3: Connecting the components with buses



As depicted, the input is first fed to the embedded board and stored in the
memory unit, from which the processor fetches it to perform its manipulations.
The processor then stores the result back in memory, and it is finally given
to the output device for display.
The specifications of the hardware components, such as the processor and the
input and output devices, are given in Fig. 4.4.

Fig. 4.4: Specifications of hardware components

4.2.2 The System Software Layer (or simply, the OS layer)


The OS is a set of software libraries that serves two main purposes in an
embedded system:
1. Providing an abstraction layer for software on top of the OS to be less
dependent on hardware, making the development of middleware and
applications that sit on top of the OS easier.
2. Managing the various system hardware and software resources to ensure
the entire system operates efficiently and reliably.
As shown in Fig. 4.5, the OS sits either directly over the hardware, over the
device driver layer, or over a BSP (Board Support Package).
58 Embedded Systems

[Figure: three alternative stacks, each with an Application Software Layer on
top of a System Software Layer on top of the Hardware Layer. In (a) the system
software layer comprises the operating system over a board support package over
device drivers; in (b), a middleware layer over the operating system over device
drivers; in (c), the operating system over middleware over device drivers]

Fig. 4.5: Positioning the OS
Fig. 4.5: Positioning the OS

How does an OS for an embedded system differ from that of other systems?
Embedded OSes vary in which components they possess, but all OSes have at
least a kernel. The kernel is the component that contains the main
functionality of the OS:
• Process Management.
• Interrupt and error detection management.
• The multiple interrupts and/or traps generated by the various
processes need to be managed efficiently so that they are handled
correctly and the processes that triggered them are properly
tracked.
• Memory Management.
• I/O System Management.
Figure 4.6 shows the structure of an embedded OS.

[Figure: an embedded OS consisting of optional middleware, a kernel (process
management, memory management, I/O system management) and optional device
drivers]

Fig. 4.6: An embedded OS

A special form of OS is the RTOS (Real-Time OS), which deals with tasks that
have stringent timing requirements. A part of the RTOS called the scheduler keeps
track of the state of each task and decides which one should go to the running
state. Unlike UNIX or Windows, the scheduler in an RTOS is completely
single-minded about which task should get the processor: it simply looks at the
priorities you assign to the tasks, and among the tasks that are not in the
blocked state, the one with the highest priority runs while the rest wait in the
ready state. If a high-priority task hogs the processor for a long time while
lower-priority tasks wait in the ready state, the low-priority tasks simply have
to wait. The scheduler assumes that you knew what you were doing when you set
the priorities.
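The scheduling rule described above — among the tasks that are not blocked, run the one with the highest priority — can be sketched in a few lines of C. The task structure, state names and "higher number = higher priority" convention below are illustrative assumptions, not the API of any particular RTOS.

```c
#include <stddef.h>

/* Illustrative task states and task control block (not a real RTOS API). */
enum state { READY, RUNNING, BLOCKED };

struct task {
    int priority;     /* assumption: a larger number means higher priority */
    enum state st;
};

/* Return the index of the highest-priority task that is not blocked,
   or -1 if every task is blocked. This mirrors the "single-minded"
   RTOS scheduler described in the text. */
int pick_next(const struct task *t, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].st == BLOCKED)
            continue;                      /* blocked tasks never run */
        if (best < 0 || t[i].priority > t[best].priority)
            best = (int)i;                 /* remember the best candidate */
    }
    return best;
}
```

Note that such a scheduler makes no attempt at fairness: a ready task of priority 5 will starve a ready task of priority 3 forever, exactly as the text warns.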

4.2.3 The Middleware


The definition: in the most general terms, middleware is any system
software that is not the OS kernel, a device driver, or application software (note
that some OSes may integrate middleware into the OS executable).
In short, in an embedded system, middleware is system software that
typically sits on the device drivers or on top of the OS, and can sometimes
be incorporated within the OS itself. Middleware is software that has been
abstracted out of the application layer for a variety of reasons.
• One reason is that it may already be included as part of an off-the-shelf
OS package.
• Other reasons to remove it from the application layer are: to allow reuse
with other applications; to decrease development cost or time by
purchasing it off the shelf through a third-party vendor; or to simplify
application code.
What does a middleware do?
• Middleware is usually software that mediates between application
software and the kernel or device driver software.
• Middleware is also software that mediates between, and serves, different
application software.
• Specifically, middleware is an abstraction layer generally used on embedded
devices with two or more applications in order to provide flexibility, security,
portability, connectivity, intercommunication, and/or interoperability
mechanisms between applications.
• One of the main strengths of using middleware is that it allows the
complexity of the applications to be reduced by centralizing software
infrastructure that would traditionally be redundantly found in the application
layer.

• However, in introducing middleware to a system, one introduces additional
overhead, which can greatly impact scalability and performance. In
short, middleware affects the embedded system at all layers.
Types of middleware
Most types of middleware commonly fall under one of two general categories:
1. General purpose
• Meaning, they are typically implemented in a variety of devices,
such as networking protocols above the device driver layer and below
the application layers of the OSI model, file systems, or some virtual
machines such as the JVM.
2. Market specific
• Meaning, they are unique to a particular family of embedded
systems, such as digital TV standard-based software that sits on an
OS or JVM.
After having a look at the middleware, let’s explore the Application Layer.

4.2.4 The Application Layer


The final type of software in an embedded system is the application software.
As shown in Fig. 4.7, application software sits on top of the system software
layer, and is dependent on, managed, and run by the system software.

Fig. 4.7: The application layer

It is the software within the application layer that inherently defines what type
of device an embedded system is, because the functionality of an application
represents at the highest level the purpose of that embedded system and does
most of the interaction with users or administrators of that device, if any
exists. Embedded applications can be divided according to whether they are
market specific (implemented in only a specific type of device, such as
video-on-demand applications in an interactive digital TV) or general purpose
(can be implemented across various types of devices, such as a browser).

POINTS TO REMEMBER
1. Any embedded system would be composed of a hardware layer, system
software layer and an application software layer.
2. Hardware components include a processor, I/O devices, and the buses.
3. A system software layer is responsible for processing, I/O, memory
management.
4. Middleware can sit on the device drivers or on top of the OS, or be
incorporated within the OS itself.
5. Middleware is usually software that mediates between application
software and the kernel or device driver software.
6. Application layer is dependent on the underlying system software layer.

Review Questions
1. Why is layering required and how has layering been done in Embedded
Systems?
2. Give an immediate example of the layering approach in networking (it's
simple).

4.3 QUIZ
1. How many layers are there in an embedded system design?
(a) 2 (b) 3
(c) 4 (d) 5
2. Which layer of embedded system accommodates operating system?
(a) Hardware layer (b) System software layer (SSL) (c) Application layer.

Answers for Quiz


1. (b)
2. (b)
5 Real Time Operating
Systems (RTOS)
—An Introduction

Learning Outcomes
R Basic Idea on Operating System (OS)
R OS Functionalities
R Introduction to Kernel
• Kernel Components
R Real Time OS (RTOS) – An Introduction
R Comparison of RTOS with General Purpose OS (GPOS)
R Recap
R Quiz

5.1 WHAT IS AN OPERATING SYSTEM?


An Operating System (OS) is software (programs and data) that runs on
computers (and not only computers!) and manages the computer hardware,
rendering the best possible service for efficient execution of various
applications. In short, one could call an OS a resource manager. The OS
manages the available resources effectively, without requiring the user to
add more resources to the system.
If a user were suddenly asked to install 1 GB of RAM in the middle of using
the system, it would be tough and the user would be irritated. The OS takes
care of managing the resources available, and it never asks the user to add
additional resources to the system. So, to define an OS, one can call it a
resource manager!
Some of the basic and common functionalities of an OS are as follows:
• Hiding the details of the hardware.
• The user need not know exactly which hardware components are incorporated
in the system.

• To allocate resources to the processes
R This is the most important task of an OS. All tasks are provided
with the necessary resources so that no task will complain of a
shortage. For example, a user can open a notepad, a media player
and a chat window, and all three processes will run without any
disturbance. This is taken care of entirely by the OS, which shares
the resources effectively and gets the job done. There may be
several processes running concurrently; the OS should ensure
that no process gets more than its fair share of CPU time.
• Provide excellent user interaction
R When using a Windows or Linux PC, the user has many options
and much comfort in accessing resources. This is because of the OS: it
acts as an excellent interface and helps the user with many
facilities. All GUI-based OSes are rich in features and have enhanced
services to please the user.
• Command interpretation
R The CPU cannot understand the commands typed by the user. The OS
takes the responsibility of translating these commands into a format
that the CPU can understand and act upon. For example, in Linux,
the user gets this facility through the shell.

[Figure: the user's input (in English) is converted to a format the CPU can
understand; the response is then converted back to a format the user can
understand]

Fig. 5.1: Command interpretation action of OS

• Peripheral Management
R The OS has to take care of all the peripherals attached to the system.
One simple instance: when a USB drive (pen drive) is inserted into the
system, the OS recognizes it automatically and allows the user to explore
and use its contents. This is called the plug-and-play feature: within no
time, the OS recognizes the newly attached peripheral.
Only a little of the OS's functionality has been quoted here; it really does much more.
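The command-interpretation functionality described above can be sketched as a toy dispatcher: a shell-like layer maps a command string typed by the user onto an operation code that the rest of the program understands. The command names and operation codes below are invented for illustration and are not any real shell's API.

```c
#include <string.h>

/* Hypothetical operation codes the rest of the program understands. */
enum op { OP_UNKNOWN = 0, OP_LIST = 1, OP_QUIT = 2 };

/* Translate a user command (text) into an internal operation code,
   in the spirit of a shell interpreting what the user typed. */
int interpret(const char *cmd)
{
    if (strcmp(cmd, "ls") == 0)   return OP_LIST;   /* list files */
    if (strcmp(cmd, "exit") == 0) return OP_QUIT;   /* leave the shell */
    return OP_UNKNOWN;                              /* not a known command */
}
```

A real shell does far more (tokenizing arguments, searching the PATH, spawning processes), but the essential idea is the same translation step shown here.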

5.2 HOW IS RESOURCE MANAGEMENT CARRIED OUT?


The OS carries out resource management as its most important task. The reader
first has to understand the way the OS functions as a transformer.

To understand the concept of a transformer, assume that a folder named A is
created on a Windows-based machine. One more folder, B, is created inside A;
inside B a folder C is created, and so on (A→B→C→D→…→X→Y→Z) until folder Z is
created inside Y. Now, for a user to reach the final folder Z, 26 clicks are
required, which would be tedious every time. So one simple idea can be followed:
create a shortcut to folder Z and keep it on the Desktop. What has the OS done
here? It has transformed the available resource into a much more accessible
form, thereby reducing the user's effort.
Secondly, the OS acts as an effective scheduler. Why? An
example makes this clear. Suppose a user is listening to
music on a mobile phone with an MP3 player. If a call comes in at that
moment, the user gets a notification and can attend the
call. This is what scheduling means: the OS automatically schedules the
high-priority task for execution and preempts the low-priority task. Without
an OS, scheduling could not be done so precisely. There are many scheduling
mechanisms available, and all of them are dealt with in detail in the next few pages.
Finally, the OS performs multiplexing as well. How does the OS act as a
multiplexer? Take a Windows-based PC as an example. A user can play music in a
music player while a notepad, an Excel sheet and a PowerPoint document are also
open. The OS takes the multiplexing action: it effectively creates copies of the
processor, and each of the above processes is given a virtual processor. The
same is the case for memory: if there is a memory shortage, swapping memory
among the processes is also possible. Beyond all this, the user should also know
about virtual memory and cache memory, which are equally important services of
the OS; they are explained in detail in later chapters.

5.3 WHAT IS KERNEL?


The kernel is the heart of the OS and performs all the resource management
activities. The kernel has many functions to manage, and it has pre-written
library routines for handling them. Resource management is done through
services:
1. Process management
2. Peripheral and memory management
3. File management
Diagrammatically the architecture of a common OS is shown in Fig. 5.2.
Now the definition of an OS can be slightly refined: an Operating System
(OS) is software (programs and data), containing a kernel, that runs on
computers (and not only computers!), manages the computer hardware and renders
the best possible service for efficient execution of various applications.

Fig. 5.2: Architecture—A common OS

The reader can now be introduced to the functionalities of the kernel in more detail.

1. Process Management
The kernel allocates system resources like memory and processor execution
time, and schedules processes for execution (recall the mobile phone
example wherein the processes were switched). When a task moves
from one state to another, for example from the ready state to the running
state, the action is controlled by the kernel. The inter-task communication
methodologies (queues, mailboxes, pipes, semaphores, mutexes, FIFOs, etc.),
as well as the creation and deletion of processes, are all managed by the
kernel. Detailed explanations of all the inter-process/task communication
methodologies are given in the next chapter.
2. Memory Management and Peripheral Management
Memory allocation and deallocation is the area where the kernel is regarded
as most useful. The kernel manages memory well: it finds and tracks the
processes that are using RAM (main memory), and after use it frees that
memory, making it available for other processes. Coming to peripheral
management, the kernel manages all requests. A simple instance is the
insertion of a USB drive: the kernel immediately recognizes it and provides
the service. Not only USB drives but other I/O devices are handled in the
same way.
One simple question: what is the right term for the insertion of a USB drive
on a Linux PC?
Simple, it is called mounting!

3. File Management
Every user has many files on his system, and obviously the files can be of
different types: C files, Word documents, Excel sheets, PPTs and what not.
There is a huge variety of files, and users make use of them all. The kernel
takes the responsibility of handling these files, allocating memory for them,
and creating and deleting them. This is regarded as one of the most important
services of the kernel.

5.3.1 Kernel Components


The kernel in an OS is like the heart in the human body: it performs all the
resource management activities. The kernel is composed of many components that
facilitate the functioning of the OS. In this chapter the reader is given a
basic introduction to the most important components and their respective
functionality; all of them are elaborated in detail, along with C code, in
Chapter 6.
1. Task Scheduler
An OS may have to handle many processes or tasks. Based on priority, the most
important (highest-priority) task has to be executed first and should get the
processor's attention. To accomplish this, there are many scheduling
methodologies available; the kernel supports the implementation of scheduling
algorithms, and the programmer can select one of them.
2. Signals
Signals are a fundamental method of interprocess communication and are
used in everything from network servers to media players. For one process to
communicate with another, signaling is the basic method of choice. This
component provides a mechanism by which a process may be notified of, or
affected by, an event occurring in the system. A signal is generated on many
occasions, for example:
• when an event occurs (an alarm);
• when a peripheral device is ready;
• when a bug is encountered in software or hardware;
• when an interrupt is raised (Ctrl+Z or Ctrl+C), etc.
Signals are discussed in depth in the forthcoming chapter on RTOS.

3. Semaphores
Semaphores are simple but powerful mechanisms for implementing resource
sharing. In any system, the tasks created need to share resources. Sharing an
available resource without clashes is very important, as snatching a resource
away from the currently executing task may seriously affect it and produce
unwanted results. If the tasks are independent and do not share any resources
between them, there is no resource sharing problem at all; this can be compared
with two roads that run in parallel and never need to meet. Semaphores are of
two types: (a) binary semaphores and (b) counting semaphores. Both are covered
in detail with good examples in Chapter 6.
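The idea can be sketched with the POSIX semaphore API (`sem_init`/`sem_wait`/`sem_post`); a real RTOS has its own calls (for example, take/give primitives), but the pattern is the same. The function names and the shared counter below are illustrative. On some platforms you may need to compile with `-pthread`.

```c
#include <semaphore.h>

/* A binary semaphore (initial value 1) guarding a shared counter. */
static sem_t lock;
static int shared_counter = 0;

static void protected_increment(void)
{
    sem_wait(&lock);       /* take: blocks if another task holds the lock */
    shared_counter++;      /* critical section: one task at a time in here */
    sem_post(&lock);       /* give the semaphore back */
}

/* Initialize the semaphore, do two protected increments, clean up. */
int demo_semaphore(void)
{
    sem_init(&lock, 0, 1); /* value 1 => binary semaphore */
    protected_increment();
    protected_increment();
    sem_destroy(&lock);
    return shared_counter;
}
```

With a single thread the locking looks redundant; its value appears as soon as two tasks call `protected_increment()` concurrently, which is exactly the clash the text describes.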
4. Message Queues
Message queues make an asynchronous way of communication possible,
meaning that the sender and receiver of a message need not interact with the
message queue at the same time. Message queues have a wide range of
applications; a few very simple examples are:
1. taking input from the keyboard;
2. displaying output on the screen; and
3. reading voltages from a transducer or sensor.
A task which has to send a message can put the message in the queue for other
tasks. A message queue is a buffer-like object which can receive messages
from ISRs and tasks, and the same can be transferred to other recipients. In
short, it is like a pipeline: it can hold the messages sent by the sender until
the receiver reads them. The biggest advantage of a queue is that the receiver
and the sender need not use it at the same time. A message queue is constructed
and executed as an example in Chapter 6, which will give the reader a good
understanding of it.
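The buffer-like behaviour described above can be illustrated with a toy fixed-size ring-buffer queue. This is a teaching sketch, not the POSIX `mqueue` API or any RTOS call set; the structure, names and sizes are invented for illustration, and a real implementation would add locking so tasks and ISRs could use it safely.

```c
#include <string.h>

#define QCAP  4     /* how many messages the queue can hold */
#define MSGSZ 16    /* maximum message size, including the terminator */

/* A toy message queue: a ring buffer of small text messages. */
struct msgq {
    char buf[QCAP][MSGSZ];
    int head, tail, count;
};

void msgq_init(struct msgq *q) { q->head = q->tail = q->count = 0; }

/* Sender side: returns 0 on success, -1 if the queue is full. */
int msgq_send(struct msgq *q, const char *msg)
{
    if (q->count == QCAP) return -1;
    strncpy(q->buf[q->tail], msg, MSGSZ - 1);
    q->buf[q->tail][MSGSZ - 1] = '\0';
    q->tail = (q->tail + 1) % QCAP;   /* wrap around the ring */
    q->count++;
    return 0;
}

/* Receiver side: returns 0 on success, -1 if the queue is empty. */
int msgq_recv(struct msgq *q, char *out)
{
    if (q->count == 0) return -1;
    strcpy(out, q->buf[q->head]);
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}
```

The asynchrony is visible in the code: `msgq_send` and `msgq_recv` never wait for each other, and a message simply sits in `buf` until the receiver gets around to reading it.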
5. Pipes
A pipe is a unidirectional data communication mechanism: it transfers data in
one direction only, and is used for establishing communication between related
processes. If the user needs two-way communication, two separate pipes have to
be constructed. Many RTOSes and OSes provide built-in system calls for
constructing pipes. A writer writes at the write end of the pipe, and a reader
reads from the read end. If necessary, one can create a named pipe, which can
be used for communication between unrelated processes.
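A minimal POSIX sketch of the write-end/read-end idea is shown below. For brevity both ends live in one process; normally the pipe is created before `fork()` so that related processes can communicate through it. The function name `demo_pipe` is illustrative.

```c
#include <unistd.h>
#include <string.h>

/* Create a pipe, push a message in at the write end, pull it out at the
   read end. Returns the number of bytes read, or -1 on error. */
int demo_pipe(char *out, int cap)
{
    int fd[2];
    if (pipe(fd) != 0)                  /* fd[0] = read end, fd[1] = write end */
        return -1;

    const char *msg = "ping";
    if (write(fd[1], msg, strlen(msg) + 1) < 0)  /* include the '\0' */
        return -1;

    int n = (int)read(fd[0], out, cap);

    close(fd[0]);
    close(fd[1]);
    return n;
}
```

Note the one-way nature: data written to `fd[1]` can only be read from `fd[0]`. For a dialogue between two processes, you would create a second pipe pointing the other way, exactly as the text says.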
6. Memory Management
As the need for multiprogramming increases, the need for managing memory
increases proportionally. "Memory" in memory management generally refers to the
management of main memory (RAM), into which each process/task to be executed
is brought. The general memory hierarchy in any computer system is:
1. Registers
2. Cache memory
3. RAM
4. Hard disk
5. Flash memory
The hierarchy is arranged in order of increasing storage capacity and
decreasing price; in this hierarchy, registers are the costliest and smallest
in storage capacity, and flash memory the cheapest with a large storage
capacity. Our focus will be on cache memory and RAM.
Properties of RAM
1. Popularly known as physical memory.
2. Much smaller than the hard disk (for instance, 512 MB of RAM versus
120 GB of HDD).
3. Hence, much faster than the hard disk.
4. Costlier when compared to the hard disk.
5. Volatile in nature, which means it remembers nothing when the
power is shut down.
6. It holds the all-important operating system (the OS) for speedier
operation.
7. Accessed by the CPU for the execution of programs.
8. Accessed during DMA (Direct Memory Access) transfers.
9. Size ranges from 128 MB to 10 GB.
Why memory management?
As explained in the properties above, RAM is a device of limited capacity. It
cannot hold as many programs as the HDD does, which raises a question: what
will happen if the size of a user program is larger than the RAM? Here comes
the need for memory management, and it is provided by the kernel.
All these components are discussed in detail with relevant code in Chapter 6;
the reader can practice with and go through that section for a clear
understanding. Having understood the need for an OS, one can now move on to
the basic details of Real-Time Operating Systems (RTOS). The reader should
recall the first chapter, where the definition of real time was touched upon.

5.4 WHY RTOS IS NEEDED?


To start with, there are two varieties of OS. The first one everyone is
familiar with: the General Purpose Operating System (GPOS). The second is a
little newer for the reader: the RTOS. What is the difference between the two?
The spotlight is thrown on this discussion now.
Windows XP, Linux, UNIX, Ubuntu, Solaris, etc. all come under the umbrella
of general purpose operating systems. Why are they called general purpose?
Their kernels are written in such a way that they can handle all sorts of
applications, i.e., they are multipurpose: a user can play music, run a C
compiler and play games on the same Windows/Linux PC. As there are many
processes being handled, there is a palpable delay in execution, which makes
such OSes unfit for embedded systems. Embedded systems always require real-time
behaviour, that is, logical correctness of the operation within a specific
deadline. So a GPOS would not fit life-saving, safety-critical equipment like
airbags and antilock braking systems in automobiles, or pacemakers in the
medical industry.

5.5 WHAT IS REAL TIME?


Before learning about RTOS, it is better to know the basic definition of
real time: it is logical correctness of an operation (how close it is to the
expected result; actually it should give the expected result 100% of the time)
within a deterministic deadline (the maximum allowed timeframe for that action
to be completed, which is mostly very limited).
An example will develop the reader's understanding. A car is the case in
point here. All modern cars have advanced braking systems, which are clearly
embedded systems. A driver has no hint of when the brake will have to be
applied. If the car is being driven at high speed, then in case of an obstacle
the vehicle has to be stopped instantaneously. Here comes the point: the
braking system should be ready to accept the input at all times, and once the
brake is applied, the braking action has to be taken immediately to stop the
vehicle. This is real-time behaviour, where stopping the car on time, with
perfection, is achieved; a delay of even a few seconds would result in
catastrophe.
A few more examples can be quoted here which will make the reader understand
the definition better.
• Elevator system—a simple and excellent example of real-time
operation. Any user may press any button at any time, and the processor
should be capable of handling all the requests in a feasible way.
• Pacemaker—fitted in the human body, it monitors the heartbeat. In
case the heartbeat gets low, the pacemaker should immediately take
action. This is a life-saving action which requires real-time behaviour; if
the pacemaker delayed its action by even a few seconds, it would be disastrous.
• ATM machine—a perfect example of real time behaviour. Person A
may withdraw $500 while the next person sends a request for a withdrawal
of $1000. The machine has to handle all these scenarios speedily, i.e.,
within a deadline, and the operation has to be performed flawlessly: a
$1000 request should not deliver $500.
Keeping the above examples in mind, one can now relate to the definition of
real time dealt with at the beginning of this topic.
So an RTOS should have policies and preframed rules that effectively help in
time management and in meeting deadlines. To be precise, an RTOS delivers
steady, predictable performance, which is what makes it usable in safety critical
systems and medical equipment. There are a few more differences between a
GPOS and an RTOS; they are listed in Table 5.1.
Table 5.1: Comparison of RTOS and GPOS

Examples
    RTOS: VxWorks, Nucleus, WinCE (Windows Compact Embedded).
    GPOS: Windows (all variants), Solaris, Linux, UNIX, Ubuntu, etc.

Memory requirements
    RTOS: Very small memory requirements. Example: a mobile phone with
    just 16 MB of internal memory can accommodate an RTOS.
    GPOS: Very large memory requirements; at times 400 MB to 1 GB of
    memory is needed just for installation. Example: installation of Windows
    requires a large amount of memory.

Support and facilities
    RTOS: Does not carry many built-in support facilities; it ships with only
    the required files and support.
    GPOS: Comes with lots of support that can easily be rendered to the
    user. Examples: plug and play, autoplay, etc.

Protection (from the applications)
    RTOS: To be precise, an RTOS has very little protection for itself from
    the applications. Assume a mobile phone is playing a song in its MP3
    player; if the MP3 player gets stuck, there is no way for the phone to
    simply close the player, and a complete restart is required. Most mobile
    users would have definitely felt it.
    GPOS: The story is the reverse here. Assume a user is running a media
    player, a word document and an excel sheet concurrently; if the media
    player unexpectedly gets stuck, the user has the comfort of simply closing
    the media player, without losing the data typed in the excel sheet or
    word document.

Response time
    RTOS: Completing the operation on time is highly important; failure
    may result in disasters.
    GPOS: Not as important as in an RTOS; some delay is permitted, as the
    applications are not life saving.

POINTS TO REMEMBER
1. An operating system is a resource manager which manages all the
resources effectively.
2. OS mainly performs as a transformer, multiplexer and scheduler.
3. Heart of a system is OS and heart of the OS is Kernel.
4. The kernel has lots of components which help the OS do resource
management: semaphores, pipes, message queues, signals, the memory
management unit, etc.
5. Real time behaviour differentiates a GPOS from RTOS.
6. Real time is defined as logical correctness of an operation within a
deterministic deadline.
7. An RTOS offers very little luxury to the user in terms of memory, protection
from other applications, etc.
8. A GPOS is luxurious in terms of memory and support features. Also, installing
an RTOS won't take much memory, whereas a GPOS occupies enormous
memory.
9. Response time is not a crucial parameter in GPOS but it is of paramount
importance in RTOS.

Review Questions
1. Define operating system.
2. Demonstrate how the OS plays the role of a transformer.
3. What is real time behaviour? How is it important?
4. Why is inter process communication needed?
5. Define semaphore.
6. Why is memory management so important?
7. When is a signal generated?
8. Why does an embedded system need real time behaviour?
9. An RTOS does not support much extra functionality. Why?
10. Differentiate GPOS and RTOS.
5.6 QUIZ
1. Which of the following is an RTOS?
(a) Vx Works (b) Linux
(c) Ubuntu (d) Windows XP
2. Which of the following is most important and expected behavior of an
RTOS?
(a) Reduced memory usage (b) Multitasking
(c) Response time (d) Protection from the
applications.
3. Which of the following can’t be installed in an Embedded System?
(a) Windows XP (b) VxWorks (c) WinCE
4. Which of the following is not an embedded system?
(a) Laptop (b) Cellular phone
(c) Washing machine (d) Pacemaker
5. Win. CE is the abbreviation of ________________.

Answers for Quiz


1. (a)
2. (c)
3. (a)
4. (a)
5. Windows Compact Embedded
6 Real Time Operating Systems—A Detailed Overview

Learning Outcomes
Note: In this chapter the reader will be exposed to more of the real time
operating system concepts. In particular, Linux is going to be used for explaining
the concepts, so it would be great if the reader can have a Linux based system
at hand while reading. It is very easy to obtain a free Ubuntu (Linux) CD: one
should visit shipit.ubuntu.com and fill in all the mandatory details asked for,
and within ten days the reader will be provided with the free Ubuntu Linux CD.
The reader will first be introduced to a few basic operating system concepts
and then taken through the RTOS.
• Linux—An Introduction
  – Comparison of UNIX and Linux
• Linux File System Architecture
  – File descriptors in Linux
  – Description with sample program
• RTOS concepts
  – Task
  – Task states
  – Task transitions
  – Task scheduling
• Inter Process Communication (IPC) Methodologies
  – Pipe
  – Named pipe or FIFO
  – Message queue
  – Shared memory
  – Semaphores
  – Task and resource synchronization
• Memory management
• Cache memory
• Dynamic Memory Allocation
• Fragmentation
• Virtual memory
• Context Switching
• Recap
• Quiz

6.1 LINUX—AN INTRODUCTION


In 1991, Linus Torvalds began developing an operating system kernel, which he
named “Linux”. This kernel could be combined with the FSF material and other
components to produce a freely-modifiable and very useful operating system.
This book will term the kernel itself the “Linux kernel” and an entire combination
as “Linux”. Note that many use the term “GNU/Linux” instead for this
combination. Linux is not derived from UNIX source code, but its interfaces are
intentionally like UNIX. Therefore, UNIX lessons learned generally apply to
both, including information on security.

6.1.1 Comparison of UNIX and LINUX


UNIX is a copyrighted name, and only big companies are allowed to use the
UNIX copyright and name; hence IBM AIX, Sun Solaris and HP-UX are all
UNIX operating systems. Most UNIX systems are commercial in nature. Linux
is a UNIX clone, and strictly speaking it is just a kernel. However, all Linux
distributions include a GUI system, GNU utilities (e.g., ls, cp, mv, date, bash,
etc.), installation and management tools, GNU C/C++ compilers, editors (vi)
and various applications (e.g., Open Office, Firefox, etc.). In contrast, most
UNIX operating systems are considered complete operating systems, as
everything comes from a single source or vendor.
Linux is free. It can be downloaded from the Internet or redistributed under
GNU licences, and there is excellent community support for Linux. Most UNIX-
like operating systems are not free. However, some Linux distributors such as
Red Hat/Novell provide additional Linux support, consultancy, bug fixing and
training for additional fees.
Linux is considered the most user friendly of the UNIX-like operating
systems. It makes it easy to install sound cards, flash players, and other desktop
goodies. However, Apple OS X is the most popular UNIX operating system for
desktop usage. Linux comes with an open source netfilter/iptables based firewall
tool to protect servers and desktops from crackers and hackers. UNIX operating
systems either come with their own firewall product or need third party software
such as the Checkpoint UNIX firewall to be purchased.
6.1.2 File System Architecture Details


Generalized file system provides a simple and unified way to access resources.
The basic unit is a file. A file consists of essential data, metadata (data about the
data), nonessential metadata, and some information. Unless the file is a directory,
the information is given “as is” and not analyzed by the file system. Essential
metadata can be edited only by the file system driver and other privileged
programs since improper editing may make the file unusable. Nonessential
metadata contains information useful for indexing systems (the indexing systems
are ordinary programs, and not a part of the file system). Nonessential metadata
have a nested structure.
A directory (also known as a folder) is a file that may contain other files
inside the file. Since the file system is flexible and extensible, different directories
may have different physical implementation. Essential metadata may include
file size; date created, last modified, and last accessed; directory structure; and
special storage properties. Metadata of a directory may apply to files inside the
directory.
A symbolic link is an empty file that points to a file. The link may indicate
either an absolute location or a location relative to the location of the link. Unless
requested otherwise, a reference to a symbolic link is a reference to the file to
which the link points. Files are identified by their path, such as /file_system/
folder/file. For example, name1/name2 identifies file name2 inside the file name1.
Copying the file copies the contents of the identified file to the identified path.
The file may then or during copying be converted to the appropriate structure
for files in that location.
The contents of the root file system must be adequate to boot, restore, recover,
and/or repair the system.
(a) To boot a system, enough must be present on the root partition to mount
other file systems. This includes utilities, configuration, boot loader
information, and other essential start-up data. /usr, /opt, and /var are
designed such that they may be located on other partitions or file systems.
(b) To enable recovery and/or repair of a system, those utilities needed by an
experienced maintainer to diagnose and reconstruct a damaged system
must be present on the root file system.
(c) To restore a system, those utilities needed to restore from system backups
(on floppy, tape, etc.) must be present on the root file system.
The following directories, or symbolic links to directories, are required in /.

Directory    Description
/bin         Essential command binaries
/boot        Static files of the boot loader
/dev         Device files
/etc         Host-specific system configuration
/lib         Essential shared libraries and kernel modules
/media       Mount point for removable media
/mnt         Mount point for mounting a file system temporarily
/opt         Add-on application software packages
/sbin        Essential system binaries
/srv         Data for services provided by this system
/tmp         Temporary files
/usr         Secondary hierarchy
/var         Variable data
Each directory listed above is specified in detail in separate subsections below.
/usr and /var each have a complete section in this document due to the complexity
of those directories.

6.1.3 Types of File Systems in UNIX/LINUX


Linux supports numerous file system types:
(a) Ext2: This is the classic UNIX-like file system. It has the concepts of
blocks, inodes and directories.
(b) Ext3: The ext2 file system enhanced with journaling capabilities. Journaling
allows fast file system recovery. It supports POSIX ACLs (Access Control
Lists).
(c) Isofs (iso9660): Used by the CDROM file system.
(d) Sysfs: A RAM-based file system initially based on ramfs. It is used for
exporting kernel objects so that end users can access them easily.
(e) Procfs: The proc file system acts as an interface to internal data structures
in the kernel. It can be used to obtain information about the system and to
change certain kernel parameters at runtime using the sysctl command. For
example, you can find out cpuinfo with the following command:
# cat /proc/cpuinfo
One can also enable or disable routing/forwarding of IP packets between
interfaces with the following commands:
# cat /proc/sys/net/ipv4/ip_forward
# echo "1" > /proc/sys/net/ipv4/ip_forward
# echo "0" > /proc/sys/net/ipv4/ip_forward
(f) NFS: The network file system allows many users or systems to share the
same files by using a client/server methodology. NFS allows sharing of any
of the above file systems.
(g) Linux also supports Microsoft NTFS, vfat, and many other file systems.
See the Documentation/filesystems directory in the Linux kernel source
tree for the list of all supported file systems.
You can find out which file systems are currently mounted with the mount
command:
$ mount
OR
$ cat /proc/mounts
A UNIX file system is a collection of files and directories. Each file
system is stored in a separate whole disk partition. The following are a few of
the file systems:
(a) / — Special file system that incorporates the files under several directories
including /dev, /sbin, /tmp etc.
(b) /usr — Stores application programs
(c) /var — Stores log files, mails and other data
(d) /tmp — Stores temporary files

6.1.4 Basic UNIX Commands


All the following commands are very basic, and they have to be known for
one to work comfortably in Linux or UNIX.
(a) ls — lists your files.
ls -l — lists your files in ‘long format’, which contains lots of useful
information.
ls -a — lists all files, including the ones whose filenames begin in a dot.
(b) more filename — shows the first part of a file, as much as will fit on one
screen.
(c) emacs filename — is an editor that lets you create and edit a file.
(d) mv filename1 filename2 — moves a file
(e) cp filename1 filename2 — copies a file
(f) rm filename — removes a file.
(g) diff filename1 filename2 — compares files, and shows where they differ.
(h) wc filename — tells you how many lines, words, and characters there
are in a file.
(i) chmod options filename — change the read, write, and execute
permissions on files.
(j) File Compression
(i) gzip filename — compresses files, so that they take up much less
space.
(ii) gunzip filename — uncompresses files compressed by gzip.
(iii) gzcat filename — To look at a gzipped file without actually having to
gunzip it.
(k) Printing:
(i) lpr filename — print.
(ii) lpq — check out the printer queue.
(iii) lprm jobnumber — remove something from the printer queue.

6.1.5 /proc and File Descriptor Table


Linux provides a special file system, procfs, usually made available as the
directory /proc. It gives status information about the system and the running
processes. For example, checking /proc/cpuinfo will give you the details of the
processors available. The same is revealed in the following snapshot, Fig. 6.1.

Fig. 6.1: /proc with cpuinfo


From here on, /proc is used extensively for getting details of file descriptors
and the process control block. Every process has a process id, and it is recorded
in the /proc file system.
The OS maintains a data structure called the Process Control Block (PCB),
which holds, among other things, the details of file descriptors. File descriptors
are numbers allocated to all open files in Linux. Since everything is a file in
Linux, even the input, output and error streams are denoted by numbers, referred
to as file descriptors. All the file descriptors are recorded in a table called the
file descriptor table in the PCB. The structure of the file descriptor table is
shown below in Fig. 6.2.

0–stdin

1–stdout

2–stderr

3–file

4–file

Fig. 6.2: File descriptor table

From the picture one can understand that 0 is the fd for standard input
(keyboard), 1 is the fd for standard output (monitor) and 2 is the number allotted
for standard error messages. Since 0, 1 and 2 are allotted already, any newly
created file will get a number after 2. For instance, the first file created will get
3 as its file descriptor and the next file created will get 4. Discussing this
theoretically alone would not be sufficient.
So writing a small C program in Linux will help the reader understand the
concept much better. The following C code aims at creating 2 files. After creation,
one can manually check the file descriptor table and verify that the files have
been allotted numbers as expected. The GNU tool kit in Linux helps to compile
and execute C programs with the GCC compiler. GCC is expanded as GNU C
Compiler. The following C code simply creates 2 files, txt1.txt and txt2.txt.

// prog1.c
// C code for creating 2 files.
/* creat is the system call used to create new files from code. The
   while loop at the end keeps the process running so that the user can
   inspect the /proc file system. Mode 0777 in creat gives read, write and
   execute permissions to all users; this concept is dealt with in detail
   in a later part of this chapter. */
#include <stdio.h>
#include <fcntl.h>   /* for creat() */

int main()
{
    int fd1, fd2;
    printf("\n THIS WILL CREATE 2 FILES NOW");
    fd1 = creat("txt1.txt", 0777);
    // creating the txt1.txt file through this system call; the file
    // permissions are also mentioned. On success creat returns an
    // integer, and that integer is what is referred to as the file
    // descriptor.
    fd2 = creat("txt2.txt", 0777);
    while (1)
    {
    }
}

Execution procedure
The user has to store the file with a .c extension, and once done, the following
command can be issued from the prompt:
$ gcc -o prog1 prog1.c
where
gcc — the compiler
prog1 — the executable file name
prog1.c — the source file name
If there is no compilation error, an executable file with the mentioned name,
prog1, will be available for execution. The snapshot is shown in the following
Fig. 6.3, where the complete execution cycle has been followed.
Fig. 6.3: Execution snapshot


First, the file has to be compiled with gcc -o prog1 prog1.c. Then
execution can be carried out with ./prog1 followed by &. The & symbolizes
that the execution is carried out in the background, and because of this
background processing a process id is revealed immediately. Here the id is
5720. Now the user can navigate to this process with the command cd /proc/
5720, which shows that the process id is updated and available in the /proc
directory. That particular process has lots of associated details. If the user lists
the contents of the process directory with ls, numerous directories can be found,
but the focus of this program remains on the one holding the file descriptors,
named fd. Further navigation with cd fd, followed by ls -lrt (long listing), will
uncover the details of the file descriptors. From the snapshot it can be seen that
3 and 4 are allotted to the newly created files txt1.txt and txt2.txt respectively,
while 0, 1 and 2 are allotted to standard input, output and error. File descriptors
find extensive usage in IPC mechanisms, and this will be helpful for the reader
in understanding the concepts in the later part of this chapter.

6.2 RTOS CONCEPTS

6.2.1 Task
The preliminary thing someone will come across in any Operating Systems (OS)
book is the task. It is such an important element that it can never be neglected;
it can be stated as the basic building block of an RTOS. A task needs to be
created first before being used, and the procedure for creating a task differs
from one RTOS to another.
When creating a task, the following things have to be specified:


1. Task name (A name has to be given for the task)
2. Priority (Every task has a priority, based on priority the task will get
processor time)
3. Stack size (Stack size needed for that task has to be mentioned)
4. OS specific options (Based on preemptive or non-preemptive OS)

6.2.2 Task States


After creating a task it has to be executed. But what if some other, higher
priority task is being executed at that time? This question leads us to think
about the states of a task. A mobile phone will help the reader understand the
concept better. Assume a mobile phone is being used for listening to music. If a
call arrives while the music is being played, will the phone still play the music,
or will priority be given to the incoming call? This is what is referred to as a
task state change. Initially the music task was executing and getting the processor
time, but when a task of much higher precedence came in, the music task was
paused/halted and the arriving call task was given the right of way. This state
change is very important for any real time operating system, and here all the
possible states will be discussed in detail.
Possible States for a task:
1. Dormant
2. Ready
3. Running and
4. Blocked.
Dormant: This is the opening state of a created task; in other words, all tasks
when created are in the dormant state. In banks, an account that remains
un-operated for a long time is termed dormant, and the picture is the same with
a task: if a task remains unexecuted, or is not scheduled for processing for a
long time, it is termed a dormant task.
Ready: A task that has been created and could be executed, but has to wait
because another task with higher priority is being executed at that time, is in
the ready state. It will have to wait until the higher priority task's execution is
complete. This state, in which the task is runnable but waiting for another task
to complete, is called the ready state.
Running: When a task is getting executed, i.e., is being given the processor
time, it is said to be in the running state.
Blocked: When a task is being executed but at some point in time requires some
external input, it goes blocked. Assume music is being played and a call is made
to the same mobile. After the call is over, the music should resume from the
place where it was left; while the user has yet to press the play button for the
music to continue, the music task is said to be in the blocked state.

6.2.3 Task Transitions


A simple diagram is very helpful for understanding the task transitions;
they are represented in Fig. 6.4.

[Fig. 6.4 depicts the four states and their transitions: a newly created task
starts in the Dormant state; Ready to Running when the task has the highest
priority; Running to Ready when the task no longer has the highest priority;
Running to Blocked when the task is blocked due to a request for an unavailable
resource; Blocked to Ready when the task is unblocked but is not the
highest-priority task; Blocked to Running when the task is unblocked and is the
highest-priority task.]

Fig. 6.4: Task states

A task upon creation will be in the dormant state. If there is no other task
running, or the newly created task has the highest priority, then the created
task will get the processor time and move to the running state. If another task
with higher priority is currently running, the created task will have to wait in
the ready state.
When a task being executed needs some external input, it moves to the
blocked state. Once the external input has been obtained, the task moves to the
running state if it has the highest priority or no other task is running; if another,
higher priority task is running at that time, the blocked task moves to the ready
state. Once the previously executing task completes, the waiting task's state
changes from ready to running.
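The arrows of Fig. 6.4 can be captured in a small lookup function, which is a convenient way to sanity-check a scheduler implementation; the enum and function names are chosen for this sketch.

```c
typedef enum { DORMANT, READY, RUNNING, BLOCKED } task_state;

/* returns 1 if moving from 'from' to 'to' is a legal arrow in Fig. 6.4 */
int transition_allowed(task_state from, task_state to)
{
    switch (from) {
    case DORMANT: return to == READY || to == RUNNING; /* on creation  */
    case READY:   return to == RUNNING;                /* gets the CPU */
    case RUNNING: return to == READY                   /* preempted    */
                      || to == BLOCKED;                /* waits on I/O */
    case BLOCKED: return to == READY || to == RUNNING; /* unblocked    */
    }
    return 0;
}
```

Note that a ready task can never move directly to blocked: only a running task can request a resource and then wait for it.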
6.2.4 Task Scheduling


Generally all RTOS are capable of multitasking, i.e., of handling many jobs
at the same time. For this to be possible, the processor should have some
mechanism for scheduling the tasks for execution. Scheduling is taken care of
by the scheduler in an RTOS.
As already seen, every task will be in one of the states, namely dormant,
ready, running or blocked. In addition, each task has its own Task Control
Block (TCB). The scheduler keeps track of all the task states to determine
which task has to be executed next. Priorities can be assigned to tasks, based
on which the scheduler selects a task for execution: from the pool of tasks in
the ready state, the one with the highest priority is chosen by the scheduler and
brought to the running state. Scheduling increases CPU utilization; when one
task is blocked for an I/O operation, the CPU should not be kept idle but engaged
with some other work. The scheduler helps improve CPU utilization by letting
a task in the ready queue go to the running state as soon as a running task is
made to wait for some reason.
There are two types of scheduling:
1. Preemptive
2. Non-preemptive.
If a task is forcibly moved from the running state to the waiting or ready
state, i.e., if the CPU is pulled away from the running task and given to some
other task, the task is said to be preempted. Non-preemption is just the opposite:
the running task is never disturbed; instead, it is allowed to execute until it
finishes, and other tasks have to wait in the ready queue to get the processor.
Starvation is a key concern in choosing the scheduling method: no task should
starve for the CPU, as prolonged starvation effectively means giving up on a
task. In non-preemptive scheduling without priorities, starvation is avoided; on
the other hand, there may be some starvation associated with preemptive
scheduling methods.
Preemptive scheduling methods:
1. Shortest Job First (SJF)
2. Priority scheduling
3. Round robin
Non-preemptive scheduling methods:
1. First Come First Serve (FCFS)
2. No priority/Fairness scheduling
Priority/preemptive scheduling
(a) Shortest Job First (SJF)
The task which arrives first is allowed to run, irrespective of its execution time.
When some other task arrives in the ready queue, its execution time is compared
with that of the one being executed. If the new task's execution time is less
than the running task's, the running task is preempted and moved to the ready
queue, and the one with the lesser execution time is dispatched from the ready
state to the running state. (This preemptive variant of SJF is also known as
Shortest Remaining Time First.)
Example:
Table 6.1: SJF Scheduling

Task number Arrival time (in ms) Execution time (in ms)

T0 0 30

T1 3 21

T2 7 14

As shown in Table 6.1, T0 arrives at time 0 and is the first task to arrive, so it
is allowed to execute. At time 3, another task T1 arrives with a lesser execution
time than T0, so T0 is preempted and T1 is executed. At time 7, yet another
task T2 arrives with the least execution time of the three, so it gets the highest
priority and preempts T1. After T2 finishes, T1 resumes, then T0 resumes.
//C code for SJF (non-preemptive)
//(for simplicity this demo is non-preemptive: all processes are
//assumed to arrive together)
#include <stdio.h>
#include <string.h>

int main(void)
{
    char p[10][10], temp[10];
    int tot = 0, wt[10], pt[10], i, j, n, temp1;
    float avg = 0;

    //get number of processes
    printf("enter no of processes: ");
    scanf("%d", &n);
    for (i = 0; i < n; i++)
    {
        //for each process get the process name and burst/execution time
        printf("enter process%d name: ", i + 1);
        scanf("%s", p[i]);
        printf("enter process time: ");
        scanf("%d", &pt[i]);
    }
    //compare the burst times of the processes, sort them in ascending
    //order and execute the processes in that order
    for (i = 0; i < n - 1; i++)
    {
        for (j = i + 1; j < n; j++)
        {
            if (pt[i] > pt[j])
            {
                temp1 = pt[i];
                pt[i] = pt[j];
                pt[j] = temp1;
                strcpy(temp, p[i]);
                strcpy(p[i], p[j]);
                strcpy(p[j], temp);
            }
        }
    }
    //waiting time of first process is zero!
    wt[0] = 0;
    //find the waiting times of the other processes
    for (i = 1; i < n; i++)
    {
        wt[i] = wt[i - 1] + pt[i - 1];
        //accumulate the total waiting time
        tot = tot + wt[i];
    }
    //find the average waiting time
    avg = (float)tot / n;
    //print all the values
    printf("P_name\t P_time\t w_time\n");
    for (i = 0; i < n; i++)
        printf("%s\t%d\t%d\n", p[i], pt[i], wt[i]);
    printf("total waiting time=%d\n avg waiting time=%f", tot, avg);
    return 0;
}

OUTPUT:
enter no of processes: 5
enter process1 name: aaa
enter process time: 4
enter process2 name: bbb
enter process time: 3
enter process3 name: ccc
enter process time: 2
enter process4 name: ddd
enter process time: 5
enter process5 name: eee
enter process time: 1
P_name P_time w_time
eee 1 0
ccc 2 1
bbb 3 3
aaa 4 6
ddd 5 10
total waiting time = 20


avg waiting time = 4.00

(b) Priority scheduling


In the SJF, priority was based on the tasks’ execution time. Here, each task is
explicitly given a priority by the processor. Based on the priority, preemption
happens. When a higher priority task is waiting at the ready queue and a lower
priority task is running, the higher priority task preempts the lower priority task.
Example:
Table 6.2: Priority Scheduling

Task number Arrival time (in ms) Priority

T0 0 3

T1 3 2

T2 7 1

As shown in Table 6.2, T0 is the first task, so it is allowed to run. At time 3, T1
arrives with a higher priority than T0, preempts T0 and gets the CPU. At
time 7, T2 arrives with the highest priority of the three (the smallest priority
number), so it preempts T1 and gets the CPU. Once T2 is done, T1 resumes,
then T0 resumes.
// C code for Priority Scheduling (non-preemptive demo):
// in this simplified demo the entered "process time" doubles as the
// priority value (the smaller the value, the higher the priority)
#include <stdio.h>
#include <string.h>

int main(void)
{
    char p[10][10], temp[10];
    int tot = 0, wt[10], pt[10], i, j, n, temp1;
    float avg = 0;

    //get number of processes
    printf("enter no of processes: ");
    scanf("%d", &n);
    //for each process get the process name and its time/priority value
    for (i = 0; i < n; i++)
    {
        printf("enter process%d name: ", i + 1);
        scanf("%s", p[i]);
        printf("enter process time: ");
        scanf("%d", &pt[i]);
    }
    //compare the priorities of the processes, sort them in ascending
    //order and execute the processes in that order
    for (i = 0; i < n - 1; i++)
    {
        for (j = i + 1; j < n; j++)
        {
            if (pt[i] > pt[j])
            {
                temp1 = pt[i];
                pt[i] = pt[j];
                pt[j] = temp1;
                strcpy(temp, p[i]);
                strcpy(p[i], p[j]);
                strcpy(p[j], temp);
            }
        }
    }
    //waiting time of first process is zero!
    wt[0] = 0;
    //find the waiting times of the other processes
    for (i = 1; i < n; i++)
    {
        wt[i] = wt[i - 1] + pt[i - 1];
        //accumulate the total waiting time
        tot = tot + wt[i];
    }
    //find the average waiting time
    avg = (float)tot / n;
    //print the values
    printf("p_name\t P_time\t w_time\n");
    for (i = 0; i < n; i++)
        printf("%s\t%d\t%d\n", p[i], pt[i], wt[i]);
    printf("total waiting time=%d\n avg waiting time=%f", tot, avg);
    return 0;
}

OUTPUT:
enter no of processes: 5
enter process1 name: aaa
enter process time: 4
enter process2 name: bbb
enter process time: 3
enter process3 name: ccc
enter process time: 2
enter process4 name: ddd
enter process time: 5
enter process5 name: eee
enter process time: 1
p_name P_time w_time
eee 1 0
ccc 2 1
bbb 3 3
aaa 4 6
ddd 5 10

total waiting time = 20


avg waiting time = 4.00
(c) Round Robin / Time slicing method of scheduling
This is also a preemptive scheduling method. As the name implies, tasks are
chosen for execution in a round robin fashion. The ready tasks are executed
one by one, each for a fixed amount of time called a time slice. When the time
slice expires, the task is moved to the tail end of the ready queue and the task
at the front of the queue is run next. This continues until all the tasks have
finished executing.
Example: Time slice: 10 ms
Table: 6.3: Round Robin Scheduling

Task Number    Executing time (in ms)
T0             30
T1             21
T2             14

As given in Table 6.3, the time slice is 10 ms. The execution would proceed as follows: first T0 executes for 10 ms; when its time slice expires, T1 and then T2 execute in turn. After the first round of execution, the remaining execution times of T0, T1 and T2 would be 20, 11 and 4 ms respectively. As one can observe, T0 and T1 each need two more rounds, while T2 needs only one more round to finish its execution.
//C code for round robin scheduling:
#include<stdio.h>
#include<conio.h>
#include<process.h>
#include<string.h>
void main()
{
char p[10][5];
//timer denotes time slice
int et[10],wt[10],timer=3,count,pt[10],rt,i,j,totwt=0,t,n=5,found=0,m;
float avgwt;
clrscr();
//get the process name, burst time for 5 processes
for(i=0;i<n;i++)
{
printf("enter the process name : ");
scanf("%s",p[i]);
printf("enter the processing time : ");
scanf("%d",&pt[i]);
}
m=n;
wt[0]=0;
i=0;
do
{
//check if the burst time is larger than the time slice
if(pt[i]>timer)
{
rt=pt[i]-timer;
strcpy(p[n],p[i]);
pt[n]=rt;
et[i]=timer;
n++;
}
//if burst time is less than or equal to the time slice, the execution
//will finish within one time slice!
else
{
et[i]=pt[i];
}
i++;
//find the waiting time
wt[i]=wt[i-1]+et[i-1];
}while(i<n);//do the above process n times

count=0;
for(i=0;i<m;i++)
{
for(j=i+1;j<n;j++)
{
if(strcmp(p[i],p[j])==0)
{
count++;
found=j;
}
}

if(found!=0)
{

wt[i]=wt[found]-(count*timer);
count=0;
found=0;
}
}
//find the total waiting time
for(i=0;i<m;i++)
{
totwt+=wt[i];
}
//find the average waiting time
avgwt=(float)totwt/m;
//print the values
printf("p_name\tp_time\tw_time\n");
for(i=0;i<m;i++)
{
printf("\n%s\t%d\t%d",p[i],pt[i],wt[i]);
}
printf("\ntotal waiting time: %d\n",totwt);
printf("total avgtime: %f",avgwt);
}
OUTPUT :
enter the process name : aaa
enter the processing time : 4
enter the process name : bbb
enter the processing time : 3
enter the process name : ccc
enter the processing time : 2
enter the process name : ddd
enter the processing time : 5
enter the process name : eee
enter the processing time : 1
p_name p_time w_time
aaa 4 9
bbb 3 3
ccc 2 6
ddd 5 10
eee 1 11

total waiting time : 39
average waiting time : 7.800000

Non-preemptive scheduling
(a) First Come First Serve
The best example of this type of scheduling is the First Come First Serve method. It is a relatively simple concept to implement as well as to understand: the tasks are executed in the same order in which they arrive. Though simple, it has some serious drawbacks; in particular, the overall waiting time of the tasks is higher than when they are scheduled with the SJF or priority scheduling methodologies. The following table gives an example which illustrates this drawback.
Table 6.4: FCFS Scheduling

Task Number    Arrival time (in ms)    Execution time (in ms)
T0             0                       30
T1             3                       21
T2             7                       14

With FCFS, T0 is executed first, followed by T1, and finally T2. The average waiting time of the tasks is calculated as follows:
T0 is the first task to arrive at the ready queue and immediately gets the processor, so it incurs no waiting time. T1 arrives at time 3 but gets the processor only at time 30 (after the execution of T0), so the waiting time of T1 is 30 - 3 = 27 ms. Similarly, the waiting time of T2 is 51 - 7 = 44 ms.
Average waiting time: (0 + 27 + 44)/3 = 23.66 ms.
But if the tasks were executed with the SJF (preemptive) method, the average waiting time would be less than with FCFS. At time 0, T0 arrives and is executed; its waiting time so far is 0. At time 3, T1 arrives with a shorter execution time (21 ms against T0's remaining 27 ms), so it preempts T0. (Note: no waiting time for T1; also, T0 has executed for 3 ms, with 27 ms remaining.) At time 7, T2 arrives with an even shorter execution time (14 ms against T1's remaining 17 ms), preempting T1. (Note: no waiting time is incurred for T2.) At time 7 + 14 = 21, T2 finishes its execution, giving the processor back to T1, which then executes its remaining 17 ms, handing the processor to T0 at time 21 + 17 = 38 ms.
So, waiting time for T2: 0 ms,
Waiting time for T1: 21 - 7 = 14 ms,
Waiting time for T0: 38 - 3 = 35 ms.
On an average, (0 + 14 + 35)/3 = 16.33 ms < 23.66 ms for FCFS.
When these tasks are scheduled by a non-preemptive SJF scheduler, the average waiting time works out to 21.33 ms: T0, the only task present at time 0, runs to completion first, then T2 executes, followed by T1.
Hence it can be seen that FCFS lags behind the other scheduling mechanisms
with respect to its performance.
//C code for Non-preemptive Scheduling:
#include<stdio.h>
#include<conio.h>
#include<process.h>
void main()
{
char p[10][5];
int tot=0,wt[10],pt[10],i,n;
float avg=0;
clrscr();
//get number of processes
printf("enter no of processes:");
scanf("%d",&n);
//get the process’ name and burst time
for(i=0;i<n;i++)
{
printf("enter process%d name:\n",i+1);
scanf("%s",p[i]);
printf("enter process time:");
scanf("%d",&pt[i]);
}
//waiting time of first process is zero
wt[0]=0;
//calculate the waiting time of other processes
for(i=1;i<n;i++)
{
//waiting times of the other process would be,
//the sum of waiting time and execution times of previous processes
wt[i]=wt[i-1]+pt[i-1];
//find the total waiting time
tot=tot+wt[i];
}
//find the average waiting time
avg=(float)tot/n;
//print the values
printf("P_name\t P_time\t w_time\n");
for(i=0;i<n;i++)
printf("%s\t%d\t%d\n",p[i],pt[i],wt[i]);
printf("total waiting time=%d\n avg waiting time=%f",tot,avg);
getch();}
OUTPUT:
enter no of processes: 5
enter process1 name: aaa
enter process time: 4
enter process2 name: bbb
enter process time: 3
enter process3 name: ccc
enter process time: 2
enter process4 name: ddd
enter process time: 5
enter process5 name: eee
enter process time: 1
P_name P_time w_time
aaa 4 0
bbb 3 4
ccc 2 7
ddd 5 9
eee 1 14

total waiting time = 34


avg waiting time = 6.80

(b) No Priority/Fairness scheduling


Another example of non-preemptive scheduling is fairness scheduling. No task is considered superior or inferior here; all tasks are treated the same and the CPU time is fairly distributed among the available tasks. For instance, if there are 5 tasks available, the CPU time is divided by 5 and each task receives one share, which it can make use of as it wishes. This may seem similar to round robin scheduling, but the two are different in the following sense: in round robin the time slice is a very small amount of time, within which most tasks cannot finish their execution, whereas in fairness scheduling most tasks are able to finish within their CPU time share.

6.3 INTER PROCESS COMMUNICATION (IPC) METHODOLOGIES
Inter process communication is vital in any embedded system: a process may have to feed another process for it to proceed. IPC is inherent in all embedded systems. The following are the most commonly used IPC mechanisms:
1. Pipe
2. Named Pipe or FIFO
3. Message Queue
4. Shared Memory and
5. Semaphores/Mutex
All these will be discussed in detail with relevant C code for easy understanding. It will help if the reader has a Linux based PC at hand while reading this chapter.

6.3.1 Pipe
A pipe is a very simple way of communicating between two processes. A relevant real life example is apt here: to water the plants in a garden, one end of a tube is connected to the water tank and the other end is used to water the plants. The scenario is the same here. When process A has to transfer data to process B it can use a pipe, and the most important thing is that a pipe is unidirectional, i.e., data can flow in only one direction at a time. If two-way communication is needed, then 2 pipes have to be used.
One more point about the pipe: it can be used only between related processes. No two unrelated processes can communicate over a pipe, just as nobody waters the plants in a neighbour's house.
The client-server scenario illustrates the application of pipes very well. The client reads a file name from STDIN (Standard Input, the keyboard) and writes it into the pipe. The server reads this file name from the pipe and opens the file for reading. If the open is successful, the server responds by reading the file and writing its contents into the pipe; otherwise an error message is generated. The client then reads from the pipe, writing what it receives to STDOUT. Linux system programming is very handy for explaining the pipe concept. Figure 6.5 shows the pipe representation diagrammatically.
[Figure: the client reads a path name from STDIN and sends it through a pipe to the server; the server returns the file content or an error message, which the client writes to STDOUT.]

Fig. 6.5: Pipe


Herewith the C code for the pipe is presented. It can be executed on a Linux PC with the GCC compiler. GCC, the GNU C Compiler, is the most commonly used toolkit for compiling C code on Linux. The procedure for execution is very simple. The file name should have the extension ".c". It can then be compiled with gcc -o <executable_file_name> <filename>, where the executable file name can be any name the user wishes. The program can later be executed as ./executable_file_name, after which the user can feed in the test inputs. Keeping the above in mind, the user can now read the following code, which has comments embedded inline.
A pipe is created with the pipe() system call, which accepts an array of two integers as its argument and fills it with two file descriptors. One file descriptor (FD) is used as the read end and the other as the write end. A file descriptor is an integer allotted by the system for each file that is opened.
int pipefd[2]; is a valid example here.
Passing pipefd as the argument to the system call fills it with 2 file descriptors, pipefd[0] and pipefd[1]. pipefd[0] is always the read end of the pipe and pipefd[1] is always the write end.
The most important thing to remember about pipes, as conveyed earlier, is that they can be used only between related processes. What is a related process? A process and its child are related. How is a child created for an existing process? By calling fork(). fork() is the system call used to create a child process from an existing parent process. When called, fork() returns in both processes: in the parent it returns the child's process ID (a value > 0), and in the child it returns 0.
Keeping the above basic points in mind, one can easily walkthrough the code
presented below.
/* Unnamed_pipe.c*/
// Code Starts here.
// standard header files for unnamed pipe.
# include <stdio.h>
# include <unistd.h>

// main starts here.


int main (void)
{
int pipefd[2];
// array to be passed to the pipe system call.

int ret;
char buffer[15];
// this buffer is where data will be kept in.

pipe (pipefd);
//pipe system call is used which will create pipe.
//there will be 2 return values. One for read and
//another one for write.

ret = fork();
// creation of child process through fork is done
// here. This will now throw two return values.
// One > 0 and other == 0. >0 is for parent, equal
// to 0 is child.

// FIRST PART OF THE PROGRAM //


if (ret > 0)
{
fflush (stdin);
// cleaning the standard input first.

printf("\n Parent Process ");


// Printing a message as parent.
write(pipefd[1], "HELLO.MR.ROBERT", 15);


// writing the content into Write end of the pipe.
// i.e., the data is now poured.
}

// SECOND PART OF THE PROGRAM //


if (ret == 0)
{
sleep (5);
fflush (stdin);
// cleaning the standard input line.

printf("\n CHILD ");

read (pipefd[0],buffer,sizeof(buffer));
// data is now read, but need to display in the screen.
// for that purpose data is kept in buffer and from buffer
// can be written it to display
write (1,buffer,sizeof(buffer));
//Where 1 represents standard output, the screen.
}
return 0;
}

// Execution part of the program //


$ gcc -o unnamed_pipe unnamed_pipe.c
The above step compiles the code and produces an executable file for the user. Execution is very simple: ./unnamed_pipe runs the program, and the message HELLO.MR.ROBERT is sent to the child process, which the user can see on the screen. One beauty of this mechanism is that the child cannot read until the parent writes, so there exists a synchronization which is very vital. The only disadvantage associated with the pipe is that it can be used only between related processes. This problem can be overcome by using a named pipe, or FIFO. The reader is advised to try the same program on a Linux/Unix PC, which will definitely yield a better understanding.

6.3.2 Named Pipe


To overcome the problem discussed above, a named pipe can be deployed. A named pipe is also called a FIFO (First In First Out). Here the concept is slightly different, and a simple real life scenario is handy. Assume a person has to pass a letter to someone, but due to circumstances it cannot be handed over in person. What can be done? Simple: find a third person who is familiar to both people; that third person can hand over the letter to its destination successfully. The same applies to the named pipe. As already conveyed, a named pipe can be used for communication between two unrelated processes. The sequence goes like this: process A writes the data into a common file which process B can also access; after the data has been written by A, B reads the data from that common file, which can be deleted after reading. The term file has to be refined: it is called a FIFO in Linux/Unix, and it can be created with the mkfifo() system call. Communication between 2 different processes over a FIFO is revealed by the following C programs, where the first, fifo_write.c, is the FIFO write program and fifo_read.c is the read program. The write program has to be executed first; then the read program can be executed. Even if the user executes the read program first, it will wait for the writer to write the data, so there exists an automatic synchronization, which is a highly appreciable feature.
The C code for both read and write is presented below with comments included. mkfifo() has to be given the access permissions of the FIFO it creates. A file, when created, has permissions associated with it. There are basically three kinds of users in Linux:
1. Owner of the file (The person who creates the file)
2. Group
3. Other users
Each of the above mentioned users has access permissions. The following access permissions are associated with every file:
1. Read Permission (Denoted by r)
2. Write Permission (Denoted by w)
3. Execute Permission (Denoted by x)
These permissions can be visualized using ls -l <filename>. The following snapshot reveals the permissions of the file file1.txt.
Syntax: prompt $ ls -l <filename.txt>

[Figure: output of ls -l, with the owner, group and other permission fields marked.]

Fig. 6.6: File permissions


From the snapshot in Fig. 6.6, one can see -rw-r--r-- displayed for the file file1.txt. The first triplet, rw-, denotes the permissions of the owner of the file. The next triplet, r--, is the permission of the group. The final r-- is the permission of the other users. These are the default permissions granted to newly created files: the owner gets the Read and Write privileges, while the other two classes of users are restricted to just the Read permission. The reader now has to know a few more things about the file access permissions.
1. Without read permission on a directory, one is not able to list out the contents of the directory.
2. Without write permission, a user cannot create a file in the directory, rename files in it, remove files from it, or build a subdirectory.
3. Without execute permission, one cannot enter the directory or access the files kept inside it, and files cannot be copied to and fro.
The next query that can arise is: can the permissions be altered? Yes, they can be changed. 'chmod' is the command meant for it, and only the owner of the file can change the associated file permissions.

/* fifo_write.c */
/* The FIFO is created with the mkfifo() system call. Here it is created under /tmp, a directory writable by every user; any directory where the user has write permission would do. */
// fifo_write.c
// Code Starts here.
# include <stdio.h>
# include <sys/stat.h>
# include <sys/types.h>
# include <fcntl.h>
# include <unistd.h>
// above are the standard header files for FIFO creation.

int main()
// main program starts here.
{
int fd, retval;
// mkfifo() will return a return value and it is
// collected in retval. Also fd is return value
// of open system call.

char buffer[8] = "TESTDATA";


// writing data into buffer which has to be
// transferred.

fflush(stdin);

retval = mkfifo("/tmp/myfifo", 0666);


// creation of the fifo is carried out under /tmp.
// associated file permissions are 0666.

fd = open("/tmp/myfifo", O_WRONLY);
// as the fifo is already created, It can be opened
// in write mode for writing the data to fifo from
// buffer.

write(fd, buffer, sizeof(buffer));


// data has been now flooded into fifo.
close (fd);
// since write process is over, close the file.
return 0;
}

// fifo_read.c
// Code Starts here.
# include <stdio.h>
# include <sys/stat.h>
# include <sys/types.h>
# include <fcntl.h>
# include <unistd.h>
// above are the standard header files for FIFO creation.

int main()
// main program starts here.
{
int fd, retval;
char buffer[8];

fd = open("/tmp/myfifo", O_RDONLY);
// fifo has been already created in write program,
// so opening it in read only mode.

retval = read(fd, buffer, sizeof (buffer));


// reading and putting the content into the buffer.
fflush(stdin);

write(1, buffer, sizeof (buffer));


// read the content out from the buffer and putting
// it on the screen on stdout
close(fd);
return 0;
}

Compilation of the write program should be done first with gcc -o fifo_write fifo_write.c, and it can then be executed with ./fifo_write. A similar procedure has to be followed for fifo_read.c. If the read program is executed first, it will wait until the write program runs; the synchronization is automatic.

6.3.3 Message Queue


Two (or more) processes can exchange information via access to a common system message queue. The sending process places a message, via some (OS) message-passing module, onto a queue which can be read by another process. Each message is given an identification or type so that the processes can select the appropriate message. The processes must share a common key in order to gain access to the queue in the first place.
Message queues provide an asynchronous way of communication, meaning that the sender and receiver of a message need not interact with the message queue at the same time. Message queues have a wide range of applications; a few simple examples are:
1. Taking input from the keyboard
2. To display output on the screen and
3. Voltage reading from transducer or sensor etc.
A task which has to send a message can put the message in the queue for other tasks to read. A message queue is a buffer-like object which can receive messages from ISRs and tasks and transfer them to other recipients; in short, it is like a pipeline. It can hold the messages sent by a sender for a period until a receiver reads them. The biggest advantage of a queue is that the receiver and sender need not use the queue at the same time: the sender can post a message in the queue and the receiver can read it whenever needed. A message queue is composed of a few components. It should have a start and an end: the starting point of a queue is referred to as the head of the queue and the terminating point is called the tail of the queue. The size of the queue has to be decided by the programmer while writing the code. A queue cannot be read while it is empty, and it cannot be written into while it is already full; a queue can also have some empty elements. Figure 6.7 highlights a sample queue structure.
[Figure: sending tasks on the left post messages into the message queue; receiving tasks on the right read them out.]

Fig. 6.7: Queue structure


The message queue can be implemented in Linux machine with available system
calls. The basic operations to be carried out in queue are:
• Creation / Deletion of queue and
• Sending / Receiving of message
Message queue is yet another efficient mechanism for establishing communication between processes. The code for the queue is presented as follows. Two different files have to be written here: one for the sender and another for the receiver. The receiver will wait until the sender writes into the queue. One important advantage of the message queue is that it supports automatic synchronization between the sender and the receiver. Another advantage is that the memory can be freed after usage, which is very essential in all software systems.
A few things have to be taken into consideration before writing code for the queue.
1. An identifier (key) has to be generated.
2. 4 system calls are used in the queue code:
(a) msgget( ): creates the queue, or obtains access to an existing one.
(b) msgsnd( ): sends a message to the queue.
(c) msgrcv( ): receives a message from the queue.
(d) msgctl( ): performs a control action, i.e., deletion of the queue can be done with msgctl( ).
While executing, the user may face trouble if not having administrator privileges. Keeping the above points in mind, code can be written with ease for the message queue: one program for the send action and the next one for the receive action.
// message_snd.c
// Code starts here.
// Header files on IPC and message have to be included with
// normal other headers

#include<stdio.h>
#include<stdlib.h>
#include<sys/ipc.h>
#include<sys/types.h>
#include<sys/msg.h>

// A structure has to be created for sending the message


// and to decide on type of typed message.

struct msgbuf
{
long mtype;
//to decide on message type.
char msgtxt[200];
//size of the data which has to be sent.
};

int main(void)
{
struct msgbuf msg;
// creating an instance of the structure.
int msgid;
// every message is represented by an id
key_t key;
// every queue needs a key, which the sender and receiver will agree upon
// STAGE 1 of PROGRAM //

if ((key=ftok("message_snd.c", 'b')) == -1)
// ftok (file to key) generates the key from a file path; it returns
// the key on success and -1 on failure. Here the current source
// file itself is used to generate the key.
{
perror("key");
// on failure the perror function lets us know why the key has not been created
exit(1);
}
// STAGE 2 of PROGRAM //

if((msgid=msgget(key,0644|IPC_CREAT))==-1)
// the message id is generated through this system call;
// on success it returns the id through which the queue can be accessed
{
perror("msgid");
// if not formed it will give the error message.
exit(1);
}

printf("\n the msgid is %d", msgid);
// printing msgid for confirmation.
printf("enter the text");
// user is prompted to type the data.
msg.mtype=1;
// setting the message type here, i.e., an agreement between the processes.
while(fgets(msg.msgtxt, sizeof(msg.msgtxt), stdin) != NULL)
//reading the message typed on stdin and appending it to the
//queue with the msgsnd function call
{
// Third Stage of the program //
if(msgsnd(msgid,&msg,sizeof(msg.msgtxt),0)==-1)
// msgid, the address of msg and the size of the text part are passed.
// since flags are not used, the 4th argument is 0.
// msgsnd returns -1 if an error has occurred.
{
perror("msgsnd");
exit(1);
}
}
// Stage 4 of the program //
if(msgctl(msgid,IPC_RMID,NULL)==-1)
// to delete the queue when the work is over
{
perror("msgctl");
exit(1);
}
return 0;
}
// To delete the msgid through the command line:
// ipcrm -q msgid

Message receive functionality has been implemented in the following code.


// message_rcv.c
/* Most of the program is the same as message_snd.c; there the msgsnd
system call is used, here msgrcv is used. */
// same are the header files.
#include<stdio.h>
#include<stdlib.h>
#include<sys/ipc.h>
#include<sys/types.h>
#include<sys/msg.h>

struct msgbuf
{
long mtype;
char msgtxt[200];

};
int main(void)
{
struct msgbuf msg;

int msgid;
key_t key;

// Stage 1 of PROGRAM //
if((key=ftok("message_snd.c", 'b')) == -1)
// using the same file here to get the key.
{
perror("key");
// if not created perror will let us know why it has not been created
exit(1);
}
// Stage 2 of program //
if((msgid=msgget(key,0644))==-1)
{
perror("msgid");
exit(1);
}

//Stage 3 of the program //

for(;;)

{
if(msgrcv(msgid,&msg,sizeof(msg.msgtxt),1,0)==-1)
// here msgrcv is used; this is the major difference between the send and receive programs
{
perror("msgrcv");
exit(1);
}
printf("%s\n",msg.msgtxt);
}
return 0;
}

The execution procedure is similar to the previous one. First, message_snd.c can be compiled with gcc -o message_snd message_snd.c and executed with ./message_snd. It will then prompt the sender to type the data to be sent to the receiver. In parallel, from another terminal, message_rcv.c can be compiled and executed with gcc -o message_rcv message_rcv.c and ./message_rcv. The receiver will receive all the information that the sender types. If the receiver compiles and executes message_rcv.c first, the program will wait until the sender drops a message.

6.3.4 Shared Memory


The next mechanism to be learnt in IPC is shared memory, which is very vital and frequently used. Shared memory can be used even between unrelated processes. By default, one page of memory (4 kbytes on most systems) is allocated as the shared memory segment. Assume process P1 wants to access its shared memory area: it has to get attached to it first. Though it is P1's memory area, it cannot get access as such; only after attaching can it gain access. A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat() and detached using shmdt(). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.
There are three steps:
1. Initialization
2. Attach
3. Detach
Two separate programs, for write and read, are presented here.
// shared memory write program
// shmwrite.c

#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>

int main()
{
int retval,shmid;
void *memory = NULL;
char *p;

//Stage -1, Initialization of shared memory //

shmid = shmget((key_t)1234, 6, IPC_CREAT|0666 );


//getting the shm initialized.


//hard coding the key value without using ftok.

if (shmid < 0)
{
printf("\n The Creation has gone as a failure, Sorry");
shmid = shmget((key_t)1234, 6, 0666);
// fallback: if the segment already exists, obtain its id without IPC_CREAT
}

printf("\n getting the shared memory created %d", shmid);

//Stage -2, attachment to shared memory //


memory = shmat(shmid, NULL, 0);
// on success shmat() returns the address of the attached shared
// memory segment, which is why void *memory = NULL is declared
// at the start of the code. On failure it returns (void *) -1.
if (memory == (void *) -1)
{
printf("\n Attachment failure, Sorry");
return 0;
}

p = (char *) memory;
// specifying the data type: the data is characters, so cast to char *
memset(p, '\0', 6);


// cleaning the buffer before use.
memcpy(p, "hello", 6);


// writing the data to be shared.

//Stage -3, detachment from shared memory //

retval = shmdt(p);
if (retval < 0)
{
printf("\n Suffered Detachment");
return 0;
}
}
// shared memory read program
// shmread.c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <memory.h>

int main ()
{
int retval,shmid;
void *memory = NULL;
char *p;

//Stage -1, Initialization of shared memory //

shmid = shmget((key_t)1234, 6, IPC_CREAT | 0666);

//getting the shm initialized.


//hard coding the key value without using ftok.

if (shmid < 0)
{
printf("\n The Creation has gone as a failure, Sorry");
shmid = shmget((key_t)1234, 6, 0666);
}
printf("\n We are getting the shared memory created %d", shmid);

//Stage -2, attachment to shared memory //


memory = shmat(shmid, NULL, 0);
if (memory == (void *) -1)
{
printf("\n Attachment failure, Sorry");
return 0;
}
p = (char *) memory;
printf("\n MESSAGE is %s \n", p);

//printing the message here.

//Stage -3, detachment from shared memory //


retval = shmdt(p);
if (retval < 0)
{
printf("\n Suffered Detachment");
return 0;
}
retval = shmctl(shmid, IPC_RMID, NULL);
return 0;
}
Execution is similar to the cases dealt with in the past. GCC has to be used: shmwrite.c is compiled and executed first, and then the same procedure is followed for shmread.c. The next scenario to be looked into is semaphores.

6.3.5 Task and Resource Synchronization


In a multi-programming environment, not only is task scheduling important; when two or more tasks try to access the same resource, care should also be taken to avoid conflicts, i.e., synchronization is needed among the tasks accessing the common resource. Semaphores and mutual exclusion are the key concepts for achieving synchronization.

Semaphores
If there is a two-way road, there will never be a problem of clashes or resource sharing: the road is available for cars travelling either way. Figure 6.8 is the case referred to here. Cars can move comfortably without any clash, so there is no need for synchronization or sharing of resources.

Fig. 6.8: Resource available for all the tasks (cars)

If there is an intersecting road, there needs to be a definite way of avoiding clashes, i.e., an electronic signal is used there, with green, red and yellow lights to indicate the availability of the road for the vehicles to move on. Based on the signal, a vehicle can move and use the road without collision. Figure 6.9 shows this concept diagrammatically.
Fig. 6.9: Signal used to avoid clash

One more real life example can be given here. Consider a railway track which connects two cities. While one train uses the rail track, another train cannot use it at the same instant; if it did, the result would be a dire accident. Hence came the notion of the semaphore, an indicator of the status of the rail track (the resource). Based on the indication of the semaphore, the driver of the subsequent train can comprehend the status of the track. Figure 6.10 shows a semaphore as used by the railway department. If the track is free it can be used; if not, the driver has to wait until the resource is freed. This is a semaphore, and the same idea is used in embedded systems to avoid clashes over a resource.

Fig. 6.10: Semaphores used in railways



Before a task can access a shared resource, it should first have the access permission. A semaphore (S) is an integer variable which helps in achieving synchronized access to shared resources. S can have a value greater than or equal to 0, and it can be accessed only by two operations, namely wait and signal. These operations work as follows: wait(S) decrements the semaphore value, and if the value of the semaphore becomes negative in the process, the task is suspended and placed in a queue to wait. signal(S) increments the value of the semaphore and is the opposite in action to wait(S); in other words, it causes the first task in the queue to get executed. Let S be the semaphore used to access a shared resource 'D'.
function wait(S)
    while (S <= 0)
        ;   // wait in the queue, some other task is using D
            // no operation takes place (note: the loop body is empty)
    // once S reaches a positive value, start using D after decrementing S
    S--;

function signal(S)
    // the usage of the shared resource is over; release D by incrementing S
    S++;
Where is the semaphore stored?
As already seen, a semaphore is a variable which is accessed by tasks before using a shared resource. So it should be globally available to all the tasks which execute concurrently. In the operating system there is a globally accessible area, the kernel, which is used to store the semaphore value. We can view the existing semaphores by giving the following command at the Unix shell prompt:
$ ipcs -s
A simple example explaining S

Consider two tasks that want to access a procedure, display. Display is a shared resource. To control access, a semaphore S is created and initialized to the value 1. The steps in Fig. 6.11 are as follows:
1. If task1 needs access, it will acquire S, as S is positive, and decrements S immediately by one. Now its value is zero. (Note: no other task can acquire S while its value is zero or less!)

2. Task1 uses the display and after usage releases it. On releasing the shared device, it is implied that it increments the S value by one, so that S becomes 1 again, i.e., any other task wanting to use the shared resource can now acquire S.
3. Task2 acquires S and decrements it again.
4. Task2 releases the resource and increments S after usage.
If both tasks need access at once, the kernel can give access to only one of them, i.e., the semaphore will be given to only one task. This allocation can be based on priority or on a first come, first served basis. If many tasks need access, they will be kept in a queue, where they wait for their turn to gain access.

[Figure: Task1 and Task2 both contend for the shared DISPLAY. Task1 acquires the semaphore, uses the display and releases the semaphore; Task2 then acquires the semaphore, uses the display and releases the semaphore.]

Fig. 6.11: Semaphore example


Semaphore types
There exist two types of Semaphores:
1. Counting Semaphore
2. Binary Semaphore
A binary semaphore (Fig. 6.12), as the name implies, can have only two values, 0 or 1, unlike the counting semaphore, whose value can range from 0 up to a larger number.
• When a binary Semaphore’s value is 0, the Semaphore is considered
unavailable (or empty);
• When the value is 1, the binary Semaphore is considered available (or
full).

• Note that when a binary semaphore is first created, it can be initialized to either available or unavailable (1 or 0, respectively).

[Figure: a binary semaphore moves between two states, Available (initial value = 1) and Unavailable (initial value = 0); Acquire takes it to Unavailable (value = 0) and Release back to Available (value = 1).]

Fig. 6.12: Binary semaphore

Where is the semaphore used in the programming context?


In the object based/oriented languages such as C++ and Java, multi-threading is
the key concept to implement concurrency. While dealing with the threads, care
should be taken to ensure that there is no deadlock, starvation, synchronization
problem, etc. In this kind of situation, semaphore plays a crucial role. Semaphores
can be implemented to control the access of shared variables/procedures in a
multi-threaded environment.
Creating a simple semaphore in Unix
Unix/Linux has inbuilt system calls to acquire a semaphore, to perform control actions on it, and to release it after usage:
1. semget
2. semctl
3. semop
All the above said system calls will help a programmer to use semaphore concept
to avoid clashes for resources.
Busy waiting in semaphores
With the wait and signal operations, there is a real possibility for tasks to get stuck in a busy-waiting state. This means that when one task is using the semaphore, other tasks wanting to acquire the same semaphore must continuously loop to check for its availability before they can acquire it one after the other. This problem can be overcome by letting the tasks block themselves while waiting for the semaphore instead of busily waiting and wasting CPU cycles.

Deadlock and starvation in semaphores


In addition to the busy waiting scenario, Semaphores can even lead to the problem
of starvation and deadlock. Consider the following sequences of events by two
tasks which work simultaneously.

Task a                                    Task b
Wait for semaphore S (acquires S)         Wait for semaphore T (acquires T)
Wait for semaphore T                      Wait for semaphore S
(T value will be negative since it is     (S value will be negative since it is
already held by b, so a is queued)        already held by a, so b is queued)
...                                       ...
Signal(S)                                 Signal(T)
Signal(T)                                 Signal(S)

When these sequences execute, they lead to a deadlocked situation: neither of the two tasks succeeds in finishing execution. This eventually leads to the starvation of both tasks, task a waiting for T while holding semaphore S, and task b waiting for S while holding semaphore T.

Mutex (Mutual exclusion)


Mutual exclusion is yet another concept for achieving synchronization between tasks. Before looking into it, some light should be thrown on the critical section problem. A critical section is a section of code which is globally available to all the tasks; all the tasks have read and write access to it, which poses a synchronization problem. When one task is accessing the critical section to write, no other task should be allowed to either read or write in it. So, before a task uses the critical section, it should ensure that no other task is writing in it. It should be noted that more than one task is allowed to read from the critical section at the same time, as simultaneous reading does not affect synchronization.
A classic example of the mutual exclusion problem is the dining philosophers' problem. In this problem there are 5 philosophers, each of whom is in one of the following states:
1. Eating
2. Thinking
3. Hungry

All the philosophers sit at a round table with forks on the left and right side of each plate. A philosopher in the hungry state can go to the eating state only if the forks on both sides of the plate are available. Even if one is missing, the philosopher cannot start eating. Philosophers who do not feel hungry remain in the thinking state.

Fork

Eating philosopher

Thinking philosopher

Fig. 6.13: Dining philosopher’s problem

As can be seen from Fig. 6.13, only two of the 5 philosophers can be in the eating state at any point of time, to avoid conflicts. Another famous example of the mutual exclusion problem is the reader-writer problem, where no two writers, or a writer and a reader, can perform their work simultaneously.

6.4 MEMORY MANAGEMENT


As the need for multiprogramming increases, the need for managing memory also increases proportionally. Memory in memory management generally refers to main memory (RAM) management, into which each process/task to be executed is brought. The general memory hierarchy in any computer system is:
1. Registers
2. Cache memory
3. RAM
4. Hard disk
5. Flash memory
The hierarchy is framed based on increasing storage capacity and decreasing price, i.e., in this hierarchy, registers are the costliest and smallest in storage capacity, and flash memory the cheapest with large storage capacity. Our focus will be on cache memory and RAM.
Properties of RAM
1. Popularly known as physical memory.
2. Size is smaller than the hard disk (for instance, 512 MB RAM <<< 120 GB HDD).
3. Hence, faster than the hard disk.
4. Costlier when compared to the hard disk.
5. Volatile in nature, which means it does not remember anything when the power is shut down.
6. It holds the most important software, the operating system (the OS), for faster operation.
7. Accessed by the CPU for execution of programs.
8. Accessed during DMA (Direct Memory Access) transfers.
9. Size ranges from 128 MB to 10 GB.
Why memory management?
As explained in the properties of RAM, it is only a limited-capacity memory device. It cannot hold lots and lots of programs as the HDD does, which eventually raises a question: what will happen if the size of a user program is larger than the RAM's? Here comes the need for memory management.
The program which has to be executed should be brought into the RAM from a secondary storage device like the HDD; only then will it be able to get executed. How is the program allocated space in the RAM? That is another question which needs attention in memory management. A few techniques are available to deal with the memory allocation problem. The basic form of memory allocation is done by partitioning the RAM. It can take any of the following forms:
1. Fixed size partitioning
2. Variable size partitioning
3. Dynamic allocation based on size of the programs.
1. Fixed size partitioning
Memory is divided into equal-sized partitions in this technique. Each partition can hold a program of equal or lesser size (refer Fig. 6.14). The number of such partitions determines the degree of multiprogramming. Suppose the capacity of the RAM is 512 MB and the first 80 MB are allocated to the OS. The rest is divided into partitions. The size of a partition can be anything; for instance, it may be 20 KB. Programs should then be of the same size, else some portion of the memory will be wasted! The waste is technically called internal fragmentation, which is discussed in the following subsection.
Example:
Let the size of the RAM be 512 MB, and the approximate size of each partition be 100 KB, and suppose programs of the following sizes arrive:
120 KB, 80 KB, 18 KB, 60 KB.
How is the allocation going to work?

OS 100 KB 100 KB 100 KB 100 KB 100 KB

Fig. 6.14: RAM-fixed partitioning

• First program needs a space > partition size. So it is given two partitions.
• Then the second needs a space lesser than partition size, which is given
the 3rd partition.
• And 3rd needs much lesser space than the partition size. It’s given the 4th
partition.
• Similarly 4th program needs a space lesser than the partition size. It’s
given the 5th partition.
As can be seen, with each allocation some space goes wasted. In the first allocation 80 KB is wasted; similarly 20 KB, 82 KB and 40 KB are wasted in the 2nd, 3rd and 4th allocations respectively. So the amount of memory wasted is high, which is very costly to spare. This is the disadvantage of fixed size partitioning.
Advantage:
• Simple to implement
Disadvantage:
• Memory wastage is more
2. Variable size partitioning
To eliminate the disadvantage with the fixed size partitioning, this type of
partitioning was introduced. In this type, partitions are of different sizes. Each
size is associated with a queue into which the processes of equal/ lesser sizes
are admitted. Then the processes are given the memory in FIFO manner.
Pictorially the same is represented in Fig. 6.15.

Fig. 6.15: Variable sized partitioning

In this type, when a new process arrives, it is admitted, based on its size, to one of the available queues. Memory wastage happens in this type too, but less than in the previous type. For example, let the sizes of the partitions be 10, 20, 40, 60, 100 and 200 KB, and assume the same programs as for fixed size partitioning. When they are allocated to variable-sized partitions, the 120, 80, 18 and 60 KB programs are given the 200, 100, 20 and 60 KB partitions respectively. The memory which goes unused is much less than with the previous allocation method. There is yet another allocation method, the most famous of all, which is dynamic memory allocation, i.e., allocating memory at execution time. It is covered in the last subsection.

6.5 CACHE MEMORY


Yet another form of memory is the cache memory.
1. It is very small in size.
2. It is very fast compared to the HDD and flash memories.
3. It sits in between the CPU and the RAM.
4. It saves precious CPU time by often already holding the item the CPU needs, so the CPU does not have to refer to the RAM every time.
5. But it is very costly.

How does a cache memory work?


When a request for a word/page comes, it generally goes to the RAM, where a check is made whether the particular word/page is present. If present, the requested item is returned to the processor; else, the reference is said to be invalid. In this scenario the CPU has to spend time sending the request all the way to the RAM, and accessing the RAM to find whether the needed item is available or not. This time is considerably high, and in order to reduce it, the cache was introduced. After the introduction of the cache memory, all requests have to pass through the cache. If the requested item is found in the cache, it is said to be a cache hit; else, it is a cache miss. In case of a miss, the cache has to refer back to the RAM for the requested item, gets it from there, keeps a copy of the item, and then gives the requested item back to the processor. On a hit, it does not refer to the RAM, thereby saving time. A cache miss, as anybody can guess, incurs extra time compared to the CPU referring directly to the RAM, but searching the cache adds only a very small delay, as the size of the cache is very small. Caches may come in many levels. The one discussed so far is a one-level cache. Two-level caches also exist: L1 (the level one cache) resides on the microprocessor chip itself for the quickest access, and L2 (the level two cache) resides separately, for example on an extension card.

[Figure: CPU, cache and RAM. (1) The CPU can access the RAM directly. (2) With a cache, the CPU first checks whether the needed file is in the cache; if so, it gets it from the cache, else the cache contacts the RAM, gets a copy of the requested page/file and forwards it to the CPU.]

Fig. 6.16: The cache memory



During the initial references, the miss rate would be much higher, since the
cache is empty during the start. But as time rolls on, the cache is filled and there
will be more hits than misses. Cache memory concept is diagrammatically
represented in Fig. 6.16.
A real world example for caching
Everybody uses Google almost all the time for their work. Once the keywords to be searched are keyed in, pages of links, each pointing to information relevant to the search key, are displayed to the user. For example, while searching for "cache memory" the following link appears.

[Figure: a Google search result — "Hardware implements cache as a block of memory for temporary storage of data..." with the link en.wikipedia.org/wiki/Cache marked "Cached - Similar"; the "Cached" tag indicates that the item being searched has been cached once before.]

Fig. 6.17: Example for caching

In this case (Fig. 6.17), even if the original page has been removed, the cached page will be given to the user who is searching for it. Also, the cached page is not necessarily the most recently updated page: the page may have been updated on the same day, and the copy being displayed may not reflect that. If the user clicks on the cached hyperlink, he/she gets a reminder at the top of the displayed page regarding this updation conflict. The most frequently visited links will have such tags. Some pages may not be cached at all; in that case the cached hyperlink is absent. Figure 6.17 stands as support for the concept discussed.
For high cache performance
1. Hit time should be minimized.
2. Miss rate should be reduced.
3. Miss penalty should be reduced (penalty associated with the cache miss,
may be in the form of CPU time/cost/both).

6.6 DYNAMIC MEMORY ALLOCATION


In dynamic memory allocation, the internal fragmentation problem which was there in both fixed and variable sized partitioning is overcome. The programs are allocated exactly the memory space they need; not even a single byte is allocated extra. This eliminates internal fragmentation, but introduces a new problem of holes, which are free spaces in between the partitions. A single hole may not accommodate a program, but when all the holes are combined together they become useful. The holes are also known as external fragmentation (Fig. 6.18). The process of combining the holes together is popularly known as compaction. The holes concept can be understood from the following figure.
Three popular techniques are available for dynamic partitioning:
1. First fit
2. Best fit
3. Next fit

Fig. 6.18: External fragmentation


First fit:
– Scans memory from the beginning and chooses the first available block
that is large enough to hold the program.
– Fastest of all the three modes.
– Many programs are accumulated in the front end of memory.
Best fit:
– Chooses the block that is closest in size to the request.
– Worst performer overall.
– Since smallest block is found for process, the smallest amount of
fragmentation is left.
– Memory compaction must be done more often.

Next fit:
– Scans memory from the location of the last placement.
– More often allocates a block of memory at the end of memory where the
largest block is found.
– The largest block of memory is broken up into smaller blocks.
– Compaction is required to obtain a large block at the end of memory.

[Figure: memory with occupied blocks and three free blocks of 80 MB, 120 MB and 50 MB. First fit chooses the 80 MB block, best fit the 50 MB block, and next fit the next free block after the most recently allocated one, i.e. the 120 MB block.]

Fig. 6.19: First, next, best fits

These methods can be understood well by looking at an example. Suppose there is a block to be allocated into the RAM, which has three free blocks of sizes 80, 120 and 50 MB. The size of the block to be placed is 45 MB.
• First fit searches for the first free block which can hold the 45 MB block.
It finds an 80 MB block to be good enough and it occupies it.
• Next fit searches for the next sufficient free block. It makes use of the
following 120 MB block.
• Best fit makes use of the smallest block which is sufficient to hold the 45
MB block, which is the 50 MB block. As it can be seen, first and next fit
lead to a larger hole compared to the best fit. Pictorial representation is
presented in Fig. 6.19.

6.7 FRAGMENTATION
Fragmentation has been already discussed in the previous topics such as
partitioning and dynamic memory allocation. Here, a brief summary of those:

1. Due to poor usage of the memory partitions, some portions of the partitions become empty space which is of no use. This is generally known as fragmentation.
2. There are two kinds of fragmentation:
R Internal
R External
3. Internal fragmentation arises when the block is smaller than the partition in which it is allotted, i.e., partition size > block size.
4. External fragmentation is due to dynamic allocation of memory, where the sizes of the blocks allocated from time to time differ, which creates holes in the memory. These holes are of no use most of the time due to their small size. Once the holes are combined using compaction, the disadvantage of holes disappears!
5. Internal fragmentation cannot be resolved using any technique since it is private to the fragment.

6.8 VIRTUAL MEMORY


There are times when the RAM runs out of space and, when the user loads another process, the system can flash a message like "no more space to load a new process; close any of the currently running processes to load the new one". At these times virtual memory plays a crucial role. The term virtual means that this memory does not have a physical existence like main memory or secondary memory. Its job is this: when the main memory runs out of space and more items need to be loaded, the virtual memory mechanism looks into the RAM for pieces of the currently running programs which have not been used for a long time, i.e., portions of a program not referenced for a long time, and swaps them out. This is similar to the Least Recently Used (LRU) replacement algorithm; any other replacement algorithm, like FIFO, can also be made use of. This swapping process is transparent in the sense that none of the processes come to know that some portion of some other program has been moved out to accommodate it! What impression does this give to the processes? The processes think that there is plenty of memory available within the RAM to accommodate them, even though the actual size of the RAM is very much limited. This is the general concept behind virtual memory.
Advantages
1. Greatly reduces the burden on the programmer since he need not worry
about the size of the program and the available free RAM space.

2. In the absence of virtual memory, programs would need to wait for the RAM to free some space before new programs could run; virtual memory removes this wait.
3. Increases the degree of multi-programming by letting many processes to
reside in the RAM at the same time.
Disadvantage
1. The major disadvantage with respect to the virtual memory scheme is
that, the time it takes to move portions of processes to the hard disk.

6.9 CONTEXT SWITCHING


As is known, the CPU executes one task for some time and then has to go ahead with executing another task which is waiting in the queue. Assume that the CPU is executing a lower priority task, so the CPU holds data relevant to that task; all these data are available and handled in the CPU registers. When the CPU moves to the next, higher priority task, it cannot abruptly switch to executing it: the CPU has to save the current data somewhere safely first, and only then execute the next task. Where is the data related to the current task saved? The state of the CPU registers when a task has to be preempted is called the context, and saving the context of the current task on the stack and loading the new task is called context switching.
As an example, imagine an MP3 player which also has an FM radio in it. The user can listen to any song in MP3 player mode, and if need be, the FM radio can be switched on at any time. The MP3 song being played is paused and the instrument provides support for using the FM radio. If the user wants to get back to the MP3 player, this can be done and the paused song is played again. Here a context switch has happened: the device has to save the details of the song being played and only then can it concentrate on the next task. The context is saved first, and then the switching happens.
Following are the actions that take place when there is a switching request:
• Getting the address where the new task (function) begins. That address is loaded into the program counter (the place where the address of the next instruction to be executed is stored). The CPU is then all set to execute the new task, but it first has to store the current context, which acts as a bookmark for getting back.
• The context related to the current task can include:
• the Program Status Word (PSW)
• the CPU registers
• stack pointer details, etc.

Figure 6.20 shows the context related to current task.


All this information has to be stored before the switching. As the new task to be executed may also need the above registers and utilities, they have to be freed, with their contents stored in a different place.

Current Program Context

PC (Program Counter)

SP (Stack Pointer)

CPU registers

Fig. 6.20: Current program context

POINTS TO REMEMBER
1. Linux is an open source OS.
2. A task is the basic element in every OS; it is a program in execution. Task states include Dormant, Running, Ready and Blocked.
3. Task scheduling means arranging the tasks in ready state in an order in
which they will be allowed to get executed.
4. Task scheduling is of 2 types, preemptive, non-preemptive.
5. Pipe is a logical communication channel for processes wanting to
communicate with each other.
6. Message queue is something which processes use to share messages.
7. Synchronization is needed among the tasks accessing the common
resource. Semaphore and Mutual exclusion are the key concepts to
achieve synchronization.
8. Semaphore is a variable that ensures synchronized access to the shared
resources.
9. Mutual exclusion is a technique which allows a process to enter critical
section only if no other process is currently inside the critical section.
10. Memory management deals with efficient management of the main
memory in multi programming environments.

11. Managing memory ensures that there is less/no internal or external


fragmentation.
12. Fragmentation refers to the portion in the memory partition which is
unusable.
13. Virtual memory is vital to every OS which helps the programmers execute
their processes whose size is greater than the available RAM size.
14. Switching the context from a lower priority process to a higher priority
process is called context switching.

6.10 QUIZ
1. When does a task move from running state to blocked state?
2. Write down the memory hierarchy.
3. Cache memory is faster and costlier than RAM—true/false.
4. Differentiate preemptive and non-preemptive techniques.
5. Differentiate binary semaphore and counting semaphore.
6. Why is task synchronization needed?
7. What is the disadvantage of using virtual memory?

Answers for Quiz


1. When the task is in need of a resource in the midst of execution.
2. Registers-Cache memory-RAM-Hard disk-Flash memory.
3. True
4. Preemptive takes into account the priority of the processes, non-preemptive
doesn’t.
5. Binary semaphore just has two values 0/1 but counting semaphore deals
with a large set of values.
6. When two or more tasks try to access the same resource, care should be
taken in order to avoid conflicts, i.e., synchronization is needed among the
tasks accessing the common resource.
7. The major disadvantage with respect to the virtual memory scheme is
that, the time it takes to move portions of processes to the hard disk.
7
Networks for
Embedded Systems

Learning Outcomes
R Serial Communication Basics
• RS 232 model
• I2C (I Square C) model
R CAN and CAN OPEN
R SPI and SCI
R USB
R IEEE 1394 – Apple Fire Wire
R HDLC – An Insight
R Parallel Communication Basics
• PCI interface
• PCI-X interface
R Device Drivers – An Introduction
• Serial port device driver
• Parallel port device driver
R Recap
R Quiz

7.1 SERIAL COMMUNICATION BASICS


Serial communication is a common method of transmitting data between a computer and a peripheral device such as a programmable instrument, or even another computer. Serial communication transmits data one bit at a time, sequentially, over a single communication line to a receiver. Serial is also a very popular communication protocol used by many devices for instrumentation; numerous GPIB-compatible devices also come with an RS-232 based port. This method is used when data transfer rates are very low or the data must be transferred over long distances, and also where the cost of cable and synchronization difficulties make parallel communication impractical. Serial communication is popular because most computers have one or more serial ports, so no extra hardware is needed other than a cable to connect the instrument to the computer, or two computers together. Serial communication requires that you specify the following five parameters:
1. The speed or baud rate of the transmission
2. The number of data bits encoding a character
3. The sense of the optional parity bit (whether to be used or not, if yes then
odd or even)
4. The number of stop bits
5. Full or half-duplex operation
Each transmitted character is packaged in a character frame that consists of a single start bit followed by the data bits, the optional parity bit, and the stop bit or bits, as shown in Fig. 7.1 below. After the stop bit, the line may remain idle indefinitely, or another character may immediately be started. The minimum stop-bit length required by the system can be longer than one bit: it can be 1.5 or 2 stop bits, and even hardware that does not support fractional stop bits can be configured to send 2 stop bits when transmitting while requiring only 1 stop bit when receiving.
Typically, serial communication is carried out using the ASCII form of the data. Communication is completed using 3 transmission lines: Ground, Transmit and Receive. Since serial is asynchronous (in many applications), the port is able to transmit data on one line while receiving data on another. Other lines are available for handshaking but are not required. We have already seen that the important serial characteristics are baud rate, data bits, stop bits and parity, and for two ports to communicate, these parameters must match.

[Figure: a character frame — the idle line (mark), a start bit (space), the data bits, the optional parity bit and the stop bits, followed by idle again; each bit occupies one bit time.]

Fig. 7.1: Block diagrammatic representation of serial data transfer


Networks for Embedded Systems 137

The various types of serial communication standards are listed below:


1. RS-232
2. RS-423
3. RS-485
4. USB
5. Fire Wire
6. Ethernet
7. MIDI
8. PCI Express
9. SPI and SCI
10. IIC
11. IrDA
The terms DTE and DCE are very common in data communication technologies. DTE stands for Data Terminal Equipment and DCE stands for Data Communications Equipment. But what do they really mean? As the full DTE name indicates, a DTE is a device that terminates a communication line, whereas a DCE provides a path for communication.
Advantages and disadvantages
The main advantage of synchronous data transfer is its lower overhead and thus greater throughput compared to asynchronous transfer. But it has some disadvantages:
1. It is slightly more complex, and
2. The hardware is more expensive.
One of the main disadvantages of the asynchronous technique is its large relative overhead: a high proportion of the transmitted bits are used purely for control purposes and thus carry no useful information. But it holds some advantages:
1. It is simple and does not require much synchronization on either communication side.
2. The timing is not as critical as for synchronous transmission; therefore the hardware can be made cheaper.
3. Set-up is very fast, so it is well suited to applications where messages are generated at irregular intervals, for example data entry from a keyboard.

7.1.1 RS-232 Model


With the term "serial port" we will usually mean the hardware RS-232 and its signal levels, connections, etc., because many modern devices still connect to a serial port even after the development of many advanced serial communication technologies. There may be many reasons for this, like ease of debugging, cost effectiveness, etc. The serial port is also termed the COM port.
RS-232 is a standard related to serial data communication between host
systems, commonly known as Data Terminal Equipment or DTE, and a peripheral
system termed Data Communications Equipment (also known as Data Circuit-
Terminating Equipment) or DCE. To be more specific, the device that connects
to the RS-232 interface is called a Data Communications Equipment (DCE)
and the device to which it connects (e.g., the computer) is called a Data Terminal
Equipment (DTE).
It was first introduced by the Electronics Industry Alliance (EIA) in the early
1960s and is commonly known as RS-232 (Recommended Standard 232). EIA-
232 or RS-232 or RS-232C is a complete serial communication protocol, which
specifies signal voltages, signal timing, signal function, pin wiring, and the mechanical
connections (i.e., either 25-pin DB-25 or 9-pin DB-9). In 1987, the EIA released
a new version of the standard and changed the name to EIA-232-D. And in 1991,
the EIA teamed up with Telecommunications Industry Association (TIA) and
issued a new version of the standard called EIA/TIA-232-E. Many people,
however, still refer to the standard as RS-232C, or just RS-232.
DB–25 connector pin assignments:
1. Protective Ground
2. Transmit Data (TD)
3. Receive Data (RD)
4. Request to Send (RTS)
5. Clear to Send (CTS)
6. Data Set Ready (DSR)
7. Signal Ground
8. Data Carrier Detect (CD)
9. Reserved
10. Reserved
11. Unassigned
12. Secondary CD
13. Secondary CTS
14. Secondary TD
15. Transmit Clock
16. Secondary RD
17. Receiver Clock
18. Local Loop Back
19. Secondary RTS
20. Data Terminal Ready (DTR)
21. Remote Loop Back
22. Ring Indicate
23. Data Rate Detect
24. Transmit Clock
25. Test Mode

Fig. 7.2: RS-232C connections


RS-232 defines the purpose, signal timing and signal levels for each line. It is
an active-low, voltage-driven interface, i.e., it transmits a positive voltage
for a 0 bit and a negative voltage for a 1 bit. The output signal level usually
swings between +12 V and –12 V. A high level at the driver is defined as
between +5 V and +12 V, and a low level as between –5 V and –12 V. With 2 V
of noise margin, a high level at the receiver is defined as between +3 V and
+12 V, and a low level as between –3 V and –12 V. The signal voltage between
+3 V and –3 V, called the “dead area”, is designed to absorb line noise. A low
level is defined as logic 1 and is referred to as “marking”; similarly, a high
level is defined as logic 0 and is referred to as “spacing”.
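The receiver thresholds above translate directly into code. A minimal sketch (the function name and the treatment of the dead area as "undefined" are our assumptions; real receivers also add hysteresis):

```c
/* RS-232 receiver decision, per the thresholds above:
 * +3 V .. +12 V  -> logic 0 ("space"),
 * -3 V .. -12 V  -> logic 1 ("mark"),
 * anything else  -> dead area / out of range, returned here as -1. */
static int rs232_rx_bit(double volts)
{
    if (volts >= 3.0 && volts <= 12.0)
        return 0;               /* space */
    if (volts <= -3.0 && volts >= -12.0)
        return 1;               /* mark  */
    return -1;                  /* dead area: undefined */
}
```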

7.1.2 I2C (I Square C) Model


I²C is a multi-master, low-bandwidth, short-distance serial communication bus
protocol. Nowadays it is not only used on single boards, but also to attach
low-speed peripheral devices and components to a motherboard, embedded system,
or cell phone, as the new versions provide lots of advanced features and much
higher speed. Features like simplicity and flexibility make this bus attractive
for consumer and automotive electronics.
The basic design of I²C has a 7-bit address space with 16 reserved
addresses, which makes the maximum number of nodes that can communicate
on the same bus 112. That means each I²C device is recognized by a unique
7-bit address. It is important to note that the maximum number of nodes is
limited not only by the address space, but also by the total bus capacitance of
400 pF.
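The 7-bit address space minus 16 reserved addresses leaves 2^7 − 16 = 112 usable addresses. On the wire, the address travels in the first byte of each transfer, with the read/write flag in the least significant bit; a sketch of that packing (the helper and macro names are ours):

```c
#include <stdint.h>

#define I2C_READ  1u
#define I2C_WRITE 0u

/* First byte of an I2C transfer: the 7-bit address occupies bits 7..1,
 * and the read/write flag occupies bit 0. */
static uint8_t i2c_addr_byte(uint8_t addr7, uint8_t rw)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (rw & 1u));
}
```

For example, a common EEPROM address of 0x50 becomes 0xA0 for a write and 0xA1 for a read.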
The two bi-directional lines, which carry information between the devices
connected to the bus, are known as the Serial Data line (SDA) and the Serial
Clock line (SCL). As the names indicate, the SDA line carries the data and the
SCL line carries the clock signal for synchronization. The typical voltages
used are +5 V or +3.3 V.

(Figure: a master microcontroller and several slave peripheral devices share
the SDA and SCL lines, which are tied to Vdd through pull-up resistors.)

Fig. 7.3: I2C (I Square C) bus connections

Like the CAN and LIN protocols, I²C also follows a master-slave
communication model. But the I²C bus is a multi-master bus, which means more
than one device capable of initiating a data transfer can be connected to it.
The device that initiates the communication is called the MASTER, whereas
the device being addressed by the master is called the SLAVE. It is always the
master that generates the clock signal, which means each master generates its
own clock signal when transferring data on the bus.

Modes of Operation
The I²C bus can operate in three modes, or in other words the data on the I2C
bus can be transferred in three different modes.
1. Standard mode
2. Fast mode
3. High-Speed (HS) mode.
Standard mode
1. This is the original Standard mode, released in the early 1980s.
2. It has a maximum data rate of 100 kbps.
3. It uses 7-bit addressing, which provides 112 slave addresses.
Enhanced or Fast mode
The fast mode added some more features to the slave devices.
1. The maximum data rate was increased to 400 kbps.
2. To suppress noise spikes, Fast-mode devices were given Schmitt-triggered
inputs.
3. The SCL and SDA lines of an I²C-bus slave device were made to exhibit
high impedance when power was removed.

High-Speed Mode
This mode was created mainly to increase the data rate, up to 34 times faster
than Standard mode. It provides 1.7 Mbps (with a bus capacitance Cb of
400 pF) and 3.4 Mbps (with Cb = 100 pF).
The major difference of High-Speed (HS) mode in comparison to Standard
mode is that HS-mode systems must include an active pull-up on the SCL line.
The other difference is that a master operating in HS-mode first sends a
compatibility (master) code to the slave; if the acknowledge bit (a bit within
the I²C frame) remains high, i.e., not acknowledged, after the compatibility
code, the master assumes the slave is capable of HS-mode.

7.2 CAN AND CAN OPEN


CAN, or Controller Area Network, or CAN-bus, is an ISO-standard computer
network protocol and bus standard designed for microcontrollers and devices
to communicate with each other without a host computer. Originally designed
for automotive networking and since adopted in many industrial applications,
CAN has gained widespread popularity for embedded control in areas like
industrial automation, automotive, mobile machines, medical, military and
other harsh-environment network applications.

CAN is a “broadcast” type of bus. That means there is no explicit address in
the messages: all the nodes in the network are able to pick up, or receive,
all transmissions. There is no way to send a message to just a specific
node. To be more specific, the messages transmitted from any node on a CAN
bus do not contain addresses of either the transmitting node or of any intended
receiving node. Instead, an identifier that is unique throughout the network is
used to label the content of the message.
Each message carries a numeric value, which controls its priority on the
bus and may also serve as an identification of the contents of the message.
Each of the receiving nodes performs an acceptance test, or local filtering,
on the identifier to determine whether the message, and thus its content, is
relevant to that particular node, so that each node reacts only to the
intended messages. If the message is relevant, it will be processed; otherwise
it is ignored.
If the bus is free, any node may begin to transmit. But what happens when
two or more nodes attempt to transmit a message (to the CAN bus) at the same
time? The identifier field, which is unique throughout the network, helps to
determine the priority of the message. A “non-destructive arbitration
technique” is used to accomplish this, ensuring that messages are sent in
order of priority and that no messages are lost. The lower the numerical value
of the identifier, the higher the priority. That means a message whose
identifier has more dominant bits (i.e., 0 bits) will overwrite other nodes’
less dominant identifiers, so that eventually (after arbitration on the ID)
only the dominant message remains and is received by all nodes.
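The wired-AND arbitration described above can be modelled bit by bit: a dominant 0 overwrites a recessive 1, so the numerically lowest identifier always wins. A small simulation sketch (the helper name and the use of 11-bit base-frame identifiers are our assumptions):

```c
#include <stdint.h>

/* Bit-serial model of CAN arbitration between two 11-bit identifiers,
 * transmitted MSB first. The bus level is the wired-AND of all drivers
 * (dominant 0 wins); a node that sends recessive (1) but reads back
 * dominant (0) loses arbitration and backs off. */
static uint16_t can_arbitrate(uint16_t id_a, uint16_t id_b)
{
    for (int bit = 10; bit >= 0; --bit) {
        unsigned a   = (id_a >> bit) & 1u;
        unsigned b   = (id_b >> bit) & 1u;
        unsigned bus = a & b;           /* wired-AND of the two drivers */
        if (a != bus) return id_b;      /* node A read back dominant: lost */
        if (b != bus) return id_a;      /* node B read back dominant: lost */
    }
    return id_a;  /* identical IDs (not allowed on a real bus) */
}
```

The loop always returns the smaller of the two identifiers, which is exactly the "lower value, higher priority" rule stated above.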

(Figure: an Engine Control Unit, Transmission Control, Antilock Braking,
Airbags, Cruise Control and Window/Mirror Control nodes connected over the
CAN buses.)

Fig. 7.4: CAN communication system



Like any network application, CAN also follows a layered approach to system
implementation. It conforms to the Open Systems Interconnection (OSI) model
that is defined in terms of layers. The ISO 11898 (CAN) architecture defines
the lowest two layers of the seven-layer OSI/ISO model: the data-link layer
and the physical layer.
The rest of the layers (called the higher layers) are left to be implemented
by the system software developers, and are used to adapt and optimize the
protocol on multiple media like twisted pair, single wire, optical, RF or IR.
Higher Level Protocols (HLPs) are used to implement the upper five layers of
the OSI model in CAN. CAN uses a specific message frame format for receiving
and transmitting data. The two types of frame format available are:
(a) Standard CAN protocol or Base frame format
(b) Extended CAN or Extended frame format
Error detection and correction
This mechanism is used for detecting errors in messages appearing on the CAN
bus, so that the transmitter can retransmit the message. The CAN protocol
defines five different ways of detecting errors. Two of these work at the bit
level, and the other three at the message level.
1. Bit Monitoring
2. Bit Stuffing
3. Frame Check
4. Acknowledgment Check
5. Cyclic Redundancy Check
1. Each transmitter on the CAN bus monitors (i.e., reads back) the transmitted
signal level. If the signal level read differs from the one transmitted, a Bit
Error is signaled. Note that no bit error is raised during the arbitration
process.
2. When five consecutive bits of the same level have been transmitted by a
node, it will add a sixth bit of the opposite level to the outgoing bit stream.
The receivers will remove this extra bit. This is done to avoid excessive
DC components on the bus, but it also gives the receivers an extra
opportunity to detect errors: if more than five consecutive bits of the
same level occur on the bus, a Stuff Error is signaled.
3. Some parts of the CAN message have a fixed format, i.e., the standard
defines exactly what levels must occur and when (Those parts are the
CRC Delimiter, ACK Delimiter, End of Frame, and also the Intermission).
If a CAN controller detects an invalid value in one of these fixed fields, a
Frame Error is signaled.

4. All nodes on the bus that correctly receive a message (regardless of
whether they are “interested” in its contents or not) are expected to send a
dominant level in the so-called Acknowledgment Slot in the message. The
transmitter will transmit a recessive level here. If the transmitter can’t
detect a dominant level in the ACK slot, an Acknowledgment Error is
signaled.
5. Each message features a 15-bit Cyclic Redundancy Checksum, and any
node that detects a different CRC in the message than what it has calculated
itself will signal a CRC Error.
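The bit-stuffing rule in point 2 can be sketched as an encoder: after five equal bits on the wire, a bit of the opposite level is inserted, and the stuff bit itself counts toward the next run. This is a model for clarity, not driver code; bits are held one per array element, and the function name is ours:

```c
#include <stddef.h>

/* Insert a stuff bit of the opposite level after every run of five equal
 * bits. 'in' and 'out' hold one bit (0 or 1) per element; returns the
 * output length. 'out' needs roughly len + len/4 elements in the worst
 * case. The stuff bit starts the next run, as on a real CAN bus. */
static size_t can_bit_stuff(const unsigned char *in, size_t len,
                            unsigned char *out)
{
    size_t n = 0, run = 0;
    unsigned char last = 2;          /* impossible bit value to start */
    for (size_t i = 0; i < len; ++i) {
        out[n++] = in[i];
        run = (in[i] == last) ? run + 1 : 1;
        last = in[i];
        if (run == 5) {              /* five equal bits sent: stuff */
            out[n++] = last = (unsigned char)!in[i];
            run = 1;                 /* stuff bit starts a new run */
        }
    }
    return n;
}
```

Five consecutive 1s, for example, come out as 111110: six wire bits for five data bits, which the receiver reverses by stripping the bit that follows any run of five.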

7.3 SPI AND SCI


The concept of serial communication is the process of sending data of one bit at
a time, sequentially, over a communication channel or computer bus. This is in
contrast to parallel communication, where several bits are sent as a whole, on a
link with several parallel channels. In this section we discuss about SPI and SCI
communication protocols indepth.

7.3.1 SPI
The Serial Peripheral Interface Bus or SPI bus is a synchronous serial data link
standard named by Motorola that operates in full duplex mode. Devices
communicate in master/slave mode where the master device initiates the data
frame. Multiple slave devices are allowed with individual slave select (chip
select) lines. Sometimes SPI is called a “four-wire” serial bus, contrasting with
three-, two-, and one-wire serial buses.

(Figure: an SPI master and an SPI slave connected by the SCLK, MOSI, MISO
and SS lines.)

Fig. 7.5: SPI communication model

To begin a communication, the master first configures the clock, using a
frequency less than or equal to the maximum frequency the slave device
supports. Such frequencies are commonly in the range of 1–70 MHz.
The master then pulls the slave select low for the desired chip. If a waiting
period is required (such as for analog-to-digital conversion) then the master
must wait for at least that period of time before starting to issue clock cycles.

During each SPI clock cycle, a full duplex data transmission occurs:
• the master sends a bit on the MOSI line; the slave reads it from that same
line;
• the slave sends a bit on the MISO line; the master reads it from that same
line.
Not all transmissions require all four of these operations to be meaningful
but they do happen. Transmissions normally involve two shift registers of some
given word size, such as eight bits, one in the master and one in the slave; they
are connected in a ring. Data are usually shifted out with the most significant bit
first, while shifting a new least significant bit into the same register. After that
register has been shifted out, the master and slave have exchanged register
values. Then each device takes that value and does something with it, such as
writing it to memory. If there is more data to exchange, the shift registers are
loaded with new data and the process repeats.
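The ring of two shift registers described above can be modelled in a few lines: on each clock, the MSB of each register goes out on the wire while the peer's bit shifts in at the LSB, so after eight clocks the two bytes have swapped. A simulation sketch, not hardware code:

```c
#include <stdint.h>

/* One 8-bit SPI transfer: the master and slave shift registers exchange
 * their contents MSB-first over the MOSI and MISO lines. */
static void spi_exchange(uint8_t *master_reg, uint8_t *slave_reg)
{
    for (int i = 0; i < 8; ++i) {
        uint8_t mosi = (*master_reg >> 7) & 1u;   /* master drives MOSI */
        uint8_t miso = (*slave_reg  >> 7) & 1u;   /* slave drives MISO  */
        *master_reg = (uint8_t)((*master_reg << 1) | miso);
        *slave_reg  = (uint8_t)((*slave_reg  << 1) | mosi);
    }
}
```

After the call, the master holds the slave's original byte and vice versa, matching the "exchanged register values" behaviour described above.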
Transmissions may involve any number of clock cycles. When there are
no more data to be transmitted, the master stops toggling its clock. Normally, it
then deselects the slave. Transmissions often consist of 8-bit words, and a
master can initiate multiple such transmissions if it wishes/needs. However,
other word sizes are also common, such as 16-bit words for touchscreen
controllers or audio codecs, like the TSC2101 from Texas Instruments; or 12-bit
words for many digital-to-analog or analog-to-digital converters.
Every slave on the bus that hasn’t been activated using its slave select line
must disregard the input clock and MOSI signals, and must not drive MISO.
The master must select only one slave at a time.
Some slave devices are designed to ignore any SPI communications in
which the number of clock pulses is greater than specified. Others don’t care,
ignoring extra inputs and continuing to shift the same output bit. It is common for
different devices to use SPI communications with different lengths, as, for
example, when SPI is used to access the scan chain of a digital IC by issuing a
command word of one size (perhaps 32 bits) and then getting a response of a
different size (perhaps 153 bits, one for each pin in that scan chain).
SPI devices sometimes use another signal line to send an interrupt signal to
a host CPU. Examples include pen-down interrupts from touchscreen sensors,
thermal limit alerts from temperature sensors, alarms issued by real time clock
chips, SDIO, and headset jack insertions from the sound codec in a cell phone.
Interrupts are not covered by the SPI standard; their usage is neither forbidden
nor specified by the standard.
Advantages
• Full duplex communication

• Complete protocol flexibility for the bits transferred
  – Not limited to 8-bit words
  – Arbitrary choice of message size, content, and purpose
• Extremely simple hardware interfacing
  – Typically lower power requirements than I²C or SMBus due to less
circuitry (including pull-ups)
  – No arbitration or associated failure modes
• Uses only four pins on IC packages, and wires in board layouts or
connectors, much less than parallel interfaces
• At most one “unique” bus signal per device (chip select); all others are
shared.
Disadvantages
• Requires more pins on IC packages than I²C, even in the “3-Wire” variant
• No in-band addressing; out-of-band chip select signals are required on
shared buses
• No hardware flow control by the slave (but the master can delay the next
clock edge to slow the transfer rate)
• No hardware slave acknowledgment (the master could be “talking” to
nothing and not know it)
• Supports only one master device
• No error-checking protocol is defined
• Generally prone to noise spikes causing faulty communication.

7.3.2 SCI
A Serial Communications Interface (SCI) is a device that enables the serial
(one bit at a time) exchange of data between a microprocessor and peripherals
such as printers, external drives, scanners, or mice. In this respect, it is similar to
a Serial Peripheral Interface (SPI). But in addition, the SCI enables serial
communications with another microprocessor or with an external network. The
term SCI was coined by Motorola in the 1970s. In some applications it is known
as a Universal Asynchronous Receiver/Transmitter (UART).
The SCI contains a parallel-to-serial converter that serves as a data
transmitter, and a serial-to-parallel converter that serves as a data receiver. The
two devices are clocked separately, and use independent enable and interrupt
signals. The SCI operates in a Nonreturn-To-Zero (NRZ) format, and can function
in half-duplex mode (using only the receiver or only the transmitter) or in full
duplex (using the receiver and the transmitter simultaneously). The data speed
is programmable.

Serial interfaces have certain advantages over parallel interfaces. The most
significant advantage is simpler wiring. In addition, serial interface cables can
be longer than parallel interface cables, because there is much less interaction
(crosstalk) among the conductors in the cable.
The term SCI is sometimes used in reference to a serial port. This is a
connector found on most personal computers, and is intended for use with serial
peripheral devices. Normally data is sent as 8- or 9-bit words, least
significant bit first.
A START bit marks the beginning of the frame; the start bit is active low.
The data word follows the start bit. A parity bit may follow the data word
[after the MSB], depending on the protocol used: a mark parity bit [always
set high], a space parity bit [always set low], or an even/odd parity bit.
With even parity, the parity bit is set so that the total number of 1 bits in
the data word plus the parity bit is even; with odd parity, it is set so that
the total is odd. A stop bit normally follows the data field [or the parity
bit, if used]. The stop bit is used to bring [or ensure] the signal rests at a
logic high following the end of the frame, so that when the next start bit
arrives it will bring the bus from high to low. Idle characters are sent as
all ones, with no start or stop bits. The RT clock rate is 16 times the
incoming baud rate, and the RT clock is re-synchronized after every start bit.
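The parity rules can be made precise in code. A sketch (the function names are ours; data is limited to 8-bit words here):

```c
#include <stdint.h>

/* Number of 1 bits in an 8-bit word. */
static unsigned ones(uint8_t v)
{
    unsigned n = 0;
    while (v) { n += v & 1u; v >>= 1; }
    return n;
}

/* Parity bit to append to 'data': for even parity the total 1-count
 * (data plus parity bit) must come out even; for odd parity, odd. */
static uint8_t parity_bit(uint8_t data, int even)
{
    uint8_t p = (uint8_t)(ones(data) & 1u);  /* 1 if data has odd 1s */
    return even ? p : (uint8_t)!p;
}
```

For example, the byte 0x07 has three 1 bits, so even parity appends a 1 (making four 1s total) and odd parity appends a 0.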

7.4 USB
Universal Serial Bus (USB) is a specification to establish communication between
devices and a host controller (usually a personal computer), developed by
Intel together with several industry partners. USB has effectively replaced a
variety of interfaces such as serial and parallel ports.
USB can connect computer peripherals such as mice, keyboards, digital
cameras, printers, personal media players, flash drives, network adapters, and
external hard drives. For many of those devices, USB has become the standard
connection method.
USB was designed for personal computers, but it has become commonplace
on other devices such as smartphones, PDAs and video game consoles,
and as a power cord. As of 2008, about 2 billion USB devices were sold per
year, and approximately 6 billion had been sold to date.
Unlike the older connection standards RS-232 or Parallel port, USB
connectors also supply electric power, so many devices connected by USB do
not need a power source of their own.
Providing an industry standard, USB was originally released in 1995 at
12 Mbps. Today, USB operates at 480 Mbps and is found in over six billion PC,
Consumer Electronics (CE), and mobile devices with a run rate of 2 billion USB
products being shipped into the growing market every year. In addition to high
performance and ubiquity, USB enjoys strong consumer brand recognition and
a reputation for ease-of-use.
Today, Hi-Speed USB 2.0 provides a performance enhancement of up to 40
times over USB 1.0, with a design data rate of up to 480 megabits per second
(Mbps). In addition, USB On-The-Go (OTG), a supplement to the
device, which can act as either a host or peripheral, and can connect to a PC or
other portable devices through the same connector.
Portable computing devices such as handhelds, cell phones and digital
cameras that connect to the PC as a USB peripheral benefit from having the
additional capability to connect to other USB devices directly. For instance,
users can
perform functions such as sending photos from a digital camera to a printer,
PDA, cell phone, or sending music files from an MP3 player to another portable
player, PDA or cell phone.

(Figure: logical pipes connect the host controller to endpoints in the
device.)

Fig. 7.6: USB communication interface

Wireless USB is the new wireless extension to USB that combines the
speed and security of wired technology with the ease-of-use of wireless
technology. Wireless connectivity has enabled a mobile lifestyle filled with
conveniences for mobile computing users. Supporting robust high-speed wireless
connectivity, wireless USB utilizes the common WiMedia* Ultra-Wide Band
(UWB) radio platform developed by the WiMedia Alliance.
USB device communication is based on pipes (logical channels). A pipe is
a connection from the host controller to a logical entity, found on a device, and
named an endpoint. Because pipes correspond one-to-one to endpoints, the
terms are sometimes used interchangeably. A USB device can have up to 32
endpoints: 16 into the host controller and 16 out of the host controller. The USB
standard reserves one endpoint of each type, leaving a theoretical maximum of
30 for normal use. USB devices seldom have this many endpoints.
There are two types of pipes, stream and message pipes, depending on the
type of data transfer. The transfer types are:
• isochronous transfers: at some guaranteed data rate (often, but not
necessarily, as fast as possible) but with possible data loss (e.g., real time
audio or video).
• interrupt transfers: devices that need guaranteed quick responses
(bounded latency) (e.g., pointing devices and keyboards).
• bulk transfers: large sporadic transfers using all remaining available
bandwidth, but with no guarantees on bandwidth or latency (e.g., file
transfers).
• control transfers: typically used for short, simple commands to the device,
and a status response, used, for example, by the bus control pipe number
0.
A stream pipe is a uni-directional pipe connected to a uni-directional endpoint
that transfers data using an isochronous, interrupt, or bulk transfer. A message
pipe is a bi-directional pipe connected to a bi-directional endpoint that is exclusively
used for control data flow. An endpoint is built into the USB device by the
manufacturer and therefore exists permanently. An endpoint of a pipe is
addressable with tuple (device_address, endpoint_number) as specified in a
TOKEN packet that the host sends when it wants to start a data transfer session.
If the direction of the data transfer is from the host to the endpoint, an
OUT packet (a specialization of a TOKEN packet) having the desired device
address and endpoint number is sent by the host. If the direction of the data
transfer is from the device to the host, the host sends an IN packet instead. If
the destination endpoint is a uni-directional endpoint whose manufacturer’s
designated direction does not match the TOKEN packet (e.g., the manufacturer’s
designated direction is IN while the TOKEN packet is an OUT packet), the
TOKEN packet will be ignored. Otherwise, it will be accepted and the data
transaction can start. A bi-directional endpoint, on the other hand, accepts both
IN and OUT packets.
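The addressing and direction rules above can be sketched as a filter over the (device_address, endpoint_number) tuple. The structure and names below are illustrative, not the actual USB packet layout:

```c
#include <stdbool.h>
#include <stdint.h>

enum ep_dir { EP_IN, EP_OUT, EP_BIDIR };

struct endpoint {
    uint8_t     dev_addr;  /* device address on the bus            */
    uint8_t     ep_num;    /* endpoint number (0..15)              */
    enum ep_dir dir;       /* manufacturer-designated direction    */
};

/* Would this endpoint accept a TOKEN packet addressed to (addr, ep)?
 * 'dir' is EP_OUT for an OUT token and EP_IN for an IN token. A token
 * whose direction does not match a uni-directional endpoint is ignored;
 * a bi-directional endpoint accepts both. */
static bool ep_accepts(const struct endpoint *e, uint8_t addr, uint8_t ep,
                       enum ep_dir dir)
{
    if (e->dev_addr != addr || e->ep_num != ep)
        return false;                  /* not the addressed endpoint */
    return e->dir == EP_BIDIR || e->dir == dir;
}
```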

7.5 IEEE 1394 — APPLE FIREWIRE


The IEEE 1394 interface is a serial bus interface standard for high-speed
communications and isochronous real-time data transfer, frequently used by
personal computers, as well as in digital audio, digital video, automotive, and
aeronautics applications. The interface is also known by the brand names of
FireWire (Apple), i.LINK (Sony), and Lynx (Texas Instruments). IEEE 1394
replaced parallel SCSI in many applications, because of lower implementation
costs and a simplified, more adaptable cabling system. The 1394 standard also
defines a backplane interface, though this is not as widely used.
IEEE 1394 is the High-Definition Audio-Video Network Alliance (HANA)
standard connection interface for A/V (Audio/Visual) component communication
and control. FireWire is also available in wireless, fiber optic, and coaxial versions
using the isochronous protocols.
Nearly all digital camcorders have included a four-circuit 1394 interface,
though, except for premium models, such inclusion is becoming less common. It
remains the primary transfer mechanism for high-end professional audio and
video equipment. Since 2003, many computers intended for home or professional
audio/video use have built-in FireWire/i.LINK ports, especially prevalent with
Sony and Apple’s computers. The legacy (alpha) 1394 port is also available on
premium retail motherboards.
The original release of IEEE 1394–1995 specified what is now known as
FireWire 400. It can transfer data between devices at 100, 200, or 400 Mbit/s
half-duplex data rates (the actual transfer rates are 98.304, 196.608, and 393.216
Mbit/s, i.e., 12.288, 24.576 and 49.152 megabytes per second respectively).
These different transfer modes are commonly referred to as S100, S200, and
S400.
Cable length is limited to 4.5 metres (14.8 ft), although up to 16 cables can
be daisy-chained using active repeaters; external or internal hubs are
often present in FireWire equipment. The S400 standard limits any configuration’s
maximum cable length to 72 metres (236 ft). The 6-circuit connector is commonly
found on desktop computers, and can supply the connected device with power.
The 6-circuit powered connector, now referred to as an alpha connector,
adds power output to support external devices. Typically a device can pull about
7 to 8 watts from the port; however, the voltage varies significantly from different
devices. Voltage is specified as unregulated and should nominally be about 25
volts (range 24 to 30). Apple’s implementation on laptops is typically related to
battery power and can be as low as 9 V.
Devices on a FireWire bus can communicate by Direct Memory Access
(DMA), where a device can use hardware to map internal memory to FireWire’s
“Physical Memory Space”. The SBP-2 (Serial Bus Protocol 2) used by FireWire
disk drives uses this capability to minimize interrupts and buffer copies. In SBP-
2, the initiator (controlling device) sends a request by remotely writing a command
into a specified area of the target’s FireWire address space. This command
usually includes buffer addresses in the initiator’s FireWire “Physical Address
Space”, which the target is supposed to use for moving I/O data to and from the
initiator.
On many implementations, particularly those like PCs and Macs using the
popular OHCI, the mapping between the FireWire “Physical Memory Space”
and device physical memory is done in hardware, without operating system
intervention. While this enables high-speed and low-latency communication
between data sources and sinks without unnecessary copying (such as between
a video camera and a software video recording application, or between a disk
drive and the application buffers), this can also be a security or media rights
restriction risk if untrustworthy devices are attached to the bus.
For this reason, high-security installations will typically either purchase newer
machines which map a virtual memory space to the FireWire “Physical Memory
Space” (such as a Power Mac G5, or any Sun workstation), disable relevant
drivers at operating system level, disable the OHCI hardware mapping between
FireWire and device memory, physically disable the entire FireWire interface, or
opt not to use FireWire hardware.
This feature can be used to debug a machine whose operating system has
crashed, and in some systems for remote-console operations. On FreeBSD, the
dcons driver provides both, using gdb as debugger. Under Linux, firescope and
fireproxy exist.

7.6 HDLC — AN INSIGHT


High-Level Data Link Control (HDLC) is a bit-oriented synchronous data link
layer protocol developed by the International Organization for Standardization
(ISO). HDLC frames can be transmitted over synchronous or asynchronous
links. Those links have no mechanism to mark the beginning or end of a frame,
so the beginning and end of each frame has to be identified. This is done by
using a frame delimiter, or flag, which is a unique sequence of bits that is
guaranteed not to be seen inside a frame. This sequence is ‘01111110’, or, in
hexadecimal notation, 0x7E.
Each frame begins and ends with a frame delimiter. A frame delimiter at
the end of a frame may also mark the start of the next frame. A sequence of 7
or more consecutive 1-bits within a frame will cause the frame to be aborted.
When no frames are being transmitted on a simplex or full-duplex
synchronous link, a frame delimiter is continuously transmitted on the link. Using
the standard NRZI encoding from bits to line levels (0 bit = transition, 1 bit = no
transition), this generates one of two continuous waveforms, depending on the
initial state:

0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0

Fig. 7.7: HDLC model

Synchronous framing
On synchronous links, this is done with bit stuffing. Any time that five
consecutive 1-bits appear in the transmitted data, the data is paused and a
0-bit is transmitted. This ensures that no more than five consecutive 1-bits
will be sent. The receiving device knows this is being done, and after seeing
five 1-bits in a row, the following 0-bit is stripped out of the received
data. If the following bit is a 1-bit, the receiver has found a flag.
This also (assuming NRZI with transition for 0 encoding of the output)
provides a minimum of one transition per 6 bit times during transmission of data,
and one transition per 7 bit times during transmission of flag, so the receiver can
stay in sync with the transmitter. Note however, that for this purpose encodings
such as 8b/10b encoding are better suited. HDLC transmits bytes of data with
the least significant bit first (little-endian order).
Asynchronous framing
When using asynchronous serial communication such as standard RS-232 serial
ports, bits are sent in groups of 8, and bit-stuffing is inconvenient. Instead they
use “control-octet transparency”, also called “byte stuffing” or “octet stuffing”.
The frame boundary octet is 01111110 (7E in hexadecimal notation). A “control
escape octet” has the bit sequence ‘01111101’ (7D hexadecimal). If either of
these two octets appear in the transmitted data, an escape octet is sent, followed
by the original data octet with bit 5 inverted. For example, the data sequence
“01111110” (7E hex) would be transmitted as “01111101 01011110” (“7D 5E”
hex). Other reserved octet values (such as XON or XOFF) can be escaped in
the same way if necessary.
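The escaping rule can be written out directly: a 7E or 7D octet is replaced by 7D followed by the original octet with bit 5 inverted (XOR 0x20). A sketch (the function and macro names are ours):

```c
#include <stddef.h>
#include <stdint.h>

#define HDLC_FLAG 0x7E   /* frame boundary octet */
#define HDLC_ESC  0x7D   /* control escape octet */

/* Byte-stuff 'len' payload octets into 'out' (worst case 2 * len octets).
 * Returns the number of octets written; framing flags are not included. */
static size_t hdlc_byte_stuff(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (in[i] == HDLC_FLAG || in[i] == HDLC_ESC) {
            out[n++] = HDLC_ESC;
            out[n++] = in[i] ^ 0x20;   /* invert bit 5 */
        } else {
            out[n++] = in[i];
        }
    }
    return n;
}
```

Running 0x7E through this produces the "7D 5E" sequence given in the example above; 0x7D itself becomes 7D 5D.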

7.7 PARALLEL COMMUNICATION BASICS


Parallel communication is a method of sending several data signals simultaneously
over several parallel channels. It contrasts with serial communication; this
distinction is one way of characterizing a communication link.
The basic difference between a parallel and a serial communication channel
is the number of distinct wires or strands at the physical layer used for
simultaneous transmission from a device. Parallel communication implies more
than one such wire/strand, in addition to a ground connection. An 8-bit parallel
channel transmits eight bits (or a byte) simultaneously. A serial channel would
transmit those bits one at a time. If both operated at the same clock speed, the
parallel channel would be eight times faster. A parallel channel will generally
have additional control signals such as a clock, to indicate that the data is valid,
and possibly other signals for handshaking and directional control of data
transmission. Before the development of high-speed serial technologies, the
choice of parallel links over serial links was driven by these factors:
• Speed: Superficially, the speed of a parallel data link is equal to the number
of bits sent at one time times the bit rate of each individual path; doubling
the number of bits sent at once doubles the data rate. In practice, clock
skew reduces the speed of every link to the slowest of all of the links.
• Cable length: Crosstalk creates interference between the parallel lines,
and the effect worsens with the length of the communication link. This
places an upper limit on the length of a parallel data connection that is
usually shorter than a serial connection.
• Complexity: Parallel data links are easily implemented in hardware, making
them a logical choice. Creating a parallel port in a computer system is
relatively simple, requiring only a latch to copy data onto a data bus. In
contrast, most serial communication must first be converted back into
parallel form by a Universal Asynchronous Receiver/Transmitter (UART)
before it may be directly connected to a data bus.
The decreasing cost of integrated circuits, combined with greater consumer
demand for speed and cable length, has led to parallel communication links
becoming deprecated in favour of serial links; for example, IEEE 1284 printer
ports vs. USB, Parallel ATA vs. Serial ATA, and SCSI vs. FireWire.
On the other hand, there has been a resurgence of parallel data links in RF
communication. Rather than transmitting one bit at a time (as in Morse code
and BPSK), well-known techniques such as PSM, PAM, and multiple-input,
multiple-output communication send a few bits in parallel. (Each such group of
bits is called a “symbol”). Such techniques can be extended to send an entire
byte at once (256-QAM). More recently techniques such as OFDM have been
used in Asymmetric Digital Subscriber Line to transmit over 224 bits in parallel,
and in DVB-T to transmit over 6048 bits in parallel.

7.7.1 PCI Interface


Conceptually, the PCIe bus can be thought of as a high-speed serial replacement
of the older PCI/PCI-X bus, an interconnect bus using shared address/data
lines.
Networks for Embedded Systems 153

A key difference between PCIe bus and the older PCI, is the bus topology.
PCI uses a shared parallel bus architecture, where the PCI host and all devices
share a common set of address/data/control lines. In contrast, PCIe is based on
point-to-point topology, with separate serial links connecting every device to the
root complex (host). Due to its shared bus topology, access to the PCI bus is
arbitrated (in the case of multiple masters), and limited to 1 master at a time, in
a single direction. Furthermore, PCI’s clocking scheme limits the bus clock to
the slowest peripheral on the bus (regardless of the devices involved in the bus
transaction). In contrast, a PCIe bus link supports full-duplex communication
between any two endpoints, with no inherent limitation on concurrent access
across multiple endpoints.
In terms of bus protocol, PCIe communication is encapsulated in packets.
The work of packetizing and depacketizing data and status message traffic is
handled by the transaction layer of the PCIe port (described later). Radical
differences in electrical signaling and bus protocol require the use of a different
mechanical form factor and expansion connectors (and thus, new motherboards
and new adapter boards); PCI slots and PCIe slots are not interchangeable. At
the software level, PCIe preserves backward compatibility with PCI; legacy
PCI system software can detect and configure newer PCIe devices without
explicit support for the PCIe standard, though PCIe’s new features will not be
accessible. (And PCIe cards cannot be inserted into PCI slots).
The PCIe link between 2 devices can consist of anywhere from 1 to 32
lanes. In a multi-lane link, the packet data is striped across lanes, and peak data-
throughput scales with the overall link width. The lane count is automatically
negotiated during device initialization, and can be restricted by either endpoint.
For example, a single-lane PCIe (x1) card can be inserted into a multilane slot
(x4, x8, etc.), and the initialization cycle will auto negotiate the highest mutually
supported lane count.
The link can also dynamically down-configure itself to use fewer lanes, thus
providing some measure of failure tolerance in the presence of bad/unreliable
lanes. The PCIe standard defines slots and connectors for multiple widths: x1,
x4, x8, x16, x32. This allows PCIe bus to serve both cost-sensitive applications
where high throughput is not needed, as well as performance-critical applications
such as 3D graphics, network (10 Gigabit Ethernet, multiport Gigabit Ethernet),
and enterprise storage (SAS, Fibre Channel).
As a point of reference, a PCI-X (133 MHz 64 bit) device and PCIe device
at 4-lanes (x4), Gen1 speed have roughly the same peak transfer rate in a single
direction: 1064 MB/sec. The PCIe bus has the potential to perform better than
the PCI-X bus in cases where multiple devices are transferring data
simultaneously, or if communication with the PCIe peripheral is bidirectional.
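The arithmetic behind that comparison can be checked directly. The sketch below assumes one transfer per clock for PCI-X and the usual Gen1 figure of 250 MB/s per lane per direction (2.5 Gbit/s less 8b/10b coding overhead); MB here means 10^6 bytes.

```c
/* Peak transfer rate of a parallel bus: one bus-width transfer per clock. */
unsigned long pcix_peak_mbs(unsigned long clock_mhz, unsigned bus_bits)
{
    return clock_mhz * (bus_bits / 8);   /* MHz * bytes per transfer */
}

/* PCIe Gen1: 2.5 Gbit/s per lane, 8b/10b coding leaves 250 MB/s per lane. */
unsigned long pcie_gen1_peak_mbs(unsigned lanes)
{
    return 250UL * lanes;
}
```

With these assumptions, PCI-X at 133 MHz and 64 bits gives 1064 MB/s, and a x4 Gen1 PCIe link gives 1000 MB/s per direction, which is the "roughly the same" figure quoted above.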
7.7.2 PCI-X Interface


PCI-X, short for PCI-eXtended, is a computer bus and expansion card standard
that enhances the 32-bit PCI Local Bus for higher bandwidth demanded by
servers. It is a double-wide version of PCI, running at up to four times the clock
speed, but is otherwise similar in electrical implementation and uses the same
protocol. It has itself been replaced in modern designs by the similar sounding
PCI Express, which features a very different logical design, most notably being
a “narrow but fast” serial connection instead of a “wide but slow” parallel
connection.
PCI-X 2.0, ratified by the PCI SIG, adds 266 MHz and 533 MHz variants,
yielding roughly 2.15 GB/s and 4.3 GB/s throughput, respectively. PCI-X 2.0
makes additional protocol revisions that are designed to help system reliability
and add error-correcting codes to the bus to avoid resends. To deal with one of
the most common complaints of the PCI-X form factor, the 184-pin connector,
16-bit ports were developed to allow PCI-X to be used in devices with tight
space constraints.

Fig. 7.8: PCI-X interface (edge connector comparison: PCI 2.0 32-bit, PCI Express ×1, PCI Express ×16)


Despite the various theoretical advantages of PCI-X 2.0 and its backward
compatibility with PCI-X and PCI devices, it has not been implemented on a
large scale (as of 2008). This lack of implementation is primarily because hardware
vendors have chosen to integrate PCI Express instead.

7.8 DEVICE DRIVERS — AN INTRODUCTION


In computing, a device driver or software driver is a computer program allowing
higher-level computer programs to interact with a hardware device.
A driver typically communicates with the device through the computer bus
or communications subsystem to which the hardware connects. When a calling
program invokes a routine in the driver, the driver issues commands to the device.
Once the device sends data back to the driver, the driver may invoke routines in
the original calling program. Drivers are hardware-dependent and operating
system specific. They usually provide the interrupt handling required for any
necessary asynchronous time-dependent hardware interface.

7.8.1 Serial Port Device Driver


In computing, a serial port is a serial communication physical interface through
which information transfers in or out one bit at a time (contrast parallel port).
Throughout most of the history of personal computers, data transfer through
serial ports connected the computer to devices such as terminals and various
peripherals.
While such interfaces as Ethernet, FireWire, and USB all send data as a
serial stream, the term “serial port” usually identifies hardware more or less
compliant to the RS-232 standard, intended to interface with a modem or with a
similar communication device.
Device drivers can be abstracted into logical and physical layers. Logical
layers process data for a class of devices such as Ethernet ports or disk drives.
Physical layers communicate with specific device instances. For example, a
serial port needs to handle standard communication protocols such as XON/
XOFF that are common for all serial port hardware. This would be managed by
a serial port logical layer. However, the physical layer needs to communicate
with a particular serial port chip: 16550 UART hardware differs from PL011.
The physical layer addresses these chip specific variations. Conventionally, OS
requests go to the logical layer first. In turn, the logical layer calls upon the
physical layer to implement OS requests in terms understandable by the hardware.
Inversely, when a hardware device needs to respond to the OS, it uses the
physical layer to speak to the logical layer.
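The logical/physical split can be sketched with a table of function pointers; the names below (`phys_ops`, `serial_write`, the mock chip) are illustrative only and do not come from any real driver framework.

```c
#include <stddef.h>

struct phys_ops {                 /* physical layer: one per UART chip */
    int (*put_char)(void *hw, char c);
};

struct serial_port {              /* logical layer: common to all chips */
    const struct phys_ops *ops;
    void *hw;                     /* chip-specific state */
};

/* Logical-layer write: handles the common loop and error policy,
 * delegating each character to the chip-specific physical layer. */
int serial_write(struct serial_port *p, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p->ops->put_char(p->hw, s[i]) != 0)
            return -1;
    return 0;
}

/* A trivial mock "chip" that records bytes into a buffer, standing in
 * for a real 16550 or PL011 back end. */
static char mock_buf[32];
static size_t mock_len;
static int mock_put_char(void *hw, char c)
{
    (void)hw;
    mock_buf[mock_len++] = c;
    return 0;
}
static const struct phys_ops mock_ops = { mock_put_char };
```

Swapping the `phys_ops` table is all that is needed to support a different UART chip, while `serial_write` and the rest of the logical layer stay unchanged.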

7.8.2 Parallel Port Device Driver


A parallel port is a type of interface found on computers (personal and otherwise)
for connecting various peripherals. In computing, a parallel port is a parallel
communication physical interface. It is also known as a printer port or Centronics
port. The IEEE 1284 standard defines the bidirectional version of the port, which
allows the transmission and reception of data bits at the same time.
In early parallel ports, the data lines were unidirectional (data out only) so
it was not easily possible to feed data into the computer. However, a workaround
was possible by using 4 of the 5 status lines. A circuit could be constructed
to split each 8-bit byte into two 4-bit nibbles which were fed in sequentially
through the status lines. Each pair of nibbles was then recombined into an 8-bit
byte. This same method (with the splitting and recombining done in software)
was also used to transfer data between PCs using a Laplink cable.
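The software half of that method, splitting a byte into two 4-bit nibbles and recombining them, is a few lines of C (illustrative helpers, not from any actual driver):

```c
/* Split a byte into its upper and lower 4-bit nibbles. */
void split_byte(unsigned char b, unsigned char *hi, unsigned char *lo)
{
    *hi = (unsigned char)((b >> 4) & 0x0F);   /* upper nibble */
    *lo = (unsigned char)(b & 0x0F);          /* lower nibble */
}

/* Recombine two nibbles into the original byte. */
unsigned char join_nibbles(unsigned char hi, unsigned char lo)
{
    return (unsigned char)(((hi & 0x0F) << 4) | (lo & 0x0F));
}
```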
Device drivers, particularly on modern Windows platforms, can run in kernel-
mode (Ring 0) or in user-mode (Ring 3). The primary benefit of running a driver
in user mode is improved stability, since a poorly written user mode device
driver cannot crash the system by overwriting kernel memory. On the other
hand, user/kernel mode transitions usually impose a considerable performance
overhead, thereby prohibiting user mode drivers for low latency and high
throughput requirements.
Kernel space can be accessed by user module only through the use of
system calls. End user programs like the UNIX shell or other GUI based
applications are part of the user space. These applications interact with hardware
through kernel supported functions.
Virtual device drivers represent a particular variant of device drivers. They
are used to emulate a hardware device, particularly in virtualization environments,
for example when a DOS program is run on a Microsoft Windows computer or
when a guest operating system is run on, for example, a Xen host.
Instead of enabling the guest operating system to dialog with hardware,
virtual device drivers take the opposite role and emulate a piece of hardware, so
that the guest operating system and its drivers running inside a virtual machine
can have the illusion of accessing real hardware. Attempts by the guest operating
system to access the hardware are routed to the virtual device driver in the host
operating system as e.g., function calls. The virtual device driver can also send
simulated processor-level events like interrupts into the virtual machine.
Virtual devices may also operate in a nonvirtualized environment. For
example a virtual network adapter is used with a virtual private network, while
a virtual disk device is used with iSCSI.

POINTS TO REMEMBER
1. I²C uses only two bidirectional open-drain lines, Serial Data Line (SDA)
and Serial Clock (SCL), pulled up with resistors. Typical voltages used
are +5 V or +3.3 V although systems with other voltages are permitted.
2. The SPI bus can operate with a single master device and with one or
more slave devices.
3. A Synchronous Serial Port (SSP) is a controller that supports the Serial
Peripheral Interface (SPI), 4-wire Synchronous Serial Interface (SSI),
and Microwire serial buses. An SSP uses a master-slave paradigm to
communicate across its connected bus.
4. A peripheral is a device attached to a host computer, but not part of it,
and is more or less dependent on the host. It expands the host’s
capabilities, but does not form part of the core computer architecture.
5. In Linux environments, programmers can build device drivers either as
parts of the kernel or separately as loadable modules.
6. IEEE 1284 is a standard that defines bidirectional parallel communications
between computers and other devices.
7. A USB bus is reset using a prolonged (10 to 20 milliseconds) SE0 signal.
8. Wireless USB is used in game controllers, printers, scanners, digital
cameras, MP3 players, hard disks and flash drives.

7.9 QUIZ
1. Protocols may include which of the following
(a) Signaling (b) Authentication
(c) Error checking (d) All of these
2. A device driver simplifies programming by acting as translator between a
................. or operating systems that use it.
(a) hardware device and the applications
(b) Software and applications
(c) None of these
3. Virtual serial port emulation can be useful in case there is a lack of
available..................... ports or they do not meet the current requirements.
(a) Physical serial (b) Data
(c) Network (d) None of these
4. Some synchronous devices provide a ............ to synchronize data
transmission, especially at higher data rates.
(a) Data signal (b) Timer
(c) Clock signal (d) No signal

Answers for Quiz


1. (d) 2. (a)
3. (a) 4. (c)
8 An Overview and
Architectural Analysis of
8051 Microcontroller

Learning Outcomes
R Basic Introduction about Microcontrollers
R Comparison of 8051
R Architectural Details with Block Diagram of 8051.
R Microcontroller Resources
• Bus width
• Program and data memory
• Parallel ports
• EEPROM and flash memory
• Pulse Width Modulated (PWM) output
• On-chip Digital to Analog Converter (DAC) using PWM or timer
• On-chip A/D convertors (ADC)
• Reset circuit
• Watchdog Timer (WDT) device
• Bit wise manipulation capability
• Power down mode
• Timers
• Real time clock
• Serial asynchronous and synchronous communication interface
• Asynchronous serial communication
• Synchronous serial communication
• SFR registers
• Port registers SFR
• PSW Program Status Word
• Stack Pointer
• Data Pointer
• Accumulator
• B register
• Program Counter
• SFR Registers for the internal timer
• Power control register
• Serial port registers
• Interrupt registers
R Internal and External Memory
R Memory Organizations
R Timer or Counter
R Input and Output Ports
R Interrupts — An Insight
R Assembly Language Programming
R Recap
R Quiz

8.1 INTRODUCTION
Microcontrollers are designed for embedded applications, unlike microprocessors,
which are used for general purpose computing. The architecture of a
microcontroller is also quite different from that of a microprocessor.
Microcontrollers, shortly called µCs or MCUs, have architectures that vary with
the application they are designed for. So, different microcontrollers have different
architectures. Typically the architecture is a combination of a processor core,
memory, programmable I/Os, programmable memory (generally flash memory)
and a small amount of RAM. The 8051 follows the Harvard architecture, a
standard computer architecture in which program instructions are stored in
different memory locations from data. Each type of memory is accessed via a
separate bus, allowing instructions and data to be fetched in parallel and thus
improving the speed of execution.
As microcontrollers are made for a specific purpose, they can be
manufactured at low cost. For the same reason, they cannot be reused for any
purpose other than the one they are designed for.
The applications of these microcontrollers vary from washing machines
to space shuttle engine controllers. It depends on the imagination of the architect
to develop a microcontroller architecture. A recent market release of a
microcontroller in a shoe, which reports the pressure on the foot and the
distance travelled by the person, shows us that it is the creativity of man which
makes microcontrollers create wonders.
Table 8.1: Simple comparison: Pentium vs. 8051

Feature           | 8051                                      | Pentium                                   | Comment
Clock speed       | 12 MHz typical, but 60 MHz ICs available  | 1,000 MHz (1 GHz)                         | 8051 internally divides the clock by 12, so for a 12 MHz clock the effective clock rate is just 1 MHz.
Address bus       | 16 bits                                   | 32 bits                                   | 8051 can address 2^16, or 64 Kbytes of memory; Pentium can address 2^32, or 4 Gigabytes of memory.
Data bus          | 8 bits                                    | 64 bits                                   | Pentium's wide bus allows very fast data transfers.
ALU width         | 8 bits                                    | 32 bits                                   | But the Pentium has multiple 32-bit ALUs, along with floating-point units.
Applications      | Domestic appliances, peripherals, automotive etc. | Personal computers and other high performance areas. |
Power consumption | Small fraction of a watt                  | Tens of watts                             | Pentium runs hot, as power consumption increases with frequency.
Cost of chip      | About 2 Euros                             | About 200 Euros                           | In Euros, in volume, depending on spec.

The block diagram of the 8051 is shown below in Fig. 8.1. It contains a
Central Processing Unit (CPU), which acts as a control unit, which determines
the control flow. Based on programming it also manages to share the resources
effectively. CPU works on an oscillator, which determines the speed of the
microcontroller. Oscillators are generally formed of a crystal oscillator. The
original 8051 core runs at 12 clock cycles per machine cycle, with most instructions
executing in one or two machine cycles. With a 12 MHz clock frequency, the
8051 could thus execute 1 million one cycle instructions per second or 500,000
two cycle instructions per second. Enhanced 8051 cores are now commonly
used which run at six, four, two, or even one clock per machine cycle, and have
clock frequencies of up to 100 MHz, and are thus capable of an even greater
number of instructions per second.
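The instruction-rate arithmetic above reduces to dividing the oscillator frequency by the clocks per machine cycle and the cycles per instruction; a small illustrative helper:

```c
/* Instructions per second = f_osc / (clocks per machine cycle)
 *                                 / (machine cycles per instruction). */
unsigned long insns_per_sec(unsigned long f_osc_hz,
                            unsigned clocks_per_cycle,
                            unsigned cycles_per_insn)
{
    return f_osc_hz / clocks_per_cycle / cycles_per_insn;
}
```

For the classic core at 12 MHz this gives 1,000,000 one-cycle instructions per second, or 500,000 two-cycle instructions; an enhanced one-clock-per-cycle core at 100 MHz would give 100 million.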

Fig. 8.1: Block diagram of 8051

The 8051 architecture provides many functions (CPU, RAM, ROM, I/O, interrupt
logic, timer, etc.) in a single package. The complete architecture in detail is
presented in Fig. 8.2.
• 8-bit ALU, Accumulator and 8-bit Registers; hence it is an 8-bit
microcontroller
• 8-bit data bus—It can access 8 bits of data in one operation
• 16-bit address bus—It can access 2^16 memory locations—64 KB (65536
locations) each of RAM and ROM
• On-chip RAM—128 bytes (data memory)
• On-chip ROM—4 kBytes (program memory)
• Four 8-bit bidirectional input/output ports
• UART (serial port)
• Two 16-bit Counter/timers
• Two-level interrupt priority
• Power saving mode (on some derivatives)
The Intel 8051 microarchitecture diagram shows ports 0–3 (pins P0.0–P0.7,
P1.0–P1.7, P2.0–P2.7, P3.0–P3.7), each with its own latch and drivers; the
internal RAM with its address register; the EPROM/ROM with program address
register and buffer; the ACC, B register, TMP1 and TMP2 registers, ALU and
PSW; the stack pointer, data pointer (DPTR), program counter and instruction
register; the interrupt, serial port and timer blocks; the timing and control unit
with the ALE/PROG, PSEN, EA/VPP and RST pins; and the oscillator connected
at XTAL1 and XTAL2.

Fig. 8.2: Architecture of 8051

Here is a brief description of 8051 architecture.


R Program Counter (PC): A 16-bit register to hold the program memory
address of the instruction being currently fetched. Increments continuously
to point to the next instruction, unless there is a change in the program
flow path.
R Data Pointer Register (DPTR): A 16-bit register to hold the external
memory address of the instruction being currently fetched or to be fetched
in an indirect addressing mode (A mode in which a register points to the
memory address from where it can be accessed).
R Accumulator (A): An 8-bit register to save an operand for an ALU or
data transfer operation and whose most important function is to accumulate
the result after an ALU operation.
R B Register (B): An 8-bit register to save second operand for the ALU
and also accumulate the result after ALU operation for multiplication or
division.
R Arithmetic and Logic Unit (ALU): A unit to perform an arithmetic or
logic operation at an instance as per the instruction to be executed and
give results.
R Processor Status Word (PSW): A register to save status bits, for example
flags like the carry bit, and the bits defining the register bank currently in use.
R Port (P0): An 8-bit port for the I/Os in a single chip mode and for data
cum lower order address in the expanded mode.
R Port (P1): An 8-bit port for the I/Os in a single chip mode and a few
device-operation related bits in certain 8051 family variants in the expanded
mode.
R Port (P2): An 8-bit port for the I/Os in a single chip mode and for higher
order address in the expanded mode.
R Port (P3): An 8-bit port for the I/Os in a single chip mode and the serial
interface (SI) bits, timer T0 and T1 inputs, interrupts INT0 and INT1
inputs, RD and WR for the memory read-write in the expanded mode.
R Serial Interface (SI) Device: Serial device for the full duplex (input as
well as output at an instant) UART serial I/O operations through the set
of two pins of P3, RxD and TxD, and for half duplex (input or output at an
instant) synchronous communication of the bits through the same set of
pins, DATA and CLOCK.
R T0 and T1: Timing devices in the 8051 family using four registers TH1,
TH0, TL1, and TL0. A third timer device, T2, is also present in the 8052 family.
R Special Function Registers (SFRs): All registers, the SP, PSW, A, B,
IE, IP, SCON, TCON, SMOD, SBUF, PCON, TL0, TH0, TL1 and TH1, are
called SFRs. These occupy directly addressable space only.
R ROM: Masked ROM, EPROM or flash EEPROM of 4 KB in the 8051
classic family (or 8 KB or 16 KB in 8051 family variants).
R Internal RAM: A read-write memory of 128 bytes, directly as well as
indirectly addressable in the address space 0x00 to 0x7F (the address
space 0x80 to 0xFF, used in the 8052 family's 256-byte RAM, is only
indirectly addressable).
R Register banks: Four register banks each of 8 registers and these are
also part of the internal RAM.
R XTAL1 and XTAL2: Pins for the crystal of the oscillator, which typically
generates 12 MHz.
R External Enable (EA): To enable use of external memory addresses,
i.e., external ROM in place of the internal one.
R Reset (RST): Reset circuit input, which also drives reset cycles for a few
clocks to the external peripheral devices to let the processor reset and
synchronize with those devices.
R INT0 and INT1: Two active external interrupt inputs.
R VCC and GND: For 5V and ground connections respectively.
R Program Store Enable (PSEN): Active 0 when reading the external
memory byte.
R Read (RD): Active 0 when reading the byte from the external data memory.
R Write (WR): Active 0 when writing the byte to external data memory.
R Stack Pointer (SP) is an 8-bit wide register, which is incremented before
data is stored on to the stack using PUSH or CALL instructions. It’s a
stack defined anywhere on the 128 byte RAM.
R PORT 0 to 3 Latches and Drivers: Each I/O port is allotted a latch
and a driver. Latches are allocated address in SFR. These provide a
means to communicate with the external world.
R Serial Data Buffer: This is a means to convert the serial data that is
received from the external world into parallel form and thus enhance the
computing speed inside the architecture. Also, the parallel data from the
8051 is converted to serial when data is being sent out. It makes use of
Transmit and Receive buffers respectively for this.
R Timer Registers: There are two timers Timer0 and Timer1, which
internally are divided into 8-bit registers TL0, TH0 and TL1 and TH1
respectively.
R Control Registers: There are several control registers IP, IE, TMOD,
TCON, SCON and PCON. These contain the status and the control
information for interrupts, timers/counters and serial port. All these are
allocated separate addresses in SFR.
R Timing and Control Unit: It is useful for driving timing and control
signals for internal circuit and external system bus.
R Instruction Register decodes the opcode and gives information to timing
and control unit, based on which required operation is performed.
R ALU (Arithmetic and Logic Unit): ALU performs 8-bit arithmetic
and logical operations over the operands held by temporary registers.
User can’t use temporary registers.
R Interrupt, serial port, timer and control units perform specific functions
under the control of timing and control unit.

8.2 MICROCONTROLLER RESOURCES

(i) Bus Width


Internal:
R Internal bus width of the 8048 is 8 bits.
R Registers and Internal RAMs need 8-bit addresses.
External:
R External address bus width is 16 in the 8048/8051/68HC11.
R External address space is 2^16 = 64 KB.
R During the time ALE is active (= 1) there is an address on the bus.
R During the time ALE is inactive (= 0) there is data on the bus.
R The external data bus can optionally be 8 or 16 bits in the 8096 series family.

(ii) Program and Data Memory


In Harvard memory architecture, separate address space is used for program
memory and a data memory. In Princeton memory architecture same address
space is used for data memory and program memory.
Program Memory
R Program functions and routines in an MCU are mostly in a non-volatile
Read Only Memory (ROM).
R Program memory stores boot-up programs, interrupt service routines,
standard macro functions for program building blocks, and UART
communication at freely selected baud rates.
R The contents of the program memory in a system may be changed
according to the stage: on-chip or external EPROM or flash at laboratory
test stages, or mask ROM at the production stage.
Data Memory
R Data variables and stacks in an MCU are mostly in a volatile read and
write memory (RAM) or registers.
R Data memory is used for storing directly or indirectly (or both) addressable
bytes for the variables, pointers, look-up tables and limited size arrays.
R It also stores stack of all program functions, program counters and saved
variables.
(iii) Parallel Ports


R The MCU has 32 parallel I/O port pins (four 8-bit ports).
R A microcontroller in a single chip mode provides all ports at the chip itself,
and in expanded mode the number of available port pins is reduced.
R Port P0 is used as AD0–AD7 lower byte addresses as well as the 8-bit data bus
when the off-chip access mode MCU access the external memory chips
or devices (or ports).
R Port P2 is used as A8–A15 higher byte addresses when the MCU is in
off-chip access mode.
R The port P3 is used for control and interrupt signal in off-chip mode.
R The MCU's memory mapped I/O devices share the same address space as
the internal memory, and external I/O devices share the same address
space as the external data memory.

(iv) EEPROM and Flash Memory


R EEPROM—Electrically erasable and electrically programmable ROMs.
R An erase or write cycle takes about 10 ms in EEPROM and flash, and
also in the 68HC11.
R An EEPROM erases one byte at an instant, whereas in flash a sector
consisting of many bytes is erased at a time.
R Flash that also has the capacity to erase one byte, or a few bytes in a row,
at an instant is called flash EEPROM.
R A variant of flash is boot mat flash, in which a sector (block) is OTP
(One Time Programmable). Boot-up programs are the ones that run on
start up.

(v) Pulse Width Modulated (PWM) Output


R An MCU's Pulse Width Modulated (PWM) output is used to obtain the
Digital to Analog Conversion (DAC) operation.
R A PWM output is one in which the width percentage is proportional to the
modulation parameter p, which relates linearly to the analog voltage.
R The pulse width percentage is (100)(256 – p)/256, where p is the modulation
parameter in an 8-bit pulse control register.
R Analog output is obtained when the PWM output is integrated by an
integrator.
R Analog output in two cases as a function of pulse width percentage and
pulse control register parameter:
• Case 1: it is assumed that the integrator-1 design is such that the output
v is maximum when p = 255 and is 0 when p = 0.
• Case 2: it is assumed that the integrator-2 design is such that the
output v is maximum when p = 255 and is –v when p = 0.
R The PWM control register can be an Out-Compare Register (OCR) in an
MCU. The OCR can be loaded with the value p, which it compares with the
count value c in the free running counter.
R The count register can be a PACT (Pulse Count Accumulator) in an MCU.
The PACT is periodically loaded with the modulation parameter such that
the interrupts at those instants generate 1s and 0s at each PACT overflow
interrupt.
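The pulse-width formula above can be checked with a short helper; tenths of a percent are used to stay in the integer arithmetic an 8-bit MCU would prefer (illustrative code, not a device driver):

```c
/* Pulse width in tenths of a percent for an 8-bit pulse control
 * register value p (0..255), per the formula (100)(256 - p)/256. */
unsigned pwm_width_tenths(unsigned p)
{
    return (unsigned)((1000UL * (256UL - p)) / 256UL);
}
```

So p = 0 gives a 100.0% wide pulse, p = 128 gives 50.0%, and p = 192 gives 25.0%; after integration these widths map linearly to the analog output.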

(vi) On-chip D/A (DAC) Using PWM or Timer


R DAC stands for digital to analog conversion such that the analog current
or voltage output relates linearly to the digital values (inputs).
R The PWM device with PWM control register, an MCU, can be used as
the DAC by installing an appropriate integrator that interfaces with the
MCU.
R An MCU not having a PWM device but having a timer or a counter/timer
with an out-compare register or PACT can be programmed to obtain a
PWM output, which in turn gives the analog output through the integrator.

(vii) On-chip A/D Convertors (ADC)


R The ADC operation is needed in many control related operations.
R An MCU with on-chip ADC is fed with a certain signal (input analog
voltage), v through an amplifier, which first samples the signal for a certain
period (1 µs) then holds it for the required time.
R Sampling the signal is averaging for a certain period.
R The sample and hold circuit amplifier gives a value close to the true value
because random noise in the period cancels during averaging.
R After conversion to the digital bits that map to the signal ratio
[v/(Vref+ – Vref–)], the converted bits can be latched at a port in the MCU
or used for control applications. Vref+ and Vref– are the reference analog
inputs to the ADC, set such that when v = Vref+ all output bits equal 1 and
when v = Vref– all output bits equal 0.
R Mapping means that output bits b0, b1,…, bn-1 are such that the decimal
values of these are proportional to signal ratio.
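The mapping can be sketched as integer arithmetic in millivolts, assuming the output code is proportional to the signal's position between the two references; `adc_code` is an illustrative model of the conversion, not a driver for any particular ADC:

```c
/* n-bit code proportional to where v lies between Vref- and Vref+.
 * All voltages in millivolts to keep the arithmetic in integers. */
unsigned adc_code(long v_mv, long vref_neg_mv, long vref_pos_mv,
                  unsigned nbits)
{
    long span = vref_pos_mv - vref_neg_mv;   /* reference window */
    long full = (1L << nbits) - 1;           /* all output bits 1 */
    return (unsigned)(((v_mv - vref_neg_mv) * full) / span);
}
```

With an 8-bit converter and references of 0 V and 5 V, an input of 5 V gives code 255 (all 1s), 0 V gives 0, and 2.5 V gives 127.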

(viii) Reset Circuit


R A reset circuit or device resets the microcontrollers so that a processor
starts smoothly.
R Reset circuit forces a processor to start the processing of instructions
from a starting address smoothly. Smooth start means without glitches—
sharp variations between low voltage and needed voltage.
R The reset circuit activates for a few clock cycles and then deactivates to
let the MCU processor start executing instructions.
R It becomes input on power up and becomes output pin for a few clock
cycles to enforce reset state in other interfaced external devices within
system.

(ix) Watchdog Timer (WDT) Device


R The watchdog timer is a timing device that resets the system after a pre-
defined timeout.
R A watchdog timer device is a timer provided within the microcontroller,
which resets it such that it starts execution of instructions from the beginning.
R One of the main application of WDT is it rescues the system if a fault
develops in between. Example, when a program hangs due to an interfaced
circuit fault or loop not exiting due to an exception condition. On restart, it
is expected to function normally.
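The rescue behaviour can be modelled with a small Python sketch (a hypothetical simulation; a real WDT is a hardware counter):

```python
class WatchdogTimer:
    """Toy watchdog model: kick() restarts the count, tick() advances
    time; reaching the timeout stands in for a forced CPU reset."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.count = 0
        self.restarted = False

    def kick(self):                 # application proves it is alive
        self.count = 0

    def tick(self):                 # one unit of elapsed time
        self.count += 1
        if self.count >= self.timeout:
            self.restarted = True   # stand-in for a system reset

wdt = WatchdogTimer(timeout=3)
for _ in range(10):                 # program "hangs": never kicks the WDT
    wdt.tick()
print(wdt.restarted)                # True: the WDT rescued the system
```

A healthy program would call `kick()` inside its main loop, so the count never reaches the timeout.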

(x) Bit-Wise Manipulation Capability


R The 8051 family MCU has a powerful bit manipulation capability, with
the carry bit playing the role of an accumulator.
R A port bit can be set, reset, complemented, transferred or logically
operated upon.

(xi) Power-Down Mode


R An MCU-based system may be designed such that it need not be switched
off at any time; in this case the processor works in power-down mode.
R The MCU initiates certain actions on execution of the WAIT and STOP
instructions.
R During the stop state, the MCU disconnects external devices and the
internal clock circuit is deactivated; the power-backed RAM then activates
and protects the protected area.

R The 8051 MCU provides a power-down mode bit for serial communication,
with which the bit transfer rate can be slowed to half so that power is
saved during communication from the MCU.

(xii) Timers
R A timer counts equal-interval clock pulses from an oscillator circuit.
The pulses are used after a suitable fixed or programmable pre-
scaling (division) factor.
R The 8051 family MCU has two timers, T0 and T1.
R 8052 variants in the family have an additional timer T2, and the 8096
family has two programmable timers, T1 and T2.
R T1 facilitates high-speed inputs. It captures the time instants into a FIFO
and records up to eight events in quick succession.
R In one timer/counter mode the timer runs non-stop, with reset and load
disabled; it also works in another mode, called the real-time clock.
(xiii) Real Time Clock
R The real-time clock is an important resource in a microcontroller: using
it, an OS sets the system clock and schedules tasks and time-delay
functions.
R It is an on-chip device made from a timer working in non-reset, non-
loadable and non-stop mode.
R A real-time clock is used because it never stops and cannot be reset.

(xiv) Serial Asynchronous and Synchronous Communication Interface


R In serial communication, a stream of 1s and 0s is sent or received at
successive intervals on a single line called the serial line.

(xv) Asynchronous Serial Communication


R In this communication, a byte or frame of bits on the serial line need not
maintain the same phase difference between them.
R The transmitter does not communicate a clock bit, explicitly or implicitly,
to the receiver for synchronizing the receiver clock in the same phase.
R Each bit lasts for a period T, which is the reciprocal of a rate called the
baud rate.
R In this communication there are two formats: one is 10T and the other is
11T.
R In the 10T format, a start bit b = 0 is followed by 8 data bits and then the
stop bit b = 1.

R In the 11T format, after the data bits and before the stop bit there is a bit
for error checking, or a bit to indicate whether the preceding 8 bits are an
address or data.
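The 10T format can be sketched as a short Python function (illustrative only; LSB-first data order is assumed, as on the 8051 serial line):

```python
def frame_10t(byte):
    """Build a 10T asynchronous frame: start bit 0, then 8 data bits
    (LSB first), then stop bit 1 - ten bit periods T in total."""
    data = [(byte >> i) & 1 for i in range(8)]   # LSB-first data bits
    return [0] + data + [1]                      # start + data + stop

print(frame_10t(0x41))   # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

An 11T frame would simply insert the extra error-check/address bit between the data bits and the stop bit.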

(xvi) Synchronous Serial Communication


R Serial synchronous communication means each byte (or frame of bits)
on the serial line needs to maintain the same phase difference between them.
R The transmitter does communicate a clock bit, explicitly or implicitly, to
the receiver for synchronization of the receiver clock in the same phase.
R This mode is used for interprocessor communication between the systems.
Each bit is for the T period which is the reciprocal of a rate called bit rate.
This usually is expressed in kbps units.

(xvii) SFR Registers


R The SFR registers are located within the Internal Memory in the address
range 80h to FFh. Not all locations within this range are defined. Each
SFR has a very specific function. Each SFR has an address (within the
range 80h to FFh) and a name which reflects the purpose of the SFR.
Although the SFR address space spans 128 bytes, only 21 SFR
registers are defined in the standard 8051. Undefined SFR addresses
should not be accessed as this might lead to unpredictable results.
Note some of the SFR registers are bit addressable. SFRs are accessed
just like normal Internal RAM locations.

(xviii) Port Registers SFR


R The standard 8051 has four 8-bit I/O ports: P0, P1, P2 and P3.
R For example Port 0 is a physical 8-bit I/O port on the 8051. Read (input)
and write (output) access to this port is done in software by accessing the
SFR P0 register which is located at address 80h. SFR P0 is also bit
addressable. Each bit corresponds to a physical I/O pin on the 8051.
Example access to port 0:
• SETB P0.7 ; sets the MSB bit of Port 0
• CLR P0.7 ; clears the MSB bit of Port 0
R The operand P0.7 uses the dot operator and refers to bit 7 of SFR P0.
The same bit could be addressed by accessing bit location 87h. Thus the
following two instructions have the same meaning:
• CLR P0.7
• CLR 87h
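The Px.y-to-bit-address arithmetic behind this equivalence can be checked with a small Python sketch (the port byte addresses are the standard 8051 ones):

```python
# SFR byte addresses of the port registers in a standard 8051
PORT_ADDR = {"P0": 0x80, "P1": 0x90, "P2": 0xA0, "P3": 0xB0}

def port_bit_address(port, bit):
    """Bit address of Px.y: for a bit-addressable SFR the eight bit
    addresses run upwards from the register's byte address."""
    return PORT_ADDR[port] + bit

print(hex(port_bit_address("P0", 7)))   # 0x87, as used by CLR 87h
```

This is why `CLR P0.7` and `CLR 87h` assemble to the same operation.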

(xix) PSW Program Status Word


PSW, the Program Status Word is at address D0h and is a bit-addressable
register. The status bits are listed in Table 8.2.
Table 8.2: Program Status Word (PSW) flags

Symbol Bit Address Description


C (or CY) PSW.7 D7h Carry flag
AC PSW.6 D6h Auxiliary carry flag
F0 PSW.5 D5h Flag 0
RS1 PSW.4 D4h Register bank select 1
RS0 PSW.3 D3h Register bank select 0
OV PSW.2 D2h Overflow flag
- PSW.1 D1h Reserved
P PSW.0 D0h Even Parity flag

Carry flag. C
R This is a conventional carry, or borrow flag used in arithmetic operations.
The carry flag is also used as the ‘Boolean accumulator’ for Boolean
instructions operating at the bit level. This flag is sometimes referenced as
the CY flag.
Auxiliary carry flag. AC
R This is a conventional auxiliary carry flag (half carry) for use in BCD arithmetic.
Flag 0. F0
R This is a general purpose flag for user programming.
Register bank select 0 and register bank select 1. RS0 and RS1
R These bits define the active register bank (bank 0 is the default register
bank).
Overflow flag. OV
R This is a conventional overflow bit for signed arithmetic to determine if
the result of a signed arithmetic operation is out of range.
Even Parity flag. P
R The parity flag is the accumulator parity flag, set to a value, 1 or 0, such
that the number of ‘1’ bits in the accumulator plus the parity bit add up to
an even number.
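The parity computation can be sketched in Python (an illustrative model of the flag, not MCU code):

```python
def parity_flag(acc):
    """Even-parity flag P: chosen so that the number of 1 bits in the
    accumulator plus P adds up to an even number."""
    ones = bin(acc & 0xFF).count("1")
    return ones % 2             # 1 when the accumulator has an odd bit count

print(parity_flag(0b01010010))  # 3 ones -> P = 1 (total becomes even)
print(parity_flag(0b00000011))  # 2 ones -> P = 0
```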

(xx) Stack Pointer


R The Stack Pointer, SP, is an 8-bit SFR register at address 81h. The small
address field (8 bits) and the limited space available in the Internal RAM

confine the stack size, and this is sometimes a limitation for 8051
programs. The SP contains the address of the data byte currently on
the top of the stack. The SP pointer is initialized to a defined address. A
new data item is ‘pushed’ on to the stack using a PUSH instruction which
will cause the data item to be written to address SP + 1. Typical instructions,
which cause modification to the stack are: PUSH, POP, LCALL, RET,
RETI etc. The SP SFR, on start-up, is initialized to 07h so this means the
stack will start at 08h and expand upwards in Internal RAM. If register
banks 1 to 3 are to be used the SP SFR should be initialized to start higher
up in Internal RAM. The following instruction is often used to initialise the
stack:
• MOV SP, #2Fh
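The push/pop behaviour can be modelled in Python (a toy simulation of the internal RAM stack, not MCU code):

```python
class Stack8051:
    """Toy model of the 8051 stack in internal RAM: SP is 07h after
    reset, PUSH writes to SP+1, POP reads then decrements SP."""
    def __init__(self):
        self.iram = [0] * 256
        self.sp = 0x07

    def push(self, value):
        self.sp += 1                 # pre-increment, then store
        self.iram[self.sp] = value

    def pop(self):
        value = self.iram[self.sp]   # read, then post-decrement
        self.sp -= 1
        return value

s = Stack8051()
s.push(0x42)
print(hex(s.sp))     # 0x8: the first pushed item lands at 08h
print(hex(s.pop()))  # 0x42
```

Initialising SP to 2Fh, as in `MOV SP, #2Fh`, simply moves the first push up to 30h, clear of the register banks and bit-addressable area.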

(xxi) Data Pointer


R The Data Pointer, DPTR, is a special 16-bit register used to address the
external code or external data memory. Since the SFR registers are just
8-bits wide the DPTR is stored in two SFR registers, where DPL (82h)
holds the low byte of the DPTR and DPH (83h) holds the high byte of the
DPTR. For example, if you wanted to write the value 46h to external data
memory location 2500h, you might use the following instructions:
• MOV A, #46h : Move immediate 8-bit data 46h to A (Accumulator).
• MOV DPTR, #2500h : Move immediate 16-bit address value 2500h
to DPTR.
Now DPL holds 00h and DPH holds 25h.
• MOVX @DPTR, A : Move the value in A to external RAM location
2500h.
This uses indirect addressing.
• Note: the MOVX (Move External) instruction is used to access external
memory.
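The DPH/DPL split is simple arithmetic, sketched here in Python:

```python
def dptr_bytes(addr16):
    """Split a 16-bit DPTR value into (DPH, DPL): DPH at 83h holds
    the high byte, DPL at 82h holds the low byte."""
    return (addr16 >> 8) & 0xFF, addr16 & 0xFF

dph, dpl = dptr_bytes(0x2500)
print(hex(dph), hex(dpl))   # 0x25 0x0 - as in the example above
```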

(xxii) Accumulator
R This is the conventional accumulator that one expects to find in any
computer, which is used to hold result of various arithmetic and logic
operations. Since the 8051 microcontroller is just an 8-bit device, the
accumulator is, as expected, an 8-bit register.
R The accumulator, referred to as ACC or A, is usually accessed explicitly
using instructions such as:
R INC A; Increment the accumulator

R However, the accumulator is defined as an SFR register at address E0h.
So the following two instructions have the same effect:
• MOV A, #52h : Move immediate the value 52h to the accumulator
• MOV E0h, #52h : Move immediate the value 52h to Internal RAM
location E0h,
R Usually the first method, MOV A, #52h, is used as this is the most
conventional (and happens to use less space, 2 bytes as opposed to 3
bytes!)

(xxiii) B Register
R The B register is an SFR register at address F0h which is bit-addressable.
The B register is used in two instructions only: i.e., MUL (multiply) and
DIV (divide). The B register can also be used as a general purpose register.

(xxiv) Program Counter


R The PC (Program Counter) is a 2 byte (16 bit) register which always
contains the memory address of the next instruction to be executed. When
the 8051 is reset the PC is always initialized to 0000h. If a 2 byte instruction
is executed the PC is incremented by 2 and if a 3 byte instruction is
executed the PC is incremented by three so as to correctly point to the
next instruction to be executed. A jump instruction (e.g., LJMP) has the
effect of causing the program to branch to a newly specified location, so
the jump instruction causes the PC contents to change to the new address
value. Jump instructions cause the program to flow in a non-sequential
fashion, as will be described later.

(xxv) SFR Registers for the Internal Timer


R The setup and operation of the on-chip hardware timers will be described
later, but the associated registers are briefly described here:
R TCON, the Timer Control register is an SFR at address 88h, which is bit-
addressable. TCON is used to configure and monitor the 8051 timers.
The TCON SFR also contains some interrupt control bits, described later.
R TMOD, the Timer Mode register is an SFR at address 89h and is used to
define the operational modes for the timers, as will be described later.
R TL0 (Timer 0 Low) and TH0 (Timer 0 High) are two SFR registers
addressed at 8Ah and 8Ch respectively. The two registers are associated
with Timer 0.
R TL1 (Timer 1 Low) and TH1 (Timer 1 High) are two SFR registers
addressed at 8Bh and 8Dh respectively. These two registers are associated
with Timer 1.

(xxvi) Power Control Register


R PCON (Power Control) register is an SFR at address 87h. It contains
various control bits including a control bit, which allows the 8051 to go to
‘sleep’ so as to save power when not in immediate use.

(xxvii) Serial Port Registers


R Programming of the on-chip serial communications port will be described
later in the text. The associated SFR registers, SBUF and SCON, are
briefly introduced here, as follows:
R The SCON (Serial Control) is an SFR register located at address 98h,
and it is bit-addressable. SCON configures the behaviour of the on-chip
serial port, setting up parameters such as the baud rate of the serial port,
activating send and/or receive data, and setting up some specific control
flags.
R The SBUF (Serial Buffer) is an SFR register located at address 99h.
SBUF is just a single byte deep buffer used for sending and receiving
data via the on-chip serial port.

(xxviii) Interrupt Registers


R Interrupts will be discussed in more detail later. The associated SFR
registers are:
R IE (Interrupt Enable) is an SFR register at address A8h and is used to
enable and disable specific interrupts. The MSB bit (bit 7) is used to
disable all interrupts.
R IP (Interrupt Priority) is an SFR register at address B8h and it is bit
addressable. The IP register specifies the relative priority (high or low
priority) of each interrupt. On the 8051, an interrupt may either be of low
(0) priority or high (1) priority.

8.3 INTERNAL AND EXTERNAL MEMORY


The 8051 has a separate memory space for code (programs) and data. We will
refer here to on-chip memory and external memory as shown in Fig. 8.4. In an
actual implementation the external memory may, in fact, be contained within the
microcomputer chip. However, we will use the definitions of internal and external
memory to be consistent with 8051 instructions which operate on memory. Note,
the separation of the code and data memory in the 8051 architecture is a little
unusual. The separated memory architecture is referred to as Harvard

architecture, whereas Von Neumann architecture defines a system where code
and data can share common memory.

[Fig. 8.4 shows the 8051 chip's internal memory (Internal RAM, SFRs and
ROM) alongside the external DATA memory space (0000h to FFFFh) and the
external CODE memory space (0000h to FFFFh).]

Fig. 8.4: 8051 Memory representation

External Code Memory


R The executable program code is stored in this code memory. The code
memory size is limited to 64 KBytes (in a standard 8051). The code memory
is read-only in normal operation and is programmed under special
conditions e.g., it is a PROM or a Flash RAM type of memory.
External RAM Data Memory
R This is read-write memory and is available for storage of data. Up to
64KBytes of external RAM data memory is supported (in a standard
8051).
Internal Memory
The 8051’s on-chip memory consists of 256 memory bytes organised as follows:
First 128 bytes: 00h to 1Fh Register Banks
20h to 2Fh Bit Addressable RAM
30h to 7Fh General Purpose RAM
Next 128 bytes: 80h to FFh Special Function Registers
The first 128 bytes of internal memory is organized as shown in Fig. 8.5, and is
referred to as Internal RAM, or IRAM.
Register Banks: 00h to 1Fh
R The 8051 uses 8 general purpose registers R0 through R7 (R0, R1, R2,
R3, R4, R5, R6, and R7). These registers are used in instructions such as:
R ADD A, R2 ; adds the value contained in R2 to the accumulator.

R Note: since R2 happens to be memory location 02h in the Internal RAM,
the following instruction has the same effect as the above instruction:
R ADD A, 02h
R Now, things get more complicated when we see that there are four banks
of these general purpose registers defined within the Internal RAM. For
the moment we will consider register bank 0 only. Register banks 1 to 3
can be ignored when writing introductory level assembly language
programs.
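The bank-and-register-to-address arithmetic can be sketched in Python (illustrative only):

```python
def register_address(bank, reg):
    """IRAM address of Rn in register bank 0-3: bank 0 occupies
    00h-07h, bank 1 08h-0Fh, bank 2 10h-17h, bank 3 18h-1Fh."""
    return bank * 8 + reg

print(hex(register_address(0, 2)))  # 0x2: why ADD A,02h equals ADD A,R2
print(hex(register_address(3, 7)))  # 0x1f: the last register-bank byte
```

Switching the active bank (via RS1/RS0 in the PSW) simply moves which eight of these bytes the names R0 through R7 refer to.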
Bit Addressable RAM: 20h to 2Fh
R The 8051 supports a special feature which allows access to bit variables.
This is where individual memory bits in Internal RAM can be set or cleared.
In all there are 128 bits numbered 00h to 7Fh. Being bit variables any one
variable can have a value 0 or 1. A bit variable can be set with a command
such as SETB and cleared with a command such as CLR. Examples of
instructions are:
• SETB 25h; sets the bit 25h (becomes 1)
• CLR 25h; clears bit 25h (becomes 0)
R Note, bit 25h is actually bit b5 of Internal RAM location 24h.
R The Bit Addressable area of the RAM is just 16 bytes of Internal RAM
located between 20h and 2Fh. So if a program writes a byte to location
20h, for example, it writes 8 bit variables, bits 00h to 07h at once.
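The bit-address-to-byte mapping can be checked with a short Python sketch:

```python
def bit_location(bit_addr):
    """Map a bit address 00h-7Fh to its (byte, bit) position in the
    bit-addressable RAM at 20h-2Fh: eight bits per byte, in order."""
    return 0x20 + bit_addr // 8, bit_addr % 8

byte, bit = bit_location(0x25)
print(hex(byte), bit)   # 0x24 5: bit 25h is bit b5 of byte 24h
```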
General Purpose RAM: 30h to 7Fh
R These 80 bytes of Internal RAM memory are available for general purpose
data storage. Access to this area of memory is fast compared to access
to the main memory and special instructions with single byte operands are
used. However, these 80 bytes are used by the system stack and in practice
little space is left for general storage. The general purpose RAM can be
accessed using direct or indirect addressing modes. Examples of direct
addressing:
• MOV A, 6Ah : reads contents of address 6Ah to accumulator
R Examples for indirect addressing (use registers R0 or R1):
• MOV R1, #6Ah : move immediate 6Ah to R1.
• MOV A, @R1 : move indirect: R1 contains address of Internal RAM
which contains data that is moved to A.
R These two instructions have the same effect as the direct instruction
above.

SFR Registers
R The SFR registers are located within the Internal Memory in the address
range 80h to FFh, as shown in Fig. 8.5. Not all locations within this range
are defined. Each SFR has a very specific function. Each SFR has an
address (within the range 80h to FFh) and a name which reflects the
purpose of the SFR. Although the SFR address space spans 128 bytes,
only 21 SFR registers are defined in the standard 8051. Undefined
SFR addresses should not be accessed as this might lead to some
unpredictable results. Note some of the SFR registers are bit addressable.
SFRs are accessed just like normal Internal RAM locations.

8.4 MEMORY ORGANIZATION


The 8051 has two types of memory and these are Program Memory and Data
Memory. Program Memory (ROM) is used to permanently save the program
being executed, while Data Memory (RAM) is used for temporarily storing
data and intermediate results created and used during the operation of the
microcontroller. Depending on the model in use, at most a few kB of ROM and
128 or 256 bytes of RAM are used.
All 8051 microcontrollers have a 16-bit address bus and are capable of
addressing 64 kB of memory. The MCS-51 has four distinct types of memory—
internal RAM, special function registers, program memory, and external data
memory.
Internal RAM (IRAM) is located from address 0 to address 0xFF. IRAM
from 0x00 to 0x7F can be accessed directly, and the bytes from 0x20 to 0x2F
are also bit-addressable. IRAM from 0x80 to 0xFF must be accessed indirectly,
using the @R0 or @R1 syntax, with the address to access loaded in R0 or R1.
Special Function Registers (SFR) are located from address 0x80 to 0xFF,
and are accessed directly using the same instructions as for the lower half of
IRAM. Some of the SFR’s are also bit-addressable.
Program memory (PMEM, though less common in usage than IRAM and XRAM)
is located starting at address 0. It may be on- or off-chip, depending on the
particular model of chip being used. Program memory is read-only, though some
variants of the 8051 use on-chip flash memory and provide a method of re-
programming the memory in-system or in-application. Aside from storing code,
program memory can also store tables of constants that can be accessed by
MOVC A, @DPTR, using the 16-bit special function register DPTR.
External data memory (XRAM) also starts at address 0. It can also be on
or off-chip; what makes it “external” is that it must be accessed using the
MOVX (Move external) instruction. Many variants of the 8051 include the
standard 256 bytes of IRAM plus a few KB of XRAM on the chip. If more

XRAM is required by an application, the internal XRAM can be disabled, and
all MOVX instructions will fetch from the external bus.

8.5 TIMER OR COUNTER


The MCS-51 has two 16-bit Timer/Counter registers: Timer 0 and Timer 1. Both
can be configured to operate either as timers or event counters (see Fig. 8.6
below).
In the Timer function, the register is incremented every machine cycle.

[Fig. 8.6 shows the timer/counter logic of the 89S51: the oscillator divided
by 12 (timer mode, C/T = 0) or the external pin P3.5/T1 (counter mode,
C/T = 1) feeds TL1/TH1, gated through the GATE, C/T, M1 and M0 bits of
TMOD, the TR1/TF1 run and overflow bits of TCON, and the P3.3/INT1 pin.]

Fig. 8.6: Diagram block timer/counter operation


As shown in the figure above, the microcontroller can be used as a timer or a
counter as per the user's need. To switch between the two, the 8051 is provided
with a switch: the microcontroller acts as a timer when the switch is in the
upper position and as a counter when it is in the lower position, controlled by
the C/T bit in the TMOD register. The actual switch position depends on the
GATE bit (register TMOD), TR1 (register TCON) and INT1.
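Since the timer increments once per machine cycle, and a machine cycle is 12 oscillator periods (the :12 block in Fig. 8.6), the overflow period can be estimated with a short Python sketch (illustrative arithmetic, whole microseconds only):

```python
def overflow_period_us(xtal_hz, timer_bits=16, count_start=0):
    """Overflow period in whole microseconds: one increment per
    machine cycle, one machine cycle = 12 oscillator periods."""
    counts = (1 << timer_bits) - count_start   # increments until overflow
    return counts * 12 * 1_000_000 // xtal_hz

# 12 MHz crystal: 1 us machine cycle, so a 16-bit timer overflows in 65536 us
print(overflow_period_us(12_000_000))                       # 65536
# Preloading the timer with FC18h leaves 1000 counts: a 1 ms tick
print(overflow_period_us(12_000_000, count_start=0xFC18))   # 1000
```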

Timer/Counter Mode Control (TMOD) Register

TIMER 1                TIMER 0
GATE C/T M1 M0         GATE C/T M1 M0

M1 M0 Operating Mode
0  0  8048 Timer; TLx serves as a 5-bit prescaler
0  1  16-bit Timer/Counter; THx and TLx are cascaded, there is no prescaler
1  0  8-bit auto-reload Timer/Counter; THx holds a value which is to be
      reloaded into TLx each time it overflows
1  1  (Timer 0) TL0 is an 8-bit Timer/Counter controlled by the standard
      Timer 0 control bits. (Timer 1) Timer/Counter 1 stopped
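The auto-reload mode (M1 M0 = 1 0) can be modelled with a small Python sketch (a toy simulation of TLx/THx, not the silicon logic):

```python
def run_mode2(th, tl, cycles):
    """Simulate Mode 2: TL increments each machine cycle; on overflow
    past FFh it is reloaded from TH. Returns (tl, overflow_count)."""
    overflows = 0
    for _ in range(cycles):
        tl += 1
        if tl > 0xFF:
            tl = th              # auto-reload from TH
            overflows += 1
    return tl, overflows

# TH = F0h: the timer overflows every 16 counts
print(run_mode2(0xF0, 0xF0, 32))   # (240, 2): two overflows in 32 cycles
```

This is why Mode 2 is convenient for fixed-rate events such as baud-rate generation: the reload value never has to be rewritten by software.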
Timer/Counter Control (TCON) Register
MSB                             LSB
TF1 TR1 TF0 TR0 IE1 IT1 IE0 IT0

BIT     SYMBOL  FUNCTION
TCON.7  TF1     Timer 1 overflow flag. Set by hardware on Timer/Counter
                overflow. Cleared by hardware when the processor vectors to
                the interrupt routine, or by clearing the bit in software.
TCON.6  TR1     Timer 1 run control bit. Set/cleared by software to turn the
                Timer/Counter on/off.
TCON.5  TF0     Timer 0 overflow flag. Set by hardware on Timer/Counter
                overflow. Cleared by hardware when the processor vectors to
                the interrupt routine, or by clearing the bit in software.
TCON.4  TR0     Timer 0 run control bit. Set/cleared by software to turn the
                Timer/Counter on/off.
TCON.3  IE1     Interrupt 1 edge flag. Set by hardware when an external
                interrupt edge is detected. Cleared when the interrupt is
                processed.
TCON.2  IT1     Interrupt 1 type control bit. Set/cleared by software to
                specify falling edge/low level triggered external interrupts.
TCON.1  IE0     Interrupt 0 edge flag. Set by hardware when an external
                interrupt edge is detected. Cleared when the interrupt is
                processed.
TCON.0  IT0     Interrupt 0 type control bit. Set/cleared by software to
                specify falling edge/low level triggered external interrupts.

8.6 INPUT AND OUTPUT PORTS

All 8051 microcontrollers have 4 I/O ports, each comprising 8 bits, which can
be configured as inputs or outputs. Accordingly, a total of 32 input/output
pins enabling the microcontroller to be connected to peripheral devices are
available for use.

Pins 1–8: Port 1 Each of these pins can be configured as an input or an output.

Fig. 8.7: Pin layout of 8051 (40-pin package: Port 1 on pins 1-8, RST on pin 9,
Port 3 on pins 10-17, XTAL2/XTAL1 on pins 18-19, Vss on pin 20, Port 2
(A8-A15) on pins 21-28, PSEN on pin 29, ALE/PROG on pin 30, EA/Vpp on
pin 31, Port 0 (AD0-AD7) on pins 32-39, Vcc on pin 40)



Pin 9: RST A logic one on this pin disables the microcontroller and clears the
contents of most registers. In other words, the positive voltage on this pin resets
the microcontroller. By applying logic zero to this pin, the program starts execution
from the beginning.
Pins 10–17: Port 3 Similar to port 1, each of these pins can serve as general
input or output. Besides, all of them have alternative functions:
Pin 10: RXD Serial asynchronous communication input or Serial synchronous
communication output.
Pin 11: TXD Serial asynchronous communication output or Serial synchronous
communication clock output.
Pin 12: INT0 Interrupt 0 input.
Pin 13: INT1 Interrupt 1 input.
Pin 14: T0 Counter 0 clock input.
Pin 15: T1 Counter 1 clock input.
Pin 16: WR Write to external (additional) RAM.
Pin 17: RD Read from external RAM.
Pin 18, 19: X2, X1 Internal oscillator input and output. A quartz crystal which
specifies the operating frequency is usually connected to these pins. Instead,
miniature ceramic resonators can also be used for frequency stability. Later
versions of microcontrollers operate at frequencies from 0 Hz up to over 50 MHz.
Pin 20: GND Ground.
Pins 21-28: Port 2 If there is no intention to use external memory then these
port pins are configured as general inputs/outputs. In case external memory is
used, the higher address byte, i.e., addresses A8–A15, will appear on this port.
Even when memory of less than the full 64 kB capacity is used, which means
that not all eight port bits are needed for its addressing, the remaining bits
are still not available as inputs/outputs.
Pin 29: PSEN If external ROM is used for storing program then a logic zero
(0) appears on it every time the microcontroller reads a byte from memory.
Pin 30: ALE Prior to reading from external memory, the microcontroller puts
the lower address byte (A0–A7) on P0 and activates the ALE output. After
receiving signal from the ALE pin, the external register (usually 74HCT373 or
74HCT375 add-on chip) memorizes the state of P0 and uses it as a memory
chip address. Immediately after that, the ALE pin returns to its previous logic
state and P0 is now used as a Data Bus. As seen, port data multiplexing is
performed by means of only one additional (and cheap) integrated circuit. In
other words, this port is used for both data and address transmission.

Pin 31: EA By applying logic zero to this pin, P0 and P2 are used for data and
address transmission with no regard to whether there is internal memory or not.
It means that even if there is a program written to the microcontroller, it will
not be executed. Instead, the program written to external ROM will be executed.
By applying logic one to the EA pin, the microcontroller will use both memories,
first internal then external (if it exists).
Pins 32–39: Port 0 Similar to P2, if external memory is not used, these pins can
be used as general inputs/outputs. Otherwise, P0 is configured as address output
(A0–A7) when the ALE pin is driven high (1) or as data output (Data Bus)
when the ALE pin is driven low (0).
Pin 40: VCC +5V power supply.
Pin configuration, i.e., whether a pin is to act as an input (1) or an output
(0), depends on its logic state. In order to use a microcontroller pin as an
output, it is necessary to apply a logic zero (0) to the appropriate I/O port
bit. In this case, the voltage level on the pin will be 0.
Similarly, in order to configure a microcontroller pin as an input, it is
necessary to apply a logic one (1) to the appropriate port bit. In this case,
the voltage level on the pin will be 5V (as is the case with any TTL input).
This may seem confusing but don't lose your patience. It all becomes clear
after studying simple electronic circuits connected to an I/O pin.

8.7 INTERRUPTS — AN INSIGHT


As the name implies, an interrupt is some event which interrupts normal program
execution.
Program flow is always sequential, being altered only by those instructions which
expressly cause program flow to deviate in some way. However, interrupts give
us a mechanism to “put on hold” the normal program flow, execute a subroutine,
and then resume normal program flow as if we had never left it. This subroutine,
called an interrupt handler, is only executed when a certain event (interrupt)
occurs. The event may be one of the timers “overflowing”, receiving a character
via the serial port, transmitting a character via the serial port, or one of two
“external events”. The 8051 may be configured so that when any of these
events occur the main program is temporarily suspended and control passed to
a special section of code which presumably would execute some function related
to the event that occurred. Once complete, control would be returned to the
original program. The main program never even knows it was interrupted.
The ability to interrupt normal program execution when certain events occur
makes it much easier and much more efficient to handle certain conditions. If it
were not for interrupts we would have to manually check in our main program
whether the timers had overflowed, whether we had received another character

via the serial port, or if some external event had occurred. Besides making the
main program ugly and hard to read, such a situation would make our program
inefficient since we’d be burning precious “instruction cycles” checking for
events that usually don’t happen.
The 8051 provides five interrupt sources. These are listed below.
1. Timer 0 (TF0) and timer 1 (TF1) interrupt.
2. External hardware interrupts, INT0 and INT1.
3. Serial communication interrupts TI and RI.
The 8051 can be configured such that when Timer 0 overflows or when a character
is sent/received, the appropriate interrupt handler routine is called.
Obviously, we need to be able to distinguish between various interrupts and
execute different code depending on which interrupt was triggered. This is
accomplished by jumping to a fixed address when a given interrupt occurs.

Interrupt Flag Interrupt Handler Address

External 0 IE0 0003h

Timer 0 TF0 000Bh

External 1 IE1 0013h

Timer 1 TF1 001Bh

Serial RI/TI 0023h

By consulting the above chart, we see that whenever Timer 0 overflows
(i.e., the TF0 bit is set), the main program will be temporarily suspended and
control will jump to 000Bh. It is assumed that we have code at address 000Bh
that handles the situation of Timer 0 overflowing.
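The fixed 8-byte spacing of these vectors can be expressed as a small Python sketch (the source names here are illustrative labels, not 8051 symbols):

```python
# The five sources in vector order (which is also the polling order)
SOURCES = ["External0", "Timer0", "External1", "Timer1", "Serial"]

def vector_address(source):
    """Interrupt vector addresses are spaced 8 bytes apart,
    starting at 0003h for External 0."""
    return 0x0003 + 8 * SOURCES.index(source)

print(hex(vector_address("Timer0")))   # 0xb: the 000Bh handler above
print(hex(vector_address("Serial")))   # 0x23
```

The 8-byte gap is also why trivial handlers can live directly at the vector, while longer ones start with a jump to code elsewhere.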
The 8051 is provided with a mechanism that helps it decide what to do when
multiple interrupts of different categories occur at the same time. This is
called polling. The 8051 automatically evaluates whether an interrupt should
occur after every instruction. When checking for interrupt conditions, it
checks them in the following order:
When checking for interrupt conditions, it checks them in the following order:
• External 0 Interrupt
• Timer 0 Interrupt
• External 1 Interrupt
• Timer 1 Interrupt
• Serial Interrupt

This means that if a Serial Interrupt occurs at exactly the same instant that an
External 0 Interrupt occurs, the External 0 Interrupt will be executed first and
the Serial Interrupt will be executed once the External 0 Interrupt has completed.
The 8051 offers two levels of interrupt priority: high and low. By using interrupt
priorities you may assign higher priority to certain interrupt conditions.
For example, you may have enabled Timer 1 Interrupt which is automatically
called every time Timer 1 overflows. Additionally, you may have enabled the
Serial Interrupt which is called every time a character is received via the serial
port. However, you may consider that receiving a character is much more
important than the timer interrupt. In this case, if Timer 1 Interrupt is already
executing you may wish that the serial interrupt itself interrupts the Timer 1
Interrupt. When the serial interrupt is complete, control passes back to Timer 1
Interrupt and finally back to the main program. You may accomplish this by
assigning a high priority to the Serial Interrupt and a low priority to the Timer 1
Interrupt.
Interrupt priorities are controlled by the IP SFR (B8h). The IP SFR has the
following format:

Bit Name Bit Address Explanation of Function


7 - - Undefined
6 - - Undefined
5 - - Undefined
4 PS BCh Serial Interrupt Priority
3 PT1 BBh Timer 1 Interrupt Priority
2 PX1 BAh External 1 Interrupt Priority
1 PT0 B9h Timer 0 Interrupt Priority
0 PX0 B8h External 0 Interrupt Priority

When considering interrupt priorities, the following rules apply:


• Nothing can interrupt a high-priority interrupt—not even another high
priority interrupt.
• A high-priority interrupt may interrupt a low-priority interrupt.
• A low-priority interrupt may only occur if no other interrupt is already
executing.
• If two interrupts occur at the same time, the interrupt with higher priority
will execute first. If both interrupts are of the same priority, the interrupt
which comes first in the polling sequence will be executed first.
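These rules can be modelled with a short Python sketch (a hypothetical decision function, not the silicon logic):

```python
POLLING_ORDER = ["External0", "Timer0", "External1", "Timer1", "Serial"]

def next_interrupt(pending, high_priority):
    """Pick the interrupt to service: the high-priority group wins,
    and ties within a group are broken by the fixed polling order."""
    for is_high in (True, False):              # high-priority group first
        for src in POLLING_ORDER:              # then polling sequence
            if src in pending and (src in high_priority) == is_high:
                return src
    return None

# Serial marked high priority beats External 0 despite the polling order
print(next_interrupt({"External0", "Serial"}, {"Serial"}))   # Serial
# Equal priority: the polling order decides
print(next_interrupt({"External0", "Serial"}, set()))        # External0
```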

The Interrupt structure of 8051 is as shown below.

[Figure: the five interrupt sources of the 8051: the external pins INT0 and
INT1 (edge- or level-triggered as selected by IT0/IT1, setting the flags
IE0/IE1), the timer overflow flags TF0 and TF1, and the serial flags TI and RI.]

It is interesting to see how the 8051 handles interrupts. When an interrupt is
triggered, the following actions are taken automatically by the microcontroller:
R The current Program Counter is saved on the stack, low-byte first.
R Interrupts of the same and lower priority are blocked.
R In the case of Timer and External interrupts, the corresponding interrupt
flag is cleared.
R Program execution transfers to the corresponding interrupt handler vector
address.
R The Interrupt Handler Routine executes.
An interrupt ends when your program executes the RETI (Return from Interrupt)
instruction. When the RETI instruction is executed the following actions are
taken by the microcontroller:
R Two bytes are popped off the stack into the Program Counter to restore
normal program execution.
R Interrupt status is restored to its pre-interrupt status.
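The stack traffic around an interrupt can be sketched as follows. This is a tiny behavioural model in Python (the vector address 0x0023 for the serial interrupt is standard, the rest is illustrative): entry pushes the 16-bit PC low byte first, and RETI pops the two bytes back:

```python
stack = []

def interrupt_entry(pc, vector):
    """Push the 16-bit PC, low byte first, then jump to the vector."""
    stack.append(pc & 0xFF)         # low byte pushed first
    stack.append((pc >> 8) & 0xFF)  # then the high byte
    return vector                   # new PC: the interrupt vector

def reti():
    """RETI: pop two bytes back into the PC (high byte popped first)."""
    high = stack.pop()
    low = stack.pop()
    return (high << 8) | low

pc = interrupt_entry(0x1234, 0x0023)  # e.g. the serial interrupt vector
assert pc == 0x0023
print(hex(reti()))  # 0x1234: execution resumes where it left off
```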

8.8 ASSEMBLY LANGUAGE PROGRAMMING


The classic way to program the 8051 is in assembly language. An
assembly language is a low-level programming language for computers,
microprocessors, microcontrollers, and other integrated circuits. It implements
An Overview and Architectural Analysis of 8051 Microcontroller 187

a symbolic representation of the binary machine codes and other constants


needed to program a given CPU architecture. This representation is usually
defined by the hardware manufacturer, and is based on mnemonics that symbolize
processing steps (instructions), processor registers, memory locations, and other
language features. An assembly language is thus specific to a particular
physical (or virtual) computer architecture. This is in contrast to most
high-level programming languages, which, ideally, are portable.
A utility program called an assembler is used to translate assembly language
statements into the target computer’s machine code. The assembler performs a
more or less isomorphic translation (a one-to-one mapping) from mnemonic
statements into machine instructions and data. This is in contrast with high-level
languages, in which a single statement generally results in many machine
instructions.
In general, an assembly language statement comprises an opcode and operands.
Operands are the data on which we want to perform the required operation;
the opcode specifies the type of operation to be performed.
Instruction Sets
The main operational groups of instructions in 8051 are:
R Arithmetic Instructions
R Branch Instructions
R Data Transfer Instructions
R Logic Instructions
R Bit Oriented Instructions
There are several other instruction sets, but they are less prominent.
The nomenclature used to describe these instruction sets is:
• A – the accumulator;
• Rn – one of the working registers (R0–R7) in the currently active RAM
memory bank;
• direct – any 8-bit RAM address: a general-purpose register or an SFR
(I/O port, control register, etc.);
• @Ri – an indirect internal or external RAM location addressed by register
R0 or R1;
• #data – an 8-bit constant included in the instruction (0–255);
• #data16 – a 16-bit constant included as bytes 2 and 3 of the instruction
(0–65535);
• addr16 – a 16-bit address. It may be anywhere within the 64KB of program
memory;
• addr11 – an 11-bit address. It may be within the same 2KB page of
program memory as the first byte of the following instruction;
• rel – the address of a nearby memory location (from –128 to +127
relative to the first byte of the following instruction). From it, the
assembler computes the value to add to or subtract from the number
currently stored in the program counter;
• bit – any bit-addressable I/O pin, control or status bit; and
• C – the carry flag of the status register (register PSW).
Arithmetic Operation Group
Arithmetic instructions perform several basic operations such as addition,
subtraction, division, multiplication etc. After execution, the result is stored in
the first operand. For example:
ADD A,R1 – The result of addition (A+R1) will be stored in the accumulator.
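How such an addition affects the PSW flags can be sketched in Python (a behavioural model only; C, AC and OV are the 8051's carry, auxiliary carry and overflow flags):

```python
def add(a, operand):
    """Model ADD A,Rn: 8-bit add that sets C, AC and OV like the 8051."""
    result = a + operand
    c = 1 if result > 0xFF else 0                           # carry out of bit 7
    ac = 1 if (a & 0x0F) + (operand & 0x0F) > 0x0F else 0   # carry out of bit 3
    # OV = carry into bit 7 XOR carry out of bit 7
    c6 = 1 if (a & 0x7F) + (operand & 0x7F) > 0x7F else 0
    ov = c ^ c6
    return result & 0xFF, {"C": c, "AC": ac, "OV": ov}

# Adding two large (negative, in two's complement) values sets C and OV:
print(add(0x90, 0x90))  # result wraps to 0x20 with C=1, OV=1
```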

ARITHMETIC INSTRUCTIONS

Mnemonic       Description                                                     Byte  Cycle
ADD A,Rn       Adds the register to the accumulator                             1     1
ADD A,direct   Adds the direct byte to the accumulator                          2     2
ADD A,@Ri      Adds the indirect RAM to the accumulator                         1     2
ADD A,#data    Adds the immediate data to the accumulator                       2     2
ADDC A,Rn      Adds the register to the accumulator with a carry flag           1     1
ADDC A,direct  Adds the direct byte to the accumulator with a carry flag        2     2
ADDC A,@Ri     Adds the indirect RAM to the accumulator with a carry flag       1     2
ADDC A,#data   Adds the immediate data to the accumulator with a carry flag     2     2
SUBB A,Rn      Subtracts the register from the accumulator with a borrow        1     1
SUBB A,direct  Subtracts the direct byte from the accumulator with a borrow     2     2
SUBB A,@Ri     Subtracts the indirect RAM from the accumulator with a borrow    1     2
SUBB A,#data   Subtracts the immediate data from the accumulator with a borrow  2     2
INC A          Increments the accumulator by 1                                  1     1
INC Rn         Increments the register by 1                                     1     2
INC Rx         Increments the direct byte by 1                                  2     3
INC @Ri        Increments the indirect RAM by 1                                 1     3
DEC A          Decrements the accumulator by 1                                  1     1
DEC Rn         Decrements the register by 1                                     1     1
DEC Rx         Decrements the direct byte by 1                                  2     2
DEC @Ri        Decrements the indirect RAM by 1                                 1     3
INC DPTR       Increments the Data Pointer by 1                                 1     3
MUL AB         Multiplies A and B                                               1     5
DIV AB         Divides A by B                                                   1     5
DA A           Decimal adjustment of the accumulator according to BCD code      1     1
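The DA A entry deserves a short illustration. A behavioural sketch of the decimal adjustment in Python (assuming the usual two-step nibble correction; not 8051 code):

```python
def da(a, c, ac):
    """Model DA A: adjust the binary sum of two packed-BCD digits."""
    if (a & 0x0F) > 9 or ac:               # fix the low nibble
        a += 0x06
    if ((a >> 4) & 0x0F) > 9 or c or a > 0xFF:  # fix the high nibble
        a += 0x60
    return a & 0xFF, (1 if a > 0xFF else 0)     # adjusted value, carry

# 0x38 + 0x47 = 0x7F in binary; DA A corrects it to BCD 85:
result, carry = da(0x38 + 0x47, c=0, ac=0)
print(hex(result))  # 0x85
```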
Branch Instructions
There are two kinds of branch instructions:
Unconditional jump instructions: upon execution, control jumps to a new
location, from which the program continues.
Conditional jump instructions: a jump to a new program location is executed
only if a specified condition is met; otherwise, the program proceeds
with the next instruction.
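The distinction can be modelled with a toy PC-update function (illustrative Python only; the +2 fall-through assumes a 2-byte short-jump encoding):

```python
def next_pc(pc, instr, target, acc=0, carry=0):
    """Toy model: unconditional jumps always redirect the PC;
    conditional jumps redirect it only when their condition holds."""
    taken = {
        "LJMP": True,          # unconditional
        "SJMP": True,          # unconditional, short
        "JZ":   acc == 0,      # conditional on the accumulator
        "JNZ":  acc != 0,
        "JC":   carry == 1,    # conditional on the carry flag
        "JNC":  carry == 0,
    }[instr]
    return target if taken else pc + 2  # fall through to the next instruction

print(hex(next_pc(0x100, "JZ", 0x200, acc=0)))     # 0x200: jump taken
print(hex(next_pc(0x100, "JNC", 0x200, carry=1)))  # 0x102: falls through
```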

BRANCH INSTRUCTIONS

Mnemonic            Description                                                  Byte  Cycle
ACALL addr11        Absolute subroutine call                                      2     6
LCALL addr16        Long subroutine call                                          3     6
RET                 Returns from subroutine                                       1     4
RETI                Returns from interrupt subroutine                             1     4
AJMP addr11         Absolute jump                                                 2     3
LJMP addr16         Long jump                                                     3     4
SJMP rel            Short jump (from -128 to +127 locations relative to the
                    following instruction)                                        2     3
JC rel              Jump if carry flag is set. Short jump.                        2     3
JNC rel             Jump if carry flag is not set. Short jump.                    2     3
JB bit,rel          Jump if direct bit is set. Short jump.                        3     4
JBC bit,rel         Jump if direct bit is set and clears bit. Short jump.         3     4
JMP @A+DPTR         Jump indirect relative to the DPTR                            1     2
JZ rel              Jump if the accumulator is zero. Short jump.                  2     3
JNZ rel             Jump if the accumulator is not zero. Short jump.              2     3
CJNE A,direct,rel   Compares direct byte to the accumulator and jumps if not
                    equal. Short jump.                                            3     4
CJNE A,#data,rel    Compares immediate data to the accumulator and jumps if
                    not equal. Short jump.                                        3     4
CJNE Rn,#data,rel   Compares immediate data to the register and jumps if not
                    equal. Short jump.                                            3     4
CJNE @Ri,#data,rel  Compares immediate data to indirect register and jumps
                    if not equal. Short jump.                                     3     4
DJNZ Rn,rel         Decrements register and jumps if not 0. Short jump.           2     3
DJNZ Rx,rel         Decrements direct byte and jumps if not 0. Short jump.        3     4
NOP                 No operation                                                  1     1
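DJNZ is the workhorse of 8051 delay and iteration loops: decrement, then branch while the register is non-zero. A behavioural sketch in Python (not 8051 code):

```python
def djnz_loop(count):
    """Model the idiom: MOV R2,#count / LOOP: ... / DJNZ R2,LOOP."""
    r2 = count
    iterations = 0
    while True:
        iterations += 1       # the loop body would run here
        r2 = (r2 - 1) & 0xFF  # DJNZ R2,LOOP: 8-bit decrement with wrap
        if r2 == 0:
            break
    return iterations

print(djnz_loop(5))  # 5
print(djnz_loop(0))  # 256: DJNZ wraps 0 -> 0xFF, so the loop runs 256 times
```

The second call shows a classic 8051 trick: loading 0 gives the maximum iteration count, because the register wraps around before reaching zero again.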

Data Transfer Instructions


Data transfer instructions move the content of one register to another; the
source register, whose content is copied, remains unchanged. Instructions with
the suffix "X" (MOVX) exchange data with external memory.

DATA TRANSFER INSTRUCTIONS

Mnemonic           Description                                                   Byte  Cycle
MOV A,Rn           Moves the register to the accumulator                          1     1
MOV A,direct       Moves the direct byte to the accumulator                       2     2
MOV A,@Ri          Moves the indirect RAM to the accumulator                      1     2
MOV A,#data        Moves the immediate data to the accumulator                    2     2
MOV Rn,A           Moves the accumulator to the register                          1     2
MOV Rn,direct      Moves the direct byte to the register                          2     4
MOV Rn,#data       Moves the immediate data to the register                       2     2
MOV direct,A       Moves the accumulator to the direct byte                       2     3
MOV direct,Rn      Moves the register to the direct byte                          2     3
MOV direct,direct  Moves the direct byte to the direct byte                       3     4
MOV direct,@Ri     Moves the indirect RAM to the direct byte                      2     4
MOV direct,#data   Moves the immediate data to the direct byte                    3     3
MOV @Ri,A          Moves the accumulator to the indirect RAM                      1     3
MOV @Ri,direct     Moves the direct byte to the indirect RAM                      2     5
MOV @Ri,#data      Moves the immediate data to the indirect RAM                   2     3
MOV DPTR,#data     Moves a 16-bit data to the data pointer                        3     3
MOVC A,@A+DPTR     Moves the code byte relative to the DPTR to the
                   accumulator (address = A+DPTR)                                 1     3
MOVC A,@A+PC       Moves the code byte relative to the PC to the
                   accumulator (address = A+PC)                                   1     3
MOVX A,@Ri         Moves the external RAM (8-bit address) to the accumulator      1     3-10
MOVX A,@DPTR       Moves the external RAM (16-bit address) to the accumulator     1     3-10
MOVX @Ri,A         Moves the accumulator to the external RAM (8-bit address)      1     4-11
MOVX @DPTR,A       Moves the accumulator to the external RAM (16-bit address)     1     4-11
PUSH direct        Pushes the direct byte onto the stack                          2     4
POP direct         Pops the direct byte from the stack                            2     3
XCH A,Rn           Exchanges the register with the accumulator                    1     2
XCH A,direct       Exchanges the direct byte with the accumulator                 2     3
XCH A,@Ri          Exchanges the indirect RAM with the accumulator                1     3
XCHD A,@Ri         Exchanges the low-order nibble of indirect RAM with the
                   accumulator                                                    1     3
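The exchange instructions at the end of the table are worth a closer look: XCH swaps a whole byte with the accumulator, while XCHD swaps only the low-order nibbles. A behavioural sketch in Python (a dict stands in for internal RAM):

```python
def xch(a, ram, addr):
    """XCH A,@Ri: swap the accumulator with an internal RAM byte."""
    a, ram[addr] = ram[addr], a
    return a

def xchd(a, ram, addr):
    """XCHD A,@Ri: swap only the low-order nibbles."""
    a_new = (a & 0xF0) | (ram[addr] & 0x0F)
    ram[addr] = (ram[addr] & 0xF0) | (a & 0x0F)
    return a_new

ram = {0x40: 0x5A}
a = xchd(0x21, ram, 0x40)
print(hex(a), hex(ram[0x40]))  # 0x2a 0x51: only the low nibbles moved
```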

Logic Instructions
Logic instructions perform logic operations upon corresponding bits of two
registers. After execution, the result is stored in the first operand.
LOGIC INSTRUCTIONS

Mnemonic          Description                                          Byte  Cycle
ANL A,Rn          AND register to accumulator                           1     1
ANL A,direct      AND direct byte to accumulator                        2     2
ANL A,@Ri         AND indirect RAM to accumulator                       1     2
ANL A,#data       AND immediate data to accumulator                     2     2
ANL direct,A      AND accumulator to direct byte                        2     3
ANL direct,#data  AND immediate data to direct byte                     3     4
ORL A,Rn          OR register to accumulator                            1     1
ORL A,direct      OR direct byte to accumulator                         2     2
ORL A,@Ri         OR indirect RAM to accumulator                        1     2
ORL direct,A      OR accumulator to direct byte                         2     3
ORL direct,#data  OR immediate data to direct byte                      3     4
XRL A,Rn          Exclusive OR register to accumulator                  1     1
XRL A,direct      Exclusive OR direct byte to accumulator               2     2
XRL A,@Ri         Exclusive OR indirect RAM to accumulator              1     2
XRL A,#data       Exclusive OR immediate data to accumulator            2     2
XRL direct,A      Exclusive OR accumulator to direct byte               2     3
XRL direct,#data  Exclusive OR immediate data to direct byte            3     4
CLR A             Clears the accumulator                                1     1
CPL A             Complements the accumulator (1=0, 0=1)                1     1
SWAP A            Swaps nibbles within the accumulator                  1     1
RL A              Rotates bits in the accumulator left                  1     1
RLC A             Rotates bits in the accumulator left through carry    1     1
RR A              Rotates bits in the accumulator right                 1     1
RRC A             Rotates bits in the accumulator right through carry   1     1
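The rotate and nibble-swap instructions are easiest to understand as bit manipulations. A behavioural sketch in Python (illustrative models, not 8051 code):

```python
def rl(a):
    """RL A: rotate left; bit 7 wraps around into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

def rlc(a, c):
    """RLC A: rotate left through carry; returns (new A, new carry)."""
    return ((a << 1) | c) & 0xFF, (a >> 7) & 1

def swap(a):
    """SWAP A: exchange the high and low nibbles."""
    return ((a << 4) | (a >> 4)) & 0xFF

print(hex(rl(0x81)))    # 0x3: bit 7 wrapped into bit 0
print(rlc(0x81, 0))     # (2, 1): bit 7 went into the carry instead
print(hex(swap(0xA5)))  # 0x5a
```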

Bit-oriented Instructions
Similar to logic instructions, bit-oriented instructions perform logic operations.
The difference is that these are performed upon single bits.
BIT-ORIENTED INSTRUCTIONS

Mnemonic    Description                                       Byte  Cycle
CLR C       Clears the carry flag                              1     1
CLR bit     Clears the direct bit                              2     3
SETB C      Sets the carry flag                                1     1
SETB bit    Sets the direct bit                                2     3
CPL C       Complements the carry flag                         1     1
CPL bit     Complements the direct bit                         2     3
ANL C,bit   AND direct bit to the carry flag                   2     2
ANL C,/bit  AND complement of direct bit to the carry flag     2     2
ORL C,bit   OR direct bit to the carry flag                    2     2
ORL C,/bit  OR complement of direct bit to the carry flag      2     2
MOV C,bit   Moves the direct bit to the carry flag             2     2
MOV bit,C   Moves the carry flag to the direct bit             2     3

The byte and cycle counts given in the instruction tables are very important,
as they determine the amount of memory occupied and the execution speed of a
particular sequence of instructions — in short, of a program.
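For instance, the size and duration of a short sequence can be totalled directly from the tables. The (byte, cycle) pairs below are copied from the tables above; the bookkeeping code itself is just an illustration:

```python
# (byte, cycle) pairs taken from the instruction tables above
program = [
    ("MOV A,#data", (2, 2)),
    ("ADD A,R1",    (1, 1)),
    ("DA A",        (1, 1)),
]

total_bytes = sum(b for _, (b, _c) in program)
total_cycles = sum(c for _, (_b, c) in program)
print(total_bytes, total_cycles)  # 4 4
```

This three-instruction BCD addition therefore occupies 4 bytes of program memory and takes 4 cycles, by the numbers in these tables.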
Note
To appreciate the power of a microcontroller, one has to do hands-on
experiments with it. Try, for example, driving a DC motor clockwise and
anticlockwise under interrupt control (in doing so you are building a simple
automated motor controller) to experience the real power of microcontrollers.
Then write further programs to extend your knowledge.

POINTS TO REMEMBER
1. Microcontroller is different from microprocessor.
2. 8051 follows Harvard architecture and it has an 8-bit ALU.
3. Microcontrollers are meant for a specific purpose, i.e., a dedicated
operation.
4. An 8-bit internal data bus and a 16-bit internal address bus are supported
by the 8051.
5. Program counter is a register that will hold the address of the next
instruction to be executed. It is 16 bits wide.
6. A and B are called the accumulator registers; A is one of the operands
for most arithmetic operations and also accumulates the result. The B
register is used in multiplication and division operations.
7. There are many Special function registers available for carrying out
specific operations.
8. Program Status Word (PSW) will have the flag status bits in it and one
can refer PSW to read flags.
9. The 8051 has an 8-bit stack pointer whose initial default value, set by
the processor, is 0x07.
10. Registers for serial IOs, timers, ports and interrupt handlers are also
supported.
11. Two external interrupt pins, INT0 and INT1.
12. Four ports of 8-bits each in single chip mode.
13. Two timers are supported.
14. Certain versions of 8051 even have DMA (Direct Memory Access)
support.
15. Most of the versions support Watch dog timer feature also.
16. Following instruction sets are available for the programmer to use.
(i) Arithmetic Instructions
(ii) Branch Instructions
(iii) Data Transfer Instructions
(iv) Logic Instructions and
(v) Bit Oriented Instructions

8.9 QUIZ (Readers can try answering these questions themselves)
1. What are the registers available in the 8051?
2. What are the functions of the 8051 registers?
3. What is stack, PC, SFR, PSW/Flags?
4. What is an instruction set?
5. What is a memory map? Why is it needed?
6. What is an assembly language program? How does it look?
7. What are 8051 addressing modes?


8. What is the difference between Timer and Counter?
9. What is the purpose of TCON / TMOD registers?
10. Why are SCON and SMOD significant?
11. In how many modes can the timer be operated?
9
Advanced Architectures

Learning Outcomes
R Basic Introduction to Processors
R ARM Architecture
• Different versions
• ARM internal-core block diagram
• Instruction set
• Programming model and data types
• C Assignments in ARM—few examples
R SHARC Architecture
• Working principle
• Addressing modes
• C Assignments with examples
R ARM vs. SHARC
R Blackfin Processors
• Core features
• Memory and DMA
• Microcontroller features
• Peripherals
R Texas Instruments—DSP Processor
R Assembly Language Programming on Hardware Processors
R Recap
R Quiz

9.1 BASIC INTRODUCTION TO PROCESSORS


A microprocessor incorporates the functions of a computer’s Central Processing
Unit (CPU) on a single integrated circuit. It is a multipurpose, programmable,
clock-driven, register based electronic device that accepts binary data as input,
processes it according to instructions stored in its memory, and provides results
as output.
The first microprocessors emerged in the early 1970s and were used for electronic
calculators, using binary-coded decimal arithmetic on 4-bit words. Other
embedded uses of 4-bit and 8-bit microprocessors, such as terminals, printers,
various kinds of automation etc., followed soon after. Affordable 8-bit
microprocessors with 16-bit addressing also led to the first general purpose
microcomputers from the mid 1970s on.

9.2 ARM ARCHITECTURE


The ARM is a 32-bit Reduced Instruction Set Computer (RISC) Instruction Set
Architecture (ISA) developed by ARM Holdings. It was known as the Advanced
RISC Machine, and before that as the Acorn RISC Machine. The ARM
architecture is the most widely used 32-bit ISA in terms of numbers produced.
ARM was originally conceived as a processor for desktop personal computers
by Acorn Computers, a market now dominated by the x86 families used in IBM
PC compatibles and Apple Macintosh computers. The relative simplicity of ARM
processors makes them suitable for low-power applications. As a result, they
have become dominant in the mobile and embedded electronics market as
relatively low-cost, small microprocessors and microcontrollers.
• ARM is one of the most licensed and thus widespread processor cores in
the world.
• Used especially in portable devices due to low power consumption and
reasonable performance (MIPS/watt).
• Several interesting extensions available or in development like Thumb
instruction set and Jazelle Java machine.

9.2.1 Different Versions of ARM Processor


Processor cores: ARM6, ARM7, ARM9, ARM10, ARM11
• Extensions: Thumb, El Segundo, Jazelle etc.
• IP-blocks: UART, GPIO, memory controllers, etc.
Table: 9.1: ARM Processor – Different Versions
CPU         Description               ISA   Process  Voltage  Area/mm2  Power/mW  Clock/MHz  MIPS/MHz
ARM7TDMI    Core                      V4T   0.18u    1.8V     0.53      <0.25     60-110     0.9
ARM7TDMI-S  Synthesizable             V4T   0.18u    1.8V     <0.8      <0.4      >50        0.9
ARM9TDMI    Core                      V4T   0.18u    1.8V     1.1       0.3       167-220    1.1
ARM920T     Macrocell, 16+16kB cache  V4T   0.18u    1.8V     11.8      0.9       140-200    1.05
ARM940T     Macrocell, 8+8kB cache    V4T   0.18u    1.8V     4.2       0.85      140-170    1.05
ARM9E-S     Synthesizable core        V5TE  0.18u    1.8V     ?         ~1        133-200    1.1
ARM1020E    Macrocell, 32+32kB cache  V5TE  0.18u    1.8V     ~10       ~0.85     200-400    1.25
Table 9.1 gives the different versions of ARM processor including the
specifications. This includes CPU, Description, ISA, Voltage, Clock speed and
MIPS details. The various properties of ARM architecture are detailed below:
• 32-bit RISC processor core (32-bit instructions)
• 37 32-bit integer registers (16 available at a time)
• Pipelined (ARM7: 3 stages)
• Cached (depending on the implementation)
• Von Neumann-type bus structure (ARM7), Harvard (ARM9)
• 8 / 16 / 32-bit data types
• 7 modes of operation (usr, fiq, irq, svc, abt, sys, und)

9.2.2 ARM Internals — Core Block Diagram


The figure below (Fig. 9.1) gives the internal details of the ARM processor;
the core modes of operation are detailed after it.
[The figure shows the ARM core datapath: an address register with incrementer
feeding the address bus; a register bank of 31 x 32-bit registers plus 6 status
registers; Booth's multiplier, a barrel shifter and the 32-bit ALU on the
internal A and B buses; the instruction pipeline with read and write data
registers on the data bus; and the instruction decoder and control logic
driving the external control signals.]
Fig. 9.1: ARM processor—Internals


R User (usr): Normal program execution state
R FIQ (fiq): Data transfer state (fast irq, DMA-type transfer)
R IRQ (iqr): Used for general interrupt services
R Supervisor (svc): Protected mode for operating system support
R Abort mode (abt): Selected when data or instruction fetch is aborted


R System (sys): Operating system ‘privilege’-mode for user
R Undefined (und): Selected when undefined instruction is fetched

9.2.3 ARM-register Set


The register set of the ARM processor is detailed below.
R Register structure depends on the mode of operation
R 16 32-bit integer registers, R0–R15, are available in ARM mode
R R0–R12 are general-purpose registers
R R13 is the Stack Pointer (SP)
R R14 is the subroutine Link Register
R It holds the value of R15 when a BL instruction is executed
R R15 is the Program Counter (PC)
R Bits 1 and 0 are zeroes in ARM state (32-bit addressing)
R R16 is the state register (CPSR, Current Program Status Register)
There are 37 ARM registers in total, of which a varying subset is available as
banked registers depending on the mode of operation. R13 always functions as
the stack pointer. R14 functions as the link register in modes other than sys
and usr. SPSR = Saved Program Status Register. The mode bits of the flag
register tell the processor its operating mode and thus which registers are
available.

9.2.4 ARM-instruction Set


• Full 32-bit instruction set in native operating mode
• 32-bit long instruction word
• All instructions are conditional
• Normal execution with condition AL (always)
• For a RISC-processor, the instruction set is quite diverse with different
addressing modes.

9.2.5 ARM Programming Model and Data Types


• Traditional set of registers
– 15 32-bit General purpose registers
– Program Counter (PC)
– Current Program Status Register (CPSR)
– Stores condition code bits to record results of comparisons
• The memory system


– Memory is byte addressable
– 32-bit addresses
– Data access can be 8-bit bytes, 16-bit half words, or 32-bit words
• Load/Store Instructions
– LDR, LDRH, LDRB: load (word, half-word, byte)
– STR, STRH, STRB: store (word, half-word, byte)
– Addressing modes:
  Register indirect:            LDR r0,[r1]
  With second register:         LDR r0,[r1,-r2]
  With constant:                LDR r0,[r1,#4]
  Base-plus-offset addressing:  LDR r0,[r1,#16]
  Auto-indexing (increments the base register): LDR r0,[r1,#16]!
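These addressing modes can be sketched with a small Python model, where a dictionary stands in for byte-addressable memory (illustrative only, not ARM semantics in full detail):

```python
def ldr(memory, regs, base, offset=0, writeback=False):
    """Model LDR r0,[base,#offset]; with writeback=True, model the
    auto-indexing form LDR r0,[base,#offset]! which updates the base."""
    address = regs[base] + offset
    value = memory[address]
    if writeback:            # auto-indexing: the base register is updated
        regs[base] = address
    return value

regs = {"r1": 0x1000}
memory = {0x1000: 11, 0x1010: 22}
print(ldr(memory, regs, "r1"))                      # 11: register indirect
print(ldr(memory, regs, "r1", 16, writeback=True))  # 22: base-plus-offset
print(hex(regs["r1"]))                              # 0x1010: base written back
```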

9.2.6 C Assignments in ARM—A Few Examples


• C:
    x = a + b;
• ARM:
    ADR r4,a        ; get address for a
    LDR r0,[r4]     ; get value of a
    ADR r4,b        ; get address for b, reusing r4
    LDR r1,[r4]     ; get value of b
    ADD r3,r0,r1    ; compute a+b
    ADR r4,x        ; get address for x
    STR r3,[r4]     ; store value of x
• The Branch instruction (B) changes the flow of control:
    B #100          ; adds 400 to the PC
    BEQ             ; branches on equal, from the CPSR
    BGT             ; branches on greater than, from the CPSR
• Loops, if statements, and switch/case statements can be implemented with
branching
• All operations can be performed conditionally, testing the CPSR, by
appending one of the following condition codes:
    EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
9.3 SHARC ARCHITECTURE


The Super Harvard Architecture Single-Chip Computer (SHARC) is a
high performance floating-point and fixed-point DSP from Analog Devices.
SHARC is used in a variety of signal processing applications ranging from single-
CPU guided artillery shells to 1000-CPU over-the-horizon radar processing
computers. The original design dates to about January 1994. SHARC processors
are or were used because they have offered good floating-point performance
per watt. SHARC processors are typically intended to have a good number of
serial links to other SHARC processors nearby, to be used as a low-cost
alternative to SMP. The SHARC can issue some computations in parallel:
– dual add/subtract;
– fixed-point multiply/accumulate and add, subtract, or average;
– floating-point multiply and ALU operation;
– multiplication and dual add/subtract.
The SHARC has a 32-bit word-addressed address space. Depending on word
size this is 16 GB, 20 GB, or 24 GB. SHARC instructions may contain a 32-bit
immediate operand. Instructions without this operand are generally able to
perform two or more operations simultaneously. Many instructions are conditional,
and may be preceded with “if condition” in the assembly language. There are
a number of condition choices, similar to the choices provided by the x86 flags
register. There are two delay slots. After a jump, two instructions following the
jump will normally be executed.
The SHARC processor has built-in support for loop control. Up to six levels
of nesting may be used, avoiding the need for normal branching instructions
and the usual bookkeeping related to loop exit.
The SHARC has two full sets of general purpose registers. Code can instantly
switch between them, allowing for fast context switches between an application
and an OS or between two threads. The table below lists the operating systems
available for the SHARC processor.

9.3.1 SHARC Working Principle


The SHARC Processor family dominates the floating-point DSP market with
exceptional core and memory performance and outstanding I/O throughput. For
as little as 319 MFLOPS/dollar, SHARC brings floating-point processing
performance to applications where dynamic range is key. ADI helps automotive
design engineers meet their system design objectives by combining over 40
years of signal processing experience with the industry’s leading portfolio of
user validated signal processing circuit designs. ADI’s technologies are used to
address the most challenging signal chain requirements in advanced safety,
powertrain, and chassis electronics systems in electric and fossil-fuel
powered vehicles worldwide.
SHARC PROCESSOR OPERATING SYSTEMS

Operating System  Third Party Contact  Description
ThreadX           Express Logic        High-performance, high-quality RTOS;
                                       small footprint, deterministic.
µC/OS-II          Micrium              A portable, scalable, preemptive,
                                       real-time multitasking kernel for
                                       microprocessors and microcontrollers.

The SHARC is a Harvard-architecture, word-addressed VLIW processor; it knows
nothing of 8-bit or 16-bit values, since each address points to a whole 32-bit
word rather than a byte. It is a little-endian processor, so a 64-bit integer
is stored in memory with the least significant word preceding the most
significant word.
The word size is 48-bit for instructions, 32-bit for integers and normal floating-
point, and 40-bit for extended floating-point. Code and data are normally fetched
from on-chip memory. Small data types may be stored in wider memory, simply
wasting the extra space. A system that does not use 40-bit extended floating-
point might divide the on-chip memory into two sections, a 48-bit one for code
and a 32-bit one for everything else. Most memory-related CPU instructions
cannot access all the bits of 48-bit memory, but a special 48-bit register is provided
for this purpose. The special 48-bit register may be accessed as a pair of smaller
registers, allowing movement to and from the normal registers.

9.3.2 SHARC Addressing Modes


• Immediate value:
– R0 = DM(0x20000000);
• Direct load:
– R0 = DM(_a); ! loads contents of _a
• Direct store:
– DM(_a) = R0; ! stores R0 at _a
• Base-plus-offset:
– R0 = DM(M1, I0); ! loads from location I0 + M1
• Load/store architecture:
– No memory-direct operations; all data must be loaded into registers
• Two Data Address Generators (DAGs):
– PM: Program Memory
– DM: Data Memory


• Must set up DAG registers to control loads/stores
– DAG registers automatically update to give quick access to arrays
• The compiler allows the programmer to control where data is placed in memory
– Either data memory or program memory:
  float dm a[N]; ! data memory
  float pm b[N]; ! program memory
• Two data loads in one cycle:
F0 = DM(M0,I0), F1 = PM(M8,I9);
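The DAG registers make such dual fetches cheap because they post-modify: the index register I supplies the address, then advances by the modify register M. A behavioural sketch in Python (the register names mirror the assembly above; the helper itself is illustrative):

```python
def dag_load(memory, dag, i, m):
    """Model R0 = DM(I0,M0): load from the address in I0, then I0 += M0."""
    value = memory[dag[i]]
    dag[i] += dag[m]  # post-modify: the pointer advances automatically
    return value

dm = {0x100: 1.5, 0x101: 2.5, 0x102: 3.5}
dag = {"I0": 0x100, "M0": 1}
samples = [dag_load(dm, dag, "I0", "M0") for _ in range(3)]
print(samples)  # [1.5, 2.5, 3.5]: the array was walked with no explicit adds
```

This is why DSP inner loops on the SHARC need no separate pointer-increment instructions: the address arithmetic happens in the DAG alongside the load.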

9.3.3 SHARC – C Assignments with Examples


• C:
    x = a + b;
• SHARC:
    R1 = DM(_a);  ! load a
    R2 = DM(_b);  ! load b
    R0 = R1 + R2;
    DM(_x) = R0;  ! store result in x
• Shorter version using pointers:
    R2 = DM(I1,M5), R1 = PM(I8,M13);
    R0 = R2 + R1;
    DM(I0,M5) = R0;  ! store in x
• The variable b is in DM and a is in PM
• The jump is the basic control-transfer mechanism
– Jumps can use direct or indirect addressing
– Indirect jumps use the DAG2 (PM) registers
– Jumps can also be PC-relative
• All Instructions may be executed conditionally
• Conditions come from:
– Arithmetic status (ASTAT)
– Mode control 1 (MODE1) and loop register

9.4 ARM VS. SHARC


• ARM7 is Von Neumann architecture and ARM9 is Harvard architecture
• SHARC is modified Harvard architecture.
– On-chip memory (> 1Gbit) evenly split between Program Memory (PM)
and Data Memory (DM)
– Program memory can be used to store some data.
– Allows data to be fetched from both memory in parallel
Some interesting applications of ARM and SHARC processors include:
• ARM
– Compaq iPAQ
– Nintendo Gameboy
• SHARC
– Cellular phone
– Music synthesis
– Stereo Receivers

9.5 BLACKFIN PROCESSORS


Blackfin 16/32-bit embedded processors offer software flexibility and scalability
for convergent applications: multi-format audio, video, voice and image processing,
multimode baseband and packet processing, control processing, and real-time
security.
Blackfin processors use a 32-bit RISC microcontroller programming model
on a SIMD architecture, which was co-developed by Intel and Analog Devices,
as MSA (Micro Signal Architecture). The Blackfin processor architecture was
announced in December, 2000 and first demonstrated at the Embedded Systems
Conference in June, 2001.
The Blackfin architecture incorporates aspects of ADI’s older SHARC
architecture and Intel’s XScale architecture into a single core, combining Digital
Signal Processing (DSP) and microcontroller functionality. There are many
differences in the core architecture between Blackfin/MSA and XScale/ARM
or SHARC, but the combination provides improvements in performance,
programmability and power consumption over traditional DSP or RISC
architecture designs.

9.5.1 Core Features


• For some applications, the DSP is central. It combines two 16-bit hardware
MACs, two 40-bit ALUs, and a 40-bit barrel shifter. This allows the
processor to execute up to three instructions per clock cycle, depending
on the level of optimization performed by the compiler and/or programmer.
• Other applications emphasize the RISC core. It includes memory
protection, different operating modes (user, kernel), single-cycle opcodes,
data and instruction caches, and instructions for bit test, byte, word, or
integer accesses and a variety of on-chip peripherals.
The ISA also features a high level of expressiveness, allowing the assembly
programmer (or compiler) to highly optimize an algorithm to the hardware features
present.

Fig. 9.2: Analog Devices—Blackfin processor

9.5.2 Memory and DMA


The Blackfin uses a byte-addressable, flat memory map. Internal L1 memory,
internal L2 memory, external memory and all memory-mapped control registers
reside in this 32-bit address space, so that from a programming point-of-view,
the Blackfin has Von Neumann architecture.
The L1 internal SRAM memory, which runs at the core-clock speed of the
device, is based on a Harvard Architecture. Instruction memory and data memory
are independent and connect to the core via dedicated memory buses which
allows for high sustained data rates between the core and L1 memory. Portions
of instruction and data L1 SRAM can be optionally configured as cache
(independently). Certain Blackfin processors also have between 64KB and
256KB of L2 memory. This memory runs slower than the core clock speed.
Code and data can be mixed in L2.
Blackfin processors support a variety of external memories including SDRAM,
DDR-SDRAM, NOR FLASH, NAND FLASH and SRAM. Some Blackfin also
include mass-storage interfaces such as ATAPI, and SD/SDIO. They can support
hundreds of megabytes of memory in the external memory space.
Coupled with the significant core and memory system is a DMA engine
that can operate between any of its peripherals and main (or external) memory.
The processors typically have a dedicated DMA channel for each peripheral,
which enables very high throughput for applications that can take advantage of
it such as real time standard definition (D1) video encoding and decoding.
9.5.3 Microcontroller Features


The architecture contains the usual CPU, memory, and I/O found on
microprocessors or microcontrollers.
• Memory Protection Unit: All Blackfin processors contain a Memory
Protection Unit (MPU). The MPU provides protection and caching
strategies across the entire memory space. The MPU allows Blackfin to
support many full-featured operating systems, RTOSs and kernels like
Thread X, µC/OS-II, or (noMMU) Linux. The Blackfin MPU does not
provide address translation like a traditional Memory Management Unit
(MMU) thus it does not support virtual memory or separate memory
addresses per process. This is why Blackfin currently cannot support
operating systems requiring virtual memory such as WinCE or QNX.
Confusingly, in most of the Blackfin documentation, the MPU is referred
to as an MMU.
• User/Supervisor Modes: Blackfin supports three run-time modes:
supervisor, user and emulation. In supervisor mode, all processor resources
are accessible from the running process. However, when in user mode,
system resources and regions of memory can be protected (with the help
of the MPU). In a modern operating system or RTOS, the kernel typically
runs in supervisor mode and threads/processes will run in user mode. If a
thread crashes or attempts to access a protected resource (memory,
peripheral, etc.) an exception will be thrown and the kernel will then be
able to shut down the offending thread/process. The official guidance
from ADI on how to use the Blackfin in non-OS environments is to reserve
the lowest-priority interrupt for general purpose code, so that all software
runs in supervisor space. This workaround would be less of a limitation if the
Blackfin had more than 9 general-purpose interrupt vectors.
• Variable-Length, RISC-Like Instruction Set: Blackfin supports 16, 32 and
64-bit instructions. Commonly used control instructions are encoded as
16-bit opcodes while complex DSP and mathematically intensive functions
are encoded as 32 and 64-bit opcodes. This variable length opcode encoding
allows Blackfin to achieve good code density equivalent to modern
microprocessor architectures.

9.5.4 Peripherals
Blackfin processors contain a wide array of connectivity peripherals.
• USB 2.0 OTG (On-The-Go)
• ATAPI
• MXVR: A MOST (Media Oriented Systems Transport) Network Interface
Controller.
• PPI (Parallel Peripheral Interface): A parallel input/output port that can
be used to connect to LCDs, video encoders (video DACs), video decoders
(video ADCs), CMOS sensors, CCDs and generic, parallel and high-
speed devices. The PPI can run up to 75 MHz and can be configured
from 8 to 16-bits wide.
• SPORT: A synchronous, high speed serial port that can support TDM, I2S
and a number of other configurable framing modes for connection to
ADCs, DACs, other processors, FPGAs, etc.
• CAN: A wide area, low speed serial bus that is fairly popular in automotive
and industrial electronics.
• UART (Universal Asynchronous Receiver Transmitter) : Allows for bi-
directional communication with RS232 devices (PCs, modems, PC
peripherals, etc.), MIDI devices, and IrDA devices.
• SPI : A fast serial bus used in many high-speed embedded electronics
applications.
• I²C (also known as TWI (Two-Wire interface)) : A lower speed, shared
serial bus.
Because all of the peripheral control registers are memory-mapped in the normal
address space, they are quite easy to set up.
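Setting up such a memory-mapped peripheral in C typically looks like the sketch below. The register name, control bit, and address are invented for illustration; they are not taken from a Blackfin datasheet.

```c
#include <stdint.h>

/* Hypothetical UART control bit -- illustrative only. */
#define UART_ENABLE (1u << 0)

/* In real firmware 'ctrl' would point at a fixed peripheral address,
   e.g. (volatile uint32_t *)0xFFC00400 (address invented here).
   'volatile' tells the compiler every access really touches hardware. */
static void uart_enable(volatile uint32_t *ctrl)
{
    *ctrl |= UART_ENABLE;   /* ordinary C read-modify-write */
}
```

Because the register behaves like ordinary memory, no special I/O instructions are needed; this is what makes memory-mapped peripherals "quite easy to set up."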

9.6 TI-DSP PROCESSORS


Texas Instruments TMS320 is a blanket name for a series of Digital Signal
Processors (DSPs) from Texas Instruments. It was introduced on April 8, 1983
through the TMS32010 processor, which was then the fastest DSP on the market.
The processor is available in many different variants, some with fixed point
arithmetic and some with floating point arithmetic. The floating point DSP
TMS320C3x, which exploits delayed branch logic, has as many as three delay
slots.
The flexibility of this line of processors has led to it being used not merely
as a co-processor for digital signal processing but also as a main CPU. Newer
implementations support standard IEEE JTAG control for boundary scan and/or
in-circuit debugging.
The TMS320 architecture has been around for a while so a number of
product variants have developed. The product codes used by Texas Instruments
after the first TMS32010 processor have involved a very popular series of
processor named TMS320Cabcd where a is the main series, b the generation
and cd is some custom number for a minor subvariant.
For this reason people working with DSPs often abbreviate a processor as
“C5x” when the actual name is something like TMS320C5510, since all products
obviously have the name “TMS320” and all processors with “C5” in the name
are code compatible and share the same basic features. Sometimes you will
even hear people talking about “C55x” and similar subgroupings, since processors
in the same series and same generation are even more similar.
The TMS320 series can be programmed using C, C++, and/or assembly
language. Most work on the TMS320 processors is done using Texas Instruments
proprietary tool chain and their integrated development environment Code
Composer Studio, which includes a mini operating system called DSP/BIOS.
Additionally, a department at the Chemnitz University of Technology has
developed preliminary support for the TMS320C6x series in the GNU Compiler
Collection.
In November 2007, TI released part of its tool chain as freeware for non-
commercial users, offering the bare compiler, assembler, optimizer and linker
under a proprietary license. However, neither the IDE nor a debugger was
included, so for debugging and JTAG access to the DSPs, users still need to
purchase the complete tool chain.

9.7 ASSEMBLY LANGUAGE PROGRAMMING ON HARDWARE PROCESSORS
An assembly language is a low-level programming language for computers,
microprocessors, microcontrollers, and other programmable devices. It
implements a symbolic representation of the machine codes and other constants
needed to program a given CPU architecture. This representation is usually
defined by the hardware manufacturer, and is based on mnemonics that symbolize
processing steps (instructions), processor registers, memory locations, and other
language features. An assembly language is thus specific to a particular physical
(or virtual) computer architecture. This is in contrast to most high-level programming
languages, which, ideally, are portable.
A utility program called an assembler is used to translate assembly language
statements into the target computer’s machine code. The assembler performs a
more or less isomorphic translation (a one-to-one mapping) from mnemonic
statements into machine instructions and data. This is in contrast with high-level
languages, in which a single statement generally results in many machine
instructions.
Many sophisticated assemblers offer additional mechanisms to facilitate
program development, control the assembly process, and aid debugging. In
particular, most modern assemblers include a macro facility (described below),
and are called macro assemblers.
Typically a modern assembler creates object code by translating assembly
instruction mnemonics into opcodes, and by resolving symbolic names for memory
locations and other entities. The use of symbolic references is a key feature of
assemblers, saving tedious calculations and manual address updates after program
modifications. Most assemblers also include macro facilities for performing textual
substitution—e.g., to generate common short sequences of instructions as inline,
instead of called subroutines.
Assemblers are generally simpler to write than compilers for high-level
languages, and have been available since the 1950s. Modern assemblers,
especially for RISC architectures, such as SPARC or POWER, as well as x86
and x86–64, optimize instruction scheduling to exploit the CPU pipeline efficiently.
Some examples are as below:

MOV AL, 1h ; Load AL with immediate value 1
MOV CL, 2h ; Load CL with immediate value 2
MOV DL, 3h ; Load DL with immediate value 3
In each case, the MOV mnemonic is translated directly into an opcode in the
ranges 88–8E, A0–A3, B0–B8, C6 or C7 by an assembler, and the programmer
does not have to know or remember which.
Transforming assembly language into machine code is the job of an
assembler, and the reverse can at least partially be achieved by a disassembler.
Unlike high-level languages, there is usually a one-to-one correspondence
between simple assembly statements and machine language instructions.
However, in some cases, an assembler may provide pseudo instructions
(essentially macros) which expand into several machine language instructions
to provide commonly needed functionality. For example, for a machine that
lacks a “branch if greater or equal” instruction, an assembler may provide a
pseudo instruction that expands to the machine’s “set if less than” and “branch
if zero (on the result of the set instruction)”. Most full-featured assemblers also
provide a rich macro language (discussed below) which is used by vendors and
programmers to generate more complex code and data sequences.
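The "branch if greater or equal" case above can be sketched as follows. The mnemonics are MIPS-style and purely illustrative; they are not drawn from any specific assembler manual:

```
; hypothetical pseudo instruction:
;     bge  $a, $b, target      ; branch if $a >= $b
; a typical expansion into two real instructions:
    slt  $t, $a, $b            ; set $t = 1 if $a < $b, else $t = 0
    beq  $t, $zero, target     ; branch if $t == 0, i.e. if $a >= $b
```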
Instructions (statements) in assembly language are generally very simple,
unlike those in high-level languages. Generally, a mnemonic is a symbolic name
for a single executable machine language instruction (an opcode), and there is
at least one opcode mnemonic defined for each machine language instruction.
Each instruction typically consists of an operation or opcode plus zero or more
operands. Most instructions refer to a single value, or a pair of values. Operands
can be immediate (typically one byte values, coded in the instruction itself),
registers specified in the instruction, implied or the addresses of data located
elsewhere in storage. This is determined by the underlying processor architecture:
the assembler merely reflects how this architecture works. Extended mnemonics
are often used to specify a combination of an opcode with a specific operand,
e.g., the System/360 assemblers use B as an extended mnemonic for BC with a
mask of 15 and NOP for BC with a mask of 0.
Extended mnemonics are often used to support specialized uses of
instructions, often for purposes not obvious from the instruction name. For
example, many CPUs do not have an explicit NOP instruction, but do have
instructions that can be used for the purpose. On the 8086, the instruction
xchg ax, ax serves as a nop, with nop being a pseudo-opcode that the assembler
encodes as xchg ax, ax. Some disassemblers recognize this and will decode
the xchg ax, ax instruction as nop. Similarly, IBM assemblers for System/360
and System/370 use the extended mnemonics NOP and NOPR for BC and BCR
with zero masks.
Some assemblers also support simple built-in macro-instructions that generate
two or more machine instructions. For instance, with some Z80 assemblers the
instruction ld hl,bc is recognized and generates ld l,c followed by ld h,b. These
are sometimes known as pseudo-opcodes.

POINTS TO REMEMBER
1. A processor register (or general purpose register) is a small amount of
storage available on the CPU whose contents can be accessed more
quickly than storage available elsewhere. 8051 follows Harvard
architecture and it has an 8-bit ALU.
2. The ARM architecture is licensable. Companies that are current or
former ARM licensees include Alcatel-Lucent, Apple Inc., Atmel,
Broadcom, etc.
3. Off-chip memory can be used with the SHARC. This memory can only
be configured for one single size. If the off-chip memory is configured
as 32-bit words to avoid waste, then only the on-chip memory may be
used for code execution and extended floating-point.
4. The Blackfin architecture encompasses various different CPU models,
each targeting particular applications.
5. The C54x family of TI DSP processors was a popular choice for 2G
software-defined cell phone radios (particularly GSM) circa the late
1990s, when many Nokia and Ericsson cell phones made use of it.
6. DA25x is an ARM processor and a C55x core. It has some on-chip
peripherals like a USB slave controller and security features.
Documentation of this chip is only available after signing a Texas
Instruments NDA.
7. Assembly can sometimes be portable across different operating systems
on the same type of CPU.
8. There are some situations in which practitioners might choose to use
assembly language, such as when: interacting directly with the hardware,
for example in device drivers and interrupt handlers.

9.8 QUIZ (Readers are expected to answer these questions)


1. How does an ARM processor differ from a TI-DSP processor?
2. Features to be considered while selecting a processor—Discuss.
3. What is an instruction register?
4. What is cache? What is the usual size of cache in processors?
5. What is an assembly language program? Where is it needed?
6. What are address registers?
7. What is the difference between ARM and Blackfin processors?
8. What is the advantage of assembly coding over programming languages?
9. What is the latest TI-DSP release?
10. Think of and list the differences between 2G and 3G without referring
to the internet.
10
Coding Guidelines

Learning Outcomes:
R Need for standards in coding
R Limitations associated with coding
R Existing standards for programming
R Summary

10.1 CODING STANDARDS—THE DEFINITION


A coding standard is a set of rules describing how your code should look, which
features of the programming language you will use and how and possibly which
tools should be used to write the code. It can of course be as specific as you
want it to be.
Obviously, different programming environments come with unavoidable
differences. You can use your coding conventions to describe those differences.
Especially, if your team is already familiar with an environment, it is useful to
describe how to duplicate behaviour that might not be available in the new
environment. For example, if your team is new to Java, you could describe how
to achieve constructs familiar to C++ programmers like conditional compilation
or assertions.
With rapid technological changes it is often difficult for development teams
to identify the new features that exist in newer versions and the pitfalls of using
certain constructs. Languages like C++ and Java offer powerful constructs
such as throwable exceptions or garbage collection. These techniques might be
new to some members of your team. Although these new techniques usually
make programs easier to understand, they can also have a major influence on
the memory usage and general performance of your applications. Your coding
conventions should warn about the possible abuse of those techniques.
Coding conventions are only useful if they are in tune with the work your team
is currently doing. To ensure that they don’t become obsolete, they have to be
constantly updated and adapted to the latest technologies used by your team.

10.2 THE PURPOSE


The goal of these guidelines is to create uniform coding habits among software
personnel in the engineering department so that reading, checking, and maintaining
code written by different persons becomes easier. The intent of these standards
is to define a natural style and consistency, yet leave to the authors of the
engineering department source code, the freedom to practice their craft without
unnecessary burden.
When a project adheres to common standards many good things happen:
• Programmers can go into any code and figure out what’s going on, so
maintainability, readability, and reusability are increased. Code walk through
becomes less painful.
• New people can get up to speed quickly.
• People new to a language are spared the need to develop a personal style
and defend it to death.
• People new to a language are spared making the same mistakes over and
over again, so reliability is increased.
• People make fewer mistakes in consistent environments.
• Idiosyncratic styles and college-learned behaviours are replaced with an
emphasis on business concerns—high productivity, maintainability, shared
authorship, etc.
Experience over many projects points to the conclusion that coding standards
help the project to run smoothly. They aren’t necessary for success, but they
help.
A mixed coding style is harder to maintain than a bad coding style. So it’s
important to apply a consistent coding style across a project. When maintaining
code, it’s better to conform to the style of the existing code rather than blindly
follow this document or your own coding style.
Since a very large portion of project scope is after-delivery maintenance or
enhancement, coding standards reduce the cost of a project by easing the learning
or re-learning task when code needs to be addressed by people other than the
author, or by the author after a long absence. Coding standards help to ensure
that the author need not be present for the maintenance and enhancement phase.

10.3 THE LIMITATIONS


One problem with coding standards is that there is no ISO, ANSI, or W3C
standard. Therefore, every organization or group of programmers has to come
up with its own set of coding standards. The objective of this document is to
define coding standards and guidelines while coding for different technologies.

10.4 COMMON PROGRAMMING STANDARDS


In this section, we cover the coding standards in different sections in a broader
sense.
The following subsections cover the following generic topics:
• Modularization
• Data typing
• Names
• Organizing control structures
• Program layout
• Comments and (program) documentation

10.4.1 Modularization
A module is a collection of objects that are logically related. Those objects may
include constants, data types, variables, and program units (e.g., functions,
procedures, etc.). Note that objects in a module need not be physically related.
For example, it is quite possible to construct a module using several different
source files. Likewise, it is quite possible to have several different modules in
the same source file. However, the best modules are physically related as well
as logically related; that is, all the objects associated with a module exist in a
single source file (or directory, if the source file would be too large) and nothing
else is present.
Modules contain several different objects including constants, types,
variables, and program units (routines). Modules share many of the attributes
with routines; this is not surprising since routines are the major components of
a typical module. However, modules have some additional attributes of their
own. The following sections describe the attributes of a well-written module.
Module Attributes
A module is a generic term that describes a set of program related objects
(routines as well as data and types of objects) that are somehow coupled. Good
modules share many of the same attributes as good routines as well as the
ability to hide certain details from code outside the module. Good modules exhibit
strong cohesion. That is, a module should offer a (small) group of services that
are logically related. For example, a “printer” module might provide all the services
one would expect from a printer. The individual routines within the module would
provide the individual services.
Good modules exhibit loose coupling. That is, there are only a few, well-
defined (visible) interfaces between the module and the outside world. Most
data is private, accessible only through accessing functions (see information
hiding below). Furthermore, the interface should be flexible. Good modules exhibit
information hiding. Code outside the module should only have access to the
module through a small set of public routines. All data should be private to that
module.
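The "printer" module above might be sketched in C as follows. The module would normally be split into a printer.h interface and a printer.c implementation; the names here are invented for illustration. The static keyword supplies the information hiding: the counter is invisible outside the file that defines it.

```c
#include <stdio.h>

static int pages_printed = 0;           /* private module data */

void printer_init(void)          { pages_printed = 0; }
int  printer_pages_printed(void) { return pages_printed; }

void printer_print(const char *text)
{
    printf("%s\n", text);
    ++pages_printed;                    /* state changes only through
                                           the public routines */
}
```

Code outside the module can only reach pages_printed through the accessor, which is exactly the loose coupling and information hiding described above.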

10.4.2 Data Typing, Declarations, Variables and Other Objects


Most languages’ built-in data types are abstractions of the underlying machine
organization and rarely does the language define the types in terms of exact
machine representations. For example, an integer variable may be a 16-bit two’s
complement value on one machine, a 32-bit value on another, or even a 64-bit
value. Clearly, a program written to expect 32 or 64 bit integers will malfunction
on a machine (or compiler) that only supports 16-bit integers. The reverse can
also be true.
One supposed advantage of a high level language is that it abstracts away
the machine dependencies that exist in data types. In theory, an integer is an
integer... In practice, there are short integers, integers, and long integers. Common
sizes include eight, sixteen, thirty-two, and even sixty-four bits, with more on the
way. Unfortunately, the abstraction the high level language provides can destroy
the ability to port a program from one machine to another.
Most modern high-level languages provide programmers with the ability to
define new data types as isomorphisms (synonyms) of existing types. Using this
facility, it is possible to define a data type module that provides precise definitions
for most data types. For example, you could define the int16 and int32 data
types that always use 16 or 32 bits, respectively. By doing so, you can easily
guarantee that your programs can easily port between most systems (and their
compilers) by simply changing the definition of the int16 and int32 types on the
new machine. Consider the following C/C++ example:
On a 16-bit machine:
typedef int int16;
typedef long int32;

On a 32-bit machine:
typedef short int16;
typedef int int32;
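Modern C (C99 and later) makes the same guarantee portable through the standard header <stdint.h>, so the per-machine typedef block is no longer necessary; a minimal sketch of the idea:

```c
#include <stdint.h>

/* int16_t and int32_t are guaranteed to be exactly 16 and 32 bits wide
   on every conforming C99 compiler, on both 16-bit and 32-bit machines. */
typedef int16_t int16;   /* always 16 bits */
typedef int32_t int32;   /* always 32 bits */
```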
Don’t redefine existing types. This may seem like a contradiction to the guideline
above, but it really isn’t. This statement says that if you have an existing type
that uses the name “integer” you should not create a new type named “integer”.
Doing so would only create confusion. Another programmer, reading your code,
may confuse the old “integer” type every time she/he sees a variable of type
integer. This applies to existing user types as well as predefined types. Since it
is possible to declare symbols at different points in a program, different
programmers have developed different conventions that concern the position of
their declarations. The two most popular conventions are the following:
– Declare all symbols at the beginning of the associated program unit
(function, procedure, etc.)
– Declare all variables as close as possible to their use.
Logically, the second scheme above would seem to be the best. However, it has
one major drawback – although names typically have only a single definition,
the program may use them in several different locations. But those who absolutely
desire to put their definitions as close to the for loop as possible can always do
something like the following:
// Previous statements in this code...
{
    int i;
    for (i = start; i <= end; ++i)
    {
        // Loop body here
    }
}
// Additional statements in this code...

10.4.3 Names
According to studies done, the use of high-quality identifiers in a program
contributes more to the readability of that program than any other single factor,
including high-quality comments. The quality of your identifiers can make or
break your program; program with high-quality identifiers can be very easy to
read, programs with poor quality identifiers will be very difficult to read. There
are very few “tricks” to developing high quality names; most of the rules are
nothing more than plain old fashion common sense. Unfortunately, programmers
(especially C/C++ programmers) have developed many arcane naming
conventions that ignore common sense. The biggest obstacle most programmers
face in learning how to create good names is an unwillingness to abandon existing
conventions. Yet their only defence, when quizzed on why they adhere to
(existing) bad conventions, seems to be “because that’s the way I’ve always
done it and that’s the way everybody else does it”.
Naming conventions represent one area in Computer Science where there
are far too many divergent views (program layout is the other principle area).
The primary purpose of an object’s name in a programming language is to describe
the use and/or contents of that object. A secondary consideration may be to
describe the type of the object. Programmers use different mechanisms to handle
these objectives. Unfortunately, there are far too many “conventions” in place;
it would be asking too much to expect any one programmer to follow several
different standards. Therefore, this standard will apply across all languages as
much as possible.
The vast majority of programmers know only one language, i.e., English.
Some programmers know English as a second language and may not be familiar
with a common non-English phrase that is not in their own language (e.g.,
rendezvous). Since English is the common language of most programmers, all
identifiers should use easily recognizable English words and phrases.
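The point is easiest to see side by side. Both routines below are invented for illustration; only the second tells the reader what it computes:

```c
/* Poor names: the reader must guess what the routine does. */
static int f(int a, int b) { return a * b / 100; }

/* Plain-English names: the intent is obvious at the call site. */
static int percentage_of(int value, int percent)
{
    return value * percent / 100;
}
```

A call such as percentage_of(price, 25) documents itself, while f(price, 25) forces the reader to go look up the definition.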

10.4.4 Organizing Control Structures


Although the control structures found in most modern languages trace their
roots back to Algol-60, there is a surprising number of subtle variations between
the control structures found in common programming languages in use today.
This section will describe a mechanism to unify the control structures the various
programming languages use in an attempt to make it possible for a Visual BASIC
programmer to easily understand code written in Pascal or C++ as well as
make it possible for C++ programmers to read BASIC and Pascal programs,
etc.
Typical programming languages contain eight flow-of-control statements:
two conditional selection statements (if..then..else and case/switch), four loops
(while, repeat...until/do...while, and for loop), a program unit invocation (i.e.,
procedure call), and a sequence. There are other less common control structures
that include processes/coroutines, for each loops (iterators), and generators, but
this section will focus only on the more common control mechanisms.
Control structures typically come in two forms: those that act on a single
statement as an operand and those that act on a sequence of statements. For
example, the if...then statement in Pascal operates on a single statement:
if (expression) then Single_Statement;
Of course it is possible to apply Pascal’s if statement to a list of statements,
but that involves creating a compound statement using a begin...end pair. There
are two problems with this type of statement. First of all, it introduces the problem
of where you are supposed to put the begin and end in a well-formatted program.
This is a very controversial issue with large numbers of programmers in different
camps. Some feel an if with a compound statement should look like this:
if (expression) then begin
{ Statement 1 }
{ Statement 2 }
.
.
.
{ Statement n }
end;
Others feel it should look like this:
if (expression) then
begin
{ Statement 1 }
{ Statement 2 }
.
.
.
{ Statement n }
end;
C/C++ programmers are even worse: there are no fewer than four common ways
of putting the opening and closing braces around a compound statement after an
“if”. The second problem with C/C++’s and Pascal’s “if” statements is the
ambiguity involved. Consider the following Pascal code:
if (expression) then
if (expression) then
(* Statement *)
else (* Statement *);
The nested code should be clearly indented. For example, in the code below it is
difficult to say which ‘if’ the ‘else’ belongs to.
Incorrect
if (expression) then
if (expression) then
endif;
else
endif;

Correct
if (expression) then
    if (expression) then
    endif;
else
endif;
Now there is no question that the else belongs to the first if above, not the
second. Note that this form of “if” statement allows you to attach a list of
statements (between if and else or if and endif) rather than a single or compound
statement. Furthermore, it totally eliminates the religious argument concerning
where to put the braces or the begin...end pair on the if. The complete set of
modern programming language constructs includes:
– if...then...elseif...else...endif
– select...case...default...endselect (typical case/switch statement).
– while...endwhile
– repeat...until
– loop...endloop
– for...endfor
– break
– breakif
– continue
Loops
There are three general categories of looping constructs available in common
high-level languages—loops that test for termination at the beginning of the loop
(e.g., while), loops that test for loop termination at the bottom of the loop (e.g.,
repeat...until), and those that test for loop termination in the middle of the loop
(e.g., loop...endloop). It is possible to simulate any one of these loops using any of
the others. This is particularly trivial with the loop...endloop construct:
/* Test for loop termination at beginning of LOOP...ENDLOOP */
loop
breakif (x==y);
.
.
.
endloop;
/* Test for loop termination in the middle of LOOP...ENDLOOP */
loop
.
.
.
breakif (x==y);
.
.
.
endloop;
/* Test for loop termination at the end of LOOP...ENDLOOP */
loop
.
.
.
breakif (x==y);
endloop;
Given the flexibility of the loop...endloop control structure, you might question
why one would even burden a compiler with the other loop statements. However,
using the appropriate looping structure makes a program far more readable,
therefore, you should never use one type of loop when the situation demands
another.
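In C, assuming while, do...while, and an infinite for with break stand in for the three pseudocode forms above, the three categories might be sketched as follows (the helper names are invented):

```c
/* Top-tested loop (while): may execute zero times. */
int count_up_top(int start, int limit)
{
    int n = start;
    while (n < limit)
        ++n;
    return n;
}

/* Bottom-tested loop (do...while): always executes at least once. */
int count_up_bottom(int start, int limit)
{
    int n = start;
    do {
        ++n;
    } while (n < limit);
    return n;
}

/* Middle-tested loop: C's closest analogue of loop...breakif...endloop. */
int count_up_middle(int start, int limit)
{
    int n = start;
    for (;;) {
        ++n;
        if (n >= limit)     /* breakif (n >= limit); */
            break;
    }
    return n;
}
```

Note the behavioural difference: with start already past limit, the top-tested form does nothing, while the bottom-tested form still runs its body once.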

10.4.5 Program Layout


After naming conventions and where to put braces (or Begin…End), the other
major argument programmers engage in is how to lay out a program, i.e., what
are the indentations one should use in a well written program? Unfortunately,
the ideal program layout is something that varies by language. The layout of an
easy to read C/C++ program is considerably different than that of an assembly
language, Prolog, or Bison/YACC program. As usual, this section will describe
those conventions that generally apply to all programs. It will also discuss layouts
of the standard control structures described earlier. According to McConnell
(Code Complete), research has shown that there is a strong correlation between
program indentation and comprehensibility. Miara et al. (“Program Indentation
and Comprehensibility”) concluded that indentation in the two to four character
range was optimal even though many subjects felt that six-space indentation
looked better. These results are probably due to the fact that the eye has to
travel less distance to read indented code and therefore the reader’s eyes suffer
from less fatigue.
Steve McConnell, in Code Complete, mentions several objectives of good program
layout:
“The layout should accurately reflect the logical structure of the code.
Code Complete refers to this as the “Fundamental Theorem of Formatting”.
White space (blank lines and indentation) is the primary tool one can use to
show the logical structure of a program.
Consistently represent the logical structure of the code. Some common
formatting conventions (e.g., those used by many C/C++ programmers) are full
of inconsistencies. For example, why does the “{” go on the same line as an “if”
but below “int main()” (or any other function declaration)? A good style applies
consistently.
Improve readability. If the indentation scheme makes a program harder to
read, why waste time with it? As pointed out earlier, some schemes make the
program look pretty but, in fact, make it harder to read (see the example about
2–4 vs. 6 position indentation, above)”.
Withstand modifications. A good indentation scheme shouldn’t force a
programmer to modify several lines of code in order to affect a small change to
one line. For example, many programmers put a begin...end block (or “{“...”}”
block) after an if statement even if there is only one statement associated with
the if. This allows the programmer to easily add new statements to the then
clause of the if statement without having to add additional syntactical elements
later.
The principal tool for creating good layout is white space (or the lack thereof,
that is, grouping objects). The following paragraphs summarize McConnell’s
findings on the subject:
Grouping: Related statements should be grouped together. Statements that
logically belong together should contain no arbitrary interleaving white space
(blank lines or unnecessary indentation).
Blank lines: Blank lines should separate declarations from the start of code,
logically related statements from unrelated statements, and blocks of comments
from blocks of code.
Alignment: Align objects that belong together. Examples include type names in
a variable declaration section, assignment operators in a sequence of related
assignment statements, and columns of initialized data.
Indentation: Indenting statements inside block statements improves readability;
see the comments and rules earlier in this section.
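The four tools above can be seen together in a small C sketch (a hypothetical function):

```c
/* Declarations are aligned and separated from the code by a blank
   line; the two related assignments are grouped together; the body
   is indented inside the block. */
double AverageOfThree(double a, double b, double c)
{
    double sum;      /* running total            */
    double result;   /* mean of the three inputs */

    sum    = a + b + c;
    result = sum / 3.0;

    return result;
}
```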
In theory, a line of source code can be arbitrarily long. In practice, there
are several practical limitations on source code lines. Paramount is the amount
of text that will fit on a given terminal display device and what can be printed on
a typical sheet of paper. If this isn’t enough to suggest an 80 character limit on
source lines, McConnell suggests that longer lines are harder to read.
If a statement approaches the maximum limit of 80 characters, it should be
broken up at a reasonable point and split across two lines. If the line is a control
statement that involves a particularly long logical expression, the expression
should be broken up at a logical point (e.g., at the point of a low-precedence
operator outside any parentheses) and the remainder of the expression placed
underneath the first part of the expression. E.g.,
if
(
( ( x + y * z) < ( ComputeProfits(1980,1990) / 1.0775 ) ) &&
( ValueOfStock[ ThisYear ] >= ValueOfStock[ LastYear ] )
)
<< statements >>
endif;
Many statements (e.g., IF, WHILE, FOR, and function or procedure calls) contain
a keyword followed by a parenthesis. If the expression appearing between the
parentheses is too long to fit on one line, consider putting the opening and closing
parentheses in the same column as the first character of the start of the statement
and indenting the remaining expression elements. The example above
demonstrates this for the “IF” statement. The following examples demonstrate
this technique for other statements:
while
(
( NumberOfIterations < MaxCount ) &&
( i <= NumberOfIterations )
)
<< Statements to execute >>
endwhile;
fprintf
(
stderr,
"Error in module %s at line #%d, encountered illegal value\n",
ModuleName,
LineNumber
);
For block statements there should always be a blank line between the line
containing an if, elseif, else, endif, while, endwhile, repeat, until, etc., and the
lines they enclose. This clearly differentiates statements within a block from a
possible continuation of the expression associated with the enclosing statement.
It also helps to clearly show the logical format of the code. Example:
if ( ( x = y ) and PassingValue( x, y ) ) then
Output( 'This is done' );
endif;
If a procedure, function, or other program unit has a particularly long actual or
formal parameter list, each parameter should be placed on a separate line. The
following (C/C++) examples demonstrate a function declaration and call using
this technique:
int
MyFunction
(
int NumberOfDataPoints,
float X1Root,
float X2Root,
float &YIntercept
);
x = MyFunction
(
GetNumberOfPoints(RootArray),
RootArray[ 0 ],
RootArray[ 1 ],
Solution
);
10.4.6 Comments and (program) Documentation


Almost everyone agrees that a program should have good comments.
Unfortunately, few people agree on the definition of a good comment. Some
people, in frustration, feel that minimal comments are the best. Others feel that
every line should have two or three comments attached to it. Everyone else
wishes they had good comments in their program but never seem to find the
time to put them in.
It is rather difficult to characterize a “good comment”. In fact, it’s much
easier to give examples of bad comments than it is to discuss good comments.
The following list describes some of the worst possible comments you can put in
a program:
• The absolute worst comment you can put into a program is an incorrect
comment. Consider the following Pascal statement:
A := 10; { Set 'A' to 11 }
• It is amazing how many programmers will automatically assume the
comment is correct and try to figure out how this code manages to set
the variable "A" to the value 11 when the code so obviously sets it to 10.
• The second worst comment you can place in a program is a comment
that explains what a statement is doing. The typical example is something
like "A := 10; { Set 'A' to 10 }". Unlike the previous example, this comment
is correct. But it is still worse than no comment at all because it is redundant
and forces the reader to spend additional time reading the code (reading
time is directly proportional to reading difficulty). This also makes it harder
to maintain since slight changes to the code (e.g., “A := 9”) require
modifications to the comment that would not otherwise be required.
• The third worst comment in a program is an irrelevant one. Telling a joke,
for example, may seem cute, but it does little to improve the readability of
a program; indeed, it offers a distraction that breaks concentration.
• The fourth worst comment is no comment at all.
• The fifth worst comment is a comment that is obsolete or out of date
(though not incorrect). For example, comments at the beginning of the file
may describe the current version of a module and who last worked on it.
If the last programmer to modify the file did not update the comments, the
comments are now out of date.
Steve McConnell provides a long list of suggestions for high-quality code. These
suggestions include:
Use commenting styles that don’t break down or discourage modification.
Essentially, he's saying pick a commenting style that isn't so much work that
people refuse to use it. He gives an example of a block of comments surrounded
by asterisks as being hard to maintain. This is a poor example since modern text
editors will automatically “outline” the comments for you. Nevertheless, the
basic idea is sound.
Comment as you go along. If you put commenting off until the last moment, then
it seems like another task in the software development process and management
is likely to discourage the completion of the commenting task in hopes of meeting
new deadlines.
Avoid self-indulgent comments. Also, you should avoid sexist, profane, or other
insulting remarks in your comments. Always remember, someone else will
eventually read your code.
Avoid putting comments on the same physical line as the statement they describe.
Such comments are very hard to maintain since there is very little room.
McConnell suggests that endline comments are okay for variable declarations.
For some this might be true but many variable declarations may require
considerable explanation that simply won’t fit at the end of a line. One exception
to this rule is “maintenance notes”. Comments that refer to a defect tracking
entry in the defect database are okay (note that the CodeWright text editor
provides a much better solution for this — buttons that can bring up an external
file). Endline comments are also useful for marking the end of a control structure
(e.g., “end{if};”).
Write comments that describe blocks of statements rather than individual
statements. Comments covering single statements tend to discuss the mechanics
of that statement rather than discussing what the program is doing.
Focus paragraph comments on the why rather than the how. Comments should
explain what the program is doing and why the programmer chose to do it that
way, rather than explain what each individual statement is doing.
Use comments to prepare the reader for what is to follow. Someone reading the
comments should be able to have a good idea of what the following code does
without actually looking at the code. Note that this rule also suggests that
comments should always precede the code to which they apply.
When you do need to resort to some tricky code, make sure you fully document
what you’ve done.
Avoid abbreviations. While there may be an argument for abbreviating identifiers
that appear in a program, this in no way applies to comments.
Keep comments close to the code they describe. The prologue to a program
unit should give its name, describe the parameters, and provide a short description
of the program. It should not go into details about the operation of the module
itself. Internal comments should do that.
Comments should explain the parameters to a function, assertions about these
parameters, and whether they are input, output, or in/out parameters.
Comments should describe a routine's limitations, assumptions, and any side
effects.

10.5 PROJECT DEPENDENT STANDARDS


The standards and guidelines described in this document were selected on the
basis of common coding practices of people within our group and from many
language specific programming standard documents collected from the Internet.
They can’t be expected to be complete or optimal for each project and for each
language. Individual projects may wish to establish additional standards beyond
those given here and the language specific documents. Keep in mind that sweeping
per-project customizations of the standards are discouraged in order to make it
more likely that code throughout a project and across projects adopt similar
styles.
This is a list of coding practices that should be standardized for each project,
and may require additional specification or clarification beyond those detailed in
the standards documents.
1. Naming conventions:
What additional naming conventions should be followed? In particular,
systematic prefix conventions for functional grouping of global data and
also for names of structures, objects, and other data types may be useful.
2. Project specific contents of module and subroutine headers
3. File Organization
What kind of Include file organization is appropriate for the project's data
hierarchy?
Directory structure
Location of Make Files
4. Specifications for Error Handling
Specifications for detecting and handling of errors
Specifications for checking boundary conditions for parameters passed to
subroutines
5. Revision and Version Control
Configuration of archives, projects, revision numbering, and release
guidelines.
6. Guidelines for the use of lint** or other code checking programs.
7. Standardization of the development environment—compiler and linker
options and directory structures.
** Lint is a code checking tool generally used when you port your application
across different development platforms.

10.6 SUMMARY
The experience of many projects leads to the conclusion that using coding
standards makes a project go more smoothly. Standards make the code readable,
easily maintainable and reusable. Since we cannot expect all the developers of
an application to remain on the project for its lifetime, coding standards are
a must for future enhancements and releases. Knowledge transfer to a new
team member becomes an easy task with more readable code. So the coding
standard plays an important role in software application development.
11 Embedded Systems—Application, Design and Coding Methodology

Learning Outcomes
• Embedded System Design
• Designers Perspective
• Requirements Specifications
• Implementation of the Proposed System
• Recap
• Quiz

11.1 EMBEDDED SYSTEM — DESIGN


In today's world, embedded systems are everywhere—homes, offices, cars,
factories, hospitals, planes and consumer electronics. Their huge numbers and
new complexity call for a new design approach, one that emphasizes high-level
tools and hardware/software tradeoffs, rather than low-level assembly language
programming and logic design.
This chapter presents the traditionally distinct fields of software and hardware
design in a new integrated approach. It covers trends and challenges, introduces
the design and use of single purpose processors (“hardware”) and general
purpose processors (“software”), describes memories and buses, illustrates
hardware/software tradeoffs by means of a digital camera example, and
discusses advanced computation models, control systems, chip technologies,
and modern design tools.
Introduction to a Simple Digital Camera
A digital camera (or digicam) is a camera that takes video or still photographs,
or both, digitally by recording images via an electronic image sensor. Most 21st
century cameras are digital.
Digital cameras can do things that film cameras cannot: displaying images on
a screen immediately after they are recorded, storing thousands of images on a
single small memory device, and deleting images to free storage space. The
majority, including most compact cameras, can record moving video with sound
as well as still photographs. Some can crop and stitch pictures and perform
other elementary image editing. Some have a GPS receiver built in, and can
produce Geotagged photographs.
Putting it all together:
• Captures images
• Stores images in digital format
– No film and Multiple images stored in camera
• Number depends on amount of memory and bits used per image
• Downloads images to PC
• Only recently possible
• Systems-on-a-chip with Multiple processors and memories on one IC
– High-capacity flash memory
• Very simple description used for example
– Many more features with real digital camera
• Variable size images, image deletion, digital stretching, zooming in and
out, etc.

11.2 DESIGNERS PERSPECTIVE


Two key tasks need to be kept in mind while designing a digital camera:
– Processing images and storing in memory
• When shutter pressed:
– Image captured
– Converted to digital form by Charge-Coupled Device (CCD)
– Compressed and archived in internal memory
– Uploading images to PC
• Digital camera attached to PC
• Special software commands camera to transmit archived images serially
[Figure: a lens area over an array of pixel rows and pixel columns; the
left-most columns are covered; an electromechanical shutter and electronic
circuitry complete the CCD.]

Fig. 11.1: Working principle of a digital camera

When exposed to light, each cell becomes electrically charged. This charge can
then be converted to an 8-bit value, where 0 represents no exposure while 255
represents very intense exposure of that cell to light. Some of the columns are
covered with a black strip of paint. The light intensity of these pixels is used for
zero-bias adjustments of all the cells.
The electromechanical shutter is activated to expose the cells to light for a brief
moment. The electronic circuitry, when commanded, discharges the cells,
activates the electromechanical shutter, and then reads the 8-bit charge value of
each cell. These values can be clocked out of the CCD by external logic through
a standard parallel bus interface.
• Manufacturing errors cause cells to measure slightly above or below actual
light intensity
• Error typically same across columns, but different across rows
• Some of left most columns blocked by black paint to detect zero-bias
error
– Reading of other than 0 in blocked cells is zero-bias error
– Each row is corrected by subtracting the average error found in blocked
cells for that row
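The row-wise correction described above can be sketched in C; the constants and the function name here are illustrative, not the book's code:

```c
#define NUM_COVERED 2    /* left-most columns under black paint (assumed) */
#define NUM_COLS    8    /* columns per row in this small sketch          */

/* Subtract from every cell the average reading of the row's covered
   cells, which would have read zero in the absence of bias error. */
void CorrectRowZeroBias(int row[NUM_COLS])
{
    int i;
    int bias = 0;

    for (i = 0; i < NUM_COVERED; i++)
        bias += row[i];
    bias /= NUM_COVERED;

    for (i = 0; i < NUM_COLS; i++)
        row[i] -= bias;
}
```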
[Figure: the same block shown before and after zero-bias adjustment, with
the covered cells supplying the adjustment value.]

Fig. 11.2: Zero bias error adjustment

Compression lets the camera:
• Store more images
• Transmit an image to the PC in less time
• JPEG (Joint Photographic Experts Group)
– Popular standard format for representing digital images in a compressed
form
– Provides for a number of different modes of operation
– Mode used in this chapter provides high compression ratios using DCT
(Discrete Cosine Transform)
– Image data divided into blocks of 8 × 8 pixels
– 3 steps performed on each block
• DCT
• Quantization
• Huffman encoding
• DCT (Discrete Cosine Transform) transforms the original 8 × 8 block into a
cosine-frequency domain
– Upper-left corner values represent more of the essence of the image
– Lower-right corner values represent finer details
• Can reduce precision of these values and retain reasonable image quality
• FDCT (Forward DCT) formula
– C(h) = if (h == 0) then 1/sqrt(2) else 1.0
• Auxiliary function used in main function F(u,v)
– F(u,v) = ¼ × C(u) × C(v) Σx=0..7 Σy=0..7 Dxy × cos(π(2x + 1)u/16) ×
cos(π(2y + 1)v/16)
• Gives encoded pixel at row u, column v
• Dxy is the original pixel value at row x, column y
• IDCT (Inverse DCT)
• IDCT (Inverse DCT)
Reverses process to obtain original block (not needed for this design)
• Achieve high compression ratio by reducing image quality
– Reduce bit precision of encoded data
• Fewer bits needed for encoding
• One way is to divide all values by a factor of 2
– Simple right shifts can do this
– Dequantization would reverse process for decompression
• Serialize 8 × 8 block of pixels
– Values are converted into single list using zigzag pattern

Fig. 11.3: Zig-zag scanning for compression methodology
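A minimal sketch of the right-shift quantization described above (the function name is illustrative; dequantization would shift left to approximately reverse it, losing the discarded low bits):

```c
/* Divide every coefficient in the 8 x 8 block by 2^shiftBits using
   right shifts; each extra shift trades image quality for a higher
   compression ratio. */
void QuantizeBlock(short block[8][8], int shiftBits)
{
    int x, y;

    for (x = 0; x < 8; x++)
        for (y = 0; y < 8; y++)
            block[x][y] >>= shiftBits;
}
```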

• Perform Huffman encoding


– More frequently occurring pixels assigned short binary code
– Longer binary codes left for less frequently occurring pixels
• Each pixel in serial list converted to Huffman encoded values
– Much shorter list, thus compression
• Record starting address and image size
– Can use linked list
• One possible way to archive images
– If max number of images archived is N:
• Set aside memory for N addresses and N image size variables
• Keep a counter for location of next available address
• Initialize addresses and image size variables to 0
• Set global memory address to N x 4
– Assuming addresses, image size variables occupy N x 4 bytes
• First image archived starting at address N x 4
• Global memory address updated to N x 4 + (compressed image size)


• Memory requirement based on N, image size, and average compression
ratio
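The archiving bookkeeping above can be sketched as follows; the names are illustrative, and the value 50 for N is taken from the capacity called for in the initial specification later in this chapter:

```c
#define N_IMAGES 50   /* maximum number of archived images */

static unsigned imageAddress[N_IMAGES];  /* start address of each image */
static unsigned imageSize[N_IMAGES];     /* compressed size in bytes    */
static unsigned nextSlot;                /* next free directory entry   */
static unsigned globalAddress;           /* next free byte of memory    */

void ArchiveInitialize(void)
{
    unsigned i;

    for (i = 0; i < N_IMAGES; i++) {
        imageAddress[i] = 0;
        imageSize[i] = 0;
    }
    nextSlot = 0;
    globalAddress = N_IMAGES * 4;   /* directory occupies N x 4 bytes */
}

/* Record a newly compressed image; returns its start address,
   or 0 if the directory is full. */
unsigned ArchiveImage(unsigned compressedSize)
{
    unsigned start;

    if (nextSlot >= N_IMAGES)
        return 0;

    start = globalAddress;
    imageAddress[nextSlot] = start;
    imageSize[nextSlot] = compressedSize;
    nextSlot++;
    globalAddress += compressedSize;
    return start;
}
```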
• When connected to PC and upload command received
– Read images from memory
– Transmit serially using UART
– While transmitting
• Reset pointers, image size variables and global memory pointer accordingly
System Requirements
– Nonfunctional requirements
• Constraints on design metrics (e.g., “should use 0.001 watt or less”)
– Functional requirements
• System’s behaviour (e.g., “output X should be input Y times 2”)
– Initial specification may be very general and come from marketing dept.
• E.g., short document detailing market need for a low-end digital camera
that:
– captures and stores at least 50 low-res images and uploads to PC,
– costs around $100 with single medium-size IC costing less than $25,
– has as long a battery life as possible,
– has expected sales volume of 200,000 if market entry < 6 months,
– 100,000 if between 6 and 12 months,
– insignificant sales beyond 12 months
– Design metrics of importance based on initial specification
– Performance: time required to process image
– Size: number of elementary logic gates (2-input NAND gate) in IC
– Power: measure of avg. electrical energy consumed while processing
– Energy: battery lifetime (power x time)
– Constrained metrics
– Values must be below (sometimes above) certain threshold
– Optimization metrics
– Improved as much as possible to improve product
– Metric can be both constrained and optimization
– Refine informal specification into one that can actually be executed
– Can use C/C++ code to describe each function
– Called system-level model, prototype, or simply model


– Also is first implementation
– Can provide insight into operations of system
– Profiling can find computationally intensive functions
– Can obtain sample output used to verify correctness of final
implementation

11.3 REQUIREMENTS SPECIFICATIONS


• Determine system’s architecture
– Processors
• Any combination of single purpose (custom or standard) or general purpose
processors
– Memories, buses
• Map functionality to that architecture
– Multiple functions on one processor
– One function on one or more processors
• Implementation
– A particular architecture and mapping
– Solution space is set of all implementations
• Low quality image has resolution of 64 x 64
• Mapping functions to a particular processor type not done at this stage
• Starting point
– Low-end general purpose processor connected to flash memory
• All functionality mapped to software running on processor
• Usually satisfies power, size, and time-to-market constraints
• If timing constraint not satisfied then later implementations could:
– use single purpose processors for time critical functions
– rewrite functional specification

11.4 IMPLEMENTATION OF THE PROPOSED SYSTEM


Different modules of a digital camera can be implemented either in software or
in hardware. Some of the modules are discussed in depth as follows:
CCD Module
• Simulates real CCD
• CcdInitialize is passed name of image file
• CcdCapture reads “image” from file


• CcdPopPixel outputs pixels one at a time
#include <stdio.h>
#define SZ_ROW 64
#define SZ_COL (64 + 2)
static FILE *imageFileHandle;
static char buffer[SZ_ROW][SZ_COL];
static unsigned rowIndex, colIndex;
void CcdInitialize(const char *imageFileName) {
imageFileHandle = fopen(imageFileName, "r");
rowIndex = -1;
colIndex = -1;
}
void CcdCapture(void) {
int pixel;
rewind(imageFileHandle);
for(rowIndex=0; rowIndex<SZ_ROW; rowIndex++) {
for(colIndex=0; colIndex<SZ_COL; colIndex++) {
if( fscanf(imageFileHandle, "%i", &pixel) == 1 ) {
buffer[rowIndex][colIndex] = (char)pixel;
}
}
}
rowIndex = 0;
colIndex = 0;
}
char CcdPopPixel(void) {
char pixel;
pixel = buffer[rowIndex][colIndex];
if( ++colIndex == SZ_COL ) {
colIndex = 0;
if( ++rowIndex == SZ_ROW ) {
colIndex = -1;
rowIndex = -1;
}
}
return pixel;
}
UART Module
• Actually a half UART
– Only transmits, does not receive
• UartInitialize is passed name of file to output to
• UartSend transmits (writes to the output file) one byte at a time
#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
fprintf(outputFileHandle, "%i\n", (int)d);
}
CODEC Module
• Models FDCT encoding
• ibuffer holds original 8 x 8 block
• obuffer holds encoded 8 x 8 block
• CodecPushPixel called 64 times to fill ibuffer with original block
• CodecDoFdct called once to transform 8 x 8 block
• CodecPopPixel called 64 times to retrieve encoded block from obuffer

static short ibuffer[8][8], obuffer[8][8], idx;


void CodecInitialize(void) { idx = 0; }
void CodecPushPixel(short p) {
if( idx == 64 ) idx = 0;
ibuffer[idx / 8][idx % 8] = p; idx++;
}
void CodecDoFdct(void) {
int x, y;
for(x=0; x<8; x++) {
for(y=0; y<8; y++)
obuffer[x][y] = FDCT(x, y, ibuffer);
}
idx = 0;
}
short CodecPopPixel(void) {
short p;
if( idx == 64 ) idx = 0;
p = obuffer[idx / 8][idx % 8]; idx++;
return p;
}
• Implementing FDCT formula
C(h) = if (h == 0) then 1/sqrt(2) else 1.0
F(u,v) = ¼ × C(u) × C(v) Σx=0..7 Σy=0..7 Dxy ×
cos(π(2x + 1)u/16) × cos(π(2y + 1)v/16)
• Only 64 possible inputs to COS, so table can be used to save performance
time
– Floating point values multiplied by 32,678 and rounded to nearest integer
– 32,678 chosen in order to store each value in 2 bytes of memory
– Fixed point representation explained more later
• FDCT unrolls inner loop of summation, implements outer summation as
two consecutive for loops
MAIN Module
• Main initializes all modules, then uses CNTRL module to capture,
compress, and transmit one image
• This system-level model can be used for extensive experimentation
– Bugs much easier to correct here rather than in later models
int main(int argc, char *argv[]) {
char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
char *imageFileName = argc > 2 ? argv[2] : "image.txt";
/* initialize the modules */
UartInitialize(uartOutputFileName);
CcdInitialize(imageFileName);
CcdppInitialize();
CodecInitialize();
CntrlInitialize();
/* simulate functionality */
CntrlCaptureImage();
CntrlCompressImage();
CntrlSendImage();
}
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
– 12 MHz, 12 cycles per instruction
• Executes one million instructions per second
– CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
• ~100 assembly instructions each iteration
• 409,600 (4096 × 100) instructions per image
• Half of budget for reading image alone
– Would be over budget after adding compute-intensive DCT and Huffman
encoding
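The instruction-budget arithmetic above can be checked directly (the 100-instructions-per-iteration figure is the book's estimate):

```c
/* 8051 at 12 MHz with 12 clock cycles per instruction executes one
   million instructions per second. Reading one 64 x 64 image at
   ~100 instructions per pixel uses 409,600 of them, about 41 percent
   of a one-second budget, before any DCT or Huffman work. */
unsigned long ImageReadInstructions(void)
{
    unsigned long iterations = 64UL * 64UL;    /* 4096 pixels        */
    unsigned long instrsPerIteration = 100UL;  /* book's estimate    */

    return iterations * instrsPerIteration;    /* 409,600            */
}
```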

[Figure: an SOC containing the 8051, UART and CCDPP, with EEPROM and RAM
alongside.]

Fig. 11.4: Microcontroller and CCD representation on board

• Entire SOC tested on VHDL simulator


– Interprets VHDL descriptions and functionally simulates execution of
system
• Recall program code translated to VHDL description of ROM


– Tests for correct functionality
– Measures clock cycles to process one image (performance)
• Gate-level description obtained through synthesis
– Synthesis tool like compiler for SPPs
– Simulate gate-level models to obtain data for power analysis
• Number of times gates switch from 1 to 0 or 0 to 1
– Count number of gates for chip area

POINTS TO REMEMBER
1. Testing timing constraints is as important as testing functional behaviour
for an Embedded System.
2. Embedded Systems are in every “intelligent” device that is infiltrating
our daily lives: the cell phone in your pocket, and the entire wireless
infrastructure behind it; the Palm Pilot on your desk; the Internet router
your e-mails are channeled through; your big screen home theater
system; the air traffic control station as well as the delayed aircraft it is
monitoring! Software now makes up to 90 percent of the value of these
devices.
3. The computer you are using to read this page uses a microprocessor
to do its work. The microprocessor is the heart of any normal computer,
whether it is a desktop machine, a server or a laptop.

Review Questions
1. What is the difference between a digital camera and a mobile camera?
2. How does a digital camera differ from conventional cameras?
3. Define CPU Speed. Why are there limits on CPU speed?
4. What are the different compression methodologies available while
manufacturing a digital camera?
5. What is meant by pixel resolution? What are the leading companies that
manufacture digital camera today?
6. Which company made the world’s first OLED digital photo frame?
7. What is meant by colour filtering?

11.5 QUIZ
1. Digital images are made of tiny dots called:
(a) Cells (b) Electrolytes (c) Blotch (d) Pixels
2. Which two companies sold the first consumer-oriented digital cameras?


(a) Apple and Kodak
(b) Panasonic and Sony
(c) IBM and Canon
(d) LG and Samsung
3. What kind of image sensor do most digital cameras use?
(a) Complementary Metal-oxide Semiconductors (CMOS)
(b) Computerized Imaging Detectors (CID)
(c) Charge-Coupled Devices (CCD)
(d) Battery Operated Device
4. What is the name of the device that changes a pixel’s value into a digital
value in a CCD digital camera?
(a) Digitalization manager
(b) Analog-to-digital converter
(c) Pixelator
5. What’s the most common pattern of colour filters found in digital cameras?
(a) Bothan pattern
(b) Bayer filter pattern
(c) Filtered cell pattern
6. Just like traditional film cameras, the small opening that allows light to
pass through the lens of a digital camera is called:
(a) The focal point
(b) The aperture
7. What is the distance between the camera’s lens and its image sensor
called?
(a) Beta Length
(b) Focal length
Answers for Quiz
1. Pixels
2. Apple and Kodak
3. Charge-Coupled Devices (CCD)
4. Analog-to-digital converter
5. Bayer filter pattern
6. The aperture
7. Focal Length
