EMBEDDED SYSTEMS
(SECOND EDITION)
D P KOTHARI
Ex-Visiting Professor
Royal Melbourne Institute of Technology
Melbourne, AUSTRALIA
SHRIRAM K VASUDEVAN
Technical Manager - Learning and Development
Amrita University, Coimbatore, INDIA
SUNDARAM R M D
Technical Leader, Wipro Technologies, INDIA
MURALI N
Lecturer, Nizwa College of Technology, OMAN
All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm,
xerography, or any other means, or incorporated into any information retrieval system, electronic or
mechanical, without the written permission of the copyright owner.
Every effort has been made to make the book error free. However, the author and publisher make no warranty of any kind, expressed or implied, with regard to the documentation contained in this book.
Dedications
Kothari, D.P.—To sons-in-law Pankaj and Rahul
Shriram K Vasudevan—To Parents and Sister
Sundaram R M D—To Mom and Dad
Murali N.—To Friends and Parents
Preface
We wish to thank all the good hearts who have helped us in this project. In
particular, we wish to thank Subashri V, Sriram Karthik, Sivaraman R, and
Sunandhini M for their immense help and support in bringing the book to a good
shape.
D.P. Kothari
Shriram K Vasudevan
Sundaram R M D
Murali N.
Contents
Preface
1. Embedded Systems—An Introduction
1.1 Basic Idea on System
1.2 Embedded Systems—Definitions
1.3 Characteristics of Embedded Systems—An Overview with Examples
1.4 Challenges in Designing an Embedded System
1.5 Categorization of Embedded Systems
1.6 Examples of Embedded Systems
1.7 Quiz
Learning Outcomes
• Basic Idea on System
• Definition of Embedded Systems
• Characteristics of Embedded Systems
• Challenges in Designing an Embedded System
• Categorization of Embedded Systems
• Examples of Embedded Systems
• Recap
• Quiz
Embedded Systems are available everywhere in this modern world. This chapter
will touch on all basic aspects of understanding an Embedded System.
• An embedded system is the one that has computer hardware with software
embedded in it as one of its most important components.
• It is a device that includes a programmable computer but is not itself
intended to be a general purpose computer.
An Embedded System can be well defined with a couple of classical examples.
First, consider an air conditioner. What does an air conditioner do? The temperature is set as per requirement, say 20°C. The external temperature may vary, and that variation is reflected in the room where the air conditioner is fitted. But however the external temperature varies, the AC machine provides the user with a cool atmosphere (i.e., 20°C) inside the room as per requirement. What is the action taken?
Consider a second example, the pacemaker. Its job is to trigger the heartbeat if the heart gets into trouble. How is this done?
The answer to both questions is the same. Looking into the definition of an Embedded System gives the answer for both the cases quoted above.
• An electronic controller built into the application continuously monitors the process variables and ensures that the Process Variable (PV) does not change; in the event of a change, the controller generates a counteracting signal applied to the application so that the deviated PV is brought back to its normal operating value. This defines embedded systems clearly!
So here it is made very clear. In the air conditioner, temperature is the process variable. A controller inside keeps monitoring this process variable. If the room temperature changes due to variation in the external temperature, the controller generates a counteracting signal and the PV (temperature) is brought back to the required range.
In the second case, the controller inside a pacemaker keeps monitoring the heartbeat count. If it gets too low, a counteracting action is immediately taken and the heart is given a boost.
Food for brain! Is a laptop an embedded system?—This question will be answered shortly!
• Single Functioned
• Tightly Constrained
• Real Time and Reactive
• Complex Algorithms
• User Interface
• Multi Rate
Each of the above characteristics is discussed below in detail.
1. Single Functioned
An Embedded System executes a specific function repeatedly, i.e., a dedicated function. As an example, an air conditioner cools the room. Cooling is its dedicated functionality and it cannot be used for any other purpose; an AC can't be used for making calls. Likewise, a mobile phone is an Embedded System that can be used to make and receive calls, and it can't be used for controlling room temperature.
Consider the list of embedded systems that are being used every day.
1. Pager
2. Microwave oven
3. Mobile phone
4. ATMs
5. Car braking systems
6. Automobile cruise controllers
7. Pace makers
8. Modem
9. Network cards and many more
2. Tightly Constrained
Whatever system is being designed, it has constraints. Embedded Systems are also tightly constrained in many aspects. A few aspects are analyzed here.
1. Manufacturing Cost
2. Performance
3. Size
4. Power
The above four parameters decide the success of an Embedded System. Consider buying a mobile phone as an example. If the mobile phone costs lakhs of rupees, will it be bought? (Instead, a landline phone would be preferred.) Next scenario: if the phone that is bought takes half an hour to make a call, and if it also hangs frequently, will it be opted for? (No way!) Third point: if the phone weighs 3 kg, will it be preferred? Finally, coming to the power criterion: almost all Embedded Systems are battery operated, and a phone is mobile as well! So it should be capable of retaining its charge for a reasonable amount of time. Else the battery drains faster and one has to keep the charger handy all the time. So it is very important to keep these constraints in mind when designing an embedded system.
4. Complex Algorithms
The processor inside an embedded system often has to perform operations that are complex in nature. An example is the digital camera. It is used to take colour photographs, motion pictures, black and white pictures, etc. It needs to run many complex algorithms to perform all the above mentioned operations. So, to conclude, many Embedded Systems have complex algorithms running inside them.
5. User Interface
Here too, the concept can be explained with an example. NOKIA mobile phones were a very big hit in the market. Why? What is the reason? Is it because other mobiles did not perform well? No. Nokia had an excellent and simple user interface. Calls could be made and received very easily, and typing an SMS was also easy, so it was received by the people very well.
So designing a system with an easy and comfortable interface is most important. It should also have the options required for the operation of the device. An example is the ATM machine; it has comfortable interfaces and options. Keep this in mind when designing a system.
6. Multi Rate
Embedded Systems may need to control and drive certain operations at one rate and certain other operations at a different rate. An example is the digital camera. It is used to take still pictures, and it is also capable of shooting video. So it has to be capable of driving the first operation at a speed different from the second one.
A Small Recap: Please do not forget the definition of Real Time; it will be needed down the line.
[Figure: Design challenges of an Embedded System—Meeting Deadlines, Hardware Selection, Upgradability, and "Will it work?"]
1. Meeting Deadlines
How can the deadlines set for the product be met? Meeting deadlines accurately needs high speed hardware. Adding more quality hardware components would increase the cost of the product. This is the first challenge in front of designers.
2. Hardware Selection
Embedded Systems have never had the luxury of abundant hardware. Taking memory into consideration first, Embedded Systems have very little inbuilt memory. Adding more memory increases the cost factor, so keep only as much memory as needed. The system can have an expansion slot; if a user is willing to expand the memory, let the user expand it.
Coming to processor selection, if a very high speed processor is selected, it would end up draining the battery very quickly. But speed can't be compromised either. So select a processor that fits the requirement perfectly. Too high speed a processor would cost more and can drain the battery as well.
3. Is it upgradable and maintainable?
Assume that a mobile phone has been designed and released in the market. But after reaching the people, the product is found to have problems in one or two aspects. The developer would come to know of the problem and it can be fixed. But how will the fix reach the phones that have already reached the public?
So the product must support upgradation of its software versions. Keep in mind that the product should be upgradable with the same hardware!
Secondly, when developing software for embedded systems, maintainability should be kept in mind. The code should not be written in such a way that only the developer who wrote it can understand it. It should be understandable to other engineers as well, so that they too can understand the code and fix bugs in it, if need be.
4. Will it work?
Nice question, isn't it? Please ensure that the system that has been designed really works fine. How can this be ensured? It is possible through rigorous testing; the testing needs to proceed in many stages. The first can be Unit Testing, the next stage is Sanity Testing, and the third stage can be Regression Testing. Also, even after the product has entered the market, it has to be constantly monitored. If any customer complaint arises, that bug has to be looked into and fixed. And more importantly, the bug fix should not introduce any new bugs.
Let’s now get to know about the categorization of the embedded systems!
1.5 CATEGORIZATION OF EMBEDDED SYSTEMS
Embedded Systems can be categorized based on the complexity of building, cost factors, purpose of the system, availability of tools and related environment, etc. Keeping these points in mind, Table 1.1 has been framed to deal with the categories. Broadly, one can classify embedded systems into any of these:
1. Small Scale Embedded Systems,
2. Medium Scale Embedded Systems, and
3. Sophisticated Embedded Systems.
Table 1.1: Categorization of Embedded Systems
5. Video games, all digital gaming machines, iPod, MP3/MP4 players, Xbox and what not!
POINTS TO REMEMBER
1. An Embedded System is single functioned; an AC can be remembered as a simple example.
2. Embedded Systems are real time systems which are reactive in nature.
3. Many design challenges are associated with making an embedded system, including cost, power, etc.
4. Embedded Systems are classified into three major divisions—small scale, medium scale and sophisticated Embedded Systems.
Review Questions
1. What is an Embedded System? Give an example.
2. Embedded Systems are described as single functioned systems. Justify this statement.
3. Define Real time.
4. Give an example of a real time and reactive Embedded System.
5. What are the major categories of the Embedded Systems? Give an example
for each division.
6. Is LCD projector an Embedded System? Please justify.
1.7 QUIZ
1. Pick odd one out! (Embedded is the clue)
(a) Laptop (b) Projector
(c) Mobile phone (d) MP3 player
2. Some of the important characteristics expected in consumer electronics
products are ...............
(a) Recovering from failures
(b) Low cost
(c) Performance
(d) Low unit cost, low power consumption and smaller product size
3. One of the most important characteristics of Embedded Systems for the automotive industry (with respect to the occupants of the vehicle) is ................
(a) Recovering from failures
Learning Outcomes
• Understanding of Microprocessor and Microcontroller
• Functional Building Blocks of Embedded Systems
• Processor and Controller
• Memory, Ports and Communication Devices
• CISC vs. RISC Processors
• General Purpose Processor and DSP Processor
• Direct Memory Access—In-depth Analysis
• Cache Memory and its Types
• Co-design of Hardware and Software
• System on Chip
• Tools for Embedded Systems
• Recap
• Quiz
[Figure: A typical embedded board—MCU with switch controller, JTAG, PCMCIA card, Boot Flash and extension interfaces]
An Embedded System has a different set of issues to deal with than desktop software does. These include:
1. Handling situations that don't arise with desktop software, like doing several things at once and responding to external events (button presses, sensor readings, etc.).
2. Coping with all unusual conditions without human intervention, while meeting strict processing deadlines and never failing.
3. Being able to log issues when things go out of control, which helps in debugging the issue later.
Embedded Systems were initially hardwired systems and then were
Microprocessor-based, then moved on to Microcontroller-based and
Specialized (application) processors and finally we have reached System
on a Chip (SOC).
Components of Embedded Systems 13
[Figure: Microcontroller block diagram—address bus, data bus, digital inputs and digital outputs]
Peripherals = ports, clock, timers, UART, ADCs, LCD drivers, DACs and others.
Memory = EEPROM, SRAM, Flash, etc.
About 55% of all CPUs sold in the world are 8-bit microcontrollers and
microprocessors.
A typical home in a developed country is likely to have only four general purpose
microprocessors but around three dozen microcontrollers. A typical mid range
automobile has as many as 30 or more microcontrollers. They can also be found
in many electrical devices such as washing machines, microwave ovens, and
telephones.
2.4.1 Memory
Many types of memory devices are available for use in modern computer systems.
The RAM family includes two important memory devices: static RAM
(SRAM) and dynamic RAM (DRAM) as shown in Fig. 2.4. The primary
difference between them is the lifetime of the data they store. SRAM retains its
contents as long as electrical power is applied to the chip. If the power is turned
off or lost temporarily, its contents will be lost forever. DRAM, on the other
hand, has an extremely short data lifetime, typically about four milliseconds.
This is true even when power is applied constantly.
Memories in the ROM family are distinguished by the methods used to
write new data to them, and the number of times they can be rewritten. This
classification reflects the evolution of ROM devices from hardwired to
programmable to erasable-and-programmable. A common feature of all these
devices is their ability to retain data and programs forever, even during a power
failure.
In practice, almost all computers use a variety of memory types, organized
in a storage hierarchy around the CPU, as a trade-off between performance
and cost. Generally, the lower a storage tier is in the hierarchy, the lower its bandwidth and the greater its access latency from the CPU. This traditional division of
storage to primary, secondary, tertiary and off-line storage is also guided by cost
per bit.
As memory technology has matured in recent years, the line between RAM
and ROM has blurred. Now, several types of memory combine features of
both. These devices do not belong to either group and can be collectively referred
to as hybrid memory devices. Hybrid memories can be read and written as
desired, like RAM, but maintain their contents without electrical power, just like
ROM. Two of the hybrid devices, EEPROM and flash, are descendants of
ROM devices. These are typically used to store code. The third hybrid, NVRAM,
is a modified version of SRAM. NVRAM usually holds persistent data.
2.4.2 Ports
Data can be sent either serially, one bit after another through a single wire, or in
parallel, multiple bits at a time, through several parallel wires. Most famously,
these different paradigms are visible in the form of the common PC ports “serial
port” and “parallel port”.
Early parallel transmission schemes were often much faster than serial schemes, but at added cost and complexity of hardware. Serial data transmission is much more common in new communication protocols due to the reduction in the I/O pin count and hence in cost. Common serial protocols include SPI and I2C. Surprisingly, serial transmission methods can run at much higher clock rates, which tends to outweigh the primary advantage of parallel transmission.
Parallel transmission protocols are now mainly reserved for applications
like a CPU bus or between IC devices that are physically very close to each
other, usually measured in just a few centimetres. Serial protocols are used for
longer distance communication systems, ranging from shared external devices
like a digital camera to global networks or even interplanetary communication
for space probes; however some recent CPU bus architectures are even using
serial methodologies as well.
higher speed. The features like simplicity and flexibility make this bus attractive
for consumer and automotive electronics.
CISC
(a) Determined by VLSI technology.
(b) Software cost goes up constantly; the design had to be convenient for programmers.
(c) To shorten the semantic gap between HLL and architecture without advanced compilers.
(d) To reduce the program length, because memory was expensive.
(e) The VAX 11/780 reached the climax with >300 instructions and >20 addressing modes.
Migration: CISC → RISC
(a) Things changed: HLL, Advanced Compiler, Memory size, etc.
(b) Finding: 25% instructions used in 95% time.
(c) Size: usually <100 instructions and <5 addressing modes.
(d) Other properties: fixed instruction format, register based, hardware control,
etc.
(e) Gains: CPI is smaller, Clock cycle shorter, Hardware simpler, Pipeline
easier.
(f) Cheaper: Programmability becomes poor, but people use HLL instead of
IS.
(e) DSP processors are known for their irregular instruction sets, which
generally allow several operations to be encoded in a single instruction.
For example, a processor that uses 32-bit instructions may encode two
additions, two multiplications, and four 16-bit data moves into a single
instruction. In general, DSP processor instruction sets allow a data move
to be performed in parallel with an arithmetic operation. GPPs/MCUs, in
contrast, usually specify a single operation per instruction.
While the above differences traditionally distinguish DSPs from GPPs/MCUs,
in practice it is not important what kind of processor you choose. What is really
important is to choose the processor that is best suited for your application; if a
GPP/MCU is better suited for your DSP application than a DSP processor, the
processor of choice is the GPP/MCU. It is also worth noting that the difference
between DSPs and GPPs/MCUs is fading: many GPPs/MCUs now include
DSP features, and DSPs are increasingly adding microcontroller features.
[Figure: DMA configuration—the DMA controller sits between the CPU and an I/O device, sharing the address and data buses and using the Bus Request (BR) and Bus Grant (BG) signals]
Modes vary by how the DMA controller determines when to transfer data,
but the actual data transfer process is the same for all the modes.
DMA Modes are:
(a) BURST Mode
(b) Cycle Stealing Mode
(c) Transparent Mode
The choice of mode is based on the application and software; different applications require different modes to be supported. Let's take a look at the different modes in general.
BURST Mode
1. Sometimes called Block Transfer Mode.
2. An entire block of data is transferred in one contiguous sequence. Once
the DMA controller is granted access to the system buses by the CPU, it
transfers all bytes of data in the data block before releasing control of the
system buses back to the CPU.
3. This mode is useful for loading programs or data files into memory, but it
does render the CPU inactive for relatively long periods of time.
CYCLE STEALING Mode
1. Viable alternative for systems in which the CPU should not be disabled
for the length of time needed for Burst transfer modes.
2. DMA controller obtains access to the system buses as in burst mode,
using BR and BG signals. However, it transfers one byte of data and then
de-asserts BR, returning control of the system buses to the CPU. It
continually issues requests via BR, transferring one byte of data per request,
until it has transferred its entire block of data.
3. By continually obtaining and releasing control of the system buses, the
DMA controller essentially interleaves instruction and data transfers. The
CPU processes an instruction, then the DMA controller transfers a data
value, and so on.
4. The data block is not transferred as quickly as in burst mode, but the CPU
is not idled for as long as in that mode.
5. Useful for controllers monitoring data in real time.
TRANSPARENT Mode
1. This requires the most time to transfer a block of data, yet it is also the
most efficient in terms of overall system performance.
2. The DMA controller only transfers data when the CPU is performing
operations that do not use the system buses. For example, the relatively
simple CPU has several states that move or process data solely within
the CPU:
NOP1: (No operation)
LDAC5: AC ← DR
JUMP3: PC ← DR,TR
CLAC1: AC ← 0, Z ←1
3. Primary advantage is that CPU never stops executing its programs and
DMA transfer is free in terms of time.
4. Disadvantage is that the hardware needed to determine when the CPU is
not using the system buses can be quite complex and relatively expensive.
Previous and current generation ICs are finding their way into new designs as embedded cores in a mix-and-match fashion. This requires greater convergence of methodologies for co-design and co-verification, and it makes high demands on system-on-a-chip density. That is why this concept remained elusive for many years, until recently. In the future, the need for tools that estimate the impact of design changes earlier in the design process will increase.
To create a system-level design, the following steps should be taken:
1. Specification capture: Decomposing functionality into pieces by creating
a conceptual model of the system. The result is a functional specification,
which lacks any implementation detail.
2. Exploration: Exploration of design alternatives and estimating their quality
to find the best suitable one.
3. Specification: The specification noted in step 1 is now refined into a new description reflecting the decisions made during the exploration in step 2.
4. Software and hardware: For each of the components an implementation
is created, using software and hardware design techniques.
5. Physical design: Manufacturing data is generated for each component.
When the steps above are run through successfully, an Embedded System design methodology from product conceptualization to manufacturing is roughly defined. This hierarchical modeling methodology enables high productivity, preserving consistency through all levels and thus avoiding unnecessary iteration, which makes the process more efficient and faster.
Compilers
A compiler is a computer program (or set of programs) that transforms source
code written in a programming language (the source language) into another
computer language (the target language, often having a binary form known as
object code). The most common reason for wanting to transform source code
is to create an executable program.
Linker
In computer science, a linker or link editor is a program that takes one or more
objects generated by a compiler and combines them into a single executable
program as shown in Fig. 2.6. In IBM mainframe environments such as OS/360
this program is known as a linkage editor.
[Fig. 2.6: A linker combines compiler-generated object files into a single executable]
On Unix variants, the term loader is often used as a synonym for linker. Other
terminology was in use, too. For example, on SINTRAN III, the process
performed by a linker (assembling object files into a program) was called loading
(as in loading executable code onto a file). Because this usage blurs the distinction
between the compile-time process and the run-time process, this text will use
linking for the former and loading for the latter. However, in some operating
systems the same program handles both the jobs of linking and loading a program;
see dynamic linking.
Simulators and Emulators
A simulator is software that duplicates the behaviour of some processor in almost all possible ways. An emulator is a piece of computer hardware or software used with one device to enable it to emulate another, i.e., an emulator is hardware which duplicates the features and functions of a real system, so that it can behave like the actual system.
Usually, emulators and simulators are used for testing new architectures and also for giving training on complex systems. The most famous example of a simulator is the flight simulator, which simulates the functionalities of an aircraft.
POINTS TO REMEMBER
1. During the 1960s, computer processors were often constructed out of
small and medium-scale ICs containing from tens to a few hundred
transistors. The integration of a whole CPU onto a single chip greatly
reduced the cost of processing power.
2. Microcontrollers are “embedded” inside some other device (often a
consumer product) so that they can control the features or actions of
the product. Another name for a microcontroller, therefore, is “embedded
controller”.
3. The difference between RISC and CISC chips is getting smaller and
smaller. What counts is how fast a chip can execute the instructions it is
given and how well it runs existing software.
4. A DMA transfer copies a block of memory from one device to another.
While the CPU initiates the transfer by issuing a DMA command, it
does not execute it. For so-called “third party” DMA, as is normally
used with the ISA bus, the transfer is performed by a DMA controller
which is typically part of the motherboard chipset.
5. SOC designs usually consume less power and have a lower cost and
higher reliability than the multi-chip systems that they replace. And with
fewer packages in the system, assembly costs are reduced as well.
6. Generally, high-level programming languages, such as Java, make
debugging easier, because they have features such as exception handling
that make real sources of erratic behaviour easier to spot. In programming
languages such as C or assembly, bugs may cause silent problems such
as memory corruption, and it is often difficult to see where the initial
problem happened. In those cases, memory debugger tools may be
needed.
2.12 QUIZ
1. Hardware is tangible, but software is intangible – right or wrong?
2. Which type of memory is most closely connected to the processor?
(a) Main memory
(b) Secondary memory
(c) Disk memory
3. How is it possible that both programs and data can be stored on the same
floppy disk?
(a) A floppy disk has two sides, one for data and one for programs.
(b) Programs and data are both software, and both can be stored on any
memory device.
(c) A floppy disk has to be formatted for one or for the other.
4. Which one do you prefer while developing an application – CISC or RISC?
– Justify your choice.
(a) CISC
(b) RISC
5. What are the two general types of programs?
(a) Entertainment and Productivity
(b) Microsoft and IBM
(c) System software and Application software
6. Why does a processor not contain a cache of huge size?
(a) Costly
(b) Spacious
(c) Not available in the market
(d) No use
7. What tool is used for debugging the software on a hardware?
(a) JTAG
(b) BTAG
Learning Outcomes
• Software Life Cycle
• Embedded Life Cycle
  • Waterfall model
  • Spiral model
  • Consecutive refinement model
  • Rapid Application Development (RAD) Model
• Modeling of Embedded Systems
  • UML (Unified Modeling Language)
  • FSM (Finite State Machine)
  • Petri net modeling
• Simulation and Emulation
• Recap
• Quiz
The life cycle starts with requirement collection. First, the designer has to be clear on what exactly the product is meant for. Requirements have to be collected with paramount care, as missing even one requirement would mar the final product. If the product is designed for customer usage, it is better to analyze similar products already in the market, so that the cons of the existing products can be addressed clearly in the new product. A better start leads to a better finish, as always.
Coming to the second phase: a product may have to carry out many operations. For instance, consider a cellular phone. It should be capable of making and receiving calls, and it would be good if it had an FM radio, a digital camera and an MP3 player. These are all different areas and require different expertise to develop. It is wise to break the requirements into small modules and assign them to different teams. Here a divide and conquer policy is followed, and the work is assigned to the right people with the right expertise. This also makes it easier for managers to manage the teams. Most importantly, as many people work on the product, the total duration required to manufacture it is reduced in a big way.
Design Methodologies, Life Cycle and Modeling of Embedded Systems 33
The third and most important action is coding. Developers should write the code for the functionalities they are assigned. The language (C, C++, Java, etc.) can be chosen based on the application. Code has to be written in a modular way. The code should also have enough comments, and there should be enough scope for upgradation of the code. It is also better if needless inclusion of header files is avoided.
Integration comes next. Here all the small modules (assuming a mobile phone again, the modules could be radio, call handling, messaging, music player, etc.) have to be integrated to get the final end product. During integration, great care should be taken to ensure that all the functionalities are integrated properly in such a way that no functionality is affected. After integration, testing has to be done at various levels.
The final phase of the cycle is testing. The product has to be tested for its functionality. Initially, the developer will do some minimal level of testing, referred to as unit testing. But there would be dedicated testers who can test the product better at various levels. Software and hardware have to be tested for stability. There should be a set of test cases to be run to ensure the working of the product. More importantly, even after the product has been released, its performance has to be constantly monitored. There might be complaints raised by consumers; until the product gets stabilized in the market, testing has to be done constantly.
phase is being carried out; after completion of a phase in this model there is no scope for checking with the previous couple of phases whether the target has been achieved accurately.
But with the lines drawn in the backward direction, the above problem can be addressed easily. After the completion of phase 2, one should go back and check with phase 1 whether the requirements quoted in phase 1 are met in phase 2. So even if one of the requirements is missed out, it can be spotted and corrections can be made immediately, instead of checking everything at the end, which would require a lot of human effort and money.
[Figure: Waterfall model with feedback paths—Requirement collection and analysis → Design (divide and conquer approach) → Coding (implementation of design) → Integration (module integration) → Testing]
[Figure: Spiral model—each cycle covers testing and requirement analysis]
In this model, each and every phase has to undergo testing and checking against the requirements after its completion. So each phase takes a lot of time to complete, and the spiral gets larger and larger as it moves on (an indication of more time being spent in each phase), because testing takes more time at each phase. After the initial system is made, the designers and the testing team should focus more on the functionality of the system; they must ensure that it is working fine and meeting the requirements. If the testers find problems, these have to be fixed and full-fledged testing has to be carried out again. After confirmation, the final product can be produced and released to the market.
Advantages
1. Highly realistic, and the chances for errors are low.
Disadvantages
1. Though it is a realistic model, it takes a lot of time, which may prevent
the product from reaching the market on time.
3. Consecutive Refinement Model
This model is represented in Fig. 3.4.
36 Embedded Systems
[Fig. 3.4: Consecutive refinement model — alternating Design (divide and conquer approach) and Testing stages, each refining the previous design]
The down side to RAD is the propensity of the end user to force scope
creep into the development effort. Since it seems so easy for the developer to
produce the basic screen, it must be just as easy to add a widget or two. In most
RAD life cycle failures, the end users and developers were caught in an unending
cycle of enhancements, with the users asking for more and more and the
developers trying to satisfy them. The participants lost sight of the goal of
producing a basic, useful system in favour of the siren song of glittering perfection.
The advantages of this model include minimizing feature creep by developing
in short intervals, resulting in miniature software projects, and releasing the product
in mini-increments. The disadvantage is that a short iteration may not add enough
functionality, leading to significant delays in the final iterations. Since Agile
emphasizes real-time communication (preferably face-to-face), it is problematic
for large multi-team distributed system development. Agile methods
Language. Why is it Unified? The reason is its applicability to many designs and
processes.
The Unified Modeling Language (UML) is used to specify, visualize, modify,
construct and document the artifacts of an object-oriented, software-intensive
system under development. The standard is managed and maintained by the
Object Management Group (OMG). One can follow the complete history of UML
and its success stories at www.uml.org. Here the reader can get acquainted with
the basic elements of UML, which can be listed as follows:
A. Class diagram
B. Object diagram
C. Package diagram
D. Stereotype diagram
E. State diagram and
F. Deployment diagram
Every diagram mentioned in the above list will be discussed in detail below.
A. Class diagram
A class diagram describes the structure of a system by showing the system's
classes, their attributes, and the way the classes are related to each other. A
simple class diagram is represented in Fig. 3.7, from which one can visualize
how a class diagram is drawn. A class diagram has a class name, and the attributes
and behaviours of the class.
[Figure: Object diagram — object name]
C. Package diagram
A package describes how a system is split into logical subsystems by showing
the dependencies among these groupings. Its representation is given in Fig. 3.10.
[Fig. 3.10: PACKAGE A containing CLASS 1, CLASS 2 and CLASS 3]
D. Stereotype diagram
A stereotype is a collection of elements that are frequently used/invoked. The
timer/counter is a good instance; it can be related to functions in C programming.
The same is represented diagrammatically in Fig. 3.11.
[Fig. 3.11: TIMER/COUNTER stereotype with Pause and UnPause operations]
F. Deployment diagram
It clearly depicts the hardware used in the system implementation, along with
the execution environments and artifacts deployed on that hardware. A schematic
diagram is shown in Fig. 3.13 below.
[Fig. 3.13: Deployment diagram — users connect through a browser to a presentation layer (web interface) backed by a MySQL database]
[Figure: Elevator FSM — from State 0 (idle), a button press for the current floor opens the door; the door should be kept open for 15 seconds, and a timer checks this functionality]
From the idle state, after receiving the input from the user, the controller designed
for the elevator compares the input with the current floor number. If it is greater,
it is evident that the elevator has to move up. Great care should be taken that
the door does not open while the lift is in motion, so disabling the door-open
function during motion is advisable. Next, as in the figure, after reaching the
floor the door has to be kept open for some time (15 seconds) so that the user
can step out, and it stays open until a new request comes in. If in the meantime
there is an input from a different user to bring the elevator to the ground floor,
it follows the same mechanism discussed above. Most importantly, the direction
of the lift must not change when there is no request for moving up or down!
The reader can now take a close look at the elevator FSM representation; the
diagram will be easier to understand.
The different applications of Finite State Machines in hardware and software
are discussed at length below.
Hardware Applications
In a digital circuit, an FSM may be built using a programmable logic device, a
programmable logic controller, logic gates and flipflops or relays. More
specifically, a hardware implementation requires a register to store state variables,
a block of combinational logic which determines the state transition, and a second
block of combinational logic that determines the output of an FSM. One of the
classic hardware implementations is the Richard’s controller.
Mealy and Moore machines produce logic with asynchronous output, because
there is a propagation delay between the flip-flop and the output. This causes
slower operating frequencies in the FSM. A Mealy or Moore machine can be
converted to an FSM whose output comes directly from a flip-flop, which makes
the FSM run at higher frequencies. This kind of FSM is sometimes called a
Medvedev FSM. A counter is the simplest form of this kind of FSM.
Software Applications
The following concepts are commonly used to build software applications with
finite state machines:
• Automata-based programming
• Event driven FSM
• Virtual FSM (VFSM)
A question may now arise in the reader's mind as to which methodology is
to be preferred for modeling. It is purely based on one's comfort and expertise
with the method.
Design Methodologies, Life Cycle and Modeling of Embedded Systems 45
[Figure: A simple Petri net — place p, transition t, place p]
In Petri net figures, places are always represented by circles, transitions by
bars, arcs by arrows and tokens by dots.
Properties of Petri nets
Before using Petri nets for modeling, it is necessary to know their properties.
(a) Sequential Execution
A simple example makes this property clear: in the alphabet, the letter C comes
only after A and B have been encountered. The same is the case here. Transition
t2 can fire only after t1 has completed its firing. The order of precedence is
vital and inevitable here. Figure 3.17 shows the same.
[Fig. 3.17: Sequential execution — p1 → t1 → p2 → t2 → p3]
(b) Synchronization
This property is simple: transition t1 is enabled only when there is at least one
token at each of its input places.
(c) Merging
Merging happens when several tokens arrive from several places for service at
the same transition. Figure 3.18 shows this.
[Fig. 3.18: Merging — several input places feeding transition t1]
(d) Concurrency
This is one of the most important properties of Petri nets, easily understood
from the snapshot in Fig. 3.19.
[Fig. 3.19: Concurrency — transitions t1 and t2 enabled simultaneously]
In the case shown, t1 and t2 are concurrent. This allows Petri nets to model
systems of distributed control with multiple processes executing concurrently
in time.
(e) Conflict
Again, a diagrammatic representation makes this property easier to elaborate.
From Fig. 3.20 it can be clearly seen that t1 and t2 are both ready to fire, but
firing either one disables the other transition. This situation is referred to as
conflict.
[Fig. 3.20: Conflict — one place enabling both t1 and t2; firing one disables the other]
Now that the properties are well known, the reader can be exposed to a real-time
implementation situation. A chocolate vending machine is taken as an example
in Fig. 3.21. The user wishes to get a 15C chocolate from the machine and
deposits 5C + 5C + 5C, which is represented in the following figures.
[Fig. 3.21: Petri net snapshots of the chocolate vending machine — places 0C, 5C, 10C, 15C and 20C connected by transitions Deposit 5C, Deposit 10C, Take 15C bar and Take 20C bar; successive snapshots show the token moving from 0C through 5C and 10C to 15C as the user deposits 5C three times]
Finally, the user gets the 15C chocolate. The whole sequence is represented in
the snapshots shown in the figures above. The advantage of Petri net modeling
is that it gives a visual effect and is very simple to understand.
3.4.1 Simulation
Simulation is the imitation or replication of some real thing or process; in short,
it is a model which can stand as a substitute for the real system. One can test
the simulated model from various angles by providing sample inputs and observing
the outputs. As the real system can be very expensive, this is the most commonly
used way of testing and learning a system. The term simulation is used in many
contexts, such as simulation of a technology, safety engineering, training and
education.
One area where simulation is used extensively is aviation. Before being handed
a flight, a pilot is first trained in a flight simulator. It gives the trainee many
options to learn from and, at some point, also builds the confidence needed to
handle a flight in real time. Most importantly, a pilot trainee cannot practice in
a real aircraft: it would be very risky, and expensive as well. Microsoft has its
famous flight simulator, with many options that help trainees get familiar with
the system. Another classic example is testing an elevator. Before final
implementation, an elevator has to be tested with all possible combinations, and
the designer needs to ensure that the request raised first is responded to first.
The designer should also make sure all the safety aspects are fulfilled: the door
of the elevator should never open while the elevator is moving. All such aspects
have to be tested and the safety of the users ensured. Simulation helps in testing
all these strategies.
In short, simulation is used when a real-time system cannot be engaged, is not
easily accessible, is risky to use, is not yet ready, or is not available at all.
Coming to the Embedded Systems side, simulators play a vital role. A programmer
will not be provided with microcontrollers all the time, so a simulator helps the
programmer by simulating a microcontroller and its functions. A simple and
famous simulator that almost all programmers use is the Keil simulator. Some
simulators go a step further by including even the peripherals in the simulation.
One hard fact to be accepted is that, irrespective of the speed of the PC on
which the simulator is installed, no simulator is capable of simulating a
microcontroller's behaviour in real time. Simulating external events is an even
tougher task. So one can conclude that a simulator is best suited for testing
algorithms.
Advantages
1. First and foremost, it is not risky in any way to use one. Even if the user
commits a mistake, it will never kill or injure the user.
3.4.2 Emulation
Having discussed the demerits associated with simulators, the emulator comes in.
An emulator duplicates (emulates) the functions of one system using another
system, so that the second system behaves like the first. This is in contrast to
simulation, which concerns an abstract model of the system being simulated.
The emulator has, to a large extent, overcome the problems related to simulation:
it can be faster as well as accurate. An emulator is a piece of hardware which
behaves like a real microcontroller with all its functionality built in, so a
microcontroller's behaviour can be emulated in real time.
The In-Circuit Emulator (ICE) is a commonly available emulator for Embedded
Systems. An ICE is very handy in debugging the software of an Embedded
System. The programmer can use the ICE to load code into the Embedded System,
run it, step through it (by keeping breakpoints in the code), and view and change
the data used by the system software. The emulator can provide an interactive
user interface for the programmer to investigate and control the embedded system.
Simply put, a source-level debugger with a graphical window interface,
communicating through an emulator with a target Embedded System, makes
debugging easier and more comfortable. The most common problem faced in
embedded systems is that they lack a way of reporting software failures. This
issue can be fixed to a great extent with an ICE, which helps the programmer
test the code in small pieces and eventually isolate the buggy area of the code.
An ICE provides execution breakpoints, memory display, memory monitoring
and peripheral control. An ICE can also be programmed to look for a specific
condition and identify it as a fault.
Advantages
1. Faster compared to simulation.
2. No need to set up a special environment for using emulators.
Disadvantages
1. They are faster than simulators, but not as fast as the real system.
2. Certain emulators are expensive, which might trouble the user.
POINTS TO REMEMBER
1. Software life cycle has a sequence of five steps as: 1. Requirement
collection and analysis, 2. Design, 3. Coding, 4. Integration and 5. Testing.
2. Divide and conquer is the best approach to build a system.
3. Testing has to be carried out extensively before delivering the product
to market.
4. Embedded system can be built based on any of the following life cycles
as Waterfall model, Spiral model, Consecutive refinement model and
Rapid Application Development (RAD) Model.
5. The most widely followed and simplest model for building an Embedded
System is the Waterfall model (provided it supports backtracking).
6. Modeling a system before making a prototype reduces the chances of
bugs in the product and also increases the confidence of the designer.
7. Embedded systems can be modeled using UML, FSM or Petri nets.
Selection of the methodology is purely based on the user's convenience
and knowledge of the preferred modeling technique.
8. Basic components in UML are Class diagram, Object diagram, Package
diagram, Stereotype diagram, State diagram and Deployment diagram.
9. Petri nets have few important properties as Sequential Execution,
Synchronization, Merging, Concurrency and Conflict.
10. Simulation and emulation help the designer simulate the proposed
system and test it with all possible inputs. An example would be an
aircraft landing gear system.
Review Questions
1. Which model will you opt for building an embedded system? Justify the
reason behind your selection.
2. What is the advantage associated with Spiral Modeling?
3. Why should someone go for a modeling approach before implementation?
4. Can FSM be deployed only to model Embedded Systems? Justify.
5. Modeling will be useful for understanding the system requirements well.
Are there any other benefits with modeling?
3.5 QUIZ
1. For designing an embedded system which of the following modeling is
preferred?
(a) Waterfall model (b) Spiral model
(c) RAD model (d) Consecutive refinement model
2. The most important phase in software life cycle is
(a) Integration (b) Design
(c) Testing (d) Coding
3. A very important advantage associated with simulation is
(a) Improved safety and reliability (b) Reduced cost factor
(c) Reduced defects (d) Reduced setting up time
4. A major problem with simulation is
(a) Simulators are sometimes expensive
(b) Setting up a simulator may take time
(c) Cannot test with all possible inputs
(d) Though simulated, needs a physical system to test completely.
Learning Outcomes
R Basic notion about layering
R To get insight about middleware
R To study the basics of all the layers
R Recap
R Quiz
4.1 INTRODUCTION
Those who have studied Computer Networking will be very familiar with the
concept of layering; the famous ISO-OSI layering is the reference model for
most networks. Similarly, each Embedded System that we saw as an example
in the first chapter follows a basic structure, whose pictorial depiction is given
in Fig. 4.1.
[Fig. 4.1: The three layers of an embedded system — application software layer, system software layer (optional) and hardware layer]
As one can see from the above diagram, an embedded system is generally
composed of three layers: the hardware layer, the system software layer and
the application software layer.
Layers of an Embedded System 55
The major hardware components of most boards can be classified into five
major categories:
1. Central Processing Unit (CPU)—the master processor
2. Memory—where the system’s software is stored
3. Input Device(s)—input slave processors and relative electrical
components
4. Output Device(s)—output slave processors and relative electrical
components
5. Data Pathway/Bus—interconnects the other components, providing
a “highway” for data to travel on from one component to another,
including any wires, bus bridges, and/or bus controllers.
These five categories are based upon the major elements defined by the Von
Neumann model, a tool that can be used to understand any electronic device's
hardware architecture. The Von Neumann model is a result of the published
work of John Von Neumann in 1945, which defined the requirements of a general
purpose electronic computer. Since embedded systems are a type of computer
system, this model can be applied as a means of understanding embedded systems
hardware.
The way the buses connect the above mentioned components can be seen
in the following figure.
[Figure: System components commonly connected via buses]
As depicted, the input is fed to the embedded board first, which is then
stored at the memory unit that will be accessed by the processor for performing
manipulations. Then the processor stores the result again in the memory that
will be finally given out to the output device for display!
The specifications of the hardware components such as the processor input
and output devices are given in Fig. 4.4.
How does an OS for an embedded system differ from the other systems?
Embedded OSes vary in what components they possess; all OSes have a kernel
at the very least. The kernel is a component that contains the main
functionality of the OS,
• Process Management.
• Interrupt and error detection management.
• The multiple interrupts and/or traps generated by the various
processes need to be managed efficiently so that they are handled
correctly and the processes that triggered them are properly
tracked.
• Memory Management.
• I/O System Management.
Figure 4.6 shows the structure of an embedded OS.
[Fig. 4.6: Embedded OS — optional middleware layer above a kernel providing process management and memory management]
A special form of OS is the RTOS (Real Time OS), which deals with tasks
that have stringent time requirements. A part of the RTOS called the scheduler
keeps track of the state of each task and decides which one should go to the
running state. Unlike UNIX or Windows, the scheduler in an RTOS is 100%
simpleminded about which task should get the processor: it simply looks at the
priorities you assign to the tasks, and among the tasks that are not in the blocked
state, the one with the highest priority runs while the rest wait in the ready
state. If a high priority task hogs the microprocessor for a long time while lower
priority tasks are waiting in the ready state, the low priority tasks simply have
to wait. The scheduler assumes that you knew what you were doing when you
set the priorities.
It is the software within the application layer that inherently defines what type
of device an embedded system is, because the functionality of an application
represents at the highest level the purpose of that embedded system and does
most of the interaction with users or administrators of that device, if any
exist. Embedded applications can be divided according to whether they are
market specific (implemented in only a specific type of device, such as video-
POINTS TO REMEMBER
1. Any embedded system would be composed of a hardware layer, system
software layer and an application software layer.
2. Hardware components include a processor, I/O devices, and the buses.
3. A system software layer is responsible for processing, I/O, memory
management.
4. A middleware layer is optional.
5. Middleware is usually software that mediates between application
software and the kernel or device driver software.
6. Application layer is dependent on the underlying system software layer.
Review Questions
1. Why is layering required and how has layering been done in Embedded
Systems?
2. Cite an immediate example of the layering approach in networking (it's
simple).
4.3 QUIZ
1. How many layers are there in an embedded system design?
(a) 2 (b) 3
(c) 4 (d) 5
2. Which layer of embedded system accommodates operating system?
(a) Hardware Layer (b) SSL (c) Application Layer.
Learning Outcomes
R Basic Idea on Operating System (OS)
R OS Functionalities
R Introduction to Kernel
• Kernel Components
R Real Time OS (RTOS) – An Introduction
R Comparison of RTOS with General Purpose OS (GPOS)
R Recap
R Quiz
• Peripheral Management
R The OS has to take care of all the peripherals attached to the system.
One simple instance: when a USB drive (pen drive) is inserted into the
system, the OS recognizes it automatically and allows the user to explore
and use its contents. This is called the plug and play feature. In no
time, the OS recognizes the newly attached peripheral.
Only a little of the functionality has been quoted here; the OS really does much more.
(not only computers!) and manages the computer hardware and renders best
possible service for efficient execution of various applications.
3. File Management
Every user has lots of files in his system, and it is evident that the files
can be of different types: C files, Word documents, Excel sheets, PPTs
and so on. There is a huge variety of files available and users make use
of them. The kernel takes the responsibility of handling these files:
allocating memory for the files, creating them and deleting them as well.
It is regarded as one of the most important services of the kernel.
3. Semaphores
Semaphores are simple but powerful mechanisms for implementing better resource
sharing. In any system, the tasks created need to share resources. Sharing the
available resources without clashes is very important, as taking a resource away
from the currently executing task may seriously affect it and produce unwanted
results. If the tasks are independent and do not share any resources, the question
of a resource sharing problem does not arise; this can be compared to two
parallel roads that never need to meet. Semaphores are of two types: (a) binary
semaphores and (b) counting semaphores. Both are covered in detail with good
examples in Chapter 6.
4. Message Queues
Message queues make an asynchronous way of communication possible, meaning
that the sender and receiver of a message need not interact with the message
queue at the same time. Message queues have a wide range of applications;
very simple ones can be taken as examples here:
1. Taking input from the keyboard
2. Displaying output on the screen and
3. Reading voltage from a transducer or sensor, etc.
A message queue is a buffer-like object which can receive messages from ISRs
and tasks and transfer them to other recipients. In short, it is like a pipeline: it
can hold the messages sent by the sender for a period until the receiver reads
them. A task which has to send a message can put the message in the queue,
and other tasks can read it from there. The biggest advantage of a queue is
that the receiver and sender need not use it at the same time. A message queue
is constructed and executed as an example in Chapter 6, which will give the
reader a good understanding of it.
5. Pipes
A pipe is a unidirectional data communication mechanism: it transfers data in
one direction only. Also, a pipe can be used for establishing communication
between related processes only. If a user needs two-way communication, then
two separate pipes have to be constructed. Many RTOSes and OSes provide
inbuilt system calls for constructing pipes. A writer writes into the write end of
the pipe and a reader reads from the read end. If necessary, one can create a
named pipe, which can be used for communication between unrelated processes.
6. Memory Management
As the need for multiprogramming increases, need for managing the memory
also increases proportionally. Memory in memory management generally refers
to the main memory (RAM) management, where each process/task which has
RTOS vs GPOS

Example — RTOS: VxWorks, Nucleus, WinCE (Windows Compact Embedded).
GPOS: Windows (all variants), Solaris, Linux, Unix, Ubuntu, etc.

Memory requirements — RTOS: very small memory requirements; for example,
a mobile phone with just 16 MB of internal memory can accommodate an RTOS.
GPOS: very large memory requirements; at times it requires 400 MB to 1 GB of
memory just to be installed (for example, installation of Windows requires a lot
of memory).

Support and facilities — RTOS: does not have many inbuilt support facilities;
it has only the required files and support. GPOS: has lots of support that can
easily be rendered to the user; examples are the plug-and-play and AutoPlay
features.

Protection (from the applications) — RTOS: to be precise, an RTOS has very
little protection for itself from the applications. Assume a mobile phone is playing
a song in its MP3 player; if the MP3 player gets stuck, there is no way for the
phone to just close the MP3 player, and a complete restart is required. Most
mobile users will have felt this. GPOS: the story is the reverse here. Assume a
user is using a media player, a Word document and an Excel sheet concurrently;
if the media player unexpectedly gets stuck, the user has the comfort of just
closing the media player without losing the data typed in the Excel sheet or
Word document.
Real Time Operating Systems (RTOS)—An Introduction 71
POINTS TO REMEMBER
1. An operating system is a resource manager which manages all the
resources effectively.
2. OS mainly performs as a transformer, multiplexer and scheduler.
3. Heart of a system is OS and heart of the OS is Kernel.
4. Kernel has lots of components which facilitate OS to do resource
management. Semaphores, pipe, message queue, signals, memory
management unit, etc.
5. Real-time behaviour differentiates an RTOS from a GPOS.
6. Real time is defined as the logical correctness of an operation within
a deterministic deadline.
7. An RTOS offers very little luxury to the user in terms of memory,
protection from other applications, etc.
8. A GPOS is luxurious in terms of memory and support features. Installing
an RTOS won't take much memory, whereas a GPOS will occupy an
enormous amount.
9. Response time is not a crucial parameter in GPOS but it is of paramount
importance in RTOS.
Review Questions
1. Define operating system.
2. Demonstrate how OS is playing the role as transformer.
3. What is real time behaviour? How is it important?
4. Why is inter process communication needed?
5. Define Semaphore.
6. Why has memory management got more importance?
7. When is a signal generated?
8. Why does embedded system need real time behaviour?
9. RTOS would not support too much functionality. Why so?
10. Differentiate GPOS and RTOS.
5.6 QUIZ
1. Which of the following is an RTOS?
(a) Vx Works (b) Linux
(c) Ubuntu (d) Windows XP
2. Which of the following is most important and expected behavior of an
RTOS?
(a) Reduced memory usage (b) Multitasking
(c) Response time (d) Protection from the
applications.
3. Which of the following can’t be installed in an Embedded System?
(a) Windows XP (b) VxWorks (c) WinCE
4. Which of the following is not an embedded system?
(a) Laptop (b) Cellular phone
(c) Washing machine (d) Pacemaker
5. Win. CE is the abbreviation of ________________.
Learning Outcomes
Note: In this chapter the reader will be exposed in more depth to real time
operating system concepts. In particular, Linux is going to be used for
explaining the concepts, so it would be great if the reader has a Linux based
system while reading. It is very easy to get a free Ubuntu (Linux) CD: visit
shipit.ubuntu.com and fill in all the mandatory details, and within ten days
the free Ubuntu CD will be shipped. The user will first be introduced to a
few basic operating system concepts and then taken through the RTOS.
R Linux—An Introduction
• Comparison of Unix and Linux
R Linux File System Architecture
• File descriptors in Linux
• Description with sample program
R RTOS concepts
• Task
• Task states
• Task transitions
• Task scheduling
R Inter Process Communication (IPC) Methodologies
• Pipe
• Named pipe or FIFO
• Message queue
• Shared memory
• Semaphores
• Task and resource synchronization
R Memory management
R Cache memory
74 Embedded Systems
(i) chmod options filename — change the read, write, and execute
permissions on files.
(j) File Compression
(i) gzip filename — compresses files so that they take up much less
space.
(ii) gunzip filename — uncompresses files compressed by gzip.
(iii) gzcat filename — look at a gzipped file without actually having to
gunzip it.
(k) Printing
(i) lpr filename — print.
(ii) lpq — check the printer queue.
(iii) lprm jobnumber — remove a job from the printer queue.
From here on, /proc is used extensively for getting details of file descriptors
and the process control block. Every process has a process id, and it is updated
in the /proc file system.
The OS maintains a database called the Process Control Block (PCB) which
holds details of file descriptors. File descriptors are numbers allocated to all the
files in Linux. Since everything is a file in Linux, input, output and error streams
are also denoted by numbers, referred to as file descriptors. All the file descriptors
are recorded in a table called the file descriptor table in the PCB. The structure
of the file descriptor table is shown in Fig. 6.2.
0–stdin
1–stdout
2–stderr
3–file
4–file
From the picture one can understand that 0 is the fd for standard input
(keyboard), 1 is the fd for standard output (monitor) and 2 is the number allotted
for standard error messages. Since 0, 1 and 2 are already allotted, any newly
created file gets a number after 2: the first file created gets 3 as its file descriptor
and the next gets 4. Discussing this theoretically would not be sufficient.
So writing a small C code in Linux with C programming will help the reader
to understand the concept in a great way. The following C code aims in creating
2 files. After creation one can manually check the file descriptor table easily
and can check if the files have been allotted with numbers as expected. GNU
tool kit in Linux helps to compile and execute C programs with GCC compilers.
GCC is expanded as GNU C Compiler. The following C code simply creates 2
files txt1.txt and txt2.txt.
// prog1.c
// C code for creating 2 files.
/* creat is the system call used to create new files through code.
The while loop keeps the program running so that the user can check the
/proc file system. 0777 in creat gives read, write and execute permissions
to all users. This concept is dealt with in detail in a later part of this
chapter. */
# include <stdio.h>
# include <fcntl.h>
int main ()
{
int fd1, fd2;
printf ("\n THIS WILL CREATE 2 FILES NOW");
fd1 = creat ("txt1.txt", 0777);
// creating txt1.txt file through this system call; the file permissions are
// also mentioned. creat returns an integer on success, and that integer is
// the file descriptor of the newly created file.
fd2 = creat ("txt2.txt", 0777);
while(1)
{
}}
Execution procedure
User has to store the file with .c extension and once done the following command
can be issued from prompt.
$ gcc –o prog1 prog1.c
Where
gcc – the compiler
prog1 – the executable file name
prog1.c – the source file name.
If there is no compilation error, an executable file with the name mentioned
(prog1) will be available for execution. The snapshot is shown in the following
Fig. 6.3, where the complete execution cycle has been followed.
Real Time Operating Systems—A Detailed Overview 81
6.2.1 Task
The first thing one comes across in any Operating Systems (OS) book is the
task. It is such an important element that it can never be neglected; it can
be stated as the basic building block of an RTOS. A task needs to be created
before it can be used, and the procedure for creating a task differs from one
RTOS to another.
Blocked: When a task being executed reaches a point where it requires some
external input or an unavailable resource, it goes to the blocked state. Assume
music is being played and a call arrives on the same mobile. While the call is
in progress, the music task is blocked; once the call is over, the music
resumes from the place where it was left, without the user having to press the
play button again.
[Figure: task state diagram with states Dormant, Ready, Running and Blocked.
Transitions: Ready → Running when the task has the highest priority;
Running → Ready when the task no longer has the highest priority;
Running → Blocked when the task is blocked due to a request for an unavailable
resource; Blocked → Ready when the task is unblocked but is not the
highest-priority task; Blocked → Running when the task is unblocked and is the
highest-priority task.]
Priority/preemptive scheduling
(a) Shortest Job First (SJF)
The task which arrives first is allowed to run, irrespective of its execution
time. When another task arrives in the ready queue, its execution time is
compared with that of the task currently executing. If the new task's execution
time is shorter than the running task's, the running task is preempted and
moved to the ready queue, and the task with the shorter execution time is
dispatched from the ready state to the running state.
Example:
Table 6.1: SJF Scheduling
Task number Arrival time (in ms) Execution time (in ms)
T0 0 30
T1 3 21
T2 7 14
As shown in Table 6.1, T0 arrives at time 0 and is the first task to arrive, so
it is allowed to execute. At time 3, task T1 arrives with a shorter execution
time than T0, so T0 is preempted and T1 is executed. Again, at time 7, task T2
arrives with a shorter execution time than both T0 and T1, so it gets the
highest priority and preempts T1. After T2 finishes, T1 resumes, then T0
resumes.
//C code for SJF (non-preemptive)
#include<stdio.h>
#include<conio.h>
#include<process.h>
#include<string.h>
void main ()
{
char p[10][5],temp[5];
int tot=0,wt[10],pt[10],i,j,n,temp1;
float avg=0;
clrscr ();
//get number of processes
printf(“enter no of processes:”);
scanf(“%d”,&n);
for(i=0;i<n;i++)
{
//for each process get the process name, burst/execution time
printf(“enter process%d name:\n”,i+1);
scanf(“%s”,&p[i]);
printf(“enter process time”);
scanf(“%d”,&pt[i]);
}
//compare the Burst times of processes and sort them in ascending order
//and execute the processes in that order
for(i=0;i<n-1;i++)
{
for(j=i+1;j<n;j++)
{
if(pt[i]>pt[j])
{
temp1=pt[i];
pt[i]=pt[j];
pt[j]=temp1;
strcpy(temp,p[i]);
strcpy(p[i],p[j]);
strcpy(p[j],temp);
}
}
}
//waiting time of first process is zero!
wt[0]=0;
//waiting times of the other processes
for(i=1;i<n;i++)
{
wt[i]=wt[i-1]+pt[i-1];
//find the total waiting time
tot=tot+wt[i];
}
//find the average waiting time
avg=(float)tot/n;
//print all the values
printf("P_name\t P_time\t w_time\n");
for(i=0;i<n;i++)
printf("%s\t%d\t%d\n",p[i],pt[i],wt[i]);
printf("total waiting time=%d\n avg waiting time=%f",tot,avg);
getch();
}
OUTPUT:
enter no of processes: 5
enter process1 name: aaa
enter process time: 4
enter process2 name: bbb
enter process time: 3
enter process3 name: ccc
enter process time: 2
enter process4 name: ddd
enter process time: 5
enter process5 name: eee
enter process time: 1
P_name P_time w_time
eee 1 0
ccc 2 1
bbb 3 3
aaa 4 6
ddd 5 10
Table 6.2: Priority Scheduling
Task number Arrival time (in ms) Priority
T0 0 3
T1 3 2
T2 7 1
As shown in Table 6.2, T0 is the first task, so it is allowed to run. At time
3, T1 arrives with a higher priority than T0, so it preempts T0 and gets the
CPU. At time 7, T2 arrives with the highest priority of the three (the lowest
priority number), so it preempts T1 and gets the CPU. Once T2 is done, T1
resumes, then T0 resumes.
// C code for Priority Scheduling:
#include<stdio.h>
#include<conio.h>
#include<process.h>
#include<string.h>
void main()
{
char p[10][5],temp[5];
int tot=0,wt[10],pt[10],i,j,n,temp1;
float avg=0;
clrscr();
//get number of processes
printf(“enter no of processes:”);
scanf(“%d”,&n);
//for each process get the process name, burst/execution time
for(i=0;i<n;i++)
{
printf(“enter process%d name:\n”,i+1);
scanf(“%s”,&p[i]);
printf(“enter process time”);
scanf(“%d”,&pt[i]);
}
//compare the priorities of processes and sort them in ascending order
//and execute the processes in that order
for(i=0;i<n-1;i++)
{
for(j=i+1;j<n;j++)
{
if(pt[i]>pt[j])
{
temp1=pt[i];
pt[i]=pt[j];
pt[j]=temp1;
strcpy(temp,p[i]);
strcpy(p[i],p[j]);
strcpy(p[j],temp);
}
}
}
//waiting time of first process is zero!
wt[0]=0;
//waiting times of the other processes
for(i=1;i<n;i++)
{
wt[i]=wt[i-1]+pt[i-1];
//find the total waiting time
tot=tot+wt[i];
}
//find the average waiting time
avg=(float)tot/n;
//print the values
printf("p_name\t P_time\t w_time\n");
for(i=0;i<n;i++)
printf("%s\t%d\t%d\n",p[i],pt[i],wt[i]);
printf("total waiting time=%d\n avg waiting time=%f",tot,avg);
getch();
}
OUTPUT:
enter no of processes: 5
enter process1 name: aaa
enter process time: 4
enter process2 name: bbb
enter process time: 3
enter process3 name: ccc
enter process time: 2
enter process4 name: ddd
enter process time: 5
enter process5 name: eee
enter process time: 1
p_name P_time w_time
eee 1 0
ccc 2 1
bbb 3 3
aaa 4 6
ddd 5 10
Table 6.3: Round Robin Scheduling
Task Number Executing time (in ms)
T0 30
T1 21
T2 14
Here as referred in Table 6.3, it’s given that the time slice is 10 ms. The execution
would be as follows:
First, T0 will execute for 10 ms until its time slice expires; then T1 and T2
execute successively. After the first round of execution, the remaining
execution times of T0, T1 and T2 would be 20, 11 and 4 ms respectively. As one
can observe, T0 and T1 each need two more rounds, while T2 needs only one more
round to finish its execution.
//C code for round robin scheduling:
#include<stdio.h>
#include<conio.h>
#include<process.h>
#include<string.h>
void main()
{
char p[10][5];
//timer denotes time slice
int et[10],wt[10],timer=3,count,pt[10],rt,i,j,totwt=0,t,n=5,found=0,m;
float avgwt;
clrscr();
//get the process name, burst time for 5 processes
for(i=0;i<n;i++)
{
printf("enter process%d name:\n",i+1);
scanf("%s",&p[i]);
printf("enter process time:");
scanf("%d",&pt[i]);
}
count=0;
for(i=0;i<m;i++)
{
for(j=i+1;j<=n;j++)
{
if(strcmp(p[i],p[j])==0)
{
count++;
found=j;
}
}
if(found!=0)
{
wt[i]=wt[found]–(count*timer);
count=0;
found=0;
}
}
//find the total waiting time
for(i=0;i<m;i++)
{
totwt+=wt[i];
}
//find the average waiting time
avgwt=(float)totwt/m;
//print the values
printf(“p_name\tp_time\tw_time\n”);
for(i=0;i<m;i++)
{
printf(“\n%s\t%d\t%d”,p[i],pt[i],wt[i]);
}
Non-preemptive scheduling
(a) First Come First Serve
First Come First Serve (FCFS) is the best example of non-preemptive scheduling.
It is a relatively simple concept to implement as well as to understand: tasks
are executed in the same order in which they arrive. Though it is simple, it
has some serious drawbacks; in particular, the overall waiting time of the
tasks would be higher than when they are scheduled with the SJF or priority
scheduling methodologies. The following table has an example which explains
this drawback.
Task Number Arrival time (in ms) Execution time (in ms)
T0 0 30
T1 3 21
T2 7 14
#include<stdio.h>
#include<conio.h>
#include<process.h>
void main()
{
char p[10][5];
int tot=0,wt[10],pt[10],i,n;
float avg=0;
clrscr();
//get number of processes
printf(“enter no of processes:”);
scanf(“%d”,&n);
//get the process’ name and burst time
for(i=0;i<n;i++)
{
printf(“enter process%d name:\n”,i+1);
scanf(“%s”,&p[i]);
printf(“enter process time:”);
scanf(“%d”,&pt[i]);
}
//waiting time of first process is zero
wt[0]=0;
//calculate the waiting time of other processes
for(i=1;i<n;i++)
{
//waiting times of the other process would be,
//the sum of waiting time and execution times of previous processes
wt[i]=wt[i-1]+pt[i-1];
//find the total waiting time
tot=tot+wt[i];
}
//find the average waiting time
avg=(float)tot/n;
//print the values
printf("P_name\t P_time\t w_time\n");
for(i=0;i<n;i++)
printf("%s\t%d\t%d\n",p[i],pt[i],wt[i]);
printf("total waiting time=%d\n avg waiting time=%f",tot,avg);
getch();}
OUTPUT:
enter no of processes: 5
enter process1 name: aaa
enter process time: 4
enter process2 name: bbb
enter process time: 3
enter process3 name: ccc
enter process time: 2
enter process4 name: ddd
enter process time: 5
enter process5 name: eee
enter process time: 1
P_name P_time w_time
aaa 4 0
bbb 3 4
ccc 2 7
ddd 5 9
eee 1 14
In strict priority scheduling, low-priority tasks may not be able to finish
their execution, but in fairness scheduling most of the tasks are able to
finish within their share of CPU time.
6.3.1 Pipe
A pipe is a very simple way of communicating between two processes. A relevant
real-world example will be apt here: to water the plants in a garden, one end
of a tube is connected to the water tank, and the other end is used to water
the plants. The scenario is the same here. When process A has to transfer data
to process B it can use a pipe, and the most important thing is that a pipe is
unidirectional, i.e., data can be sent in only one direction. If two-way
communication is needed, then 2 pipes have to be used.
Another simple point about the pipe is that it can be used only between
related processes; two unrelated processes cannot share a pipe. Nobody waters
the plants in a neighbour's house, and that is the case here.
The client-server scenario can very well illustrate the application of pipes.
The client reads a file name from STDIN (standard input, the keyboard) and
writes it into the pipe. The server reads this file name from the pipe and
opens the file for reading. If the open is successful, the server responds by
reading the file and writing its contents into the pipe; otherwise an error
message is generated. The client then reads from the pipe, writing what it
receives to STDOUT. Linux system programming is very handy in explaining the
pipe concept. Figure 6.5 has the pipe representation diagrammatically.
# include <stdio.h>
# include <unistd.h>
int main()
{
int ret;
int pipefd[2];
char buffer[15];
// this buffer is where data will be kept in.
pipe (pipefd);
//pipe system call is used which will create the pipe.
//it fills pipefd with 2 descriptors: pipefd[0] for
//reading and pipefd[1] for writing.
ret = fork();
// creation of child process through fork is done
// here. This will now throw two return values.
// One > 0 and other == 0. >0 is for parent, equal
// to 0 is child.
if (ret == 0)
{
// child process: write a message into the pipe.
write (pipefd[1], "Hello, parent", 14);
}
else
{
ret = read (pipefd[0],buffer,sizeof(buffer));
// data is now read, but needs to be displayed on the screen.
// for that purpose data is kept in buffer, and from buffer
// it can be written to the display.
write (1,buffer,ret);
//Where 1 represents standard output, the screen.
}
return 0;
}
As seen above, a pipe can be used only between related processes. This
limitation can be overcome by using a named pipe or FIFO. The reader is advised
to try the same program on a Linux/Unix PC, which will definitely yield better
understanding.
/* fifo_write.c */
/* For creating a FIFO, the system call mkfifo() has to be used. In this
example the FIFO is created as /tmp/myfifo. */
// fifo_write.c
// Code starts here.
# include <stdio.h>
# include <sys/stat.h>
# include <sys/types.h>
# include <fcntl.h>
# include <unistd.h>
// above are the standard header files for FIFO creation.
int main()
// main program starts here.
{
int fd, retval;
// mkfifo() will return a return value and it is
// collected in retval. Also fd is the return value
// of the open system call.
retval = mkfifo("/tmp/myfifo", 0666);
// create the fifo with read/write permissions.
fflush(stdin);
fd = open("/tmp/myfifo",O_WRONLY);
// as the fifo is already created, it can be opened
// in write mode for writing the data to the fifo.
write(fd, "Hi FIFO", 8);
// write a small message (7 characters + terminator) into the fifo.
close (fd);
// since the write process is over, close the file.
return 0;
}
// fifo_read.c
// Code Starts here.
# include <stdio.h>
# include <sys/stat.h>
# include <sys/types.h>
# include <fcntl.h>
# include <unistd.h>
// above are the standard header files for FIFO creation.
int main()
// main program starts here.
{
int fd, retval;
char buffer[8];
fd = open("/tmp/myfifo",O_RDONLY);
// fifo has been already created in the write program,
// so opening it in read only mode.
retval = read(fd, buffer, sizeof(buffer));
// read the message sent by the writer and display it.
printf("%s\n", buffer);
close(fd);
return 0;
}
The C code for write should first be compiled with gcc -o fifo_write
fifo_write.c, and then executed with ./fifo_write. A similar procedure has to
be followed for fifo_read.c. If the read program is executed first, it will
wait until the write program is executed; synchronization happens
automatically.
MESSAGE QUEUE
// message_snd.c
// Code starts here.
// Header files on IPC and message have to be included with
// normal other headers
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/ipc.h>
#include<sys/types.h>
#include<sys/msg.h>
struct msgbuf
{
long mtype;
//to decide on message type.
char msgtxt[200];
//size of the data which has to be sent.
};
int main(void)
{
struct msgbuf msg;
// creating an instance of the structure.
int msgid;
// every message is represented by an id
key_t key;
// every queue needs a key, which the sender and receiver will agree upon
// STAGE 1 of PROGRAM //
if ((key=ftok("message_snd.c",'b')) == -1)
//the ftok function generates the key; it returns the key if successful,
//else it returns -1. ftok stands for "file to key"; here the key is
//generated from this source file itself.
{
perror("key");
// if the key is not created, perror will let us know why
exit(1);
}
// STAGE 2 of PROGRAM //
if((msgid=msgget(key,0644|IPC_CREAT))==-1)
// the message queue id is generated through this system call.
// if successful it will return the id through which the queue
// can be accessed.
{
perror("msgid");
// if the queue is not formed it will give the error message.
exit(1);
}
// STAGE 3 of PROGRAM //
msg.mtype = 1;
strcpy(msg.msgtxt, "Hello through message queue");
// a sample message is placed in the buffer and sent to the queue.
if(msgsnd(msgid, &msg, sizeof(msg.msgtxt), 0) == -1)
{
perror("msgsnd");
exit(1);
}
return 0;
}
#include<stdio.h>
#include<stdlib.h>
#include<sys/ipc.h>
#include<sys/types.h>
#include<sys/msg.h>
struct msgbuf
{
long mtype;
char msgtxt[200];
};
int main(void)
{
struct msgbuf msg;
int msgid;
key_t key;
// Stage 1 of PROGRAM //
if((key=ftok("message_snd.c", 'b'))== -1)
// using the same file here to get the key.
{
perror(“key”);
// if not created perror function will let us know why it has not been created
exit(1);
}
// Stage 2 of program //
if((msgid=msgget(key,0644))==-1)
{
perror("msgid");
exit(1);
}
for(;;)
{
if(msgrcv(msgid,&msg,sizeof(msg.msgtxt),1,0)==-1)
// here msgrcv is used; this is the major difference between the send and
// receive programs.
{
perror(“msgrcv”);
exit(1);
}
printf(“%s\n”,msg.msgtxt);
}
return 0;
}
assign ownership to another user with shmctl(). It can also revoke this assignment.
Other processes with proper permission can perform various control functions
on the shared memory segment using shmctl(). Once created, a shared segment
can be attached to a process address space using shmat(). It can be detached
using shmdt(). The attaching process must have the appropriate permissions for
shmat(). Once attached, the process can read or write to the segment, as allowed
by the permission requested in the attach operation. A shared segment can be
attached multiple times by the same process. A shared memory segment is
described by a control structure with a unique ID that points to an area of
physical memory. The identifier of the segment is called the shmid. The structure
definition for the shared memory segment control structures and prototypes can
be found in <sys/shm.h>.
There are three steps:
1. Initialization
2. Attach
3. Detach
Two separate programs for read and write are presented here.
// shared memory write program
// shmwrite.c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
int main()
{
int retval,shmid;
void *memory = NULL;
char *p;
shmid = shmget ((key_t)1234, 6, 0666 | IPC_CREAT);
// create (or get) a 6-byte shared memory segment with key 1234.
if (shmid < 0)
{
printf ("\n The Creation has gone as a failure, Sorry");
shmid = shmget ((key_t)1234, 6, 0666);
// keeping a check here: if the segment could not be created, try to
// fetch an existing one instead.
}
memory = shmat (shmid, NULL, 0);
// on success shmat() returns the address of the attached shared memory
// segment; that is why void *memory = NULL is declared at the start of
// the code.
if (memory == (void *) -1)
{
printf("\n Attachment failure, Sorry");
return 0;
}
p=(char *) memory;
// specifying the data type.
// sending characters, so we need to cast it to char.
strcpy(p, "HELLO");
// the message (5 characters + terminator) fits the 6-byte segment.
retval = shmdt(p);
if (retval < 0)
{
printf(“\n Suffered Detachment”);
return 0;
}
}
// shared memory read program
// shmread.c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <memory.h>
int main ()
{
int retval,shmid;
void *memory = NULL;
char *p;
shmid = shmget ((key_t)1234, 6, 0666);
// get the segment already created by the write program.
if (shmid < 0)
{
printf("\n The Creation has gone as a failure, Sorry");
shmid = shmget ((key_t)1234, 6, 0666 | IPC_CREAT);
}
printf("\n We are getting the shared memory created %d", shmid);
memory = shmat (shmid, NULL, 0);
// attach the segment to this process' address space.
if (memory == (void *) -1)
{
printf("\n Attachment failure, Sorry");
return 0;
}
p=(char *) memory;
printf ("\n MESSAGE is %s \n",p);
retval = shmdt(p);
// detach once the message has been read.
return 0;
}
Execution is similar to the cases dealt with earlier. GCC has to be used:
shmwrite.c has to be compiled and executed first, and then the same procedure
has to be followed for shmread.c. The next scenario to be looked into is
semaphores.
Semaphores
If there is a two-way road, there will never be a problem of clashes or
resource sharing (the road is available for cars travelling either way).
Figure 6.8 is the case referred to here: cars can move comfortably without any
clash, so there is no need for synchronization or sharing of resources.
If there is an intersecting road, a definite way of avoiding clashes is needed,
i.e., an electronic signal is used there, whose green, red and yellow lights
indicate the availability of the road for the vehicles to move on. Based on
the signal, a vehicle can move and use the road without collision. Figure 6.9
shows this concept diagrammatically.
One more real-world example can be brought in here. Consider a railway track
which connects two cities. If a train is using the track, another train cannot
use it at the same instant; if it did, it would result in a dire accident. So
came the notion of the semaphore, which is an indicator of the status of the
rail track (the resource). Based on the indication of the semaphore, the
driver of the subsequent train can comprehend the status of the track. Figure
6.10 shows the semaphore used by the railways. If the track is free, it can be
used; if not, the driver has to wait until the resource is freed. This is the
semaphore, and the same idea is used in embedded systems to avoid clashes over
resources.
Before a task can access a shared resource, it should first have the access
permission. A semaphore (S) is an integer variable which helps in achieving
synchronized access to shared resources. S can have a value greater than or
equal to 0, and it can be accessed only through two operations, namely wait
and signal. These operations work as follows: wait(S) decrements the semaphore
value, and if the value would go negative, the calling task is suspended and
placed in a queue to wait. signal(S) increments the value of the semaphore and
is opposite in action to wait(S); it causes the first task in the queue to
resume execution. Let S be the semaphore used to access a shared resource 'D'.
function wait(S)
    while (S <= 0)
        ; //wait in the queue, some other task is using D
          //no operation takes place: the loop body is empty
    //once S reaches a positive value, start using D after decrementing S
    S--;

function signal(S)
    //the usage of the shared resource is over; release D by incrementing S
    S++;
Where is the semaphore stored?
As already seen, a semaphore is a variable which is accessed by tasks before
using a shared resource. So it should be globally available to all the tasks
which execute concurrently. It is therefore stored in a globally accessible
part of the operating system, the kernel. The existing semaphores can be
viewed by giving the following command at the shell prompt:
$ ipcs -s
A simple example explaining S
1. Task1 acquires S. On acquiring the shared device (the display), it is
implicit that it decrements the S value by one, which would now become 0.
2. It uses the display and after usage it will release it. On releasing the
shared device, it is implicit that it increments the S value by one, which
would now become 1, i.e., any other task wanting to use the shared
resource can now acquire S.
3. Task2 acquires S, decrementing it again.
4. It releases the resource and increments S after usage.
If both the tasks need access, kernel can give the access to only one of the
tasks i.e., semaphore will be given to only one task. This allocation can be based
on priority or first come first serve basis. If many tasks need access, then they
will be kept in queue. They will wait for their turn and they can gain access.
[Figure: two tasks sharing a display guarded by a semaphore with initial value
1. Task1 acquires it (value becomes 0, display unavailable) and releases it
after use (value becomes 1, display available); Task2 can then acquire it.]
Task a Task b
Wait for Semaphore S (acquires S) Wait for Semaphore T (acquires T)
Wait for Semaphore T Wait for Semaphore S
(T value will be negative (S value will be negative
since it’s already used by b, so, queued) since it’s already used by a, so, queued)
……….. ………..
……….. ………..
Signal(S) Signal(T)
Signal(T) Signal(S)
When the above sequences are executed, they lead to a deadlocked situation;
neither of the two tasks succeeds in finishing execution. This eventually
leads to the starvation of both tasks: task a waits for T while holding
semaphore S, and task b waits for S while holding semaphore T.
All the philosophers sit at a round table with forks on the left and right
side of each plate. A philosopher in the hungry state can go to the eating
state only if the forks on both sides of the plate are available. Even if one
is missing, the philosopher cannot start eating. Philosophers who do not feel
hungry remain in the thinking state.
[Figure 6.13: five philosophers at a round table with forks between the
plates; the legend distinguishes forks, eating philosophers and thinking
philosophers.]
As can be seen from Fig. 6.13, only two of the five philosophers can be in the
eating state at any point of time if conflicts are to be avoided. Another
famous mutual exclusion problem is the reader-writer problem, where no two
writers, nor a writer and a reader, can perform their work simultaneously.
capacity and flash memory, the cheapest with large storage capacity. Our focus
would be on cache memory and RAM.
Properties of RAM
1. Popularly known as physical memory
2. Size is smaller than the hard disk (for instance, 512 MB RAM <<< 120 GB
HDD)
3. Accordingly, faster than the hard disk
4. Costlier when compared to the hard disk
5. Volatile in nature, which means it doesn't remember anything when the
power is shut down
6. It holds the most important software, the operating system (OS), for
faster operation
7. Accessed by the CPU for execution of programs
8. Accessed directly by peripherals during DMA (Direct Memory Access),
without CPU intervention
9. Size ranges from 128 MB to 10 GB
Why memory management?
As explained in the properties of RAM, it is a memory device of limited
capacity. It cannot hold lots and lots of programs as the HDD does, which
eventually raises a question: what will happen if the size of the user program
is larger than the RAM's? Here comes the need for memory management.
The program which has to be executed should be brought into the RAM from a
secondary storage device such as the HDD; only then will it be able to get
executed. How is the program allocated space in the RAM? This is another
question which needs attention in memory management. A few techniques are
available to deal with the memory allocation problem. The basic form of memory
allocation is done by partitioning the RAM, which can take any of the
following forms:
1. Fixed size partitioning
2. Variable size partitioning
3. Dynamic allocation based on size of the programs.
1. Fixed size partitioning
Memory is divided into equal-sized partitions in this technique. Each
partition can hold one program of equal or lesser size (refer Fig. 6.14). The
number of such partitions determines the degree of multiprogramming. Suppose
the capacity of the RAM is 512 MB and the first 80 MB are allocated to the OS.
The rest is divided into partitions. The size of a partition can be anything;
for instance, it may be 20 KB. Programs should then be of the same size or
smaller, else some portion of the memory will be wasted! The waste is
technically called internal fragmentation, which is discussed in the following
subsection.
Example:
Let the size of the RAM be 512 MB, and the approximate size of each partition
be 100 KB.
And the programs of following size arrive.
120 KB, 80 KB, 18 KB, 60 KB.
How is the allocation going to be?
• First program needs a space > partition size. So it is given two partitions.
• Then the second needs a space lesser than partition size, which is given
the 3rd partition.
• And 3rd needs much lesser space than the partition size. It’s given the 4th
partition.
• Similarly 4th program needs a space lesser than the partition size. It’s
given the 5th partition.
As can be seen, with each allocation some space goes wasted. In the first
allocation, 80 KB is wasted; similarly 20 KB, 82 KB and 40 KB are wasted for
the 2nd, 3rd and 4th allocations respectively. So the amount of memory wasted
is large, which is very costly to spare. This is the disadvantage of fixed
size partitioning.
Advantage:
• Simple to implement
Disadvantage:
• Memory wastage is more
2. Variable size partitioning
This type of partitioning was introduced to eliminate the disadvantage of
fixed size partitioning. Here, partitions are of different sizes. Each size is
associated with a queue into which processes of equal or lesser size are
admitted. The processes are then given memory in FIFO manner. The same is
represented pictorially in Fig. 6.15.
New Processes
In this type, when a new process arrives, it is admitted to one of the queues
based on its size. Memory wastage happens in this type also, but less than in
the previous type. For example, let the sizes of the partitions be 10, 20, 40,
60, 100 and 200 KB, and assume the same programs as in fixed size
partitioning. When they are allocated to variable-sized partitions, the 120,
80, 18 and 60 KB programs are given the 200, 100, 20 and 60 KB partitions
respectively. The memory left unused is much less than with the previous
allocation method. There is yet another allocation method, the most famous
type, which is dynamic memory allocation, i.e., allocating the memory at
execution time. It is covered in the last subsection.
During the initial references, the miss rate would be much higher, since the
cache is empty during the start. But as time rolls on, the cache is filled and there
will be more hits than misses. Cache memory concept is diagrammatically
represented in Fig. 6.16.
A real world example for caching
Everybody uses Google almost all the time for their work. Once the keywords
which are to be searched are keyed in, ten pages each containing the relevant
information with respect to the search key are displayed to the user. For example,
while searching for “cache memory” the following link appears.
In this case (Fig. 6.17), even if the original page has been removed, the
cached page will be given to the user who is searching for that page. Note
that the cached page need not be the most recently updated page: the page may
have been updated the same day, and the copy being displayed may not reflect
that. If the user clicks on the cached hyperlink, he/she gets a reminder at
the top of the displayed page regarding this update conflict. The most
frequently visited links will have such tags. Some pages may not be cached at
all, in which case the cached hyperlink is absent. Figure 6.17 stands as
support for the concept discussed.
For high cache performance
1. Hit time should be minimized.
2. Miss rate should be reduced.
3. Miss penalty should be reduced (penalty associated with the cache miss,
may be in the form of CPU time/cost/both).
memory space exactly as much as they need; not even a single byte is allocated
extra. This process eliminates internal fragmentation, but introduces a new
problem: holes, which are free spaces between the allocations. A single hole
may not accommodate a program, but when all the holes are combined together
they become useful. The holes are also known as external fragmentation
(Fig. 6.18). The process of combining the holes together is popularly known as
compaction. The holes concept can be understood from the following figure.
Three popular techniques are available for dynamic partitioning.
1. First fit
2. Best fit
3. Next fit
Next fit:
– Scans memory from the location of the last placement.
– More often allocates a block of memory at the end of memory where the
largest block is found.
– The largest block of memory is broken up into smaller blocks.
– Compaction is required to obtain a large block at the end of memory.
[Figure: the three placement policies over a list of free and occupied blocks.
First fit takes the first free 80 MB block; best fit takes the best-matching
120 MB block; next fit scans from the recently allocated block (note: next fit
follows the next available free block, i.e. the 120 MB block). Occupied
blocks, including a 50 MB block, are shaded.]
6.7 FRAGMENTATION
Fragmentation has been already discussed in the previous topics such as
partitioning and dynamic memory allocation. Here, a brief summary of those:
1. Due to poor usage of the memory partitions, some portions of memory
become empty spaces which are of no use. This is generally known as
fragmentation.
2. There are two kinds of fragmentation.
R Internal
R External
3. Internal fragmentation arises when the block is smaller than the
partition in which it is allotted, i.e., partition size > block size.
4. External fragmentation is due to dynamic allocation of memory, where the
sizes of the blocks allocated from time to time differ, creating holes in
the memory. These holes are of no use most of the time due to their small
size. Once the holes are combined using compaction, the disadvantage of
holes disappears!
5. Internal fragmentation cannot be resolved using any technique since it is
private to the fragment.
2. Programs, in the absence of virtual memory, need to wait for the RAM
to free some space before new programs can run.
3. Virtual memory increases the degree of multi-programming by letting
many processes reside in the RAM at the same time.
Disadvantage
1. The major disadvantage of the virtual memory scheme is the time it
takes to move portions of processes to the hard disk.
PC (Program Counter)
SP (Stack Pointer)
CPU registers
POINTS TO REMEMBER
1. Linux is an open source OS.
2. A task is the basic element in every OS; it is a program in execution.
Task states include Dormant, Running, Ready, and Blocked.
3. Task scheduling means arranging the tasks in the ready state in the
order in which they will be allowed to execute.
4. Task scheduling is of two types: preemptive and non-preemptive.
5. A pipe is a logical communication channel for processes wanting to
communicate with each other.
6. A message queue is a mechanism processes use to share messages.
7. Synchronization is needed among tasks accessing a common resource.
Semaphores and mutual exclusion are the key concepts used to achieve
synchronization.
8. A semaphore is a variable that ensures synchronized access to shared
resources.
9. Mutual exclusion is a technique which allows a process to enter its
critical section only if no other process is currently inside the
critical section.
10. Memory management deals with efficient management of the main
memory in multi-programming environments.
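Mutual exclusion (point 9 above) can be demonstrated in a few lines. This is an illustrative sketch, not from the book: a `threading.Lock` plays the role of a binary semaphore/mutex guarding the critical section, so that concurrent increments of a shared counter are never lost.

```python
import threading

counter = 0
lock = threading.Lock()

def task(increments):
    global counter
    for _ in range(increments):
        with lock:           # enter critical section (acquire the mutex)
            counter += 1     # shared-resource access is serialized
        # the mutex is released on leaving the 'with' block

threads = [threading.Thread(target=task, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)               # 40000 — no lost updates
```

Without the lock, the read-modify-write on `counter` could interleave between tasks and updates would be lost; with it, exactly one task is inside the critical section at a time.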
6.10 QUIZ
1. When does a task move from running state to blocked state?
2. Write down the memory hierarchy.
3. Cache memory is faster and costlier than RAM—true/false.
4. Differentiate preemptive and non-preemptive techniques.
5. Differentiate binary semaphore and counting semaphore.
6. Why is task synchronization needed?
7. What is the disadvantage of using virtual memory?
Learning Outcomes
R Serial Communication Basics
• RS 232 model
• I2C (I Square C) model
R CAN and CAN OPEN
R SPI and SCI
R USB
R IEEE 1394 – Apple FireWire
R HDLC – An Insight
R Parallel Communication Basics
• PCI interface
• PCI-X interface
R Device Drivers – An Introduction
• Serial port device driver
• Parallel port device driver
R Recap
R Quiz
[Figure: an RS-232 character frame, showing the mark and space signal levels
and the bit time.]
RS-232 DB-25 connector pin assignments:
1. Protective Ground 14. Secondary TD
2. Transmit Data (TD) 15. Transmit Clock
3. Receive Data (RD) 16. Secondary RD
4. Request to Send (RTS) 17. Receiver Clock
5. Clear to Send (CTS) 18. Local Loop Back
6. Data Set Ready (DSR) 19. Secondary RTS
7. Signal Ground 20. Data Terminal Ready (DTR)
8. Data Carrier Detect (CD) 21. Remote Loop Back
9. Reserved 22. Ring Indicate
10. Reserved 23. Data Rate Detect
11. Unassigned 24. Transmit Clock
12. Secondary CD 25. Test Mode
13. Secondary CTS
designed to absorb line noise. A low level is defined as logic 1 and is referred to
as “marking.” Similarly, a high level is defined as logic 0 and is referred to as
“spacing.”
[Figure: an I²C bus — a master microcontroller drives the SDA and SCL lines,
each connected to Vdd through a pull-up resistor.]
Like the CAN and LIN protocols, I²C also follows the master-slave
communication model. But the I²C bus is a multi-master bus, which means more
than one device capable of initiating a data transfer can be connected to it.
The device that initiates the communication is called the MASTER, whereas
the device being addressed by the master is called the SLAVE. The master
always generates the clock signal; each master generates its own clock when
transferring data on the bus.
Modes of Operation
The I²C bus can operate in three modes, or in other words the data on the I2C
bus can be transferred in three different modes.
1. Standard mode
2. Fast mode
3. High-Speed (HS) mode.
Standard mode
1. This is the original mode, released in the early 1980s.
2. It has a maximum data rate of 100 kbps.
3. It uses 7-bit addressing, which provides 112 slave addresses.
Enhanced or Fast mode
The fast mode added some more features to the slave devices.
1. The maximum data rate was increased to 400 kbps.
2. To suppress noise spikes, Fast-mode devices were given Schmitt-triggered
inputs.
3. The SCL and SDA lines of an I²C-bus slave device were made to exhibit
high impedance when power was removed.
High-Speed Mode
This mode was created mainly to increase the data rate, up to 36 times
faster than standard mode. It provides 1.7 Mbps (with bus capacitance
Cb = 400 pF) and 3.4 Mbps (with Cb = 100 pF).
The major difference of High-Speed (HS) mode in comparison with standard
mode is that HS-mode systems must include active pull-ups on the SCL line.
The other difference is that a master operating in HS mode first sends a
compatibility request code to the bus; if the acknowledge bit (a bit within
the I²C frame) remains high after the compatibility code, the master assumes
the bus is capable of HS mode.
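The 7-bit addressing mentioned under standard mode can be made concrete. This is an illustrative sketch, not text from the I²C specification: the first byte of a transfer carries the 7-bit slave address in bits 7..1 and the read/write bit in bit 0, and 112 usable addresses result because 16 of the 128 patterns (those beginning 0000 or 1111) are reserved.

```python
def address_byte(addr7, read):
    """Build the first I2C byte: 7-bit address in bits 7..1, R/W in bit 0."""
    assert 0 <= addr7 <= 0x7F
    return (addr7 << 1) | (1 if read else 0)

# Reserved 7-bit addresses: top four bits 0000 or 1111
reserved = [a for a in range(128) if a >> 3 in (0b0000, 0b1111)]
usable = 128 - len(reserved)

print(hex(address_byte(0x50, read=False)))   # write transfer to address 0x50
print(usable)                                # 112 usable slave addresses
```

The address value 0x50 here is just a hypothetical example (a common EEPROM-style address), not one prescribed by the text.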
[Figure: a CAN bus linking control nodes for antilock braking, transmission,
airbags, cruise control, and the window mirror.]
Like most networked applications, CAN follows a layered approach to system
implementation. It conforms to the Open Systems Interconnection (OSI) model,
which is defined in terms of layers. The ISO 11898 architecture (for CAN)
defines the lowest two layers of the seven-layer OSI/ISO model: the data-link
layer and the physical layer.
The rest of the layers (called higher layers) are left to the system
software developers, who use them to adapt and optimize the protocol for
multiple media such as twisted pair, single wire, optical, RF, or IR. Higher
Layer Protocols (HLPs) are used to implement the upper five layers of the
OSI model on CAN. CAN uses a specific message frame format for receiving and
transmitting data. The two frame formats available are:
(a) Standard CAN protocol, or base frame format
(b) Extended CAN, or extended frame format
Error detection and correction
This mechanism is used for detecting errors in messages appearing on the CAN
bus, so that the transmitter can retransmit the message. The CAN protocol
defines five different ways of detecting errors. Two of these work at the bit
level, and the other three at the message level.
1. Bit Monitoring
2. Bit Stuffing
3. Frame Check
4. Acknowledgment Check
5. Cyclic Redundancy Check
1. Each transmitter on the CAN bus monitors (i.e., reads back) the transmitted
signal level. If the signal level read differs from the one transmitted, a Bit
Error is signaled. Note that no bit error is raised during the arbitration
process.
2. When five consecutive bits of the same level have been transmitted by a
node, it will add a sixth bit of the opposite level to the outgoing bit
stream. The receivers will remove this extra bit. This is done to ensure
enough signal transitions for the receivers to stay synchronized, and it
also gives the receivers an extra opportunity to detect errors: if more
than five consecutive bits of the same level occur on the bus, a Stuff
Error is signaled.
3. Some parts of the CAN message have a fixed format, i.e., the standard
defines exactly what levels must occur and when (Those parts are the
CRC Delimiter, ACK Delimiter, End of Frame, and also the Intermission).
If a CAN controller detects an invalid value in one of these fixed fields, a
Frame Error is signaled.
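The stuffing rule in point 2 can be sketched directly. This is an illustrative sketch, not code from the CAN specification: after five consecutive bits of one level, the transmitter inserts one bit of the opposite level, and the receiver strips it back out.

```python
def stuff(bits):
    """Insert an opposite-level bit after every run of five equal bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            out.append(1 - b)               # the stuffed bit
            run_bit, run_len = 1 - b, 1     # it starts a new run on the bus
    return out

def destuff(bits):
    """Remove the stuffed bit that follows every run of five equal bits."""
    out = []
    run_bit, run_len = None, 0
    i = 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            i += 1                          # skip the stuffed bit
            if i < len(bits):
                run_bit, run_len = bits[i], 1
        i += 1
    return out

print(stuff([1, 1, 1, 1, 1]))               # [1, 1, 1, 1, 1, 0]
```

A receiver seeing a sixth equal bit where the stuffed bit should be would flag a Stuff Error, exactly as described above.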
7.3.1 SPI
The Serial Peripheral Interface (SPI) bus is a synchronous serial data link,
named by Motorola, that operates in full-duplex mode. Devices communicate in
master/slave mode, where the master device initiates the data frame. Multiple
slave devices are allowed, with individual slave-select (chip-select) lines.
SPI is sometimes called a “four-wire” serial bus, in contrast with three-,
two-, and one-wire serial buses.
[Figure: an SPI master and slave connected by the SCLK, MOSI, MISO, and SS
lines.]
During each SPI clock cycle, a full duplex data transmission occurs:
R the master sends a bit on the MOSI line; the slave reads it from that same
line.
R the slave sends a bit on the MISO line; the master reads it from that same
line.
Not all transmissions require all four of these operations to be meaningful
but they do happen. Transmissions normally involve two shift registers of some
given word size, such as eight bits, one in the master and one in the slave; they
are connected in a ring. Data are usually shifted out with the most significant bit
first, while shifting a new least significant bit into the same register. After that
register has been shifted out, the master and slave have exchanged register
values. Then each device takes that value and does something with it, such as
writing it to memory. If there is more data to exchange, the shift registers are
loaded with new data and the process repeats.
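The ring of shift registers described above can be modelled in a few lines. This is an illustrative sketch, not a real SPI driver: on each clock, both sides shift their MSB out and a new LSB in, so after a word-length of clocks the master and slave registers have exchanged values.

```python
def spi_transfer(master_reg, slave_reg, bits=8):
    """Simulate one SPI word transfer between two shift registers."""
    mask = (1 << bits) - 1
    for _ in range(bits):
        mosi = (master_reg >> (bits - 1)) & 1   # master shifts its MSB out
        miso = (slave_reg >> (bits - 1)) & 1    # slave shifts its MSB out
        master_reg = ((master_reg << 1) & mask) | miso  # new LSB from MISO
        slave_reg = ((slave_reg << 1) & mask) | mosi    # new LSB from MOSI
    return master_reg, slave_reg

m, s = spi_transfer(0xA5, 0x3C)
print(hex(m), hex(s))   # the registers have swapped: 0x3c 0xa5
```

This makes the "exchanged register values" behaviour concrete: after eight clocks the master holds what the slave started with, and vice versa.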
Transmissions may involve any number of clock cycles. When there are
no more data to be transmitted, the master stops toggling its clock. Normally, it
then deselects the slave. Transmissions often consist of 8-bit words, and a
master can initiate multiple such transmissions as needed. However,
other word sizes are also common, such as 16-bit words for touchscreen
controllers or audio codecs, like the TSC2101 from Texas Instruments; or 12-bit
words for many digital-to-analog or analog-to-digital converters.
Every slave on the bus that hasn’t been activated using its slave select line
must disregard the input clock and MOSI signals, and must not drive MISO.
The master must select only one slave at a time.
Some slave devices are designed to ignore any SPI communications in
which the number of clock pulses is greater than specified. Others don’t care,
ignoring extra inputs and continuing to shift the same output bit. It is common for
different devices to use SPI communications with different lengths, as, for
example, when SPI is used to access the scan chain of a digital IC by issuing a
command word of one size (perhaps 32 bits) and then getting a response of a
different size (perhaps 153 bits, one for each pin in that scan chain).
SPI devices sometimes use another signal line to send an interrupt signal to
a host CPU. Examples include pen-down interrupts from touchscreen sensors,
thermal limit alerts from temperature sensors, alarms issued by real time clock
chips, SDIO, and headset jack insertions from the sound codec in a cell phone.
Interrupts are not covered by the SPI standard; their usage is neither forbidden
nor specified by the standard.
Advantages
• Full duplex communication
7.3.2 SCI
A Serial Communications Interface (SCI) is a device that enables the serial
(one bit at a time) exchange of data between a microprocessor and peripherals
such as printers, external drives, scanners, or mice. In this respect, it is similar to
a Serial Peripheral Interface (SPI). But in addition, the SCI enables serial
communications with another microprocessor or with an external network. The
term SCI was coined by Motorola in the 1970s. In some applications it is known
as a Universal Asynchronous Receiver/Transmitter (UART).
The SCI contains a parallel-to-serial converter that serves as a data
transmitter, and a serial-to-parallel converter that serves as a data receiver. The
two devices are clocked separately, and use independent enable and interrupt
signals. The SCI operates in a Nonreturn-To-Zero (NRZ) format, and can function
in half-duplex mode (using only the receiver or only the transmitter) or in full
duplex (using the receiver and the transmitter simultaneously). The data speed
is programmable.
Serial interfaces have certain advantages over parallel interfaces. The most
significant advantage is simpler wiring. In addition, serial interface cables can
be longer than parallel interface cables, because there is much less interaction
(crosstalk) among the conductors in the cable.
The term SCI is sometimes used in reference to a serial port. This is a
connector found on most personal computers, intended for use with serial
peripheral devices. Normally data is sent as 8- or 9-bit words, least
significant bit first.
A START bit marks the beginning of the frame. The start bit is active low.
The figure above shows a framed 8-bit data word. The data word follows the
start bit. A parity bit may follow the data word (after the MSB), depending
on the protocol used: a mark parity bit (always set high), a space parity
bit (always set low), or an even/odd parity bit.
With even parity, the parity bit is chosen so that the total number of
ones in the data word plus the parity bit is even; with odd parity, it is
chosen so that the total is odd. No parity bit is used in the example above.
A stop bit will normally follow the data field (or the parity bit, if used).
The stop bit is used to ensure the signal rests at a logic high following
the end of the frame, so that when the next start bit arrives it brings the
line from high to low. Idle characters are sent as all ones, with no start
or stop bits. The RT clock rate is 16 times the incoming baud rate, and the
RT clock is re-synchronized after every start bit.
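The parity calculation just described can be sketched as follows. This is an illustrative helper, not part of any real UART API: it returns the parity bit so that the data bits plus the parity bit contain an even (or odd) number of ones.

```python
def parity_bit(word, even=True):
    """Return the parity bit for an 8-bit word under even or odd parity."""
    ones = bin(word & 0xFF).count("1")
    bit = ones & 1            # 1 if the data word has an odd count of ones
    return bit if even else bit ^ 1

print(parity_bit(0b01101001))              # 4 ones -> even parity bit is 0
print(parity_bit(0b01101001, even=False))  # odd parity bit is 1
```

For the word 0b01101001, which already has an even number of ones, even parity appends a 0 and odd parity appends a 1.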
7.4 USB
Universal Serial Bus (USB) is a specification to establish communication
between devices and a host controller (usually a personal computer),
developed by a group of companies led by Intel. USB has effectively replaced
a variety of interfaces such as serial and parallel ports.
USB can connect computer peripherals such as mice, keyboards, digital
cameras, printers, personal media players, flash drives, network adapters,
and external hard drives. For many of those devices, USB has become the
standard connection method.
USB was designed for personal computers, but it has become commonplace
on other devices such as smart phones, PDAs, and video game consoles, and
as a power cord. As of 2008, about 2 billion USB devices were sold per
year, and approximately 6 billion had been sold in total.
Unlike the older connection standards RS-232 or Parallel port, USB
connectors also supply electric power, so many devices connected by USB do
not need a power source of their own.
[Figure: USB logical pipes connecting the host controller to the endpoints
in the device.]
Wireless USB is the new wireless extension to USB that combines the
speed and security of wired technology with the ease-of-use of wireless
technology. Wireless connectivity has enabled a mobile lifestyle filled with
conveniences for mobile computing users. Supporting robust high-speed wireless
connectivity, wireless USB utilizes the common WiMedia Ultra-Wide Band
(UWB) radio platform developed by the WiMedia Alliance.
An HDLC frame is delimited by the flag sequence 01111110, which may be
repeated between frames: 01111110 01111110 01111110.
Synchronous framing
On synchronous links, this is done with bit stuffing. Any time five
consecutive 1-bits appear in the transmitted data, the data is paused and a
0-bit is transmitted. This ensures that no more than five consecutive 1-bits
will be sent. The receiving device knows this is being done; after seeing
five 1-bits in a row, a following 0-bit is stripped out of the received
data. If the following bit is a 1-bit instead, the receiver has found a flag.
This also (assuming NRZI with transition for 0 encoding of the output)
provides a minimum of one transition per 6 bit times during transmission of data,
and one transition per 7 bit times during transmission of flag, so the receiver can
stay in sync with the transmitter. Note however, that for this purpose encodings
such as 8b/10b encoding are better suited. HDLC transmits bytes of data with
the least significant bit first (little-endian order).
Asynchronous framing
When using asynchronous serial communication such as standard RS-232 serial
ports, bits are sent in groups of 8, and bit-stuffing is inconvenient. Instead they
use “control-octet transparency”, also called “byte stuffing” or “octet stuffing”.
The frame boundary octet is 01111110 (7E in hexadecimal notation). A
“control escape octet” has the bit sequence 01111101 (7D hexadecimal). If
either of these two octets appears in the transmitted data, an escape octet
is sent, followed by the original data octet with bit 5 inverted. For
example, the data sequence 01111110 (7E hex) would be transmitted as
01111101 01011110 (“7D 5E” hex). Other reserved octet values (such as XON
or XOFF) can be escaped in the same way if necessary.
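The control-octet transparency just described can be sketched in a few lines. This is an illustrative sketch, not text from the HDLC standard: a 0x7E or 0x7D in the payload is replaced by the escape octet 0x7D followed by the original octet with bit 5 inverted (XOR 0x20).

```python
FLAG, ESC = 0x7E, 0x7D

def byte_stuff(payload):
    """Escape flag and escape octets appearing in the payload."""
    out = []
    for b in payload:
        if b in (FLAG, ESC):
            out += [ESC, b ^ 0x20]     # escape octet, then bit-5-inverted byte
        else:
            out.append(b)
    return out

def byte_unstuff(data):
    """Reverse the escaping applied by byte_stuff."""
    out, esc = [], False
    for b in data:
        if esc:
            out.append(b ^ 0x20)       # restore the original octet
            esc = False
        elif b == ESC:
            esc = True                 # next octet was escaped
        else:
            out.append(b)
    return out

print([hex(b) for b in byte_stuff([0x7E])])   # ['0x7d', '0x5e'], as in the text
```

The 7E -> 7D 5E example in the paragraph above falls out directly, and unstuffing recovers the original payload.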
channel transmits eight bits (or a byte) simultaneously. A serial channel would
transmit those bits one at a time. If both operated at the same clock speed, the
parallel channel would be eight times faster. A parallel channel will generally
have additional control signals such as a clock, to indicate that the data is valid,
and possibly other signals for handshaking and directional control of data
transmission. Before the development of high-speed serial technologies, the
choice of parallel links over serial links was driven by these factors:
• Speed: Superficially, the speed of a parallel data link is equal to the number
of bits sent at one time times the bit rate of each individual path; doubling
the number of bits sent at once, doubles the data rate. In practice, clock
skew reduces the speed of every link to the slowest of all of the links.
• Cable length: Crosstalk creates interference between the parallel lines,
and the effect worsens with the length of the communication link. This
places an upper limit on the length of a parallel data connection that is
usually shorter than a serial connection.
• Complexity: Parallel data links are easily implemented in hardware, making
them a logical choice. Creating a parallel port in a computer system is
relatively simple, requiring only a latch to copy data onto a data bus. In
contrast, most serial communication must first be converted back into
parallel form by a Universal Asynchronous Receiver/Transmitter (UART)
before they may be directly connected to a data bus.
The decreasing cost of integrated circuits, combined with greater consumer
demand for speed and cable length, has led to parallel communication links
becoming deprecated in favour of serial links; for example, IEEE 1284 printer
ports vs. USB, Parallel ATA vs. Serial ATA, and SCSI vs. FireWire.
On the other hand, there has been a resurgence of parallel data links in RF
communication. Rather than transmitting one bit at a time (as in Morse code
and BPSK), well-known techniques such as PSK, PAM, and multiple-input,
multiple-output communication send a few bits in parallel (each such group
of bits is called a “symbol”). Such techniques can be extended to send an
entire byte at once (256-QAM). More recently, techniques such as OFDM have
been used in Asymmetric Digital Subscriber Line to transmit over 224 bits in
parallel, and in DVB-T to transmit over 6048 bits in parallel.
A key difference between PCIe bus and the older PCI, is the bus topology.
PCI uses a shared parallel bus architecture, where the PCI host and all devices
share a common set of address/data/control lines. In contrast, PCIe is based on
point-to-point topology, with separate serial links connecting every device to the
root complex (host). Due to its shared bus topology, access to the PCI bus is
arbitrated (in the case of multiple masters), and limited to 1 master at a time, in
a single direction. Furthermore, PCI’s clocking scheme limits the bus clock to
the slowest peripheral on the bus (regardless of the devices involved in the bus
transaction). In contrast, a PCIe bus link supports full-duplex communication
between any two endpoints, with no inherent limitation on concurrent access
across multiple endpoints.
In terms of bus protocol, PCIe communication is encapsulated in packets.
The work of packetizing and depacketizing data and status message traffic is
handled by the transaction layer of the PCIe port (described later). Radical
differences in electrical signaling and bus protocol require the use of a different
mechanical form factor and expansion connectors (and thus, new motherboards
and new adapter boards); PCI slots and PCIe slots are not interchangeable. At
the software level, PCIe preserves backward compatibility with PCI; legacy
PCI system software can detect and configure newer PCIe devices without
explicit support for the PCIe standard, though PCIe’s new features will not be
accessible. (And PCIe cards cannot be inserted into PCI slots).
The PCIe link between two devices can consist of anywhere from 1 to 32
lanes. In a multi-lane link, the packet data is striped across lanes, and
peak data throughput scales with the overall link width. The lane count is
automatically negotiated during device initialization, and can be restricted
by either endpoint. For example, a single-lane PCIe (x1) card can be
inserted into a multi-lane slot (x4, x8, etc.), and the initialization cycle
will auto-negotiate the highest mutually supported lane count.
The link can dynamically down-configure itself to use fewer lanes, thus
providing some measure of failure tolerance in the presence of bad or
unreliable lanes. The PCIe standard defines slots and connectors for
multiple widths: x1,
x4, x8, x16, x32. This allows PCIe bus to serve both cost-sensitive applications
where high throughput is not needed, as well as performance-critical applications
such as 3D graphics, network (10 Gigabit Ethernet, multiport Gigabit Ethernet),
and enterprise storage (SAS, Fibre Channel).
As a point of reference, a PCI-X (133 MHz 64 bit) device and PCIe device
at 4-lanes (x4), Gen1 speed have roughly the same peak transfer rate in a single
direction: 1064 MB/sec. The PCIe bus has the potential to perform better than
the PCI-X bus in cases where multiple devices are transferring data
communicating simultaneously, or if communication with the PCIe peripheral is
bidirectional.
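The peak-rate comparison quoted above can be checked with simple arithmetic. This is illustrative arithmetic, not a benchmark: PCI-X 133 moves 8 bytes per 133 MHz clock, while a Gen1 PCIe lane runs at 2.5 GT/s with 8b/10b encoding (10 bits on the wire per data byte), giving 250 MB/s per lane in one direction.

```python
# PCI-X: 64-bit parallel bus at 133 MHz
pcix = 133e6 * 8             # bytes/s in one direction (~1064 MB/s)

# PCIe Gen1: 2.5 GT/s per lane, 8b/10b encoded -> 250 MB/s per lane
pcie_x4 = 2.5e9 / 10 * 4     # bytes/s for a 4-lane (x4) link

print(pcix / 1e6, pcie_x4 / 1e6)   # roughly comparable one-direction rates
```

The figures come out at about 1064 MB/s for PCI-X and 1000 MB/s for PCIe x4 Gen1, matching the "roughly the same peak transfer rate" claim; PCIe's advantage shows up under concurrent or bidirectional traffic, as the text notes.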
[Figure: PCI Express x1 and x16 slots.]
A driver typically communicates with the device through the computer bus
or communications subsystem to which the hardware connects. When a calling
program invokes a routine in the driver, the driver issues commands to the device.
Once the device sends data back to the driver, the driver may invoke routines in
the original calling program. Drivers are hardware-dependent and operating
system specific. They usually provide the interrupt handling required for any
necessary asynchronous time dependent hardware interface.
to split each 8-bit byte into two 4-bit nibbles, which were fed in
sequentially through the status lines. Each pair of nibbles was then
recombined into an 8-bit byte. The same method (with the splitting and
recombining done in software) was also used to transfer data between PCs
using a Laplink cable.
Device drivers, particularly on modern Windows platforms, can run in kernel
mode (Ring 0) or in user mode (Ring 3). The primary benefit of running a
driver in user mode is improved stability, since a poorly written user-mode
device driver cannot crash the system by overwriting kernel memory. On the
other hand, user/kernel mode transitions usually impose a considerable
performance overhead, making user-mode drivers unsuitable for low-latency,
high-throughput workloads.
Kernel space can be accessed by user-mode code only through system calls.
End-user programs, like the UNIX shell or other GUI-based applications, are
part of user space. These applications interact with hardware through
kernel-supported functions.
Virtual device drivers represent a particular variant of device drivers. They
are used to emulate a hardware device, particularly in virtualization environments,
for example when a DOS program is run on a Microsoft Windows computer or
when a guest operating system is run on, for example, a Xen host.
Instead of enabling the guest operating system to dialog with hardware,
virtual device drivers take the opposite role and emulate a piece of hardware, so
that the guest operating system and its drivers running inside a virtual machine
can have the illusion of accessing real hardware. Attempts by the guest operating
system to access the hardware are routed to the virtual device driver in the host
operating system as e.g., function calls. The virtual device driver can also send
simulated processor-level events like interrupts into the virtual machine.
Virtual devices may also operate in a nonvirtualized environment. For
example a virtual network adapter is used with a virtual private network, while
a virtual disk device is used with iSCSI.
POINTS TO REMEMBER
1. I²C uses only two bidirectional open-drain lines, Serial Data Line (SDA)
and Serial Clock Line (SCL), pulled up with resistors. Typical voltages
used are +5 V or +3.3 V, although systems with other voltages are
permitted.
2. The SPI bus can operate with a single master device and with one or
more slave devices.
3. A Synchronous Serial Port (SSP) is a controller that supports the Serial
Peripheral Interface (SPI), the 4-wire Synchronous Serial Interface
(SSI), and Microwire serial buses. An SSP uses a master-slave paradigm
to communicate across its connected bus.
7.9 QUIZ
1. Protocols may include which of the following?
(a) Signaling (b) Authentication
(c) Error checking (d) All of these
2. A device driver simplifies programming by acting as translator between a
................. or operating systems that use it.
(a) hardware device and the applications
(b) Software and applications
(c) None of these
3. Virtual serial port emulation can be useful in case there is a lack of
available..................... ports or they do not meet the current requirements.
(a) Physical serial (b) Data
(c) Network (d) None of these
4. Some synchronous devices provide a ............ to synchronize data
transmission, especially at higher data rates.
(a) Data signal (b) Timer
(c) Clock signal (d) No signal
Learning Outcomes
R Basic Introduction about Microcontrollers
R Comparison of 8051
R Architectural Details with Block Diagram of 8051.
R Microcontroller Resources
• Bus width
• Program and data memory
• Parallel ports
• EEPROM and flash memory
• Pulse Width Modulated (PWM) output
• On-chip Digital to Analog Converter (DAC) using PWM or timer
• On-chip A/D convertors (ADC)
• Reset circuit
• Watchdog Timer (WDT) device
• Bit wise manipulation capability
• Power down mode
• Timers
• Real time clock
• Serial asynchronous and synchronous communication interface
• Asynchronous serial communication
• Synchronous serial communication
• SFR registers
• Port registers SFR
• PSW Program Status Word
• Stack Pointer
• Data Pointer
• Accumulator
• B register
• Program Counter
• SFR Registers for the internal timer
• Power control register
• Serial port registers
• Interrupt registers
R Internal and External Memory
R Memory Organizations
R Timer or Counter
R Input and Output Ports
R Interrupts — An Insight
R Assembly Language Programming
R Recap
R Quiz
8.1 INTRODUCTION
Microcontrollers are designed for embedded applications, unlike
microprocessors, which are used for general purpose computing. The
architecture of a microcontroller is also much different from that of a
microprocessor. A microcontroller (abbreviated µC or MCU) has an
architecture that varies with the application it is designed for; different
microcontrollers therefore have different architectures. Typically the
architecture is a combination of a processor core, memory, programmable
I/O, programmable memory (generally flash memory), and a small amount of
RAM. The 8051 follows the Harvard architecture, a standard computer
architecture in which program instructions are stored in different memory
locations from data. Each type of memory is accessed via a separate bus,
allowing instructions and data to be fetched in parallel and thus improving
the speed of execution.
The block diagram of the 8051 is shown below in Fig. 8.1. It contains a
Central Processing Unit (CPU), which acts as a control unit and determines
the control flow. Based on the program, it also manages the resources
effectively. The CPU works from an oscillator, which determines the speed
of the microcontroller; the clock is generally derived from a crystal
oscillator. The original 8051 core runs at 12 clock cycles per machine
cycle, with most instructions executing in one or two machine cycles. With
a 12 MHz clock frequency, the 8051 could thus execute 1 million one-cycle
instructions per second or 500,000 two-cycle instructions per second.
Enhanced 8051 cores are now commonly used which run at six, four, two, or
even one clock per machine cycle, have clock frequencies of up to 100 MHz,
and are thus capable of an even greater number of instructions per second.
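The instruction-rate figures above follow from simple arithmetic. This is illustrative arithmetic for the classic 12-clocks-per-machine-cycle core at 12 MHz, not a claim about any specific derivative.

```python
clock_hz = 12_000_000               # 12 MHz crystal
clocks_per_machine_cycle = 12       # original 8051 core

machine_cycles_per_sec = clock_hz // clocks_per_machine_cycle

one_cycle_ips = machine_cycles_per_sec        # one-cycle instructions/s
two_cycle_ips = machine_cycles_per_sec // 2   # two-cycle instructions/s

print(one_cycle_ips, two_cycle_ips)
```

This reproduces the 1 million and 500,000 instructions-per-second figures; an enhanced core at one clock per machine cycle would raise the rate twelvefold at the same clock frequency.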
The 8051 architecture provides many functions (CPU, RAM, ROM, I/O, interrupt
logic, timer, etc.) in a single package. The complete architecture in detail is
presented in Fig. 8.2.
• 8-bit ALU, accumulator and 8-bit registers; hence it is an 8-bit
microcontroller
• 8-bit data bus—it can access 8 bits of data in one operation
• 16-bit address bus—it can access 2^16 memory locations—64 KB (65536
locations) each of RAM and ROM
• On-chip RAM—128 bytes (data memory)
• On-chip ROM—4 kB (program memory)
• Four 8-bit bidirectional input/output ports
• UART (serial port)
• Two 16-bit counter/timers
• Two-level interrupt priority
• Power-saving mode (on some derivatives)
[Figure 8.2: internal architecture of the 8051 — the ALU with the ACC, B
register and PSW; the program counter, stack pointer, DPTR and instruction
register; the timing and control unit with its PSEN#, ALE/PROG#, EA#/VPP,
RST and XTAL1/XTAL2 pins; the interrupt, serial port and timer blocks; and
the latches and drivers for ports 0–3.]
R Interrupt, serial port, timer and control units perform specific functions
under the control of timing and control unit.
R Mapping means that the output bits b0, b1, …, bn-1 are such that their
decimal value is proportional to the analog signal ratio.
R The 8051 MCU provides a power-down mode bit for serial communication,
where the bit transfer rate can be slowed by half so that power is saved
during communication from the MCU.
(xii) Timers
R A timer counts the equal interval clock pulses from an oscillator circuit.
The pulses are used after a suitable and fixed or programmable pre-
scaling (division) factor.
R The 8051-family MCU has two timers, T0 and T1.
R 8052 variants in the family have an additional timer, T2, and the 8096
family has two programmable timers, T1 and T2.
R T1 facilitates high-speed inputs. It captures the time instances into a
FIFO and records up to eight events in quick succession.
R A timer/counter mode runs in a non-stop, reset- and load-disabled state.
It can also work in another mode, called real time clock.
(xiii) Real Time Clock
R The real time clock is an important resource in a microcontroller
because an OS uses it to set the system clock and to schedule tasks and
time-delay functions.
R It is an on-chip device made from a timer working in non-reset,
non-loadable and non-stop mode.
R A real time clock is used because it never stops and cannot be reset.
R In the 11T format, the bit after the data bits and before the stop bit
is a bit for error checking, or a bit indicating whether the preceding
8 bits are an address or data.
Carry flag. C
R This is a conventional carry, or borrow, flag used in arithmetic
operations. The carry flag is also used as the ‘Boolean accumulator’ for
Boolean instructions operating at the bit level. This flag is sometimes
referenced as the CY flag.
Auxiliary carry flag. AC
R This is a conventional auxiliary carry flag (half carry) for use in BCD arithmetic.
Flag 0. F0
R This is a general purpose flag for user programming.
Register bank select 0 and register bank select 1. RS0 and RS1
R These bits define the active register bank (bank 0 is the default register
bank).
Overflow flag. OV
R This is a conventional overflow bit for signed arithmetic to determine if
the result of a signed arithmetic operation is out of range.
Even Parity flag. P
R The parity flag is the accumulator parity flag, set to a value, 1 or 0, such
that the number of ‘1’ bits in the accumulator plus the parity bit add up to
an even number.
confined the stack size, and this is sometimes a limitation for 8051
programs. The SP contains the address of the data byte currently on
the top of the stack. The SP is initialized to a defined address. A
new data item is 'pushed' onto the stack using a PUSH instruction, which
causes the data item to be written to address SP + 1. Typical instructions
which modify the stack are PUSH, POP, LCALL, RET, RETI, etc. The SP SFR,
on start-up, is initialized to 07h, so the stack starts at 08h and expands
upwards in internal RAM. If register banks 1 to 3 are to be used, the SP
SFR should be initialized to start higher up in internal RAM. The following
instruction is often used to initialize the stack:
• MOV SP, #2Fh
(xxii) Accumulator
R This is the conventional accumulator that one expects to find in any
computer, which is used to hold result of various arithmetic and logic
operations. Since the 8051 microcontroller is just an 8-bit device, the
accumulator is, as expected, an 8-bit register.
R The accumulator, referred to as ACC or A, is usually accessed explicitly
using instructions such as:
R INC A; Increment the accumulator
(xxiii) B Register
R The B register is an SFR register at address F0h which is bit-addressable.
The B register is used in two instructions only: i.e., MUL (multiply) and
DIV (divide). The B register can also be used as a general purpose register.
addressed at 8Ch and 8Dh respectively. These two registers are associated
with Timer 1.
[Fig. 8.5: 8051 memory map. Internal memory holds the internal RAM and the SFRs; external DATA memory and external CODE memory each span 0000h to FFFFh outside the chip.]
SFR Registers
R The SFR registers are located within the Internal Memory in the address
range 80h to FFh, as shown in Fig. 8.5. Not all locations within this range
are defined. Each SFR has a very specific function. Each SFR has an
address (within the range 80h to FFh) and a name which reflects the
purpose of the SFR. Although 128 bytes of the SFR address space are
defined, only 21 SFR registers exist in the standard 8051. Undefined
SFR addresses should not be accessed, as this might lead to
unpredictable results. Note that some of the SFR registers are bit-addressable.
SFRs are accessed just like normal Internal RAM locations.
[Timer 1 block diagram (89S51): the oscillator divided by 12 (C/T = 0) or the external pin P3.5/T1 (C/T = 1) feeds TL1/TH1, gated by the GATE bit together with the P3.3/INT1 pin. The TMOD register holds the G, C/T, M1 and M0 bits for Timer 0 and Timer 1.]
M1 M0 Operating mode
0  0  13-bit timer (8048 style); TLx serves as a 5-bit prescaler
0  1  16-bit Timer/Counter; THx and TLx are cascaded, there is no prescaler
1  0  8-bit auto-reload Timer/Counter; THx holds the value which is reloaded into TLx each time it overflows
1  1  (Timer 0) TL0 is an 8-bit Timer/Counter controlled by the standard Timer 0 control bits; (Timer 1) Timer/Counter 1 is stopped
Timer/ Counter Control ( TCON ) Register
MSB LSB
TF1 TR1 TF0 TR0 IE1 IT1 IE0 IT0
Pin 9: RST A logic one on this pin suspends the microcontroller's operation
and clears the contents of most registers. In other words, a positive voltage
on this pin resets the microcontroller. When logic zero is reapplied to this
pin, the program starts execution from the beginning.
Pins 10–17: Port 3 Similar to port 1, each of these pins can serve as general
input or output. Besides, all of them have alternative functions:
Pin 10: RXD Serial asynchronous communication input or Serial synchronous
communication output.
Pin 11: TXD Serial asynchronous communication output or Serial synchronous
communication clock output.
Pin 12: INT0 Interrupt 0 input.
Pin 13: INT1 Interrupt 1 input.
Pin 14: T0 Counter 0 clock input.
Pin 15: T1 Counter 1 clock input.
Pin 16: WR Write to external (additional) RAM.
Pin 17: RD Read from external RAM.
Pins 18, 19: X2, X1 Internal oscillator input and output. A quartz crystal
which determines the operating frequency is usually connected to these pins.
Alternatively, miniature ceramic resonators can be used for frequency stability.
Later versions of the microcontroller operate at frequencies from 0 Hz up to
over 50 MHz.
Pin 20: GND Ground.
Pins 21-28: Port 2 If there is no intention to use external memory then these
port pins are configured as general inputs/outputs. In case external memory is
used, the higher address byte, i.e., addresses A8–A15 will appear on this port.
Even when memory of less than the full 64 KB capacity is used, which means
that not all eight port bits are needed for addressing, the remaining bits
are still not available as inputs/outputs.
Pin 29: PSEN If external ROM is used for storing program then a logic zero
(0) appears on it every time the microcontroller reads a byte from memory.
Pin 30: ALE Prior to reading from external memory, the microcontroller puts
the lower address byte (A0–A7) on P0 and activates the ALE output. After
receiving signal from the ALE pin, the external register (usually 74HCT373 or
74HCT375 add-on chip) memorizes the state of P0 and uses it as a memory
chip address. Immediately after that, the ALE pin returns to its previous
logic state and P0 is used as a data bus. As seen, port data multiplexing is
performed by means of only one additional (and cheap) integrated circuit. In
other words, this port is used for both data and address transmission.
Pin 31: EA By applying logic zero to this pin, P0 and P2 are used for data
and address transmission regardless of whether internal memory is present.
This means that even if a program has been written to the microcontroller's
internal ROM, it will not be executed; instead, the program in external ROM
will be executed. By applying logic one to the EA pin, the microcontroller
will use both memories, first internal and then external (if it exists).
Pins 32–39: Port 0 Similar to P2, if external memory is not used, these pins can
be used as general inputs/outputs. Otherwise, P0 is configured as address output
(A0–A7) when the ALE pin is driven high (1) or as data output (Data Bus)
when the ALE pin is driven low (0).
Pin 40: VCC +5V power supply.
Pin configuration, i.e., whether a pin acts as an input (1) or an output
(0), depends on its logic state. In order to configure a microcontroller pin
as an output, it is necessary to apply logic zero (0) to the appropriate I/O
port bit. In this case, the voltage level on the pin will be 0.
Similarly, in order to configure a microcontroller pin as an input, it is
necessary to apply logic one (1) to the appropriate port bit. In this case,
the voltage level on the pin will be 5 V (as is the case with any TTL input).
This may seem confusing but
don’t lose your patience. It all becomes clear after studying simple electronic
circuits connected to an I/O pin.
via the serial port, or if some external event had occurred. Besides making the
main program ugly and hard to read, such a situation would make our program
inefficient since we’d be burning precious “instruction cycles” checking for
events that usually don’t happen.
The 8051 provides five interrupt sources, listed below:
1. Timer 0 (TF0) and Timer 1 (TF1) interrupts.
2. External hardware interrupts, INT0 and INT1.
3. Serial communication interrupts, TI and RI.
The 8051 can be configured such that when Timer 0 overflows or when a character
is sent/received, the appropriate interrupt handler routine is called.
Obviously, we need to be able to distinguish between the various interrupts
and execute different code depending on which interrupt was triggered. This is
accomplished by jumping to a fixed address when a given interrupt occurs.
This means that if a Serial Interrupt occurs at the exactly same instant that an
External 0 Interrupt occurs, the External 0 Interrupt will be executed first and
the Serial Interrupt will be executed once the External 0 Interrupt has completed.
The 8051 offers two levels of interrupt priority: high and low. By using interrupt
priorities you may assign higher priority to certain interrupt conditions.
For example, you may have enabled Timer 1 Interrupt which is automatically
called every time Timer 1 overflows. Additionally, you may have enabled the
Serial Interrupt which is called every time a character is received via the serial
port. However, you may consider that receiving a character is much more
important than the timer interrupt. In this case, if Timer 1 Interrupt is already
executing you may wish that the serial interrupt itself interrupts the Timer 1
Interrupt. When the serial interrupt is complete, control passes back to Timer 1
Interrupt and finally back to the main program. You may accomplish this by
assigning a high priority to the Serial Interrupt and a low priority to the Timer 1
Interrupt.
Interrupt priorities are controlled by the IP SFR (B8h).
[Figure: 8051 interrupt sources. The external interrupt pins INT0 and INT1 set the flags IE0 and IE1 (edge- or level-triggered as selected by IT0 and IT1), the timer overflows set TF0 and TF1, and the serial port sets TI and RI.]
ARITHMETIC INSTRUCTIONS
BRANCH INSTRUCTIONS
Logic Instructions
Logic instructions perform logic operations upon corresponding bits of two
registers. After execution, the result is stored in the first operand.
LOGIC INSTRUCTIONS
Bit-oriented Instructions
Similar to logic instructions, bit-oriented instructions perform logic operations.
The difference is that these are performed upon single bits.
BIT-ORIENTED INSTRUCTIONS
The bytes and cycles mentioned in all the instruction sets are very important,
as these determine the amount of memory occupied and the speed of execution
for a particular set of instructions, in short, for a program.
Note
To appreciate the beauty of a microcontroller, one has to do hands-on
experiments with it. Try experiments like driving a DC motor clockwise and
anticlockwise based on interrupts (in doing so you are actually building an
automated car control) to experience the real power of microcontrollers. Try
writing more programs which can sharpen your knowledge.
POINTS TO REMEMBER
1. Microcontroller is different from microprocessor.
2. 8051 follows Harvard architecture and it has an 8-bit ALU.
3. Microcontrollers are meant for a specific purpose, i.e., a dedicated
operation.
4. The 8051 supports an 8-bit internal data bus and a 16-bit internal
address bus.
5. Program counter is a register that will hold the address of the next
instruction to be executed. It is 16 bits wide.
6. A and B are called accumulators; A is one of the operands for most
arithmetic operations and also accumulates the results. The B register
is used in multiplication and division operations.
7. There are many Special function registers available for carrying out
specific operations.
8. The Program Status Word (PSW) holds the flag status bits; one can
refer to the PSW to read the flags.
9. The 8051 has an 8-bit stack pointer whose initial default value,
defined by the processor, is 0x07.
10. Registers for serial IOs, timers, ports and interrupt handlers are also
supported.
11. Two external interrupt pins, INT0 and INT1.
12. Four ports of 8-bits each in single chip mode.
13. Two timers are supported.
14. Certain versions of 8051 even have DMA (Direct Memory Access)
support.
15. Most of the versions support Watch dog timer feature also.
16. The following instruction sets are available for the programmer to use:
(i) Arithmetic Instructions
(ii) Branch Instructions
(iii) Data Transfer Instructions
(iv) Logic Instructions and
(v) Bit Oriented Instructions
Learning Outcomes
R Basic Introduction to Processors
R ARM Architecture
• Different versions
• ARM internal-core block diagram
• Instruction set
• Programming model and data types
• C Assignments in ARM—few examples
R SHARC Architecture
• Working principle
• Addressing modes
• C Assignments with examples
R ARM vs. SHARC
R Blackfin Processors
• Core features
• Memory and DMA
• Microcontroller features
• Peripherals
R Texas Instruments—DSP Processor
R Assembly Language Programming on Hardware Processors
R Recap
R Quiz
The first microprocessors emerged in the early 1970s and were used for electronic
calculators, using binary-coded decimal arithmetic on 4-bit words. Other
embedded uses of 4-bit and 8-bit microprocessors, such as terminals, printers,
various kinds of automation etc., followed soon after. Affordable 8-bit
microprocessors with 16-bit addressing also led to the first general purpose
microcomputers from the mid 1970s on.
Table 9.1 gives the different versions of ARM processor including the
specifications. This includes CPU, Description, ISA, Voltage, Clock speed and
MIPS details. The various properties of ARM architecture are detailed below:
• 32-bit RISC processor core (32-bit instructions)
• 37 32-bit integer registers (16 visible at a time)
• Pipelined (ARM7: 3 stages)
• Cached (depending on the implementation)
• Von Neumann-type bus structure (ARM7), Harvard (ARM9)
• 8 / 16 / 32 -bit data types
• 7 modes of operation (usr, fiq, irq, svc, abt, sys, und)
[ARM7 core block diagram: an address register with incrementer drives the address bus; the register bank (31 x 32-bit registers plus 6 status registers), Booth's multiplier, barrel shifter and 32-bit ALU are connected by the internal A and B buses; the instruction decoder and control logic handle external signals such as nIRQ, nFIQ, nRESET, ABORT, nMREQ, SEQ, LOCK and the mode lines nM(4:0); data transfers use DOUT(31:0) and DATA(31:0).]
train, chassis electronics systems in electric and fossil fuel powered vehicles
worldwide.
SHARC Processor
– On-chip memory (> 1Gbit) evenly split between Program Memory (PM)
and Data Memory (DM)
– Program memory can be used to store some data.
– Allows data to be fetched from both memories in parallel
Some interesting applications of ARM and SHARC processors include:
• ARM
– Compaq iPAQ
– Nintendo Gameboy
• SHARC
– Cellular phone
– Music synthesis
– Stereo Receivers
data and instruction caches, and instructions for bit test, byte, word, or
integer accesses and a variety of on-chip peripherals.
The ISA also features a high level of expressiveness, allowing the assembly
programmer (or compiler) to highly optimize an algorithm to the hardware features
present.
9.5.4 Peripherals
Blackfin processors contain a wide array of connectivity peripherals.
• USB 2.0 OTG (On-The-Go)
• ATAPI
• MXVR: A MOST (Media Oriented Systems Transport) Network Interface
Controller.
obviously have the name “TMS320” and all processors with “C5” in the name
are code compatible and share the same basic features. Sometimes you will
even hear people talking about “C55x” and similar subgroupings, since processors
in the same series and same generation are even more similar.
The TMS320 series can be programmed using C, C++, and/or assembly
language. Most work on the TMS320 processors is done using Texas Instruments
proprietary tool chain and their integrated development environment Code
Composer Studio, which includes a mini operating system called DSP/BIOS.
Additionally, a department at the Chemnitz University of Technology has
developed preliminary support for the TMS320C6x series in the GNU Compiler
Collection.
In November 2007, TI released part of its tool chain as freeware for non-
commercial users, offering the bare compiler, assembler, optimizer and linker
under a proprietary license. However, neither the IDE nor a debugger were
included, so for debugging and JTAG access to the DSPs, users still need to
purchase the complete tool chain.
locations and other entities. The use of symbolic references is a key feature of
assemblers, saving tedious calculations and manual address updates after program
modifications. Most assemblers also include macro facilities for performing textual
substitution—e.g., to generate common short sequences of instructions as inline,
instead of called subroutines.
Assemblers are generally simpler to write than compilers for high-level
languages, and have been available since the 1950s. Modern assemblers,
especially for RISC architectures such as SPARC or POWER, as well as x86
and x86-64, optimize instruction scheduling to exploit the CPU pipeline efficiently.
POINTS TO REMEMBER
1. A processor register (or general purpose register) is a small amount of
storage available on the CPU whose contents can be accessed more
quickly than storage available elsewhere. 8051 follows Harvard
architecture and it has an 8-bit ALU.
2. The ARM architecture is licensable. Companies that are current or
former ARM licensees include Alcatel-Lucent, Apple Inc., Atmel,
Broadcom, etc.
3. Off-chip memory can be used with the SHARC. This memory can only
be configured for one single size. If the off-chip memory is configured
as 32-bit words to avoid waste, then only the on-chip memory may be
used for code execution and extended floating-point.
4. The Blackfin architecture encompasses various different CPU models,
each targeting particular applications.
5. The TI C54x DSP family was a popular choice for 2G software-defined
cell phone radios, particularly GSM, circa the late 1990s, when many
Nokia and Ericsson cell phones made use of it.
6. The DA25x combines an ARM processor and a C55x core. It has some
on-chip peripherals such as a USB slave controller and security features.
Documentation of this chip is available only after signing a Texas
Instruments NDA.
7. Assembly can sometimes be portable across different operating systems
on the same type of CPU.
Learning Outcomes:
R Need for standards in coding
R Limitations associated with coding
R Existing standards for programming
R Summary
Coding conventions are only useful if they are in tune with the work your team
is currently doing. To ensure that they don’t become obsolete, they have to be
constantly updated and adapted to the latest technologies used by your team.
up with its own set of coding standards. The objective of this document is to
define coding standards and guidelines while coding for different technologies.
10.4.1 Modularization
A module is a collection of objects that are logically related. Those objects may
include constants, data types, variables, and program units (e.g., functions,
procedures, etc.). Note that objects in a module need not be physically related.
For example, it is quite possible to construct a module using several different
source files. Likewise, it is quite possible to have several different modules in
the same source file. However, the best modules are physically related as well
as logically related; that is, all the objects associated with a module exist in a
single source file (or directory, if the source file would be too large) and nothing
else is present.
Modules contain several different objects including constants, types,
variables, and program units (routines). Modules share many of the attributes
with routines; this is not surprising since routines are the major components of
a typical module. However, modules have some additional attributes of their
own. The following sections describe the attributes of a well-written module.
Module Attributes
A module is a generic term that describes a set of program related objects
(routines as well as data and types of objects) that are somehow coupled. Good
modules share many of the same attributes as good routines as well as the
ability to hide certain details from code outside the module. Good modules exhibit
strong cohesion. That is, a module should offer a (small) group of services that
are logically related. For example, a “printer” module might provide all the services
one would expect from a printer. The individual routines within the module would
provide the individual services.
Good modules exhibit loose coupling. That is, there are only a few, well-
defined (visible) interfaces between the module and the outside world. Most
data is private, accessible only through accessing functions (see information
hiding below). Furthermore, the interface should be flexible. Good modules exhibit
information hiding. Code outside the module should only have access to the
module through a small set of public routines. All data should be private to that
module.
On a 32-bit machine:
typedef short int16;
typedef int int32;
Don’t redefine existing types. This may seem like a contradiction to the guideline
above, but it really isn’t. This statement says that if you have an existing type
that uses the name "integer" you should not create a new type named "integer".
Doing so would only create confusion: another programmer, reading your code,
may confuse the new "integer" type with the old one every time she/he sees a
variable of type integer. This applies to existing user types as well as
predefined types. Since it
is possible to declare symbols at different points in a program, different
programmers have developed different conventions that concern the position of
their declarations. The two most popular conventions are the following:
– Declare all symbols at the beginning of the associated program unit
(function, procedure, etc.)
– Declare all variables as close as possible to their use.
Logically, the second scheme above would seem to be the best. However, it has
one major drawback – although names typically have only a single definition,
the program may use them in several different locations. But those who absolutely
desire to put their definitions as close to the for loop as possible can always do
something like the following:
// Previous statements in this code...
.
{
    int i;
    for (i = start; i <= end; ++i)
    {
        /* loop body */
    }
}
.
// Additional statements in this code.
10.4.3 Names
According to studies, the use of high-quality identifiers in a program
contributes more to the readability of that program than any other single
factor, including high-quality comments. The quality of your identifiers can
make or break your program; programs with high-quality identifiers can be
very easy to read, while programs with poor-quality identifiers will be very
difficult to read. There are very few "tricks" to developing high-quality
names; most of the rules are nothing more than plain old-fashioned common
sense. Unfortunately, programmers (especially C/C++ programmers) have
developed many arcane naming conventions that ignore common sense. The
biggest obstacle most programmers face in learning how to create good names
is an unwillingness to abandon existing conventions. Yet their only defense
when quizzed on why they adhere to (existing) bad conventions seems to be
"because that's the way I've always done it and that's the way everybody
else does it".
else
…
endif;
Correct
if (expression) then
    if (expression) then
    …
    endif;
else
…
endif;
Now there is no question that the else belongs to the first if above, not the
second. Note that this form of “if” statement allows you to attach a list of
statements (between if and else or if and endif) rather than a single or compound
statement. Furthermore, it totally eliminates the religious argument concerning
where to put the braces or the begin...end pair on the if. The complete set of
modern programming language constructs includes:
– if...then...elseif...else...endif
– select...case...default...endselect (typical case/switch statement).
– while...endwhile
– repeat...until
– loop...endloop
– for...endfor
– break
– breakif
– continue
Loops
There are three general categories of looping constructs available in common
high-level languages—loops that test for termination at the beginning of the loop
(e.g., while), loops that test for loop termination at the bottom of the loop (e.g.,
repeat...until), and those that test for loop termination in the middle of the loop
(e.g., loop...endloop). It is possible to simulate any one of these loops
using any of the others. This is particularly trivial with the loop...endloop construct:
/* Test for loop termination at beginning of LOOP...ENDLOOP */
loop
breakif (x==y);
.
.
.
endloop;
/* Test for loop termination in the middle of LOOP...ENDLOOP */
loop
.
.
.
breakif (x==y);
.
.
.
endloop;
/* Test for loop termination at the end of LOOP...ENDLOOP */
loop
.
.
.
breakif (x==y);
endloop;
Given the flexibility of the loop...endloop control structure, you might question
why one would even burden a compiler with the other loop statements. However,
using the appropriate looping structure makes a program far more readable,
therefore, you should never use one type of loop when the situation demands
another.
(Code Complete), research has shown that there is a strong correlation between
program indentation and comprehensibility. Miara et al. ("Program Indentation
and Comprehensibility") concluded that indentation in the two to four character
range was optimal even though many subjects felt that six-space indentation
looked better. These results are probably due to the fact that the eye has to
travel less distance to read indented code and therefore the reader’s eyes suffer
from less fatigue.
Steve McConnell, in Code Complete, mentions several objectives of good program
layout:
“The layout should accurately reflect the logical structure of the code.
Code Complete refers to this as the “Fundamental Theorem of Formatting”.
White space (blank lines and indentation) is the primary tool one can use to
show the logical structure of a program.
Consistently represent the logical structure of the code. Some common
formatting conventions (e.g., those used by many C/C++ programmers) are full
of inconsistencies. For example, why does the “{” go on the same line as an “if”
but below “int main()” (or any other function declaration)? A good style applies
consistently.
Improve readability. If the indentation scheme makes a program harder to
read, why waste time with it? As pointed out earlier, some schemes make the
program look pretty but, in fact, make it harder to read (see the example about
2–4 vs. 6 position indentation, above)”.
Withstand modifications. A good indentation scheme shouldn’t force a
programmer to modify several lines of code in order to affect a small change to
one line. For example, many programmers put a begin...end block (or “{“...”}”
block) after an if statement even if there is only one statement associated with
the if. This allows the programmer to easily add new statements to the then
clause of the if statement without having to add additional syntactical elements
later.
The principal tool for creating good layout is white space (or the lack thereof,
that is, grouping objects). The following paragraphs summarize McConnell’s
findings on the subject:
Grouping: Related statements should be grouped together. Statements that
logically belong together should contain no arbitrary interleaving white space
(blank lines or unnecessary indentation).
Blank lines: Blank lines should separate declarations from the start of code,
logically related statements from unrelated statements, and blocks of comments
from blocks of code.
Alignment: Align objects that belong together. Examples include type names in
a variable declaration section, assignment operators in a sequence of related
assignment statements, and columns of initialized data.
Indentation: Indenting statements inside block statements improves readability;
see the comments and rules earlier in this section.
In theory, a line of source code can be arbitrarily long. In practice, there
are several practical limitations on source code lines. Paramount is the amount
of text that will fit on a given terminal display device and what can be printed on
a typical sheet of paper. If this isn’t enough to suggest an 80 character limit on
source lines, McConnell suggests that longer lines are harder to read.
If a statement approaches the maximum limit of 80 characters, it should be
broken up at a reasonable point and split across two lines. If the line is a control
statement that involves a particularly long logical expression, the expression
should be broken up at a logical point (e.g., at the point of a low-precedence
operator outside any parentheses) and the remainder of the expression placed
underneath the first part of the expression. E.g.,
if
(
( ( x + y * z) < ( ComputeProfits(1980,1990) / 1.0775 ) ) &&
( ValueOfStock[ ThisYear ] >= ValueOfStock[ LastYear ] )
)
<< statements >>
endif;
Many statements (e.g., IF, WHILE, FOR, and function or procedure calls) contain
a keyword followed by a parenthesis. If the expression appearing between the
parentheses is too long to fit on one line, consider putting the opening and closing
parentheses in the same column as the first character of the start of the statement
and indenting the remaining expression elements. The example above
demonstrates this for the “IF” statement. The following examples demonstrate
this technique for other statements:
while
(
( NumberOfIterations < MaxCount ) &&
( i <= NumberOfIterations )
)
<< Statements to execute >>
endwhile;
fprintf
(
stderr,
"Error in module %s at line #%d, encountered illegal value\n",
ModuleName,
LineNumber
);
For block statements there should always be a blank line between the line
containing an if, elseif, else, endif, while, endwhile, repeat, until, etc., and the
lines they enclose. This clearly differentiates statements within a block from a
possible continuation of the expression associated with the enclosing statement.
It also helps to clearly show the logical format of the code. Example:
if ( ( x = y ) and PassingValue( x, y ) ) then
Output( ‘This is done’ );
endif;
If a procedure, function, or other program unit has a particularly long actual or
formal parameter list, each parameter should be placed on a separate line. The
following (C/C++) examples demonstrate a function declaration and call using
this technique:
int
MyFunction
(
int NumberOfDataPoints,
float X1Root,
float X2Root,
float &YIntercept
);
x = MyFunction
(
GetNumberOfPoints(RootArray),
RootArray[ 0 ],
RootArray[ 1 ],
Solution
);
by asterisks as being hard to maintain. This is a poor example since modern text
editors will automatically “outline” the comments for you. Nevertheless, the
basic idea is sound.
Comment as you go along. If you put commenting off until the last moment, then
it seems like another task in the software development process and management
is likely to discourage the completion of the commenting task in hopes of meeting
new deadlines.
Avoid self-indulgent comments. Also, you should avoid sexist, profane, or other
insulting remarks in your comments. Always remember, someone else will
eventually read your code.
Avoid putting comments on the same physical line as the statement they describe.
Such comments are very hard to maintain since there is very little room.
McConnell suggests that endline comments are okay for variable declarations.
For some this might be true but many variable declarations may require
considerable explanation that simply won’t fit at the end of a line. One exception
to this rule is “maintenance notes”. Comments that refer to a defect tracking
entry in the defect database are okay (note that the CodeWright text editor
provides a much better solution for this — buttons that can bring up an external
file). Endline comments are also useful for marking the end of a control structure
(e.g., “end{if};”).
Write comments that describe blocks of statements rather than individual
statements. Comments covering single statements tend to discuss the mechanics
of that statement rather than discussing what the program is doing.
Focus paragraph comments on the why rather than the how. Comments should explain what the program is doing and why the programmer chose to do it that way, rather than explain what each individual statement is doing.
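A small sketch of this guideline, with illustrative names: the comment explains the purpose of the whole block rather than restating any one statement’s mechanics.

```c
/* Normalize the readings to the range 0..100 so that later threshold
   checks can use fixed limits regardless of the sensor's raw range.
   (A per-statement comment such as "multiply by 100" would only
   restate the mechanics.) */
void NormalizeReadings(int *readings, int n, int rawMax) {
    for (int i = 0; i < n; i++)
        readings[i] = (readings[i] * 100) / rawMax;
}
```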
Use comments to prepare the reader for what is to follow. Someone reading the
comments should be able to have a good idea of what the following code does
without actually looking at the code. Note that this rule also suggests that
comments should always precede the code to which they apply.
When you do need to resort to some tricky code, make sure you fully document what you’ve done.
Avoid abbreviations. While there may be an argument for abbreviating identifiers that appear in a program, that argument does not extend to comments.
Keep comments close to the code they describe. The prologue to a program
unit should give its name, describe the parameters, and provide a short description
of the program. It should not go into details about the operation of the module
itself. Internal comments should do that.
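A sketch of such a prologue for a hypothetical unit (all names are invented): it gives the name, describes the parameters, and provides a short description, leaving the operational details to internal comments.

```c
/*
 * FindPeak -- return the index of the largest value in Samples.
 *
 * Samples:    array of readings to search (must not be NULL).
 * NumSamples: number of entries in Samples (must be > 0).
 */
int FindPeak(const int *Samples, int NumSamples) {
    int peak = 0;
    /* Internal comment: ties keep the earliest index. */
    for (int i = 1; i < NumSamples; i++)
        if (Samples[i] > Samples[peak])
            peak = i;
    return peak;
}
```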
** Lint is a code-checking tool, generally used when you port your application across different development platforms.
10.6 SUMMARY
The experience of many projects leads to the conclusion that using coding standards makes a project go more smoothly. Standards make the code readable, easily maintainable, and reusable. Since we cannot expect all the developers of an application to remain on the project for its lifetime, coding standards are a must for future enhancements and releases. Knowledge transfer to a new resource becomes an easy task with more readable code. So coding standards play an important role in software application development.
11 Embedded Systems—
Application, Design and
Coding Methodology
Learning Outcomes
R Embedded System Design
R Designers Perspective
R Requirements Specifications
R Implementation of the Proposed System
R Recap
R Quiz
Digital cameras can do things that film cameras cannot: displaying images on
a screen immediately after they are recorded, storing thousands of images on a
single small memory device, and deleting images to free storage space. The
majority, including most compact cameras, can record moving video with sound
as well as still photographs. Some can crop and stitch pictures and perform
other elementary image editing. Some have a built-in GPS receiver and can produce geotagged photographs.
Putting it all together:
• Captures images
• Stores images in digital format
– No film; multiple images stored in the camera
• Number depends on amount of memory and bits used per image
• Downloads images to PC
• Only recently possible
• Systems-on-a-chip with multiple processors and memories on one IC
– High-capacity flash memory
• Very simple description used for example
– Many more features with real digital camera
• Variable size images, image deletion, digital stretching, zooming in and
out, etc.
[Figure: CCD layout, showing the lens area, pixel rows and columns, the covered columns, the electromechanical shutter, and the electronic circuitry.]
When exposed to light, each cell becomes electrically charged. This charge can then be converted to an 8-bit value, where 0 represents no exposure and 255 represents very intense exposure of that cell to light. Some of the columns are covered with a black strip of paint; the light intensity read from these pixels is used for zero-bias adjustment of all the cells.
The electromechanical shutter is activated to expose the cells to light for a brief
moment. The electronic circuitry, when commanded, discharges the cells,
activates the electromechanical shutter, and then reads the 8-bit charge value of
each cell. These values can be clocked out of the CCD by external logic through
a standard parallel bus interface.
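The charge-to-value conversion just described can be sketched as follows; the function name and the linear mapping are illustrative assumptions, not the book’s CCD code.

```c
/* Map a cell's accumulated charge onto the 8-bit range described above:
   0 for no exposure, 255 for very intense exposure, clamped in between. */
unsigned char ChargeToPixel(double charge, double maxCharge) {
    double v = 255.0 * (charge / maxCharge);
    if (v < 0.0)   v = 0.0;
    if (v > 255.0) v = 255.0;
    return (unsigned char)v;
}
```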
• Manufacturing errors cause cells to measure slightly above or below actual
light intensity
• Error typically same across columns, but different across rows
• Some of left most columns blocked by black paint to detect zero-bias
error
– Reading of other than 0 in blocked cells is zero-bias error
– Each row is corrected by subtracting the average error found in blocked
cells for that row
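The row-wise correction above can be sketched as follows, assuming (for illustration only) eight columns per row of which the two left-most are the covered cells:

```c
#define NUM_COLS    8   /* illustrative row width */
#define NUM_COVERED 2   /* left-most columns blocked by black paint */

/* Subtract the average zero-bias error, measured from the covered
   cells, from every cell in the row. */
void ZeroBiasAdjustRow(int row[NUM_COLS]) {
    int bias = 0;
    for (int c = 0; c < NUM_COVERED; c++)
        bias += row[c];           /* covered cells should read 0 */
    bias /= NUM_COVERED;          /* average error for this row */
    for (int c = 0; c < NUM_COLS; c++)
        row[c] -= bias;
}
```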
[Figure: Zero-bias adjustment, showing the covered cells used to measure the error.]
Decoding reverses the process to obtain the original block (not needed for this design).
• Achieve high compression ratio by reducing image quality
– Reduce bit precision of encoded data
• Fewer bits needed for encoding
• One way is to divide all values by a factor of 2
– Simple right shifts can do this
– Dequantization would reverse process for decompression
• Serialize 8 × 8 block of pixels
– Values are converted into single list using zigzag pattern
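The divide-by-a-power-of-two quantization above can be sketched with shifts. The 6-bit shift is an illustrative choice, and right-shifting negative values assumes an arithmetic shift, which most compilers provide:

```c
/* Quantize by dividing by 64 (right shift); dequantize by multiplying
   back (left shift). The discarded low bits are the intended loss of
   image quality that buys the higher compression ratio. */
short Quantize(short value)   { return (short)(value >> 6); }
short Dequantize(short value) { return (short)(value << 6); }
```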
rowIndex = -1;
}
}
return pixel;
}
UART Module
• Actually a half UART
– Only transmits, does not receive
• UartInitialize is passed name of file to output to
• UartSend transmits (writes to the output file) one byte at a time
#include <stdio.h>
static FILE *outputFileHandle;
void UartInitialize(const char *outputFileName) {
    outputFileHandle = fopen(outputFileName, "w");
}
void UartSend(char d) {
    fprintf(outputFileHandle, "%i\n", (int)d);
}
CODEC Module
• Models FDCT encoding
• ibuffer holds original 8 x 8 block
• obuffer holds encoded 8 x 8 block
• CodecPushPixel called 64 times to fill ibuffer with original block
• CodecDoFdct called once to transform 8 x 8 block
• CodecPopPixel called 64 times to retrieve encoded block from obuffer
void CodecDoFdct(void) {
    int x, y;
    for(x=0; x<8; x++) {
        for(y=0; y<8; y++)
            obuffer[x][y] = FDCT(x, y, ibuffer);
    }
    idx = 0;
}
short CodecPopPixel(void) {
    short p;
    if( idx == 64 ) idx = 0;
    p = obuffer[idx / 8][idx % 8]; idx++;
    return p;
}
• Implementing FDCT formula
C(h) = 1/√2 if h = 0, otherwise 1.0
F(u,v) = (1/4) × C(u) × C(v) × Σx=0…7 Σy=0…7 Dxy ×
cos(π(2x + 1)u/16) × cos(π(2y + 1)v/16)
• Only 64 possible inputs to COS, so a lookup table can be used to save
computation time
– Floating point values multiplied by 32,678 and rounded to nearest integer
– 32,678 chosen in order to store each value in 2 bytes of memory
– Fixed point representation explained more later
• FDCT unrolls inner loop of summation, implements outer summation as
two consecutive for loops
MAIN Module
• Main initializes all modules, then uses CNTRL module to capture,
compress, and transmit one image
• This system-level model can be used for extensive experimentation
– Bugs much easier to correct here rather than in later models
int main(int argc, char *argv[]) {
char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
char *imageFileName = argc > 2 ? argv[2] : "image.txt";
/* initialize the modules */
UartInitialize(uartOutputFileName);
CcdInitialize(imageFileName);
CcdppInitialize();
CodecInitialize();
CntrlInitialize();
/* simulate functionality */
CntrlCaptureImage();
CntrlCompressImage();
CntrlSendImage();
}
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
– 12 MHz, 12 cycles per instruction
• Executes one million instructions per second
– CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
• ~100 assembly instructions each iteration
• 409,600 (4096 × 100) instructions per image
• Half of budget for reading image alone
– Would be over budget after adding compute-intensive DCT and Huffman
encoding
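The budget argument above reduces to simple arithmetic, sketched here for checking (function names are illustrative):

```c
/* Instruction throughput of the processor: clock divided by cycles
   per instruction (12 MHz / 12 cycles = one million instructions/s). */
long InstructionsPerSecond(long clockHz, int cyclesPerInstruction) {
    return clockHz / cyclesPerInstruction;
}

/* Instructions spent reading one image: 64 x 64 pixel iterations at
   roughly 100 assembly instructions each. */
long InstructionsPerImageRead(int rows, int cols, int instrPerIteration) {
    return (long)rows * cols * instrPerIteration;
}
```

At roughly 409,600 of the 1,000,000 instructions available per second, reading the image alone consumes close to half the one-image-per-second budget, before the compute-intensive DCT and Huffman stages are added.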
POINTS TO REMEMBER
1. Testing timing constraints is as important as testing functional behaviour
for an Embedded System.
2. Embedded Systems are in every “intelligent” device that is infiltrating
our daily lives: the cell phone in your pocket, and the entire wireless
infrastructure behind it; the Palm Pilot on your desk; the Internet router
your e-mails are channeled through; your big screen home theater
system; the air traffic control station as well as the delayed aircraft it is
monitoring! Software now makes up to 90 percent of the value of these
devices.
3. The computer you are using to read this page uses a microprocessor
to do its work. The microprocessor is the heart of any normal computer,
whether it is a desktop machine, a server or a laptop.
Review Questions
1. What is the difference between a digital camera and a mobile camera?
2. How does a digital camera differ from conventional cameras?
3. Define CPU Speed. Why are there limits on CPU speed?
4. What are the different compression methodologies available while
manufacturing a digital camera?
5. What is meant by pixel resolution? What are the leading companies that
manufacture digital camera today?
6. Which company made the world’s first OLED digital photo frame?
7. What is meant by colour filtering?
11.5 QUIZ
1. Digital images are made of tiny dots called:
(a) Cells (b) Electrolytes (c) Blotch (d) Pixels