Numerical Analysis
by Patel

Chapter 1
NUMERICAL COMPUTATIONS
1.1 INTRODUCTION
In this text we are concerned with numerical methods used to solve the most common
mathematical problems that arise in the physical, biological, and social sciences, and
many other disciplines. The problem is stated in mathematical terms by using various
assumptions. The next step is to solve the stated mathematical problem. Unfortunately,
many practical problems do not have an analytical solution; consequently, we look for
an approximation or numerical solution. Also, an analytical solution may not be convenient
for numerical evaluation. Therefore, we look for methods that give an approximate
solution to our formulated problem. Because these methods work with numbers and
produce numbers, they are called numerical methods. Numerical analysis provides a
means of proposing and analyzing numerical methods for the study and solution of
mathematically stated problems. To facilitate computation, numerical methods are
programmed for execution on a computer. A poorly written computer program can
spoil a good numerical method both with inaccurate answers and by using excessive
computer time for computation. Therefore, it is very important to take into account
the programming aspects of a numerical method. A computer output must also be
analyzed for its correctness.
A complete and unambiguous set of directions to solve a mathematical problem
to the desired accuracy in a finite number of steps is called an algorithm. Thus a
numerical method can be considered an algorithm.
We imagine a program library containing subroutines, written by experts, for every
conceivable situation. In fact, there exists a large number of computer packages like
IMSL (International Mathematical and Statistical Library), NAG (Numerical Algorithms
Group), and more through which many subroutines are available on mainframe
computers. Also, IMSL and NAG have special subsets of their full libraries for microcomputers.
For microcomputers, MATHCAD, MATLAB, and MATHEMATICA also provide
programs. Other packages are being developed continually. While it is easy to
call these subroutines, there are many pitfalls in numerical computation. One should
be able to recognize symptoms of numerical ill health and diagnose a problem. It is
important to have a clear understanding of the numerical methods used by these
subroutines.
In the next section, we develop some fundamental notions about digital computers
since they are the principal means for our calculations.
1.2 NUMBER REPRESENTATION
As is well known, our usual number system is the decimal system. The number 796.85
is expressed as
$796.85 = 7 \times 10^2 + 9 \times 10^1 + 6 \times 10^0 + 8 \times 10^{-1} + 5 \times 10^{-2}$
The number 10 is the base of the decimal system.
Electrical impulses are either on or off and computers read pulses sent by their
electrical components. If the "off" state represents 0 and the "on" state represents 1, then
computers can use a system that needs only 0 and 1 as digits to represent a real number.
This system is called the binary system and has base 2. Consider
$(1001.101)_2 = 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}$
$\qquad = 8 + 0 + 0 + 1 + \tfrac{1}{2} + 0 + \tfrac{1}{8} = 9\tfrac{5}{8} = 9.625$
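Such positional expansions are easy to check mechanically. The short Python sketch below is an illustration of ours, not part of the text; the helper binary_to_decimal simply evaluates the expansion digit by digit, and the values checked are the ones used in these examples as reconstructed here.

```python
# Evaluate a binary string such as "1001.101" by its positional expansion.
def binary_to_decimal(bits: str) -> float:
    whole, _, frac = bits.partition(".")
    value = sum(int(b) * 2 ** i for i, b in enumerate(reversed(whole)))
    value += sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(frac))
    return value

print(binary_to_decimal("1001.101"))  # 9.625, as computed above
print(int("1101011111", 2))           # 863, the ten-digit example considered next
```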
Further, consider
$(1101011111)_2 = 1 \times 2^9 + 1 \times 2^8 + 0 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
$\qquad = 512 + 256 + 0 + 64 + 0 + 16 + 8 + 4 + 2 + 1 = 863$
In order to represent 863, we need 10 binary digits. This is a major drawback of the
binary system. The octal or hexadecimal number system (with base 8 or 16, respectively)
presents a compromise between the binary and decimal systems when we discuss how
numbers are stored in the computer. The IBM 3033 uses base 16, and the numbers 10, 11,
12, 13, 14, and 15 are usually denoted by A, B, C, D, E, and F, respectively. Most computers
have an integer mode to represent integers and a floating-point mode to represent real
numbers within given limits. The floating-point representation is closely connected to
scientific notation. Letting x be any real number, x can be represented in floating-point
form as
$x = (\mathrm{sign}\ x)(.d_1 d_2 \ldots d_t d_{t+1} \ldots) \times \beta^e$    (1.2.1)
where the characters $d_i$ are digits in the base $\beta$ system; in other words, $0 \le d_i \le \beta - 1$ for $i \ge 1$, with $d_1 \ne 0$, and $e$ an integer. The number $(.d_1 d_2 \ldots d_t d_{t+1} \ldots)$ is called the mantissa and $e$ is called the exponent or characteristic of $x$. Usually $e$ is restricted by
$-N \le e \le M$    (1.2.2)
If an exponent $e > M$, then usually the result is meaningless and is called
overflow. If an exponent $e < -N$, then usually the result is returned as zero without
any warning message by many computers and is called underflow. Similarly an infinite
representation of a mantissa cannot be used, and so the mantissa of $x$ has to be terminated
at $t$ digits. Let us denote this terminated number by $\mathrm{fl}(x)$. This termination is done in
two ways. The first way is to delete the digits $d_{t+1}, d_{t+2}, \ldots$ to get the following
chopped representation.
$\mathrm{fl}(x) = (\mathrm{sign}\ x)(.d_1 d_2 \ldots d_t) \times \beta^e$    (1.2.3)
The second way is to add $\beta/2$ to $d_{t+1}$ and then chop off the resulting digits $\delta_{t+1}, \delta_{t+2}, \ldots$ to get the rounded representation
$\mathrm{fl}(x) = (\mathrm{sign}\ x)(.\delta_1 \delta_2 \ldots \delta_t) \times \beta^{e_1}$    (1.2.4)
where $\delta_i$ may or may not be $d_i$ and $e_1$ may or may not be $e$.
EXAMPLE 1.2.1
The number 13/6 has an infinite decimal representation given by $13/6 = 2.166666\ldots = 0.216666\ldots \times 10^1$. Letting $t = 5$, the chopped representation of 13/6 is
$\mathrm{fl}\!\left(\tfrac{13}{6}\right) = 0.21666 \times 10^1 = 2.1666$
For the rounded representation, add 5 to the sixth digit, 6 + 5 = 11, and then chop off the
digits after the fifth digit. Thus
$\mathrm{fl}\!\left(\tfrac{13}{6}\right) = 0.21667 \times 10^1 = 2.1667$
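The chopped and rounded representations (1.2.3) and (1.2.4) can be imitated for decimal numbers with Python's decimal module. The helpers below are a sketch of ours, not an algorithm from the text; they assume a positive argument and base 10.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Keep t significant decimal digits of a positive number x,
# either by chopping (1.2.3) or by rounding (1.2.4).
def chop(x: float, t: int) -> Decimal:
    d = Decimal(repr(x))
    quantum = Decimal(1).scaleb(d.adjusted() - t + 1)  # place value of digit t
    return d.quantize(quantum, rounding=ROUND_DOWN)

def round_t(x: float, t: int) -> Decimal:
    d = Decimal(repr(x))
    quantum = Decimal(1).scaleb(d.adjusted() - t + 1)
    return d.quantize(quantum, rounding=ROUND_HALF_UP)

print(chop(13 / 6, 5))     # 2.1666
print(round_t(13 / 6, 5))  # 2.1667
```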
The error that results from replacing a number by either its chopped representation
or its rounded representation is called round-off error.
In Table 1.2.1 the floating-point characteristics are given for commonly used digital
computers (Atkinson 1989) for single precision.
Table 1.2.1 Floating-point characteristics (single precision)

Computer                          β    t    N      M      β^(1-t)
CDC Cyber                         2    48   975    1070   7.11 × 10^-15
Cray-1                            2    48   16384  8191   7.11 × 10^-15
Hewlett-Packard HP-45, 11C, 15C   10   10   98     100    1.00 × 10^-9
IBM 3033                          16   6    64     63     9.54 × 10^-7
DEC VAX 11/780                    2    24   128    127    1.19 × 10^-7
PDP-11                            2    24   128    127    1.19 × 10^-7
Prime 850                         2    23   128    127    2.38 × 10^-7
The question of rounding or chopping in Table 1.2.1 often depends on the installation
or the compiler.
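The last column of Table 1.2.1 is essentially the machine unit β^(1-t). For comparison, the corresponding quantity for the double-precision floats of a present-day machine can be estimated with a few lines of Python; this check is ours and is not taken from the text.

```python
import sys

# Halve eps until 1 + eps/2 is no longer distinguishable from 1;
# the final eps is the machine unit of Python's (IEEE 754 double) floats.
eps = 1.0
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0

print(eps)                     # about 2.22e-16 on IEEE 754 hardware
print(sys.float_info.epsilon)  # the value reported by Python itself
```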
Let us denote the set of all numbers represented by Equation (1.2.3) or Equation
(1.2.4) and zero by
$F = F(\beta, t, N, M)$
The real numbers, program instructions, integers, alphabetic symbols, and so on,
are stored in words in digital computers. These words have a fixed number of digits.
The number of digits (bits) in a word is called the word length. For scientific calculations
a long word length is desirable; a short word length, on the other hand, is significantly less
expensive and perhaps more useful for business calculations and data processing. Word
lengths range from 12 bits to 60 bits. In some computers a longer word is broken into
smaller pieces called bytes (each consisting of 8 bits) for ease in handling.
Consider a hypothetical computer that uses 32 bits in a word. Of the 32 bits, the
first bit is used to hold the sign of the number, 0 for + and 1 for −. The remaining
31 bits hold a binary number from 0 to 1111111 11111111 11111111 11111111 (thirty-one ones, that is, 0 to $2^{31} - 1$). This applies
only to integers. For a floating-point number, the first bit holds the sign, the next 7
bits hold the exponent (including one for the sign of the exponent), and the last 24
bits hold the mantissa.
[Word layout: sign (1 bit) | exponent (7 bits, including its sign) | mantissa (24 bits)]
Consider for example
1 1111001 11111111 11111111 11111111
‘The first bit indicates that the number is negative. The next bit indicates that the
exponent is negative. The next 6 bits, 111001, are equivalent to
IX B+ ix Bei x Prox Ps ox tx 2
324+ 16+8+1
37
The last 24 bits indicate that the mantissa is
$1 \times 2^{-1} + 1 \times 2^{-2} + \cdots + 1 \times 2^{-24} = 1 - \left(\tfrac{1}{2}\right)^{24} = 1 - 0.596046\ldots \times 10^{-7}$
$\approx 0.9999999$ to seven places
The closest 7-digit decimal number to $-\left(1 - \left(\tfrac{1}{2}\right)^{24}\right) \times 2^{-57}$ is $-0.6938893 \times 10^{-17}$.
Thus this machine number is used to represent any real number in the interval
$(-0.69388935 \times 10^{-17},\ -0.69388925 \times 10^{-17})$.
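The decoding just carried out by hand can be mirrored in code. The sketch below is ours; the function decode and the field widths simply follow the description of the hypothetical 32-bit machine (sign bit, sign of the exponent, 6 exponent bits, 24 mantissa bits).

```python
# Decode a 32-bit word of the hypothetical machine described above.
def decode(bits: str) -> float:
    assert len(bits) == 32
    sign = -1.0 if bits[0] == "1" else 1.0   # sign of the number
    exp_sign = -1 if bits[1] == "1" else 1   # sign of the exponent
    exponent = exp_sign * int(bits[2:8], 2)  # six exponent bits
    mantissa = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits[8:]))
    return sign * mantissa * 2.0 ** exponent

word = "1" + "1111001" + "1" * 24
print(decode(word))  # about -6.93889e-18, i.e. -0.6938893 x 10^-17
```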
Since we are representing real numbers with approximate real numbers, our interest
is to find the maximum error involved. Let x* be an approximation of x in the decimal
system for the following cases:

(1) $x = 2.1666$, $x^* = 2.1667$, $|x - x^*| = 0.0001$
(2) $x = 0.0004$, $x^* = 0.0003$, $|x - x^*| = 0.0001$
(3) $x = 10000.0001$, $x^* = 10000$, $|x - x^*| = 0.0001$

In all cases, the absolute error $|x - x^*|$ is $10^{-4}$. In case (1) x* is a good approximation
for x, while in case (2) x is so small that $|x - x^*|$ represents a significant change. In
case (3) x* seems to be an excellent approximation. Thus it is clear that the ratio of
$|x - x^*|$ to $|x|$ is important.
Let x* be an approximation to x. The relative error in x* is given by
$\left|\dfrac{x - x^*}{x}\right|$, provided $x \ne 0$.

EXAMPLE 1.2.2
In case (1) the relative error $|(x - x^*)/x| = |(2.1666 - 2.1667)/2.1666| = 0.00005$;
in case (2) the relative error $|(x - x^*)/x| = |(0.0004 - 0.0003)/0.0004| = 0.25$; and
in case (3) the relative error $|(x - x^*)/x| = |(10000.0001 - 10000)/10000.0001| \approx 10^{-8}$.
In case (2) the relative error indicates that x* is a poor approximation to x.
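These three cases can be checked directly. The snippet below is only an illustrative sketch of ours, using the values quoted above.

```python
# (x, x*) pairs for cases (1), (2), and (3).
cases = [(2.1666, 2.1667), (0.0004, 0.0003), (10000.0001, 10000.0)]

for x, x_star in cases:
    abs_err = abs(x - x_star)
    rel_err = abs_err / abs(x)
    print(f"x = {x:>12}: absolute error {abs_err:.1e}, relative error {rel_err:.1e}")
```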
Since we do not know the true real number in a practical situation, we do not
know what the error is. We will be happy with some bounds on the error. Let us find
the relative error when we chop or round a given real number x in our decimal system.
Let x be represented by
$x = (\mathrm{sign}\ x)(.d_1 d_2 \ldots d_t d_{t+1} \ldots) \times 10^e$    (1.2.5)
We approximate x by simply chopping off the digits $d_{t+1}, d_{t+2}, \ldots$, to get the
chopped representation
$\mathrm{fl}(x) = (\mathrm{sign}\ x)(.d_1 d_2 \ldots d_t) \times 10^e$    (1.2.6)
Thus the relative chopping error is given by
$\dfrac{x - \mathrm{fl}(x)}{x} = \dfrac{(.d_{t+1} d_{t+2} \ldots) \times 10^{e-t}}{(.d_1 d_2 \ldots d_t d_{t+1} \ldots) \times 10^e}$
Since $1 \le d_1 \le 9$, the minimum value of the denominator mantissa is 0.1, and $(.d_{t+1} d_{t+2} \ldots) < 1$, so the relative chopping error satisfies
$\left|\dfrac{x - \mathrm{fl}(x)}{x}\right| \le \dfrac{10^{-t}}{10^{-1}} = 10^{1-t}$    (1.2.7)
For rounding, if $d_{t+1} < 5$, then $\delta_i = d_i$ for $i = 1, 2, \ldots, t$. Therefore the relative rounding error satisfies
$\left|\dfrac{x - \mathrm{fl}(x)}{x}\right| = \dfrac{(.d_{t+1} d_{t+2} \ldots) \times 10^{e-t}}{(.d_1 d_2 \ldots d_t d_{t+1} \ldots) \times 10^e} \le \dfrac{1}{2} \cdot 10^{1-t}$
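The bounds just derived, $10^{1-t}$ for chopping and $\tfrac{1}{2} \cdot 10^{1-t}$ for rounding, can be tested empirically. The sketch below is ours and relies on Python's decimal module to imitate t-digit decimal chopping and rounding.

```python
import random
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

t = 5
chop_bound = 10.0 ** (1 - t)         # bound (1.2.7)
round_bound = 0.5 * 10.0 ** (1 - t)  # corresponding bound for rounding

worst_chop = worst_round = 0.0
for _ in range(10_000):
    x = random.uniform(0.1, 1000.0)
    d = Decimal(repr(x))
    quantum = Decimal(1).scaleb(d.adjusted() - t + 1)  # place value of digit t
    chopped = float(d.quantize(quantum, rounding=ROUND_DOWN))
    rounded = float(d.quantize(quantum, rounding=ROUND_HALF_UP))
    worst_chop = max(worst_chop, abs(x - chopped) / x)
    worst_round = max(worst_round, abs(x - rounded) / x)

print(worst_chop, "<=", chop_bound)    # observed maximum stays below 1e-4
print(worst_round, "<=", round_bound)  # observed maximum stays below 0.5e-4
```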
The result $0.23331 \times 10^2$ is stored.
Multiplication is simple because the exponents do not have to be aligned. Since
$xy = (0.24689 \times 10^2) \times (0.13579 \times 10^1) = 0.0335251931 \times 10^3 = 0.335251931 \times 10^2$
and $\mathrm{fl}(xy) = 0.33525 \times 10^2$, the relative error $|(xy - \mathrm{fl}(xy))/xy| = 0.57598 \times 10^{-5}$.
Division is not allowed when y = 0. If the mantissa of the numerator is greater
than the mantissa of the denominator, we shift the mantissa of the numerator one place
to the right. Thus
$\dfrac{x}{y} = \dfrac{0.24689 \times 10^2}{0.13579 \times 10^1} = \dfrac{0.0246890000}{0.1357900000} \times 10^2 = 0.1818175000 \times 10^2$
Therefore $\mathrm{fl}(x/y) = 0.18181 \times 10^2$ and the relative error $\left|\dfrac{(x/y) - \mathrm{fl}(x/y)}{x/y}\right| = 0.41250 \times 10^{-4}$.
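Assuming, as reconstructed above, that $x = 0.24689 \times 10^2$ and $y = 0.13579 \times 10^1$, the five-digit chopping machine can be imitated with Python's decimal module. The fl helper below is a sketch of ours, not the text's code.

```python
from decimal import Decimal, ROUND_DOWN

# Chop a Decimal to t significant digits, imitating the hypothetical machine.
def fl(value: Decimal, t: int = 5) -> Decimal:
    quantum = Decimal(1).scaleb(value.adjusted() - t + 1)
    return value.quantize(quantum, rounding=ROUND_DOWN)

x = Decimal("24.689")  # 0.24689 x 10^2
y = Decimal("1.3579")  # 0.13579 x 10^1

product, quotient = x * y, x / y
print(fl(product))                                # 33.525
print(abs((product - fl(product)) / product))     # about 5.76e-6, i.e. 0.576 x 10^-5
print(fl(quotient))                               # 18.181
print(abs((quotient - fl(quotient)) / quotient))  # about 4.13e-5 (the text, keeping only
                                                  # ten digits of the quotient, reports 0.41250 x 10^-4)
```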
Normally $\mathrm{fl}(x) \ne x$. Using Equations (1.2.9) and (1.2.10),
$\dfrac{\mathrm{fl}(x) - x}{x} = \epsilon$    (1.3.1)
where $-\beta^{1-t} \le \epsilon \le 0$ if chopped and $-\tfrac{1}{2}\beta^{1-t} \le \epsilon \le \tfrac{1}{2}\beta^{1-t}$ if rounded. Equation (1.3.1)
can be expressed in a more useful form:
$\mathrm{fl}(x) = x(1 + \epsilon)$    (1.3.2)
Denote machine addition, subtraction, multiplication, and division by the symbols
$\oplus$, $\ominus$, $\otimes$, and $\oslash$ respectively. For any floating-point numbers x and y, we have from
Equation (1.3.2)
$\mathrm{fl}(x + y) = x \oplus y = (x + y)(1 + \epsilon_1)$
$\mathrm{fl}(x - y) = x \ominus y = (x - y)(1 + \epsilon_2)$
$\mathrm{fl}(xy) = x \otimes y = xy(1 + \epsilon_3)$
and
$\mathrm{fl}\!\left(\dfrac{x}{y}\right) = x \oslash y = \dfrac{x}{y}(1 + \epsilon_4)$    (1.3.3)
where $\epsilon_1$, $\epsilon_2$, $\epsilon_3$, and $\epsilon_4$ may be different. It can be seen from Equations (1.3.2) and
(1.3.3) that
$\mathrm{fl}(x + (y + z)) = x \oplus (y \oplus z) = [x + (y + z)(1 + \epsilon_1)](1 + \epsilon_2)$
$\qquad = x(1 + \epsilon_2) + (y + z)(1 + \epsilon_1)(1 + \epsilon_2)$
$\mathrm{fl}((x + y) + z) = (x \oplus y) \oplus z = [(x + y)(1 + \epsilon_3) + z](1 + \epsilon_4)$
$\qquad = (x + y)(1 + \epsilon_3)(1 + \epsilon_4) + z(1 + \epsilon_4)$
Hence, often
$x \oplus (y \oplus z) \ne (x \oplus y) \oplus z$
In other words, the associative law breaks down. Similarly (Exercise 11) the distributive
law often fails:
$x \otimes (y \oplus z) \ne (x \otimes y) \oplus (x \otimes z)$
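The same failure is easy to observe in ordinary double precision; the short check below is an illustration of ours, not an example from the text.

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False on IEEE 754 hardware
print((a + b) + c, a + (b + c))    # 0.6000000000000001 versus 0.6
```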
EXAMPLE 1.3.1
Illustrate that the associative law breaks down. Let $x = 0.52867 \times 10^4$, $y = 0.38234 \times 10^2$, and $z = 0.25678 \times 10^1$. Find $x \oplus (y \oplus z)$ and $(x \oplus y) \oplus z$.
Since y and z are first added, the exponent of z is adjusted so that the exponents
of y and z are the same. Hence
$y = 0.38234\,00000 \times 10^2$
$z = 0.02567\,80000 \times 10^2$
$y + z = 0.40801\,80000 \times 10^2$
and
$\mathrm{fl}(y + z) = 0.40801 \times 10^2$
Now to add $(y \oplus z)$ to x, the exponents of $(y \oplus z)$ and x are adjusted. Therefore,
$x = 0.52867\,00000 \times 10^4$
$y \oplus z = 0.00408\,01000 \times 10^4$
$x + (y \oplus z) = 0.53275\,01000 \times 10^4$
and
$x \oplus (y \oplus z) = 0.53275 \times 10^4$    (1.3.4)
One can verify that
$x \oplus y = 0.53249\,00000 \times 10^4$
and
$(x \oplus y) \oplus z = 0.53274 \times 10^4$    (1.3.5)
Comparing Equations (1.3.4) and (1.3.5), $x \oplus (y \oplus z) \ne (x \oplus y) \oplus z$.
One must not implicitly assume the validity of the associative law. Although the
associative law for addition is not valid in floating-point arithmetic, it is comforting
to know that the commutative law $x \oplus y = y \oplus x$ still holds and should be valuable
in our programming.
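Example 1.3.1 can be replayed on a simulated five-digit chopping machine. The fl and add helpers below are our own assumptions about how such a machine behaves (exact operation in a double-length register, then chopping to five digits), not code from the text.

```python
from decimal import Decimal, ROUND_DOWN

def fl(value: Decimal, t: int = 5) -> Decimal:
    # Chop to t significant digits.
    quantum = Decimal(1).scaleb(value.adjusted() - t + 1)
    return value.quantize(quantum, rounding=ROUND_DOWN)

def add(a: Decimal, b: Decimal) -> Decimal:
    # Machine addition: exact sum in the double-length register, then chop.
    return fl(a + b)

x = Decimal("0.52867E4")
y = Decimal("0.38234E2")
z = Decimal("0.25678E1")

print(add(x, add(y, z)))  # 5327.5, i.e. 0.53275 x 10^4, as in (1.3.4)
print(add(add(x, y), z))  # 5327.4, i.e. 0.53274 x 10^4, as in (1.3.5)
```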
Another important source of error is the subtraction of a number from a nearly
equal number. Consider $x = \sqrt{457} \approx 0.2137755 \times 10^2$ and $y = \sqrt{456} \approx 0.2135415 \times 10^2$.
Subtract y from x on the five-digit machine. First x and y would be stored as
$\mathrm{fl}(x) = 0.21377 \times 10^2$ and $\mathrm{fl}(y) = 0.21354 \times 10^2$. Since our hypothetical machine
has a double-length register,
$\mathrm{fl}(x) = 0.21377\,00000 \times 10^2$
$\mathrm{fl}(y) = 0.21354\,00000 \times 10^2$
$\mathrm{fl}(x) - \mathrm{fl}(y) = 0.00023\,00000 \times 10^2$
and
$\mathrm{fl}(\mathrm{fl}(x) - \mathrm{fl}(y)) = \mathrm{fl}(x) \ominus \mathrm{fl}(y) = 0.23000 \times 10^{-1} = 0.02300$
The last three zeros at the end of the mantissa are of no use. Since the exact value
of $x - y \approx 0.000234 \times 10^2 = 0.0234$, we have the relative error
$\left|\dfrac{(x - y) - (\mathrm{fl}(x) \ominus \mathrm{fl}(y))}{x - y}\right| = 0.17170 \times 10^{-1}$    (1.3.6)
This relative error is quite large when compared to the relative errors of $\mathrm{fl}(x)$ and $\mathrm{fl}(y)$.
How can a more accurate result be obtained? Sometimes the problem can be reformu-
lated to avoid the subtraction. In this example,
$\sqrt{457} - \sqrt{456} = \dfrac{(\sqrt{457} - \sqrt{456})(\sqrt{457} + \sqrt{456})}{\sqrt{457} + \sqrt{456}} = \dfrac{1}{\sqrt{457} + \sqrt{456}}$
$\qquad = \dfrac{1}{0.42731 \times 10^2} = 0.23402 \times 10^{-1} = 0.023402$
The relative error $0.72643 \times 10^{-5}$ is very small compared to Equation (1.3.6).
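The gain from the reformulation can also be seen on a simulated five-digit chopping machine; the fl helper below is our sketch, with Python's decimal module standing in for the hypothetical hardware.

```python
from decimal import Decimal, ROUND_DOWN

def fl(value: Decimal, t: int = 5) -> Decimal:
    # Chop to t significant digits.
    quantum = Decimal(1).scaleb(value.adjusted() - t + 1)
    return value.quantize(quantum, rounding=ROUND_DOWN)

x = Decimal(457).sqrt()  # 21.37755...
y = Decimal(456).sqrt()  # 21.35415...
exact = x - y

naive = fl(fl(x) - fl(y))                    # subtract the stored five-digit values
better = fl(Decimal(1) / fl(fl(x) + fl(y)))  # the reformulation 1/(sqrt(457) + sqrt(456))

print(naive, abs((exact - naive) / exact))    # 0.023000, relative error about 1.7e-2
print(better, abs((exact - better) / exact))  # 0.023402, relative error about 7.6e-6
```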