Floating-Point Handout
Jeff Arnold
CERN openlab
9 May 2017
Agenda
• Introduction
• Standards
• Properties
• Error-Free Transformations
• Summation Techniques
• Dot Products
• Polynomial Evaluation
• Value Safety
• Pitfalls and Gremlins
• Tools
• References and Bibliography
Why is Floating-Point Arithmetic Important?
Important to Teach About Floating-Point Arithmetic
Reasoning about Floating-Point Arithmetic
Classification of real numbers
Some Properties of Floating-Point Numbers
Floating-Point Numbers are Rational Numbers
How Many Floating-Point Numbers Are There?
• ∼ 2^p (2e_max + 1)
• Single-precision: ∼ 4.3 × 10^9
• Double-precision: ∼ 1.8 × 10^19
• Number of protons circulating in the LHC: ∼ 6.7 × 10^14
(a quick numerical check of the first two counts is sketched below)
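A small check of those counts, written against the formula above (a sketch of mine, using the standard binary32/binary64 parameters p = 24, e_max = 127 and p = 53, e_max = 1023):

#include <cmath>
#include <cstdio>

int main(void) {
    // Approximate count of binary floating-point values: ~ 2^p * (2*e_max + 1).
    const double count32 = std::ldexp(1.0, 24) * (2.0 * 127.0 + 1.0);
    const double count64 = std::ldexp(1.0, 53) * (2.0 * 1023.0 + 1.0);
    std::printf("binary32: %.2e\n", count32);   // ~ 4.3e9
    std::printf("binary64: %.2e\n", count64);   // ~ 1.8e19
    return 0;
}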
Standards
IEEE 754-2008
Operations Specified by IEEE 754-2008
Other Operations Specified by IEEE 754-2008
Special Values
• Zero
  • zero is signed
• Infinity
  • infinity is signed
• Subnormals
• NaN (Not a Number)
  • Quiet NaN
  • Signaling NaN
  • the sign of a NaN carries no meaning
Rounding Modes in IEEE 754-2008
Exceptions Specified by IEEE 754-2008
• Underflow
  • Absolute value of a non-zero result is less than the smallest non-zero finite floating-point number
  • Result is 0
• Overflow
  • Absolute value of a result is greater than the largest finite floating-point number
  • Result is ±∞
• Division by Zero
  • x/y where x is finite and non-zero and y = 0
• Inexact
  • The result, after rounding, is different from the infinitely-precise result
Exceptions Specified by IEEE 754-2008
• Invalid
  • An operand is a NaN
  • √x where x < 0
    • however, √(−0) = −0
  • (±∞) ± (±∞)
  • (±0) × (±∞)
  • (±0)/(±0)
  • (±∞)/(±∞)
  • some floating-point→integer or decimal conversions
Formats Specified in IEEE 754-2008
• Basic Formats:
  • Binary with sizes of 32, 64 and 128 bits
  • Decimal with sizes of 64 and 128 bits
• Other formats:
  • Binary with a size of 16 bits
  • Decimal with a size of 32 bits
Transcendental and Algebraic Functions
We’re Not Going to Consider Everything...
Storage Format of a Binary Floating-Point Number
Layout: sign s (1 bit) | exponent field E (w bits) | significand (p − 1 bits)
The Value of a Floating-Point Number
x = (−1)^s · m · β^e
with
0 ≤ m < β
Equivalently, with an integral significand M,
x = (−1)^s · M · β^(e−p+1)
where
0 ≤ M < β^p
Requiring Uniqueness
x = (−1)^s · β^e · Σ_{i=0}^{p−1} x_i β^(−i)
where the x_i are base-β digits (0 ≤ x_i < β); uniqueness is obtained by requiring x_0 ≠ 0 (a normalized significand).
Subnormal Floating-Point Numbers
m = Σ_{i=0}^{p−1} x_i β^(−i)
Why have Subnormal Floating-Point Numbers?
Why p − 1?
A Walk Through the Doubles
0x0000000000000000 plus 0
0x0000000000000001 smallest subnormal
...
0x000fffffffffffff largest subnormal
0x0010000000000000 smallest normal
...
0x001fffffffffffff
0x0020000000000000 2× smallest normal
...
0x7fefffffffffffff largest normal
0x7ff0000000000000 +∞
A Walk Through the Doubles
0x8000000000000000 minus 0
0x8000000000000001 −(smallest subnormal)
...
0x800fffffffffffff −(largest subnormal)
0x8010000000000000 −(smallest normal)
...
0x801fffffffffffff
...
0xffefffffffffffff −(largest normal)
0xfff0000000000000 −∞
Notation
Some Inconvenient Properties of Floating-Point Numbers
The Fused Multiply-Add Instruction (FMA)
x  =                      0x1.3333333333333p+0
x1 = x * x              = 0x1.70a3d70a3d70ap+0
x2 = fma(x, x, 0)       = 0x1.70a3d70a3d70ap+0
x3 = fma(x, x, -x * x)  = -0x1.eb851eb851eb8p-55
x3 is the difference between the exact value of x*x and its value
converted to double precision. The relative error is ≈ 0.24 ulp
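The same idea gives an error-free transformation for multiplication: the rounded product plus the FMA-computed residual equals the exact product. A minimal sketch (the name TwoProdFMA is mine, not from the handout; it assumes a hardware FMA and the C99/C++11 fma function):

#include <cmath>
#include <cstdio>

// Error-free product: returns s = fl(a*b) and *t such that a*b == s + *t exactly
// (assuming fma performs a*b + c with a single rounding and no over/underflow).
static double TwoProdFMA(const double a, const double b, double *const t) {
    const double s = a * b;
    *t = std::fma(a, b, -s);   // exact residual of the rounded product
    return s;
}

int main(void) {
    const double x = 0x1.3333333333333p+0;    // ~1.2, as on the slide
    double t;
    const double s = TwoProdFMA(x, x, &t);
    std::printf("s = %a\nt = %a\n", s, t);    // expect t ~ -0x1.eb851eb851eb8p-55
    return 0;
}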
The Fused Multiply-Add Instruction (FMA)
Floating-point contractions
• Evaluate an expression as though it were a single operation
double a, b, c, d;
// Single expression; may be replaced
// by a = FMA(b, c, d)
a = b * c + d;
Forward and Backward Errors
f (x) → y
f (x̂) → ŷ
Forward and Backward Errors
For example, if
f (x) = sin(x) and x = π
then
y=0
However, if
x̂ = M_PI
then
x̂ ≠ x and f(x̂) ≠ f(x)
Note: we are assuming that if x̂ ≡ x then std::sin(x̂) ≡ sin(x)
Forward and Backward Errors
[Figure by J.G. Nagy, Emory University, from Brief Notes on Conditioning, Stability and Finite Precision Arithmetic]
Condition Number
condition number = (relative change in y) / (relative change in x)
                 = (Δy/y) / (Δx/x)
                 ≈ x f′(x) / f(x)
Condition Number
• ln x for x ≈ 1
  Condition number ≈ x f′(x)/f(x) = 1/ln x → ∞
• sin x for x ≈ π
  Condition number ≈ x cos x/sin x → ∞
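A small numerical illustration of the first case (my example, not from the handout): perturbing x by one ulp near 1 changes ln x by a relative amount roughly 1/ln x times larger.

#include <cmath>
#include <cstdio>

int main(void) {
    const double x  = 1.000000000000001;        // x close to 1
    const double xp = std::nextafter(x, 2.0);   // x perturbed by one ulp
    const double rel_dx = (xp - x) / x;
    const double rel_dy = (std::log(xp) - std::log(x)) / std::log(x);
    // The ratio approximates the condition number x*f'(x)/f(x) = 1/ln(x),
    // which is enormous for x this close to 1.
    std::printf("relative change in x:    %.3e\n", rel_dx);
    std::printf("relative change in ln x: %.3e\n", rel_dy);
    std::printf("amplification: %.3e   (1/ln x = %.3e)\n",
                rel_dy / rel_dx, 1.0 / std::log(x));
    return 0;
}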
Error Measures
IEEE 754 and ulps
IEEE 754 requires that the results of the basic operations be correctly
rounded, i.e., equal to the infinitely-precise result rounded to the
destination format.
If x is the infinitely-precise result and x̂ is the “round-to-nearest-even”
result, then
|x − x̂| ≤ 0.5 ulp(x̂)
Approximation Error
Approximating π
This explains why sin(M_PI) is not zero: the argument is not exactly π.
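A quick check (my example; it assumes M_PI is provided by <cmath>, as on most platforms). Since M_PI differs from π by about 1.2 × 10^−16, that is roughly the value sin(M_PI) returns:

#include <cmath>
#include <cstdio>

int main(void) {
    // M_PI is the double nearest to pi, so sin(M_PI) is approximately
    // (pi - M_PI), not zero: roughly 1.2246e-16.
    std::printf("sin(M_PI) = %.17g\n", std::sin(M_PI));
    return 0;
}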
Associativity
Distributivity
The “Pigeonhole” Principle
Catastrophic Cancellation
Sterbenz’s Lemma
Lemma
Let a and b be floating-point numbers with
b/2 ≤ a ≤ 2b
Then a − b is exactly representable, i.e., a ⊖ b = a − b with no rounding error.
Error-Free Transformations
EFTs are most useful when they can be implemented using only
the precision of the floating-point numbers involved.
EFTs exist for
• Addition: a + b = s + t where s = a ⊕ b
• Multiplication: a × b = s + t where s = a ⊗ b
• Splitting: a = s + t
An EFT for Addition
1: s ← a ⊕ b
2: z ← s ⊖ a
3: t ← (a ⊖ (s ⊖ z)) ⊕ (b ⊖ z)
4: return (s, t)
Ensure: a + b = s + t exactly, where s = a ⊕ b; both s and t are
floating-point numbers
A possible implementation
void
TwoSum(const double a, const double b,
       double *const s, double *const t) {
    // No unsafe optimizations!
    *s = a + b;
    const double z = *s - a;
    *t = (a - (*s - z)) + (b - z);
    return;
}
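A small usage check (my example, not from the handout): for a = 1.0 and b = 1e−16, the rounded sum s is 1.0 and t recovers exactly the part that was rounded away.

#include <cstdio>

void TwoSum(const double a, const double b, double *const s, double *const t);

int main(void) {
    double s, t;
    TwoSum(1.0, 1e-16, &s, &t);
    // a + b == s + t exactly; here s == 1.0 and t == 1e-16 (as a double),
    // because 1e-16 is below half an ulp of 1.0 and is lost in s.
    std::printf("s = %.17g\nt = %.17g\n", s, t);
    return 0;
}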
Comparing FastSum and TwoSum
Precise Splitting Algorithm
Precise Splitting EFT
Possible implementation
void
Split(const double x, const int delta,
      double *const x_h, double *const x_l) {
    // No unsafe optimizations!
    const double c = (double)((1UL << delta) + 1);
    *x_h = (c * x) + (x - (c * x));
    *x_l = x - *x_h;
    return;
}
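A usage sketch (my example): for binary64, delta = 27 is the usual Dekker/Veltkamp choice, giving two halves of roughly half the significand each, whose sum reproduces x exactly.

#include <cstdio>

void Split(const double x, const int delta, double *const x_h, double *const x_l);

int main(void) {
    const double x = 1.0 / 3.0;
    double hi, lo;
    Split(x, 27, &hi, &lo);
    // x == hi + lo exactly; hi and lo each fit in about half the significand,
    // so products such as hi*hi are exact doubles.
    std::printf("x  = %a\nhi = %a\nlo = %a\n", x, hi, lo);
    std::printf("hi + lo == x ? %d\n", hi + lo == x);
    return 0;
}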
Precise Multiplication
Summation Techniques
• Traditional
• Sorting and Insertion
• Compensated
• Reference: Higham: Accuracy and Stability of Numerical
Algorithms
Summation Techniques
Condition number:
C_sum = Σ_i |x_i| / |Σ_i x_i|
Traditional Summation
s = Σ_{i=0}^{n−1} x_i
double
Sum(const double *x, const unsigned int n)
{   // No unsafe optimizations!
    double sum = x[0];
    for (unsigned int i = 1; i < n; i++) {
        sum += x[i];
    }
    return sum;
}
Sorting and Insertion
Compensated Summation
Compensated (Kahan) Summation
double
Kahan(const double *x, const unsigned int n)
{   // No unsafe optimizations!
    double s = x[0];
    double t = 0.0;
    for (unsigned int i = 1; i < n; i++) {
        const double y = x[i] - t;
        const double z = s + y;
        t = (z - s) - y;
        s = z;
    }
    return s;
}
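A small comparison of the two routines (my example, not from the handout): summing 10^7 copies of 0.1, whose double value is slightly above 0.1, lets the recursive sum drift while the compensated sum typically stays within an ulp or so of the exact sum of the stored values.

#include <cstdio>
#include <vector>

double Sum(const double *x, const unsigned int n);
double Kahan(const double *x, const unsigned int n);

int main(void) {
    const unsigned int n = 10000000;       // 1e7 terms
    std::vector<double> x(n, 0.1);         // 0.1 is not exact in binary
    std::printf("recursive:   %.17g\n", Sum(x.data(), n));
    std::printf("compensated: %.17g\n", Kahan(x.data(), n));
    // One would expect the compensated result to agree with
    // n * (double)0.1 = 1000000.0000000000555... to within an ulp or so,
    // while the recursive sum is off in several of the last digits.
    return 0;
}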
Choice of Summation Technique
• Performance
• Error Bound
  • Is it (weakly) dependent on n?
• Condition Number
  • Is it known?
  • Is it difficult to determine?
  • Some algorithms allow it to be determined simultaneously with an estimate of the sum
    • permits easy evaluation of the suitability of the result
• No one technique fits all situations all the time
Dot Product
S = x^T y = Σ_{i=0}^{n−1} x_i · y_i
Dot Product
Traditional algorithm
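A minimal sketch of the traditional loop (my version; each product and each addition is rounded, so the error bound grows with n and with the condition number of the data):

double
Dot(const double *x, const double *y, const unsigned int n)
{   // No unsafe optimizations!
    double s = 0.0;
    for (unsigned int i = 0; i < n; i++) {
        s += x[i] * y[i];        // each product and each sum is rounded
    }
    return s;
}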
Dot Product
Recall
• Sum(x, y) computes s and t with x + y = s + t and s = x ⊕ y
• Prod(x, y) computes s and t with x × y = s + t and s = x ⊗ y
Since each individual product in the sum for the dot product is
transformed using Prod(x, y) into the sum of two floating-point
numbers, the dot product of two vectors of length n can be reduced to
computing the sum of 2n floating-point numbers.
To accurately compute that sum, Sum(x, y) is used. A sketch of such a
compensated dot product follows.
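A compensated dot product along these lines (a sketch of mine, not necessarily the handout's exact algorithm: it folds all the product and summation error terms into a single running correction, in the style of Ogita, Rump and Oishi's Dot2):

#include <cmath>

// Error-free transformations (see the earlier slides).
static void TwoSum(const double a, const double b, double *const s, double *const t) {
    *s = a + b;
    const double z = *s - a;
    *t = (a - (*s - z)) + (b - z);
}

static double TwoProdFMA(const double a, const double b, double *const t) {
    const double s = a * b;
    *t = std::fma(a, b, -s);
    return s;
}

// Compensated dot product: accumulate the rounded result in s and
// all product/summation errors in a single correction term c.
double CompDot(const double *x, const double *y, const unsigned int n)
{
    double s = 0.0, c = 0.0;
    for (unsigned int i = 0; i < n; i++) {
        double p, pe, se;
        p = TwoProdFMA(x[i], y[i], &pe);   // x[i]*y[i] == p + pe exactly
        TwoSum(s, p, &s, &se);             // old s + p == new s + se exactly
        c += pe + se;                      // gather the error terms
    }
    return s + c;
}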
Polynomial Evaluation
Evaluate
p(x) = Σ_{i=0}^{n} a_i x^i
     = a_0 + a_1 x + a_2 x^2 + · · · + a_{n−1} x^{n−1} + a_n x^n
Condition number
C(p, x) = Σ_{i=0}^{n} |a_i| |x|^i / |Σ_{i=0}^{n} a_i x^i|
Horner’s Scheme
A possible implementation
double
Horner(const double x,
       const double *const a,
       const int n) {
    double s = a[n];
    for (int i = n - 1; i >= 0; i--) {
        // s = s * x + a[i];
        s = fma(s, x, a[i]);
    }
    return s;
}
Applying EFTs to Horner’s Scheme
s0 + (π(x) + σ(x))
is an improved approximation to
Σ_{i=0}^{n} a_i x^i
Second Order Horner’s Scheme
Estrin’s Method
Isolate subexpressions of the form (a_k + a_{k+1} x) and x^(2^k) from p(x):
p(x) = (a_0 + a_1 x) + (a_2 + a_3 x) x^2 + ((a_4 + a_5 x) + (a_6 + a_7 x) x^2) x^4 + · · ·
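A sketch for a degree-7 polynomial (my example): the four first-level pairs are independent of one another, as are the two second-level combinations, which is what gives Estrin's method its instruction-level parallelism.

#include <cmath>

// p(x) = a[0] + a[1]*x + ... + a[7]*x^7, evaluated Estrin-style.
double Estrin7(const double x, const double a[8])
{
    const double x2 = x * x;
    const double x4 = x2 * x2;
    // Independent first-level pairs (each can use an FMA):
    const double p01 = std::fma(a[1], x, a[0]);
    const double p23 = std::fma(a[3], x, a[2]);
    const double p45 = std::fma(a[5], x, a[4]);
    const double p67 = std::fma(a[7], x, a[6]);
    // Combine with x^2, then with x^4:
    const double q0 = std::fma(p23, x2, p01);   // (a0 + a1 x) + (a2 + a3 x) x^2
    const double q1 = std::fma(p67, x2, p45);   // (a4 + a5 x) + (a6 + a7 x) x^2
    return std::fma(q1, x4, q0);
}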
Value Safety
A Note on Compiler Options
Optimizations Affecting Value Safety
• Expression rearrangements
• Flush-to-zero
• Approximate division and square root
• Math library accuracy
Expression Rearrangements
Subnormal Numbers and Flush-To-Zero
Reductions
The Hardware Floating-Point Environment
Precise Exceptions
Math Library Features – icc
Tools
MPFR
• a C library for multiple-precision floating-point computations
• all results are correctly rounded
• used by gcc and g++
• C++ interface available
• free with a GNU LGPL license
CRlibm
• a C library
• all results are correctly rounded
• C++ interface available
• Python bindings available
• free with a GNU LGPL license
• limits
  • defines characteristics of arithmetic types
  • provides the class template std::numeric_limits
  • #include <limits>
  • requires -std=c++11
  • specializations for each fundamental type
  • compiler and platform specific (see the example sketched below)
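A short example of querying these characteristics (standard C++; my example):

#include <limits>
#include <cstdio>

int main(void) {
    using dbl = std::numeric_limits<double>;
    std::printf("radix        = %d\n", dbl::radix);         // beta
    std::printf("digits       = %d\n", dbl::digits);        // p (53 for binary64)
    std::printf("epsilon      = %a\n", dbl::epsilon());     // 2^(1-p)
    std::printf("min (normal) = %a\n", dbl::min());
    std::printf("denorm_min   = %a\n", dbl::denorm_min());  // smallest subnormal
    std::printf("max          = %a\n", dbl::max());
    std::printf("IEEE 754 semantics? %d\n", dbl::is_iec559);
    return 0;
}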
• cmath
  • functions to compute common mathematical operations and transformations
  • #include <cmath>
  • frexp
    • get exponent and significand
  • ldexp
    • create value from exponent and significand
  • Note: frexp and ldexp assume a different “normalization” than usual: 1/2 ≤ m < 1
  • nextafter
    • create next representable value
  • fpclassify
    • returns one of FP_INFINITE, FP_NAN, FP_ZERO, FP_SUBNORMAL, FP_NORMAL (see the example sketched below)
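A short illustration of these functions (my example):

#include <cmath>
#include <cstdio>

int main(void) {
    int e;
    const double m = std::frexp(48.0, &e);     // 48 = 0.75 * 2^6, so m = 0.75, e = 6
    std::printf("frexp(48.0): m = %g, e = %d\n", m, e);
    std::printf("ldexp(m, e) = %g\n", std::ldexp(m, e));    // back to 48
    std::printf("nextafter(1.0, 2.0) - 1.0 = %a\n",
                std::nextafter(1.0, 2.0) - 1.0);            // one ulp of 1.0: 2^-52
    std::printf("fpclassify(1e-310) == FP_SUBNORMAL ? %d\n",
                std::fpclassify(1e-310) == FP_SUBNORMAL);   // 1e-310 is subnormal
    return 0;
}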
Pitfalls and Gremlins
Catastrophic Cancellation
• x² − y² for x ≈ y
  • (x + y)(x − y) may be preferable
    • x − y is computed with no round-off error (Sterbenz's Lemma)
    • x + y is computed with relatively small error
  • FMA(x, x, -y*y) can be very accurate
    • however, FMA(x, x, -x*x) is not usually 0!
• similarly, 1 − x² for x ≈ 1
  • -FMA(x, x, -1.0) is very accurate
A small demonstration is sketched below.
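A demonstration of the first case (my example): with x and y nearly equal, the naive form loses digits while the other two forms agree closely.

#include <cmath>
#include <cstdio>

int main(void) {
    const double x = 1.0 + 1.0e-8;
    const double y = 1.0;
    const double direct   = x * x - y * y;             // subtracts two nearby squares
    const double factored = (x + y) * (x - y);         // x - y is exact (Sterbenz)
    const double with_fma = std::fma(x, x, -(y * y));  // x*x is exact inside the fma; one rounding total
    std::printf("direct:   %.17g\n", direct);
    std::printf("factored: %.17g\n", factored);
    std::printf("fma:      %.17g\n", with_fma);
    return 0;
}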
Pitfalls and Gremlins
“Gratuitous” Overflow
Consider √(x² + 1) for large x
• √(x² + 1) → |x| as |x| → ∞
• |x| · √(1 + 1/x²) may be preferable
  • if x² overflows, 1/x² → 0
  • |x| · √(1 + 1/x²) → |x|
A demonstration is sketched below.
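A sketch of the difference for a large argument (my example; note that the C/C++ standard library also provides hypot(x, 1.0), which is designed to avoid exactly this kind of overflow):

#include <cmath>
#include <cstdio>

int main(void) {
    const double x = 1.0e200;                         // x*x overflows
    const double naive    = std::sqrt(x * x + 1.0);   // inf: the overflow is "gratuitous"
    const double rescaled = std::fabs(x) * std::sqrt(1.0 + 1.0 / (x * x));
    // Even though x*x is inf here, 1.0/(x*x) is 0, so the rescaled form gives |x|.
    std::printf("naive:    %g\n", naive);             // inf
    std::printf("rescaled: %g\n", rescaled);          // 1e200
    std::printf("hypot:    %g\n", std::hypot(x, 1.0));
    return 0;
}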
Pitfalls and Gremlins
Consider the Newton–Raphson iteration for 1/√x:
y_{n+1} = y_n + y_n (1 − x y_n²)/2
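A sketch of the iteration (my code, not the handout's): a crude starting guess is refined a few times; in practice the initial estimate usually comes from a low-precision reciprocal square-root instruction.

#include <cmath>
#include <cstdio>

// One Newton-Raphson step for y ~ 1/sqrt(x):  y <- y + y*(1 - x*y*y)/2
static double rsqrt_step(const double x, const double y) {
    return y + y * (1.0 - x * y * y) / 2.0;
}

int main(void) {
    const double x = 2.0;
    double y = 0.7;                        // crude initial guess for 1/sqrt(2) ~ 0.7071
    for (int i = 0; i < 4; i++) {
        y = rsqrt_step(x, y);
        std::printf("iter %d: y = %.17g\n", i, y);
    }
    std::printf("1/sqrt(2) = %.17g\n", 1.0 / std::sqrt(2.0));
    return 0;
}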
Bibliography