ELEC2041 Microprocessors and Interfacing Lectures 21: Floating Point Number Representation - III
Overview
- Special Floating Point Numbers: NaN, Denorms
- IEEE Rounding modes
- Floating Point fallacies, hacks
- Using floating point in C and ARM
- Multi Dimensional Array layouts
Saeid Nooshabadi
Big Idea: Instructions determine meaning of data; nothing inherent inside the data
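To make the Big Idea concrete, here is a small C sketch that views the same 32 bits first as a float and then as an unsigned integer. It assumes `float` is IEEE 754 single precision and 32 bits wide, as on ARM VFP and typical desktop compilers. The helper names `float_bits` and `bits_to_float` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of a float as an unsigned integer.
   memcpy copies the raw bytes, so no numeric conversion happens. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret 32 raw bits as a float (the inverse of float_bits). */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

The pattern 0x3F800000 reads as 1.0 when treated as a float but as 1065353216 when treated as an integer. Nothing in the bits says which reading is right; the instruction (or the C type) decides.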
Example            Meaning                      Comments
fadds s0,s1,s2     s0 = s1 + s2                 Fl. Pt. Add (single)
faddd d0,d1,d2     d0 = d1 + d2                 Fl. Pt. Add (double)
fsubs s0,s1,s2     s0 = s1 - s2                 Fl. Pt. Sub (single)
fsubd d0,d1,d2     d0 = d1 - d2                 Fl. Pt. Sub (double)
fmuls s0,s1,s2     s0 = s1 * s2                 Fl. Pt. Mul (single)
fmuld d0,d1,d2     d0 = d1 * d2                 Fl. Pt. Mul (double)
fdivs s0,s1,s2     s0 = s1 / s2                 Fl. Pt. Div (single)
fdivd d0,d1,d2     d0 = d1 / d2                 Fl. Pt. Div (double)
fcmps s0,s1        FPSCR flags = compare(s0,s1) Fl. Pt. Compare (single)
fcmpd d0,d1        FPSCR flags = compare(d0,d1) Fl. Pt. Compare (double)
Compare flags:
- Z = 1 if s0 = s1 (d0 = d1)
- N = 1 if s0 < s1 (d0 < d1)
- C = 1 if s0 = s1 (d0 = d1), s0 > s1 (d0 > d1), or unordered
- V = 1 if unordered
Unordered means at least one operand is a NaN.
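The flag rules above can be modelled in C. This is a hypothetical helper, `vfp_compare_nzcv`, not an ARM API: it returns the four flags as a 4-bit value (N in bit 3, Z in bit 2, C in bit 1, V in bit 0) and uses C99's `isunordered` macro to detect the NaN case.

```c
#include <math.h>

/* Hypothetical model of the flags left by a VFP compare of a with b.
   Returns NZCV as a 4-bit value: N = bit 3, Z = bit 2, C = bit 1, V = bit 0. */
unsigned vfp_compare_nzcv(float a, float b) {
    if (isunordered(a, b))   /* at least one operand is a NaN */
        return 0x3u;         /* C = 1, V = 1 */
    if (a == b)
        return 0x6u;         /* Z = 1, C = 1 */
    if (a < b)
        return 0x8u;         /* N = 1 */
    return 0x2u;             /* a > b: C = 1 */
}
```

Note that a NaN compares unordered even with itself, so `vfp_compare_nzcv(NAN, NAN)` sets C and V rather than Z.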
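Representation for Denorms (#1/2)
Problem: There's a gap among representable FP numbers around 0.
- Significand = 0, Exp = 0 is reserved for 0 (with a hidden 1 it would otherwise mean $1.0_2 \times 2^{-127} = 2^{-127}$)
- Smallest representable positive number: $a = 1.0_2 \times 2^{-126} = 2^{-126}$
- The next representable number above a is $2^{-126} + 2^{-149}$, so the step from 0 to a is far larger than the steps just above a. Gap!
[Figure: number line from 0 to a, showing the gap between 0 and a]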
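A minimal sketch of the gap, assuming IEEE 754 single precision and a platform that does not flush denorms to zero: for +0.0 or a positive float, adding 1 to its bit pattern gives the next representable value. The helper name `next_up_nonneg` is hypothetical.

```c
#include <float.h>   /* FLT_MIN: smallest normalized float, 2^-126 */
#include <stdint.h>
#include <string.h>

/* Next representable float above x, where x is +0.0f or a positive
   finite float. For such values, consecutive bit patterns are
   consecutive values, so adding 1 to the pattern steps up by one. */
float next_up_nonneg(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u += 1u;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Stepping up from FLT_MIN ($2^{-126}$) moves by only $2^{-149}$, while the distance from 0 to FLT_MIN is $2^{-126}$. With denorms enabled (the IEEE default), stepping up from 0 already lands on $2^{-149}$, which is a denorm.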
Getting the operations on denorms right takes 25% to 30% of the code in a floating point implementation. In most hardware implementations, denorms are either flushed to zero or handled in software via exceptions.
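One way to picture "flush to zero" is as a software filter: any denorm is replaced by a zero of the same sign, and every other value passes through unchanged. This is a hypothetical sketch (`flush_to_zero` is not a library function), assuming IEEE 754 single precision, where the denorms are exactly the nonzero values with magnitude below FLT_MIN.

```c
#include <float.h>

/* Software model of flush-to-zero for IEEE 754 singles.
   Denorms are the nonzero values strictly between -FLT_MIN and FLT_MIN;
   they become a zero of the same sign. NaNs fail every comparison
   and so pass through unchanged, as do normals, zeros and infinities. */
float flush_to_zero(float x) {
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}
```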
Rounding
When we perform math on real numbers, we have to worry about rounding. The floating point hardware carries two extra bits of precision during a calculation and then rounds to get the proper value. Rounding also occurs when converting a double to a single precision value, or when converting a floating point number to an integer.
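A short C sketch of the two conversions just mentioned (the helper names `via_float` and `to_int` are hypothetical): narrowing a double to float rounds to the nearest single precision value, while C's `(int)` cast drops the fraction, i.e. it rounds toward zero rather than using the IEEE default of round to nearest even.

```c
/* Narrow a double to single precision and widen it back.
   The narrowing rounds, so most doubles do not survive unchanged. */
double via_float(double d) {
    return (double)(float)d;
}

/* Convert a float to int: C's cast discards the fractional part
   (rounds toward zero). */
int to_int(float f) {
    return (int)f;
}
```

0.1 has no exact binary representation, so its nearest float differs from its nearest double; 0.5 is exact in both and survives.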
Truncate
- Just drop the last digits (round towards 0): 1.9999 → 1.9, -1.9999 → -1.9
Round to Even
- Round like you learned in high school (to the nearest value), except when the value is exactly on the borderline, in which case round to the nearest EVEN number: 2.55 → 2.6, 3.45 → 3.4
- Ensures fairness in calculations: half the time a tie rounds up, the other half it rounds down (ask a statistics Prof.)
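The two rounding rules can be sketched on decimal numbers. To keep the slide's ties exact (2.55 and 3.45 are not exactly representable in binary), this sketch stores values as integers in hundredths (255 stands for 2.55) and returns results in tenths (26 stands for 2.6). The function names are hypothetical; IEEE hardware applies the same idea in binary to its extra bits.

```c
/* Inputs are scaled by 100 (255 means 2.55); outputs are scaled by 10
   (26 means 2.6), so exactly one decimal digit is dropped. */

/* Truncate: drop the last digit (round toward 0).
   C integer division already truncates toward zero. */
long truncate_to_tenths(long hundredths) {
    return hundredths / 10;
}

/* Round to nearest; on an exact tie, pick the even neighbour. */
long round_half_even_to_tenths(long hundredths) {
    long mag = hundredths < 0 ? -hundredths : hundredths;
    long q = mag / 10;   /* truncated result */
    long r = mag % 10;   /* the dropped digit */
    if (r > 5 || (r == 5 && q % 2 != 0))
        q += 1;          /* round up, or break the tie toward even */
    return hundredths < 0 ? -q : q;
}
```

For example, 255 → 26 (2.55 → 2.6) and 345 → 34 (3.45 → 3.4), while truncation maps -199 → -19 (-1.99 → -1.9).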
(float) exp
- converts an integer to the nearest floating point value, e.g. f = f + (float) i;
- In ARM this conversion is done by fsitos (signed integer → single precision float)
int → float → int
    if (i == (int)((float) i)) {
        printf("true");
    }
Will not always work: large integer values don't have exact floating point representations, so the conversion to float may round to a different value.
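The int → float → int check can be tried directly. This sketch assumes a 32-bit `int` and IEEE 754 single precision, whose 24-bit significand represents every integer up to $2^{24}$ exactly but not $2^{24}+1$. The helper `int_survives_float` is hypothetical.

```c
#include <stdbool.h>

/* Does i survive the trip int -> float -> int?
   Keep |i| well inside int range: ints near INT_MAX round up to 2^31,
   and converting that float back to int is undefined behaviour in C. */
bool int_survives_float(int i) {
    return i == (int)(float)i;
}
```

16777217 ($2^{24}+1$) lies exactly halfway between the floats 16777216 and 16777218 and rounds to the even one, 16777216, so the comparison fails.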
float → int → float
    if (f == (float)((int) f)) {
        printf("true");
    }
Will not always work: small floating point values (e.g. 0.5) don't have good integer representations, and (int) discards the fractional part, so rounding errors also break the comparison.
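The reverse check, float → int → float, as a hypothetical helper `float_survives_int`. It assumes the input is within `int` range, since converting an out-of-range float to `int` is undefined behaviour in C.

```c
#include <stdbool.h>

/* Does f survive the trip float -> int -> float?
   (int) drops any fraction, so every non-integral f fails.
   f must lie within int range (out-of-range conversion is undefined). */
bool float_survives_int(float f) {
    return f == (float)(int)f;
}
```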
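Floating point addition is not associative. Let $x = -1.5 \times 10^{38}$, $y = 1.5 \times 10^{38}$ and $z = 1.0$:
$x + (y + z) = -1.5 \times 10^{38} + (1.5 \times 10^{38} + 1.0) = -1.5 \times 10^{38} + 1.5 \times 10^{38} = 0.0$
$(x + y) + z = (-1.5 \times 10^{38} + 1.5 \times 10^{38}) + 1.0 = 0.0 + 1.0 = 1.0$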
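The two groupings can be evaluated in C single precision (the function names are hypothetical). Because the spacing between floats near $1.5 \times 10^{38}$ is far larger than 1.0, adding 1.0 to $1.5 \times 10^{38}$ leaves it unchanged.

```c
/* Same three operands, two different groupings. */
float add_right_first(float x, float y, float z) {
    return x + (y + z);   /* y + z rounds back to y when z is tiny */
}

float add_left_first(float x, float y, float z) {
    return (x + y) + z;   /* x + y cancels exactly, then z survives */
}
```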
Intel Pentium FDIV bug:
Sept: a Math Prof. discovers the bug and puts it on the WWW. Nov: front page of a trade paper, then the NY Times.
Intel: "several dozen people that this would affect. So far, we've only heard from one." Intel claims customers see 1 error/27000 years for a random set of Ft. Pt. inputs (which does not explain why anybody would feed random inputs to Ft. Pt. numbers). IBM claims 1 error/month and stops shipping.
Checking for gradual underflow and treating denorms makes the standard much harder to implement; beyond Prof. Kahan, very few people really understand it! It was finally approved as IEEE 754 in 1985, after years of controversy.
Denorms were the most controversial aspect. Visitors to the US were advised of the 3 most interesting places to visit: Las Vegas, the Grand Canyon and the IEEE committee rooms!
Reading Material
- ARM Architecture Reference Manual, 2nd Ed., Addison-Wesley, 2001, ISBN 0-201-73719-1, Part C: Vector Floating Point Architecture, chapters C1 to C5.
- Steve Furber, ARM System-on-Chip Architecture, 2nd Ed., Addison-Wesley, 2000, ISBN 0-201-67519-6, chapter 6 (NOT up to date).
[Figure: 32 x 32 matrices (32 rows, 32 columns) indexed by i (row) and j (column)]
Example: Matrix with Fl. Pt. Multiply, Add in C
    void mm(double x[][32], double y[][32], double z[][32]) {
        int i, j, k;
        for (i = 0; i < 32; i = i + 1)
            for (j = 0; j < 32; j = j + 1)
                for (k = 0; k < 32; k = k + 1)
                    x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }
Why pass in # of cols? The compiler needs the number of columns to compute the address of each element in row-major order.
Starting addresses are parameters in a1, a2, and a3. Integer variables i, j, k are in v2, v3, v4. Arrays are 32 x 32.
Row-major layout of a 3 x 4 array A with 4-byte elements, starting at address 0:
A0,0 A0,1 A0,2 A0,3 A1,0 A1,1 A1,2 A1,3 A2,0 A2,1 A2,2 A2,3
Address of A2,1 = (row x #cols + col) x element size = (2 x 4 + 1) x 4 = 36
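The address rule above, written as a C sketch (the helper `row_major_offset` is hypothetical). It also shows why `mm` needs the number of columns: the compiler applies exactly this formula to locate `x[i][j]`, and `ncols` is one of its inputs.

```c
#include <stddef.h>

/* Byte offset of element [i][j] in a row-major array with ncols
   columns and elements of elem_size bytes: (i * ncols + j) * elem_size. */
size_t row_major_offset(size_t i, size_t j, size_t ncols, size_t elem_size) {
    return (i * ncols + j) * elem_size;
}
```

For the 3 x 4 example, `row_major_offset(2, 1, 4, 4)` gives 36, matching the slide.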
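ARM code for first piece: initialize, x[][]
Initialize loop variables:
    mm:  ...
         stmfd sp!, {v1-v4}   ; save registers
         mov   v1, #32        ; v1 = 32 (loop bound)
         mov   v2, #0         ; i = 0; 1st loop
    L1:  mov   v3, #0         ; j = 0; restart 2nd loop
    L2:  mov   v4, #0         ; k = 0; restart 3rd loop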
ARM code for second piece: z[][], y[][]
Like before, but load y[i][k] into d1 and z[k][j] into d2:
    L3:  add  ip, v4, v2, lsl #5   ; ip = i*32 + k
         add  ip, a2, ip, lsl #3   ; ip = a2 + ip*8 (byte addr. of y[i][k])
         fldd d1, [ip]             ; d1 = y[i][k]
         add  ip, v3, v4, lsl #5   ; ip = k*32 + j
         add  ip, a3, ip, lsl #3   ; ip = a3 + ip*8 (byte addr. of z[k][j])
         fldd d2, [ip]             ; d2 = z[k][j]
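A C mirror of the shift-based address arithmetic in the ARM code, for a 32 x 32 array of 8-byte doubles (`lsl #5` multiplies the row by 32 columns, `lsl #3` multiplies the element index by 8 bytes). The function `elem_addr_32x32` is hypothetical and assumes a flat address space where `uintptr_t` arithmetic matches pointer arithmetic.

```c
#include <stdint.h>

/* Mirrors:  add ip, v4, v2, lsl #5   -> ip   = (row << 5) + col
             add ip, a2, ip, lsl #3   -> addr = base + (ip << 3)   */
uintptr_t elem_addr_32x32(uintptr_t base, uint32_t row, uint32_t col) {
    uint32_t ip = (row << 5) + col;        /* element index: row*32 + col */
    return base + ((uintptr_t)ip << 3);    /* byte address: base + index*8 */
}
```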
And in Conclusion...
- Exponent = 255, Significand nonzero: represents NaN
- Exponent = 0, Significand nonzero: represents a denorm; in the denorm representation of Ft. Pt. numbers there is no hidden 1
- Finite precision means we have to cope with round-off error (arithmetic with inexact values) and truncation error (large values overwhelming small ones).
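The encoding rules in the summary can be checked with a small classifier that extracts the exponent and significand fields of an IEEE 754 single (1 sign bit, 8 exponent bits, 23 significand bits). The function names are hypothetical. `denorm_value` shows the missing hidden 1: a denorm's value is $0.\text{frac} \times 2^{-126}$, which equals $\text{frac} \times 2^{-149}$. This sketch assumes denorms are not flushed to zero.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from a raw 32-bit pattern. */
float float_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Classify an IEEE 754 single by its exponent and significand fields. */
const char *classify_single(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t exp  = (u >> 23) & 0xFFu;   /* biased exponent field */
    uint32_t frac = u & 0x7FFFFFu;       /* stored significand bits */
    if (exp == 255)
        return frac != 0 ? "NaN" : "infinity";
    if (exp == 0)
        return frac != 0 ? "denorm" : "zero";
    return "normal";
}

/* Value of a positive denorm with significand field frac (< 2^23):
   no hidden 1, so 0.frac x 2^-126 = frac x 2^-149 (exact). */
float denorm_value(uint32_t frac) {
    return (float)frac * 0x1p-149f;
}
```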