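This part is mainly about how numbers are represented in a computer:
(1) Integers: two's complement representation.
Suppose an integer w is represented with N bits. The leftmost bit is the sign bit: 0 means the number is non-negative, 1 means it is negative. The sign bit carries weight -2^(N-1); every other bit i carries weight 2^i.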
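As a small illustration of the definition above, the following sketch evaluates an N-bit pattern as a two's complement number by giving the top bit a negative weight. The function name `tc_value` is made up for this example and is not part of the lab.

```c
#include <stdint.h>

/* Interpret the low n bits of `bits` (1 <= n <= 32) as an n-bit
 * two's complement integer: the top bit has weight -2^(n-1),
 * every other bit i has weight +2^i. */
int64_t tc_value(uint32_t bits, int n) {
    int64_t value = 0;
    int i;
    for (i = 0; i < n - 1; ++i) {
        if ((bits >> i) & 1u) {
            value += (int64_t)1 << i;
        }
    }
    if ((bits >> (n - 1)) & 1u) {
        value -= (int64_t)1 << (n - 1);   /* the sign bit */
    }
    return value;
}
```

For instance, the 4-bit pattern 1011 evaluates to -8 + 2 + 1 = -5, while 0101 evaluates to 5.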
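(2) Floating-point numbers: floating point uses a representation similar to scientific notation.
Taking float as an example, the encoding has three parts: the first bit is the sign bit S, followed by 8 exponent bits exp, and finally 23 fraction bits frac.
That is: x = (-1)^S × M × 2^E
For example: -1.10110 × 2^10
where:
Usually E = exp - bias; for float, bias = 2^(8-1) - 1 = 127; M = 1 + frac, with frac read as a binary fraction in [0, 1).
A floating-point number is decoded differently depending on exp:
A. When exp = 0xFF: if frac is all zeros, the value is ±∞; if frac is not all zeros, the value is NaN (Not a Number).
B. When exp = 0x00, the number is denormalized. Here exp = 0, but E is not 0 - bias; it is defined as E = 1 - bias.
In addition, M is not 1 + frac but M = frac, so exp = 0 with frac = 0 represents ±0.
C. When exp is neither 0xFF nor 0x00, the number is normalized, and only then do E = exp - bias and M = 1 + frac hold.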
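The three cases can be sketched as a small decoder. This is only an illustration under the assumption that the input is a 32-bit single-precision pattern; the names `decode_float` and `pow2` are hypothetical helpers, not part of the lab.

```c
#include <math.h>     /* only for the NAN and INFINITY macros */
#include <stdint.h>

/* 2^e computed by repeated multiplication (exact for the range used here). */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r *= 0.5;
    return r;
}

/* Decode a single-precision bit pattern following cases A, B and C.
 * The value is returned as a double, which can hold every float exactly. */
double decode_float(uint32_t u) {
    const int bias = 127;
    uint32_t s = u >> 31;
    uint32_t exp = (u >> 23) & 0xFF;
    uint32_t frac = u & 0x7FFFFF;
    double sign = s ? -1.0 : 1.0;
    double f = frac / 8388608.0;            /* frac / 2^23, in [0, 1) */

    if (exp == 0xFF) {                      /* case A: infinity or NaN */
        return frac ? NAN : sign * INFINITY;
    }
    if (exp == 0) {                         /* case B: E = 1 - bias, M = frac */
        return sign * f * pow2(1 - bias);
    }
    /* case C: E = exp - bias, M = 1 + frac */
    return sign * (1.0 + f) * pow2((int)exp - bias);
}
```

With this sketch, `decode_float(0x3F800000)` gives 1.0 (exp = 127, frac = 0), and `decode_float(0x00000001)` gives the smallest positive denormalized value, 2^-149.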
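A note on what the design in B achieves:
First, it makes 0 encodable. Second, the values near 0 are evenly spaced. Finally, the transition from denormalized to normalized numbers is smooth: the largest denormalized value and the smallest normalized value are adjacent.
(For an example, see float_twice in the data lab below.)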
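To make the even spacing and the smooth transition concrete, the sketch below measures the gap between a float and the float with the next bit pattern. It assumes that float is a 32-bit IEEE 754 single-precision type on the target machine; `bits_to_float` and `gap_after` are names invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without changing any bits. */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Distance between the float with pattern u and the one with pattern u + 1.
 * The subtraction is done in double, where it is exact for these inputs. */
double gap_after(uint32_t u) {
    return (double)bits_to_float(u + 1) - (double)bits_to_float(u);
}
```

Under that assumption the gap is 2^-149 everywhere in the denormalized range, including the step from the largest denormalized pattern 0x007FFFFF to the smallest normalized pattern 0x00800000. It only doubles to 2^-148 once exp reaches 2.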
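Floating-point rounding
The frac field has a finite length, so precision is limited: a float holds at most 24 significant bits (the 23 stored frac bits plus the implicit leading 1).
Converting a number with more precision than that to float therefore requires rounding. The default rule has two parts:
round to the nearest representable value (round-to-nearest), and when two values are equally near, pick the one whose last bit is 0 (round-to-even).
(For an example, see float_i2f in the data lab below.)
Here is another example: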
#include <stdio.h>

int main(int argc, char *argv[]) {
    double dt = 0x0.0000008p+0;   /* 2^-25, a quarter of the float spacing near 1.0 */
    double d0 = 0x1.0000010p+0;   /* 1 + 2^-24, halfway between two floats */
    for (int i = 0; i < 6; ++i) {
        printf("=======\n");
        printf("double: %a \n", d0);
        printf("float: %a \n", (float)d0);
        d0 += dt;
    }
    return 0;
}
Output:
=======
double: 0x1.000001p+0
float: 0x1p+0
=======
double: 0x1.0000018p+0
float: 0x1.000002p+0
=======
double: 0x1.000002p+0
float: 0x1.000002p+0
=======
double: 0x1.0000028p+0
float: 0x1.000002p+0
=======
double: 0x1.000003p+0
float: 0x1.000004p+0
=======
double: 0x1.0000038p+0
float: 0x1.000004p+0
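In brief: near 1.0 adjacent floats are 2^-23 apart (0x0.000002 in hex). 0x1.000001p+0 and 0x1.000003p+0 lie exactly halfway between two adjacent floats, so they go to the neighbor whose last frac bit is 0 (0x1p+0 and 0x1.000004p+0 respectively); every other value goes to whichever float is nearer.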
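The rounding rule itself can be sketched on plain integers: dropping the low k bits of a value is the same operation float conversion performs on the bits that do not fit into frac. The helper name `round_shift_even` is hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Drop the low k bits of v (1 <= k <= 31), rounding to nearest and
 * breaking exact ties toward the result whose last bit is 0. */
uint32_t round_shift_even(uint32_t v, int k) {
    uint32_t kept = v >> k;                  /* bits that survive */
    uint32_t half = 1u << (k - 1);           /* exactly half of the last kept unit */
    uint32_t tail = v & ((1u << k) - 1u);    /* bits that are discarded */
    if (tail > half || (tail == half && (kept & 1u))) {
        kept += 1;                           /* round up */
    }
    return kept;
}
```

Writing the doubles from the example above in units of 2^-25, the first value is 0x2000002, and `round_shift_even(0x2000002, 2)` stays at 0x800000 (a tie, already even), which corresponds to 0x1p+0. The fifth value 0x2000006 is also a tie, but its kept part 0x800001 is odd, so it rounds up to 0x800002, which corresponds to 0x1.000004p+0.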
data lab
/*
 * CS:APP Data Lab
 *
 * <Please put your name and userid here>
 *
 * bits.c - Source file with your solutions to the Lab.
 *          This is the file you will hand in to your instructor.
 *
 * WARNING: Do not include the <stdio.h> header; it confuses the dlc
 * compiler. You can still use printf for debugging without including
 * <stdio.h>, although you might get a compiler warning. In general,
 * it's not good practice to ignore compiler warnings, but in this
 * case it's OK.
 */

#if 0
/*
 * Instructions to Students:
 *
 * STEP 1: Read the following instructions carefully.
 */

You will provide your solution to the Data Lab by
editing the collection of functions in this source file.

INTEGER CODING RULES:

  Replace the "return" statement in each function with one
  or more lines of C code that implements the function. Your code
  must conform to the following style:

  int Funct(arg1, arg2, ...) {
      /* brief description of how your implementation works */
      int var1 = Expr1;
      ...
      int varM = ExprM;

      varJ = ExprJ;
      ...
      varN = ExprN;
      return ExprR;
  }

  Each "Expr" is an expression using ONLY the following:
  1. Integer constants 0 through 255 (0xFF), inclusive. You are
      not allowed to use big constants such as 0xffffffff.
  2. Function arguments and local variables (no global variables).
  3. Unary integer operations ! ~
  4. Binary integer operations & ^ | + << >>

  Some of the problems restrict the set of allowed operators even further.
  Each "Expr" may consist of multiple operators. You are not restricted to
  one operator per line.

  You are expressly forbidden to:
  1. Use any control constructs such as if, do, while, for, switch, etc.
  2. Define or use any macros.
  3. Define any additional functions in this file.
  4. Call any functions.
  5. Use any other operations, such as &&, ||, -, or ?:
  6. Use any form of casting.
  7. Use any data type other than int. This implies that you
     cannot use arrays, structs, or unions.


  You may assume that your machine:
  1. Uses 2s complement, 32-bit representations of integers.
  2. Performs right shifts arithmetically.
  3. Has unpredictable behavior when shifting an integer by more
     than the word size.

EXAMPLES OF ACCEPTABLE CODING STYLE:
  /*
   * pow2plus1 - returns 2^x + 1, where 0 <= x <= 31
   */
  int pow2plus1(int x) {
     /* exploit ability of shifts to compute powers of 2 */
     return (1 << x) + 1;
  }

  /*
   * pow2plus4 - returns 2^x + 4, where 0 <= x <= 31
   */
  int pow2plus4(int x) {
     /* exploit ability of shifts to compute powers of 2 */
     int result = (1 << x);
     result += 4;
     return result;
  }

FLOATING POINT CODING RULES

For the problems that require you to implement floating-point operations,
the coding rules are less strict. You are allowed to use looping and
conditional control. You are allowed to use both ints and unsigneds.
You can use arbitrary integer and unsigned constants.

You are expressly forbidden to:
  1. Define or use any macros.
  2. Define any additional functions in this file.
  3. Call any functions.
  4. Use any form of casting.
  5. Use any data type other than int or unsigned. This means that you
     cannot use arrays, structs, or unions.
  6. Use any floating point data types, operations, or constants.


NOTES:
  1. Use the dlc (data lab checker) compiler (described in the handout) to
     check the legality of your solutions.
  2. Each function has a maximum number of operators (! ~ & ^ | + << >>)
     that you are allowed to use for your implementation of the function.
     The max operator count is checked by dlc. Note that '=' is not
     counted; you may use as many of these as you want without penalty.
  3. Use the btest test harness to check your functions for correctness.
  4. Use the BDD checker to formally verify your functions
  5. The maximum number of ops for each function is given in the
     header comment for each function. If there are any inconsistencies
     between the maximum ops in the writeup and in this file, consider
     this file the authoritative source.

/*
 * STEP 2: Modify the following functions according the coding rules.
 *
 *   IMPORTANT. TO AVOID GRADING SURPRISES:
 *   1. Use the dlc compiler to check that your solutions conform
 *      to the coding rules.
 *   2. Use the BDD checker to formally verify that your solutions produce
 *      the correct answers.
 */


#endif

/*
 * bitAnd - x&y using only ~ and |
 *   Example: bitAnd(6, 5) = 4
 *   Legal ops: ~ |
 *   Max ops: 8
 *   Rating: 1
 */
int bitAnd(int x, int y) {
  /* De Morgan: x & y == ~(~x | ~y) */
  return ~((~x) | (~y));
}

/*
 * getByte - Extract byte n from word x
 *   Bytes numbered from 0 (LSB) to 3 (MSB)
 *   Examples: getByte(0x12345678,1) = 0x56
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 6
 *   Rating: 2
 */
int getByte(int x, int n) {
  /* n << 3 is n * 8: shift the wanted byte down to the low 8 bits */
  int y = x >> (n << 3);
  return y & 0xFF;
}

/*
 * logicalShift - shift x to the right by n, using a logical shift
 *   Can assume that 0 <= n <= 31
 *   Examples: logicalShift(0x87654321,4) = 0x08765432
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 20
 *   Rating: 3
 */
int logicalShift(int x, int n) {
  int y = x >> n;

  /* mask with the top n bits cleared, to remove the copied sign bits */
  int helper = (1 << 31) >> n;
  helper = ~(helper << 1);
  return y & helper;
}

/*
 * bitCount - returns count of number of 1's in word
 *   Examples: bitCount(5) = 2, bitCount(7) = 3
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 40
 *   Rating: 4
 */
int bitCount(int x) {
  int mk1, mk2, mk3, mk4, mk5, result;
  mk5 = 0xff | (0xff << 8);
  mk4 = 0xff | (0xff << 16);
  mk3 = 0x0f | (0x0f << 8);
  mk3 = mk3 | (mk3 << 16);
  mk2 = 0x33 | (0x33 << 8);
  mk2 = mk2 | (mk2 << 16);
  mk1 = 0x55 | (0x55 << 8);
  mk1 = mk1 | (mk1 << 16);

  // First count the 1s in each of the 16 adjacent 2-bit pairs and store
  // each count in that pair, then repeat with wider and wider fields,
  // i.e. 32->16, 16->8, 8->4, 4->2, 2->1
  result = (mk1 & x) + (mk1 & (x >> 1));
  result = (mk2 & result) + (mk2 & (result >> 2));
  result = mk3 & (result + (result >> 4));
  result = mk4 & (result + (result >> 8));
  result = mk5 & (result + (result >> 16));
  return result;
}

/*
 * bang - Compute !x without using !
 *   Examples: bang(3) = 0, bang(0) = 1
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 12
 *   Rating: 4
 */
int bang(int x) {
  /* x | -x has its sign bit set for every x except 0 */
  return ((x | (~x + 1)) >> 31) + 1;
}

/*
 * tmin - return minimum two's complement integer
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 4
 *   Rating: 1
 */
int tmin(void) {
  return 1 << 31;
}

/*
 * fitsBits - return 1 if x can be represented as an
 *  n-bit, two's complement integer.
 *   1 <= n <= 32
 *   Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 15
 *   Rating: 2
 */
int fitsBits(int x, int n) {
  /*
    An n-bit number has n-1 bits besides its sign bit. Seen as a 32-bit
    int, its top 32-(n-1) bits must be all 0 for a non-negative number
    and all 1 for a negative number.
  */
  int signX = x >> 31;
  int y = x >> (n + (~0));
  return !(signX ^ y);
}

/*
 * divpwr2 - Compute x/(2^n), for 0 <= n <= 30
 *  Round toward zero
 *   Examples: divpwr2(15,1) = 7, divpwr2(-33,4) = -2
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 15
 *   Rating: 2
 */
int divpwr2(int x, int n) {
  /* add 2^n - 1 before shifting, but only when x is negative */
  int signX = x >> 31;
  int bias = (1 << n) + (~0);
  bias = signX & bias;
  return (x + bias) >> n;
}

/*
 * negate - return -x
 *   Example: negate(1) = -1.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 5
 *   Rating: 2
 */
int negate(int x) {
  return (~x) + 1;
}

/*
 * isPositive - return 1 if x > 0, return 0 otherwise
 *   Example: isPositive(-1) = 0.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 8
 *   Rating: 3
 */
int isPositive(int x) {
  /* positive means neither negative nor zero */
  return !((x >> 31) | (!x));
}

/*
 * isLessOrEqual - if x <= y  then return 1, else return 0
 *   Example: isLessOrEqual(4,5) = 1.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 24
 *   Rating: 3
 */
int isLessOrEqual(int x, int y) {
  /* same signs: x - y cannot overflow, so test its sign;
     different signs: x <= y exactly when x is the negative one */
  int signX = x >> 31;
  int signY = y >> 31;
  int signSame = !(signX ^ signY);
  int diff = x + (~y) + 1;
  int diffNegZero = (diff >> 31) | (!diff);
  return (signSame & diffNegZero) | ((!signSame) & signX);
}

/*
 * ilog2 - return floor(log base 2 of x), where x > 0
 *   Example: ilog2(16) = 4
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 90
 *   Rating: 4
 */
int ilog2(int x) {
  /* binary search for the position of the highest set bit */
  int bn = (!!(x >> 16)) << 4;
  bn = bn + ((!!(x >> (bn + 8))) << 3);
  bn = bn + ((!!(x >> (bn + 4))) << 2);
  bn = bn + ((!!(x >> (bn + 2))) << 1);
  bn = bn + (!!(x >> (bn + 1)));
  return bn;
}

/*
 * float_neg - Return bit-level equivalent of expression -f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representations of
 *   single-precision floating point values.
 *   When argument is NaN, return argument.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 10
 *   Rating: 2
 */
unsigned float_neg(unsigned uf) {
  /*
   * s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx
   * s is sign bit, when xs are all ZERO, this represents inf,
   * and when xs are not all ZERO, it's NaN.
   */
  unsigned fracMask, expMask;
  unsigned fracPart, expPart;
  fracMask = (1 << 23) - 1;
  expMask = 0xff << 23;
  fracPart = uf & fracMask;
  expPart = uf & expMask;
  if ((expMask == expPart) && fracPart) {
    return uf;
  }

  /* adding 1 << 31 flips the sign bit; any carry out is discarded */
  return (1 << 31) + uf;
}

/*
 * float_i2f - Return bit-level equivalent of expression (float) x
 *   Result is returned as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point values.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_i2f(int x) {
  unsigned signX, expPart, fracPart;
  unsigned absX;
  unsigned hp = 1 << 31;
  unsigned shiftLeft = 0;
  unsigned roundTail;
  unsigned result;
  if (0 == x) {
    return 0;
  }
  absX = x;
  signX = 0;
  if (x < 0) {
    absX = -x;
    signX = hp;
  }
  /* normalize: shift left until the leading 1 sits in bit 31 */
  while (0 == (hp & absX)) {
    absX = absX << 1;
    shiftLeft += 1;
  }
  expPart = 127 + 31 - shiftLeft;
  roundTail = absX & 0xff;
  fracPart = (~(hp >> 8)) & (absX >> 8);
  result = signX | (expPart << 23) | fracPart;
  // Round up when the value is nearer the larger neighbor,
  // round down when it is nearer the smaller one.
  if (roundTail > 0x80) {
    result += 1;
  } else if (0x80 == roundTail) {
    // On an exact tie, round to even using the bit just to the left:
    // round up if it is 1, round down if it is 0.
    if (fracPart & 1) {
      result += 1;
    }
  }
  return result;
}

/*
 * float_twice - Return bit-level equivalent of expression 2*f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representation of
 *   single-precision floating point values.
 *   When argument is NaN, return argument
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_twice(unsigned uf) {
  unsigned signX, expPart, fracPart;
  unsigned helper = 1 << 31;
  unsigned fracMask = (1 << 23) - 1;
  if (0 == uf) { // positive 0
    return 0;
  }
  if (helper == uf) { // negative 0
    return helper;
  }
  signX = uf & helper;
  expPart = (uf >> 23) & 0xff;
  if (expPart == 0xff) { // infinity or NaN: unchanged
    return uf;
  }
  fracPart = uf & fracMask;
  if (0 == expPart) { // denormalized value
    fracPart = fracPart << 1;
    if (fracPart & (1 << 23)) {
      // the result crossed into the normalized range
      fracPart = fracPart & fracMask;
      expPart += 1;
    }
  } else {
    expPart += 1;
    if (expPart == 0xff) {
      // overflow: the result is infinity, so frac must be cleared,
      // otherwise the pattern would decode as NaN
      fracPart = 0;
    }
  }
  return signX | (expPart << 23) | fracPart;
}
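Outside bits.c, where the lab's coding rules do not apply, a bit-level routine such as float_twice can be spot-checked against the machine's own float arithmetic. The lab's btest harness does this properly; the following is only a rough sketch of the idea. It assumes unsigned and float are both 32 bits wide with float in IEEE 754 single-precision format, and the names `ref_twice` and `agrees_on` are made up for this example.

```c
#include <string.h>

/* Reference result for 2*f computed with native float arithmetic.
 * NaN patterns are returned unchanged, as the lab requires. */
unsigned ref_twice(unsigned uf) {
    float f;
    unsigned out;
    memcpy(&f, &uf, sizeof f);      /* reinterpret the bits as a float */
    if (f != f) {                   /* only NaN compares unequal to itself */
        return uf;
    }
    f = 2.0f * f;
    memcpy(&out, &f, sizeof out);   /* and back to a bit pattern */
    return out;
}

/* Return 1 if the candidate implementation matches the reference on uf. */
int agrees_on(unsigned (*candidate)(unsigned), unsigned uf) {
    return candidate(uf) == ref_twice(uf);
}
```

Inputs worth trying with `agrees_on(float_twice, ...)` include 0x007FFFFF (largest denormalized value), 0x00400000 (a denormalized value whose double is normalized) and 0x7F7FFFFF (largest finite value, whose double is +∞, pattern 0x7F800000).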