+---------------------------------------------------------------------------+
|  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
|                                                                           |
|  Copyright (C) 1992,1993,1994,1995,1996,1997,1999                         |
|  W. Metzenthen, 22 Parker St, Ormond, Vic 3163,                           |
|  Australia. E-mail [email protected]                                      |
|                                                                           |
|  This program is free software; you can redistribute it and/or modify     |
|  it under the terms of the GNU General Public License version 2 as        |
|  published by the Free Software Foundation.                               |
|                                                                           |
|  This program is distributed in the hope that it will be useful,          |
|  but WITHOUT ANY WARRANTY; without even the implied warranty of           |
|  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the             |
|  GNU General Public License for more details.                             |
|                                                                           |
|  You should have received a copy of the GNU General Public License        |
|  along with this program; if not, write to the Free Software              |
|  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.                |
|                                                                           |
+---------------------------------------------------------------------------+

wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which was my 80387 emulator for early versions of djgpp (gcc under
msdos); wm-emu387 was in turn based upon emu387 which was written by
DJ Delorie for djgpp. The interface to the Linux kernel is based upon
the original Linux math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486s. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there are always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU,
but is very close. See "Limitations" later in this file for a list of
some differences.

Please report bugs, etc. to me at:
       [email protected]
or     [email protected]

For more information on the emulator and on floating point topics, see
my web pages, currently at  http://www.suburbia.net/~billm/

--Bill Metzenthen
  December 1999

----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:

(1) Add, subtract, and multiply. Nothing remarkable in these.

(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.

(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured taking account of the 80386 characteristics.

(4) The trig, log, and exp functions are based in each case upon quasi-
    "optimal" polynomial approximations. My definition of "optimal" was
    based upon getting good accuracy with reasonable speed.

(5) The argument reducing code for the trig function effectively uses
    a value of pi which is accurate to more than 128 bits. As a consequence,
    the reduced argument is accurate to more than 64 bits for arguments up
    to a few pi, and remains accurate to more than 64 bits for most larger
    arguments, even those approaching 2^63. This is far superior to an
    80486, which uses a value of pi which is accurate to 66 bits.
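
As an illustration of item (3), the following is a minimal sketch of
Newton's method for a square root in pure integer arithmetic. It is
not taken from the emulator (which is tuned 80386 assembler working on
64-bit significands); the function name isqrt_newton and the choice of
starting value are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of wm-FPU-emu: returns floor(sqrt(n))
   using Newton's iteration x <- (x + n/x) / 2 in integer arithmetic. */
uint32_t isqrt_newton(uint64_t n)
{
    if (n == 0)
        return 0;

    /* Find the smallest even shift with n < 2^shift, so that the
       starting value 2^(shift/2) is not below sqrt(n). */
    int shift = 0;
    while (shift < 64 && (n >> shift) != 0)
        shift += 2;
    uint64_t x = 1ULL << (shift / 2);

    /* From a start at or above the root the iterates decrease
       monotonically; the first non-decreasing step marks the floor. */
    for (;;) {
        assert(x != 0);
        uint64_t next = (x + n / x) / 2;
        if (next >= x)
            return (uint32_t)x;
        x = next;
    }
}
```

Each Newton step roughly doubles the number of correct bits once the
estimate is close, which is the kind of property the text refers to:
a good starting value keeps the number of divisions small.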

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby perhaps changing static
variables. The code which accesses user memory is confined to five
files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
As of version 1.12 of the emulator, no static variables are used
(apart from those in the kernel's per-process tables). The emulator is
therefore now fully re-entrant, rather than having just the restricted
form of re-entrancy which is required by the Linux kernel.
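
The idea of keeping all state in per-process storage rather than in
static variables can be sketched as follows. This is only an
illustration: the structure fpu_ctx, its fields and the functions
fpu_init, fpu_push and fpu_pop are hypothetical names invented for
this example and do not correspond to the emulator's real data
structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-process FPU state (illustrative layout only). */
struct fpu_ctx {
    int    top;       /* index of the current stack-top register */
    double regs[8];   /* stand-in for the eight 80-bit registers */
};

/* Every routine receives the state it works on; nothing is static,
   so a second process entering the code cannot disturb the first. */
void fpu_init(struct fpu_ctx *ctx)
{
    assert(ctx != NULL);
    memset(ctx, 0, sizeof *ctx);
}

void fpu_push(struct fpu_ctx *ctx, double value)
{
    assert(ctx != NULL);
    ctx->top = (ctx->top + 7) & 7;   /* a push decrements TOP modulo 8 */
    ctx->regs[ctx->top] = value;
}

double fpu_pop(struct fpu_ctx *ctx)
{
    assert(ctx != NULL);
    double value = ctx->regs[ctx->top];
    ctx->top = (ctx->top + 1) & 7;   /* a pop increments TOP modulo 8 */
    return value;
}
```

With this shape, operations on two contexts may be interleaved in any
order (as happens when one process sleeps on disk i/o in the middle of
an instruction) and each context still sees only its own values.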

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version 2.01) and the 80486 FPU (apart from bugs). The differences
are fewer than those which applied to the 1.xx series of the emulator.
Some of the more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
    operands were rounded to the current precision before the arithmetic
    operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

Self modifying code can cause the emulator to fail. An example of such
code is:
          movl %esp,(%ebx)
          fld1
The FPU instruction may be (usually will be) loaded into the pre-fetch
queue of the CPU before the movl instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault. Address offsets
greater than 0xffff appear to be illegal in vm86 mode but are quite
acceptable (and work) in real mode. A small test program developed to
check the addressing, and which runs successfully in real mode,
crashes dosemu under Linux and also brings Windows down with a general
protection fault message when run under the MS-DOS prompt of Windows
3.1. (The program simply reads data from a valid address).

The emulator supports 16-bit protected mode, with one difference from
an 80486DX. An 80486DX will allow some floating point instructions to
write a few bytes below the lowest address of the stack. The emulator
will not allow this in 16-bit protected mode: no instructions are
allowed to write outside the bounds set by the protection.

----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the FPU instruction trap overhead.

Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos, the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function      Turbo C        djgpp 1.06        WM-emu387      wm-FPU-emu

   +          60.5           154.8             76.5           139.4
   -          61.1-65.5      157.3-160.8       76.2-79.5      142.9-144.7
   *          71.0           190.8             79.6           146.6
   /          61.2-75.0      261.4-266.9       75.3-91.6      142.2-158.1

 sin()        310.8          4692.0            319.0          398.5
 cos()        284.4          4855.2            308.0          388.7
 tan()        495.0          8807.1            394.9          504.7
 atan()       328.9          4866.4            601.1          419.5-491.9

 sqrt()       128.7          crashed           145.2          227.0
 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
 exp()        479.1          6619.2            469.1          850.8

The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

 [ Linus' note: I changed look-ahead to be the default under linux, as
   there was no reason not to use it after I had edited it to be
   disabled during tracing ]

              wm-FPU-emu w     original w
              look-ahead       'soft' lib

   +          106.4            190.2
   -          108.6-111.6      192.4-216.2
   *          113.4            193.1
   /          108.8-124.4      700.1-706.2

 sin()        390.5            2642.0
 cos()        381.5            2767.4
 tan()        496.5            3153.3
 atan()       367.2-435.5      2439.4-3396.8

 sqrt()       195.1            4732.5
 log()        358.0-387.5      3359.2-3390.3
 exp()        619.3            4046.4

These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.

----------------------- Accuracy of wm-FPU-emu -----------------------

The accuracy of the emulator is in almost all cases equal to or better
than that of an Intel 80486 FPU.

The results of the basic arithmetic functions (+,-,*,/), and fsqrt
match those of an 80486 FPU. They are the best possible; the error for
these never exceeds 1/2 an lsb. The fprem and fprem1 instructions
return exact results; they have no error.

The following table compares the emulator accuracy for the sqrt(),
trig and log functions against the Turbo C "emulator". For this table,
each function was tested at about 400 points. Ideal worst-case results
would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being related to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of
about 64 + log2(cos(x)) = 31 bits.

Function      Tested x range            Worst result                 Turbo C
                                        (relative bits)

sqrt(x)       1 .. 2                    64.1                         63.2
atan(x)       1e-10 .. 200              64.2                         62.8
cos(x)        0 .. pi/2-(1e-10)         64.4 (x <= pi/4)             62.4
                                        64.1 (x = pi/2-(1e-10))      31.9
sin(x)        1e-10 .. pi/2             64.0                         62.8
tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (x <= pi/4)             62.1
                                        64.1 (x = pi/2-(1e-10))      31.9
exp(x)        0 .. 1                    63.1 **                      62.9
log(x)        1+1e-6 .. 2               63.8 **                      62.1

** The accuracy for exp() and log() is low because the FPU (emulator)
does not compute them directly; two operations are required.

The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or
later) for 'float' variables (24 bit precision numbers) when precision
control is set to 24, 53 or 64 bits, and for 'double' variables (53
bit precision numbers) when precision control is set to 53 bits (a
properly performing FPU cannot pass the 'paranoia' tests for 'double'
variables when precision control is set to 64 bits).

The code for reducing the argument for the trig functions (fsin, fcos,
fptan and fsincos) has been improved and now effectively uses a value
for pi which is accurate to more than 128 bits precision. As a
consequence, the accuracy of these functions for large arguments has
been dramatically improved (and is now very much better than an 80486
FPU). There is also now no degradation of accuracy for fcos and fptan
for operands close to pi/2. Measured results are (note that the
definition of accuracy has changed slightly from that used for the
above table):

Function      Tested x range          Worst result
                                      (absolute bits)

cos(x)        0 .. 9.22e+18           62.0
sin(x)        1e-16 .. 9.22e+18       62.1
tan(x)        1e-16 .. 9.22e+18       61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer number
          8227740058411162616.0
is within about 1e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.

Prior to version 1.20 of the emulator, the accuracy of the results for
the transcendental functions (in their principal range) was not as
good as the results from an 80486 FPU. From version 1.20, the accuracy
has been considerably improved and these functions now give measured
worst-case results which are better than the worst-case results given
by an 80486 FPU.

The following table gives the measured results for the emulator. The
number of randomly selected arguments in each case is about half a
million. The group of three columns gives the frequency of the given
accuracy in number of times per million, thus the second of these
columns shows that an accuracy of between 63.80 and 63.89 bits was
found at a rate of 133 times per one million measurements for fsin.
The results show that the fsin, fcos and fptan instructions return
results which are in error (i.e. less accurate than the best possible
result (which is 64 bits)) for about one per cent of all arguments
between -pi/2 and +pi/2. The other instructions have a lower
frequency of results which are in error. The last two columns give
the worst accuracy which was found (in bits) and the approximate value
of the argument which produced it.

                                    frequency (per M)
                                   -------------------   ----------------
instr     arg range      # tests   63.7   63.8    63.9   worst   at arg
                                   bits   bits    bits    bits
-------   ------------   -------   ----   ----   -----   -----   --------
fsin      (0,pi/2)        547756      0    133   10673   63.89   0.451317
fcos      (0,pi/2)        547563      0    126   10532   63.85   0.700801
fptan     (0,pi/2)        536274     11    267   10059   63.74   0.784876
fpatan    4 quadrants     517087      0      8    1855   63.88   0.435121 (4q)
fyl2x     (0,20)          541861      0      0    1323   63.94   1.40923  (x)
fyl2xp1   (-.293,.414)    520256      0      0    5678   63.93   0.408542 (x)
f2xm1     (-1,1)          538847      4    481    6488   63.79   0.167709

Tests performed on an 80486 FPU showed results of lower accuracy. The
following table gives the results which were obtained with an AMD
486DX2/66 (other tests indicate that an Intel 486DX produces
identical results). The tests were basically the same as those used
to measure the emulator (the values, being random, were in general not
the same). The total number of tests for each instruction is given
at the end of the table; in each case about 100k tests were performed.
Another line of figures at the end of the table shows that most of the
instructions return results which are in error for more than 10
percent of the arguments tested.

The numbers in the body of the table give the approx number of times a
result of the given accuracy in bits (given in the left-most column)
was obtained per one million arguments. For three of the instructions,
two columns of results are given:

* The second column for f2xm1 gives the number of cases where the
  results of the first column were for a positive argument; this shows
  that this instruction gives better results for positive arguments
  than it does for negative.
* In the cases of fcos and fptan, the first column gives the results
  with all cases where the argument was greater than 1.5 removed from
  the results given in the second column.

Unlike the emulator, an 80486 FPU returns results of relatively poor
accuracy for these instructions when the argument approaches pi/2. The
table does not show those cases when the accuracy of the results was
less than 62 bits, which occurs quite often for fsin and fptan when
the argument approaches pi/2. This poor accuracy is discussed above in
relation to the Turbo C "emulator", and the accuracy of the value of
pi.

bits   f2xm1   f2xm1  fpatan    fcos    fcos   fyl2x fyl2xp1    fsin   fptan   fptan
62.0       0       0       0       0     437       0       0       0       0     925
62.1       0       0      10       0     894       0       0       0       0    1023
62.2      14       0       0       0    1033       0       0       0       0     945
62.3      57       0       0       0    1202       0       0       0       0    1023
62.4     385       0       0      10    1292       0      23       0       0    1178
62.5    1140       0       0     119    1649       0      39       0       0    1149
62.6    2037       0       0     189    1620       0      16       0       0    1169
62.7    5086      14       0     646    2315      10     101      35      39    1402
62.8    8818      86       0     984    3050      59     287     131     224    2036
62.9   11340    1355       0    2126    4153      79     605     357     321    1948
63.0   15557    4750       0    3319    5376     246    1281     862     808    2688
63.1   20016    8288       0    4620    6628     511    2569    1723    1510    3302
63.2   24945   11127      10    6588    8098    1120    4470    2968    2990    4724
63.3   25686   12382      69    8774   10682    1906    6775    4482    5474    7236
63.4   29219   14722      79   11109   12311    3094    9414    7259    8912   10587
63.5   30458   14936     393   13802   15014    5874   12666    9609   13762   15262
63.6   32439   16448    1277   17945   19028   10226   15537   14657   19158   20346
63.7   35031   16805    4067   23003   23947   18910   20116   21333   25001   26209
63.8   33251   15820    7673   24781   25675   24617   25354   24440   29433   30329
63.9   33293   16833   18529   28318   29233   31267   31470   27748   29676   30601

Per cent with error:
        30.9             3.2            18.5     9.8    13.1    11.6            17.4
Total arguments tested:
       70194   70099  101784  100641  100641  101799  128853  114893  102675  102675

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
who I may have forgotten, please forgive me):

Linus Torvalds
[email protected]
[email protected]
Nick Holloway, [email protected]
Hermano Moura, [email protected]
Jon Jagger, [email protected]
Lennart Benschop
Brian Gallew, [email protected]
Thomas Staniszewski, [email protected]
Martin Howell, [email protected]
M Saggaf, [email protected]
Peter Barker, [email protected]
[email protected]
Dan Russel, [email protected]
Daniel Carosone, [email protected]
[email protected]
Hamish Coleman, [email protected]
Bruce Evans, [email protected]
Timo Korvola, [email protected]
Rick Lyons, [email protected]
Rick, [email protected]

...and numerous others who responded to my request for help with
a real 80486.