9

I've been looking at some of the packages from the high-performance task view that deal with GPU computation, and given that most GPUs seem to be an order of magnitude faster at single-precision (SP) arithmetic than at double-precision (DP) arithmetic, I was wondering:

  1. Why does none of the packages give the user more control over the type of precision used? I can see many applications in statistics where SP arithmetic (i.e., numbers carrying about 7 significant digits) is good enough for practical use (if I am overestimating the gains involved, let me know).
  2. Is Python more flexible on this? If so, why? I don't see why the absence of a 'single' type in R would make including such an option (together with a warning) in, say, GPUtools or magma impossible (though I'll be happy to be shown wrong).

PS: I'm specifically thinking of applications where the numbers are already scaled and centered dimension-wise (so that Chebyshev's inequality is binding).
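To make this concrete, here is a rough numpy sketch I put together (array sizes and seed arbitrary, nothing GPU-specific) of how well SP holds up once the data are standardized:

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy design matrix, scaled and centered dimension-wise as in the
# question, so every column has mean 0 and standard deviation 1.
x = rng.standard_normal((10_000, 50))
x = (x - x.mean(axis=0)) / x.std(axis=0)

# Covariance estimate in double precision (reference) and single precision.
cov_dp = (x.T @ x) / len(x)
x32 = x.astype(np.float32)
cov_sp = (x32.T @ x32) / np.float32(len(x))

# With every entry O(1), the SP result agrees with the DP reference to
# roughly the ~7 significant digits SP carries.
rel_err = float(np.abs(cov_sp - cov_dp).max() / np.abs(cov_dp).max())
print(rel_err)
```

The relative error stays near SP's unit roundoff precisely because standardization keeps everything on the same scale.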

Shayan Shafiq
user603
  • I confess to being mystified by this, despite struggling several times to make sense of it: is there a question here? "So bad" is vague and has no referent. What exactly are you seeking to understand or find out? – whuber Oct 10 '10 at 16:19
  • @whuber:> My question was poorly worded, probably because it was born out of ignorance: I had read some white papers on the use of GPUs (although, unfortunately, it turns out, not the R command reference of GPUtools) and could not understand why all the tests were carried out in DP. I will re-phrase the question (and the title). – user603 Oct 11 '10 at 14:02

5 Answers

6
  1. Because before GPUs there was little practical reason to use single-precision reals: you can never have too much accuracy, and memory is usually not a problem. Supporting only doubles also made R's design simpler. (R does, however, support reading and writing single-precision reals.)
  2. Yes, because Python aims to be more compatible with compiled languages. You are right, though, that it would be possible for the wrappers around R libraries to do on-the-fly conversion (this takes time, but only a little); you can try e-mailing the GPU packages' maintainers to request such changes.
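Such a wrapper could look like this hypothetical Python/numpy sketch (`sp_wrapper` and `matmul` are made-up names; a real GPU package would dispatch to a CUDA kernel rather than `@`):

```python
import numpy as np

def sp_wrapper(fn):
    """Hypothetical wrapper: down-cast double-precision inputs to single
    precision before calling the (GPU-style) kernel, then cast the result
    back to double for the caller. The casts cost time, but usually far
    less than the kernel itself."""
    def wrapped(*arrays):
        singles = [np.asarray(a, dtype=np.float32) for a in arrays]
        return np.asarray(fn(*singles), dtype=np.float64)
    return wrapped

@sp_wrapper
def matmul(a, b):
    # Stand-in for a single-precision GPU kernel.
    return a @ b

a = np.eye(3)                     # arrives from the caller as float64
b = np.arange(9.0).reshape(3, 3)  # float64
out = matmul(a, b)
print(out.dtype)  # float64 outside, float32 inside the "kernel"
```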
  • I would say single vs double precision performance started mattering before GPUs, when memory bandwidth became a bottleneck. I'd agree that it didn't matter in the early years of R (and presumably Python). – Thomas Lumley Nov 15 '20 at 22:50
5

From the GPUtools help file, it seems that useSingle=TRUE is the default for the functions.

ars
  • @kwak: I find the answer above helpful, but it really doesn't answer the question posed - "is single precision so bad?" Perhaps you should reword your question? – csgillespie Oct 10 '10 at 07:57
  • @csgillespie: you are totally correct. I will reword this question so that it can be used by future readers. Indeed, the wording was particularly poor. – user603 Oct 11 '10 at 14:01
4

I presume that by GPU programming you mean programming Nvidia cards? In that case, the underlying calls from R and Python are to C/CUDA.

The simple reason that only single precision is offered is that this is what most GPU cards support.

However, the new Nvidia Fermi architecture does support double precision. If you bought an Nvidia graphics card this year, then it's probably a Fermi. Even here, things aren't simple:

  • You take a performance hit if you compile for double precision (a factor of two, if I remember correctly).
  • On the cheaper Fermi cards, Nvidia intentionally crippled double precision. However, it is possible to get around this and run double-precision programs. I managed to do this on my GeForce GTX 465 under Linux.

To answer the question in your title, "Is single precision OK?", it depends on your application (sorry crap answer!). I suppose everyone now uses double precision because it no longer gives a performance hit.
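As one concrete example of the application-dependence, sketched here in numpy (array size arbitrary): plain sequential accumulation silently stalls in SP once the running total gets large.

```python
import numpy as np

n = 20_000_000
ones = np.ones(n, dtype=np.float32)

# np.cumsum accumulates sequentially in the array's dtype. Once the SP
# running total reaches 2**24 = 16,777,216, adding 1.0 falls below
# float32's representable spacing and the sum silently stops growing.
sp_total = float(np.cumsum(ones)[-1])

# Accumulating in double precision instead gives the exact count.
dp_total = float(ones.sum(dtype=np.float64))

print(sp_total, dp_total)  # 16777216.0 20000000.0
```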

When I dabbled with GPUs, programming suddenly became far more complicated. You have to worry about things like:

  • warp size and arranging your memory properly.
  • the number of threads per kernel.
  • debugging being horrible – there's no print statement inside GPU kernel code.
  • the lack of random number generators.
  • Single precision.
csgillespie
  • @csgillespie:> I think my question may have been poorly worded. In the packages I've seen (GPUtools, magma), double precision seems to be used as standard (with the loss of performance you describe). I was wondering why single precision is not offered as an option. – user603 Oct 09 '10 at 20:46
  • @kwak: The double precision values must be converted to single precision by the wrapper. The wrapper was just trying to be helpful. – csgillespie Oct 09 '10 at 20:51
  • @csgillespie:> yes, but it seems the wrapper comes with performance costs exceeding the factor of 2 you cite (again, correct me if I'm wrong on this) and in some cases no tangible benefits (I can think of many applications in statistics where SP floating-point arithmetic would be okay). I was wondering whether it makes sense to ask for an option to switch off said wrapper. – user603 Oct 09 '10 at 20:54
  • @kwak: Glancing at the GPUtools help file, it seems that `useSingle=TRUE` is the default in the functions. Am I missing something here? – ars Oct 09 '10 at 20:56
  • @csgillespie: Remember, until relatively recently most Nvidia cards simply **couldn't** do double-precision computation. The factor-of-2 hit is what I observed using raw C/CUDA code. Having a Python/R wrapper may make this worse. – csgillespie Oct 09 '10 at 21:08
  • @ars:> argh. I didn't have an Nvidia card at home, so I didn't bother to read the manual. The white papers (and the GPU+R site) are all about double-precision arithmetic. Would you be so kind as to post your comment as an answer so I can close the question? – user603 Oct 09 '10 at 22:36
  • @kwak: ah, got it. I posted a separate answer. – ars Oct 09 '10 at 22:57
  • As noted above, the Fermi cards support double precision. The GTX 2xx series also supports double precision – but the Fermi GTX 4xx cards have approximately twice the double-precision performance of the 2xx series. __However__, the Tesla 20xx product is four times faster still. Here is a [discussion](http://www.vizworld.com/2010/04/geforce-gtx-480-18supthsup-double-precision-performance/) about the crippled double-precision performance of the GTX 480. [Specs for Tesla 20xx](http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html) (515 GFlops DP Perf) – M. Tibbits Oct 10 '10 at 00:12
  • [Specs for GTX 480](http://www.nvidia.com/object/product_geforce_gtx_480_us.html) -- 168 GFlops Peak DP Perf [Specs for GTX 285](http://www.nvidia.com/object/product_geforce_gtx_285_us.html) -- 88.5 GFlops Peak DP Perf. The performance numbers are not listed on the nvidia website for the GTX products, I found them [here.](http://techreport.com/articles.x/18682) – M. Tibbits Oct 10 '10 at 00:18
1

OK, a new answer to an old question but even more relevant now. The question you're asking has to do with finite precision, normally the domain of signal analysis and experimental mathematics.

Double precision (DP) floats let us pretend that finite precision problems don't exist, the same as we do with most real-world mathematical problems. In experimental math there is no pretending.

Single precision (SP) floats force us to consider quantization noise. If our machine learning models inherently reject noise, as neural nets (NN), convolutional nets (CNN), residual nets (ResN), etc. do, then SP most often gives results similar to DP.

Half precision (HP) floats (now supported in CUDA Toolkit 7.5) require that quantization effects (noise and rounding) be considered. Most likely we'll soon see HP floats in the common machine learning toolkits.
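A small numpy sketch (the value 1/3 is arbitrary) of how much quantization error each precision introduces:

```python
import numpy as np

# Relative error of storing 1/3 at each precision. SP keeps roughly
# 7 significant decimal digits; HP (float16) keeps only about 3.
x = 1.0 / 3.0                       # double-precision reference value
sp_err = abs(float(np.float32(x)) - x) / x
hp_err = abs(float(np.float16(x)) - x) / x
print(sp_err, hp_err)
```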

There is recent work on lower-precision computation in both floats and fixed-point numbers. Stochastic rounding has enabled convergence to proceed for CNNs in cases where the solution diverges without it. This literature will help you improve your understanding of the problems with using finite-precision numbers in machine learning.
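Here is a toy numpy sketch of the stochastic-rounding idea (a uniform grid with a made-up step size, not the exact scheme from any particular paper):

```python
import numpy as np

def stochastic_round(x, step, rng):
    # Round each value to a multiple of `step`, rounding up with
    # probability equal to its fractional position between the two
    # neighbours, so the rounding error is zero in expectation.
    scaled = np.asarray(x, dtype=np.float64) / step
    floor = np.floor(scaled)
    up = rng.random(size=scaled.shape) < (scaled - floor)
    return (floor + up) * step

rng = np.random.default_rng(42)
# 0.3 is not on the 0.25 grid, yet the rounded values average to 0.3:
rounded = stochastic_round(np.full(100_000, 0.3), 0.25, rng)
print(rounded.mean())
```

Deterministic rounding would map every 0.3 to 0.25, and the bias would accumulate across iterations; the random choice is what keeps the expectation unbiased.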

To address your questions:

SP is not so bad. As you point out, it's twice as fast, but it also allows you to fit more layers into memory. A bonus is the reduced overhead of getting data on and off the GPU. The faster computations and lower overhead result in shorter convergence times. That said, for some problems HP will be better in some parts of the network and not in others.

  1. It seems to me that many of the machine learning toolkits handle both SP and DP. Perhaps someone else with a wider range of experience with the toolkits will add their nickel.
  2. Python will support what the GPU toolkit supports. You don't want to use Python data types, because then you'll be running an interpreted script on the CPU.
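For example, in numpy (and in GPU toolkits that mimic its interface, such as CuPy – an assumption on my part, I haven't benchmarked them all), precision is just a per-array dtype, so SP vs DP is one cast away (sizes here arbitrary):

```python
import numpy as np

# A double-precision array and its single-precision copy.
a64 = np.random.default_rng(1).standard_normal((256, 256))  # float64 default
a32 = a64.astype(np.float32)                                # explicit SP

# SP halves the memory footprint, which also halves transfer time
# to and from the device.
print(a64.nbytes, a32.nbytes)
```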

Note that the trend in neural networks now is to go with very deep models, with runs of more than a few days common on the fastest GPU clusters.

r3mnant
1

The vast majority of GPUs in circulation only support single precision floating point.

As far as the title question, you need to look at the data you'll be handling to determine if single precision is enough for you. Often, you'll find that singles are perfectly acceptable for >90% of the data you handle, but will fail spectacularly for that last 10%; unless you have an easy way of determining whether your particular data set will fail or not, you're stuck using double precision for everything.

  • Can you elaborate a bit? Some iterative algorithms (matrix inversion, QR decomposition) seem to work well. I'm also curious whether the inaccuracy of SP becomes more of a problem for operations on larger arrays. – user603 Oct 10 '10 at 08:23
  • There are two parts to it: 1) What does the data represent? 2) How do you process the data? If you're looking at thousands of points of data from a medical study, single precision would likely be plenty for quantifying patient wellness, and I doubt you would ever need double. Geometry, on the other hand, could require either single or double precision depending on your scale & zoom. Calculating the trajectory of a probe to Saturn would always require doubles, as even small errors could drastically affect the result. You need to look at the data and decide what your tolerances are. – Benjamin Chambers Oct 10 '10 at 22:55
  • It will depend on the numerical stability of the algorithm you are using and how well-conditioned the problem is. Remember that double precision gives you access to smaller numbers as well as larger ones. – James Oct 11 '10 at 11:18
  • Not necessarily smaller or larger numbers; remember, we're dealing with floating point. Rather, it lets you use larger and smaller numbers in relation to each other, while preserving the significant digits. – Benjamin Chambers Oct 11 '10 at 21:54