Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.
Questions tagged [floating-point]
449 questions
76
votes
4 answers
Why does the Google calculator give $\tan 90^{\circ} = 1.6331779\mathrm{E}16$?
I typed in $\tan 90^{\circ}$ in Google and it gave $1.6331779\mathrm{E}16$. How did it come to this answer? Limits? Some magic?
Gizmo
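A minimal Python sketch of the same effect, assuming ordinary IEEE double precision (the names `angle` and `value` are only illustrative). Because $\pi/2$ is not exactly representable, the tangent is evaluated slightly off the pole and comes out as a huge finite number rather than an error. The exact digits depend on how a given calculator rounds $\pi$, so they need not match Google's.

```python
import math

# 90 degrees in radians; the stored value is only the double nearest to pi/2.
angle = math.radians(90.0)

# The argument misses the pole by a tiny amount, so the result is finite.
value = math.tan(angle)

print(value)
```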
57
votes
1 answer
Show that floating point $\sqrt{x \cdot x} \geq x$ for all long $x$.
I verified experimentally that in Java the equality
Math.sqrt(x*x) = x
holds for all long x such that x*x doesn't overflow. Here, Java long is a $64$-bit signed type and double is an IEEE binary floating-point type with a mantissa of at least $53$ bits…
maaartinus
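The experiment can be imitated in Python, as a rough sketch rather than a proof. It assumes that Python's exact integer product followed by `float(...)` stands in for Java's `long` product converted to `double`. The helper `sqrt_roundtrip_holds` and the sample values are made up for illustration.

```python
import math

# Largest x whose square still fits in a signed 64-bit integer.
LONG_SQUARE_LIMIT = 3037000499


def sqrt_roundtrip_holds(x: int) -> bool:
    """Return True if sqrt(double(x*x)) equals double(x)."""
    squared = float(x * x)  # exact integer product, rounded once to a double
    return math.sqrt(squared) == float(x)


# A few hand-picked values, including ones whose squares exceed 2**53.
samples = [0, 1, 2, 3, 94906265, 94906266, 94906267,
           2**31 - 1, 2**31, LONG_SQUARE_LIMIT]
all_hold = all(sqrt_roundtrip_holds(x) for x in samples)
print(all_hold)
```

A spot check like this only covers the sampled values; the question asks for an argument covering every such x.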
24
votes
5 answers
Accurate floating-point linear interpolation
I want to perform a simple linear interpolation between $A$ and $B$ (which are binary floating-point values) using floating-point math with IEEE-754 round-to-nearest-or-even rounding rules, as accurately as possible. Please note that speed is not a…
Pedro Gimeno
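To make the difficulty concrete, here is a small sketch comparing two common formulations. The function names are hypothetical and neither is claimed to be the most accurate answer to the question. It shows that the offset form can miss the endpoint $B$ at $t = 1$ when $B - A$ is rounded.

```python
def lerp_offset(a: float, b: float, t: float) -> float:
    """a + t*(b - a): one multiplication, but b - a may be rounded."""
    return a + t * (b - a)


def lerp_weighted(a: float, b: float, t: float) -> float:
    """(1 - t)*a + t*b: returns a at t = 0 and b at t = 1."""
    return (1.0 - t) * a + t * b


a, b = 1.0, 1e-17

# b - a rounds to -1.0, so the offset form lands on 0.0 instead of b.
print(lerp_offset(a, b, 1.0))
print(lerp_weighted(a, b, 1.0))
```

The weighted form fixes the endpoints but has rounding issues of its own for intermediate $t$, which is part of why the question is not trivial.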
20
votes
1 answer
Why is 'catastrophic cancellation' called so?
I was studying Numerical Analysis by K. Mukherjee; there he discussed Loss of Significant Figures by Subtraction, as follows:
In the subtraction of two approximate numbers, a serious type of error may be present when the numbers are nearly equal.…
user142971
18
votes
3 answers
Polynomial for very large number of roots
Update: Please also see this solution here provided by MattL.
I have the roots of a polynomial of very high order (>100) and, from those alone, wish to recreate the polynomial, but I run into numerical challenges when using the poly function for doing…
Dan Boschen
16
votes
4 answers
Solving a quadratic equation with precision when using floating point variables
I know how to solve a basic quadratic equation with the formula
$$t_{1,2}=\dfrac{-b\pm\sqrt{b^2-4ac}}{2a}$$
but I learned that if $b \approx \sqrt{b^2-4ac}$, floating-point precision may give slightly wrong results and this approach is better. It…
John Smith
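The excerpt is cut off before the alternative is shown, so the sketch below uses one commonly taught rearrangement and should not be read as the asker's method. It assumes real, distinct roots with $a \neq 0$ and a nonzero intermediate `q`; the function names are illustrative.

```python
import math


def roots_textbook(a: float, b: float, c: float) -> tuple[float, float]:
    """Direct use of the quadratic formula."""
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)


def roots_stable(a: float, b: float, c: float) -> tuple[float, float]:
    """Add b and the square root with matching signs, then use t1*t2 = c/a."""
    disc = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(disc, b))  # no subtraction of near-equal values
    return q / a, c / q


# Roots are close to 1e8 and 1e-8; the small one is the fragile one.
print(roots_textbook(1.0, -1e8, 1.0)[1])
print(roots_stable(1.0, -1e8, 1.0)[1])
```

For these coefficients the textbook formula subtracts two nearly equal numbers to obtain the small root, while the rearranged version obtains it by a division.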
16
votes
1 answer
Explain why catastrophic cancellation happens
After my own research, the following picture emerges as the most frequently used example of catastrophic cancellation (It is indeed used in my class).
Could anyone explain why the plot takes that shape? (i.e., the widely fluctuating jagged line…
Heisenberg
15
votes
3 answers
Plotting $\left(1+\frac{1}{x^n}\right)^{x^n}$.
When I plot the following function, the graph behaves strangely:
$$f(x) = \left(1+\frac{1}{x^{16}}\right)^{x^{16}}$$
While $\lim_{x\to +\infty} f(x) = e$, the graph starts to fade at $x \approx 6$. What's going on here? (plotted on my trusty old 32…
dietervdf
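A short sketch of the extreme case, assuming IEEE double precision (the asker's device may use a different format). Once $1/x^{16}$ drops below half the spacing of doubles near 1, the base rounds to exactly 1 and the power collapses to 1. Before that point, the rounding error in the base is magnified by the huge exponent, which is a plausible source of the distortion.

```python
import math


def f(x: float, n: int = 16) -> float:
    """Evaluate (1 + 1/x**n) ** (x**n) naively in double precision."""
    p = x ** n
    return (1.0 + 1.0 / p) ** p


print(f(2.0))   # close to e
print(f(10.0))  # the base has already rounded to exactly 1.0
print(math.e)
```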
14
votes
1 answer
Why is the maximum value of my calculator $10^{100}$?
The maximum value I can have in my calculator is $10^{100}$. At first this seems right, but when we consider that the calculator should store its numbers in binary, it seems weird.
Why isn't the maximum exponent a base 2 number? Is this limit set…
Winter
13
votes
1 answer
Moving point along the vector
I'm making a video game in which a ball is moving towards a player. So, I have a point $P$ describing the point where the player is, and a point $B$ describing where the ball is. I know that we can represent the direction from $B$ to $P$ as a…
nullPointer2
12
votes
2 answers
Algebraic structure for the floating-point arithmetic.
Is it possible to consider floating-point arithmetic as some common algebraic structure?
For example, consider something like a simplified IEEE 754 single-precision binary floating-point subset of numbers that consist of 1 sign bit, 23-bit mantissa…
Max Malysh
12
votes
1 answer
What's special about the number $1.000000015047466$E+$30$?
I'm a programmer by trade but I've run into a weirdly special number and need some help deciphering its significance.
I was writing some machine learning code that compiles into GPU kernel code and the compiler output the number 1.000000015047466E+30…
Shrey Gupta
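One way to probe a number like this is to round $10^{30}$ to single precision and widen it back to double. The sketch below does that with Python's `struct` module. The helper name `round_to_float32` is made up, and the excerpt itself does not say that this is how the compiler produced the value.

```python
import struct


def round_to_float32(value: float) -> float:
    """Round a double to the nearest float32, then widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


widened = round_to_float32(1e30)

# The float32 nearest to 1e30 is not 1e30 itself.
print(widened)
print(widened == 1e30)
```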
11
votes
3 answers
How does rewriting $x^2 -y^2$ as $(x+y)(x-y)$ avoid catastrophic cancellation?
Why is rewriting $x^2 -y^2$ as $(x+y)(x-y)$ a way to avoid catastrophic cancellation?
We are still doing $(x-y)$. Is it because the last operation in the second form is a multiplication?
anonymous
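A small numerical illustration, with inputs chosen by hand so that every step can be checked exactly using `fractions.Fraction`. In the factored form the subtraction acts on the original inputs, so it introduces no new error here. In the direct form it acts on two squares that have each already been rounded, and the cancellation exposes those rounding errors.

```python
from fractions import Fraction

# Two nearby doubles; both are exactly representable.
x = 1.0 + 2.0 ** -29
y = 1.0 + 2.0 ** -30

naive = x * x - y * y          # subtracts two rounded squares
factored = (x + y) * (x - y)   # subtracts the inputs themselves

# Exact value of x^2 - y^2 for these doubles, in rational arithmetic.
exact = Fraction(x) ** 2 - Fraction(y) ** 2

naive_rel_err = abs((Fraction(naive) - exact) / exact)
factored_rel_err = abs((Fraction(factored) - exact) / exact)

print(float(naive_rel_err))
print(float(factored_rel_err))
```

For these inputs the factored form is exact, while the direct form loses roughly half of the available digits.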
9
votes
0 answers
Theory of floating point math
We learn about groups, rings and fields in algebra, but floating-point numbers (like double in many modern programming languages) do not form any of these algebraic structures because their addition and multiplication are not associative.
Is there an algebraic theory behind…
J Fabian Meier
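A familiar concrete instance of the failure of associativity, assuming IEEE double precision (the variable names are illustrative):

```python
# The same three summands, grouped differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)
print(right)
print(left == right)

# Commutativity of addition, by contrast, does hold for this pair.
print(0.1 + 0.2 == 0.2 + 0.1)
```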
8
votes
2 answers
Conditioning of the linear systems in the inverse or Rayleigh quotient iteration algorithms
I'm working through the book Numerical Linear Algebra by Trefethen and Bau. In Lecture 27 (and exercise 27.5), the following claim is made about the inverse iteration algorithm:
Let $ A $ be a real, symmetric matrix. Solving the system $ (A - \mu…
user477739