To understand the logic behind this, it helps to know how variables are stored in memory. This is described below under "Number Representation". Readers who already know this, or who want to go directly to the root cause, can skip "Number Representation" and go to "Root Cause Discussion".
Number Representation
Storing integer types such as char, int, or long in memory is fairly straightforward. A non-negative value is simply converted into binary and stored. For example, the character 'A' has the numeric value 65 (in ASCII), so it is stored as the binary form of 65, i.e., 01000001.
Negative values need slightly different handling. In signed types such as signed char, signed int, or signed long (int and long are signed by default in C), the most significant bit indicates the sign (0 for positive, 1 for negative). Negative values are stored in two's complement form: all bits of the magnitude are inverted and 1 is added, which also sets the sign bit.
Storing float and double needs a different representation, because these types hold both an integer part and a fractional part. If we simply split the allocated memory between the two parts, the range becomes very small. For example, if the 4 bytes of a float were split into 2 bytes for the integer part and 2 bytes for the fractional part, each stored the same way as an integer, the value could only range from -32768 to just under 32768 (32767 + 65535/65536), in steps of 1/65536. That is far too small compared with the roughly 3.4 × 10^38 that an actual float can hold.
Various representations have been proposed for storing float and double values efficiently in memory. The most widely accepted and used one is the IEEE 754 floating-point representation.
The original (1985) version of IEEE 754 defines four formats: single precision (32 bits), double precision (64 bits), single extended precision (at least 43 bits), and double extended precision (at least 79 bits). Of these, single precision and double precision are the most widely used.
To store a floating-point number, the allocated memory is divided into three parts: sign, exponent, and fraction. For single precision, the sign is 1 bit, the exponent is 8 bits (stored with a bias of 127), and the fraction is 23 bits. For double precision, the sign is 1 bit, the exponent is 11 bits (stored with a bias of 1023), and the fraction is 52 bits.
When a number is stored in this format, many values, such as 0.1 or 1.2, have a fractional part with no exact (finite) binary representation. Such values are rounded to the nearest representable number, so some precision is lost. Because double precision has more fraction bits (52 versus 23), the rounding error is smaller for double than for float.
For more details, see
<a href="https://en.wikipedia.org/wiki/IEEE_floating_point" target="_blank" rel="nofollow noopener noreferrer">IEEE floating point (Wikipedia)</a>
and
<a href="https://www.validlab.com/goldberg/paper.pdf" target="_blank" rel="nofollow noopener noreferrer">What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg, PDF)</a>
Root Cause Discussion
On virtually all current platforms, C's float is a 32-bit type stored in IEEE 754 single-precision format, while double is a 64-bit type stored in double-precision format. A number stored as a double is therefore more accurate than the same number stored as a float. Assigning a double to a float may lose precision, because the value must be rounded to fit into fewer bits. The opposite conversion, from float to double, is always exact, since every float value is also representable as a double; numbers that a float can represent exactly therefore lose nothing in either direction.
Consider the following code:
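#include <stdio.h>

int main(void) {
    float f = 1.2;     /* 1.2 is a double literal; it is rounded to float here */
    double d = 1.2;
    /* a float argument is promoted to double when passed to printf */
    printf("Float: %64.63e\n", f);
    printf("Double: %64.63e\n", d);
    return 0;
}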
The output of the code is
Float: 1.200000047683715820312500000000000000000000000000000000000000000e+00
Double: 1.199999999999999955591079014993738383054733276367187500000000000e+00
As you can see, even though we assign the same value 1.2 to both the float and the double, the values actually stored are different, and neither is exactly 1.2: some precision is lost in both cases.
Since double uses double precision, it should hold the more accurate value, and the example confirms this: the value stored in the double is closer to 1.2 than the value stored in the float.
Consider another example:
#include <stdio.h>

int main(void) {
    float f = 1.5;     /* 1.5 is 1.1 in binary, so no rounding is needed */
    double d = 1.5;
    printf("Float: %64.63e\n", f);
    printf("Double: %64.63e\n", d);
    return 0;
}
The output of the code is
Float: 1.500000000000000000000000000000000000000000000000000000000000000e+00
Double: 1.500000000000000000000000000000000000000000000000000000000000000e+00
So values that can be represented exactly in binary floating point, such as 1.5 (1.1 in binary), are stored identically in float and double.
Consider one more example, which will make things clearer.
#include <stdio.h>

int main(void) {
    float f1 = 1.2;     /* double literal rounded to float */
    float f2 = 1.2f;    /* float literal */
    double d1 = 1.2;    /* double literal */
    double d2 = 1.2f;   /* float literal widened to double */
    printf("Float f1: %64.63e\n", f1);
    printf("Float f2: %64.63e\n", f2);
    printf("1.2f: %64.63e\n", 1.2f);
    printf("Double d1: %64.63e\n", d1);
    printf("Double d2: %64.63e\n", d2);
    printf("1.2: %64.63e\n", 1.2);
    return 0;
}
The output of the code is
Float f1: 1.200000047683715820312500000000000000000000000000000000000000000e+00
Float f2: 1.200000047683715820312500000000000000000000000000000000000000000e+00
1.2f: 1.200000047683715820312500000000000000000000000000000000000000000e+00
Double d1: 1.199999999999999955591079014993738383054733276367187500000000000e+00
Double d2: 1.200000047683715820312500000000000000000000000000000000000000000e+00
1.2: 1.199999999999999955591079014993738383054733276367187500000000000e+00
Whenever 1.2 or 1.2f is assigned to a float, or 1.2f is printed directly, the output is
1.200000047683715820312500000000000000000000000000000000000000000e+00.
When 1.2 or 1.2f is assigned to a double, or 1.2 is printed directly, only d2 (which was initialized from 1.2f) differs from the others.
The reason is that in C an unsuffixed floating literal such as 1.2 has type double. To get a float, append an f suffix (1.2f) or use an explicit conversion ((float)1.2).
- In "float f1 = 1.2;" a double is assigned to a float. The implicit conversion discards the extra precision, so f1 holds 1.2 as a float.
- In "float f2 = 1.2f;" a float is assigned to a float, so f2 holds 1.2 as a float.
- In "double d1 = 1.2;" a double is assigned to a double, so d1 holds 1.2 as a double.
- In "double d2 = 1.2f;" a float is assigned to a double. The float has already lost precision, which cannot be recovered by widening it to double, so d2 holds 1.2 as a float.
- In "printf("1.2f: %64.63e\n", 1.2f);" we print the value of 1.2 as a float.
- In "printf("1.2: %64.63e\n", 1.2);" we print the value of 1.2 as a double.
By now most of this should be clear, so let's look at the actual code.
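#include <stdio.h>

int main(void)
{
    float a = 1.2;    /* double 1.2 rounded to float */
    if (a == 1.2)     /* a is converted back to double before comparing */
        printf("Equal");
    else
        printf("Not Equal");
    return 0;
}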
Here we assign the double 1.2 to a float, so some precision is lost. In the comparison a == 1.2, the float a is converted back to double (the usual arithmetic conversions), but that conversion cannot restore the lost bits. We therefore compare the float value of 1.2 with the double value of 1.2; they differ, so the program prints "Not Equal".
With a = 1.5, on the other hand, the float and double values are identical, so the comparison yields "Equal".
Suggestions
When doing floating-point arithmetic (or any arithmetic in general), pay attention to the types involved and to the implicit conversions that will happen. For example, if the above code is written as:
#include <stdio.h>

int main(void)
{
    float a = 1.2f;    /* float literal: no double involved */
    if (a == 1.2f)     /* float compared with float */
        printf("Equal");
    else
        printf("Not Equal");
    return 0;
}
it will print "Equal", because both sides of the comparison are now the float value of 1.2.
Let me know if anything needs more clarification.
-Pradeep