All Questions

43 questions with no upvoted or accepted answers
57views

Generate all numbers of the binary system (B=2, t=3, L=-2, U=3)

Suppose we have the following binary system (B=2, t=3, L=-2, U=3), where B is the base of the system; since it's a binary system, B is of course 2. t is the precision of the number, usually refers to ...
• 13.6k
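A minimal sketch of one way to enumerate such a system. The excerpt is truncated, so this assumes the usual textbook convention: t counts the significand digits including the leading one, only normalized numbers plus zero are wanted, and L and U bound the exponent. `toy_floats` is a hypothetical helper name, not something from the question.

```python
from fractions import Fraction


def toy_floats(B=2, t=3, L=-2, U=3):
    """Enumerate zero and all normalized values +/- d0.d1...d(t-1) * B**e.

    Assumed convention (not confirmed by the question): d0 is nonzero,
    t is the total number of significand digits, and L <= e <= U.
    """
    values = {Fraction(0)}
    for e in range(L, U + 1):
        # Integer significands with a nonzero leading digit: B**(t-1) .. B**t - 1
        for m in range(B ** (t - 1), B ** t):
            x = Fraction(m, B ** (t - 1)) * Fraction(B) ** e
            values.update((x, -x))
    return sorted(values)


values = toy_floats()
positive = [v for v in values if v > 0]
print(len(values), min(positive), max(positive))
```

Under these assumptions the count follows the formula 2(B-1)B^(t-1)(U-L+1) + 1. Exact fractions are used so that no rounding from the host's own floating point leaks into the enumeration.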
1vote
60views

How to deterministically divide floats when there is a known deviation from IEEE 754?

DirectX 11 allows GPU manufacturers to deviate from the rounding behavior specified in the IEEE 754 standard. I cannot enable IEEE strictness, because I don't control the shader compilation process. ...
• 2,253
1vote
69views

9 Bit Floating Point to Hex

I've been running into an issue where I'm trying to convert a 9-bit floating point number to hex, where the floating point scheme is 1 sign-bit, followed by 4 bit exponent, and then a 4-bit mantissa. ...
1vote
35views

Decimal-module is not working on binary level? How?

I'm a beginner when it comes to what's going on in the background when I start the program. Right now my focus is on the difference between "decimal" and "float" in "Python"...
1vote
76views

Converting a Twos-complement number to its binary representation

I am doing some bitwise manipulation. I am adding a 32 bit number to another 32 bit number with |= instead of += because I was thinking the number might be getting messed up when it converts to 32 ...
1vote
125views

PHP unpack float return unexpected answer

I have some kind of binary data. I try to get the Float from that data. We already have the Java version for that program, so we already know what result we should get. When we try with the following ...
• 4,946
1vote
341views

largest integer that can be stored in a double such that all integers less than can be accurately stored as well

This is some more clarification to the question that was already answered some time ago here: biggest integer that can be stored in a double The top answer mentions that "the largest integer such ...
• 61
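A small sketch of how the boundary can be checked empirically. It assumes Python's `float` is an IEEE 754 binary64 double (true on mainstream platforms); `round_trips` and `LIMIT` are illustrative names, not part of the question.

```python
LIMIT = 2 ** 53  # a double carries 53 significant bits (52 stored + 1 implicit)


def round_trips(n: int) -> bool:
    """True if the integer n survives conversion to a double and back."""
    return int(float(n)) == n


# Spot-check a window just below the limit, then the first value above it.
print(all(round_trips(n) for n in range(LIMIT - 1000, LIMIT + 1)))
print(round_trips(LIMIT + 1))
```

The check only samples a window below the limit; it illustrates the boundary and is not a proof.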
1vote
626views

Adding two IEEE floating point numbers in Java

I am having a difficult time figuring out how to correctly add two IEEE floating point numbers using Java. I'm not sure how to proceed in actually adding the mantissas together because I don't get how ...
1vote
729views

Create a function that converts decimals into IEEE 754 floating point precision numbers using MATLAB?

I need to create a function that does exactly as the title says and outputs a 32-character string. So far I can get the sign bit and the exponent part correctly. What I'm struggling with is ...
1vote
3kviews

C: convert a real number to 64 bit floating point binary

I'm trying to write a code that converts a real number to a 64 bit floating point binary. In order to do this, the user inputs a real number (for example, 547.4242) and the program must output a 64 ...
• 17
1vote
2kviews

Converting from Double to Binary

I've been having a lot of trouble figuring out this class problem. My due date is tomorrow and I still don't know how to do it. I wrote code in which the input entered by the user is converted into binary, octal, and ...
30views

Is anyone able to understand how to add these minifloat (1 bit sign, 3 bit exponent, 4 bit fraction) bit numbers?

I have a task where we add two binary numbers, but I do not understand the solution. Previously, it is so we have given binary numbers, in minifloat format. In our case, the minifloat is defined like ...
40views

Binary Floating point to decimal JavaScript function

To answer the original question that was closed due to an invalid moderation. Binary Floating point to decimal JavaScript function This function below basically does the math based on the floating ...
• 17
96views

Manually calculating IEEE-754 floating point fractions and splitting up the bits - Python

I'm trying to come up with a way to do this: Lets say the fraction portion of my IEEE-754 floating point number is 0b10110011001100110011010 I'm trying to take each bit and multiply it by a power of 2 ...
119views

I get confused because of the hidden bit in the mantissa. From what I know: subtract the two exponents, find the smaller number and shift the mantissa with the hidden bit (?) by the result of the ...
• 37
383views

IEEE-754 Floating Point Standard: Representing Numbers

Given an IEEE-754 standard floating point number with 6 bits of exponent, and 25 bits of mantissa. 1: What's the smallest non-infinite positive integer this representation CANNOT represent? My answer: ...
435views

How to convert DEC 64bit double precision floating point to IEEE-754 (DEC is not decimal)

To clarify here DEC in this case is Digital Equipment Corporation NOT decimal. What I have is a binary representation of DEC with seeeeeeeefffff....(total f's is 54) s = sign bit, e = exponent, f = ...
• 45
175views

Floating-point mantissa and exponent base 2

I'm trying to understand how to get the mantissa and the exponent in this case. Here's an example I have in my book. I have this formula: (-1)^s * (1 + M) * b^(E-e) = x, where s = 0 or 1 (the sign), M = mantissa ...
• 13
269views

Calculating smallest positive floating point number

A 16-bit floating point representation is defined like this: 1 bit for sign, 6 bits for exponent and 9 bits for significand. Floating point number must be normalized (in the form 0.1... × 2^exp). ...
351views

Convert periodic binary number to decimal

I have a simple question that is confusing me. Convert the periodic binary number (0.1011)_2 (imagine a bar over the digits after the comma) to a decimal representation. If it weren't periodic, no ...
• 1
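A short sketch of the geometric-series route, assuming the bar covers all four digits so that the block 1011 repeats forever. A repeating block of k binary digits with integer value b sums to b / (2^k - 1). `repeating_binary_fraction` is a hypothetical helper name.

```python
from fractions import Fraction


def repeating_binary_fraction(block: str) -> Fraction:
    """Exact value of 0.(block)_2, where the whole block repeats forever.

    Uses the geometric series b/2**k + b/2**(2k) + ... = b / (2**k - 1).
    """
    return Fraction(int(block, 2), 2 ** len(block) - 1)


value = repeating_binary_fraction("1011")
print(value, float(value))
```

Keeping the result as a `Fraction` avoids rounding until the final conversion to a decimal approximation.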
403views

Decimal fraction to binary with precision

Consider the following fraction: 9.8765 How do I go about converting it to precise floating point binary without losing any value? Now consider the following fraction 9.87654321 Again, how would ...
• 55
36views

Floating point multiplication in JavaScript

I tried the below JavaScript in Chrome console: // Case 1 0.11 * 10000 // 1100 0.14 * 10000 // 1400.000...2 // Both of 0.11 and 0.14 are infinite in binary number // Case 2 0.14 * 10 // 1.4000...1 0....
• 462
51views

Verify floating point representation

I want to encode -(263.125) in base 10. I encoded it and arrived at this solution : 11000011110000011100100000000000 I just want to make sure that it is correct. Thank you in advance.
• 42
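A hand encoding like this can be cross-checked mechanically. The sketch below assumes the target format is IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 fraction bits) and uses Python's standard `struct` module; `float32_fields` is an illustrative name.

```python
import struct


def float32_fields(x: float) -> str:
    """Return the binary32 encoding of x as 'sign exponent fraction'."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes
    # as an unsigned 32-bit integer to get at the raw bit pattern.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float32_fields(-263.125))
```

Printing the three fields separately makes it easier to compare against a hand-derived sign, biased exponent, and fraction.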
59views

Why are floats denormalized to deal with underflow instead of overflow?

When denormalized couldn't the exponent on, let's say single point precision floats, represent 128 (instead of -126) and the mantissa (with an added 1 at the end) just be multiplied by 2^{128}? This ...
679views

How to convert decimal (with floating point) to binary in Swift 3? (self-written code without third-party library and Foundation)

I am looking for a simple way to convert a decimal with floating point to binary with floating point in Swift 3. For example, this code converts decimal to binary without any problems. func ...
• 77
212views

Converting float to binary using IEEE754 standard

How to convert 32-bits float binary using IEEE754 standard in Elixir. Converting integer is possible by using Integer.to_string/2 and passing base 2 as the second option iex> Integer.to_string(5, ...
• 1,100
32views

Should the exponent be larger than or equal 2^52 for the 64 bit floating point to be an integer irrespective of the value of the mantissa

I'm trying to understand if the exponent should be larger than 2^52 for the 64 bit floating point to always be an integer. It seems so to me. Here is my reasoning: If mantissa has 52 bits, then the ...
• 92.8k
77views

How does rounding off a binary number work?

I am currently learning floating points and my question is: if I have a number like 0.01 and I want to round it off then I have 0.1 but if I have a number, 0.000000001, does that mean I have 0.1 as ...
• 365
47views

what is the result of (54.125) - (184)10

I am practicing for a midterm and apparently there's no answer key for it. However, I practiced and got a result but am not sure if it is correct since the solution is really long. Perform the following ...
207views

Is it possible to modify the bits of a float?

I'm wondering if there is a way to get the value of the mantissa or the value of the exponent and modify them in order to create a new float variable, for example say I want a float to have exponent ...
100views

Extracting bits and reconstructing in C

For a class project I'm trying to extract 3 bit fields from an IEEE floating point number, multiply it by 0.5, and reconstruct the number. I've gotten extraction mostly working by pushing them into ...
• 864
147views

How to convert a hexadecimal number to floating point binary by hand?

How does one convert a hexadecimal number to its floating point binary equivalent (assuming that one exists, IEEE 754) by hand? I'd prefer simplified working rules (since this is not the main focus ...
72views

Calculate range for any n-Bit long extended floating point

I am trying to imagine an 80-bit extended precision form with a 1 bit sign, 16-bit exponent,and 63-bit fraction excluding the implied 1 before the binary point. I know that to calculate the bias for ...
• 323
163views

Convert 1.0 x 2^-140 to single precision floating point

So I'm trying to understand how I would convert such a number like 1.0 x 2^-140 in single floating point precision considering it is out of range. Any tips how I would go about solving this in binary?
• 371
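A brief sketch of what happens below the normal range, assuming IEEE 754 single precision. The smallest normal value is 2^-126; smaller magnitudes are stored as subnormals with an all-zero exponent field and a value of fraction × 2^-149. `float32_raw` is an illustrative helper built on the standard `struct` module.

```python
import struct


def float32_raw(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned integer."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw


raw = float32_raw(2.0 ** -140)
exponent_field = (raw >> 23) & 0xFF   # 8 exponent bits
fraction_field = raw & 0x7FFFFF       # 23 fraction bits
print(f"{raw:032b}", exponent_field, fraction_field)
```

Since 2^-140 = 2^9 × 2^-149, the value fits exactly as a subnormal, so no rounding is involved in this particular case.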
3kviews

read float and double from binary data in C++

I need to be able to read in a float or double from binary data in C++, similarly to Python's struct.unpack function. My issue is that the data I am receiving will always be big-endian. I have dealt ...
• 18.4k
78views

How would I write this in IEEE standards?

I would like to know how to write 5/32 in IEEE754 standard. Is there a shortcut to do the fraction part? The answer is 0 10000010 00100000000000000000000. But there has to be an easier way to write 5/...
• 679
371views

fractional binary subtraction

I am having difficulty understanding why the following binary subtraction gives the result that it does. I keep getting a different answer. I am trying to compute 0.1-x such that x is 0....
135views

Floating point number representation in binary

I'm working on a problem out of Cracking The Coding Interview that asks: Given a 2-D graph with points on it, find a line which passes the most number of points. The solution is to: Draw an infinite ...
• 1,460
301views

reading float values from binary file (in after effects script)

I have a binary file containing data recorded using a C program. The data stored in the file are float values. Now I need to retrieve the float numbers from the binary file in an After Effects script. This is ...
775views

how to convert a binary string into two's complement and IEEE in C

So I have been tasked to write a program that scans in two 32-character binary strings from the user, which will always be 32 characters long. After reading the numbers in, I'm supposed to ask the ...
• 935