I'm looking for an equivalent for (int) log2(X) that does not suffer from floating point accuracy problems.

I have found several discussions of this problem (both on Stack Overflow and elsewhere), but none satisfied all three of the following conditions:

  • Portability: Should run on multiple operating systems and architectures.
  • Efficiency: Constant time would be best.
  • Accuracy: Should not have accuracy errors.

A few solutions I have found or thought of (for the time complexities, I'm using N for the number of bits in X, not the value of X):

  • Use GCC's __builtin_clz. This solution is compiler dependent.
  • Use the bsr instruction. This solution is architecture dependent.
  • Iterate through the powers of 2. This solution is O(N).
  • Binary search through the powers of 2. This solution is O(log(N)). Also, it might present some overhead compared to a simpler solution.
  • Use C's log2 to get an initial guess, then iterate through the powers of 2 from there. I'm not sure this solves the accuracy problem, but it might. Still, going through floating point via log2 probably wouldn't beat a simpler integer-only solution.

Maybe I have overlooked some aspect of the proposed solutions, but each of them breaks one of the three conditions.

Is there a solution that will satisfy all three conditions? (Note that I'm not necessarily asking for an easy/simple solution, although that would be preferred)

  • Bit twiddling hacks to the rescue! See also the 4 sections above that one. – Kninnug, Jun 12, 2018 at 9:53
  • That's amazing! Even though it's not O(1), it's incredibly optimized. I would still like to know if there is a way to satisfy the three conditions at the same time, but this is already much better than the binary search solution I listed in the question. Thanks! Jun 13, 2018 at 6:43
  • In practice, on every mainstream architecture you'll have a compiler built-in function that does it, and that's what you're supposed to be using. Furthermore, your use of O notation is problematic: such notation is useless for understanding the performance of short loops (here we have a few dozen steps in the most naive implementation, or just a couple when it's optimized). Compilers will unroll naive loops, and processors will parallelize micro-instructions that don't have data dependencies; those act like O(1). A call to log2 will be the slowest. Dec 2, 2020 at 7:13
  • In fact, when you implement any bit-twiddling solution, the performance will be determined by the need for serializing instruction execution due to data dependencies, and it's likely that code that does more work but has fewer data dependencies will be faster than "simpler" code that has to be serialized by the CPU. All naive "textbook" approaches to estimating the complexity/cost will be basically no better than random guessing in terms of their ability to predict actual performance on actual hardware. Your idea of cost applies on Arduino, not on desktop/mobile CPUs. Dec 2, 2020 at 7:16

