I am working on a personal project that involves some linear algebra. The first step is to replicate (translate) an R script (64-bit R) in Python. The example described here reproduces the problem I am facing.

From the searching I have done, the problem seems to be related to how floating-point numbers are handled at high precision.

Here are the two pieces of code:

R code:

library(base)

# load the data, subset the columns of interest and convert the data frame to a matrix
df <- read.csv('hansen99.csv') 
arr <- data.matrix(df[c(4:20)])

# the cross-product t(arr) %*% arr is a square matrix; compute its inverse
A <- t(arr) %*% arr
A_inv <- solve(A)

# print with 100 decimal places
sprintf("%.100f",A_inv[1,1]) 
# '0.0023559636604639148764472889041599046322517096996307373046875000000000000000000000000000000000000000'

Python code:

import numpy as np
import numpy.linalg as npla
import pandas as pd

# load the data, subset the columns of interest and convert the data frame to an array
df = pd.read_csv('hansen99.csv')
arr = df.iloc[:, 3:21].values

# the cross-product np.transpose(arr) @ arr is a square matrix; compute its inverse
A = np.transpose(arr) @ arr
A_inv = npla.inv(A)

# print with 100 decimal places
"{:.100f}".format(A_inv[0,0])
# '0.0023559636604639157438090268925634518382139503955841064453125000000000000000000000000000000000000000'

A_inv is therefore the inverse of the matrix A built from the data in the CSV file. If I take the first element of this matrix and print it with 100 decimal places, I obtain:

R      : '0.0023559636604639*1*48764472889041599046322517096996307373046875000000000000000000000000000000000000000'

Python : '0.0023559636604639*1*57438090268925634518382139503955841064453125000000000000000000000000000000000000000'

We can see that the first 17 decimal places (up to the digit between the stars) are identical, but from the 18th decimal place onward the displayed digits differ between R and Python.
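
To put the size of this discrepancy into perspective, here is a small check (just a sketch; the two literals below are the printed values pasted back in, and I am assuming that "%.100f" prints the exact decimal expansion of the underlying 64-bit double, which is what CPython does):

import numpy as np

# the R and Python results from above, pasted back in as float literals
r_val  = 0.0023559636604639148764472889041599046322517096996307373046875
py_val = 0.0023559636604639157438090268925634518382139503955841064453125

# np.spacing(x) is the gap between x and the next representable double (one ULP)
print(abs(py_val - r_val) / np.spacing(r_val))
# 2.0 -> the two results are only two units in the last place (ULPs) apart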

I've done some research and it seems to be related to the decimal precision of the floating-point types involved.

In Python, I can query this for the float64 type like so (I don't know how to get the equivalent in R):

np.finfo(np.float64)
# finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)
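
The individual limits can also be read attribute by attribute; they are consistent with the two printouts above agreeing only in roughly their first 16 significant digits (a quick sketch):

import numpy as np

info = np.finfo(np.float64)
print(info.eps)        # 2.220446049250313e-16, the gap between 1.0 and the next representable double
print(info.precision)  # 15, the approximate number of reliable decimal digits
print(info.nmant)      # 52 stored mantissa bits (53 significant bits, about 15.95 decimal digits)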

Notes:

This difference does not seem to come from a difference between the respective functions of the two languages: I called the R functions from Python and the Python functions from R to make sure.
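
One way to do that kind of cross-check from the Python side is via the rpy2 package; this is only a sketch, under the assumption that rpy2 is installed, and the 2x2 matrix is just a stand-in for the real t(arr) %*% arr:

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects import numpy2ri

numpy2ri.activate()            # let rpy2 convert NumPy arrays to R matrices automatically
r_solve = ro.r['solve']        # handle to R's solve() function

A = np.array([[4.0, 2.0], [2.0, 3.0]])     # stand-in for the real cross-product matrix
A_inv_r = np.asarray(r_solve(A))           # inverse as computed by R
A_inv_np = np.linalg.inv(A)                # inverse as computed by NumPy

print(np.max(np.abs(A_inv_r - A_inv_np)))  # any difference should be at the level of the last bits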

Data:

If you are interested in the data, it is available at https://github.com/ceteuf/PSTR/tree/main/data/hansen99.csv

Questions:

1) What do you think is the reason for this difference? (I would appreciate clarification, and possibly pointers to good resources to learn more.)

2) Is it possible, by changing a configuration or anything else, to reproduce exactly the same result and behaviour as R?

2b) Or am I completely on the wrong track? If so, do you have any leads?

Comments:

  • In R, you can find resolution/precision information in .Machine.
    – r2evans
    Apr 23, 2021 at 15:35
  • Hello, I know this method but I can't see the equivalent resolution item as in Python.
    – ce.teuf
    Apr 23, 2021 at 15:39
  • .Machine$double.xmax ?
    – KU99
    Apr 23, 2021 at 16:01
  • @r2evans I have edited the misleading "not found for R"
    – ce.teuf
    Apr 23, 2021 at 16:23
  • They should both be using 64-bit IEEE floats. Whatever the problem is, I doubt that it is one of precision. It could be simply one of display. Without using the decimal module in Python, something like "%.100f" is just pointless. IEEE floats don't have that much precision in either R or Python.
    Apr 23, 2021 at 16:29
