should '$' and '[' behave differently for pandas DataFrames? #251

Open
kevinushey opened this issue May 8, 2018 · 4 comments

@kevinushey
Collaborator

kevinushey commented May 8, 2018

E.g. in Python:

```{python}
import pandas as pd
pdf = pd.DataFrame({"pop": [1]})
print(pdf["pop"]) # accesses pop column
print(pdf.pop)    # accesses pop DataFrame method
```

However, for Python objects exposed to R, [[ first attempts to access attributes rather than columns:

> library(reticulate)
> df <- data.frame(pop = 1)
> pdf <- r_to_py(df)
> pdf$pop
<bound method DataFrame.pop of    pop
0  1.0>
> pdf[["pop"]]
<bound method DataFrame.pop of    pop
0  1.0>

Should [[ prefer accessing items rather than attributes for DataFrames?

I believe a similar question exists for Python dictionaries, and other objects implementing __getitem__ in general.
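The dictionary case can be reproduced without pandas. A minimal Python sketch (variable names are illustrative only) showing that attribute access and item access resolve the same name to different things:

```python
import operator

# A dict whose key collides with a built-in dict method name.
d = {"pop": 1}

# Attribute access (Python's `.`) resolves to the bound method dict.pop.
attr = getattr(d, "pop")

# Item access (Python's `[...]`, i.e. __getitem__) resolves to the stored value.
item = operator.getitem(d, "pop")

print(callable(attr))  # True
print(item)            # 1
```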

@kevinushey
Collaborator Author

After a chat with @jjallaire, we agree that we should try to migrate the semantics such that:

  • $ is analogous to Python's . operator; that is, it attempts to retrieve attributes (typically methods) on the object;
  • [[ and [ are analogous to Python's [ operator; that is, they are used for accessing items (__getitem__).

We plan to issue a warning if a use of the $ operator ends up resolving an item rather than an attribute, so that existing user code has a path for migration.
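The proposed semantics could be sketched roughly as follows in Python. This is not reticulate's implementation: `dollar` and `double_bracket` are hypothetical helper names standing in for R's `$` and `[[`, and the choice of `DeprecationWarning` is an assumption made for illustration.

```python
import warnings


def dollar(obj, name):
    """Hypothetical stand-in for R's `$`: attribute access first."""
    try:
        return getattr(obj, name)
    except AttributeError:
        pass
    # Legacy fallback: resolve an item, but warn so callers can migrate.
    value = obj[name]
    warnings.warn(
        f"'{name}' resolved to an item, not an attribute; use [[ instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return value


def double_bracket(obj, key):
    """Hypothetical stand-in for R's `[[`: item access only (__getitem__)."""
    return obj[key]


d = {"pop": 1, "abc": 2}

method = dollar(d, "pop")        # attribute wins: the dict.pop method
item = double_bracket(d, "pop")  # item access: the stored value

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = dollar(d, "abc")  # no such attribute, falls back to the item
```

Here `dollar(d, "abc")` still returns the stored value, but the recorded warning tells the caller to switch to item access.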

@flying-sheep
Contributor

flying-sheep commented Feb 4, 2019

I think another possibility would have been to make $ and [[ equivalent to getattr and [ equivalent to __getitem__. That would have the advantage that it’s easy to get attributes with calculated names, e.g.

for (attr_name in attrs) print(py_obj[[attr_name]])

However, I think it would be quite confusing to let [[ and [ do completely different things, and it’s still possible to get the aforementioned functionality via the more obscure

for (attr_name in attrs) print(`$`(py_obj, attr_name))
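For comparison, the Python-side equivalent of looking up attributes by computed name is `getattr`. A minimal sketch, where `py_obj` and `attrs` are made-up example values:

```python
from types import SimpleNamespace

# Example object with two attributes; names are illustrative only.
py_obj = SimpleNamespace(width=3, height=4)
attrs = ["width", "height"]

# Attribute names computed at runtime are looked up with getattr.
values = [getattr(py_obj, attr_name) for attr_name in attrs]
print(values)  # [3, 4]
```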

@dfalbel
Member

dfalbel commented Aug 15, 2023

With #1431 the ambiguity is solved with:

py_run_string('
import pandas as pd
pdf = pd.DataFrame({"pop": [1]})
print(pdf["pop"]) # accesses pop column
print(pdf.pop)    # accesses pop DataFrame method
')

py$pdf$pop
py$pdf@pop

@t-kalinowski
Member

After taking a closer look, I see that $ already prefers accessing attributes.

The benefit of adding a @ method would be that it would remove the potential for silent errors, where a call like x@foo would raise an attribute error, while x$foo would fall back silently to getitem().

  library(reticulate)
  py_run_string("import pandas as pd")
  pdf <- py_eval('pd.DataFrame({"pop": [1], "abc": [2]})', convert = FALSE)
  pdf$pop
#> <bound method DataFrame.pop of    pop  abc
#> 0    1    2>
  pdf@pop
#> <bound method DataFrame.pop of    pop  abc
#> 0    1    2>
  pdf["pop"]
#> 0    1
#> Name: pop, dtype: int64
  pdf[["pop"]]
#> 0    1
#> Name: pop, dtype: int64
  
  pdf$abc
#> 0    2
#> Name: abc, dtype: int64
  pdf@abc
#> 0    2
#> Name: abc, dtype: int64
  pdf["abc"]
#> 0    2
#> Name: abc, dtype: int64
  pdf[["abc"]]
#> 0    2
#> Name: abc, dtype: int64

Created on 2023-08-15 with reprex v2.0.2
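The difference between a strict accessor and a silently falling-back one can be sketched in plain Python. The helpers `strict_attr` and `lenient_attr` are hypothetical stand-ins for `@` and `$`, not reticulate's code, and a dict is used instead of a DataFrame so the example needs no third-party packages.

```python
def strict_attr(obj, name):
    """Hypothetical stand-in for `@`: attributes only, no fallback."""
    return getattr(obj, name)


def lenient_attr(obj, name):
    """Hypothetical stand-in for `$`: fall back to an item if no attribute."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return obj[name]


d = {"abc": 2}

value = lenient_attr(d, "abc")  # silently resolves the item

try:
    strict_attr(d, "abc")
    raised = False
except AttributeError:
    raised = True               # a dict has no attribute named "abc"
```

Note that a pandas DataFrame also exposes its columns as attributes when the name does not clash with an existing attribute, which is why `pdf@abc` in the reprex above returns the column rather than raising.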
