[Python][Docs] Document behavior of to_pandas with flat and nested timezone arrays #41643

Open
amoeba opened this issue May 13, 2024 · 1 comment · May be fixed by #44437

amoeba commented May 13, 2024

Describe the enhancement requested

In #41162 it was reported that PyArrow's to_pandas method silently drops timezone information from nested Timestamp arrays. For example,

import pandas as pd
import pyarrow as pa

myts = pd.Timestamp('2024-01-01 12:00:00+0000', tz='Europe/Paris')

# unnested, we get a timezone-aware result
pa.Array.from_pandas([myts]).to_pandas()[0]
# => Timestamp('2024-01-01 13:00:00+0100', tz='Europe/Paris')

# nested, we get a timezone-naive result
pa.Array.from_pandas([[myts]]).to_pandas()[0][0]
# => numpy.datetime64('2024-01-01T12:00:00.000000')

The reason for this is explained in the comments of #41162, and the upshot is that we may not be able to change the behavior at the moment. Therefore, I think it would be good to at least document the current behavior, including what workarounds may exist.

Component(s)

Documentation, Python
