Converting a 3D xarray Dataset to a DataFrame

Posted 2025-02-02 21:42:03

I have imported an xarray Dataset as shown below, and selected the values at the coordinates defined by zones from a CSV file, over a time period defined by a date range (30 days on a (lon, lat) grid, with an environmental value at every coordinate).

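from xgrads import open_CtlDataset
# Open the GrADS dataset, then keep the grid cells nearest to the zone
# coordinates and the dates of the requested period
ds_Snow = open_CtlDataset(path + 'file')
ds_Snow = ds_Snow.sel(lat=list(set(zones['lat'])), lon=list(set(zones['lon'])),
                      time=period, method='nearest')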
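Passing plain lists of latitudes and longitudes to sel, as above, combines every listed lat with every listed lon (orthogonal indexing), which is why the result is a full 12 × 12 grid. If the intent was one value per zone row, a minimal sketch on made-up data: wrapping the coordinates in xr.DataArray indexers that share a new dimension (named zone here, a hypothetical name) makes sel pick one nearest cell per row instead.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for ds_Snow: 30 days of "spre" on a 5 x 5 (lat, lon) grid.
# All coordinate values below are made up for illustration.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"), rng.random((30, 5, 5)).astype("float32"))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30),
        "lat": np.arange(5.0),
        "lon": np.arange(10.0, 15.0),
    },
)
# Hypothetical zones table: one (lat, lon) pair per row
zones = pd.DataFrame({"lat": [0.1, 2.2, 3.9], "lon": [10.2, 11.8, 14.1]})

# Plain lists: every listed lat is combined with every listed lon (3 x 3 cells)
grid = ds.sel(lat=list(zones["lat"]), lon=list(zones["lon"]), method="nearest")
print(dict(grid.sizes))    # time: 30, lat: 3, lon: 3

# DataArray indexers sharing the "zone" dim: one nearest cell per zones row
points = ds.sel(
    lat=xr.DataArray(zones["lat"].to_numpy(), dims="zone"),
    lon=xr.DataArray(zones["lon"].to_numpy(), dims="zone"),
    method="nearest",
)
print(dict(points.sizes))  # time: 30, zone: 3
print(points["lat"].values, points["lon"].values)  # nearest grid coordinates
```

With pointwise selection, lat and lon stay attached to each zone as coordinates, so a later conversion to a DataFrame keeps track of which cell each value came from.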

When I inspect ds_Snow, this is what I get:

Dimensions:  (lat: 12, lon: 12, time: 30)
Coordinates:
  * time     (time) datetime64[ns] 2000-09-01 2000-09-02 ... 2000-09-30
  * lat      (lat) float32 3.414e+06 3.414e+06 3.414e+06 ... 3.414e+06 3.414e+06
  * lon      (lon) float32 6.873e+05 6.873e+05 6.873e+05 ... 6.873e+05 6.873e+05
Data variables:
    spre     (time, lat, lon) float32 dask.array<chunksize=(1, 12, 12), meta=np.ndarray>
Attributes:
    title:    SnowModel
    undef:    -9999.0
type : <class 'xarray.core.dataset.Dataset'>

I would like to turn it into a DataFrame that keeps the initial dimensions (time, lat, lon).
So I did this:

df_Snow = ds_Snow.to_dataframe()

But this is the resulting DataFrame:

print(df_Snow)
lat       lon        time            
3414108.0 687311.625 2000-09-01   0.0
                     2000-09-02   0.0
                     2000-09-03   0.0
                     2000-09-04   0.0
                     2000-09-05   0.0
...                               ...
                     2000-09-26   0.0
                     2000-09-27   0.0
                     2000-09-28   0.0
                     2000-09-29   0.0
                     2000-09-30   0.0

[4320 rows x 1 columns]

It looks like all the data just got put in a single column.
I have tried specifying the dimension order, as some documentation explained:

df_Snow = ds_Snow.to_dataframe(dim_order=['time', 'lat', 'lon'])

But it does not change anything, and I can't find an answer in forums or the documentation. How can I keep the array layout in the DataFrame?
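For reference, dim_order only sets the order of the MultiIndex levels (the last dimension listed varies fastest down the rows); the frame stays in long format with one row per (time, lat, lon) combination, here 30 × 12 × 12 = 4320 rows. A minimal sketch on a small made-up grid: one way to get closer to the array layout is to unstack the lat and lon levels into columns, giving one row per day and one column per grid cell.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for ds_Snow (made-up values): spre on a (time, lat, lon) grid
times = pd.date_range("2000-09-01", periods=30)
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"),
              np.arange(30 * 2 * 3, dtype="float32").reshape(30, 2, 3))},
    coords={"time": times, "lat": [10.0, 20.0], "lon": [1.0, 2.0, 3.0]},
)

# dim_order only changes the order of the MultiIndex levels; the frame stays long
long_df = ds.to_dataframe(dim_order=["time", "lat", "lon"])
print(list(long_df.index.names))  # ['time', 'lat', 'lon']
print(long_df.shape)              # (180, 1): 30 * 2 * 3 rows, one column per variable

# Moving lat and lon into the columns gives one row per day and
# one column per (lat, lon) grid cell: the 3-D array flattened to 2-D
wide_df = long_df["spre"].unstack(["lat", "lon"])
print(wide_df.shape)              # (30, 6)
```

A DataFrame is two-dimensional, so some flattening of a 3-D array is unavoidable; unstack only chooses which dimensions go to the rows and which to the columns.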

EDIT: I found a solution.

Instead of converting the Dataset directly, I built my DataFrame from a pd.Series for each coordinate and for the spre variable, like this:

import pandas as pd
ds_Snow = ds_Snow.sel(lat=list(set(station_list['lat_utm'])), lon=list(set(station_list['lon_utm'])),
                      time=Ind_Run_ERA5_Land, method='nearest')
time = pd.Series(ds_Snow.coords["time"].values)
lon = pd.Series(ds_Snow.coords["lon"].values)
lat = pd.Series(ds_Snow.coords["lat"].values)
# [:, 0, 0] keeps only the time series of the first (lat, lon) grid cell
spre = pd.Series(ds_Snow['spre'].values[:, 0, 0])
# Series of different lengths are aligned on their index; shorter ones are padded with NaN
frame = {'spre': spre, 'time': time, 'lon': lon, 'lat': lat}
df_Snow = pd.DataFrame(frame)
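A minimal sketch on made-up data, offered as an alternative rather than the author's method: isel(lat=0, lon=0) selects the same cell as values[:, 0, 0], and to_dataframe().reset_index() writes that cell's coordinates next to each value, so every row records where it comes from. The same call without isel keeps every grid cell.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for ds_Snow (made-up coordinates and values)
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"),
              np.arange(30 * 2 * 3, dtype="float32").reshape(30, 2, 3))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30),
        "lat": [10.0, 20.0],
        "lon": [1.0, 2.0, 3.0],
    },
)

# Same cell as values[:, 0, 0]; the scalar lat/lon coordinates left by isel
# become constant columns, so every row says which cell it belongs to
one_cell = ds.isel(lat=0, lon=0).to_dataframe().reset_index()
print(one_cell.shape)   # (30, 4): time, spre, lat, lon (column order may vary)

# Every cell: index levels become columns, one row per (time, lat, lon)
all_cells = ds.to_dataframe().reset_index()
print(all_cells.shape)  # (180, 4)
```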

Comments (1)

沉鱼一梦 2025-02-09 21:42:03


This is the expected behaviour. From the docs:

The DataFrame is indexed by the Cartesian product of index coordinates (in the form of a pandas.MultiIndex). Other coordinates are included as columns in the DataFrame.

There is only one variable, spre, in the dataset. The other properties, the 'coordinates', have become the index. Since there were several coordinates (lat, lon, and time), the DataFrame has a hierarchical MultiIndex.
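To illustrate the second sentence of the quoted docs with made-up data: below, spre is stored per zone (a hypothetical dimension name), and lat and lon are coordinates along zone but not index coordinates. A minimal sketch, assuming that layout: to_dataframe builds the index from time and zone (zone has no coordinate of its own, so xarray falls back to a default integer index for it) and puts lat and lon in ordinary columns next to spre.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data: spre at three points, with lat/lon stored along a "zone" dim
ds = xr.Dataset(
    {"spre": (("time", "zone"), np.zeros((30, 3), dtype="float32"))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30),
        "lat": ("zone", [3414108.0, 3415108.0, 3416108.0]),
        "lon": ("zone", [687311.625, 688311.625, 689311.625]),
    },
)

df = ds.to_dataframe(dim_order=["time", "zone"])
print(list(df.index.names))  # ['time', 'zone']: one level per dimension
print(sorted(df.columns))    # ['lat', 'lon', 'spre']: non-index coords are columns
print(df.shape)              # (90, 3)
```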

You can either read the index data with tools like get_level_values or, if you want to change how the DataFrame is indexed, use reset_index().
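A minimal sketch of both options on a small frame shaped like df_Snow (made-up coordinate values; the level order lat, lon, time follows the printout in the question):

```python
import pandas as pd

# Frame shaped like df_Snow: MultiIndex (lat, lon, time) and a single spre column
index = pd.MultiIndex.from_product(
    [[3414108.0, 3415108.0], [687311.625], pd.date_range("2000-09-01", periods=3)],
    names=["lat", "lon", "time"],
)
df_Snow = pd.DataFrame({"spre": 0.0}, index=index)

# get_level_values reads one index level, repeated once per row
lat_values = df_Snow.index.get_level_values("lat")
print(lat_values.unique().tolist())  # [3414108.0, 3415108.0]

# reset_index turns every index level into an ordinary column
flat = df_Snow.reset_index()
print(flat.columns.tolist())  # ['lat', 'lon', 'time', 'spre']
print(flat.shape)             # (6, 4)
```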
