Converting a 3D xarray Dataset to a DataFrame

Posted on 2025-02-02 21:42:03

I have imported an xarray dataset as shown below. I then extracted the values at the coordinates of the zones listed in a CSV file, over a time period defined by a date range. The result is 30 days of a (lon, lat) grid, with an environmental value at every coordinate.

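from xgrads import open_CtlDataset

# path, zones (the stations read from the CSV) and period (the date range) are defined earlier
ds_Snow = open_CtlDataset(path + 'file')
# note: set() removes duplicates but does not preserve the CSV order
ds_Snow = ds_Snow.sel(lat=list(set(zones['lat'])), lon=list(set(zones['lon'])),
                      time=period, method='nearest')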
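Passing plain lists for both `lat` and `lon` to `.sel` performs orthogonal (outer) indexing: every requested latitude is paired with every requested longitude. That is why a selection like the one above yields a 12 × 12 grid rather than one value per zone. If one value per zone is what is wanted, xarray's vectorized indexing provides it when both indexers are DataArrays that share a new dimension.

The following is a minimal sketch on a synthetic dataset. The grid, the `stations` table and the `station` dimension name are hypothetical stand-ins for `ds_Snow` and `zones`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds_Snow: 30 days on a 12 x 12 grid
time = pd.date_range("2000-09-01", periods=30, freq="D")
lat = np.linspace(3.40e6, 3.42e6, 12)
lon = np.linspace(6.80e5, 6.90e5, 12)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"), rng.random((30, 12, 12)).astype("float32"))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Hypothetical station table playing the role of `zones` (slightly off-grid points)
stations = pd.DataFrame({"lat": lat[:3] + 10.0, "lon": lon[:3] - 10.0})

# Orthogonal selection: 3 lats x 3 lons -> a 3 x 3 block of grid cells
grid = ds.sel(lat=stations["lat"].tolist(), lon=stations["lon"].tolist(), method="nearest")

# Pointwise (vectorized) selection: both indexers share the "station" dimension
points = ds.sel(
    lat=xr.DataArray(stations["lat"].to_numpy(), dims="station"),
    lon=xr.DataArray(stations["lon"].to_numpy(), dims="station"),
    method="nearest",
)

print(dict(grid["spre"].sizes))    # {'time': 30, 'lat': 3, 'lon': 3}
print(dict(points["spre"].sizes))  # {'time': 30, 'station': 3}
```

With pointwise selection, each station keeps its own nearest `lat`/`lon` as coordinates along `station`, so there is no 12 × 12 cross product to flatten later.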

When I inspect ds_Snow, this is what I get:

Dimensions:  (lat: 12, lon: 12, time: 30)
Coordinates:
  * time     (time) datetime64[ns] 2000-09-01 2000-09-02 ... 2000-09-30
  * lat      (lat) float32 3.414e+06 3.414e+06 3.414e+06 ... 3.414e+06 3.414e+06
  * lon      (lon) float32 6.873e+05 6.873e+05 6.873e+05 ... 6.873e+05 6.873e+05
Data variables:
    spre     (time, lat, lon) float32 dask.array<chunksize=(1, 12, 12), meta=np.ndarray>
Attributes:
    title:    SnowModel
    undef:    -9999.0
type: <class 'xarray.core.dataset.Dataset'>

I would like to turn it into a DataFrame that respects the initial dimensions (time, lat, lon), so I did this:

df_Snow = ds_Snow.to_dataframe()

But this is the resulting DataFrame:

print(df_Snow)
lat       lon        time            
3414108.0 687311.625 2000-09-01   0.0
                     2000-09-02   0.0
                     2000-09-03   0.0
                     2000-09-04   0.0
                     2000-09-05   0.0
...                               ...
                     2000-09-26   0.0
                     2000-09-27   0.0
                     2000-09-28   0.0
                     2000-09-29   0.0
                     2000-09-30   0.0

[4320 rows x 1 columns]
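This matches how `Dataset.to_dataframe` is documented to work. It returns a "long" table indexed by the Cartesian product of the dimension coordinates (a MultiIndex with one level per dimension), with one column per data variable. A single variable (`spre`) therefore gives a single column, and the row count is the product of the dimension sizes: 30 × 12 × 12 = 4320.

The following is a small sketch on a hypothetical dataset of the same shape:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with the same shape as ds_Snow (values are placeholders)
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"), np.zeros((30, 12, 12), dtype="float32"))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30, freq="D"),
        "lat": np.arange(12.0),
        "lon": np.arange(12.0),
    },
)

df = ds.to_dataframe()
print(df.shape)              # (4320, 1): one row per (time, lat, lon) combination
print(list(df.index.names))  # one index level per dimension
print(list(df.columns))      # ['spre']: one column per data variable
```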

It looks like all the data just ended up in a single column.
I tried specifying the dimension order, as some documentation explains:

df_Snow = ds_Snow.to_dataframe(dim_order = ['time', 'lat', 'lon'])
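`dim_order` only controls the order of the MultiIndex levels, and therefore the order of the rows. It does not change the long layout, so the result is still 4320 rows × 1 column.

The following is a sketch on a hypothetical dataset with the same dimensions:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with the same dimensions as ds_Snow
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"), np.zeros((30, 12, 12), dtype="float32"))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30, freq="D"),
        "lat": np.arange(12.0),
        "lon": np.arange(12.0),
    },
)

a = ds.to_dataframe(dim_order=["lat", "lon", "time"])
b = ds.to_dataframe(dim_order=["time", "lat", "lon"])

print(list(a.index.names))  # ['lat', 'lon', 'time']
print(list(b.index.names))  # ['time', 'lat', 'lon']
print(a.shape, b.shape)     # (4320, 1) (4320, 1): same long layout either way
```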

But it does not change anything, and I can't find an answer in forums or in the documentation. How can I keep the array layout in the DataFrame?
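A DataFrame is two-dimensional, so a (time, lat, lon) array cannot be kept exactly as-is: one dimension has to be folded into the other axis. One option keeps `time` as the rows and gives each (lat, lon) cell its own column, by unstacking the long Series.

The sketch below uses a hypothetical dataset and assumes the selected `lat`/`lon` values are unique. With `method='nearest'`, several requested points can snap to the same grid cell, and `unstack` then raises an error because of duplicate index entries.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with the same dimensions as ds_Snow; values 0..4319 in C order
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"),
              np.arange(30 * 12 * 12, dtype="float32").reshape(30, 12, 12))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30, freq="D"),
        "lat": np.arange(12.0),
        "lon": np.arange(12.0),
    },
)

# Long Series indexed by (time, lat, lon), then move lat/lon into the columns
wide = ds["spre"].to_series().unstack(["lat", "lon"])

print(wide.shape)  # (30, 144): one row per day, one column per (lat, lon) cell
print(wide.loc[pd.Timestamp("2000-09-01"), (0.0, 1.0)])  # 1.0 (lat=0, lon=1, first day)
```

The columns form a (lat, lon) MultiIndex, so `wide[(lat_value, lon_value)]` returns the time series of a single cell.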

EDIT: I found a solution

Instead of converting the xarray dataset, I built my DataFrame from a pd.Series for each coordinate and variable, like this:

import pandas as pd

ds_Snow = ds_Snow.sel(lat=list(set(station_list['lat_utm'])), lon=list(set(station_list['lon_utm'])),
                      time=Ind_Run_ERA5_Land, method='nearest')
time = pd.Series(ds_Snow.coords["time"].values)
lon = pd.Series(ds_Snow.coords["lon"].values)
lat = pd.Series(ds_Snow.coords["lat"].values)
# [:, 0, 0] keeps only the time series of the first (lat, lon) cell
spre = pd.Series(ds_Snow['spre'].values[:, 0, 0])
# Series of different lengths are aligned on their index; shorter columns are padded with NaN
frame = {'spre': spre, 'time': time, 'lon': lon, 'lat': lat}
df_Snow = pd.DataFrame(frame)
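Another way to get flat `time`, `lat`, `lon` and `spre` columns, while keeping every value aligned with its own coordinates, is to call `reset_index()` on the `to_dataframe()` result. This turns the MultiIndex levels into ordinary columns. The time series of one cell (what `values[:, 0, 0]` gives) is then just a row filter.

The following is a sketch on a hypothetical dataset with the same dimensions:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with the same dimensions as ds_Snow; values 0..4319 in C order
ds = xr.Dataset(
    {"spre": (("time", "lat", "lon"),
              np.arange(30 * 12 * 12, dtype="float32").reshape(30, 12, 12))},
    coords={
        "time": pd.date_range("2000-09-01", periods=30, freq="D"),
        "lat": np.arange(12.0),
        "lon": np.arange(12.0),
    },
)

# MultiIndex levels become ordinary columns; dim_order fixes their order
flat = ds.to_dataframe(dim_order=["time", "lat", "lon"]).reset_index()
print(list(flat.columns))  # ['time', 'lat', 'lon', 'spre']
print(len(flat))           # 4320

# Time series of a single cell, equivalent to ds["spre"].values[:, 0, 0]
cell = flat[(flat["lat"] == 0.0) & (flat["lon"] == 0.0)]
print(len(cell))           # 30
```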
