Pandas - Find the index in the Dataset name, match it to the list index, and append the list value as a new column

Posted 2025-02-10 06:14:41

I have a Pandas dataframe like so:

Dataset                             Volume_ft3
Sonar_Raster_0.tif                  2055
Sonar_Raster_1.tif                  6784
Sonar_Raster_FocalMean_5x5_0.tif    2045
Sonar_Raster_FocalMean_5x5_1.tif    6752

I want to append a new column called "Sonar_Points" that matches values from a list to the dataset based on the unique numerical identifier in the Dataset name.

My list is [5525, 4374]. I need to find the index number in the Dataset name, match it to the list index, and output that list value in a new column, so that this is the resulting dataframe:

Dataset                             Volume_ft3    Sonar_Points
Sonar_Raster_0.tif                  2055          5525
Sonar_Raster_1.tif                  6784          4374
Sonar_Raster_FocalMean_5x5_0.tif    2045          5525
Sonar_Raster_FocalMean_5x5_1.tif    6752          4374

I've tried the code below, but it doesn't account for datasets that share the same index.

df = df.append(pd.DataFrame(Sonar_pts_List, columns=['Sonar_Points']),ignore_index=False)
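
For anyone reproducing the problem, here is a minimal sketch of the setup. `DataFrame.append` was removed in pandas 2.0, so the sketch assumes `pd.concat` as the equivalent row-wise operation. It shows that the attempt adds two extra rows rather than filling a column on the existing four.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "Dataset": [
        "Sonar_Raster_0.tif",
        "Sonar_Raster_1.tif",
        "Sonar_Raster_FocalMean_5x5_0.tif",
        "Sonar_Raster_FocalMean_5x5_1.tif",
    ],
    "Volume_ft3": [2055, 6784, 2045, 6752],
})
Sonar_pts_List = [5525, 4374]

# Row-wise concatenation (what append did): the list values land in two
# new rows instead of being matched to the existing four rows.
attempt = pd.concat(
    [df, pd.DataFrame(Sonar_pts_List, columns=["Sonar_Points"])],
    ignore_index=False,
)
print(attempt)
```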


Comments (3)

油饼 2025-02-17 06:14:41

You could extract the dataset identifier into a separate column and then use that to merge the data with the list of Sonar Points values:

l = [5525, 4374]  # Sonar Points values; list position matches the dataset identifier
df['spi'] = df['Dataset'].str.extract(r'_(\d+)\.', expand=False).astype(int)
df = df.merge(pd.DataFrame(l, columns=['Sonar_Points']), left_on='spi', right_index=True).drop('spi', axis=1).sort_index()

Output:

                            Dataset  Volume_ft3  Sonar_Points
0                Sonar_Raster_0.tif        2055          5525
1                Sonar_Raster_1.tif        6784          4374
2  Sonar_Raster_FocalMean_5x5_0.tif        2045          5525
3  Sonar_Raster_FocalMean_5x5_1.tif        6752          4374
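
A self-contained sketch of the same merge idea, for readers who want to run it end to end. The frame is rebuilt from the question, and `lookup` is a name introduced here for illustration. It also assumes `how="left"`, which is a variation on the answer: it keeps every row of `df` in its original order, and an identifier with no list entry would get NaN instead of being dropped.

```python
import pandas as pd

# Frame and list rebuilt from the question.
df = pd.DataFrame({
    "Dataset": [
        "Sonar_Raster_0.tif",
        "Sonar_Raster_1.tif",
        "Sonar_Raster_FocalMean_5x5_0.tif",
        "Sonar_Raster_FocalMean_5x5_1.tif",
    ],
    "Volume_ft3": [2055, 6784, 2045, 6752],
})
l = [5525, 4374]

# Pull the number that sits right before the extension, e.g. "_1." -> 1.
df["spi"] = df["Dataset"].str.extract(r"_(\d+)\.", expand=False).astype(int)

# Lookup table whose index (0, 1, ...) is the position in the list.
lookup = pd.DataFrame(l, columns=["Sonar_Points"])

# Left join on the extracted identifier, then drop the helper column.
result = df.merge(lookup, left_on="spi", right_index=True, how="left").drop(columns="spi")
print(result)
```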
我不在是我 2025-02-17 06:14:41

One way is to use pandas.Series.str.extract.

Note: this raises an IndexError if an identifier in a Dataset name is greater than or equal to the length of the list.

l = [5525, 4374]

# Extract the identifier before the extension and use it as a list index.
df["Sonar_Points"] = [l[i] for i in
                      df["Dataset"].str.extract(r"_(\d+)\.", expand=False).astype(int)]
print(df)

Output:

                            Dataset  Volume_ft3  Sonar_Points
0                Sonar_Raster_0.tif        2055          5525
1                Sonar_Raster_1.tif        6784          4374
2  Sonar_Raster_FocalMean_5x5_0.tif        2045          5525
3  Sonar_Raster_FocalMean_5x5_1.tif        6752          4374
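
If out-of-range identifiers are possible, one option is to look values up with `Series.map` and a dict, which yields NaN for missing keys instead of raising. This is a sketch under assumptions: the fifth row (`Sonar_Raster_2.tif`) is hypothetical and added only to show the missing-key case, and `idx` is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "Dataset": [
        "Sonar_Raster_0.tif",
        "Sonar_Raster_1.tif",
        "Sonar_Raster_FocalMean_5x5_0.tif",
        "Sonar_Raster_FocalMean_5x5_1.tif",
        "Sonar_Raster_2.tif",  # hypothetical row with no matching list entry
    ],
    "Volume_ft3": [2055, 6784, 2045, 6752, 1000],
})
l = [5525, 4374]

# Identifier before the extension, as integers.
idx = df["Dataset"].str.extract(r"_(\d+)\.", expand=False).astype(int)

# dict(enumerate(l)) is {0: 5525, 1: 4374}; keys not in the dict map to NaN.
df["Sonar_Points"] = idx.map(dict(enumerate(l)))
print(df)
```

Because of the NaN, the new column is stored as float in this case.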
盛夏尉蓝 2025-02-17 06:14:41

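You can solve this with pd.concat, but only when the two dataframes have the same length, i.e. the list holds one value per row. With the two-element list from the question and four rows, the unmatched rows get NaN.

df = pd.concat([df, pd.DataFrame(Sonar_pts_List, columns=['Sonar_Points'])], axis=1)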

You can also assign a list directly as a new column, provided it has one value per row:

df['Sonar_Points'] = Sonar_pts_List

If an error occurs from the above, you can first clone a small column (one with cheap values, so it does not take up too many resources) and then overwrite it with the new values. The list still needs one value per row.

An example:

df['Sonar_Points'] = df['Volume_ft3']
df['Sonar_Points'] = Sonar_pts_List
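
A quick check of how these two approaches behave with the question's data, assuming the four-row frame and the two-element list. `error_message` and `combined` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Dataset": [
        "Sonar_Raster_0.tif",
        "Sonar_Raster_1.tif",
        "Sonar_Raster_FocalMean_5x5_0.tif",
        "Sonar_Raster_FocalMean_5x5_1.tif",
    ],
    "Volume_ft3": [2055, 6784, 2045, 6752],
})
Sonar_pts_List = [5525, 4374]

# Direct assignment: 2 values for 4 rows raises ValueError.
error_message = None
try:
    df["Sonar_Points"] = Sonar_pts_List
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Column-wise concat aligns on the index, so rows 2 and 3 get NaN.
combined = pd.concat(
    [df[["Dataset", "Volume_ft3"]],
     pd.DataFrame(Sonar_pts_List, columns=["Sonar_Points"])],
    axis=1,
)
print(combined)
```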