Fast way to add a column based on the values in one column and boundary values in another DataFrame

Posted on 2025-01-21 23:55:39

I am trying to do something like this. I have a DataFrame, df_A, with one column, "cycle", of monotonically increasing values. I have another DataFrame, df_B, with two columns, "cycle_bound" and "name". I want to create a column "name" in df_A such that for every cycle value that is less than a cycle_bound (and at or above the previous cycle_bound), "name" in df_A is filled with the corresponding "name" from df_B. An example is below; please excuse the syntax, as I am not sure how to represent this in text.

df_A['cycle'] = {0,2,3,6,8,10,35,36}
df_B['cycle_bound','name'] = {(3,one),(11,two),(40,three)}

I want to create

df_A['cycle','name'] = {(0,one),(2,one),(3,two),(6,two),(8,two),(10,two),(35,three),(36,three)}

I have done this using apply/lambda approach and calling a function that uses iterrows() over df_B, but it is still fairly slow. My df_A has about a million rows and df_B has about ten. I am trying to see if there is a faster approach, maybe a vectorization / numpy approach, but couldn't find anything online specific to this case or maybe I am unable to search well enough.

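My code looks something like this right now (I first added a lower-bound column to df_B for convenience):

def findName(cycle):
  # Linear scan over df_B for the row whose [lower, upper) range contains cycle
  for index, l_row in df_B.iterrows():
    if cycle >= l_row['cycle_lowerbound'] and cycle < l_row['cycle_upperbound']:
      return l_row['Name']

# Called once per row of df_A, so df_B is re-scanned about a million times
df_A['Name'] = df_A.apply(lambda x: findName(x['cycle']), axis=1)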

Thanks!

Comments (1)

心凉 2025-01-28 23:55:39

You want an asof merge (pd.merge_asof).

Specifically, set direction='forward' so that each cycle is matched to the next bound above it, and set allow_exact_matches=False to enforce a strict < (not <=) at the upper bound. Note that merge_asof requires both frames to be sorted on their keys, which is the case here.

import pandas as pd

df_A = pd.DataFrame({'cycle': [0,2,3,6,8,10,35,36]})
df_B = pd.DataFrame({'cycle_bound': [3, 11, 40],
                     'cycle_name': ['one', 'two', 'three']})

pd.merge_asof(df_A, df_B, 
              left_on='cycle', right_on='cycle_bound',
              direction='forward', allow_exact_matches=False)

   cycle  cycle_bound cycle_name
0      0            3        one
1      2            3        one
2      3           11        two
3      6           11        two
4      8           11        two
5     10           11        two
6     35           40      three
7     36           40      three
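
Since the question also asks about a NumPy approach, here is a minimal sketch of the same lookup using np.searchsorted. This is an alternative to the asof merge, not part of the original answer. It assumes cycle_bound is sorted in ascending order, and the choice to map cycles at or above the last bound to None is mine; the 'name' column matches the question's naming.

```python
import numpy as np
import pandas as pd

df_A = pd.DataFrame({'cycle': [0, 2, 3, 6, 8, 10, 35, 36]})
df_B = pd.DataFrame({'cycle_bound': [3, 11, 40],
                     'cycle_name': ['one', 'two', 'three']})

# Assumption: cycle_bound is sorted ascending (required by searchsorted).
bounds = df_B['cycle_bound'].to_numpy()
names = df_B['cycle_name'].to_numpy(dtype=object)

# side='right' gives, for each cycle, the index of the first bound that is
# strictly greater than it, i.e. the interval with cycle < cycle_bound.
idx = np.searchsorted(bounds, df_A['cycle'].to_numpy(), side='right')

# Cycles at or above the last bound get index len(bounds); the extra None
# entry keeps that lookup in range instead of raising an IndexError.
lookup = np.append(names, None)
df_A['name'] = lookup[idx]

print(df_A)
```

Unlike merge_asof, this does not need df_A to be sorted, because each cycle is located independently by binary search over the handful of bounds.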