Attribute changes with a variable number of time steps

Posted on 2025-02-04 20:45:39

I would like to simulate individual changes in growth and mortality for a variable number of days. My dataframe is formatted as follows...

    import pandas as pd

    data = {'unique_id':  ['2', '4', '5', '13'],
            'length': ['27.7', '30.2', '25.4', '29.1'],
            'no_fish': ['3195', '1894', '8', '2774'],
            'days_left': ['253', '253', '254', '256'],
            'growth': ['0.3898', '0.3414', '0.4080', '0.3839']
           }

    df = pd.DataFrame(data)

    print(df)

      unique_id length no_fish days_left  growth
    0         2   27.7    3195       253  0.3898
    1         4   30.2    1894       253  0.3414
    2         5   25.4       8       254  0.4080
    3        13   29.1    2774       256  0.3839

Ideally, I would like the initial length (i.e., length) to increase by the daily growth rate (i.e., growth) for each of the days remaining in the year (i.e., days_left).

    df['final'] = df['length'] + (df['days_left'] * df['growth'])
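
For reference, a minimal sketch of that one-step update on the sample data. It assumes the columns are first converted to numbers, because the sample frame above stores every value as a string and string columns cannot be multiplied together. The conversion step is an addition for illustration and is not part of the original post.

```python
import pandas as pd

# Same sample values as above, stored as strings as in the post.
data = {'unique_id': ['2', '4', '5', '13'],
        'length': ['27.7', '30.2', '25.4', '29.1'],
        'no_fish': ['3195', '1894', '8', '2774'],
        'days_left': ['253', '253', '254', '256'],
        'growth': ['0.3898', '0.3414', '0.4080', '0.3839']}
df = pd.DataFrame(data)

# Assumption: convert the numeric strings to floats before doing arithmetic.
df = df.astype(float)

# Constant daily growth means the final length has a closed form.
df['final'] = df['length'] + (df['days_left'] * df['growth'])
print(df[['unique_id', 'length', 'final']])
```

The closed form only covers length. The fish count still has to be stepped day by day, because the mortality rate depends on the length reached on each day.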

However, I would also like to update the number of fish that each individual represents (i.e., no_fish) on a daily basis using a size-specific equation. I'm fairly new to Python, so I initially thought of using a for-loop (I'm not sure whether there is another, more efficient way). My code is as follows:

    import math
    import time

    # the sample values above are strings, so convert them to numbers first
    df = df.astype(float)

    # keep track of run time - START
    start_time = time.perf_counter()

    df['z'] = 0.0
    for indx in range(len(df)):
        count = 1
        while count <= int(df.loc[indx, 'days_left']):

            # (1) update individual length
            df.loc[indx, 'length'] = df.loc[indx, 'length'] + df.loc[indx, 'growth']

            # (2) estimate daily size-specific mortality
            #     (the < 15.0 case is tested before <= 50.0 so that it can be reached)
            if df.loc[indx, 'length'] > 50.0:
                df.loc[indx, 'z'] = 0.01
            elif df.loc[indx, 'length'] < 15.0:
                df.loc[indx, 'z'] = 0.728*math.exp(-0.1892*df.loc[indx, 'length'])
            else:
                df.loc[indx, 'z'] = 0.052857-((0.03/35)*df.loc[indx, 'length'])

            if df.loc[indx, 'no_fish'] < 1.0:
                df.loc[indx, 'no_fish'] = 0.0
            else:
                df.loc[indx, 'no_fish'] = df.loc[indx, 'no_fish']*math.exp(-(df.loc[indx, 'z']))

            # (3) move on to the next day of the forecast
            count = count + 1

    # keep track of run time - END
    total_elapsed_time = round(time.perf_counter() - start_time, 2)
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))

The above code now works correctly, but it is still far too inefficient to run for 40,000 individuals, each for 200+ days.

I would really appreciate any advice on how to modify the above code to make it more efficient and Pythonic.

Thanks


云巢 2025-02-11 20:45:39

Another option that was suggested to me is to use the pd.DataFrame.apply function. This dramatically reduced the overall run time and could be useful to someone else in the future.

    import time

    import numpy as np

    ### === RUN SIMULATION === ###
    start_time = time.perf_counter()          # keep track of run time -- START

    #-------------------------------------------------------------------------#
    def function_to_apply(row):
        # `row` is one row of df, passed in as a Series by apply(axis=1);
        # df is assumed to hold numeric columns
        row['z_instantMort'] = 0.0

        for _ in range(int(row['days_left'])):

            # (1) update individual length
            row['length'] = row['length'] + row['growth']

            # (2) estimate daily size-specific mortality
            #     (the < 15.0 case is tested before <= 50.0 so that it can be reached)
            if row['length'] > 50.0:
                row['z_instantMort'] = 0.01
            elif row['length'] < 15.0:
                row['z_instantMort'] = 0.728*np.exp(-0.1892*row['length'])
            else:
                row['z_instantMort'] = 0.052857-((0.03/35)*row['length'])

            # (3) update the number of fish once it rounds to at least one whole fish
            whole_fish = round(row['no_fish'], 0)
            if whole_fish < 1.0:
                row['no_fish'] = 0.0
            else:
                row['no_fish'] = row['no_fish']*np.exp(-(row['z_instantMort']))

        return row
    #-------------------------------------------------------------------------#

    sim_results = df.apply(function_to_apply, axis=1)

    total_elapsed_time = round(time.perf_counter() - start_time, 2)       # END
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
    print(sim_results)
    ### ====================== ###

The output is:

    Forecast iteration completed in 0.05 seconds
       unique_id    length     no_fish  days_left  growth  z_instantMort
    0        2.0  126.3194  148.729190      253.0  0.3898           0.01
    1        4.0  116.5742   93.018465      253.0  0.3414           0.01
    2        5.0  129.0320    0.000000      254.0  0.4080           0.01
    3       13.0  127.3784  132.864757      256.0  0.3839           0.01
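
The size-specific mortality rule used inside the loop can also be factored into a small helper that tests the narrow `< 15.0` range before the broader `<= 50.0` range, so that every branch can be reached. This is only a sketch: `daily_mortality` is a hypothetical name, and treating lengths below 15.0 as the intended range for the exponential form is an assumption read off the thresholds in the code.

```python
import math


def daily_mortality(length):
    """Instantaneous daily mortality rate z for a fish of the given length."""
    if length > 50.0:
        # large fish: constant rate
        return 0.01
    if length < 15.0:
        # small fish (assumed intent): exponential decline with length
        return 0.728 * math.exp(-0.1892 * length)
    # lengths from 15.0 to 50.0: linear decline with length
    return 0.052857 - (0.03 / 35) * length


for length in (10.0, 30.0, 60.0):
    print(length, daily_mortality(length))
```

For the sample data the choice of branch order makes no difference, since every fish starts above 25 and only grows. It would matter for fish shorter than 15.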


Spring初心 2025-02-11 20:45:39

As I said in my comment, a preferable alternative to for loops in this setting is using vector operations. For instance, running your code:

    import pandas as pd
    import time
    import math
    import numpy as np

    data = {'unique_id':  [2, 4, 5, 13],
            'length': [27.7, 30.2, 25.4, 29.1],
            'no_fish': [3195, 1894, 8, 2774],
            'days_left': [253, 253, 254, 256],
            'growth': [0.3898, 0.3414, 0.4080, 0.3839]
           }

    df = pd.DataFrame(data)

    print(df)

    # fractional fish counts need a float column
    df['no_fish'] = df['no_fish'].astype(float)

    # keep track of run time - START
    start_time = time.perf_counter()

    df['z'] = 0.0
    for indx in range(len(df)):
        count = 1
        while count <= int(df.loc[indx, 'days_left']):

            # (1) update individual length
            df.loc[indx, 'length'] = df.loc[indx, 'length'] + df.loc[indx, 'growth']

            # (2) estimate daily size-specific mortality
            #     (the < 15.0 case is tested before <= 50.0 so that it can be reached)
            if df.loc[indx, 'length'] > 50.0:
                df.loc[indx, 'z'] = 0.01
            elif df.loc[indx, 'length'] < 15.0:
                df.loc[indx, 'z'] = 0.728*math.exp(-0.1892*df.loc[indx, 'length'])
            else:
                df.loc[indx, 'z'] = 0.052857-((0.03/35)*df.loc[indx, 'length'])

            if df.loc[indx, 'no_fish'] < 1.0:
                df.loc[indx, 'no_fish'] = 0.0
            else:
                df.loc[indx, 'no_fish'] = df.loc[indx, 'no_fish']*math.exp(-(df.loc[indx, 'z']))

            # (3) move on to the next day of the forecast
            count = count + 1

    # keep track of run time - END
    total_elapsed_time = round(time.perf_counter() - start_time, 2)
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
    print(df)

with output:

       unique_id  length  no_fish  days_left  growth
    0          2    27.7     3195        253  0.3898
    1          4    30.2     1894        253  0.3414
    2          5    25.4        8        254  0.4080
    3         13    29.1     2774        256  0.3839
    Forecast iteration completed in 31.75 seconds
       unique_id    length     no_fish  days_left  growth     z
    0          2  126.3194  148.729190        253  0.3898  0.01
    1          4  116.5742   93.018465        253  0.3414  0.01
    2          5  129.0320    0.000000        254  0.4080  0.01
    3         13  127.3784  132.864757        256  0.3839  0.01

Now with vector operations, you could loop over days instead of individuals and update every individual that still has days left in one step:

    # start again from the original data
    df = pd.DataFrame(data)
    df['no_fish'] = df['no_fish'].astype(float)   # fractional fish counts need a float column

    # keep track of run time - START
    start_time = time.perf_counter()
    df['z'] = 0.0
    for day in range(1, df['days_left'].max() + 1):
        # rows that still have days left on this day
        update = day <= df['days_left']

        # (1) update individual length
        # assign through .loc: df[update]['length'] = ... would only change a temporary copy
        length = df.loc[update, 'length'] + df.loc[update, 'growth']
        df.loc[update, 'length'] = length

        # (2) estimate daily size-specific mortality
        z = np.where(length > 50.0, 0.01, 0.052857 - (0.03 / 35) * length)
        z = np.where(length < 15.0, 0.728 * np.exp(-0.1892 * length), z)
        df.loc[update, 'z'] = z

        # (3) update the number of fish
        no_fish = df.loc[update, 'no_fish']
        df.loc[update, 'no_fish'] = np.where(no_fish < 1.0, 0.0, no_fish * np.exp(-z))
    # keep track of run time - END
    total_elapsed_time = round(time.perf_counter() - start_time, 2)
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
    print(df)

with output:

    Forecast iteration completed in 1.32 seconds
       unique_id    length     no_fish  days_left  growth     z
    0          2  126.3194  148.729190        253  0.3898  0.01
    1          4  116.5742   93.018465        253  0.3414  0.01
    2          5  129.0320    0.000000        254  0.4080  0.01
    3         13  127.3784  132.864757        256  0.3839  0.01
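
A further variation on the daily loop, offered only as a sketch, is to pull the columns out into NumPy arrays once, run the loop on the arrays, and write the results back at the end. This avoids repeated DataFrame indexing inside the loop. The names `active` and `result` are illustrative, and whether this is faster for a given data size is something to measure rather than assume.

```python
import numpy as np
import pandas as pd

data = {'unique_id': [2, 4, 5, 13],
        'length': [27.7, 30.2, 25.4, 29.1],
        'no_fish': [3195, 1894, 8, 2774],
        'days_left': [253, 253, 254, 256],
        'growth': [0.3898, 0.3414, 0.4080, 0.3839]}
df = pd.DataFrame(data)

# Work on independent float copies so the original frame is left untouched.
length = df['length'].to_numpy(dtype=float, copy=True)
no_fish = df['no_fish'].to_numpy(dtype=float, copy=True)
growth = df['growth'].to_numpy(dtype=float)
days_left = df['days_left'].to_numpy()

for day in range(1, int(days_left.max()) + 1):
    # individuals that still have days left on this day
    active = day <= days_left

    # (1) update individual length
    length[active] += growth[active]

    # (2) size-specific daily mortality, computed for all rows at once
    z = np.where(length > 50.0, 0.01, 0.052857 - (0.03 / 35) * length)
    z = np.where(length < 15.0, 0.728 * np.exp(-0.1892 * length), z)

    # (3) update fish numbers, but only for the active rows
    updated = np.where(no_fish < 1.0, 0.0, no_fish * np.exp(-z))
    no_fish[active] = updated[active]

result = df.assign(length=length, no_fish=no_fish)
print(result)
```

The loop still runs once per day, but each iteration handles all individuals together, so the cost grows with the number of days rather than with the number of individuals times the number of days.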
