Python code execution time increases dramatically when functions are nested

Posted on 2025-01-21 11:14:14

I have a function as below:

import random
import time

def shuffle_n_split(combined_arr, n):

    start_time = time.time() # adding for testing execution time
    random.shuffle(combined_arr)
    print("--- %s seconds ---" % (time.time() - start_time))

    #random.shuffle(combined_arr)
    X_n = combined_arr[:n]
    X_m = combined_arr[n:]

    return X_n, X_m

I have nested it in another function:

def p_r_test(combined_arr, n, k):
    distribution = []
    #for i in tqdm(range(k)):
    for i in range(k):
        X_n, X_m = shuffle_n_split(combined_arr, n)
        prop_X_n = sum(X_n)/len(X_n)
        prop_X_m = sum(X_m)/len(X_m)
        prop_diff = prop_X_n - prop_X_m
        distribution.append(prop_diff)
    return distribution

At this point, execution time is good:

--- 0.000997304916381836 seconds ---

--- 0.0 seconds ---

--- 0.000997304916381836 seconds ---

--- 0.0009968280792236328 seconds ---

--- 0.0 seconds ---

--- 0.0009980201721191406 seconds ---
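
Readings that alternate between 0.0 and roughly 0.000997 suggest the clock being read only ticks about once per millisecond, so a single `time.time()` delta is a coarse measure for a call this short. A minimal sketch, assuming the input is a plain list, that times the shuffle with `time.perf_counter` and averages over several repeats (the helper name `time_shuffle` and the sample data are hypothetical, not part of the original code):

```python
import random
import time


def time_shuffle(seq, repeats=100):
    """Return the mean wall-clock seconds per random.shuffle call on seq.

    time_shuffle is a hypothetical helper, not part of the original code.
    """
    start = time.perf_counter()  # high-resolution clock intended for short intervals
    for _ in range(repeats):
        random.shuffle(seq)  # shuffles seq in place
    elapsed = time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    data = [0, 1] * 5000  # assumed stand-in for combined_arr
    print("--- %.6f seconds per shuffle ---" % time_shuffle(data))
```

Averaging over repeats gives a per-call figure that does not depend on where a single call happens to fall between clock ticks.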

But when I further nest the latter function into another function, like this:

import numpy as np

def a_b_test_and_report(df, col_name, dependent_var, dep_var_val, prop_diff, num_bins = 75):
    vals  = df[col_name].unique()

    num_var_1 = len(df[df[col_name] == vals[0]])
    num_var_2 = len(df[df[col_name] == vals[1]])

    print("Number of var1:")
    print(num_var_1)
    print("Number of var2:")
    print(num_var_2)

    cond_var1 = len(df[(df[col_name] == vals[0]) & (df[dependent_var] == dep_var_val)])
    prop_var1 = np.round_((cond_var1/num_var_1), decimals = 4)
    print('Proportion of variable 1 satisfying a condition')
    print(prop_var1)

    cond_var2 = len(df[(df[col_name] == vals[1]) & (df[dependent_var] == dep_var_val)])
    prop_var2 = np.round_((cond_var2/num_var_2), decimals = 4)
    print('Proportion of variable 2 satisfying a condition')
    print(prop_var2)


    prop_diff = np.absolute(np.round_(prop_var1 - prop_var2, decimals = 4))
    print('Difference between the two proportions')
    print(prop_diff)
   
    #return num_var_1, num_var_2, cond_var1, cond_var2, prop_var1, prop_var2, prop_diff
    combined_arr = df[dependent_var]
    n = num_var_1  # n must be assigned before it is used in the slices below
    k = 100000
    np.random.shuffle(combined_arr)
    X_n = combined_arr[:n]
    X_m = combined_arr[n:]

    disbn = p_r_test(combined_arr, n, k)

Executing the same code, I get the following execution time:

--- 1.983733892440796 seconds ---

--- 2.2649989128112793 seconds ---

--- 1.9628357887268066 seconds ---

--- 2.021646022796631 seconds ---

--- 1.879605770111084 seconds ---

--- 1.9523327350616455 seconds ---

--- 1.7383837699890137 seconds ---

This doesn't seem to make any sense, as the code hasn't changed at all.

I even changed the shuffling function and tried both the numpy and random libraries. Both give the same effect. Why is this happening?
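
One difference that is visible in the code is the container being shuffled: inside `a_b_test_and_report`, `combined_arr` is `df[dependent_var]`, a pandas Series, while the question does not say what type was passed in the fast run. A hedged way to check whether the container type accounts for the gap is to copy the values into a plain list once and run the same loop on that copy. The sketch below assumes `0 < n < len(combined)`; `p_r_test_on_list` is a hypothetical variant, not the original function:

```python
import random


def p_r_test_on_list(combined, n, k):
    """Hypothetical variant of p_r_test that shuffles a plain-list copy.

    Assumes 0 < n < len(combined) so neither split is empty.
    """
    values = list(combined)  # one-time copy; accepts lists, tuples, arrays, Series
    distribution = []
    for _ in range(k):
        random.shuffle(values)  # shuffle the copy in place; caller's data is untouched
        x_n, x_m = values[:n], values[n:]
        # difference between the proportions (means) of the two splits
        distribution.append(sum(x_n) / len(x_n) - sum(x_m) / len(x_m))
    return distribution
```

If the per-shuffle time drops back to the millisecond range with the list copy, element access on the Series is the likely cause; if it does not, the cause lies elsewhere.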
