Python code execution time increases greatly when functions are nested
I have a function as below:
import random
import time

def shuffle_n_split(combined_arr, n):
    start_time = time.time()  # adding for testing execution time
    random.shuffle(combined_arr)  # shuffles in place
    print("--- %s seconds ---" % (time.time() - start_time))
    #random.shuffle(combined_arr)
    X_n = combined_arr[:n]  # first n items
    X_m = combined_arr[n:]  # remaining items
    return X_n, X_m
I have nested it in another function:
def p_r_test(combined_arr, n, k):
    distribution = []
    #for i in tqdm(range(k)):
    for i in range(k):
        X_n, X_m = shuffle_n_split(combined_arr, n)
        prop_X_n = sum(X_n)/len(X_n)
        prop_X_m = sum(X_m)/len(X_m)
        prop_diff = prop_X_n - prop_X_m
        distribution.append(prop_diff)
    return distribution
At this point, execution time is good:
--- 0.000997304916381836 seconds ---
--- 0.0 seconds ---
--- 0.000997304916381836 seconds ---
--- 0.0009968280792236328 seconds ---
--- 0.0 seconds ---
--- 0.0 seconds ---
--- 0.0 seconds ---
--- 0.0009980201721191406 seconds ---
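Several of the readings above are exactly 0.0 and the rest sit near 0.001 s, which suggests the interval is at or below the resolution of `time.time()` on this machine. Below is a minimal sketch of the same measurement using `time.perf_counter()`, the standard-library clock intended for short intervals. The helper name `timed_shuffle` and the sample data are hypothetical and not part of the original code. Printing the container type alongside the time makes it easier to compare runs later.

```python
import random
import time


def timed_shuffle(values):
    """Shuffle `values` in place and return the elapsed time in seconds."""
    start = time.perf_counter()  # high-resolution clock for short intervals
    random.shuffle(values)
    return time.perf_counter() - start


if __name__ == "__main__":
    sample = [0, 1] * 5000  # hypothetical 0/1 outcomes, 10,000 items
    elapsed = timed_shuffle(sample)
    print("--- %.6f seconds (%s) ---" % (elapsed, type(sample).__name__))
```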
But when I further nest the latter function into another function, like this:
import numpy as np

def a_b_test_and_report(df, col_name, dependent_var, dep_var_val, prop_diff, num_bins = 75):
    # Group sizes for the two values found in col_name
    vals = df[col_name].unique()
    num_var_1 = len(df[df[col_name] == vals[0]])
    num_var_2 = len(df[df[col_name] == vals[1]])
    print("Number of var1:")
    print(num_var_1)
    print("Number of var2:")
    print(num_var_2)
    # Proportion of each group where dependent_var equals dep_var_val
    cond_var1 = len(df[(df[col_name] == vals[0]) & (df[dependent_var] == dep_var_val)])
    prop_var1 = np.round_((cond_var1/num_var_1), decimals = 4)
    print('Proportion of variable 1 satisfying a condition')
    print(prop_var1)
    cond_var2 = len(df[(df[col_name] == vals[1]) & (df[dependent_var] == dep_var_val)])
    prop_var2 = np.round_((cond_var2/num_var_2), decimals = 4)
    print('Proportion of variable 2 satisfying a condition')
    print(prop_var2)
    prop_diff = np.absolute(np.round_(prop_var1 - prop_var2, decimals = 4))
    print('Difference between the two proportions')
    print(prop_diff)
    #return num_var_1, num_var_2, cond_var1, cond_var2, prop_var1, prop_var2, prop_diff
    combined_arr = df[dependent_var]  # note: this is a pandas Series, not a list
    n = num_var_1  # moved up: the slices below need n to be defined first
    k = 100000
    np.random.shuffle(combined_arr)
    X_n = combined_arr[:n]
    X_m = combined_arr[n:]
    disbn = p_r_test(combined_arr, n, k)
Executing the same code, I get the following execution time:
--- 1.983733892440796 seconds ---
--- 2.2649989128112793 seconds ---
--- 1.9628357887268066 seconds ---
--- 2.021646022796631 seconds ---
--- 1.879605770111084 seconds ---
--- 1.9523327350616455 seconds ---
--- 1.7383837699890137 seconds ---
This doesn't seem to make any sense, as the code hasn't changed at all.
I even changed the shuffling function and tried both the numpy and random libraries. Both give the same effect. Why is this happening?
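One difference between the two runs is visible in the code. Inside `a_b_test_and_report`, `combined_arr` is `df[dependent_var]`, a pandas Series, while the type used in the fast standalone run is not shown. If that run used a plain list, the container type is a plausible cause, since `random.shuffle` swaps elements one at a time through indexing. Below is a minimal sketch for testing that hypothesis, assuming the values are numeric (for example 0/1): copy the data into a list once, then run the same loop on the copy. The name `p_r_test_on_list` and the sample data are hypothetical.

```python
import random


def p_r_test_on_list(values, n, k):
    """Same loop as p_r_test, but shuffles a plain-list copy of `values`."""
    pool = list(values)  # one-time copy; the caller's data is left untouched
    distribution = []
    for _ in range(k):
        random.shuffle(pool)
        x_n, x_m = pool[:n], pool[n:]
        distribution.append(sum(x_n) / len(x_n) - sum(x_m) / len(x_m))
    return distribution


if __name__ == "__main__":
    # Hypothetical data standing in for df[dependent_var]
    outcomes = [1] * 30 + [0] * 70
    diffs = p_r_test_on_list(outcomes, n=40, k=1000)
    print(len(diffs), min(diffs), max(diffs))
```

Inside `a_b_test_and_report` this would be called as `p_r_test_on_list(df[dependent_var], n, k)`. If the per-shuffle time drops back to the standalone figures, the Series was likely the bottleneck; if not, the cause lies elsewhere.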