Python for loop + pandas append
I am trying to read in files in a loop and append them all into one dataset. My code seems to be reading the data in fine, but the loop is not appending the data to a dataframe. Instead, it ends up with only one of the imported datasets (the final_access_hr dataframe).
What is wrong with my loop? Why aren't my looped files being appended? My dataframe access_HR_attestaion has 77 records, when I am expecting 2639 records since I am reading in 3 files.
for file in files_path:
    mainframe_access_HR = pd.read_pickle(file)
    mainframe_access_HR = mainframe_access_HR.astype(str)
    if mainframe_access_HR.shape[0]:
        application = mainframe_access_HR['Owner'].unique()[0]
        filtered_attestation_data = attestation_data[attestation_data['cleaned_MAL_CODE']==application]
        final_access_hr = pd.DataFrame()
        column_list = pd.DataFrame(['HRACF2'])
        for column in range(len(column_list)):
            mainframe_access_HR_new = mainframe_access_HR.copy()
            # Drop rows containing NaN in the ID column for the new merge
            mainframe_access_HR_new.dropna(subset=[column_list.iloc[column,0]], inplace=True)
            # Create a new column for the merge
            mainframe_access_HR_new['ID'] = mainframe_access_HR_new[column_list.iloc[column,0]]
            # Case folding
            mainframe_access_HR_new['ID'] = mainframe_access_HR_new['ID'].str.strip().str.upper()
            # Merge data
            merged_data = pd.merge(filtered_attestation_data, mainframe_access_HR_new, how='right', left_on=['a','b'], right_on=['a','b'])
            # Concatenating all data together
            final_access_hr = final_access_hr.append(merged_data)
# Remove duplicates
access_HR_attestaion = final_access_hr.drop_duplicates()
Comments (1)
I think the bug is that you are initializing final_access_hr for every file you read, so it gets reset on each iteration.
Can you move the line final_access_hr = pd.DataFrame() out of the files_path loop,
and comment if it solves your problem?
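To make the suggested fix concrete, here is a minimal sketch of the reset bug with toy dataframes (hypothetical stand-ins for the asker's three pickled files, not the real data). It also swaps the deprecated `DataFrame.append` for `pd.concat`, since `append` was removed in pandas 2.0:

```python
import pandas as pd

# Three toy chunks standing in for the three files read in the loop.
chunks = [pd.DataFrame({'ID': ['A']}),
          pd.DataFrame({'ID': ['B']}),
          pd.DataFrame({'ID': ['C']})]

# Buggy pattern: the accumulator is re-created on every iteration,
# so only the last chunk survives the loop.
for chunk in chunks:
    final_access_hr = pd.DataFrame()
    final_access_hr = pd.concat([final_access_hr, chunk])
print(len(final_access_hr))  # 1 -- only the last file's rows remain

# Fixed pattern: initialize the accumulator ONCE, before the loop.
final_access_hr = pd.DataFrame()
for chunk in chunks:
    final_access_hr = pd.concat([final_access_hr, chunk], ignore_index=True)
print(len(final_access_hr))  # 3 -- all files' rows are kept
```

An even more idiomatic variant is to collect the chunks in a list inside the loop and call `pd.concat` once at the end, which avoids repeatedly copying the growing accumulator.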