Python: How to batch-read CSV files with glob and store them in a DataFrame
I have a large number of .csv files in a folder. All the .csv files have the same column names. The code below merges all of the .csv files, but I need to merge the first 10 .csv files into one DataFrame, then files 11 to 20 in the next step, and so on. Solution 1 and Solution 2 are suitable if the file names are numeric, but in my case the file names do not follow any pattern.
# Merge .csv files in one place
import glob
import os
import pandas as pd
path = r'D:\Course\Research\Data\2017-21'
print(path)
all_files = glob.glob(os.path.join(path, "*.csv"))
# note: error_bad_lines was removed in pandas 2.0; newer versions use on_bad_lines='skip'
df_from_each_file = (pd.read_csv(f, encoding='utf8', error_bad_lines=False) for f in all_files)
merged_df = pd.concat(df_from_each_file)
2 Answers
Further to my comment above, here is a simpler solution. `glob` collects the files. In its current state the list is not sorted, but it can be sorted according to your requirements. The merged DataFrame for each group (`dfm`) is written out with `to_csv`; the example uses a random 4-byte hex string to ensure uniqueness over the output files. (Note: this does not guarantee uniqueness, but it will suffice for the 50 sample data files I was using.)
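The answer's sample code and printed output were not preserved on this page. A minimal sketch of the approach it describes (slicing the globbed list into groups of ten and naming each output with a random 4-byte hex string) might look like the following; the helper name `merge_csv_chunks` and the `merged_` filename prefix are illustrative, not from the original answer:

```python
import glob
import os
import secrets

import pandas as pd


def merge_csv_chunks(csv_files, out_dir, chunk_size=10):
    """Merge `csv_files` in groups of `chunk_size` and write each merged
    group to `out_dir` under a random 4-byte hex name (not guaranteed
    unique, but sufficient for a few dozen output files)."""
    written = []
    for start in range(0, len(csv_files), chunk_size):
        chunk = csv_files[start:start + chunk_size]
        dfm = pd.concat(pd.read_csv(f, encoding="utf8") for f in chunk)
        out_path = os.path.join(out_dir, f"merged_{secrets.token_hex(4)}.csv")
        dfm.to_csv(out_path, index=False)
        written.append(out_path)
    return written


path = r"D:\Course\Research\Data\2017-21"  # directory from the question
csv_files = glob.glob(os.path.join(path, "*.csv"))  # sort here if order matters
print("CSV file list:")
print(csv_files)
merged_outputs = merge_csv_chunks(csv_files, path)
```

Each pass through the loop merges at most ten frames, so files 1-10 land in the first output, 11-20 in the second, and so on, with any leftover files in a final smaller group.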
Here's a suggestion that uses `islice()` from the standard-library module `itertools` to fetch chunks of up to 10 files. (I'm also using the standard-library module `pathlib` because it's more convenient.)
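The code for this answer was likewise not preserved. A sketch of the described `islice`/`pathlib` approach, with the hypothetical helper name `chunked_paths`, could be:

```python
from itertools import islice
from pathlib import Path

import pandas as pd


def chunked_paths(path, pattern="*.csv", chunk_size=10):
    """Yield lists of up to `chunk_size` Path objects matching `pattern`
    under `path`, until the directory listing is exhausted."""
    files = iter(Path(path).glob(pattern))
    while chunk := list(islice(files, chunk_size)):
        yield chunk


# Merge each group of up to 10 files into its own DataFrame
# (the directory is the one from the question):
for group in chunked_paths(r"D:\Course\Research\Data\2017-21"):
    merged_df = pd.concat(pd.read_csv(f, encoding="utf8") for f in group)
    print(f"merged {len(group)} files into a frame of {len(merged_df)} rows")
```

Because `islice` pulls from a single shared iterator, each call consumes the next batch of paths, so no file appears in two groups and file names never need to follow a numeric pattern.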