How can I save the cleaned DF to a target directory?
I am trying to remove duplicates from large files and save the cleaned files into a different directory. I ran the code below, but it saved them (overwrote the originals) in the root directory. I know that if I switch to inplace=False it won't overwrite the files in the root directory, but it doesn't copy them into the target directory either, so that doesn't help.
Please advise and thank you! :)
import os
import pandas as pd
from glob import glob
import csv
from pathlib import Path

root = Path(r'C:\my root directory')
target = Path(r'C:\my root directory\target')
file_list = root.glob("*.csv")
desired_columns = ['ZIP', 'COUNTY', 'COUNTYID']

for csv_file in file_list:
    df = pd.read_csv(csv_file)
    df.drop_duplicates(subset=desired_columns, keep="first", inplace=True)
    df.to_csv(os.path.join(target, csv_file))
Example:
ZIP COUNTYID COUNTY
32609 1 ALACHUA
32609 1 ALACHUA
32666 1 ALACHUA
32694 1 ALACHUA
32694 1 ALACHUA
32694 1 ALACHUA
32666 1 ALACHUA
32666 1 ALACHUA
32694 1 ALACHUA
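For reference, drop_duplicates with keep="first" on this sample keeps one row per distinct (ZIP, COUNTYID, COUNTY) combination, so the nine rows above collapse to three. A minimal sketch using the sample data:

```python
import pandas as pd

# Sample rows mirroring the example in the question
df = pd.DataFrame({
    "ZIP": [32609, 32609, 32666, 32694, 32694, 32694, 32666, 32666, 32694],
    "COUNTYID": [1] * 9,
    "COUNTY": ["ALACHUA"] * 9,
})

# Keep the first occurrence of each (ZIP, COUNTY, COUNTYID) combination
deduped = df.drop_duplicates(subset=["ZIP", "COUNTY", "COUNTYID"], keep="first")
print(deduped)
```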
1 Answer
This should work, while also reducing your dependencies. Note that since target is relative to your root directory, you can simply join using the / operator.