Quick way to use Pandas' duplicated() with a chunked CSV
I can't read a whole 5 GB CSV file in one go, but using Pandas' read_csv() with chunksize set seems like a fast and easy way:
import pandas as pd

def run_pand(csv_db):
    reader = pd.read_csv(csv_db, chunksize=5000)
    for chunk in reader:
        dup = chunk.duplicated(subset=["Region", "Country", "Ship Date"])
        # and afterwards I will write the duplicates to a new CSV
As I understand it, reading in chunks means duplicated() only compares rows within a single chunk, so a duplicate whose copies land in different chunks will be missed. Or will it still work? Is there a Pandas method to search for matches across chunks?
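To make the question concrete, here is a sketch of one possible workaround (my own assumption, not a known built-in Pandas feature): keep a set of the subset keys seen so far across all chunks, so a row is flagged as a duplicate even when its earlier copy was in a previous chunk. The function name find_duplicates_chunked is hypothetical.

```python
import pandas as pd
from io import StringIO

def find_duplicates_chunked(csv_source, subset, chunksize=5000):
    """Collect rows whose `subset` key was already seen in any earlier chunk."""
    seen = set()        # keys observed so far, across all chunks
    dup_chunks = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # one plain tuple per row, built from the subset columns
        keys = list(chunk[subset].itertuples(index=False, name=None))
        mask = []
        for key in keys:
            # duplicate if the key was seen in this chunk or any prior one
            mask.append(key in seen)
            seen.add(key)
        dup_chunks.append(chunk[pd.Series(mask, index=chunk.index)])
    return pd.concat(dup_chunks, ignore_index=True)

# tiny demo: with chunksize=2 the two "EU,FR" rows fall into different chunks
demo = StringIO(
    "Region,Country,Ship Date\n"
    "EU,FR,2020-01-01\n"
    "NA,US,2020-01-02\n"
    "EU,FR,2020-01-01\n"
    "AS,JP,2020-01-03\n"
)
dups = find_duplicates_chunked(demo, ["Region", "Country", "Ship Date"], chunksize=2)
print(len(dups))  # 1: the cross-chunk duplicate is still found
```

Note the trade-off this sketch assumes: the set of keys must fit in memory, which is usually far smaller than the 5 GB of full rows, but could still be large if the key columns are high-cardinality.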