Memory-efficient alternative to str.replace()
I have a CSV file with 200k rows and about 40 columns. One column contains the special character '|', which I want to replace with '_'.
However, when I call str.replace and then .append, I hit an OOM error on a machine with 16GB of RAM, so there must be a more efficient way.
My code:
import os
import pandas as pd
import numpy as np
archive_loc = ('pathname')
data = pd.read_csv(os.path.join(archive_loc,'sample.csv'))
category = data['category'].values
category = category.tolist()
for string in category:
    new_string = string.replace("|", "_")
    category.append(new_string)
Comments (1)
Don't convert to a list and loop; do the replacement directly in the DataFrame.
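The memory blow-up most likely comes from the loop itself rather than from str.replace: the code appends to category while iterating over it, so the list keeps growing and the loop never reaches the end. A minimal sketch of the in-DataFrame replacement follows. The small DataFrame here is a made-up stand-in for the real file, and only the 'category' column name is taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'category' column of sample.csv.
data = pd.DataFrame({"category": ["a|b", "c|d|e", "plain"]})

# Vectorized replacement on the column itself.
# regex=False makes "|" a literal character instead of regex alternation.
data["category"] = data["category"].str.replace("|", "_", regex=False)

print(data["category"].tolist())
```

With the real file, the same line should work right after pd.read_csv, with no intermediate list and no explicit loop.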