I have a csv file that has inconsistent spacing after the comma, like this:
534323, 93495443,34234234, 3523423423, 2342342,236555, 6564354344
I have written a function that tries to read the file and make the spacing consistent, but it doesn't appear to update anything. When I open the newly created file, there is no difference from the original. The function I've written is:
def ensure_consistent_spacing_in_csv(dirpath, original_name, new_name):
    with open(dirpath + original_name, "r") as f:
        data = f.readlines()
    for item in data:
        if "," in data:
            comma_index = item.index(",")
            if item[comma_index + 1] != " ":
                item = item.replace(",", ", ")
    with open(dirpath + new_name, "w") as f:
        f.writelines(data)
Where am I going wrong?
I have looked at the answer to the question here, but I cannot use that method because I need the delimiter to be ", ", which is two characters and therefore not allowed. I also tried to follow the method in the sed answer to the question here using a process.call system, but that also failed. I don't know bash well, so I'm hesitant to go that route and would like to use a pure Python method.
Thank you!
Here is how I was able to normalize the spacing given a string from your example
NOTE: I am assuming the content of the file isn't large enough to exceed the available memory since you read it into the list in your code.
NOTE: using regular expressions may not always (read: almost never) be the most efficient way to solve a problem, but it gets the job done.
will produce
and for the file with the following content:
I ran
and got this result:
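The answer's own regex code and its output are not preserved here. As a hedged illustration only, a regex-based normalization along the lines the answer describes might look like the sketch below. The pattern and the names normalize_line and normalize_file are assumptions, not the answerer's code. The sketch processes the file line by line, so it does not need to hold the whole file in memory.

```python
import re

# Hypothetical pattern: any run of spaces/tabs around a comma becomes exactly ", ".
# [ \t]* is used instead of \s* so a trailing newline is never swallowed.
_COMMA_RE = re.compile(r"[ \t]*,[ \t]*")


def normalize_line(line: str) -> str:
    """Return the line with exactly one space after every comma."""
    return _COMMA_RE.sub(", ", line)


def normalize_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with normalized comma spacing."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Each transformed line is written out directly.
            dst.write(normalize_line(line))
```

For example, normalize_line("1 ,2\n") returns "1, 2\n". Matching only spaces and tabs, rather than all whitespace, keeps each line's newline intact.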
Alternatively, you could use the csv module to read the rows and, for each row, strip() each element.
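A minimal sketch of that csv-module alternative follows. It assumes that no field contains an embedded comma or quote character, because the stripped fields are rejoined by hand with ", ". The csv writer only accepts a one-character delimiter, which is the limitation mentioned in the question. The function names are hypothetical.

```python
import csv
import io


def normalize_csv_text(text: str) -> str:
    """Rejoin every CSV row with exactly one space after each comma.

    Assumes no field contains an embedded comma or quote, since fields
    are joined manually rather than re-quoted by csv.writer.
    """
    out = []
    for row in csv.reader(io.StringIO(text)):
        # strip() removes the stray spaces that surrounded each field.
        out.append(", ".join(field.strip() for field in row) + "\n")
    return "".join(out)


def normalize_csv_file(src_path: str, dst_path: str) -> None:
    # newline="" is what the csv documentation recommends for files read by csv.reader.
    with open(src_path, newline="") as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(normalize_csv_text(text))
```

For example, normalize_csv_text("1 , 2\n3,4\n") returns "1, 2\n3, 4\n".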
UPD:
The regex could be simplified to
The original code has a couple bugs:

1. The if "," in data condition never evaluates to true. data is a list, where each item in the list is a string representing one entire line of the file. No single line in the file is exactly ",", so that condition never evaluates to true. To fix it, use if "," in item. That way it checks whether each line has a comma.
2. The item.index function returns only the first instance of a comma, so if there's inconsistent spacing twice in one line, the algorithm does not catch it.

A simple solution that doesn't require regular expressions, sed, or indexing and looking at each word character by character is:

What this is doing is:

- for line in f reads each line of the file.
- line.replace(" ", "").replace(",", ", ") first removes all spaces entirely from the line (thanks to @megakarg for the suggestion), and then makes sure there's a single space after each comma to meet the spec.
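The answer's code itself is not preserved here. Going only by the explanation above, a sketch might look like the following. The names respace_line and new_file are hypothetical. Writing each transformed line directly to the output file also avoids a further problem in the original function. In that function, item = item.replace(...) only rebinds the loop variable and never changes the list data, so f.writelines(data) writes the unmodified lines back out. Removing every space would also remove spaces inside a field. That is harmless for the numeric data in the question, but it does not hold in general.

```python
def respace_line(line: str) -> str:
    # Remove every space, then put exactly one back after each comma.
    # The trailing newline is untouched because only " " is removed.
    return line.replace(" ", "").replace(",", ", ")


def ensure_consistent_spacing_in_csv(dirpath, original_name, new_name):
    with open(dirpath + original_name) as f, open(dirpath + new_name, "w") as new_file:
        for line in f:
            # Write the transformed line immediately instead of rebinding a
            # loop variable, so the change actually reaches the new file.
            new_file.write(respace_line(line))
```

For example, respace_line("1,2, 3\n") returns "1, 2, 3\n".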