How do I delete specific strings from a file?
I have an unstructured, messy data file from which I have to scrub a specific list of strings (delete strings).
Here is what I am doing, but with no result:
infile = r"messy_data_file.txt"
outfile = r"cleaned_file.txt"
delete_list = ["firstname1 lastname1","firstname2 lastname2"....,"firstnamen lastnamen"]
fin=open(infile,"")
fout = open(outfile,"w+")
for line in fin:
    for word in delete_list:
        line = line.replace(word, "")
    fout.write(line)
fin.close()
fout.close()
When I execute the file, I get the following error:
NameError: name 'word' is not defined
The readlines method returns a list of lines, not words, so your code would only work where one of your words is on a line by itself. Since files are iterators over lines, this can be done much more easily:
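A minimal sketch of that line-by-line approach. The helper name `scrub` and the strings in the example call are placeholders introduced here for illustration; the file names are the ones from the question.

```python
def scrub(infile, outfile, delete_list):
    """Copy infile to outfile, removing every string in delete_list."""
    # The with-statement closes both files even if an error occurs.
    with open(infile) as fin, open(outfile, "w") as fout:
        for line in fin:  # a file object yields one line at a time
            for word in delete_list:
                line = line.replace(word, "")
            fout.write(line)


# Example call (placeholder strings):
# scrub("messy_data_file.txt", "cleaned_file.txt",
#       ["firstname1 lastname1", "firstname2 lastname2"])
```

Because the file is read one line at a time, this should also cope with files that are too large to hold in memory.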
To remove the strings within the same file, I used this code:
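One way to do this, sketched under the assumption that the file is small enough to read into memory. The helper name `scrub_in_place` is hypothetical and is not taken from the original answer.

```python
def scrub_in_place(path, delete_list):
    """Remove every string in delete_list from the file at path."""
    # Read everything first: the file cannot be safely read and
    # truncated for writing at the same time.
    with open(path) as f:
        lines = f.readlines()

    # Reopening in "w" mode truncates the file before rewriting it.
    with open(path, "w") as f:
        for line in lines:
            for word in delete_list:
                line = line.replace(word, "")
            f.write(line)
```

Since the original contents are overwritten, it may be worth keeping a backup copy while testing.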
Based on your comment, "I am double clicking the .py file. It seems to invoke the python application which disappears after a couple of seconds. I don't get any error though", I believe your issue is that the script is not finding the input file. That is also why you are not getting any output. When you double-click on it... I actually can't recall where the interpreter is going to look, but I think it's where python.exe is installed.
Use a fully qualified path like so.
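A relative file name is resolved against the process's current working directory, which is not necessarily the folder that holds the script. A minimal sketch of building absolute paths with pathlib follows; the helper name `qualified_paths` and whatever directory is passed to it are hypothetical, and the file names are the ones from the question.

```python
from pathlib import Path


def qualified_paths(data_dir):
    """Return absolute input and output paths inside data_dir."""
    data_dir = Path(data_dir).resolve()  # make the directory absolute
    infile = data_dir / "messy_data_file.txt"
    outfile = data_dir / "cleaned_file.txt"
    if not infile.is_file():
        # Fail loudly instead of letting the script vanish silently.
        raise FileNotFoundError(f"input file not found: {infile}")
    return infile, outfile
```

Both returned paths can be passed straight to `open()`.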
Also, for your sanity, run it from the command line instead of double-clicking. It'll be much easier to catch errors and output.
To the OP,
Ross Patterson's method above works perfectly for me.
Example: I have a file named messy_data_file.txt that contains a list of words (animals), not necessarily on the same line. When I modify the code by adding the words to delete to the "delete_list" line, the resulting "cleaned_file.txt" has a blank line where "Goat" used to be (though, oddly, removing "Donkey" did not leave one), but for my purposes this works fine.
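One plausible explanation for the blank line, sketched with hypothetical sample data (not the original file contents): a word that sits alone on its line leaves its newline behind when removed, while a word that shares a line with others leaves only extra spaces. The helper name `scrub_text` is also an assumption.

```python
def scrub_text(text, delete_list):
    """Remove every string in delete_list from text."""
    for word in delete_list:
        text = text.replace(word, "")
    return text


# Hypothetical sample data, not the original file.
messy = "Cat Donkey Dog\nGoat\nHorse\n"
cleaned = scrub_text(messy, ["Donkey", "Goat"])
print(repr(cleaned))  # 'Cat  Dog\n\nHorse\n'
```

Here "Goat" was alone on its line, so an empty line remains; "Donkey" shared its line, so only a doubled space remains.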
I also add input("Press Enter to exit...") at the very end of the code to keep the command-line window from opening and slamming shut on me when I double-click the remove_text.py file to run it, but take note that you'll catch no errors this way.
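If the script must be started by double-clicking, one possible way to still see errors is to print the traceback before pausing. This is a sketch, not part of the original answer; `run_and_pause` and its `pause` parameter are hypothetical names.

```python
import traceback


def run_and_pause(main, pause=input):
    """Run main(), show any error, then wait before the window closes."""
    try:
        main()
    except Exception:
        # Print the full traceback so the error is visible before exit.
        traceback.print_exc()
    finally:
        pause("Press Enter to exit...")
```

The `pause` argument defaults to `input`, so the window stays open until Enter is pressed.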
To catch them, I run it from the command line in C:\Just_Testing (the directory where all my files are, i.e. remove_text.py and messy_data_file.txt). Running it with either py or python works exactly the same.
Of course, as when writing HTML, I guess it never hurts to use a fully qualified path when running py or python from somewhere other than the directory you happen to be sitting in, and likewise for the file names inside the code.
Be careful to use the same fully qualified path for your newly created cleaned_file.txt, or it will be created in whatever directory you happen to be in, which could cause confusion when looking for it.
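A small sketch of keeping the output next to the input, so the cleaned file always lands in a known place. The helper name `output_beside` is hypothetical.

```python
from pathlib import Path


def output_beside(infile, name="cleaned_file.txt"):
    """Return an absolute path for `name` in infile's directory."""
    # resolve() makes the path absolute; with_name() swaps only the
    # final component, keeping the directory.
    return Path(infile).resolve().with_name(name)
```

For example, `output_beside(r"C:\Just_Testing\messy_data_file.txt")` would point at cleaned_file.txt in that same directory.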
Personally, I have the PATH in my Environment Variables set to point to all my Python installs i.e. C:\Python3.5.3, C:\Python2.7.13, etc. so I can run py or python from anywhere.
Anyway, I hope making fine-tuning adjustments to this code from Mr. Patterson can get you exactly what you need. :)
Maybe you can add encoding='utf-8' to the open() calls for fin and fout.
Here is the modified version you may want to use:
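A sketch of what such a version might look like, assuming the data really is UTF-8 encoded. The function name `scrub_utf8` is hypothetical.

```python
def scrub_utf8(infile, outfile, delete_list):
    """Like the plain version, but with an explicit UTF-8 encoding."""
    # An explicit encoding avoids depending on the platform default.
    with open(infile, encoding="utf-8") as fin, \
         open(outfile, "w", encoding="utf-8") as fout:
        for line in fin:
            for word in delete_list:
                line = line.replace(word, "")
            fout.write(line)
```

If the file uses a different encoding, that name would have to be passed instead.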
The need to add utf-8 arises mostly on Windows. For simply reading, writing, or appending to a file it usually isn't a problem, but for more advanced operations on a file, such as replacing text in it, you should do this.
Hope this helps you.
The code below just reads the old data, checks that each line doesn't contain any string you don't want, and keeps only those lines. (This also works if you want to remove empty lines.)
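A minimal sketch of that filtering idea. Note that, unlike the replace-based answers, it drops a whole line when the line contains an unwanted string. The function name `drop_lines` is hypothetical.

```python
def drop_lines(infile, outfile, unwanted):
    """Copy infile to outfile, skipping lines with an unwanted string."""
    with open(infile) as fin:
        old_lines = fin.readlines()

    with open(outfile, "w") as fout:
        for line in old_lines:
            if not line.strip():
                continue  # skip empty or whitespace-only lines
            if any(s in line for s in unwanted):
                continue  # skip lines containing an unwanted string
            fout.write(line)
```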