Counting words in different sections of a text file
For a project, I have to analyze a txt file containing over 200 resumes with Python. I have to search through the file and count whether a specific keyword is mentioned. This is my very simple code:
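# Read the whole file; the with-block closes it automatically
with open("CVC.txt") as file:
    data = file.read()
# Counts every occurrence in the file, including repeats within one resume
occurrence = data.count("Biology")
print('Number of occurrences of the word:', occurrence)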
The problem is that when I search for, e.g., "Engineering", it is mentioned several times in one CV, but I want to count it only once per CV. Every resume starts with the word 'contact'. My question is: how can I specify an algorithm that differentiates between the resumes and counts a specific keyword only once per CV?
Thanks in advance!
Comments (2)
The logic is fairly straightforward. Parse the file line by line. When you see a line that starts a contact, store that line and every line after it until you reach the next contact line. When the file has been read completely, store the remaining lines as part of the last contact that was started.
I made an example file like this
And this is the test output
To count words in one contact, you can access that contact by its position
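The answer's own example file and output are not preserved here, so the following is only a minimal sketch of the approach described above, not the original code. It assumes that a resume begins at any line whose first word is "contact" (case-insensitive) and that keyword matching is a case-insensitive substring test. The names `split_resumes` and `count_resumes_with`, and the sample data, are hypothetical.

```python
def split_resumes(lines, marker="contact"):
    """Group lines into resumes; a new resume begins at each line starting with `marker`."""
    resumes = []
    current = None  # lines of the resume currently being collected
    for line in lines:
        if line.strip().lower().startswith(marker):
            # A new contact line closes the previous resume, if any
            if current is not None:
                resumes.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first contact line are ignored
    # Store the remaining lines as the last resume
    if current is not None:
        resumes.append(current)
    return resumes


def count_resumes_with(resumes, keyword):
    """Count resumes that mention `keyword` at least once (case-insensitive)."""
    keyword = keyword.lower()
    return sum(
        1 for resume in resumes
        if any(keyword in line.lower() for line in resume)
    )


# Hypothetical sample data standing in for the contents of CVC.txt
sample = """Contact
Jane Doe
Engineering degree, engineering intern
Contact
John Roe
Biology major
Contact
Ann Poe
Biology and Engineering
""".splitlines()

resumes = split_resumes(sample)
print(len(resumes))                                # 3
print(count_resumes_with(resumes, "Engineering"))  # 2
print(count_resumes_with(resumes, "Biology"))      # 2

# Access one resume by position and count every occurrence inside it
first = resumes[0]
print(sum(line.lower().count("engineering") for line in first))  # 2
```

For the real file, `split_resumes(open("CVC.txt"))` should work the same way, since a file object yields lines. One caveat of this sketch: a body line that happens to begin with "contact" (for example "Contact details") would also start a new resume, so the marker test may need tightening for real data.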
Here is a solution with simpler logic: keep flags that track (1) whether we are inside a contact and (2) whether we have already seen the word in this contact.
I tried it successfully on a small sample; try it on your input and see if it works.
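The code for this answer is not preserved here either. A minimal sketch of the flag idea might look like the following, assuming again that each resume starts at a line beginning with "contact" (case-insensitive) and that matching is a case-insensitive substring test. The function name `count_resumes_mentioning` and the sample lines are hypothetical.

```python
def count_resumes_mentioning(lines, keyword, marker="contact"):
    """Count resumes mentioning `keyword` at least once, in a single pass."""
    keyword = keyword.lower()
    count = 0
    in_contact = False  # flag 1: we are inside a resume
    seen = False        # flag 2: keyword already counted for this resume
    for line in lines:
        text = line.lower()
        if text.strip().startswith(marker):
            # A new resume starts: reset the "already seen" flag
            in_contact = True
            seen = False
        if in_contact and not seen and keyword in text:
            count += 1
            seen = True  # ignore further mentions in this resume
    return count


# Hypothetical sample lines standing in for the contents of CVC.txt
sample_lines = [
    "Contact",
    "Engineering degree, engineering intern",
    "Contact",
    "Biology major",
    "Contact",
    "Biology and Engineering",
]

print(count_resumes_mentioning(sample_lines, "Engineering"))  # 2
print(count_resumes_mentioning(sample_lines, "Biology"))      # 2
```

Because the function only iterates over lines, it can be given an open file directly, e.g. `with open("CVC.txt") as f: print(count_resumes_mentioning(f, "Engineering"))`, which avoids holding all 200 resumes in memory at once.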