Creating word lists from medical journals

Posted on 2025-02-06 08:43:13

I have been asked to compile a crossword for a surgeons' publication that comes out quarterly. I need to make it medically oriented, preferably using words from different specialties, e.g. some from orthopaedics, some from cardiac surgery, some from human anatomy, etc.
I can get surgical journals online.

I want to create a word list for each specialty and use them in the compiler. I will use Crossword Compiler.

I can use journal articles on the web, or downloaded PDFs. I am a surgeon and use pandas for data analysis, but my Python skills are a bit primitive, so I need relatively simple solutions. How can I create a specific word list for each surgical specialty?

The words don't need to be very specific, so I thought I could, for example, scrape a journal volume for words, compare them to a list of common words, and delete the common ones, leaving me with a technical list. It may require some trial and error. I haven't used Beautiful Soup before but am willing to try it.

Alternatively, I could skip the Beautiful Soup step and use EndNote to download a few hundred journal articles and export them to txt.

It's the extraction and list making that I think I am mainly struggling to conceptualise.

叹沉浮 2025-02-13 08:43:13

I created this program that you can use to parse through a .txt file to find the most common words. I also included a block of code that will help you to convert a .pdf file to .txt. Hope my approach to the solution helps, good luck with your crossword for the surgeon's publication!

'''
Find the most common words in a txt file
'''

import collections
# The re module provides regular expression matching operations
import re

'''
Use this if you would like to convert a PDF to a txt file
(PyPDF2 3.x API; extracts every page, not only the last one)
'''
# import PyPDF2
# pdfreader = PyPDF2.PdfReader('textFileName.pdf')
# text = '\n'.join((page.extract_text() or '') for page in pdfreader.pages)
#
# with open(r"(folder path)\\textFileName.txt", "a", encoding="utf-8") as file1:
#     file1.write(text)

# Read the file, lowercase it, and split it into words
with open('textFileName.txt', encoding='utf-8') as f:
    words = re.findall(r'\w+', f.read().lower())

# Count the words and print the 10 most frequent as (word, count) pairs
most_common = collections.Counter(words).most_common(10)
print(most_common)
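
The question also asks about comparing the scraped words to a list of common words and dropping those. One way that step might look is sketched below. This is only a rough illustration, not a tested pipeline: `COMMON_WORDS` is a tiny hypothetical stand-in for a real common-word list, `technical_words`, `min_len` and `min_count` are names made up for this example, and `sample` is invented text, not journal data.

```python
import re
from collections import Counter

# Hypothetical stand-in: in practice, load a much larger common-word list from a file.
COMMON_WORDS = {"the", "was", "were", "with", "and", "fixed", "aligned", "a", "of"}

def technical_words(text, common_words, min_len=4, min_count=2):
    """Return (word, count) pairs for words that are not in common_words,
    are at least min_len letters long, and occur at least min_count times."""
    # Letters only, so digits and underscores never end up in a crossword entry
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        w for w in words if len(w) >= min_len and w not in common_words
    )
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

# Invented sample text standing in for one specialty's exported articles
sample = (
    "The femur was fixed with a plate. The femur and the tibia were "
    "aligned, and the tibia was fixed with a nail."
)
print(technical_words(sample, COMMON_WORDS))
```

Running this once per specialty, on the text exported for that specialty, would give one candidate list each. Raising `min_count` or extending the common-word list is where the trial and error mentioned in the question would likely come in.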