Implementing synonym- and abbreviation-aware search over a list/table of medical links, and importing the synonym data

Posted 2024-10-11 23:53:11

I'm making a simple searchable list that will eventually contain about 100,000 links on various medical topics, mostly medical conditions/diseases.
On the surface this sounds easy... in fact, I've set up my tables as follows:

  • Links: id, url, name, topic
  • Topics (e.g. cardiology, paediatrics): id, name
  • Conditions (e.g. asthma, influenza): id, name, aliases

And possibly another table:

  • Link & condition (since 1 link can pertain to multiple conditions): link id, condition id
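A minimal sketch of these tables in SQLite, using Python's built-in sqlite3 module (table and column names are my own assumptions). One change from the list above: instead of a single "aliases" column, every synonym, abbreviation or spelling becomes its own row in a hypothetical condition_aliases table, with the condition's main name stored as an alias too, so one indexed lookup covers every variant. The sample rows are made up.

```python
import sqlite3

# Hypothetical schema based on the tables listed above. Aliases get their own
# table (one row per synonym, abbreviation or spelling) instead of a single
# "aliases" column, so every variant can be indexed and matched exactly.
SCHEMA = """
CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE links (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    name TEXT NOT NULL,
    topic_id INTEGER REFERENCES topics(id)
);
CREATE TABLE conditions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE condition_aliases (
    condition_id INTEGER NOT NULL REFERENCES conditions(id),
    alias TEXT NOT NULL COLLATE NOCASE,  -- comparisons ignore ASCII case
    PRIMARY KEY (alias, condition_id)
);
CREATE TABLE link_conditions (
    link_id INTEGER NOT NULL REFERENCES links(id),
    condition_id INTEGER NOT NULL REFERENCES conditions(id),
    PRIMARY KEY (link_id, condition_id)
);
"""


def links_for_alias(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return URLs of links tagged with any condition that has `term` as an alias."""
    rows = conn.execute(
        """
        SELECT DISTINCT l.url
        FROM condition_aliases AS a
        JOIN link_conditions AS lc ON lc.condition_id = a.condition_id
        JOIN links AS l ON l.id = lc.link_id
        WHERE a.alias = ?
        ORDER BY l.url
        """,
        (term,),
    ).fetchall()
    return [url for (url,) in rows]


def demo_db() -> sqlite3.Connection:
    """Build an in-memory database holding made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO topics VALUES (1, 'Gastroenterology')")
    conn.execute("INSERT INTO conditions VALUES (1, 'Gastro-oesophageal reflux disease')")
    conn.executemany(
        "INSERT INTO condition_aliases (condition_id, alias) VALUES (1, ?)",
        [("Gastro-oesophageal reflux disease",), ("GORD",), ("GERD",), ("GOR",)],
    )
    conn.execute(
        "INSERT INTO links VALUES (1, 'https://example.org/reflux', 'Reflux in adults', 1)"
    )
    conn.execute("INSERT INTO link_conditions VALUES (1, 1)")
    return conn


if __name__ == "__main__":
    print(links_for_alias(demo_db(), "gerd"))  # ['https://example.org/reflux']
```

Keeping aliases in their own table also means that bulk-loading thousands of synonyms is just a matter of inserting rows.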

So basically, since doctors (myself included) are super fussy, I want a search for a condition to return relevant results whether the user types an abbreviation, a British or American spelling, or an old alternative name. For example, "angiooedema", "angioedema", "Quincke's edema" etc. should all give the same results; similarly "gastroesophageal reflux", "gastro-oesophageal reflux disease", GERD, GORD and GOR. Additionally, it would be good to group the results: first links for a diagnosis that matches the search string, then links whose name matches, and finally links whose topic matches.
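The behaviour described above can be sketched in plain Python with three pieces: a lossy matching key that makes British and American spellings collide, an alias index mapping every normalised synonym to its condition, and a three-tier ranking (condition match, then link name, then topic). The rule of repeatedly collapsing "ae"/"oe" to "e" is my own rough assumption, not a standard algorithm, and abbreviations such as GORD and GERD still have to be listed as aliases explicitly. The names `match_key`, `Link`, `build_alias_index` and `search` are hypothetical, and the sample data is made up.

```python
import re
from dataclasses import dataclass, field

_DROP = re.compile("[-'\u2019]")        # hyphens and apostrophes are removed
_NON_WORD = re.compile(r"[^a-z0-9\s]")  # any other punctuation becomes a space
_DIGRAPH = re.compile(r"[ao]e")         # British "ae"/"oe" spellings


def match_key(term: str) -> str:
    """Lossy key used only for matching, never for display.

    Assumption: repeatedly collapsing "ae"/"oe" to "e" makes British and
    American spellings collide, e.g. "angio-oedema" and "angioedema" both
    become "angiedema". Unrelated words can occasionally collide too.
    """
    s = _DROP.sub("", term.lower())
    s = " ".join(_NON_WORD.sub(" ", s).split())
    while True:
        collapsed = _DIGRAPH.sub("e", s)
        if collapsed == s:
            return s
        s = collapsed


@dataclass
class Link:
    name: str
    topic: str
    condition_ids: set[int] = field(default_factory=set)


def build_alias_index(aliases: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each normalised alias to the condition id(s) that use it."""
    index: dict[str, set[int]] = {}
    for condition_id, names in aliases.items():
        for name in names:
            index.setdefault(match_key(name), set()).add(condition_id)
    return index


def search(query: str, links: list[Link], index: dict[str, set[int]]) -> list[str]:
    """Return link names: condition matches first, then name matches, then topic matches."""
    key = match_key(query)
    if not key:
        return []
    wanted = index.get(key, set())
    ranked = []
    for link in links:
        if link.condition_ids & wanted:
            tier = 0
        elif key in match_key(link.name):
            tier = 1
        elif key in match_key(link.topic):
            tier = 2
        else:
            continue
        ranked.append((tier, link.name))
    return [name for _, name in sorted(ranked)]


# Made-up sample data for illustration only.
SAMPLE_INDEX = build_alias_index({1: ["Angioedema", "Quincke's oedema"]})
SAMPLE_LINKS = [
    Link("Referral form", "Angio-oedema and urticaria"),
    Link("ACE inhibitor angioedema", "Pharmacology"),
    Link("C1 esterase inhibitor deficiency", "Immunology", {1}),
]
print(search("angio-oedema", SAMPLE_LINKS, SAMPLE_INDEX))
# ['C1 esterase inhibitor deficiency', 'ACE inhibitor angioedema', 'Referral form']
```

Because the query and every stored alias pass through the same function, the key never has to be a real word; it only has to be identical for variants that should match. In a database you would store it in an extra indexed column next to each alias.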

My main problem is that there are thousands, if not tens of thousands, of conditions, each with up to 20 synonyms/spellings. One option is to get the data from MeSH (Medical Subject Headings), which is essentially a medical thesaurus; however, it is in American English only, so there would have to be a way of converting from British English. The trouble is that the XML they provide is INSANE: about 250 MB. To help, they publish a guide to what the data elements are.
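A 250 MB file does not have to be loaded into memory at once: Python's `xml.etree.ElementTree.iterparse` can stream it and hand back one descriptor record at a time. The sketch below extracts each descriptor's unique ID, preferred name and all of its term strings (the synonyms). The element paths (`DescriptorRecord`, `DescriptorUI`, `DescriptorName/String`, `ConceptList/Concept/TermList/Term/String`) are my reading of the MeSH descriptor format and should be checked against NLM's data-element guide; the function name is hypothetical and the sample record is invented.

```python
import io
import xml.etree.ElementTree as ET
from collections.abc import Iterator


def iter_mesh_descriptors(source) -> Iterator[tuple[str, str, list[str]]]:
    """Stream (descriptor UI, preferred name, sorted term strings) from MeSH XML.

    `source` is a filename or binary file object. The element paths are
    assumptions based on the MeSH descriptor layout; verify them against
    NLM's data-element guide for the year you download.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "DescriptorRecord":
            continue
        ui = elem.findtext("DescriptorUI", default="")
        name = elem.findtext("DescriptorName/String", default="")
        terms = sorted({
            s.text.strip()
            for s in elem.iterfind("ConceptList/Concept/TermList/Term/String")
            if s.text
        })
        yield ui, name, terms
        elem.clear()  # drop the finished record so memory use stays roughly flat


# Made-up sample in the same shape as a descriptor record (not real MeSH data).
SAMPLE = b"""<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>DX0001</DescriptorUI>
    <DescriptorName><String>Angioedema</String></DescriptorName>
    <ConceptList>
      <Concept>
        <TermList>
          <Term><String>Angioedema</String></Term>
          <Term><String>Quincke Edema</String></Term>
        </TermList>
      </Concept>
    </ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""

for ui, name, terms in iter_mesh_descriptors(io.BytesIO(SAMPLE)):
    print(ui, name, terms)  # DX0001 Angioedema ['Angioedema', 'Quincke Edema']
```

Each tuple maps naturally onto one row in a conditions table plus one alias row per term. MeSH covers far more than diseases, so you would probably also filter on each descriptor's tree numbers (category C is Diseases) before importing; British spellings still need separate handling because the terms are American English.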

Honestly, I am at a loss as to how to tackle this most effectively: I've only just started programming and working with databases, and most of the options I can see seem either difficult or suboptimal.

Was wondering if anyone could give me a hand? Happy to clarify anything that is unclear.

Comments (2)

笙痞 2024-10-18 23:53:11

Your problem is well suited to a full-text search engine such as Lucene, which indexes documents made up of named fields. For example, you could index each link as a document with the fields:

  • Link
  • Topic
  • Conditions

  1. Then you can write a Lucene query such as Topic:edema and get all matching documents. Wildcard queries (e.g. edema*) widen the search further.

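  2. To match British spellings (or even misspellings) you can use the ~ (fuzzy) operator, which finds terms within a certain edit distance of the search term. For example, edema~2 matches oedema, oedoema and so on. (Lucene versions before 4.0 took a minimum similarity between 0 and 1 instead, as in edema~0.5; current versions treat the number as the maximum number of edits, from 0 to 2.)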
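What the ~ operator measures is edit distance: the number of single-character insertions, deletions or substitutions needed to turn one term into another (Lucene's fuzzy matching also counts a transposition of two adjacent characters as one edit by default). A plain Levenshtein implementation, written here as an illustration rather than Lucene's own code, shows why edema~1 already catches "oedema" while "oedoema" needs edema~2, and also why fuzzy matching is no substitute for a synonym list: "eczema" is just as close, and "Quincke's edema" is nowhere near "angioedema".

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions
    needed to turn `a` into `b` (classic dynamic programming, two rows at a time)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


for term in ("oedema", "oedoema", "eczema"):
    print(term, levenshtein("edema", term))
# oedema 1
# oedoema 2
# eczema 2
```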

Apache Lucene is a Java library, with ports available for several other languages. Apache Solr is a full-fledged search server built on Lucene; because it exposes a RESTful HTTP API, it is easy to integrate with whatever platform you use.

Summary: my recommendation is to use Apache Solr alongside your MySQL database.
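As a rough sketch of calling Solr from your application, the functions below build a request to Solr's standard /select handler using only the Python standard library. They assume a core named `links` and fields named `conditions`, `name` and `topic` (all hypothetical), and use query-time boosts (^3, ^2) to push condition matches above link-name matches, and those above topic matches, approximating the ordering asked for in the question. The boost values are arbitrary starting points. Rather than fuzzy-matching every word, Solr can also expand synonyms at index or query time with its synonym graph filter, fed from a synonyms file you could generate from MeSH.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

# Characters with special meaning in Lucene's classic query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape(token: str) -> str:
    """Backslash-escape Lucene query syntax characters in user input."""
    return _SPECIAL.sub(r"\\\1", token)


def build_query(user_term: str, max_edits: int = 1) -> str:
    """Fuzzy-match every word, boosting condition hits over name and topic hits.

    Hypothetical field names (conditions, name, topic); boosts 3/2/1 are
    arbitrary values to tune.
    """
    words = [w for w in re.split(r"[\s\-]+", user_term) if w]
    if not words:
        raise ValueError("empty search term")
    fuzzy = " ".join(f"{escape(w)}~{max_edits}" for w in words)
    return f"conditions:({fuzzy})^3 OR name:({fuzzy})^2 OR topic:({fuzzy})"


def solr_select_url(base_url: str, core: str, user_term: str) -> str:
    """URL for Solr's standard /select handler; `core` is whatever you named it."""
    params = {"q": build_query(user_term), "rows": 20}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983", "links", "oedema")
print(parse_qs(urlparse(url).query)["q"][0])
# conditions:(oedema~1)^3 OR name:(oedema~1)^2 OR topic:(oedema~1)
```

Sending the request is a single HTTP GET (for example with `urllib.request.urlopen`) against a running Solr instance; 8983 is Solr's default port.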

反差帅 2024-10-18 23:53:11

It's hard. Your best bet is to use MeSH, and then perhaps Soundex to match British English terms.
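One caveat about Soundex: it keeps the first letter of a word unchanged and encodes only the consonants that follow, so it equates "haemorrhage" with "hemorrhage" but not "oedema" with "edema" (or "oesophagus" with "esophagus"), where the British spelling adds a leading letter. It is also coarse, since many unrelated words share a code. Python has no built-in Soundex, so the following is a minimal sketch of the standard American Soundex rules; the function name is my own.

```python
# American Soundex digit groups; vowels, y, h and w have no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    last = _CODES.get(letters[0], "")
    digits = []
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != last:
            digits.append(code)
        last = code  # a vowel resets `last`, so a repeated code counts again
    return (letters[0].upper() + "".join(digits) + "000")[:4]


for word in ("haemorrhage", "hemorrhage", "oedema", "edema"):
    print(word, soundex(word))
# haemorrhage H562
# hemorrhage H562
# oedema O350
# edema E350
```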
