How to map a word to the sentences that use it in a database?
We are trying to build a website for our school project. The site will let a user enter a word and get back the sentences that use that word, along with their translations. How can we design an effective database that maps a word to the sentences it is used in? We could simply create a table mapping words to sentence IDs, but that seems no better than writing everything into a file.
Any ideas?
Using a relational database for this is going into "when all you have is a hammer" territory. Searching for strings in text is a special case, and many database engines have dedicated full-text search tools for it.
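As one example of those engine-specific tools, here is a minimal sketch using Python's built-in `sqlite3` module and SQLite's FTS5 full-text extension. It assumes your SQLite build was compiled with FTS5, which most are but not all, so the code checks for it; the table name `sentences_fts`, the column `body` and the sample sentences are hypothetical.

```python
import sqlite3

# Hypothetical sample data for illustration only.
SENTENCES = [
    "The cat sat on the mat.",
    "Dogs and cats are common pets.",
    "She bought a new cat toy.",
]

def build_index(conn: sqlite3.Connection) -> bool:
    """Create an FTS5 table of sentences; return False if FTS5 is not compiled in."""
    try:
        conn.execute("CREATE VIRTUAL TABLE sentences_fts USING fts5(body)")
    except sqlite3.OperationalError:
        return False  # e.g. "no such module: fts5"
    conn.executemany(
        "INSERT INTO sentences_fts(body) VALUES (?)", [(s,) for s in SENTENCES]
    )
    return True

def find_sentences(conn: sqlite3.Connection, word: str) -> list[str]:
    """Return sentences containing `word` as a whole token (case-insensitive)."""
    # Quote the word so FTS5 treats it as a literal string, not as query syntax.
    query = '"' + word.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT body FROM sentences_fts WHERE sentences_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [body for (body,) in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    if build_index(conn):
        print(find_sentences(conn, "cat"))
    else:
        print("This SQLite build has no FTS5 support")
```

With the default `unicode61` tokenizer, "cat" does not match "cats"; declaring the table with `tokenize = 'porter unicode61'` adds English stemming so both forms match.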
Nevertheless, if you want to use an RDBMS, your idea of keeping a table of sentences and a table of words, with an intersection table linking the two, makes sense from a couple of perspectives. Since you're also dealing with translations, you want a word table so you can link words in different languages. Also, if your word-to-sentence table stores a sequence number (the word's position in the sentence), you can use it to find sentences where two words appear near each other (within a given distance).
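The layout described in this answer can be sketched in SQLite through Python's `sqlite3` module. The table and column names (`sentences`, `words`, `sentence_words.position`, `word_translations`) and the naive regex tokenizer are assumptions for illustration; the proximity search is a self-join on the intersection table that compares positions.

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE sentences (
    id   INTEGER PRIMARY KEY,
    lang TEXT NOT NULL,
    text TEXT NOT NULL
);
CREATE TABLE words (
    id   INTEGER PRIMARY KEY,
    lang TEXT NOT NULL,
    word TEXT NOT NULL,
    UNIQUE (lang, word)
);
-- Intersection table: one row per occurrence, with the word's position in the sentence.
CREATE TABLE sentence_words (
    sentence_id INTEGER NOT NULL REFERENCES sentences(id),
    word_id     INTEGER NOT NULL REFERENCES words(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (sentence_id, position)
);
CREATE INDEX sentence_words_by_word ON sentence_words(word_id);
-- Links a word to an equivalent word in another language.
CREATE TABLE word_translations (
    word_id        INTEGER NOT NULL REFERENCES words(id),
    translation_id INTEGER NOT NULL REFERENCES words(id),
    PRIMARY KEY (word_id, translation_id)
);
"""

def tokenize(text):
    # Naive tokenizer (assumption): lowercase runs of letters/digits.
    return re.findall(r"\w+", text.lower())

def word_id(conn, lang, word):
    """Return the id of (lang, word), inserting the word if it is new."""
    word = word.lower()
    conn.execute("INSERT OR IGNORE INTO words(lang, word) VALUES (?, ?)", (lang, word))
    row = conn.execute("SELECT id FROM words WHERE lang = ? AND word = ?", (lang, word)).fetchone()
    return row[0]

def add_sentence(conn, lang, text):
    sid = conn.execute("INSERT INTO sentences(lang, text) VALUES (?, ?)", (lang, text)).lastrowid
    for pos, token in enumerate(tokenize(text)):
        conn.execute("INSERT INTO sentence_words VALUES (?, ?, ?)", (sid, word_id(conn, lang, token), pos))
    return sid

def link_translation(conn, lang_a, word_a, lang_b, word_b):
    a, b = word_id(conn, lang_a, word_a), word_id(conn, lang_b, word_b)
    # Store both directions so lookups work from either language.
    conn.executemany("INSERT OR IGNORE INTO word_translations VALUES (?, ?)", [(a, b), (b, a)])

def sentences_with(conn, lang, word):
    rows = conn.execute(
        """SELECT DISTINCT s.id, s.text FROM sentences s
           JOIN sentence_words sw ON sw.sentence_id = s.id
           JOIN words w ON w.id = sw.word_id
           WHERE w.lang = ? AND w.word = ? ORDER BY s.id""",
        (lang, word.lower()),
    )
    return [text for _, text in rows]

def sentences_near(conn, lang, a, b, max_distance):
    # Self-join: both words in the same sentence, at most max_distance positions apart.
    rows = conn.execute(
        """SELECT DISTINCT s.id, s.text FROM sentence_words x
           JOIN sentence_words y ON y.sentence_id = x.sentence_id AND y.position <> x.position
           JOIN words wa ON wa.id = x.word_id
           JOIN words wb ON wb.id = y.word_id
           JOIN sentences s ON s.id = x.sentence_id
           WHERE wa.lang = ? AND wa.word = ? AND wb.lang = ? AND wb.word = ?
             AND abs(x.position - y.position) <= ? ORDER BY s.id""",
        (lang, a.lower(), lang, b.lower(), max_distance),
    )
    return [text for _, text in rows]

def translations_of(conn, lang, word):
    return conn.execute(
        """SELECT t.lang, t.word FROM words w
           JOIN word_translations wt ON wt.word_id = w.id
           JOIN words t ON t.id = wt.translation_id
           WHERE w.lang = ? AND w.word = ? ORDER BY t.lang, t.word""",
        (lang, word.lower()),
    ).fetchall()
```

For example, after `conn.executescript(SCHEMA)` and `add_sentence(conn, "en", "The black cat sat on the mat.")`, calling `sentences_near(conn, "en", "black", "cat", 1)` returns that sentence, because "black" and "cat" sit at adjacent positions. The index on `sentence_words(word_id)` is what keeps word lookups from scanning the whole intersection table.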
You can also use something like Apache Solr for this kind of task. Tools like this are called full-text search engines, and they are designed specifically for this job. They can even match words with different prefixes or suffixes according to a specific language's rules, or match words with the same meaning (for example, searching for "tool" and finding sentences containing "device").
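To illustrate what such an engine does internally, here is a toy inverted index in plain Python. It is not Solr's API: the plural-stripping "stemmer", the `SYNONYMS` table and the `InvertedIndex` class are hypothetical stand-ins for the language-aware analyzers and synonym filters a real search engine provides.

```python
import re
from bisect import bisect_left
from collections import defaultdict

# Hypothetical synonym table; real engines load synonym lists from configuration.
SYNONYMS = {"tool": {"device", "instrument"}}

def stem(token):
    """Toy English 'stemmer' (assumption): strip a plural -s, but not -ss."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def analyze(text):
    # Tokenize, lowercase and stem, roughly like an engine's analyzer chain.
    return [stem(tok) for tok in re.findall(r"\w+", text.lower())]

class InvertedIndex:
    def __init__(self):
        self.sentences = []               # sentence id -> original text
        self.postings = defaultdict(set)  # analyzed term -> ids of sentences containing it

    def add(self, text):
        sid = len(self.sentences)
        self.sentences.append(text)
        for term in analyze(text):
            self.postings[term].add(sid)
        return sid

    def _texts(self, ids):
        return [self.sentences[i] for i in sorted(ids)]

    def search(self, word, expand_synonyms=True):
        term = stem(word.lower())
        terms = {term}
        if expand_synonyms:
            terms |= {stem(s) for s in SYNONYMS.get(term, ())}
        ids = set()
        for t in terms:
            ids |= self.postings.get(t, set())
        return self._texts(ids)

    def search_prefix(self, prefix):
        # Sorting on every call keeps the sketch short; real engines keep a sorted term dictionary.
        terms = sorted(self.postings)
        prefix = prefix.lower()
        ids = set()
        for term in terms[bisect_left(terms, prefix):]:
            if not term.startswith(prefix):
                break
            ids |= self.postings[term]
        return self._texts(ids)
```

With sentences such as "This tool cuts wood." and "Two devices were broken." indexed, `search("tool")` returns both (the second through the synonym table), and `search_prefix("tool")` would also pick up a sentence containing "toolbox".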