How can I make a small engine like Wolfram|Alpha?
Let's say I have three models/tables: operating_systems, words, and programming_languages:
# operating_systems
name:string     created_by:string    family:string
Windows         Microsoft            MS-DOS
Mac OS X        Apple                UNIX
Linux           Linus Torvalds       UNIX
UNIX            AT&T                 UNIX

# words
word:string     definitions:string
window          (serialized hash of definitions)
hello           (serialized hash of definitions)
UNIX            (serialized hash of definitions)

# programming_languages
name:string     created_by:string    example_code:text
C++             Bjarne Stroustrup    #include <iostream> etc...
HelloWorld      Jeff Skeet           h
AnotherOne      Jon Atwood           imports 'SORULEZ.cs' etc...
When a user searches for 'hello', the system shows the definitions of 'hello'. This is relatively easy to implement. However, when a user searches for 'UNIX', the engine must choose between word and operating_system. Also, when a user searches for 'windows' (lowercase 'w'), the engine chooses word, but should also show "Assuming 'windows' is a word. Use as an operating system instead", where "operating system" links to the operating-system result.
Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks.
Note: it doesn't need to be able to perform calculations as WA can do.
Wolfram Alpha is far more complex than your example... I'm not certain of its inner workings (I have done very little reading on it), but I believe it is a very large and complex automated inference system. They're rather trivial to implement (Prolog is basically a general purpose one you can put whatever data you need into), but they're very hard to make useful.
Have a new index table called terms that contains a tokenised version of each valid term. That way, you only have to search one table. Then you can see how close a match the user's search term is, i.e. "Windows" would be a 100% match with 2, so assume that, but it is also a close match to 1, so suggest that as an alternative. You'd have to write your own rules engine that decides how closely a word matches (i.e. what gets assumed with "windows" vs "Windows"?). The Priority field could be the final decider if the rules engine can't decide, and could in theory be driven by user activity so it learns what users are more likely referring to.
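A minimal sketch of the terms-table idea, assuming an in-memory Python list stands in for the database table. The Term fields, the match scores (1.0 / 0.8 / 0.6), the crude plural rule and the priority values are all hypothetical illustrations, not taken from the answer; they only show how a score plus a Priority tie-breaker can produce an "assumed" topic and alternatives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical human-readable labels for each source table.
LABELS = {
    "operating_systems": "an operating system",
    "words": "a word",
    "programming_languages": "a programming language",
}


@dataclass(frozen=True)
class Term:
    """One row of a hypothetical unified terms index table."""
    token: str      # the term as stored, e.g. "Windows"
    model: str      # source table the term points into
    record_id: int  # id of the row in that source table
    priority: int   # tie-breaker; could later be tuned from user clicks


def match_score(query: str, token: str) -> float:
    """Toy rules engine: how closely does the query match a stored token?"""
    if query == token:
        return 1.0  # exact, case-sensitive match
    if query.lower() == token.lower():
        return 0.8  # same word, different case
    if query.lower().rstrip("s") == token.lower().rstrip("s"):
        return 0.6  # crude plural handling, e.g. "Windows" vs "window"
    return 0.0


def interpret(query: str, terms: List[Term]) -> Tuple[Optional[Term], List[Term]]:
    """Return (assumed term, alternative terms), best match first."""
    scored = [(match_score(query, t.token), t.priority, t) for t in terms]
    scored = [s for s in scored if s[0] > 0]
    # Highest score wins; Priority decides when the scores tie.
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    ranked = [t for _, _, t in scored]
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]


def describe(query: str, terms: List[Term]) -> str:
    """Build a Wolfram|Alpha-style 'Assuming ...' message."""
    assumed, alternatives = interpret(query, terms)
    if assumed is None:
        return f"No match for '{query}'."
    message = f"Assuming '{query}' is {LABELS[assumed.model]}."
    if alternatives:
        others = " or ".join(LABELS[t.model] for t in alternatives)
        message += f" Use as {others} instead."
    return message


TERMS = [
    Term("Windows", "operating_systems", 1, 5),
    Term("UNIX", "operating_systems", 4, 5),
    Term("window", "words", 1, 3),
    Term("hello", "words", 2, 3),
    Term("UNIX", "words", 3, 1),
]

print(describe("Windows", TERMS))
print(describe("UNIX", TERMS))
```

With these toy rules, describe("Windows", TERMS) assumes the operating system and offers the word as an alternative, and "UNIX" matches both tables exactly, so Priority decides. Lowercase "windows" still assumes the operating system here; making it prefer the word, as the question asks, is exactly the kind of rule the answer says you would have to add.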
What about making a cache in the form of a database table that holds all the keywords?
The search query would be something like this:
The keywords table would contain some kind of reference to your modules.
The advantage of this approach is, of course, fast searching.
You may use two queries in order to simulate the behaviour you ask for:
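A minimal sketch of the keyword-cache idea with two queries, using Python's built-in sqlite3 module for illustration. It is not the answerer's own query; the keywords table, its columns (model, record_id, priority) and the extra 'windows' row pointing at the word 'window' are hypothetical. The first query takes exact, case-sensitive matches as the assumed reading; the second adds case-insensitive matches not already found as alternatives.

```python
import sqlite3


def build_cache():
    """Create an in-memory keyword cache (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE keywords (
            keyword   TEXT    NOT NULL,  -- searchable form, e.g. 'windows'
            model     TEXT    NOT NULL,  -- source table, e.g. 'words'
            record_id INTEGER NOT NULL,  -- row id in that source table
            priority  INTEGER NOT NULL   -- tie-breaker between topics
        )
        """
    )
    # Index the keyword column under both collations used by lookup().
    conn.execute("CREATE INDEX idx_keyword ON keywords (keyword)")
    conn.execute("CREATE INDEX idx_keyword_nocase ON keywords (keyword COLLATE NOCASE)")
    conn.executemany(
        "INSERT INTO keywords VALUES (?, ?, ?, ?)",
        [
            ("Windows", "operating_systems", 1, 5),
            ("UNIX", "operating_systems", 4, 5),
            ("windows", "words", 1, 3),  # plural form pointing at 'window'
            ("hello", "words", 2, 3),
            ("UNIX", "words", 3, 1),
        ],
    )
    return conn


def lookup(conn, query):
    """Return (assumed (model, record_id), alternatives) using two queries."""
    # Query 1: exact, case-sensitive matches are the preferred reading.
    exact = conn.execute(
        "SELECT model, record_id FROM keywords "
        "WHERE keyword = ? ORDER BY priority DESC",
        (query,),
    ).fetchall()
    # Query 2: case-insensitive matches that differ in case become alternatives.
    others = conn.execute(
        "SELECT model, record_id FROM keywords "
        "WHERE keyword = ? COLLATE NOCASE AND keyword <> ? "
        "ORDER BY priority DESC",
        (query, query),
    ).fetchall()
    ranked = exact + others
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]


conn = build_cache()
print(lookup(conn, "windows"))
print(lookup(conn, "UNIX"))
```

Here a search for 'windows' assumes the word and offers the operating system as the alternative, which matches the behaviour the question asks for, while 'UNIX' matches both rows exactly, so the hypothetical priority column decides the order.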