How to make a small engine like Wolfram|Alpha?

Posted 2024-08-31 08:17:15 | 1104 characters | 16 views | 0 comments

Let's say I have three models/tables: operating_systems, words, and programming_languages:

# operating_systems
name:string  created_by:string  family:string
Windows      Microsoft          MS-DOS
Mac OS X     Apple              UNIX
Linux        Linus Torvalds     UNIX
UNIX         AT&T               UNIX

# words
word:string  definitions:string
window       (serialized hash of definitions)
hello        (serialized hash of definitions)
UNIX         (serialized hash of definitions)

# programming_languages
name:string  created_by:string  example_code:text
C++          Bjarne Stroustrup  #include <iostream> etc...
HelloWorld   Jeff Skeet         h
AnotherOne   Jon Atwood         imports 'SORULEZ.cs' etc...

When a user searches for hello, the system shows the definitions of 'hello'. This is relatively easy to implement. However, when a user searches for UNIX, the engine must choose between word and operating_system. Also, when a user searches for windows (lowercase 'w'), the engine chooses word, but it should also show: Assuming 'windows' is a word. Use as an <a href="etc..">operating system</a> instead.

Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks.


Note: it doesn't need to be able to perform calculations as WA can do.

Comments (3)

走野 2024-09-07 08:17:21

Wolfram Alpha is far more complex than your example... I'm not certain of its inner workings (I have done very little reading on it), but I believe it is a very large and complex automated inference system. Such systems are rather trivial to implement (Prolog is basically a general-purpose one that you can load with whatever data you need), but they're very hard to make useful.
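
To illustrate the "trivial to implement" point only, here is a minimal sketch of a forward-chaining inference loop in Python. It is not Prolog (which resolves queries by backward chaining) and says nothing about how Wolfram Alpha actually works. The triple format, the `is_a`/`unix_like` names, and the single rule are assumptions made up for this example, using the operating system data from the question.

```python
# Facts are (subject, relation, object) triples; this layout is hypothetical.
FACTS = {
    ("Linux", "family", "UNIX"),
    ("Mac OS X", "family", "UNIX"),
    ("Windows", "family", "MS-DOS"),
}

# Each rule is (premise pattern, conclusion pattern); "?x" marks a variable.
RULES = [
    (("?x", "family", "UNIX"), ("?x", "is_a", "unix_like")),
]


def unify(pattern, fact):
    """Return variable bindings if the pattern matches the fact, else None."""
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def substitute(pattern, bindings):
    """Replace variables in the pattern with their bound values."""
    return tuple(bindings.get(p, p) for p in pattern)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for fact in list(known):
                bindings = unify(premise, fact)
                if bindings is None:
                    continue
                derived = substitute(conclusion, bindings)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


derived_facts = forward_chain(FACTS, RULES)
```

The loop itself is short; as the answer says, the hard part is collecting enough facts and rules for the results to be useful.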

眼中杀气 2024-09-07 08:17:20

Have a new index table called terms that contains a tokenised version of each valid term. That way, you only have to search one table.

# terms
Id Name     Type               Priority
1  window   word               false
2  Windows  operating_system   true

Then you can see how close a match the user's search term is. For example, "Windows" would be a 100% match with 2, so assume that, but it is also a close match to 1, so suggest that as an alternative. You'd have to write your own rules engine that decides how closely a word matches (i.e. what gets assumed with "windows" vs "Windows"?). The Priority field could be the final decider if the rules engine can't decide, and could in theory be driven by user activity so that it learns what users are more likely referring to.
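
A minimal sketch of such a rules engine, in Python, assuming the terms table is loaded into memory. The score values (1.0, 0.9, 0.8), the crude trailing-"s" rule, and the names `Term`, `match_score`, and `resolve` are assumptions chosen for illustration, not part of the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: int
    name: str
    type: str
    priority: bool


# The two rows from the terms table above.
TERMS = [
    Term(1, "window", "word", False),
    Term(2, "Windows", "operating_system", True),
]


def match_score(query: str, name: str) -> float:
    """Hypothetical rules: exact > case-insensitive > same word ignoring trailing 's'."""
    if query == name:
        return 1.0
    q, n = query.lower(), name.lower()
    if q == n:
        return 0.9
    if q.rstrip("s") == n.rstrip("s"):  # crude singular/plural check
        return 0.8
    return 0.0


def resolve(query: str):
    """Return (best match, alternatives); Priority breaks ties between equal scores."""
    scored = [(match_score(query, t.name), t) for t in TERMS]
    scored = [(s, t) for s, t in scored if s > 0]
    if not scored:
        return None, []
    scored.sort(key=lambda st: (st[0], st[1].priority), reverse=True)
    best = scored[0][1]
    alternatives = [t for _, t in scored[1:]]
    return best, alternatives
```

With these weights, `resolve("Windows")` picks the operating system and offers the word as an alternative, while `resolve("window")` does the reverse. A lowercase `windows` also resolves to the operating system here; if the word should win in that case, the scores would need to be adjusted, which is exactly the kind of decision the rules engine has to encode.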

油焖大侠 2024-09-07 08:17:20

What about making a cache in the form of a database table that holds all the keywords?

The search query would be something like this:

SELECT * FROM keywords WHERE keyword = '<YourKeyWord>'   /* mysql */

The keywords table would contain some kind of reference to your modules.

The advantage of this approach is, of course, fast searching.

You may use two queries in order to simulate the behaviour you ask for:

  • Exact match (no problem in mysql)
  • Case insensitive search
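
The two-query idea can be sketched as follows. This example uses Python's built-in sqlite3 module rather than MySQL so that it runs as-is; the `module` column and the sample rows are assumptions based on the tables in the question. In MySQL, whether `=` is case-sensitive depends on the column's collation, so the exact-match query may need a case-sensitive or binary collation there.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory keywords table (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keywords (keyword TEXT, module TEXT)")
    conn.executemany(
        "INSERT INTO keywords VALUES (?, ?)",
        [
            ("window", "words"),
            ("hello", "words"),
            ("UNIX", "words"),
            ("UNIX", "operating_systems"),
            ("Windows", "operating_systems"),
        ],
    )
    return conn


def lookup(conn: sqlite3.Connection, query: str):
    """Return (exact matches, case-insensitive-only matches) for the query."""
    # Query 1: exact match (SQLite compares TEXT case-sensitively by default).
    exact = conn.execute(
        "SELECT keyword, module FROM keywords WHERE keyword = ? ORDER BY module",
        (query,),
    ).fetchall()
    # Query 2: case-insensitive match, excluding rows already found above.
    loose = conn.execute(
        "SELECT keyword, module FROM keywords "
        "WHERE LOWER(keyword) = LOWER(?) AND keyword <> ? ORDER BY module",
        (query, query),
    ).fetchall()
    return exact, loose
```

Rows from the first query can be shown as the assumed meaning and rows from the second as "did you mean" alternatives. A search for `UNIX` returns two exact rows, one per module, so some further rule is still needed to pick between them.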