Thesaurus class or API for PHP
TL;DR Summary: I need a single command-line application which I can use to get synonyms and other related words. It needs to be multilingual and work cross-platform. Can anyone suggest a suitable program for me, or help me with the ones I've already found? Thanks.
Longer version:
I've been tasked with writing a system in PHP that can come up with alternative suggestions for words entered by the user. I need to find a thesaurus application / API or similar which I can use to generate these suggestions.
Importantly, it needs to be multilingual (English, Danish, French and German). This rules out most of the software that I managed to find using Google. It also needs to be cross-platform (it needs to work on Linux and Windows).
My research has led me to two promising candidates: WordNet and Stardict.
I've been focusing on WordNet so far, calling it from PHP using the shell_exec() function, and I've managed to use it to create a very promising prototype PHP page, but so far in English only. I'm struggling with how to make it multilingual.
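For readers unfamiliar with this setup, here is a minimal sketch of what calling WordNet through shell_exec() might look like. It assumes the WordNet command-line tool is installed as `wn` on the PATH and that its `-synsn`/`-synsv`/`-synsa`/`-synsr` output lists each sense's synonyms on the line after a "Sense N" header. The function names are hypothetical and this is not the asker's actual prototype.

```php
<?php
declare(strict_types=1);

/**
 * Build the shell command for a WordNet synonym lookup.
 * $pos is the part of speech: n (noun), v (verb), a (adjective), r (adverb).
 */
function buildWnCommand(string $word, string $pos = 'n'): string
{
    if (!in_array($pos, ['n', 'v', 'a', 'r'], true)) {
        throw new InvalidArgumentException("Unknown part of speech: {$pos}");
    }
    // escapeshellarg() keeps user input from being interpreted by the shell.
    return 'wn ' . escapeshellarg($word) . ' -syns' . $pos;
}

/**
 * Extract synonyms from wn output.
 * Assumption: the line right after each "Sense N" header is a
 * comma-separated list of synonyms for that sense.
 */
function parseWnSynonyms(string $output, string $word): array
{
    $synonyms = [];
    $lines = preg_split('/\R/', $output) ?: [];
    foreach ($lines as $i => $line) {
        if (!preg_match('/^Sense \d+/', $line) || !isset($lines[$i + 1])) {
            continue;
        }
        foreach (explode(',', $lines[$i + 1]) as $term) {
            $term = trim($term);
            // Skip blanks, the query word itself, and duplicates.
            if ($term !== '' && strcasecmp($term, $word) !== 0
                && !in_array($term, $synonyms, true)) {
                $synonyms[] = $term;
            }
        }
    }
    return $synonyms;
}

/** Run wn and return the parsed synonyms (empty array if wn produced no output). */
function lookupSynonyms(string $word, string $pos = 'n'): array
{
    $output = shell_exec(buildWnCommand($word, $pos));
    return is_string($output) ? parseWnSynonyms($output, $word) : [];
}
```

Keeping the parsing separate from the shell call means a different backend (for example another language's wordnet tool) could be swapped in behind the same `lookupSynonyms()` interface, which is the consistency problem described in this question.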
The WordNet site has external links to WordNet projects in other languages (e.g. DanNet for Danish), but although they're often called WordNet, they seem to use a variety of database formats and software, which makes them unsuitable for me. I need a consistent interface that I can call from my PHP program.
Stardict looked more promising from that perspective: they provide dictionaries in many languages in a standard DB format for the one application.
But the downside of Stardict is that it's primarily a GUI app. Calling it from the command line launches the GUI. There is apparently a command-line version (SDCV), but it seems quite out of date (last update 2006), and only for Linux.
Can anyone help me with my problems with either of these programs? Or else, can anyone suggest any other alternative software or API that I could use?
Many thanks.
Comments (3)
You could try to leverage PostgreSQL's full text search functionality:
http://www.postgresql.org/docs/9.0/static/textsearch.html
You can configure it with any of the available languages and all sorts of collations to fit your needs. PostgreSQL 9.1 adds some extra collation functionality that you may want to look into if the approach seems reasonable.
The basic steps would be (for each language):
Create the needed table (collated appropriately). For our sake, a single column is enough, e.g.:
Fetch the needed dictionary/thesaurus files (those from aspell/Open-Office should work).
Configure text search (see link above, namely section 12.6) using the relevant files.
Insert the whole dictionary into the table. (Surely there's a csv file somewhere...)
And finally index the vector, e.g.:
You can now run queries that use this index:
You might need to create a separate database or schema for each language, and add an additional field (tsvector) if Postgres refuses to index the expression because of the language parameter. (I read the full-text docs a long time ago.) The details on this would be in section 12.2, and I'm sure you'll know how to adjust the above if this is the case.
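The code snippets for the steps above did not survive on this page, so here is a rough sketch of how they might look when driven from PHP via PDO. Everything in it is an assumption made for illustration: the per-language table names (`words_en` etc.), the function names, and the use of PostgreSQL's built-in `english`/`danish`/`french`/`german` text search configurations. Those built-in configurations only stem words; to get real synonyms you would substitute a custom configuration built from thesaurus files as described in section 12.6.

```php
<?php
declare(strict_types=1);

// Hypothetical mapping from language code to a text search configuration.
// Replace the values with your own thesaurus-backed configurations.
const FTS_CONFIGS = [
    'en' => 'english',
    'da' => 'danish',
    'fr' => 'french',
    'de' => 'german',
];

/** Look up the configuration name; rejects anything not in the whitelist. */
function ftsConfig(string $lang): string
{
    if (!isset(FTS_CONFIGS[$lang])) {
        throw new InvalidArgumentException("Unsupported language: {$lang}");
    }
    return FTS_CONFIGS[$lang];
}

/** SQL for the single-column word table and its expression index. */
function ftsSchemaSql(string $lang): array
{
    $config = ftsConfig($lang);
    $table  = "words_{$lang}";
    return [
        "CREATE TABLE {$table} (word text PRIMARY KEY)",
        // The two-argument to_tsvector() names the configuration explicitly,
        // which is what allows it to be used in an index expression.
        "CREATE INDEX {$table}_fts ON {$table} USING gin (to_tsvector('{$config}', word))",
    ];
}

/** Return all stored words whose lexemes match those of $term. */
function findRelatedWords(PDO $db, string $lang, string $term): array
{
    $config = ftsConfig($lang);
    // The WHERE expression repeats the index expression so the index can be used.
    $stmt = $db->prepare(
        "SELECT word FROM words_{$lang}
          WHERE to_tsvector('{$config}', word) @@ plainto_tsquery('{$config}', :term)"
    );
    $stmt->execute([':term' => $term]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

A caller would run each statement from `ftsSchemaSql()` once with `$db->exec()`, load the dictionary into the table, and then call `findRelatedWords($db, 'da', $userInput)`. Only the whitelisted language code is interpolated into the SQL; the user's search term is always bound as a parameter.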
Whatever the implementation details, though, I believe the approach should work.
There is a PHP example of thesaurus API usage here:
http://thesaurus.altervista.org/testphp
Available for Italian, English, French, German, Spanish and Portuguese.
This seems to be an option, though I'm not sure whether it's multilingual:
http://developer.dictionary.com/products/synonyms
I also found the following site, which does something similar to your end goal. Maybe you could try contacting the owner and asking how he did it:
http://www.synonymlab.com/