Is there an open-source text analysis library for PHP?

Posted 2024-08-15 10:41:02

Comments (3)

诺曦 2024-08-22 10:41:02

Short Answer

As far as I'm aware there isn't one, or at least not a well-known,
widely distributed one.

Long Answer

The closest thing to a de facto standard I've come across is php-text-statistics by Dave
Child
(the PEAR version has been unmaintained for years), but
it only takes care of readability and sentence, word and syllable counting.
Any other data you'd have to gather yourself with count_chars,
str_word_count, substr_count,
preg_match_all and the like. And of course you'd need some math
skills to calculate all the percentages.
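
As a rough illustration of the do-it-yourself route, here is a minimal sketch that combines those built-ins. The function name text_stats, the keys of the returned array, and the sentence-splitting rule are all assumptions made for this example; none of them come from php-text-statistics or any other library. It also assumes plain single-byte (ASCII) text.

```php
<?php
// Hypothetical helper, not part of any library: basic statistics built
// from PHP's own string functions. Assumes single-byte (ASCII) text.
function text_stats(string $text): array
{
    $lower = strtolower($text);

    // Format 1 makes str_word_count() return the words instead of a count.
    $words = str_word_count($lower, 1);
    $wordCount = count($words);

    // Crude sentence count (an assumption): every run of '.', '!' or '?'
    // ends a sentence, so abbreviations will be over-counted.
    $sentenceCount = (int) preg_match_all('/[.!?]+/', $text);

    // Mode 1 makes count_chars() return [byte value => occurrences]
    // for the bytes that actually occur; keep only a-z (97..122).
    $letters = [];
    foreach (count_chars($lower, 1) as $byte => $n) {
        if ($byte >= 97 && $byte <= 122) {
            $letters[chr($byte)] = $n;
        }
    }

    // Share of each distinct word, as a percentage of all words.
    // The loop body never runs for empty input, so no division by zero.
    $wordPercent = [];
    foreach (array_count_values($words) as $word => $n) {
        $wordPercent[$word] = round(100 * $n / $wordCount, 2);
    }
    arsort($wordPercent); // most frequent first

    return [
        'words' => $wordCount,
        'sentences' => $sentenceCount,
        'letters' => $letters,
        'word_percent' => $wordPercent,
    ];
}

$text = 'The cat sat. The cat ran! Did the dog run?';
print_r(text_stats($text));

// substr_count() counts non-overlapping occurrences of a fixed phrase.
print substr_count(strtolower($text), 'the cat'); // prints 2
```

For UTF-8 input the byte-oriented calls (count_chars, str_word_count, strtolower) would need multibyte-aware replacements, which is part of why a ready-made library would be nice to have.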

That said, I'm not sure exactly what it is you'd want the library to do, or
what http://textalyser.net/ does...
(I mean, what is a stoplist anyway? Or an exhaustive polyword phrase, for that
matter...?)

玉环 2024-08-22 10:41:02

Pretty old question...
Anyway, if you want to find the similarity between two given strings, PHP has a built-in function, similar_text.

SYNTAX: similar_text ($first, $second, &$percent = null);

// Find the similarity between two strings as a percentage
$pc = 0;
similar_text('You rock!', 'I Rock too!!', $pc);
print $pc;

OUTPUT: 47.619047619048

As shown above, this value is the percentage by which the two strings are similar.

Note: This function is case sensitive.

$pc = 0;
similar_text('you', 'YOU',$pc);
print $pc;  

will give 0!
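
If case should not matter, one simple workaround is to lowercase both strings before comparing them. This is only a sketch: similar_text_ci is a made-up name for the example, not a built-in function.

```php
<?php
// Hypothetical wrapper (the name is an assumption, not a PHP built-in):
// lowercases both inputs before handing them to similar_text().
function similar_text_ci(string $a, string $b): float
{
    $percent = 0.0;
    // strtolower() is only reliable for ASCII letters; for UTF-8 text
    // mb_strtolower() from the mbstring extension would be the safer choice.
    similar_text(strtolower($a), strtolower($b), $percent);
    return (float) $percent;
}

print similar_text_ci('you', 'YOU'); // prints 100
```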

Two more such functions I came across are
levenshtein() (Levenshtein distance) and soundex().
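
A quick sketch of how those two can be used. levenshtein() and soundex() are the built-ins; closest_word is a hypothetical helper written for this example, in the spirit of a "did you mean" lookup.

```php
<?php
// levenshtein() returns the minimum number of single-character edits
// (insert, delete, replace) needed to turn one string into the other.
print levenshtein('kitten', 'sitting'); // prints 3
print "\n";

// soundex() maps a word to a four-character key based on how it sounds
// in English, so similar-sounding words share the same key.
print soundex('Robert'); // prints R163
print "\n";
var_dump(soundex('Robert') === soundex('Rupert')); // bool(true)

// Hypothetical helper (not a built-in): pick the candidate with the
// smallest Levenshtein distance to the input, ignoring ASCII case.
function closest_word(string $input, array $candidates): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($candidates as $candidate) {
        $distance = levenshtein(strtolower($input), strtolower($candidate));
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }
    return $best; // null when the candidate list is empty
}

print closest_word('carrrot', ['apple', 'carrot', 'banana']); // prints carrot
```

Unlike similar_text, levenshtein() returns a distance, so lower means more similar.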

For more information, please check the official documentation.

玉环 2024-08-22 10:41:02

I'm looking for a similar thing at the moment, and the options I have found so far are to make use of the Zend Lucene Search functionality

http://www.opendocs.net/php/zend/ZendFramework-0.1.5/documentation/end-user/pl/zend.search.extending.html

or alternatively to look into the Open Calais API. http://www.opencalais.com/ (from Thomson Reuters)

I have yet to explore either option in any depth, but I think there might be some mileage in them for building dynamic links or hyper(con)textualising web pages.
