Most used words in a text with PHP
I found the code below on Stack Overflow and it works well for finding the most common words in a string. But can I exclude common words like "a", "if", "you", "have", etc. from the count? Or would I have to remove those elements after counting? How would I do this? Thanks in advance.
<?php
$text = "A very nice to tot to text. Something nice to think about if you're into text.";
$words = str_word_count($text, 1);
$frequency = array_count_values($words);
arsort($frequency);
echo '<pre>';
print_r($frequency);
echo '</pre>';
?>
Comments (4)
This is a function that extracts common words from a string. It takes three parameters: the string, a stop-words array, and the keyword count. You have to load the stop words from a text file using a PHP function that reads a file into an array.
You can use this file, stop_words.txt, as your primary stop-words file, or create your own.
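The function this answer describes is not reproduced on this page, so the following is only a minimal sketch of what it might look like. The names load_stop_words() and extract_common_words() are hypothetical, and the sketch assumes stop_words.txt holds one word per line.

```php
<?php
// Hypothetical helper: read one stop word per line into a lowercase array.
function load_stop_words(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // file could not be read
    }
    return array_map('strtolower', array_map('trim', $lines));
}

// Hypothetical function: return up to $max_count words (word => count),
// most frequent first, skipping anything listed in $stop_words.
function extract_common_words(string $string, array $stop_words, int $max_count = 5): array
{
    $words = str_word_count(strtolower($string), 1);
    $words = array_diff($words, $stop_words);   // drop stop words before counting
    $frequency = array_count_values($words);
    arsort($frequency);                         // highest count first
    return array_slice($frequency, 0, $max_count, true);
}

$text = "A very nice to tot to text. Something nice to think about if you're into text.";
// In practice: $stop_words = load_stop_words('stop_words.txt');
$stop_words = ['a', 'to', 'if'];
print_r(extract_common_words($text, $stop_words, 2));
```

Lowercasing both the text and the stop words keeps "A" and "a" from being treated as different words.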
Here is my solution using the built-in PHP functions:
most_frequent_words - Find the most frequent word(s) in a string
Returns an array containing the word(s) that appear most frequently in the string.
Parameters:
string $string - The input string.
array $stop_words (optional) - List of words to filter out of the array. Defaults to an empty array.
int $limit (optional) - Maximum number of words returned. Defaults to 5.
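The body of most_frequent_words() is not shown on this page. A sketch that matches the documented signature, assuming case-insensitive matching and that the function returns only the words (not their counts), could look like this:

```php
<?php
// Sketch only: reconstructed from the parameter description, not the author's code.
function most_frequent_words(string $string, array $stop_words = [], int $limit = 5): array
{
    // Normalize case so "Nice" and "nice" count as the same word.
    $words = str_word_count(strtolower($string), 1);

    // Remove stop words (also lowercased) before counting.
    $words = array_diff($words, array_map('strtolower', $stop_words));

    $counts = array_count_values($words);
    arsort($counts); // highest count first

    // Keep the top $limit entries and return just the words.
    return array_keys(array_slice($counts, 0, $limit, true));
}

$text = "A very nice to tot to text. Something nice to think about if you're into text.";
print_r(most_frequent_words($text, ['A', 'to', 'if'], 2));
```

With this input and stop-word list, the two words returned are "nice" and "text", each of which appears twice.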
There are no additional parameters and no native PHP function to which you can pass words to exclude. As such, I would just use what you have and ignore a custom set of words returned by str_word_count().
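One way to read this suggestion is to count everything first and then drop the unwanted entries by key, which also covers the "remove the elements after counting" option from the question. This is a sketch under that assumption; the $ignore list is hypothetical.

```php
<?php
$text = "A very nice to tot to text. Something nice to think about if you're into text.";

// Hypothetical list of words to ignore (lowercase, to match the lowercased text).
$ignore = ['a', 'if', 'to', "you're"];

// Count every word first.
$frequency = array_count_values(str_word_count(strtolower($text), 1));

// Then remove the ignored words; array_flip() turns the list into keys
// so array_diff_key() can compare them against the word keys.
$frequency = array_diff_key($frequency, array_flip($ignore));

arsort($frequency);
print_r($frequency);
```

Filtering by key after counting touches each distinct word once, rather than every occurrence in the text.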
You can do this easily by using array_diff(). But you have to take care of lower and upper case yourself. The easiest way here would be to convert the text to lowercase beforehand.
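The code and output from this answer did not survive on this page. A minimal sketch of the array_diff() approach, with the text lowercased first and a hypothetical $stop_words list, might be:

```php
<?php
$text = "A very nice to tot to text. Something nice to think about if you're into text.";

// Hypothetical stop-word list, written in lowercase.
$stop_words = ['a', 'if', 'you', 'have', 'to'];

// Lowercase first so "A" matches the stop word "a".
$words = str_word_count(strtolower($text), 1);

// array_diff() keeps only the words that are not in $stop_words.
$words = array_diff($words, $stop_words);

$frequency = array_count_values($words);
arsort($frequency);

echo '<pre>';
print_r($frequency);
echo '</pre>';
```

Note that array_diff() compares whole strings, so the stop word "you" does not remove "you're"; contractions need their own entries in the list.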