How do I check if a string contains a specific word?
Consider:
$a = 'How are you?';
if ($a contains 'are')
echo 'true';
Suppose I have the code above, what is the correct way to write the statement if ($a contains 'are')?
Now with PHP 8 you can do this using str_contains:
Please note: The str_contains function will always return true if the $needle (the substring to search for in your string) is empty. You should first make sure the $needle (your substring) is not empty.
Output:
This returned false!
It's also worth noting that the new str_contains function is case-sensitive.
Output:
This returned false!
RFC
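The code for this answer was not preserved. Below is a minimal sketch of the three behaviours described above (basic use, the empty-needle case, and case sensitivity), assuming PHP 8.0 or later; the variable names are illustrative:

```php
<?php
// Assumes PHP 8.0+, where str_contains() is built in.
$a = 'How are you?';

if (str_contains($a, 'are')) {
    echo 'true', PHP_EOL;
}

// An empty needle always counts as "contained", so check for it first.
$needle = '';
if ($needle !== '' && str_contains($a, $needle)) {
    echo 'This returned true!', PHP_EOL;
} else {
    echo 'This returned false!', PHP_EOL; // printed, because $needle is empty
}

// The search is case-sensitive: 'ARE' does not match 'are'.
if (str_contains($a, 'ARE')) {
    echo 'This returned true!', PHP_EOL;
} else {
    echo 'This returned false!', PHP_EOL; // printed
}
```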
Before PHP 8
You can use the strpos() function, which is used to find the occurrence of one string inside another one:
Note that the use of !== false is deliberate (neither != false nor === true will return the desired result); strpos() returns either the offset at which the needle string begins in the haystack string, or the boolean false if the needle isn't found. Since 0 is a valid offset and 0 is "falsey", we can't use simpler constructs like !strpos($a, 'are').
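A short sketch of the strpos() check and of why the strict comparison matters (an illustration, not the answer's original snippet):

```php
<?php
$a = 'How are you?';

// strpos() returns the byte offset of the first match, or false if there is none.
// Compare with !== false: a match at offset 0 is falsy but is still a match.
if (strpos($a, 'are') !== false) {
    echo 'true', PHP_EOL;
}

// Why the strict comparison matters: 'How' sits at offset 0.
var_dump(strpos($a, 'How'));           // int(0)
var_dump(strpos($a, 'How') != false);  // bool(false), the wrong answer
var_dump(strpos($a, 'How') !== false); // bool(true)
```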
You could use regular expressions, as they're better for word matching compared to strpos, as mentioned by other users. A strpos check for are will also return true for strings such as: fare, care, stare, etc. These unintended matches can simply be avoided in a regular expression by using word boundaries.
A simple match for are could look something like this:
On the performance side, strpos is about three times faster. When I did one million compares at once, preg_match took 1.5 seconds to finish and strpos took 0.5 seconds.
Edit:
In order to search any part of the string, not just word by word, I would recommend using a regular expression like:
The i at the end of the regular expression makes it case-insensitive; if you do not want that, you can leave it out.
Now, this can be quite problematic in some cases, as the $search string isn't sanitized in any way. If $search is user input, it might not pass the check, because the user could add a string that behaves like a different regular expression...
Also, here's a great tool for testing and seeing explanations of various regular expressions: Regex101
To combine both sets of functionality into a single multi-purpose function (including with selectable case sensitivity), you could use something like this:
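The combined function itself is missing. The sketch below shows one possible shape for it; the name containsText and its parameters are hypothetical. It also passes the needle through preg_quote(), which addresses the sanitization problem mentioned above:

```php
<?php
// Hypothetical helper (name and parameters are illustrative, not from the answer):
// substring or whole-word search, with optional case sensitivity.
function containsText(string $haystack, string $needle, bool $wholeWord = false, bool $caseSensitive = true): bool
{
    if ($needle === '') {
        return false; // design choice: treat an empty needle as "not found"
    }
    if (!$wholeWord) {
        $pos = $caseSensitive ? strpos($haystack, $needle) : stripos($haystack, $needle);
        return $pos !== false;
    }
    // preg_quote() escapes regex metacharacters so user input is matched literally.
    // Note: \b assumes the needle starts and ends with word characters (\w).
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/' . ($caseSensitive ? '' : 'i');
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsText('Do you care?', 'are'));              // bool(true): substring match
var_dump(containsText('Do you care?', 'are', true));        // bool(false): not a whole word
var_dump(containsText('How ARE you?', 'are', true, false)); // bool(true): case-insensitive
```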
One more thing to keep in mind is that \b will not work in languages other than English. The explanation for this and the solution are taken from here:
So in order to use the answer in PHP, you can use this function:
And if you want to search for an array of words, you can use this:
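The linked solution and both functions are missing here. As an assumption-labelled sketch (not necessarily the linked approach), the check below replaces \b with lookarounds on Unicode letters, digits and underscore and uses the /u flag; containsWordUnicode and containsAnyWordUnicode are hypothetical names:

```php
<?php
// Sketch: a Unicode-aware "whole word" check. Instead of \b, it uses lookarounds
// that reject a letter (\p{L}), digit (\p{N}) or underscore on either side.
// The /u flag makes PCRE treat the pattern and subject as UTF-8; /i ignores case.
function containsWordUnicode(string $text, string $word): bool
{
    $pattern = '/(?<![\p{L}\p{N}_])' . preg_quote($word, '/') . '(?![\p{L}\p{N}_])/iu';
    return preg_match($pattern, $text) === 1;
}

// Array variant: true if any of the (non-empty) words occurs as a whole word.
function containsAnyWordUnicode(string $text, array $words): bool
{
    foreach ($words as $word) {
        if ($word !== '' && containsWordUnicode($text, $word)) {
            return true;
        }
    }
    return false;
}

var_dump(containsWordUnicode('Ça va très bien', 'très')); // bool(true)
var_dump(containsWordUnicode('Ça va trèsbien', 'très'));  // bool(false)
```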
As of PHP 8.0.0, you can now use str_contains.
Here is a little utility function that is useful in situations like this:
To determine whether a string contains another string, you can use the PHP function strpos().
CAUTION:
If the needle you are searching for is at the beginning of the haystack, it will return position 0. If you do a == comparison, that will not work; you will need to do a === comparison.
A == sign is a comparison that tests whether the variable / expression / constant to the left has the same value as the variable / expression / constant to the right.
A === sign is a comparison to see whether two variables / expressions / constants are equal AND have the same type, i.e. both are strings or both are integers.
One of the advantages of using this approach is that every PHP version supports this function, unlike str_contains().
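Since strpos() is available everywhere, one way to get str_contains() on PHP versions older than 8.0 is a small polyfill built on it. This is a sketch, not part of the original answer; it mirrors the PHP 8 behaviour that an empty needle is always contained:

```php
<?php
// Hypothetical polyfill: only defines str_contains() where it is missing (PHP < 8.0).
if (!function_exists('str_contains')) {
    function str_contains(string $haystack, string $needle): bool
    {
        // Match PHP 8 semantics: an empty needle is always contained.
        // The strict !== false keeps a match at offset 0 from counting as "not found".
        return $needle === '' || strpos($haystack, $needle) !== false;
    }
}

var_dump(str_contains('How are you?', 'How')); // bool(true): match at offset 0
```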
While most of these answers will tell you if a substring appears in your string, that's usually not what you want if you're looking for a particular word, and not a substring.
What's the difference? Substrings can appear within other words:
One way to mitigate this would be to use a regular expression coupled with word boundaries (\b):
This method doesn't have the same false positives noted above, but it does have some edge cases of its own. Word boundaries match at the transition between a word character (\w) and a non-word character (\W), where non-word characters are anything that isn't a-z, A-Z, 0-9, or _. That means digits and underscores are going to be counted as word characters, and scenarios like this will fail:
If you want anything more accurate than this, you'll have to start doing English language syntax parsing, and that's a pretty big can of worms (and assumes proper use of syntax, anyway, which isn't always a given).
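The examples for this answer are missing. A sketch of the substring false positive, the word-boundary fix, and the digit/underscore edge cases described above, using made-up sample strings:

```php
<?php
// Plain substring search: 'care' contains 'are', a false positive for a word search.
var_dump(strpos('Do you care?', 'are') !== false); // bool(true)

// Word boundaries reject matches embedded in longer words.
var_dump(preg_match('/\bare\b/', 'Do you care?')); // int(0)
var_dump(preg_match('/\bare\b/', 'How are you?')); // int(1)

// Edge cases: digits and underscores are word characters (\w), so these fail.
var_dump(preg_match('/\bare\b/', 'are_you_there')); // int(0)
var_dump(preg_match('/\bare\b/', '2are2'));         // int(0)
```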
Look at strpos():
Using strstr(), or stristr() if your search should be case-insensitive, would be another option.
In response to the comments by SamGoody and Lego Stormtroopr:
If you are looking for a PHP algorithm to rank search results based on the proximity/relevance of multiple words, here is a quick and easy way of generating search results with PHP only:
Issues with the other boolean search methods such as strpos(), preg_match(), strstr() or stristr()
PHP method based on Vector Space Model and tf-idf (term frequency–inverse document frequency):
It sounds difficult but is surprisingly easy.
If we want to search for multiple words in a string, the core problem is: how do we assign a weight to each one of them?
If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.
This is the idea of the vector space model, not far from how SQL full-text search works:
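The answer's own implementation and its CASE outputs were not preserved. As a rough illustration of the idea, here is a compact tf-idf/cosine-similarity ranking sketch; the naive ASCII tokeniser, the smoothed idf formula and all function names are assumptions, not the author's code:

```php
<?php
// Naive tokeniser (assumption): lowercase, split on anything that isn't a-z or 0-9.
function tokenize(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

// Cosine of the angle between two sparse vectors stored as [term => weight].
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }
    $norm = fn(array $v): float => sqrt(array_sum(array_map(fn($x) => $x * $x, $v)));
    $denominator = $norm($a) * $norm($b);
    return $denominator > 0 ? $dot / $denominator : 0.0;
}

/** Returns document ids mapped to scores, best match first. */
function rankDocuments(array $documents, string $query): array
{
    $termCounts = []; // document id => [term => raw count]
    $docFreq = [];    // term => number of documents containing it
    foreach ($documents as $id => $text) {
        $termCounts[$id] = array_count_values(tokenize($text));
        foreach (array_keys($termCounts[$id]) as $term) {
            $docFreq[$term] = ($docFreq[$term] ?? 0) + 1;
        }
    }

    $n = count($documents);
    // Smoothed idf: rare terms weigh more; terms found everywhere still weigh > 0.
    $idf = fn($term): float => log(($n + 1) / (($docFreq[$term] ?? 0) + 1)) + 1;
    $weigh = function (array $counts) use ($idf): array {
        $vector = [];
        foreach ($counts as $term => $tf) {
            $vector[$term] = $tf * $idf($term); // tf * idf
        }
        return $vector;
    };

    $queryVector = $weigh(array_count_values(tokenize($query)));
    $scores = [];
    foreach ($termCounts as $id => $counts) {
        $scores[$id] = cosineSimilarity($queryVector, $weigh($counts));
    }
    arsort($scores); // highest similarity first, keys preserved
    return $scores;
}

$documents = [
    'a' => 'How are you?',
    'b' => 'Are you sure you are ready?',
    'c' => 'The weather is nice today.',
];
print_r(rankDocuments($documents, 'are you'));
```

Document c shares no terms with the query, so it scores 0 and sorts last; a real system would add the stopword removal and normalisation steps listed below.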
CASE 1
RESULT
CASE 2
RESULTS
CASE 3
RESULTS
There are plenty of improvements to be made, but the model provides a way of getting good results from natural queries, which don't have boolean operators such as strpos(), preg_match(), strstr() or stristr().
NOTA BENE
Optionally, eliminate redundancy before searching the words, thereby reducing index size, which results in smaller storage requirements, less disk I/O, faster indexing and consequently faster search.
1. Normalisation
2. Stopword elimination
3. Dictionary substitution
Replace words with others which have an identical or similar meaning.
(e.g. replace instances of 'hungrily' and 'hungry' with 'hunger')
Further algorithmic measures (snowball) may be performed to further reduce words to their essential meaning.
The replacement of colour names with their hexadecimal equivalents and the reduction of numeric values by reducing precision are other ways of normalising the text.
Make use of substring matching using strpos():
If you want to avoid the "falsey" and "truthy" problem, you can use substr_count:
It's a bit slower than strpos but it avoids the comparison problems.
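A sketch of the substr_count() approach (not the answer's original snippet). Because the function returns a count rather than a position, the offset-0 pitfall of strpos() does not arise, and since 0 is falsy the "> 0" can even be omitted:

```php
<?php
$a = 'How are you?';

// substr_count() returns an int (never false), so 0 simply means "not found".
// The needle must not be empty (an empty needle is an error).
if (substr_count($a, 'are') > 0) {
    echo 'true', PHP_EOL;
}

// 0 is falsy and any positive count is truthy, so "> 0" can be dropped:
if (substr_count($a, 'are')) {
    echo 'found', PHP_EOL;
}
if (!substr_count($a, 'xyz')) {
    echo 'not found', PHP_EOL;
}
```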
Another option is to use the strstr() function. Something like:
Point to note: The strstr() function is case-sensitive. For a case-insensitive search, use the stristr() function.
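A sketch of the strstr()/stristr() variant (illustrative, not the answer's original code):

```php
<?php
$a = 'How are you?';

// strstr() returns the rest of the haystack starting at the first match, or false.
// Compare with !== false: a match can be a falsy string such as "0".
if (strstr($a, 'are') !== false) {
    echo 'true', PHP_EOL;
}

var_dump(strstr($a, 'ARE'));  // bool(false): strstr() is case-sensitive
var_dump(stristr($a, 'ARE')); // string(8) "are you?": stristr() ignores case
```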
I'm a bit impressed that none of the answers here that used strpos, strstr and similar functions mentioned Multibyte String Functions yet (2015-05-08).
Basically, if you're having trouble finding words with characters specific to some languages, such as German, French, Portuguese, Spanish, etc. (e.g.: ä, é, ô, ç, º, ñ), you may want to precede the functions with mb_. Therefore, the accepted answer would use mb_strpos or mb_stripos (for case-insensitive matching) instead:
If you cannot guarantee that all your data is 100% in UTF-8, you may want to use the mb_ functions.
A good article to understand why is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.
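A sketch of the mb_ variants, assuming the mbstring extension is installed and the strings are UTF-8. It shows that strpos() counts bytes while mb_strpos() counts characters, and that mb_stripos() folds non-ASCII letters such as "È" and "è":

```php
<?php
// Requires the mbstring extension. Strings are assumed to be UTF-8.
$text = 'Ça va TRÈS bien';

// Offsets: strpos() counts bytes, mb_strpos() counts characters ("Ç" is 2 bytes in UTF-8).
var_dump(strpos('Ça va', 'va'));                // int(4)
var_dump(mb_strpos('Ça va', 'va', 0, 'UTF-8')); // int(3)

// Case-insensitive search that also folds non-ASCII letters such as "È"/"è".
if (mb_stripos($text, 'très', 0, 'UTF-8') !== false) {
    echo 'true', PHP_EOL;
}
```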
In PHP, the best way to verify if a string contains a certain substring is to use a simple helper function like this:
Explanation:
strpos finds the position of the first occurrence of a case-sensitive substring in a string.
stripos finds the position of the first occurrence of a case-insensitive substring in a string.
myFunction($haystack, $needle) === FALSE ? FALSE : TRUE ensures that myFunction always returns a boolean and fixes unexpected behavior when the index of the substring is 0.
$caseSensitive ? A : B selects either strpos or stripos to do the work, depending on the value of $caseSensitive.
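Neither the helper nor its output survived. A sketch that follows the explanation above; the parameter types and the default value of $caseSensitive are assumptions:

```php
<?php
// Sketch matching the explanation; the default for $caseSensitive is an assumption.
function myFunction(string $haystack, string $needle, bool $caseSensitive = false): bool
{
    // Choose the search function by name: strpos() (case-sensitive) or stripos().
    $search = $caseSensitive ? 'strpos' : 'stripos';
    // Both return int|false; normalise to a strict boolean so offset 0 counts as found.
    return $search($haystack, $needle) === false ? false : true;
}

var_dump(myFunction('How are you?', 'HOW'));       // bool(true)
var_dump(myFunction('How are you?', 'HOW', true)); // bool(false)
var_dump(myFunction('How are you?', 'How', true)); // bool(true): match at offset 0
```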
You can use the strstr function:
Without using an inbuilt function:
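The code for this approach is missing. As an illustration only, a naive scan that avoids strpos()/strstr() (it still uses strlen() and string offsets); containsNaive is a hypothetical name, and in real code strpos() is the better choice:

```php
<?php
// Naive substring search without strpos()/strstr(): try every start position
// and compare character by character. Worst case O(n * m).
function containsNaive(string $haystack, string $needle): bool
{
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m === 0) {
        return true; // same convention as PHP 8's str_contains()
    }
    for ($i = 0; $i <= $n - $m; $i++) {
        $j = 0;
        while ($j < $m && $haystack[$i + $j] === $needle[$j]) {
            $j++;
        }
        if ($j === $m) {
            return true; // every character of the needle matched at offset $i
        }
    }
    return false;
}

var_dump(containsNaive('How are you?', 'are')); // bool(true)
```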
The function below also works and does not depend on any other function; it uses only native PHP string manipulation. Personally, I do not recommend this, but you can see how it works:
Test:
Lots of answers that use substr_count check if the result is > 0. But since the if statement considers zero the same as false, you can avoid that check and write directly:
To check if not present, add the ! operator:
I had some trouble with this, and finally I chose to create my own solution, without using the regular expression engine:
You may notice that the previous solutions do not handle the case of the word being used as a prefix of another. To use your example:
With the samples above, both $a and $b contain $c, but you may want your function to tell you that only $a contains $c.
Another option for finding the occurrence of a word in a string is to use strstr() and stristr(), like the following:
It can be done in three different ways:
1- stristr()
2- strpos()
3- preg_match()
Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead, as they will be faster. (https://www.php.net/preg_match)
The short-hand version
In order to find a 'word', rather than the occurrence of a series of letters that could in fact be a part of another word, the following would be a good solution.
You should use a case-insensitive search, so it won't matter whether the entered value is in lowercase or uppercase. Here stripos finds the needle in the haystack without considering case (lower/upper).
PHP code sample with output:
Maybe you could use something like this:
If you want to check if the string contains several specific words, you can do:
This is useful to avoid spam when sending emails for example.
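The original snippet is missing. A sketch of checking a text against a list of words, case-insensitively; containsAny and the word list are hypothetical:

```php
<?php
// Sketch: case-insensitive check for any word from a list (the list is hypothetical).
// Note: stripos() matches substrings, not whole words.
function containsAny(string $text, array $words): bool
{
    foreach ($words as $word) {
        if ($word !== '' && stripos($text, $word) !== false) {
            return true; // stop at the first hit
        }
    }
    return false;
}

$blocked = ['viagra', 'casino', 'lottery'];
var_dump(containsAny('Win big at the CASINO tonight', $blocked)); // bool(true)
var_dump(containsAny('See you at lunch', $blocked));              // bool(false)
```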
The strpos function works fine, but if you want to do case-insensitive checking for a word in a paragraph, you can make use of PHP's stripos function.
For example,
Find the position of the first occurrence of a case-insensitive substring in a string.
If the word doesn't exist in the string then it will return false else it will return the position of the word.
A string can be checked with the below function:
You need to use the identical/not identical operators because strpos can return 0 as its index value. If you like ternary operators, consider using the following (it seems a little backwards, I'll admit):
This means the string has to be resolved into words (see note below).
One way to do this and to specify the separators is using preg_split (doc):
A run gives:
Note: Here we do not mean "word" as any sequence of symbols. A practical definition of a word is the one used by the PCRE regular expression engine, where words are substrings consisting of word characters only, separated by non-word characters.
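The preg_split code and its output are missing. A sketch of the approach as described: split into PCRE "words" (runs of \w) and look for an exact match; containsWord is a hypothetical name:

```php
<?php
// Sketch: split the text into PCRE "words" (runs of \w) and look for an exact match.
function containsWord(string $text, string $word): bool
{
    // \W+ matches one or more non-word characters; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words !== false && in_array($word, $words, true);
}

var_dump(containsWord('How are you?', 'are')); // bool(true)
var_dump(containsWord('Do you care?', 'are')); // bool(false)
var_dump(containsWord('are_you', 'are'));      // bool(false): "_" is a word character
```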