Using preg_match to detect and validate the type of links embedded in HTML
I have implemented a function to validate .edu domains. This is how I am doing it:
if( preg_match('/edu/', $matches[0])==FALSE )
    return FALSE;
return TRUE;
Now I also want to skip URLs that point to documents such as .pdf and .doc files.
For this, the following code should have worked, but it does not:
if( preg_match('/edu/', $matches[0])==FALSE || preg_match('/pdf/i', $matches[0])!=FALSE || preg_match('/doc/i', $matches[0]!=FALSE))
    return FALSE;
return TRUE;
Where am I wrong in this regard?
Moreover, how can I use preg_match with a list of document types to check against a URL string? If one of the listed document types is found, it should return false. In other words, I want to provide a list (an array, maybe) of document types as the $pattern to look for in a URL.
Note:
$matches[0] contains the whole URL string.
e.g. http://www.nust.edu.pk/Documents/pdf/NNBS_Form.pdf
The code for the function:
public function validateEduDomain($url) {
    // get host name from URL
    preg_match('@^(?:http://)?([^/]+)@i', $url, $matches);
    $host = $matches[1];
    // get last two segments of host name
    preg_match('/[^.]+\.[^.]+$/', $host, $matches);
    if( preg_match('/edu/', $matches[0])!=FALSE && (preg_match('/pdf/i', $matches[0])==FALSE || preg_match('/doc/i', $matches[0]==FALSE)))
        return TRUE;
    return FALSE;
}
Comments (4)
I wonder why you are making everything so complicated. I also noticed you have $$matches[0] instead of $matches[0]. The regex you want is:
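The pattern that followed here was not preserved in this copy. The block below is only a sketch of one way to run both checks with two regular expressions, not the answerer's original code; the function name isEduPageUrl and both patterns are assumptions made for illustration. It also sidesteps two problems visible in the question's function: the second preg_match call overwrites $matches, so $matches[0] holds only the last two host segments (for example "edu.pk") and the path is never tested, and the final comparison sits inside the preg_match() parentheses, so a boolean is passed as the subject instead of the URL.

```php
<?php
// Hypothetical helper; the name and both patterns are illustrative assumptions.
function isEduPageUrl(string $url): bool
{
    // The host must end in ".edu" or ".edu.<country code>",
    // e.g. mit.edu or www.nust.edu.pk. [^/]* keeps the match inside the host.
    $isEdu = preg_match(
        '@^(?:https?://)?[^/]*\.edu(?:\.[a-z]{2,})?(?:[:/]|$)@i',
        $url
    ) === 1;

    // The URL must not end in .pdf, .doc or .docx
    // (optionally followed by a query string or fragment).
    $isDocument = preg_match('@\.(?:pdf|docx?)(?:[?#]|$)@i', $url) === 1;

    return $isEdu && !$isDocument;
}
```

Testing the whole URL rather than the trimmed host is what lets the extension check see the file name; a directory named /pdf/ in the path does not trigger it because the pattern requires a leading dot.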
You can see if a file extension matches with something like:
Also, why are you using the double dollar sign for the 2nd and 3rd usages of $matches[0]?
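The snippet this answer pointed to is missing from this copy. As a hedged sketch of the idea, and of the array-as-pattern approach the question asks about, the list of extensions can be joined into a single alternation. The function name hasBlockedExtension and the example list are assumptions, not part of the original answer; the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper: returns true when the URL's path ends in one of
// the given extensions. Name and parameters are illustrative assumptions.
function hasBlockedExtension(string $url, array $extensions): bool
{
    if ($extensions === []) {
        return false; // nothing to block
    }

    // Escape each extension and join them into one alternation: pdf|doc|docx
    $alternatives = implode('|', array_map(
        fn (string $ext): string => preg_quote($ext, '/'),
        $extensions
    ));

    // Look only at the path, so the host and the query string are ignored.
    $path = (string) parse_url($url, PHP_URL_PATH);

    return preg_match('/\.(?:' . $alternatives . ')$/i', $path) === 1;
}
```

Called as hasBlockedExtension($url, ['pdf', 'doc', 'docx']), this keeps the list of document types in one place, so adding a type means adding an array element rather than editing the condition.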
If I understood correctly, something like this can help: http://ideone.com/XOEiU
I wouldn't use a regular expression for this:
This matches the domains you specified in your comments.
For the file extensions I would have a separate function that is easier to maintain:
You can combine the two:
Much more readable and maintainable than regular expressions, but note that I have optimised for those qualities and not for speed.
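The three snippets this answer referred to did not survive in this copy, and the domains "specified in your comments" are not known. The following is therefore only a sketch of the regex-free approach described, built on parse_url and pathinfo. All three function names, the default extension list, and the rule for what counts as an .edu host are assumptions. It also assumes the URL carries a scheme, since parse_url does not report a host for a bare "www.nust.edu.pk/...".

```php
<?php
// Hypothetical: true for hosts ending in "edu" or "edu.<2-letter country code>".
function isEduHost(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no scheme or malformed URL
    }
    $labels = explode('.', strtolower($host));
    $n = count($labels);
    $last = $labels[$n - 1];
    if ($last === 'edu') {
        return $n >= 2; // e.g. mit.edu
    }
    // e.g. www.nust.edu.pk, but not www.edu.com
    return $n >= 3 && $labels[$n - 2] === 'edu' && strlen($last) === 2;
}

// Hypothetical: true when the path's file extension is in the list.
function isDocumentUrl(string $url, array $extensions = ['pdf', 'doc', 'docx']): bool
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $extensions, true);
}

// Hypothetical combination of the two checks.
function isCrawlableEduUrl(string $url): bool
{
    return isEduHost($url) && !isDocumentUrl($url);
}
```

Keeping the host rule and the extension list in separate functions means either can change without touching the other, which is the maintainability trade-off the answer describes.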