Regex to match "wap" not preceded by "html"

Posted 2024-09-25 18:34:07

I'm using NGINX to split mobile traffic between the WAP and HTML versions of a mobile site. The best way to do this seems to be checking the UA's content preference via the HTTP Accept header.

A preference for WAP is indicated by the appearance of a 'wap' mimetype in the header before an 'html' or wildcard mimetype.

So a Sony Ericsson w300i has a preference for WAP:

multipart/mixed, application/vnd.wap.multipart.mixed,application/vnd.wap.xhtml+xml,application/xhtml+xml,text/vnd.wap.wml,*/*,text/x-hdml,image/mng,image/x-mng,video/mng,video/x-mng,image/bmp,text/html

And a Blackberry Bold has a preference for HTML:

text/html,application/xhtml+xml,application/vnd.wap.xhtml+xml,application/vnd.wap.wmlc;q=0.9,application/vnd.wap.wmlscriptc;q=0.7,text/vnd.wap.wml;q=0.7,text/vnd.sun.j2me.app-descriptor,*/*;q=0.5
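To make the rule concrete outside of regex, here is a minimal Python sketch of the same ordering check. The function name `prefers_wap` and the constants are hypothetical, the q-values are deliberately ignored, and a type containing both "wap" and "html" is treated as WAP; all of these are simplifying assumptions rather than anything NGINX does.

```python
SONY = ("multipart/mixed, application/vnd.wap.multipart.mixed,"
        "application/vnd.wap.xhtml+xml,application/xhtml+xml,"
        "text/vnd.wap.wml,*/*,text/x-hdml,text/html")
BLACKBERRY = ("text/html,application/xhtml+xml,application/vnd.wap.xhtml+xml,"
              "application/vnd.wap.wmlc;q=0.9,text/vnd.wap.wml;q=0.7,*/*;q=0.5")


def prefers_wap(accept_header: str) -> bool:
    """Return True if a WAP type is listed before any HTML or wildcard type."""
    for part in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and normalise case and whitespace.
        media_type = part.split(";")[0].strip().lower()
        if "wap" in media_type:
            return True
        if "html" in media_type or media_type == "*/*":
            return False
    return False  # no WAP type listed at all


print(prefers_wap(SONY))        # True
print(prefers_wap(BLACKBERRY))  # False
```

The headers are shortened versions of the two shown above; whichever of "wap", "html" or `*/*` appears first decides the result.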

Since I'm in NGINX land, it seems like the best tool I have is NGINX's regular expressions (PCRE).

Right now I'm trying to use a negative lookahead to assert "the Accept header contains WAP, but not preceded by HTML":

(?!html.*)wap

But this isn't correct. Is there a different way I can think about this problem? Or my matching logic?
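A quick way to see why this attempt fails is to run it in Python, whose `re` syntax matches PCRE for this pattern. The lookahead is evaluated at the position where "wap" is about to match, and the text starting there can never be "html", so the assertion always passes. The variable names below are only for illustration.

```python
import re

attempt = re.compile(r"(?!html.*)wap", re.I)

# The lookahead inspects the text that FOLLOWS the current position,
# so it says nothing about what came before "wap".
results = {header: bool(attempt.search(header))
           for header in ("wap html", "html wap")}

for header, matched in results.items():
    print(f"{header!r}: {matched}")
# Both headers match, including the one where "html" comes first.
```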

So far I've found these regex resources useful:

http://www.regular-expressions.info/completelines.html
http://www.zytrax.com/tech/web/regex.htm
http://wiki.nginx.org/NginxHttpRewriteModule

Thanks!


Thanks for the answer, here are the related tests:

import re

prefers_wap_re = re.compile(r'^(?!(?:(?!wap).)*html).*?wap', re.I)

tests = [
    ('', False),
    ('wap', True),
    ('wap html', True),
    ('html wap', False),
]

for test, expected in tests:
    result = prefers_wap_re.search(test)
    assert bool(result) is expected, \
        'Tested "%s", expected %s, got %s.' % (test, expected, result)
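The same pattern can also be checked against the two real Accept headers from the question. The sketch below does that and additionally tries a hypothetical variant, `wildcard_aware_re`, that also treats `*/*` before "wap" as an HTML preference, since the accepted pattern only looks for "html". The variant is an assumption of mine and has not been tested in NGINX.

```python
import re

prefers_wap_re = re.compile(r'^(?!(?:(?!wap).)*html).*?wap', re.I)
# Hypothetical extension: also reject when "*/*" appears before the first "wap".
wildcard_aware_re = re.compile(r'^(?!(?:(?!wap).)*(?:html|\*/\*)).*?wap', re.I)

headers = {
    'sony': 'multipart/mixed, application/vnd.wap.multipart.mixed,'
            'application/vnd.wap.xhtml+xml,application/xhtml+xml,'
            'text/vnd.wap.wml,*/*,text/x-hdml,text/html',
    'blackberry': 'text/html,application/xhtml+xml,'
                  'application/vnd.wap.xhtml+xml,text/vnd.wap.wml;q=0.7,*/*;q=0.5',
    'wildcard_first': '*/*,text/vnd.wap.wml',
}

original = {name: bool(prefers_wap_re.search(h)) for name, h in headers.items()}
extended = {name: bool(wildcard_aware_re.search(h)) for name, h in headers.items()}

print(original)  # {'sony': True, 'blackberry': False, 'wildcard_first': True}
print(extended)  # {'sony': True, 'blackberry': False, 'wildcard_first': False}
```

The only difference between the two patterns shows up when a wildcard precedes the first WAP type.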

Comments (2)

无语 2024-10-02 18:34:07

The simplest way to do this would be a lookbehind instead of a lookahead. Since a lookbehind of unbounded length is not supported by PCRE, you can emulate it with a lookahead:

^(?!(?:(?!wap).)*html).*?wap

Not pleasant to read, but it should work.

Rubular
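Since the pattern is hard to read, here is the same expression written with Python's verbose flag and one comment per piece. This is only an annotated restatement for readability; the name `prefers_wap_verbose` is made up for the example.

```python
import re

prefers_wap_verbose = re.compile(r"""
    ^                   # anchor at the start of the header
    (?!                 # reject the header if, from the start, ...
        (?:(?!wap).)*   # ... a run of characters containing no "wap" ...
        html            # ... leads up to "html"
    )
    .*?wap              # otherwise, lazily scan to the first "wap"
""", re.I | re.X)

for header in ("wap html", "html wap", "text/html"):
    print(header, bool(prefers_wap_verbose.search(header)))
# wap html True
# html wap False
# text/html False
```

The inner `(?!wap).` consumes one character at a time only while "wap" does not start at that position, which is what confines the search for "html" to the text before the first "wap".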

故事和酒 2024-10-02 18:34:07

For a negative lookbehind, and marginally better performance, perhaps use a negative lookbehind with non-greedy matching:

(?<!html.*?)wap
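Whether this pattern compiles depends on the regex engine. As a small check, Python's built-in `re` module rejects it because the lookbehind has no fixed width. I would expect the PCRE build used by NGINX to refuse it for the same reason, but that is an assumption worth verifying against your own build.

```python
import re

try:
    re.compile(r"(?<!html.*?)wap")
    supported = True
except re.error as exc:
    # Python's re requires lookbehind patterns to have a fixed width.
    supported = False
    print("rejected:", exc)

print("variable-length lookbehind supported:", supported)
```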