Is there a good way to parse User-Agent strings?

Posted on 2024-12-10 12:27:23

I have a Java module that receives the User-Agent string from an end user's browser and needs to behave slightly differently depending on the browser type, the browser version, and possibly even the operating system.
E.g.: {"FireFox", "7.0", "Win7"}, {"Safari", "3.2", "iOS9"}

I understand that the format of the User-Agent string can vary for the exact same configuration because of different plug-in installations and so on.

My questions:

  1. Is the structure of the User-Agent well defined? If yes - where can I find it exactly? (From my understanding of the RFC there is not much standardization here).
  2. Assuming the answer to #1 is no - is there a proper way to parse it to get the info I need?
  3. Is there a better way to get the info I need other than the User-Agent string?

Important note - I'm talking about a web app, so my data collection abilities are limited to JavaScript.

Comments (3)

俯瞰星空 2024-12-17 12:27:23

Have a look at the Java library I wrote for this purpose: Yauaa

I made a very simple servlet where you can try it out to see if it gives the answers you are looking for: https://try.yauaa.basjes.nl/

It is Apache 2 licensed and published into Maven so using it in a Java application is really easy. It is currently used in production on one of the busiest websites of the Netherlands (where I work).

See this blog post about it: https://techlab.bol.com/making-sense-user-agent-string/

娇俏 2024-12-17 12:27:23

For Java, take a look at User-Agent-Utils. It's fairly compact (< 50kB) and has no dependencies.

Note that although the latest release is fairly recent (1.21, released 2018-01-24), the library's page states:

Warning: This project is end-of-life and will not be updated regularly any longer

And the GitHub page says:

EOL WARNING

This library has reached end-of-life and will not see regular updates any longer.

Version 1.21 was the last official release in 2018.

若相惜即相离 2024-12-17 12:27:23

  1. Is the structure of the User-Agent well defined? If yes - where can I find it exactly? (From my understanding of the RFC there is not much standardization here).

No, the structure of a User-Agent string is not standardized, but it is very similar across different agents. Even so, it is still necessary to use multiple patterns for detection.

  2. Assuming the question for #1 is No - is there a proper way to parse it to get the info I need?

You can try the library UADetector. It is a wrapper for the User-Agent database of user-agent-string.info.

  3. Is there a better way to get the info I need other than the User-Agent string?

I would not say it is a better or worse way, but another way to detect user agents is to use JavaScript on the client side to collect information about the user agent and submit it to your backend via hidden HTML inputs or XMLHttpRequest. It all depends on what you want to identify. For accurate detection of web crawlers, JavaScript won't be able to help.
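To illustrate the point about needing multiple patterns, here is a minimal sketch in plain Java (standard library only). The class name `UserAgentSketch`, the nested `Browser` type, and the specific regular expressions are my own assumptions for illustration. They are not taken from UADetector or any other library mentioned in this thread, and they only cover a handful of common browsers. The patterns are checked in order because many User-Agent strings contain tokens from other browsers (for example, Chrome strings also contain "Safari").

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any real library.
final class UserAgentSketch {

    // Simple value holder for the detected browser name and version.
    static final class Browser {
        public final String name;
        public final String version;

        Browser(String name, String version) {
            this.name = name;
            this.version = version;
        }

        @Override
        public String toString() {
            return name + " " + version;
        }
    }

    // Insertion order matters: more specific tokens must be tried first.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        // Edge and Opera strings also contain "Chrome/", so they go first.
        PATTERNS.put("Edge", Pattern.compile("Edg(?:e|A|iOS)?/(\\d+(?:\\.\\d+)*)"));
        PATTERNS.put("Opera", Pattern.compile("OPR/(\\d+(?:\\.\\d+)*)"));
        PATTERNS.put("Firefox", Pattern.compile("Firefox/(\\d+(?:\\.\\d+)*)"));
        // Chrome strings also contain "Safari/", so Chrome precedes Safari.
        PATTERNS.put("Chrome", Pattern.compile("Chrome/(\\d+(?:\\.\\d+)*)"));
        // Safari reports its version in a separate "Version/" token.
        PATTERNS.put("Safari", Pattern.compile("Version/(\\d+(?:\\.\\d+)*).*Safari/"));
    }

    // Returns the first matching browser, or empty if nothing matches.
    static Optional<Browser> detect(String userAgent) {
        if (userAgent == null) {
            return Optional.empty();
        }
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher m = entry.getValue().matcher(userAgent);
            if (m.find()) {
                return Optional.of(new Browser(entry.getKey(), m.group(1)));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0";
        System.out.println(detect(ua).map(Browser::toString).orElse("unknown"));
    }
}
```

A hand-rolled list like this goes stale quickly and misses crawlers, embedded web views, and less common browsers, which is the main reason to prefer a maintained library with its own pattern database.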
