如何使用Python解析用户代理字符串

发布于 2025-01-04 03:01:21 字数 1268 浏览 3 评论 0原文

<field name="http.user_agent" showname="User-Agent: CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)\r\n" size="62" pos="542" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>

<field name="http.user_agent" showname="User-Agent: HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5\r\n" size="67" pos="570" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5" value="557365722d4167656e743a204854432053747265616d696e6720506c61796572206874635f777765202f20312e30202f206874635f7669766f202f20322e332e350d0a"/>

<field name="http.user_agent" showname="User-Agent: AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)\r\n" size="85" pos="639" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)" value="557365722d4167656e743a204170706c65436f72654d656469612f312e302e302e38433134382028695061643b20553b20435055204f5320345f325f31206c696b65204d6163204f5320583b2073765f7365290d0a"/>

上面列出了我获得的网址示例。我想知道 Python 中是否有任何模块可以用来解析用户代理。我想从这些示例中获取输出,例如:

Android
HTC Streaming player
ipad

如果是 PC 用户,我想获取网络浏览器类型。

<field name="http.user_agent" showname="User-Agent: CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)\r\n" size="62" pos="542" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>

<field name="http.user_agent" showname="User-Agent: HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5\r\n" size="67" pos="570" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5" value="557365722d4167656e743a204854432053747265616d696e6720506c61796572206874635f777765202f20312e30202f206874635f7669766f202f20322e332e350d0a"/>

<field name="http.user_agent" showname="User-Agent: AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)\r\n" size="85" pos="639" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)" value="557365722d4167656e743a204170706c65436f72654d656469612f312e302e302e38433134382028695061643b20553b20435055204f5320345f325f31206c696b65204d6163204f5320583b2073765f7365290d0a"/>

The samples of the urls I've got are listed above. I am wondering if there is any module in Python which I can use to parse the user-agent. I want to get the output from these samples like:

Android
HTC Streaming player
ipad

and if it is a PC user, I want to get the web browser type.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

慈悲佛祖 2025-01-11 03:01:21

有一个名为 httpagentparser 的库:

import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9"
>>> print httpagentparser.simple_detect(s)
('Linux', 'Chrome 5.0.307.11')
>>> print httpagentparser.detect(s)
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}

There is a library called httpagentparser for that:

import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9"
>>> print httpagentparser.simple_detect(s)
('Linux', 'Chrome 5.0.307.11')
>>> print httpagentparser.detect(s)
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}
独享拥抱 2025-01-11 03:01:21

Werkzeug 内置了一个用户代理解析器。

http://werkzeug。 pocoo.org/docs/quickstart/?highlight=user_agent#header-parsing

from werkzeug.test import create_environ
from werkzeug.wrappers import Request

environ = create_environ()
environ.update(HTTP_USER_AGENT=('Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    ' AppleWebKit/537.36 (KHTML, like Gecko)'
    ' Chrome/76.0.3809.100 Safari/537.36'))
request = Request(environ)

request.user_agent.browser
'chrome'

Werkzeug has a user agent parser built in.

http://werkzeug.pocoo.org/docs/quickstart/?highlight=user_agent#header-parsing

from werkzeug.test import create_environ
from werkzeug.wrappers import Request

environ = create_environ()
environ.update(HTTP_USER_AGENT=('Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    ' AppleWebKit/537.36 (KHTML, like Gecko)'
    ' Chrome/76.0.3809.100 Safari/537.36'))
request = Request(environ)

request.user_agent.browser
'chrome'
二货你真萌 2025-01-11 03:01:21

我要给出的答案不是关于开源项目的,但它确实提供了任何正在研究如何解析 HTTP user-agent 字符串以获得 设备智能 会想了解。

WURFL 是一个历史悠久的工具,用于进行用户代理(以及更一般的 HTTP 请求)分析并获取易于使用的设备/浏览器信息。这是广告技术行业事实上的标准,借助专有数据库,从 HTTP 请求中榨取最后一滴信息。在实践中,代码将类似于:

from pywurfl.wurfl import Wurfl

# Create a WURFL Engine. Please note that the installed wurfl.zip path may change.
# for example, on OS X systems, it will be in `/usr/local/share/wurfl/wurfl.zip`
# on Linux systems, it will be in `/usr/share/wurfl/wurfl.zip`.
wurfl = Wurfl('/usr/share/wurfl/wurfl.zip')

# Lookup an HTTP request
http_request = {
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp",
    "user-agent": " Mozilla/5.0 (Linux; Android 10; SM-G981U1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36",
}
dev = wurfl.parse_headers(http_request)

# You can also lookup a device with just the user-agent string
# dev = wurfl.parse_useragent(user_agent)

# retrieve some properties and capabilities values

# WURFL device ID:
print("device id =", dev.id)

# Some static capabilities:
static_capabilities = ["model_name", "brand_name", "device_os"]

# Retrieve the value of a single static capability:
print("get_capability('model_name') =",
      dev.get_capability(static_capabilities[0]))

# Retrieve the value of many static capabilities at once:
print("get_capabilities(static_capabilities) =",
      dev.get_capabilities(static_capabilities))

# Some virtual capabilities:
virtual_capabilities = ["complete_device_name", "form_factor"]

# Retrieve the value of a single virtual capability:
print("get_virtual_capability('complete_device_name') =",
      dev.get_virtual_capability(virtual_capabilities[0]))

# Retrieve the value of many virtual capabilities at once:
print("get_virtual_capabilities(virtual_capabilities) =",
      dev.get_virtual_capabilities(virtual_capabilities))

# Make sure you release the device when you are finished
dev.release()

上面的代码将返回:

device id = samsung_sm_g981u_ver1_subuau1
get_capability('model_name') = SM-G981U1
get_capabilities(static_capabilities) = {'model_name': 'SM-G981U1', 'brand_name': 'Samsung', 'device_os': 'Android'}
get_virtual_capability('complete_device_name') = Samsung SM-G981U1 (Galaxy S20 5G)
get_virtual_capabilities(virtual_capabilities) = {'complete_device_name': 'Samsung SM-G981U1 (Galaxy S20 5G)', 'form_factor': 'Smartphone'}

更多信息可以找到 此处

对于那些想要在没有从 ScientiaMobile 获得试用许可证的情况下尝试 WURFL(特别是 PyWURFL)的人,我的公司最近发布了一个 WURFL 版本(称为 WURFL 微服务),可以从 AWSAzureGCP(当然还有 ScientiaMobile 本身)。此外,该产品完全支持 Pythion,尽管语法略有不同,因为该产品依赖于云中的服务器端组件进行更新:

from wmclient import *

try:
    client = WmClient.create("http", "localhost", 8080, "")
      :
    ua = "Mozilla/5.0 (Linux; Android 7.1.1; ONEPLUS A5000 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) " \
         "Chrome/56.0.2924.87 Mobile Safari/537.36 "

    client.set_requested_static_capabilities(["brand_name", "model_name"])
    client.set_requested_virtual_capabilities(["is_smartphone", "form_factor"])
    print()
    print("Detecting device for user-agent: " + ua);

    # Perform a device detection calling WM server API
    device = client.lookup_useragent(ua)
           :
        # Let's get the device capabilities and print some of them
        capabilities = device.capabilities
        print("Detected device WURFL ID: " + capabilities["wurfl_id"])
        print("Device brand & model: " + capabilities["brand_name"] + " " + capabilities["model_name"])
        print("Detected device form factor: " + capabilities["form_factor"])
        if capabilities["is_smartphone"] == "true":

可以找到完整的示例和对 GitHub 客户端代码的引用 此处

披露:我在提供此处描述的库的公司工作。

The answer I am about to give is not about an open-source project, but it does provide information that whoever is researching how to parse the HTTP user-agent string to obtain device intelligence will want to know about.

WURFL is a time-honored tool to do User-Agent (and more generally HTTP request) analysis and obtain easily consumable device/browser information. This is the de-facto standard in the Ad Tech industry to squeeze the last drop of information out of HTTP requests, thanks to a proprietary database. In practice, code will look something like:

from pywurfl.wurfl import Wurfl

# Create a WURFL Engine. Please note that the installed wurfl.zip path may change.
# for example, on OS X systems, it will be in `/usr/local/share/wurfl/wurfl.zip`
# on Linux systems, it will be in `/usr/share/wurfl/wurfl.zip`.
wurfl = Wurfl('/usr/share/wurfl/wurfl.zip')

# Lookup an HTTP request
http_request = {
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp",
    "user-agent": " Mozilla/5.0 (Linux; Android 10; SM-G981U1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36",
}
dev = wurfl.parse_headers(http_request)

# You can also lookup a device with just the user-agent string
# dev = wurfl.parse_useragent(user_agent)

# retrieve some properties and capabilities values

# WURFL device ID:
print("device id =", dev.id)

# Some static capabilities:
static_capabilities = ["model_name", "brand_name", "device_os"]

# Retrieve the value of a single static capability:
print("get_capability('model_name') =",
      dev.get_capability(static_capabilities[0]))

# Retrieve the value of many static capabilities at once:
print("get_capabilities(static_capabilities) =",
      dev.get_capabilities(static_capabilities))

# Some virtual capabilities:
virtual_capabilities = ["complete_device_name", "form_factor"]

# Retrieve the value of a single virtual capability:
print("get_virtual_capability('complete_device_name') =",
      dev.get_virtual_capability(virtual_capabilities[0]))

# Retrieve the value of many virtual capabilities at once:
print("get_virtual_capabilities(virtual_capabilities) =",
      dev.get_virtual_capabilities(virtual_capabilities))

# Make sure you release the device when you are finished
dev.release()

The code above, would return:

device id = samsung_sm_g981u_ver1_subuau1
get_capability('model_name') = SM-G981U1
get_capabilities(static_capabilities) = {'model_name': 'SM-G981U1', 'brand_name': 'Samsung', 'device_os': 'Android'}
get_virtual_capability('complete_device_name') = Samsung SM-G981U1 (Galaxy S20 5G)
get_virtual_capabilities(virtual_capabilities) = {'complete_device_name': 'Samsung SM-G981U1 (Galaxy S20 5G)', 'form_factor': 'Smartphone'}

More info can be found here.

For those who want to try WURFL (and PyWURFL specifically) without obtaining a Trial license from the ScientiaMobile, my company has recently released a version of WURFL (called WURFL Microservice) that can be obtained from the major marketplaces of AWS, Azure and GCP (in addition to ScientiaMobile itself of course). Also for that product Pythion is fully supported, albeit the syntax is slightly different as that product relies on a server side component in the Cloud for updates:

from wmclient import *

try:
    client = WmClient.create("http", "localhost", 8080, "")
      :
    ua = "Mozilla/5.0 (Linux; Android 7.1.1; ONEPLUS A5000 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) " \
         "Chrome/56.0.2924.87 Mobile Safari/537.36 "

    client.set_requested_static_capabilities(["brand_name", "model_name"])
    client.set_requested_virtual_capabilities(["is_smartphone", "form_factor"])
    print()
    print("Detecting device for user-agent: " + ua);

    # Perform a device detection calling WM server API
    device = client.lookup_useragent(ua)
           :
        # Let's get the device capabilities and print some of them
        capabilities = device.capabilities
        print("Detected device WURFL ID: " + capabilities["wurfl_id"])
        print("Device brand & model: " + capabilities["brand_name"] + " " + capabilities["model_name"])
        print("Detected device form factor: " + capabilities["form_factor"])
        if capabilities["is_smartphone"] == "true":

Fully-fledged example and reference to GitHub client-code can be found here.

Disclosure: I work for the company that provides the library described here.

指尖凝香 2025-01-11 03:01:21

您可以尝试使用正则表达式编写自己的: http://docs.python.org/library/重新.html
或者看看这个: http://pypi.python.org/pypi/httpagentparser

You can try to write your own with regular expression : http://docs.python.org/library/re.html
or take a look at this : http://pypi.python.org/pypi/httpagentparser

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文