How to parse a user-agent string with Python
<field name="http.user_agent" showname="User-Agent: CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)\r\n" size="62" pos="542" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>
<field name="http.user_agent" showname="User-Agent: HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5\r\n" size="67" pos="570" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5" value="557365722d4167656e743a204854432053747265616d696e6720506c61796572206874635f777765202f20312e30202f206874635f7669766f202f20322e332e350d0a"/>
<field name="http.user_agent" showname="User-Agent: AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)\r\n" size="85" pos="639" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)" value="557365722d4167656e743a204170706c65436f72654d656469612f312e302e302e38433134382028695061643b20553b20435055204f5320345f325f31206c696b65204d6163204f5320583b2073765f7365290d0a"/>
The sample User-Agent fields I've got are listed above. I am wondering if there is any module in Python that I can use to parse the user agent. I want to get output like this from these samples:
Android
HTC Streaming player
ipad
If it is a PC user, I want to get the web browser type.
There is a library called httpagentparser for that:
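As a minimal sketch of how that library is typically called, assuming the third-party `httpagentparser` package from PyPI is installed: `simple_detect` returns an `(os, browser)` pair of strings. The names `SAMPLES` and `describe` are illustrative and not part of the library, and the import is guarded so the snippet still runs (returning `None`) when the package is missing.

```python
try:
    import httpagentparser  # third-party: pip install httpagentparser
except ImportError:
    httpagentparser = None  # the sketch degrades gracefully without the package

# The three user-agent strings from the question.
SAMPLES = [
    "CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)",
    "HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5",
    "AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)",
]


def describe(user_agent):
    """Return the (os, browser) tuple from simple_detect, or None if the package is unavailable."""
    if httpagentparser is None:
        return None
    return httpagentparser.simple_detect(user_agent)


for ua in SAMPLES:
    print(ua, "->", describe(ua))
```

Media-player strings like the ones in the question are unusual, so it is worth checking what the library actually reports for them before relying on it.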
Werkzeug has a user agent parser built in.
http://werkzeug.pocoo.org/docs/quickstart/?highlight=user_agent#header-parsing
The answer I am about to give is not about an open-source project, but it does provide information that anyone researching how to parse HTTP user-agent strings to obtain device intelligence will want to know about.
WURFL is a time-honored tool for analyzing User-Agent strings (and HTTP requests more generally) and obtaining easily consumable device/browser information. It is the de facto standard in the Ad Tech industry for squeezing the last drop of information out of HTTP requests, thanks to a proprietary database. In practice, code will look something like:
The code above would return:
More info can be found here.
For those who want to try WURFL (and PyWURFL specifically) without obtaining a trial license from ScientiaMobile, my company has recently released a version of WURFL (called WURFL Microservice) that can be obtained from the AWS, Azure and GCP marketplaces (in addition to ScientiaMobile itself, of course). Python is fully supported for that product too, although the syntax is slightly different because the product relies on a server-side component in the cloud for updates:
A full example and a reference to the client code on GitHub can be found here.
Disclosure: I work for the company that provides the library described here.
You can try writing your own with regular expressions: http://docs.python.org/library/re.html
or take a look at this: http://pypi.python.org/pypi/httpagentparser
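The regular-expression route could be sketched roughly as follows, using only the standard library. It reads the `show` attribute from the `http.user_agent` fields and maps each string to a label with an ordered rule list. The rule table, the labels, and the function names are hand-picked assumptions for illustration and are far from exhaustive; the XML is a trimmed copy of the fields in the question.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copies of the fields from the question (only the attributes used here).
PDML_FIELDS = """<fields>
<field name="http.user_agent" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)"/>
<field name="http.user_agent" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5"/>
<field name="http.user_agent" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)"/>
</fields>"""

# Hypothetical rules; the first match wins, so order matters.
RULES = [
    # Devices and players first.
    (re.compile(r"HTC Streaming Player", re.I), "HTC Streaming Player"),
    (re.compile(r"\biPad\b", re.I), "iPad"),
    (re.compile(r"\biPhone\b", re.I), "iPhone"),
    (re.compile(r"\bAndroid\b", re.I), "Android"),
    # Desktop browsers. Chrome is tested before Safari because
    # Chrome's user-agent string also contains "Safari/".
    (re.compile(r"\bFirefox/"), "Firefox"),
    (re.compile(r"\bChrome/"), "Chrome"),
    (re.compile(r"\bSafari/"), "Safari"),
    (re.compile(r"\bMSIE |Trident/"), "Internet Explorer"),
]


def classify(user_agent):
    """Return the label of the first matching rule, or "Unknown"."""
    for pattern, label in RULES:
        if pattern.search(user_agent):
            return label
    return "Unknown"


def user_agents_from_pdml(xml_text):
    """Collect the `show` attribute of every http.user_agent field."""
    root = ET.fromstring(xml_text)
    return [
        field.get("show")
        for field in root.iter("field")
        if field.get("name") == "http.user_agent"
    ]


labels = [classify(ua) for ua in user_agents_from_pdml(PDML_FIELDS)]
print(labels)  # ['Android', 'HTC Streaming Player', 'iPad']
```

A hand-written table like this needs constant upkeep as new devices and browsers appear, which is the main reason to prefer a maintained library once the list grows.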