How to separate background HTTP requests
This is more a matter of understanding how HTTP really works and then implementing it.
I need an HTTP analyzer that can separate main page requests from "background" requests in some HTTP log data. The idea is to separate HTTP requests made by the user from those that happen automatically (using the term loosely) in the background. From my first impressions of the HTTP data I've seen, when I go to any normal website a text/html object is fetched, followed by a lot of other objects such as CSS, XML, JavaScript, images, etc.
Now, the problem is how to separate out these "background" requests, which the user is not actively generating. From what I know, these will mostly be ad fetches, redirections, and some Ajax-based things.
Does anyone have any ideas regarding this? Any experience, or resources you could point me to, to get started with this analysis?
Comments (2)
There's no way to tell from the bare HTTP requests which ones the browser generated because of specific user actions and which came from other automated processes. The browser/client is the only one that has such knowledge, so you have to make it part of the picture, e.g. by implementing the analyzer as a browser plugin or by embedding an HTTP client in the analyzer itself.
If you're trying to create a generic tool to analyze traffic load, it's usually not meaningful to distinguish between traffic generated by the user's direct "clicks" and automated requests.
There's no direct and clean way to do this. However, you can get pretty close by filtering out requests for files that clearly are not "user" requests, like *.jpg. Furthermore, you can filter out anything that is not an HTTP 200 response (e.g., 301 and 302 redirects).
Try something along the lines of the following (line breaks added for readability):
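As an illustrative sketch of the filtering idea (not the answerer's own command), here is one way it might look in Python. It assumes the log is in Apache/NCSA common or combined format, which the question doesn't specify; the extension list and the function names `is_probable_page_request` and `filter_page_requests` are made up for this example.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Assumed list of "clearly not a user request" extensions; adjust per site.
STATIC_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".ico",
    ".css", ".js", ".xml", ".svg", ".woff", ".woff2",
}

# Pulls the request target and status code out of a log line such as:
# 1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)


def is_probable_page_request(line):
    """Heuristic: keep 200 responses whose path has no static-asset extension."""
    match = LOG_LINE.search(line)
    if match is None:
        return False  # unparseable line: treat as not a page request
    if match.group("status") != "200":
        return False  # drops 301/302 redirects and errors
    path = urlsplit(match.group("target")).path  # strips any query string
    extension = posixpath.splitext(path)[1].lower()
    return extension not in STATIC_EXTENSIONS


def filter_page_requests(lines):
    """Return only the log lines that look like user-initiated page loads."""
    return [line for line in lines if is_probable_page_request(line)]
```

This is only a heuristic. Ajax calls that return 200 and have no static-file extension will still pass the filter, so the result is "pretty close" rather than exact.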