Getting a single website URL from a bunch of HTTP packets?
I'm a newbie in network programming, so please forgive any mistakes.
I'm writing a simple sniffer that should detect only the URLs of websites requested by the user. I'm using pcap.net and I'm able to capture HTTP packets (with a tcp port 80 filter) and retrieve data from them. What I can't do is get the single URI of the request that caused many HTTP packets to arrive.
For example:
1. A user requests www.website.com (from a browser).
2. Many HTTP responses arrive, one of which is the text/html for www.website.com.
3. www.website.com includes resources from other pages, so many other packets arrive from other hosts.
Is there a way to ignore the packets that belong to those resources? Do I have to do some kind of TCP session reconstruction? I've been googling for 2 days but couldn't find anything useful, so please help.
Comments (1)
The HTTP responses from other hosts can be identified because they will probably come from different IPs, not from the IP that the request was sent to.
You can match HTTP requests and responses even without full TCP reconstruction by just looking at the IPs and TCP ports.
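As a rough illustration of matching by IPs and ports, here is a minimal sketch in Python rather than Pcap.Net. It assumes the sniffer has already extracted the addresses, ports, and TCP payload of each packet; the `Segment` type and the `request_url` and `match_responses` helpers are hypothetical names made up for this example. It also assumes each request or response header starts at the beginning of a single payload and that only one request is outstanding per connection.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """Hypothetical container for the fields the sniffer already extracted."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


def request_url(payload: bytes) -> Optional[str]:
    """Return the URL of an HTTP request that starts this payload, else None."""
    if not payload.startswith(METHODS):
        return None
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) < 3:
        return None
    path = parts[1]
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    # Assumes an origin-form request target such as "/index.html".
    return "http://" + host + path


def match_responses(segments: Iterable[Segment]) -> List[Tuple[str, str]]:
    """Pair each response status line with the URL requested on that connection."""
    # Key: (client_ip, client_port, server_ip, server_port) -> requested URL
    pending: Dict[Tuple[str, int, str, int], str] = {}
    matched: List[Tuple[str, str]] = []
    for seg in segments:
        url = request_url(seg.payload)
        if url is not None:
            pending[(seg.src_ip, seg.src_port, seg.dst_ip, seg.dst_port)] = url
            continue
        if seg.payload.startswith(b"HTTP/"):
            # A response travels in the opposite direction, so swap the endpoints.
            key = (seg.dst_ip, seg.dst_port, seg.src_ip, seg.src_port)
            url = pending.pop(key, None)
            if url is not None:
                status = seg.payload.split(b"\r\n", 1)[0].decode("latin-1")
                matched.append((url, status))
    return matched
```

Payloads that start with neither a request method nor `HTTP/` (for example, later pieces of a response body) are simply skipped, which is why this works only as long as each connection carries one request at a time.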
However, if there are multiple HTTP requests in the same TCP session, you will need to do TCP reconstruction to distinguish between the different requests and responses.
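To give an idea of what that reconstruction involves, the sketch below (again Python, with hypothetical helper names `reassemble` and `split_requests`) orders the payloads of one direction of a connection by TCP sequence number and then splits the resulting byte stream into individual requests. It is a simplification under stated assumptions: sequence numbers do not wrap around, the first payload byte's sequence number is known, and the requests carry no bodies (as with plain GETs).

```python
from typing import Iterable, List, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]], initial_seq: int) -> bytes:
    """Join (sequence_number, payload) pairs of one direction into a byte stream.

    Retransmitted bytes are dropped and reassembly stops at the first gap.
    Sequence-number wraparound is not handled in this sketch.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seq, data in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break  # a segment is missing, so the rest cannot be placed yet
        overlap = next_seq - seq  # bytes we have already seen
        if overlap < len(data):
            stream += data[overlap:]
            next_seq += len(data) - overlap
    return bytes(stream)


def split_requests(stream: bytes) -> List[str]:
    """Return the request line of every complete, body-less request in the stream."""
    # The last piece after splitting is either empty or an incomplete header.
    heads = stream.split(b"\r\n\r\n")[:-1]
    return [h.split(b"\r\n", 1)[0].decode("latin-1") for h in heads if h]
```

With the client-to-server stream split this way, the n-th request on a connection corresponds to the n-th response on the reverse stream, since HTTP/1.1 responses are returned in request order on a given connection.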