Parsing an HTTP web response for dependent requests
I want to simulate the behaviour of the WebTestRequest class (in Visual Studio's Test Tools framework), which can invoke dependent requests based on the resources referred to in the response to the original request.
For example, if I issue a web request and get the response by doing this:
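using System.IO;
using System.Net;

string url = "http://www.mysite.com";
WebRequest request = WebRequest.Create(url);
string responseText;
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    // Read the whole response body as text; declared outside so it can be parsed afterwards.
    responseText = reader.ReadToEnd();
}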
I would like to be able to parse responseText and see whether it refers to any other resources (js/css files, images, etc.) that would need to be requested.
Is there an easy way of doing this? I hesitate to manually do this, as some of the resource requests may be set up programmatically and may not be obvious on a straightforward text parse.
Comments (2)
Use an HTML/SGML parser library. I'm not familiar with Visual Studio, but there are frameworks for parsing HTML out there. Find one and look in its API for something related to finding elements.
I'm reasonably certain that WebTestRequest itself only does a "straightforward text parse" to determine the dependent requests, since it has no awareness of javascript. So if you were to implement such a parse, your code would be accurately simulating its behaviour.
The following is a list of all the elements I could find in a cursory glance at the HTML 4 spec that can refer to additional resources and thus would need to be parsed:
<link href=
<img src=
<script src=
<iframe src=
<object data=
<area href=
Not sure if it's exhaustive or not.
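For what it's worth, here is a minimal sketch of such a text parse in C#, assuming the tag/attribute pairs listed above are the ones you care about. The class and method names (DependentRequestFinder, FindDependentUrls) are made up for illustration, it only handles quoted attribute values, and it is not a claim about how WebTestRequest actually works internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any framework.
public static class DependentRequestFinder
{
    // Tag/attribute pairs taken from the list above; possibly not exhaustive.
    private static readonly string[,] ResourceAttributes =
    {
        { "link", "href" },
        { "img", "src" },
        { "script", "src" },
        { "iframe", "src" },
        { "object", "data" },
        { "area", "href" },
    };

    // Returns the absolute URL of every statically declared resource reference.
    public static List<Uri> FindDependentUrls(string html, Uri baseUri)
    {
        var results = new List<Uri>();
        for (int i = 0; i < ResourceAttributes.GetLength(0); i++)
        {
            string tag = ResourceAttributes[i, 0];
            string attribute = ResourceAttributes[i, 1];

            // Matches e.g. <img class="x" src="a.png">; unquoted values are skipped.
            string pattern = "<" + tag + @"\b[^>]*?\s" + attribute +
                             @"\s*=\s*[""']([^""']*)[""']";

            foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
            {
                Uri resolved;
                // Resolve relative references against the URL of the original request.
                if (Uri.TryCreate(baseUri, m.Groups[1].Value, out resolved)
                    && !results.Contains(resolved))
                {
                    results.Add(resolved);
                }
            }
        }
        return results;
    }
}
```

With the snippet from the question you would call something like DependentRequestFinder.FindDependentUrls(responseText, new Uri(url)) and then issue a WebRequest for each returned Uri. Anything that javascript adds at runtime will not show up here.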
By the way, I'm curious as to what you ended up doing.
EDIT:
At some point it in fact becomes impossible to determine the dependent requests by parsing an html response, and I'll give you an example: anything developed with Google Web Toolkit. In a recent GWT app that I tested, there was essentially no parsable html; everything was run from javascript. Extracting obvious path names (when available) wasn't even useful, because in reality conditional logic was selecting certain dependents and not others.