Making HTTP requests in Cocoa that emulate a browser
So I am trying to read the contents of an HTML file to scrape some metadata from a particular website.
The issue I am running into, however, is that performing the HTTP request in Objective-C with the Cocoa library calls gives me a different HTML file than when I make the same request from a web browser or from my Python implementation.
This is annoying because I am scraping a key that is generated on every request. The site seems to know when I am performing the request via Cocoa rather than from the Python library or the browser.
Can anyone shed any light on this?
Here is the Python code I run:
# Python 2 (urllib2)
import urllib2

complete_url = 'http://sample-site.com/1?ax=1&ts=123123.12'
request = urllib2.Request(complete_url)
response = urllib2.urlopen(request)
html_data = response.read()
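For comparison, a minimal Python 3 sketch of the same request (urllib2 was split into urllib.request and urllib.parse in Python 3). The host is the question's placeholder, and build_page_url and fetch_page are hypothetical helper names. Building the query string with urlencode guarantees the "&" between ax and ts, which is easy to drop in a hand-written URL, and the two-decimal ts formatting mirrors the %1.2f used on the Cocoa side.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://sample-site.com"  # placeholder host taken from the question


def build_page_url(page_number: int, ts: float) -> str:
    # urlencode joins the parameters with "&"; ts is formatted with two
    # decimals, like the %1.2f format specifier in the Cocoa version.
    query = urlencode({"ax": 1, "ts": f"{ts:.2f}"})
    return f"{BASE_URL}/{page_number}?{query}"


def fetch_page(page_number: int, ts: float, timeout: float = 10.0) -> bytes:
    # Sent with urllib's default headers (User-Agent "Python-urllib/x.y").
    request = Request(build_page_url(page_number, ts))
    with urlopen(request, timeout=timeout) as response:
        return response.read()  # raw bytes; decode them separately
```

build_page_url(1, 123123.12) produces the same URL as the urllib2 snippet, so the two versions can be compared directly.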
Here are the Cocoa attempts I've tried:
NSString * completeUrl = [url stringByAppendingFormat:@"//%d?ax=1&ts=%1.2f", pageNumber, time];
Another attempt:
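NSMutableURLRequest *request = [[[NSMutableURLRequest alloc] initWithURL:hypeURL] autorelease];
// Replace the default CFNetwork User-Agent with a browser-like one
[request setValue:userAgent forHTTPHeaderField:@"User-Agent"];
NSURLResponse *response = nil;
NSError *error = nil;
NSData *data = [NSURLConnection sendSynchronousRequest:request
                                     returningResponse:&response
                                                 error:&error];
NSString *hypeHTML = nil;
if (data != nil) {
    // Assumes the page is UTF-8; NSASCIIStringEncoding may not decode non-ASCII bytes correctly
    hypeHTML = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
} else {
    NSLog(@"Request failed: %@", error);
}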
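Hard-coding a string encoding when turning the response bytes into text can garble a page that uses a different encoding. On the Python side, a minimal sketch that decodes with the charset the server declares in its Content-Type header might look like this; decode_html is a hypothetical helper, and falling back to UTF-8 is an assumption, not something the site is known to use.

```python
from email.message import Message


def decode_html(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode body using the charset from the Content-Type header, else `default`."""
    charset = headers.get_content_charset() or default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not match it: fall back to the
        # default and substitute U+FFFD for anything undecodable.
        return body.decode(default, errors="replace")
```

With urllib, response.headers is an http.client.HTTPMessage (a Message subclass), so decode_html(response.read(), response.headers) works on a live response.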
The Cocoa attempts do retrieve the HTML, and that HTML contains key values, which I parse, that should be regenerated on every refresh. When I make the request from Cocoa, however, the key does not change between runs of the application (the same key appears in the HTML every time), whereas in Python the HTML correctly contains a different key for each request.
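One way to see what distinguishes the two clients is to point both at a local server that echoes back the headers it receives. A minimal sketch using only the Python standard library; the handler and function names and the default port 8000 are arbitrary choices, not part of the original setup.

```python
import http.server
import json
import threading


class EchoHeadersHandler(http.server.BaseHTTPRequestHandler):
    """Answers every GET with the request's headers as a JSON object."""

    def do_GET(self):
        # self.headers keeps header names exactly as the client sent them.
        body = json.dumps(dict(self.headers.items()), indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Replace the default access log with one short line per request.
        print(f"{self.command} {self.path} User-Agent: {self.headers.get('User-Agent')}")


def start_echo_server(port=0):
    """Serve on 127.0.0.1 from a background thread and return the server.

    port=0 lets the OS choose a free port; read it from server.server_address[1].
    """
    server = http.server.HTTPServer(("127.0.0.1", port), EchoHeadersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def serve(port=8000):
    """Run in the foreground until interrupted with Ctrl-C."""
    with http.server.HTTPServer(("127.0.0.1", port), EchoHeadersHandler) as server:
        print(f"Echoing request headers on http://127.0.0.1:{port}/")
        server.serve_forever()
```

Run serve(), point both the Cocoa app and the Python script at http://127.0.0.1:8000/ instead of the real site, and compare the two JSON responses header by header (User-Agent, Accept, Cookie and so on).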
Thanks
Answer:
The website probably inspects the User-Agent header and returns different content depending on its value.
Simply change the User-Agent header of your request:
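A minimal Python sketch of this, assuming urllib.request is used for the request. The Firefox-on-Linux string is illustrative only, and browser_like_request is a hypothetical helper name.

```python
from urllib.request import Request

# Illustrative Firefox-on-Linux string; look up a current one for the browser you want to imitate.
FIREFOX_LINUX_UA = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def browser_like_request(url: str, user_agent: str = FIREFOX_LINUX_UA) -> Request:
    # A User-Agent set on the Request takes precedence over the opener's
    # default "Python-urllib/x.y" value.
    return Request(url, headers={"User-Agent": user_agent})
```

Pass the result to urllib.request.urlopen as usual. In Cocoa, the equivalent is calling setValue:forHTTPHeaderField: with @"User-Agent" on the NSMutableURLRequest.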
With this code, the server thinks you're running Firefox on Linux :)
To get current User-Agent strings, or to look up the ones used by specific browsers, see:
http://www.useragentstring.com/