Problem reading data from a website
I am writing an application that parses data from the flight boards on two airport sites: Sheremetyevo (http://svo.aero/timetable/today/) and Domodedovo (http://www.domodedovo.ru/ru/main/airindicator/flightnew/).
I create a URL object from the site's link and call its openStream method. The resulting stream is then passed to the HTMLEditorKit parser.
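For readers who have not used the Swing HTML parser, the parsing step might look roughly like the sketch below. It is not the asker's actual code: the class name BoardTextExtractor and the method extractText are hypothetical, and the sketch only collects the text nodes of a page.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not taken from the original application.
class BoardTextExtractor {

    // Parses HTML from the reader and returns every non-blank text node.
    static List<String> extractText(Reader html) throws IOException {
        final List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    chunks.add(text);
                }
            }
        };
        // "true" tells the parser to ignore charset declarations inside the
        // page instead of throwing ChangedCharSetException.
        new ParserDelegator().parse(html, callback, true);
        return chunks;
    }
}
```

With a live page, the reader would come from something like new InputStreamReader(new URL(address).openStream(), charset), where the charset is an assumption the caller has to supply. Note that openStream returns the response body bytes exactly as the server sent them.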
With Domodedovo this works well, but with Sheremetyevo the program behaves strangely. First, a number of consecutive launches fail: the stream contains characters in an unknown encoding, and the text is clearly shorter than the actual page content. Then, unexpectedly, a few runs succeed and return the desired data, followed by another series of failures. What does this depend on? I can't trace it.
I tried sending specific HTTP headers through URLConnection, guessing that they were the cause, since the page opens fine in a browser. It did not help; nothing changed.
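The question does not say which headers were sent, so the following is only a sketch of the general technique. The class name BrowserLikeRequest and the header values are hypothetical placeholders. Only URL.openConnection and URLConnection.setRequestProperty are standard API.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the original application.
class BrowserLikeRequest {

    // Example header values only; the headers actually tried are not known.
    static Map<String, String> browserHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (example)");
        headers.put("Accept", "text/html");
        headers.put("Accept-Language", "ru,en");
        return headers;
    }

    // Creates a connection with the headers applied. Nothing is sent yet:
    // the request goes out when the caller asks for the input stream.
    static URLConnection prepare(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        for (Map.Entry<String, String> header : browserHeaders().entrySet()) {
            connection.setRequestProperty(header.getKey(), header.getValue());
        }
        return connection;
    }
}
```

Request headers have to be set before the connection is opened; calling setRequestProperty after connecting throws IllegalStateException.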
What could the problem be?
P.S. My English isn't very good, sorry.
Comments (1)
Problem solved.
All of this happened because the server compresses the data when sending it to the application, as the HTTP header that came from the server showed.
So the data can be read by wrapping the stream in a GZIPInputStream.
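A minimal sketch of that fix, assuming the server marks compressed responses with a Content-Encoding header of gzip. The class and method names are hypothetical, and the page charset is left to the caller because the thread does not state it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not taken from the original application.
class GzipAwareFetcher {

    // Wraps the raw stream in a GZIPInputStream only when the server
    // declared gzip compression; otherwise returns the stream unchanged.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the stream to the end and returns its bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Downloads a page and decompresses it if needed. The charset is an
    // assumption supplied by the caller.
    static String fetch(String address, Charset charset) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(connection.getInputStream(),
                                     connection.getContentEncoding())) {
            return new String(readAll(in), charset);
        }
    }
}
```

Checking the Content-Encoding header before wrapping the stream keeps the code working if the server sometimes sends the page uncompressed.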
Maybe someone will find this information helpful.