Extract the first few lines of text data from a URL without getting the entire page data?
Use case: I need to check whether JSON data from a URL has been updated by checking its created_date field, which lies in the first few lines. The entire page's JSON data is huge, and I don't want to retrieve the entire page just to check the first few lines.
Currently, for both
x = feedparser.parse(url)
y = requests.get(url).text
# y.split("\n") etc.
the entire URL data is retrieved and then parsed.
I want to do something like next(url), or read only the first 10 lines (chunks), so that I am not requesting the entire page's data: just scroll, check the 'created_date' field, and exit.
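A minimal sketch of the "read only the first lines" idea, assuming the JSON is pretty-printed so that `"created_date": "..."` sits on its own line near the top (a minified, single-line response would defeat line-by-line scanning). It swaps in the standard library's `urllib.request` instead of `requests` so it has no third-party dependency; with `requests`, `requests.get(url, stream=True)` plus `iter_lines()` follows the same pattern. The helper names and the regex are hypothetical. Leaving the `with` block closes the connection, so the client stops reading, although the server may already have sent some extra bytes.

```python
import re
import urllib.request
from typing import Iterable, Optional

# Hypothetical pattern: assumes the field looks like "created_date": "<value>"
# and appears on a single line near the top of the document.
CREATED_DATE_RE = re.compile(r'"created_date"\s*:\s*"([^"]*)"')


def find_created_date(lines: Iterable[str], max_lines: int = 10) -> Optional[str]:
    """Scan at most max_lines lines; return the created_date value or None."""
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        match = CREATED_DATE_RE.search(line)
        if match:
            return match.group(1)
    return None


def fetch_created_date(url: str, max_lines: int = 10) -> Optional[str]:
    """Read the HTTP response line by line and stop once the field is found."""
    with urllib.request.urlopen(url) as resp:
        # Iterating over the response yields raw byte lines lazily, so no more
        # of the body is read once find_created_date returns.
        lines = (raw.decode("utf-8", errors="replace") for raw in resp)
        return find_created_date(lines, max_lines)
```

You would store the value from one call and compare it with the next; only when it differs do you need the full download.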
What can be used to solve this? Thanks for your knowledge, and apologies for the noob question.
Example URL: https://www.w3schools.com/xml/plant_catalog.xml
I want to stop reading the URL data if the first PLANT element's LIGHT tag has not changed from 'Mostly Shady' (without needing to read or download the data below it).
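For the XML example, a similar sketch could feed the response to the standard library's `xml.etree.ElementTree.XMLPullParser` in small chunks and stop as soon as the first `PLANT` element is complete. The helper names, the 1024-byte chunk size and the 'Mostly Shady' default are assumptions for illustration, not anything the site or the original post defines.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def first_plant_light(chunks: Iterable[bytes]) -> Optional[str]:
    """Feed XML bytes incrementally; return the first PLANT's LIGHT text."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            # An "end" event for PLANT means all of its children are parsed.
            if elem.tag == "PLANT":
                return elem.findtext("LIGHT")
    return None


def light_changed(url: str, expected: str = "Mostly Shady", chunk_size: int = 1024) -> bool:
    """Hypothetical helper: True if the first PLANT's LIGHT differs or is missing."""
    with urllib.request.urlopen(url) as resp:
        # Read fixed-size chunks until EOF (b""); stops early via the return above.
        chunks = iter(lambda: resp.read(chunk_size), b"")
        light = first_plant_light(chunks)
    return light != expected
```

Because the parser never sees the closing `</CATALOG>` tag, `close()` is deliberately not called; the truncated document is simply abandoned.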
The original poster stated that the solution below worked:
Instead of a GET request, one can try a HEAD request:
"The GET method requests a representation of the specified resource. Requests using GET should only retrieve data. The HEAD method asks for a response identical to a GET request, but without the response body."
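This way, you don't need to request the entire JSON, which speeds up the server-side part and is friendlier to the hosting server. Note that a HEAD response carries only headers, so the update check has to rely on header fields such as Last-Modified or ETag (if the server sends them) rather than on the created_date field inside the body.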
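A minimal sketch of the HEAD approach with the standard library, assuming the server sends `ETag` and/or `Last-Modified` headers (not every server does). The helper names are hypothetical; the idea is to store the first result and compare later results against it.

```python
import urllib.request
from typing import Dict, Optional


def head_validators(url: str) -> Dict[str, Optional[str]]:
    """Send a HEAD request (no body is transferred) and return cache validators."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return {
            "ETag": resp.headers.get("ETag"),
            "Last-Modified": resp.headers.get("Last-Modified"),
        }


def has_changed(old: Dict[str, Optional[str]], new: Dict[str, Optional[str]]) -> bool:
    """Compare the first validator both snapshots have (ETag, then Last-Modified)."""
    for key in ("ETag", "Last-Modified"):
        if old.get(key) is not None and new.get(key) is not None:
            return old[key] != new[key]
    # No usable validator in both snapshots: headers alone cannot tell,
    # so conservatively report a change.
    return True
```

If neither header is present, `has_changed` reports a change, and you would fall back to a partial GET that reads only the first lines of the body.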