CFHTTP encoding issue
I am trying to use cfhttp to pull a page so I can parse information out of it. The response headers of the page I am requesting are:
Content-Encoding: gzip
Connection: Keep-Alive
Content-Length: 19066
Server: IBM_HTTP_Server
Vary: Accept-Encoding, User-Agent
Content-Language: en-US
Cache-Control: no-cache="set-cookie, set-cookie2"
Content-Type: text/html;charset=ISO-8859-1
I set the charset to ISO-8859-1, but I am getting the following in FileContent (only a small sample is shown below, but I think it gets the point across):
EðÑq·Oã?·Ì\ZóL¯þ´Vú5ðbä£ÿæ¾_HÉÒñQãO\Çþãë85ÁÜ
à±°ùÖ}&bßý?,u?2SùQyk5g?UÛ3Ѹfã×ARÃi_iûRã
_ òCA¿-ß."b /¯ßíWÝÆ´}w~,°iøÜCáÇþ@ÃZ5¤ïsÁ8½°ì*
÷ðxsllû
ZÜéjOÝK/Ë4§ÈG5×ä*¬6ÚwÇ0]ã:àÑþé¬G"ÅÁl/t°
jlá»5¶&¯lìYìºØ'yDð½|#ý<ñìTé%¾ï¬ùƪx¶}«±o9»ë¼ÂÆÒï'w8Y?
6íqüGÞsÜóÀx·ªk®XºàåZ{íÁ½åo÷mbq¥ÝÃ8M
I tried other charsets and suspect that the gzip encoding is causing the problem, but I am unsure how to test whether that is the issue. Any suggestions or help would be greatly appreciated.
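One way to test whether gzip is the culprit is to look at the first two bytes of the body: every gzip stream starts with the signature 0x1f 0x8b (RFC 1952). Below is a minimal sketch in Java, chosen because CFML runs on the JVM and can call the same classes through createObject("java", ...). The class and method names are hypothetical, and it assumes you can get the response body as raw bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipCheck {
    // Every gzip stream begins with the magic bytes 0x1f 0x8b (RFC 1952).
    static boolean looksGzipped(byte[] body) {
        return body != null
                && body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    // Demo helper: gzip some text so there is a compressed sample to test against.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.ISO_8859_1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and the gzip trailer written) before we read the bytes.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String html = "<html><body>hello</body></html>";
        System.out.println(looksGzipped(html.getBytes(StandardCharsets.ISO_8859_1))); // false
        System.out.println(looksGzipped(gzip(html)));                                // true
    }
}
```

Because ISO-8859-1 maps every byte to the character with the same code, a body that was decoded with that charset would, in the same situation, start with the characters U+001F and U+008B, so checking the first two characters of FileContent is an equivalent test.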
Below is my code:
<cfhttp
    METHOD="get"
    throwonerror="yes"
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10">
    <!--- q=0 marks deflate as not acceptable as a content coding --->
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <!--- TE lists acceptable transfer codings; deflate is excluded here too --->
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>
<!--- Response body as returned by cfhttp --->
<cfset listings = cfhttp.FileContent>
<cfoutput>
#listings#
</cfoutput>
I have also tried these headers:
<cfhttpparam type="Header" name="Accept-Encoding" value="*">
<cfhttpparam type="Header" name="TE" value="deflate;q=0">
I also tried removing the Accept-Encoding header and leaving only TE.
UPDATE:
I still haven't figured it out, but I found something that might help someone help me out. I used a PHP test server of mine to run file_get_contents on the same page, and it worked fine. Then, when I pointed the same cfhttp code at that PHP page (which in turn fetches the page I need), that worked just fine too. Thanks for the suggestions so far.
Answers (3)
The issue with cars.com seems to be that they're gzipping the output twice (based on this thread)
So, we need to unzip the content... again...
First, we need to get the content as binary, so the CFHTTP call needs to include:
Then, we need to unzip it.
We can use java.util.zip to do it. The gunzip function is a modified version of this cflib.org function:
Be sure to double-check the var scoping of the function. I might have missed something.
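The cflib.org function referred to above is not reproduced here. As a minimal sketch of the same idea, assuming the response body is already available as a byte array (which is what requesting the content as binary is for), the Java below removes gzip layers with java.util.zip until the bytes no longer begin with the gzip signature. It is plain Java rather than CFML, and the class and method names are hypothetical; ColdFusion can reach the same classes with createObject("java", "java.util.zip.GZIPInputStream").

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DoubleGunzip {
    // gzip signature from RFC 1952: the first two bytes are 0x1f 0x8b.
    static boolean isGzip(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    // Remove one gzip layer using java.util.zip.
    static byte[] gunzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Peel off gzip layers while the bytes still carry the gzip signature.
    // maxLayers is a safety cap so odd input cannot loop indefinitely.
    static byte[] gunzipAll(byte[] data, int maxLayers) {
        byte[] current = data;
        for (int i = 0; i < maxLayers && isGzip(current); i++) {
            current = gunzip(current);
        }
        return current;
    }

    // Demo helper: add one gzip layer.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>listing</body></html>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] twice = gzip(gzip(html));  // simulate a body compressed twice
        byte[] once = gunzip(twice);      // what is left after a single decode
        System.out.println(isGzip(once)); // true: still gzip data
        String page = new String(gunzipAll(twice, 5), StandardCharsets.ISO_8859_1);
        System.out.println(page);         // <html><body>listing</body></html>
    }
}
```

Looping on the signature, rather than hard-coding two passes, means the same code keeps working if the site stops double-compressing. The final bytes still need to be decoded with the charset from the Content-Type header (ISO-8859-1 here).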
Per the headers, what you are seeing is the gzipped content of the file. It will need to be decompressed before it is useful to you. I assume you can do this with cfzip, but I have not had any experience doing it (cfzip is documented for ZIP and JAR archives, though, so it may not handle a gzip stream).
This post seems to indicate that you can add a header to your request so the content is returned uncompressed:
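In HTTP terms, the request header that asks for a body with no content coding is Accept-Encoding: identity (RFC 9110). A minimal Java sketch of building such a request with the JDK's java.net.http client (Java 11+) is below; the class and method names are hypothetical and the URL is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IdentityRequest {
    // Build a GET request that asks for the body without any content coding.
    // "identity" is the HTTP token meaning "no compression".
    static HttpRequest uncompressedGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept-Encoding", "identity")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = uncompressedGet("http://example.com/");
        // Prints the header value that will be sent: identity
        System.out.println(req.headers().firstValue("Accept-Encoding").orElse("(none)"));
        // To actually send it:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofByteArray());
    }
}
```

Whether the server honors this is up to the server. The question reports that adjusting Accept-Encoding alone did not fix the output for this site, which would be consistent with the double-gzip explanation in the first answer.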
The first thing I would do is make sure that it's not the source content/server that's the problem by trying your same code against other pages. If they work fine, then it's likely the server/content that you're trying to consume. If they have the same problem, then the issue is in your code. It would also be helpful if you posted your code.