CFHTTP encoding problem

Posted on 2024-09-03 20:09:17

I am trying to pull a page with cfhttp so that I can parse information out of it. The response headers of the page I am requesting are:

Content-Encoding: gzip
Connection: Keep-Alive
Content-Length: 19066
Server: IBM_HTTP_Server
Vary: Accept-Encoding, User-Agent
Content-Language: en-US
Cache-Control: no-cache="set-cookie, set-cookie2"
Content-Type: text/html;charset=ISO-8859-1

I set the charset to ISO-8859-1; however, I am getting the following in FileContent (only a small sample is shown below, but I think it gets the point across).

EðÑq·Oã?·Ì\ZóL¯þ´Vú5ðbä£ÿæ¾_HÉÒñQãO\Çþãë85ÁÜ
à±°ùÖ}&bßý?,u?2SùQyk5g?UÛ3Ñ¸fã×ARÃi_iûRã
_ òCA¿-ß."b /¯ßíWÝÆ´}w~,°iøÜCáÇþ@ÃZ5¤ïsÁ8½°ì* ZÜéjOÝK/Ë4§ÈG5×ä*¬6ÚwÇ0]ã:àÑþé¬G"ÅÁl/t° jlá»5¶&¯lìYìºØ'yDð½|#ý<ñìTé%¾ï¬ùÆªx¶}«±o9»ë¼ÂÆÒï'w8Y?÷ðxsllû
6íqüGÞsÜóÀx·ªk®XºàåZ{íÁ½åo÷mbq¥ÝÃ8M

I tried other charsets and suspect that the gzip encoding is causing the problem, but I am unsure how to test whether that is the issue. Any suggestions or help would be greatly appreciated.

Below is my code:

<cfhttp 
    METHOD="get"
    throwonerror="yes" 
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10">

    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>

<cfset listings = #cfhttp.FileContent#>
<cfoutput>
    #listings#
</cfoutput>

I have also tried the headers:

    <cfhttpparam type="Header" name="Accept-Encoding" value="*">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">

I also tried removing the 'Accept-Encoding' header and leaving only TE.

UPDATE:
I still haven't figured it out, but I found something that might help someone help me. When I ran file_get_contents against the same page from a test PHP server of mine, it worked fine. When I then pointed the same cfhttp code at that PHP page (which in turn fetches the page I need), it also worked fine. Thanks for the suggestions so far.

暮光沉寂 2024-09-10 20:09:17

The issue with cars.com seems to be that they gzip the output twice (based on this thread).

So we need to unzip the content... again.

First, we need to get the content as binary, so the CFHTTP call needs to include

getasbinary="yes"

Then, we need to unzip it.

We can use java.util.zip to do it. The gunzip is a modified version of this cflib.org function:

<cfhttp
    getasbinary="yes"
    METHOD="get"
    throwonerror="yes"
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10" >

    <cfhttpparam type="Header" name="Accept" value="application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5">
    <cfhttpparam type="Header" name="User-Agent" value="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41">
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate">
    <cfhttpparam type="Header" name="TE" value="deflate, chunked, identity, trailers">

</cfhttp>

<cfset unzippedHTML = gunzip(cfhttp.FileContent)>

<cfoutput>
    #unzippedHTML#
</cfoutput>

<cfscript>

    function gunzip(inBytes) {
        var gzInStream = createObject('java','java.util.zip.GZIPInputStream');
        var outStream = createObject('java','java.io.ByteArrayOutputStream');
        var inStream = createObject('java','java.io.ByteArrayInputStream');
        var buffer = repeatString(" ",1024).getBytes();
        var length = 0;
        var rv = "";

        try {
            inStream.init(inBytes);
            gzInStream.init(inStream);
            outStream.init();
            do {
                length = gzInStream.read(buffer,0,1024);
                if (length neq -1) outStream.write(buffer,0,length);
            } while (length neq -1);
            // Decode with the page's declared charset rather than the JVM default.
            rv = outStream.toString("ISO-8859-1");
            outStream.close();
            gzInStream.close();
            inStream.close();
        }
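        catch (any e) {
            // On failure return an empty string and close each stream independently,
            // so that one failed close() does not prevent the others from running.
            rv = "";
            try { outStream.close(); } catch (any e1) {}
            try { gzInStream.close(); } catch (any e2) {}
            try { inStream.close(); } catch (any e3) {}
        }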
        return rv;
    }
</cfscript>

Be sure to double-check the var scoping of the function. I might have missed something.
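
Since the diagnosis above is that the body is compressed more than once, another option is to keep unzipping for as long as the bytes still start with the gzip magic number (0x1F 0x8B). This is only a rough sketch, not part of the cflib.org function: the names looksGzipped, gunzipToBytes and gunzipAll are made up for illustration, and it assumes the CFML engine lets you index a Java byte array with 1-based subscripts.

```cfml
<cfscript>
    // Hypothetical helper: true when the bytes begin with the gzip magic number 0x1F 0x8B.
    function looksGzipped(bytes) {
        if (arrayLen(bytes) lt 2) return false;
        // Java bytes are signed, so mask each one to 0..255 before comparing.
        return (bitAnd(bytes[1], 255) eq 31) and (bitAnd(bytes[2], 255) eq 139);
    }

    // Hypothetical helper: one gunzip pass that returns bytes instead of a string.
    function gunzipToBytes(inBytes) {
        var inStream = createObject('java','java.io.ByteArrayInputStream').init(inBytes);
        var gzInStream = createObject('java','java.util.zip.GZIPInputStream').init(inStream);
        var outStream = createObject('java','java.io.ByteArrayOutputStream').init();
        var buffer = repeatString(" ",1024).getBytes();
        var length = gzInStream.read(buffer,0,1024);

        while (length neq -1) {
            outStream.write(buffer,0,length);
            length = gzInStream.read(buffer,0,1024);
        }
        gzInStream.close();
        return outStream.toByteArray();
    }

    // Peel off gzip layers until the payload no longer looks compressed,
    // then decode the remaining bytes with the given charset.
    // maxPasses is an arbitrary guard against looping forever on odd input.
    function gunzipAll(inBytes, charset, maxPasses) {
        var bytes = inBytes;
        var pass = 0;
        while (pass lt maxPasses and looksGzipped(bytes)) {
            bytes = gunzipToBytes(bytes);
            pass = pass + 1;
        }
        return charsetEncode(bytes, charset);
    }
</cfscript>
```

With the binary cfhttp call above, this would be used as gunzipAll(cfhttp.FileContent, "ISO-8859-1", 3). If the content is not compressed at all, the loop never runs and the bytes are simply decoded, so the same call should keep working if the site ever fixes its double compression.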


无远思近则忧 2024-09-10 20:09:17

Per the Content-Encoding header, what you are seeing is the gzipped content of the page. It needs to be decompressed before it is useful to you. I assume you can do this with cfzip, but I have no experience doing it.

This post seems to indicate that you can add a header in your request to have it unzipped/deflated before being returned:

<cfhttp ...>
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>
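
To check whether a change of headers actually made a difference, it may help to look at what the server says it sent back. A minimal sketch, assuming the header arrives as a single string value; contentEncodingOf is a made-up helper name, not a built-in function:

```cfml
<cfscript>
    // Hypothetical helper: returns the lower-cased Content-Encoding value
    // from a cfhttp responseHeader struct, or "" when the header is absent.
    function contentEncodingOf(responseHeader) {
        if (structKeyExists(responseHeader, "Content-Encoding")) {
            return lCase(trim(responseHeader["Content-Encoding"]));
        }
        return "";
    }
</cfscript>
```

After a cfhttp call, contentEncodingOf(cfhttp.responseHeader) returning "gzip" would mean the server still compressed the body despite the request headers, while an empty string would mean it reported no content encoding.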
