Squid 代理不提供修改后的 html 内容

发布于 2024-08-26 21:30:09 字数 2541 浏览 4 评论 0原文

我正在尝试使用squid来修改网页请求的页面内容。我按照 upside-down-ternet 教程进行操作,该教程显示有关如何翻转页面上的图像的说明。

我需要更改页面的实际 html。我一直在尝试做与教程中相同的事情,但我没有编辑图像,而是尝试编辑 html 页面。下面是我用来尝试执行此操作的一个 php 脚本。

所有 jpg 图像都会被翻转,但页面上的内容不会被编辑。编写的编辑后的index.html文件包含编辑后的内容,但用户收到的页面不包含编辑后的内容。

#!/usr/bin/php
<?php
$temp = array();
while ( $input = fgets(STDIN) ) {
    $micro_time = microtime();

    // Split the output (space delimited) from squid into an array.
    $temp = split(' ', $input);

    //Flip jpg images, this works correctly
    if (preg_match("/.*\.jpg/i", $temp[0])) {
        system("/usr/bin/wget -q -O /var/www/cache/$micro_time.jpg ". $temp[0]);
        system("/usr/bin/mogrify -flip /var/www/cache/$micro_time.jpg");
        echo "http://127.0.0.1/cache/$micro_time.jpg\n";
    }

    //Don't edit files that are obviously not html. $temp[0] contains url of file to get
    elseif (preg_match("/(jpg|png|gif|css|js|\(|\))/i", $temp[0], $matches)) {
        echo $input;
    }   

    //Otherwise, could be html (e.g. `wget http://www.google.com` downloads index.html)
    else{ 
        $time = time() . microtime();       //For unique directory names
        $time = preg_replace("/ /", "", $time); //Simplify things by removing the spaces
        mkdir("/var/www/cache/". $time);    //Create unique folder
        system("/usr/bin/wget -q --directory-prefix=\"/var/www/cache/$time/\" ". $temp[0]);
        $filename = system("ls /var/www/cache/$time/");     //Get filename of downloaded file

        //File is html, edit the content (this does not work)
        if(preg_match("/.*\.html/", $filename)){

            //Get the html file contents  
            $contentfh = fopen("/var/www/cache/$time/". $filename, 'r');
            $content = fread($contentfh, filesize("/var/www/cache/$time/". $filename));
            fclose($contentfh);

            //Edit the html file contents
            $content = preg_replace("/<\/body>/i", "<!-- content served by proxy --></body>", $content);

            //Write the edited file
            $contentfh = fopen("/var/www/cache/$time/". $filename, 'w');
            fwrite($contentfh, $content);
            fclose($contentfh);

            //Return the edited page
            echo "http://127.0.0.1/cache/$time/$filename\n";
        }               
        //Otherwise file is not html, don't edit
        else{
            echo $input;
        }
    }
}
?>

I'm trying to use squid to modify the page content of web page requests. I followed the upside-down-ternet tutorial which showed instructions for how to flip images on pages.

I need to change the actual html of the page. I've been trying to do the same thing as in the tutorial, but instead of editing the image I'm trying to edit the html page. Below is a php script I'm using to try to do it.

All jpg images get flipped, but the content on the page does not get edited. The edited index.html files written contain the edited content, but the pages the users receive don't contain the edited content.

#!/usr/bin/php
<?php
$temp = array();
while ( $input = fgets(STDIN) ) {
    $micro_time = microtime();

    // Split the output (space delimited) from squid into an array.
    $temp = split(' ', $input);

    //Flip jpg images, this works correctly
    if (preg_match("/.*\.jpg/i", $temp[0])) {
        system("/usr/bin/wget -q -O /var/www/cache/$micro_time.jpg ". $temp[0]);
        system("/usr/bin/mogrify -flip /var/www/cache/$micro_time.jpg");
        echo "http://127.0.0.1/cache/$micro_time.jpg\n";
    }

    //Don't edit files that are obviously not html. $temp[0] contains url of file to get
    elseif (preg_match("/(jpg|png|gif|css|js|\(|\))/i", $temp[0], $matches)) {
        echo $input;
    }   

    //Otherwise, could be html (e.g. `wget http://www.google.com` downloads index.html)
    else{ 
        $time = time() . microtime();       //For unique directory names
        $time = preg_replace("/ /", "", $time); //Simplify things by removing the spaces
        mkdir("/var/www/cache/". $time);    //Create unique folder
        system("/usr/bin/wget -q --directory-prefix=\"/var/www/cache/$time/\" ". $temp[0]);
        $filename = system("ls /var/www/cache/$time/");     //Get filename of downloaded file

        //File is html, edit the content (this does not work)
        if(preg_match("/.*\.html/", $filename)){

            //Get the html file contents  
            $contentfh = fopen("/var/www/cache/$time/". $filename, 'r');
            $content = fread($contentfh, filesize("/var/www/cache/$time/". $filename));
            fclose($contentfh);

            //Edit the html file contents
            $content = preg_replace("/<\/body>/i", "<!-- content served by proxy --></body>", $content);

            //Write the edited file
            $contentfh = fopen("/var/www/cache/$time/". $filename, 'w');
            fwrite($contentfh, $content);
            fclose($contentfh);

            //Return the edited page
            echo "http://127.0.0.1/cache/$time/$filename\n";
        }               
        //Otherwise file is not html, don't edit
        else{
            echo $input;
        }
    }
}
?>

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

一口甜 2024-09-02 21:30:09

看看 Dansguardian;它使用 PCRE 动态修改内容:链接(看最后2个主题)

Take a look at Dansguardian; it uses PCRE to modify content on the fly: link (look at the last 2 topics)

假装爱人 2024-09-02 21:30:09

不确定这是否是问题的原因,但代码有很多错误。

您可以根据微时间来分隔请求 - 这只有在流量相对较低时才能可靠地工作 - 请注意,如果有多个重定向器实例在运行,原始 (perl) 代码仍可能会中断。

您尝试根据文件扩展名来识别内容类型 - 这适用于与列表匹配的文件 - 但它不遵循与列表不匹配的内容必须是 text/html - 实际上您应该检查源服务器返回的 mimetype。

您在代码中没有错误检查/调试 - 尽管您没有可以轻松写入的错误流,但您可以将错误写入文件、系统日志,或者如果 fopen/ 则发出电子邮件fread 语句不起作用,或者如果存储的文件没有 .html 扩展名。

C.

Not sure if its the cause of your problem, but there's quite a lot wrong with the code.

You seperate requests based on microtime - this will only work reliably if you have relatively low volumes of traffic - note that the original (perl) code may still break if there is more than one instance of the redirector running.

You've tried to identify the content type based on the file extension - this will work for files which match the list - but it doesn't follow that stuff which doesn't match the list must be text/html - really you should check the mimetype returned by the origin server.

You've got no error checking/debugging in the code - although you don't have an error stream you can easily write to, you could write the errors to a file, to the syslog, or fire out an email if the fopen/fread statements don't work, or if the stored file doesn't have a .html extension.

C.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文