What is the best way to escape Python strings in PHP?

Posted 2024-07-07 07:48:56

I have a PHP application that needs to output a Python script, more specifically a bunch of variable assignment statements, e.g.:

subject_prefix = 'This String From User Input'
msg_footer = """This one too."""

The contents of subject_prefix et al need to be written to take user input; as such, I need to escape the contents of the strings. Writing something like the following isn't going to cut it; we're stuffed as soon as someone uses a quote or newline or anything else that I'm not aware of that could be hazardous:

echo "subject_prefix = '".$subject_prefix."'\n";

So. Any ideas?

(Rewriting the app in Python isn't possible due to time constraints. :P )

Edit, years later:

This was for integration between a web-app (written in PHP) and Mailman (written in Python). I couldn't modify the install of the latter, so I needed to come up with a way to talk in its language to manage its configuration.

This was also a really bad idea.

Comments (5)

只是一片海 2024-07-14 07:48:56

Do not try to write this function in PHP. You will inevitably get it wrong, and your application will inevitably have an arbitrary remote code execution exploit.

First, consider what problem you are actually solving. I presume you are just trying to get data from PHP to Python. You might try to write a .ini file rather than a .py file. Python has an excellent ini syntax parser, ConfigParser (the module is named configparser in Python 3). You can write the obvious, and potentially incorrect, quoting function in PHP and nothing serious will happen if (read: when) you get it wrong.
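On the Python side, reading such a file could look roughly like the sketch below. The section name `list`, the sample values, and the helper `load_settings` are assumptions made up for illustration; they are not taken from Mailman or from the original application. Interpolation is switched off so that a `%` in user input stays literal.

```python
import configparser

# Hypothetical file content, as the PHP side might emit it.
INI_TEXT = """\
[list]
subject_prefix = [My List] 100% user input; import os
msg_footer = first line
    second line
"""

def load_settings(text):
    # interpolation=None keeps '%' characters in user input literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Values come back as plain strings; nothing in them is executed.
    return dict(parser["list"])

settings = load_settings(INI_TEXT)
print(settings["subject_prefix"])
```

The worst a badly quoted value can do here is produce a parse error or a wrong string. The PHP writer still has to decide how to represent newlines (indented continuation lines, as above, are one option).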

You could also write an XML file. There are too many XML parsers and emitters for PHP and Python for me to even list here.

If I really can't convince you that this is a terrible, terrible idea, then you can at least use the pre-existing function that Python has for doing such a thing: repr().
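As a rough illustration of why repr() is the right tool, the sketch below builds an assignment line from a hostile-looking string and checks that it parses back to the same value. The names `py_literal` and `user_input` are made up for this example.

```python
import ast

def py_literal(text):
    # repr() returns source text for a literal that evaluates back to `text`.
    return repr(text)

user_input = "It's \"quoted\"\nwith a newline and a backslash \\"
line = "subject_prefix = " + py_literal(user_input) + "\n"

# ast.literal_eval() only parses literals; it never executes code.
recovered = ast.literal_eval(py_literal(user_input))
assert recovered == user_input

print(line, end="")
```

Checking with ast.literal_eval() rather than eval() means the test itself cannot run injected code.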

Here's a handy PHP function which will run a Python script to do this for you:

<?php

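function py_escape($input) {
    // Pipe 0 is the child's stdin, pipe 1 is its stdout.
    $descriptorspec = array(
        0 => array("pipe", "r"),
        1 => array("pipe", "w")
        );
    // Let Python's own repr() do the quoting.
    $process = proc_open(
        "python -c 'import sys; sys.stdout.write(repr(sys.stdin.read()))'",
        $descriptorspec, $pipes);
    if (!is_resource($process)) {
        die("Could not start python.\n");
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]);
    $chunk_size = 8192;
    $escaped = fread($pipes[1], $chunk_size);
    if (strlen($escaped) == $chunk_size) {
        // This is important for security.
        die("That string's too big.\n");
    }
    // Close every pipe before proc_close() to avoid a deadlock.
    fclose($pipes[1]);
    proc_close($process);
    return $escaped;
}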

// Example usage:
$x = "string \rfull \nof\t crappy stuff";
print py_escape($x);

The chunk_size check is intended to prevent an attack whereby your input ends up being two really long strings, which look like ("hello " + ("." * chunk_size)) and '; os.system("do bad stuff") respectively. Now, that naive attack won't work exactly, because Python won't let a single-quoted string end in the middle of a line, and those quotes in the system() call will themselves be quoted, but if the attacker manages to get a line continuation ("\") into the right place and use something like os.system(map(chr, ...)) then they can inject some code that will run.

I opted to simply read one chunk and give up if there was more output, rather than continuing to read and accumulate, because there are also limits on Python source file line length; for all I know, that could be another attack vector. Python is not intended to be secure against arbitrary people writing arbitrary source code on your system so this area is unlikely to be audited.

The fact that I had to think of all this for such a trivial example is just another illustration of why you shouldn't use Python source code as a data interchange format.

情场扛把子 2024-07-14 07:48:56

I'd start by standardizing the string type I was using in Python, to use triple-quoted strings ("""). This should reduce the incidence of problems from stray quotes in the input. You'll still need to escape it of course, but it should reduce the number of issues that are a concern.

How I escaped the strings would depend somewhat on what I'm worried about getting slipped in, and on the context in which they get printed out again. If you're just worried about quotes causing problems, you could simply check for any occurrences of """ and escape them. On the other hand, if I was worried about the input itself being malicious (and it's user input, so you probably should be), then I would look at options like strip_tags() or other similar functions.
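To see why triple quotes still need escaping, here is a small sketch that only parses the generated source (it never runs it) and lists the statement types found. `naive_triple_quote` and `statement_types` are hypothetical helpers written for this illustration.

```python
import ast

def naive_triple_quote(name, value):
    # Deliberately unsafe: pastes the value between triple quotes unescaped.
    return name + ' = """' + value + '"""\n'

def statement_types(source):
    # ast.parse() builds a syntax tree without executing anything.
    return [type(node).__name__ for node in ast.parse(source).body]

benign = naive_triple_quote("msg_footer", "This one too.")
hostile = naive_triple_quote("msg_footer", 'x"""\nimport os\n"""')

print(statement_types(benign))   # a single assignment
print(statement_types(hostile))  # the assignment plus an injected import
```

A value containing `"""` closes the literal early, so whatever follows it is treated as code.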

我为君王 2024-07-14 07:48:56

Another option may be to export the data as an array or object in a JSON string and modify the Python code slightly to handle the new input. While escaping via JSON is not 100% bulletproof, it will still be better than your own escaping routines.

And you'll be able to handle errors if the JSON string is malformed.

There's a package for Python to encode and decode JSON: python-json 3.4. (Python 2.6 and later also ship a json module in the standard library.)
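A minimal sketch of the receiving side, using the standard library json module rather than the python-json package named above. The payload is a made-up example of what PHP's json_encode() might produce for an associative array, and `load_settings` is a hypothetical helper.

```python
import json

# Hypothetical json_encode() output; a raw string keeps the backslashes as-is.
PAYLOAD = r'''{"subject_prefix":"It's \"quoted\"","msg_footer":"line one\nline two"}'''

def load_settings(text):
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Malformed input is reported instead of being executed.
        raise ValueError("bad settings payload: %s" % exc)
    if not isinstance(data, dict):
        raise ValueError("settings payload must be a JSON object")
    if not all(isinstance(value, str) for value in data.values()):
        raise ValueError("every setting must be a string")
    return data

settings = load_settings(PAYLOAD)
print(settings["subject_prefix"])
```

The type checks are there because JSON can carry numbers, lists, and nested objects, which the consuming code may not expect.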

狂之美人 2024-07-14 07:48:56

I needed to code this to escape a string in the "ntriples" format, which uses Python escaping.

The following function takes a UTF-8 string and returns it escaped for Python (or ntriples format).
It may do odd things if given illegal UTF-8 data. It does not handle Unicode characters beyond U+FFFF. It does not (currently) wrap the string in double quotes.

The uniord function comes from a comment on php.net.

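function python_string_escape( $string ) {
    // str_replace() is used for the fixed strings: in a preg_replace()
    // replacement, "\\\\" collapses to a single backslash, so the
    // backslash would never actually be doubled.
    $string = str_replace( "\\", "\\\\", $string ); # \\ (first to avoid string re-escaping)
    $string = str_replace( "\n", "\\n", $string ); # \n
    $string = str_replace( "\r", "\\r", $string ); # \r
    $string = str_replace( "\t", "\\t", $string ); # \t
    $string = str_replace( "\"", "\\\"", $string ); # \"
    // The /e modifier was removed in PHP 7; use a callback instead.
    $string = preg_replace_callback( "/[\x{00}-\x{1F}\x{7F}-\x{FFFF}]/u",
                            function ( $m ) { return sprintf( "\\u%04X", uniord( $m[0] ) ); },
                            $string );
    return $string;
}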

function uniord($c) {
    // $c[0] replaces the $c{0} offset syntax, which was removed in PHP 8.
    $h = ord($c[0]);
    if ($h <= 0x7F) {
        return $h;
    } else if ($h < 0xC2) {
        return false;
    } else if ($h <= 0xDF) {
        return ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
    } else if ($h <= 0xEF) {
        return ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6 | (ord($c[2]) & 0x3F);
    } else if ($h <= 0xF4) {
        return ($h & 0x0F) << 18 | (ord($c[1]) & 0x3F) << 12 | (ord($c[2]) & 0x3F) << 6 | (ord($c[3]) & 0x3F);
    } else {
        return false;
    }
}

ヤ经典坏疍 2024-07-14 07:48:56

I suggest writing a function that takes two arguments: the text to be escaped and the type of quotes the string is in. Then, for example, if the quote type is single quotes, the function will escape the single quotes in the string and any other characters that need to be escaped (backslash?).

function escape_string($text, $type) {
    // Escape backslashes for all types of strings?
    $text = str_replace('\\', '\\\\', $text);

    switch($type) {
        case 'single':
            $text = str_replace("'", "\\'", $text);
            break;
        case 'double':
            $text = str_replace('"', '\\"', $text);
            break;
        // etc...
    }

    return $text;
}

I'm assuming that for single-quoted strings you want to escape the single quotes, and with double-quoted strings you want to escape the double quotes...
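One way to check any escaper of this kind is to feed its output to Python's literal parser and compare the result with the input. The sketch below mirrors the single-quote branch in Python (as `quote_single`, a name invented here) purely to show the test; `round_trips` is likewise a hypothetical helper.

```python
import ast

def quote_single(text):
    # Same idea as the 'single' case: backslashes first, then the quote.
    # Newlines are deliberately left alone, as in the function above.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def round_trips(literal, expected):
    # ast.literal_eval() parses literals only and never executes code.
    try:
        return ast.literal_eval(literal) == expected
    except (SyntaxError, ValueError):
        return False

print(round_trips(quote_single("it's a \\ test"), "it's a \\ test"))          # True
print(round_trips(quote_single("line one\nline two"), "line one\nline two"))  # False
```

The second case fails because a raw newline is not allowed inside a single-quoted literal, so escaping quotes and backslashes alone is not enough for arbitrary user input.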
