Zend: decoding HTML entities in a form input element results in an empty value

Posted on 2024-11-26 18:18:29

I have a form element, called metaDescription:

        //inside the form
        $description = $this    -> createElement('text', 'metaDescription')
                                -> setLabel('Description:')
                                -> setRequired(false)
                                -> addFilter('StringTrim')
                                -> addValidator('StringLength', array(0, 300))
                                -> addErrorMessage('Invalid description.');               
        $this->addElement($description);

Whenever this form loads, I initialize it with a default value pulled from the database:

$form->setDefault('metaDescription', $oldPage->getMetaDescription());

This works perfectly fine.

However, I now want to HTML-encode (htmlentities) any description that is entered when someone submits the form, and html_entity_decode the default value that is pulled from the database, so that the characters are shown in their original form again.

I did this like so when handling form input:

//handle post
        if ($request->isPost()) {
            if ($form->isValid($request->getPost())) {
                $page = new Application_Model_PagePainter(array(
                    'metaDescription'   => htmlentities($form->getValue('metaDescription'))
                ));
                $pageMapper->save($page);

                ....

And I now set the default value like so:

$form->setDefault('metaDescription', html_entity_decode($oldPage->getMetaDescription()));

At first, this seems to work fine as well. When I send, for example, woord1, woord2, me&you as the description, it is correctly saved as woord1, woord2, me&amp;you in the database and correctly displayed again as woord1, woord2, me&you. However, when I use a non-ASCII character like ó, e.g. wóórd1, it is correctly saved in the database as w&oacute;&oacute;rd1, but then something strange happens: when the form is displayed again, the default value is empty. When I look at the source, it is indeed empty: <input type="text" name="metaDescription" id="metaDescription" value="" />.

This would make me believe that for some reason html_entity_decode($oldPage->getMetaKeywords()) returns an empty string. However, when I echo it, it returns the correct result: wóórd1, yet the setDefault has no effect. When I remove the html_entity_decode, setDefault works correctly again and the value is shown in the form, but with the HTML entities left undecoded.

Why is this html entity decode causing the form value to be empty for such strange characters?

Reply to vstm

For debugging purposes, I disabled the view's escaping like so:

$this->view->setEscape(array($this, 'myEscape'));

public function myEscape($inputString)
    {
        return $inputString;
    }

Unfortunately, the problem remains the same as explained earlier. Just to clarify, I encode the value before putting it in the database like so:

'metaDescription'   => htmlentities($form->getValue('metaDescription'), ENT_COMPAT, 'UTF-8')

And I decode the value after getting it out of the database like so:

$form->setDefault('metaDescription', html_entity_decode($oldPage->getMetaDescription(), ENT_COMPAT, 'UTF-8'));

Interestingly, however, it does seem to be related to the UTF-8 encoding, because when I change the encoding to

'metaDescription'   => htmlentities($form->getValue('metaDescription'), ENT_COMPAT, 'ISO-8859-1')

while keeping decoding at UTF8, an input tést will result in the input box showing tést rather than an empty value which is the case when setting both methods to UTF8.

Does this help you?

Comments (2)

围归者 2024-12-03 18:18:29

I knew it had something to do with the Zend framework doing its own escaping using htmlspecialchars and utf-8 (unless you change that with the view's setEscape/setEncoding methods). And indeed, when you do this:

$test = "w&oacute;&oacute;rd1";
$test = html_entity_decode($test, ENT_COMPAT, "iso-8859-1");
$test = htmlspecialchars($test, ENT_COMPAT, "utf-8");

$test is empty at the end.

So you have to call html_entity_decode with "utf-8" or change the views encoding to "iso-8859-1" (or whatever your encoding is). I think supplying "utf-8" is the better option.
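
To see why the mismatch yields an empty string, here is a minimal sketch in plain PHP (no ZF involved). It assumes the stored value is the entity-encoded string from the question; the variable names are only illustrative. Decoding to ISO-8859-1 produces the single byte 0xF3 for each "ó", which is not a valid UTF-8 sequence, so htmlspecialchars with 'UTF-8' (and no ENT_SUBSTITUTE or ENT_IGNORE flag) returns an empty string.

```php
<?php
// Entity-encoded value as it would come out of the database.
$stored = 'w&oacute;&oacute;rd1';

// Mismatch: decode to ISO-8859-1, then escape as UTF-8.
$latin1 = html_entity_decode($stored, ENT_COMPAT, 'ISO-8859-1');
$brokenOutput = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Consistent: decode to UTF-8, then escape as UTF-8.
$utf8 = html_entity_decode($stored, ENT_COMPAT, 'UTF-8');
$goodOutput = htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8');

// Show the raw bytes so the difference is visible regardless of terminal encoding.
echo 'ISO-8859-1 bytes: ', bin2hex($latin1), "\n";
echo 'UTF-8 bytes:      ', bin2hex($utf8), "\n";
echo 'Mismatched result is empty: ', var_export($brokenOutput === '', true), "\n";
echo 'Consistent result: ', $goodOutput, "\n";
```

Echoing the ISO-8859-1 string directly can still look fine in a browser that guesses the encoding, which would explain why the echo in the question appeared correct while the form value was empty.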

War against the encodings

Whoever invented character encodings was either an evil genius or a
stupid caveman.

To make this work you also have to take care of what encoding the browser is using, because otherwise you either write garbage to your database, render garbage in your output, or both (or nothing, if you hand the wrong charset to certain PHP functions). (Bear with me.)

So first you have to ensure what encoding the browser is using. This can be achieved by:

  1. HTTP response headers
  2. The Content-Type meta tag (the primary option in ZF)

So check the content-type meta tag in your HTML output and what encoding it suggests. If there is no content-type meta information, or it doesn't include the charset, then you should add one, preferably with utf-8, in your layout (if you're not using a layout, now is a good time to start). This is important, because otherwise you don't know for sure what encoding your input is in or what encoding you have to deliver to the browser. That means something like the following should appear right after the opening <head> tag of every page returned by your application:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
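
For the first option (the HTTP response header), a minimal plain-PHP sketch could look like the following. The helper name contentTypeHeader is hypothetical and not part of ZF; in a ZF application you would more likely set the header on the response object instead.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a charset.
function contentTypeHeader($charset = 'UTF-8')
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers must be sent before any output reaches the browser.
if (!headers_sent()) {
    header(contentTypeHeader('UTF-8'));
}
```

Whichever option you use, the charset announced here should be the same one you pass to the encode/decode functions.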

In the following examples we assume you choose utf-8, but you might use whatever is appropriate - if you change the values accordingly (that means s/UTF-8/your encoding/g).

Now, when retrieving data from the browser you know what charset you have to supply for the htmlentities call (utf-8):

'metaDescription'   => 
    htmlentities($form->getValue('metaDescription'), ENT_COMPAT, 'UTF-8')

So that means that $form->getValue('metaDescription') returns a UTF-8 encoded string, which is converted to an HTML-entities string, which is exactly what we want.

So the database now holds a harmless string with no umlauts, accents or whatever.

Now we take a look at the editing part. There you must decode the HTML entities so that the user does not have to deal with them. The output string has to be encoded with our desired charset (yes, right: utf-8):

$form->setDefault('metaDescription', 
    html_entity_decode($oldPage->getMetaDescription(), ENT_COMPAT, 'UTF-8'));

So now you have assigned the UTF-8 encoded string returned by html_entity_decode to metaDescription. Now we only have to get past the htmlspecialchars call, which is made by default whenever someone uses $view->escape().

The last step is to ensure that Zend_View's escaping is aware of our encoding (this is optional if you are using utf-8, since that is already the default). Either set it for a specific view in the controller with $this->view->setEncoding('UTF-8'), or for all views in the bootstrap.php:

protected function _initView()
{
    $view = new Zend_View();
    $view->setEncoding('UTF-8');
    $viewRenderer =
        Zend_Controller_Action_HelperBroker::getStaticHelper(
            'ViewRenderer'
        );
    $viewRenderer->setView($view);
    return $view;
}

If someone now calls $view->escape(), it also expects a UTF-8 string as input. You should be able to remove the setEscape call with the "null" escape.

If you followed all these steps you should now have all special characters with umlauts, accents and graves restored as desired (or I have now disgraced myself).

So every function receives the encoding it expects, otherwise it returns the infamous empty string (pseudo flow-chart):

  1. Browser -> sends data in UTF-8
  2. htmlentities($browserData, ,'UTF-8') -> expects UTF-8, returns ASCII without umlauts or other fancy stuff
  3. Database stores ASCII-Text
  4. -- Time passes --
  5. Then when editing: Load ASCII from database
  6. html_entity_decode($dbData, ,'UTF-8') -> expects ASCII, returns UTF-8 encoded
  7. Via $view->escape(): htmlspecialchars -> expects UTF-8, returns UTF-8
  8. Browser -> expects UTF-8
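
The flow above can be sketched end to end in plain PHP. This is only an illustration under stated assumptions: the helper names (encodeForStorage, decodeForForm, escapeForOutput) and the CHARSET constant are hypothetical, the database step is replaced by a variable, and escapeForOutput merely stands in for what $view->escape() does by default.

```php
<?php
// One charset for every step; hypothetical constant name.
const CHARSET = 'UTF-8';

// Step 2: UTF-8 input from the browser -> entity string for the database.
function encodeForStorage($input)
{
    return htmlentities($input, ENT_COMPAT, CHARSET);
}

// Step 6: entity string from the database -> UTF-8 string for the form.
function decodeForForm($stored)
{
    return html_entity_decode($stored, ENT_COMPAT, CHARSET);
}

// Step 7: stand-in for the view's default escaping.
function escapeForOutput($value)
{
    return htmlspecialchars($value, ENT_COMPAT, CHARSET);
}

// "wóórd1, me&you" written as explicit UTF-8 bytes.
$browserData = "w\xC3\xB3\xC3\xB3rd1, me&you";

$stored    = encodeForStorage($browserData); // what the database would hold
$formValue = decodeForForm($stored);         // what setDefault() would receive
$html      = escapeForOutput($formValue);    // what ends up in the value attribute

echo $stored, "\n";
echo $html, "\n";
```

Note that htmlentities only converts characters that have a named HTML entity, so input outside that set stays as UTF-8 bytes in the database; the round trip still works as long as every step uses the same charset.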

tl;dr / recap

  • Set a content-type meta-tag with your desired charset
  • Ensure that all the encode/decode-functions are aware of the charset you have chosen (that means: be consistent)

铁轨上的流浪者 2024-12-03 18:18:29

You can also use Zend_Filter_HtmlEntities() instead of the PHP functions. It does no more than the PHP functions do, but it guarantees a consistent encoding throughout your form.
