Bidirectional text breaks the layout
On my website there's a header that should display "Welcome username.".
<span>Welcome <?php echo $username; ?>.</span>
The problem is that if the user changes their name to U+202Eusername (where U+202E is the right-to-left override character, or RLO), the whole layout breaks.
Instead of displaying "Welcome emanresu.", it displays "Welcome .emanresu" or ".emanresu Welcome" or something similar. I tried adding a U+202C (pop directional formatting, or PDF) character after the username, and it worked. Like this:
<span>Welcome <?php echo $username; ?>&#x202C;.</span>
But if the username contains more than one RLO character, it breaks again. So what I should do is match each RLO character with a PDF character, but I'm not sure how to do this. And according to the W3C specifications, there's no solution to this.
Am I missing something here?
You might be interested in the HTML5 <bdi> tag. Details: http://rishida.net/blog/?p=564
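As a rough sketch of how the header from the question could use this tag: the helper name welcomeHeader() is made up for illustration, the code assumes PHP 7 or later and UTF-8 input, and older browsers may not support <bdi>. The username itself would still render reversed, but the override should no longer leak into the surrounding text.

```php
<?php
// Hypothetical helper (not from the original post): builds the welcome
// header with the username isolated inside <bdi>.
function welcomeHeader(string $username): string
{
    // Escape first so the name cannot inject markup, then isolate its direction.
    $safe = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');
    return '<span>Welcome <bdi>' . $safe . '</bdi>.</span>';
}

// "\u{202E}" is the RLO character from the question.
echo welcomeHeader("\u{202E}username"), "\n";
```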
Why not search for this character in $username and, if it is found, change <span> to <span dir="rtl">? Also replace these characters with blank in $username.
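One way to read this suggestion, as a minimal sketch only: the helper name welcomeSpan() is hypothetical, PHP 7 or later and UTF-8 input are assumed, and only U+202E is handled. Note that dir="rtl" changes the base direction of the whole header, which may or may not be what you want.

```php
<?php
// Hypothetical helper illustrating the suggestion above; not a complete bidi solution.
function welcomeSpan(string $username): string
{
    $rlo = "\u{202E}"; // RIGHT-TO-LEFT OVERRIDE
    // Byte-wise search is fine here because $rlo is a valid UTF-8 sequence.
    $hasRlo = strpos($username, $rlo) !== false;
    // Remove every RLO so it cannot affect the surrounding text.
    $clean = str_replace($rlo, '', $username);
    $open = $hasRlo ? '<span dir="rtl">' : '<span>';
    return $open . 'Welcome ' . htmlspecialchars($clean, ENT_QUOTES, 'UTF-8') . '.</span>';
}

echo welcomeSpan("\u{202E}username"), "\n";
```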
Understanding Bidirectional (BIDI) Text in Unicode
This article, a very interesting general read about bidi issues, also has a section named "Filtering User Input" near the end that seems to address exactly the issue you're describing.
W3C's solution is that you should be filtering out RLO and other characters from the group known as “Not suitable for use in markup”.
Do this at the same time as filtering out other unwanted control codes like ASCII 0x00–0x1F (potentially including or excluding the newline character) and 0x7F-0x9F. See this question for background.
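A minimal sketch of such a filter, assuming PHP 7 or later and UTF-8 input. The function name stripControlChars() is hypothetical, and the character ranges are only a subset chosen for illustration (the explicit bidi controls plus the control codes mentioned above); the full list should be taken from the W3C document. Including the isolate characters U+2066 to U+2069 is my own assumption.

```php
<?php
// Hypothetical helper: strips control codes and explicit bidi controls from a username.
function stripControlChars(string $text): string
{
    // C0 controls (including newline here), DEL and C1 controls, then
    // U+202A-U+202E (LRE, RLE, PDF, LRO, RLO) and U+2066-U+2069 (isolates).
    $pattern = '/[\x{00}-\x{1F}\x{7F}-\x{9F}\x{202A}-\x{202E}\x{2066}-\x{2069}]/u';
    // preg_replace() returns null on failure (e.g. invalid UTF-8); treat that as empty.
    return preg_replace($pattern, '', $text) ?? '';
}

$username = stripControlChars("\u{202E}user\u{202E}name\u{202C}");
echo htmlspecialchars($username, ENT_QUOTES, 'UTF-8'), "\n"; // prints "username"
```

Because every override is removed rather than balanced, the number of RLO characters in the name no longer matters.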
(You should also be using echo htmlspecialchars($username);. Maybe your usernames can't contain < or &, but that's not something to rely on in your output stage. Get used to calling htmlspecialchars on everything that goes out to the page as a matter of course; define a shortcut function for it if necessary.)
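Such a shortcut might look like the following sketch. The name h() is just an illustration, not a PHP built-in, and the ENT_QUOTES flag and UTF-8 charset are assumptions about the page.

```php
<?php
// Hypothetical shortcut around htmlspecialchars(); the name h() is illustrative.
function h(string $text): string
{
    // ENT_QUOTES also escapes single quotes, which matters inside attributes.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$username = 'a<b&c';
echo '<span>Welcome ', h($username), '.</span>', "\n";
// prints: <span>Welcome a&lt;b&amp;c.</span>
```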