Help implementing tags in PHP
In my current PHP project I need to implement searchable, comma-separated tags (similar to this site, or to what WordPress does). What is a smart way to detect and remove unnecessary characters or markup? Putting the XSS concern aside, I first need to clean the input and extract only the text when the user enters HTML (or other tags) instead of plain text.
For example, if the user inputs
<b>sdfasdf</b>, <a href="something">sdfsdfsdf</a>, <sdfsdfsdf
the code should strip out all the unnecessary characters and tags so that only plain text is saved in the database.
I tried this in WordPress, and it handles it cleverly: it automatically extracts only the text.
My question:
Is there an open-source library for this task that I can integrate into my project? I have done some homework, but *htmlentities(), strip_tags(), HTML Purifier* etc. don't seem suitable for this task. Or do I need to build my own library on top of these?
Can somebody guide me on this?
Thanks!
Answers (2)
In addition to removing "complete" tags (markup language elements) such as those in <b>sdfasdf</b>, <a href="something">sdfsdfsdf</a>, you can also remove "forbidden" characters such as "<", ">", and "&" (using preg_replace() and the like) and collapse multiple spaces into a single space (also using preg_replace()). Remember that these strings are used only as tags (keywords), so it is acceptable to use a somewhat restricted character set here. On Stack Overflow, for instance, tags may contain only letters, numbers, hyphens, and a handful of symbols (as in c++, c#, or .net).
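A minimal PHP sketch of the approach above, assuming UTF-8 input and that a comma is the only separator. The function name cleanTags() is hypothetical; strip_tags(), preg_replace(), explode(), and array_unique() are standard PHP functions. It strips markup from the whole string, removes any leftover "<", ">", or "&", splits on commas, collapses whitespace, and drops empty and duplicate tags.

```php
<?php
// Sketch only: cleanTags() is a hypothetical helper, not a library function.
// Assumes UTF-8 input and a comma as the only tag separator.

/**
 * Turn a raw, comma-separated tag string into a list of plain-text tags.
 *
 * @return string[]
 */
function cleanTags(string $input): array
{
    // 1. Remove "complete" markup such as <b>...</b> or <a href="...">...</a>.
    $text = strip_tags($input);

    // 2. Remove leftover "forbidden" characters, e.g. a stray "<" or "&".
    $text = preg_replace('/[<>&]/', '', $text);

    $tags = [];
    foreach (explode(',', $text) as $tag) {
        // 3. Collapse runs of whitespace into one space, then trim the ends.
        $tag = trim(preg_replace('/\s+/', ' ', $tag));
        if ($tag !== '') {
            $tags[] = $tag;
        }
    }

    // 4. Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($tags));
}

$raw = '<b>sdfasdf</b>, <a href="something">sdfsdfsdf</a>, <sdfsdfsdf';
echo implode(' | ', cleanTags($raw)), PHP_EOL; // sdfasdf | sdfsdfsdf
```

Stripping markup before splitting keeps a comma inside an attribute (for example href="a,b") from cutting a tag in half; the trade-off is that an unterminated "<" can swallow the rest of the string, so it may be worth showing the cleaned tags back to the user before saving them.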
I would look at this the other way around. What input is legal? Which characters are allowed in tag names? Once those questions are answered, I would build a server-side whitelist of legal characters using a regex, state the rules in the UI, and simply reject input that does not comply.
Massaging invalid input into valid input is rarely a good idea.
Characters allowed in tags are usually alphanumerics plus dashes and underscores. Some sites also allow spaces.
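A minimal sketch of the whitelist-and-reject idea. The rule (letters, digits, hyphens, and underscores, 1 to 30 characters) and the names TAG_PATTERN and validateTags() are assumptions for illustration; if your site also allows spaces, add a space to the character class. Nothing is rewritten: each tag either matches the rule or is reported back to the user.

```php
<?php
// Sketch only: TAG_PATTERN and validateTags() are hypothetical names, and the
// rule below is an assumed example policy; match it to what your UI states.
// \A and \z anchor the whole string ($ would also match before a trailing "\n").
const TAG_PATTERN = '/\A[A-Za-z0-9_-]{1,30}\z/';

/**
 * Split a comma-separated tag string and sort tags into valid and rejected,
 * without altering the tags themselves.
 *
 * @return array{0: string[], 1: string[]} [valid tags, rejected tags]
 */
function validateTags(string $input): array
{
    $valid = [];
    $rejected = [];
    foreach (explode(',', $input) as $tag) {
        // Only whitespace around the separator is trimmed.
        $tag = trim($tag);
        if ($tag === '') {
            continue;
        }
        if (preg_match(TAG_PATTERN, $tag) === 1) {
            $valid[] = $tag;
        } else {
            $rejected[] = $tag;
        }
    }
    return [$valid, $rejected];
}

[$valid, $rejected] = validateTags('php, tagging, <b>html</b>');
if ($rejected !== []) {
    // Reject the submission and tell the user which tags broke the rules.
    echo 'Invalid tags: ', implode(', ', $rejected), PHP_EOL; // Invalid tags: <b>html</b>
}
```

Returning the rejected tags, rather than silently dropping them, lets the form explain exactly what to fix, which fits the advice above about not massaging invalid input.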