Regex to replace characters in data
I am trying to clean special characters out of some junked-up data (allowing a few), but some still get through. I found a regex snippet earlier, but it does not remove some characters, like asterisks.
$clean_body = $raw_text;
$clean_title = preg_replace("/[^!&\/A-Za-z0-9_ ]/","", $clean_body);
$clean_title = substr($clean_title, 0, 64);
$clean_body = nl2br($clean_body);
if ($nid) {
    $node = node_load($nid);
    unset($node->field_category);
} else {
    $node = new stdClass();
    $node->type = 'article';
    node_object_prepare($node);
}
$split_title = str_split($clean_title);
foreach ($split_title as $key => $character) {
    if ($key > 15) {
        if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
            $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
        }
    }
}
The first part attempts to strip anything from the raw text that isn't normal punctuation or alphanumeric. Then I split the title into an array and look for a space. What I want is a title that is at least 15 characters long and is truncated at a space (leaving whole words intact) without stopping on a punctuation character. This is the part I am having trouble with.
Some titles still come out as ***************** or ** HOW TO MAKE $$$$$$ BLOGGING **, when the first title should not even have *'s, and the second should be HOW TO MAKE..., for example.
Comments (2)
What about "/[^!&\/\w\s]/ui"? Works fine on my machine.
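To see what that pattern does with the two sample titles from the question, here is a minimal sketch. The function name clean_title_text and the whitespace collapsing step are assumptions added for illustration; neither appears in the original post.

```php
<?php
// Hypothetical helper: applies the suggested pattern, then tidies whitespace.
function clean_title_text(string $raw): string
{
    // Drop everything except !, &, /, word characters and whitespace.
    // preg_replace() returns null on invalid UTF-8 with the u modifier,
    // so fall back to an empty string in that case.
    $kept = preg_replace('/[^!&\/\w\s]/ui', '', $raw) ?? '';

    // Assumption: collapse whitespace runs left behind by removed characters.
    return trim(preg_replace('/\s+/', ' ', $kept));
}

// Sample titles copied from the question (single quotes keep the $ signs literal).
$samples = ['*****************', '** HOW TO MAKE $$$$$$ BLOGGING **'];

foreach ($samples as $sample) {
    var_dump(clean_title_text($sample));
}
```

With these inputs the asterisks and dollar signs are removed, so if they still show up in the saved title, the value is probably coming from somewhere other than this replacement.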
Your problem (or, one of them anyhow) is this logic:
You're only setting $node->title if these conditions match when iterating the characters in the $split_title array. What happens when they don't match? $node->title doesn't get set (or overwritten? You didn't give much context, so I can't tell).
Using this as a test:
You can see that these conditions do not match, so $node->title does not get set (or overwritten).
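One way to check this is to replay the question's loop against an already cleaned sample and record where the condition holds. The sketch below does that, then shows one possible alternative that always returns a title. The helper truncate_title and its $min parameter are hypothetical names, not part of the question or of Drupal.

```php
<?php
// Cleaned form of the question's second sample title.
$clean_title = ' HOW TO MAKE  BLOGGING ';

// Replay the question's condition and collect every key where it is true.
$matches = [];
foreach (str_split($clean_title) as $key => $character) {
    if ($key > 15 && $character == ' '
        && !preg_match('/[^!&\/,.-]/', $clean_title[$key - 1])) {
        $matches[] = $key;
    }
}
var_dump($matches); // no keys recorded, so the title assignment never runs

// Hypothetical alternative: cut at the first space at or after $min
// characters, and fall back to the whole title when there is none.
function truncate_title(string $title, int $min = 15): string
{
    $title = trim(preg_replace('/\s+/', ' ', $title));
    if (strlen($title) <= $min) {
        return $title;
    }
    $cut = strpos($title, ' ', $min);
    if ($cut === false) {
        return $title; // no later space: keep the whole title
    }
    // Remove punctuation left dangling at the cut before adding the ellipsis.
    return rtrim(substr($title, 0, $cut), ' !&/,.-') . '...';
}

var_dump(truncate_title($clean_title));
var_dump(truncate_title($clean_title, 8));
```

The negated character class inside !preg_match() means the original condition is true only when the character before the space is one of the listed punctuation marks, which looks like the opposite of what was intended. Note also that HOW TO MAKE is only 11 characters long, so a 15-character minimum cannot produce HOW TO MAKE...; in this sketch it takes a smaller minimum such as 8.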