Regex to replace characters in data

Posted on 2024-12-04 12:20:47

I am trying to clean special characters out of some junked-up data (allowing a few), but some still get through. I found a regex snippet earlier, but it does not remove some characters, like asterisks.

  $clean_body = $raw_text;

  $clean_title = preg_replace("/[^!&\/A-Za-z0-9_ ]/","", $clean_body);
  $clean_title = substr($clean_title, 0, 64);

  $clean_body = nl2br($clean_body);  

  if ($nid) {
    $node = node_load($nid);
    unset($node->field_category);
  } else {
    $node = new stdClass();
    $node->type = 'article';
    node_object_prepare($node); 
  }

  $split_title = str_split($clean_title);

  foreach ($split_title as $key => $character) {
    if ($key > 15) {
      if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
        $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
      }
    }
  }

The first part attempts to clean out anything in the raw text that isn't normal punctuation or alphanumeric. Then I split the title into an array and look for a space. What I want to do is create a title that is at least 15 characters long and is truncated on a space (leaving whole words intact), without stopping on a punctuation character. This is the part I am having trouble with.

Some titles still come out as ***************** or ** HOW TO MAKE $$$$$$ BLOGGING **, when the first title should not contain any asterisks at all and the second should be HOW TO MAKE..., for example.

Comments (2)

一场信仰旅途 2024-12-11 12:20:47

What about "/[^!&\/\w\s]/ui"?
Works fine on my machine.
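
For anyone who wants to try that pattern quickly, here is a minimal sketch. The function name `strip_junk` and the whitespace-collapsing step are my own assumptions and are not part of the question's code. It only shows what the pattern removes from the sample titles.

```php
<?php
// Hypothetical helper for illustration only; not part of the original code.
function strip_junk(string $raw): string
{
    // Drop everything except ! & / word characters and whitespace.
    $stripped = preg_replace('/[^!&\/\w\s]/ui', '', $raw) ?? '';

    // Assumption: collapse the gaps left behind by removed characters.
    return trim(preg_replace('/\s+/', ' ', $stripped) ?? '');
}

echo strip_junk('** HOW TO MAKE $$$$$$ BLOGGING **'), "\n";
```

Note that a title made up entirely of asterisks comes back as an empty string here, so the caller still has to decide what to use as a title in that case.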

や莫失莫忘 2024-12-11 12:20:47

Your problem (or one of them, anyhow) is this logic:

if ($key > 15) {
  if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
    $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
  }
}

You're only setting $node->title if these conditions match when iterating the characters in the $split_title array.

What happens when they don't match? $node->title doesn't get set (or overwritten? You didn't give much context, so I can't tell).

Using this as a test:

$clean_body = '** HOW TO MAKE $$$ BLOGGING **';

You can see that these conditions do not match, so $node->title does not get set (or overwritten).
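
One possible way around this is to compute the title in a function that always returns something. Treat the following as a rough sketch under a few assumptions: `make_title`, its `$min` parameter, and the `'Untitled'` fallback are names I made up, and I collapse leftover whitespace before truncating. Also note that `!preg_match("/[^!&\/,.-]/", ...)` on a single character is only true when that character is one of `! & / , . -`, which looks like the opposite of what was intended, so this sketch cuts at the first space at or after `$min` and trims trailing punctuation instead.

```php
<?php
// Hypothetical helper; not a Drupal API and not the original author's code.
function make_title(string $raw, int $min = 15, string $fallback = 'Untitled'): string
{
    // Keep only ASCII letters, digits, underscore, whitespace and ! & /
    $clean = preg_replace('/[^!&\/A-Za-z0-9_ \t\r\n]/', '', $raw) ?? '';
    // Collapse whitespace runs left behind by the removed characters.
    $clean = trim(preg_replace('/[ \t\r\n]+/', ' ', $clean) ?? '');
    // Same 64-character cap as in the question.
    $clean = substr($clean, 0, 64);

    if ($clean === '') {
        return $fallback;          // nothing usable survived the cleanup
    }
    if (strlen($clean) <= $min) {
        return $clean;             // already short enough, no ellipsis
    }

    // First space at or after the minimum length, so words stay whole.
    $cut = strpos($clean, ' ', $min);
    if ($cut === false) {
        return $clean;             // no later word boundary to cut at
    }

    // Do not end the title on punctuation or a stray space.
    return rtrim(substr($clean, 0, $cut), '!&/ ') . '...';
}

echo make_title('** HOW TO MAKE $$$ BLOGGING **'), "\n";
echo make_title('** HOW TO MAKE $$$ BLOGGING **', 10), "\n";
```

With the default minimum of 15 the test string has no space left to cut at, so the whole cleaned title is returned. Getting HOW TO MAKE... out of it needs a smaller minimum such as 10, because "HOW TO MAKE" is only 11 characters long. The result can then be assigned to `$node->title` unconditionally.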
