Regular expression to replace characters in data

Posted on 2024-12-04 12:20:47

I am trying to clean special characters out of some junked-up data (allowing a few), but some still get through. I found a regex snippet earlier, but it does not remove some characters, like asterisks.

  $clean_body = $raw_text;

  $clean_title = preg_replace("/[^!&\/A-Za-z0-9_ ]/","", $clean_body);
  $clean_title = substr($clean_title, 0, 64);

  $clean_body = nl2br($clean_body);  

  if ($nid) {
    $node = node_load($nid);
    unset($node->field_category);
  } else {
    $node = new stdClass();
    $node->type = 'article';
    node_object_prepare($node); 
  }

  $split_title = str_split($clean_title);

  foreach ($split_title as $key => $character) {
    if ($key > 15) {
      if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
        $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
      }
    }
  }

The first part attempts to clean out anything in the raw text that isn't normal punctuation or alphanumeric. Then I split the title into an array and look for a space. What I want is a title that is at least 15 characters long and is truncated on a space (leaving whole words intact), without stopping on a punctuation character. This is the part I am having trouble with.

Some titles still come out as ***************** or ** HOW TO MAKE $$$$$$ BLOGGING **, when the first title should not contain any *'s at all and the second should be HOW TO MAKE..., for example.

一场信仰旅途 2024-12-11 12:20:47

How about "/[^!&\/\w\s]/ui"?
It works fine on my machine.
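To make the suggestion above concrete, here is a minimal sketch that runs the proposed pattern against the sample title from the question. The variable names and the whitespace-collapsing step are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Sample input taken from the question (single quotes keep the $ signs literal).
$raw = '** HOW TO MAKE $$$ BLOGGING **';

// Pattern suggested in this answer: keep !, &, /, word characters and whitespace.
$stripped = preg_replace('/[^!&\/\w\s]/ui', '', $raw);

// Pattern from the question, for comparison.
$original = preg_replace('/[^!&\/A-Za-z0-9_ ]/', '', $raw);

// Assumption: collapse the runs of spaces left behind and trim the ends.
$tidy = trim(preg_replace('/\s+/', ' ', $stripped));

var_dump($stripped); // string(23) " HOW TO MAKE  BLOGGING "
var_dump($tidy);     // string(20) "HOW TO MAKE BLOGGING"
```

On this ASCII sample both patterns strip the asterisks and dollar signs and give the same result, which suggests the stray characters in the final titles may come from a later step rather than from the character class itself.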

や莫失莫忘 2024-12-11 12:20:47

Your problem (or, one of them anyhow) is this logic:

if ($key > 15) {
  if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
    $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
  }
}

You're only setting $node->title if these conditions match when iterating the characters in the $split_title array.

What happens when they don't match? $node->title doesn't get set (or overwritten? You didn't give much context, so I can't tell).

Using this as a test:

$clean_body = '** HOW TO MAKE $$$ BLOGGING **';

You can see that these conditions do not match, so $node->title does not get set (or overwritten).
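One way to avoid the unset title is to compute it in a single function that always returns something. The sketch below is only an illustration: `make_title`, its `$min` parameter, and the `'Untitled'` fallback are hypothetical names and choices, not from the original post, and it leaves out the 64-character cap and the Drupal node handling.

```php
<?php
// Hypothetical helper: always returns a title, even when nothing matches.
function make_title(string $raw, int $min = 15, string $fallback = 'Untitled'): string
{
    // Same whitelist as the question, then collapse leftover runs of spaces.
    $clean = preg_replace('/[^!&\/A-Za-z0-9_ ]/', '', $raw);
    $clean = trim(preg_replace('/ +/', ' ', $clean));

    if ($clean === '') {
        return $fallback; // nothing survived cleaning, e.g. a row of asterisks
    }
    if (strlen($clean) <= $min) {
        return $clean; // already short enough, no truncation needed
    }

    // First space at or after position $min, so whole words stay intact.
    $cut = strpos($clean, ' ', $min);
    if ($cut === false) {
        return $clean; // no later space: keep the whole cleaned string
    }

    // Drop punctuation left dangling at the cut point before the ellipsis.
    return rtrim(substr($clean, 0, $cut), '!&/ ') . '...';
}

echo make_title('** HOW TO MAKE $$$ BLOGGING **'), "\n";
echo make_title('*****************'), "\n";
echo make_title('The quick brown fox jumps over the lazy dog'), "\n";
```

With the default `$min` of 15, the first call returns the whole cleaned string because no space follows position 15; a smaller `$min` would be needed to get a shorter cut such as the one the question asks for. Note also that in the original loop `!preg_match("/[^!&\/,.-]/", ...)` is true only when the preceding character is one of the listed punctuation marks, which appears to be the opposite of the stated intent.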
