How do I remove line breaks from a file?

Posted on 2024-11-02 06:18:08

How can I remove this:

<p> (break line!!!)
text...
</p> (break line!!!)

from a file with a regex?

I tried:

find . -type f -exec perl -p -i -e "s/SEARCH_REGEX/REPLACEMENT/g" {} \;

明媚如初 2024-11-09 06:18:08

This kind of thing can really blow up in your face, so be careful: try it on test data in a test directory first.

The -0 switch sets the input record separator ($/) to the null character, so a file that contains no NUL bytes is read as a single record and the substitution can span multiple lines. The s modifier lets . match newlines, and +? makes the match lazy, so it stops at the first "TERRANO". Try this test on one of your files.

perl -0 -p -e 's/<p>.+?TERRANO[^<]*<\/p>//gs'

If that works, you can add it to your original command.

find . -type f -exec perl -0 -pi -e "s/<p>.+?TERRANO[^<]*<\/p>//gs" {} \;

As mentioned in a comment, if the content is HTML, you should probably be using an HTML parser.


你的他你的她 2024-11-09 06:18:08

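There are several ways to do this.

The first is to undef $/ and then match something like

/\<p\>\nTERRANO.*\n\<\/p\>/

which may depend on whether the file uses CR/LF or just LF line endings.

The second is to use a loop to join the lines (plus whatever is in $/) and match them with a single regex, including whatever is in $/.

The third is to use File::Slurp.

The fourth is to use multiple regexes and a loop that matches line by line, and to do the replacement only if all three are satisfied.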


浅唱ヾ落雨殇 2024-11-09 06:18:08

You may also use the Unix text editor ed to remove a range of lines with regex:

str='
BEFORE MULTILINE PATTERN 1
<p> (break line!!!)
text...
</p> (break line!!!)
AFTER MULTILINE PATTERN 1
BEFORE MULTILINE PATTERN 2 
<p> (break line!!!)
text...
</p> (break line!!!)
AFTER MULTILINE PATTERN 2
'

# for in-place file editing use "ed -s file" and replace ",p" with "w"
# cf. http://wiki.bash-hackers.org/howto/edit-ed

cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' -e '/^ *#/d' | ed -s <(echo "$str")
  H
  # only remove the first match
  #/<p>/,/<\/p>/d
  # remove all matches
  g/<p>/+0,/<\/p>/+0d
  ,p
  # Q quits without the warning that q raises for an unsaved buffer
  Q
EOF
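
The in-place variant mentioned in the comment above (run ed on the file and write with "w" instead of printing with ",p") could be sketched as follows. The scratch file sample.txt and its contents are hypothetical, and the script skips the edit when ed is not installed, which is an assumption about the environment and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical scratch file; nothing outside the temp dir is touched.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.txt"
printf '%s\n' 'BEFORE' '<p>' 'text...' '</p>' 'AFTER' > "$file"

if command -v ed >/dev/null 2>&1; then
    # g/<p>/ visits each line containing <p>; .,/<\/p>/d deletes from
    # that line through the next line containing </p>; w saves the file.
    printf '%s\n' 'g/<p>/.,/<\/p>/d' 'w' 'q' | ed -s "$file"
    result=$(cat "$file")
    printf '%s\n' "$result"
else
    echo "ed is not installed; skipping" >&2
    result=skipped
fi
```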
