Replace multiple blank lines with a single blank line using regex search and replace

Posted on 2024-10-08 19:28:55

I have a file that I need to reformat and remove "extra" blank lines.

I am using the Perl syntax regular expression search and replace functionality of UltraEdit and need the regular expression to put in the "Find What:" field.

Here is a sample of the file I need to re-format.

All current text

REPLACE with all the following:


Winter 2011 Class Schedule 

Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011

DANCE

Adventures in Ballet & Tap      
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 


African Storytelling
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30


African Dance / Children

You'll notice that some of the double blank lines have spaces or tabs or both in them.

After the search and replace has been run I should have a file that looks like this.

All current text

REPLACE with all the following:

Winter 2011 Class Schedule 

Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011

DANCE

Adventures in Ballet & Tap      
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 

African Storytelling
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30

African Dance / Children

谎言月老 2024-10-15 19:28:55

Replacing

^(\s*\r\n){2,}

with

\r\n

is what I ended up with.

This selects only runs of two or more blank lines and replaces each run with a single blank line.
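You can check this pattern outside UltraEdit with Python's re module. It stands in for UltraEdit's Perl-compatible engine here, on the assumption that both engines treat these constructs the same way. re.MULTILINE is passed so that ^ matches at the start of every line. The sample text and function name are hypothetical.

```python
import re

# Pattern and replacement from the answer. re.MULTILINE lets ^ match right
# after every line break, not only at the start of the string.
BLANK_RUN = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)


def squeeze_crlf_blank_lines(text: str) -> str:
    """Replace each run of two or more blank CRLF lines with one line break."""
    return BLANK_RUN.sub("\r\n", text)


# Hypothetical CRLF sample: a single empty line (left alone) and a run of an
# empty line plus a line holding only a space and a tab (collapsed).
sample = (
    "DANCE\r\n"
    "\r\n"
    "Six-week fees:   $30 \r\n"
    "\r\n"
    " \t\r\n"
    "African Storytelling\r\n"
)

result = squeeze_crlf_blank_lines(sample)
print(repr(result))
```

Because the match has to start at the beginning of a line, the trailing space after $30 on the text line itself is kept.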

白云悠悠 2024-10-15 19:28:55

It depends on what the line endings are. Assuming \n, replace this:

([ \t]*\n){3,}

with \n\n.
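Here is the same idea checked with Python's re module, used as a stand-in for UltraEdit's engine. Like the answer, it assumes LF line endings; the sample and function name are hypothetical. Three or more consecutive line feeds mean at least two blank lines, and \n\n leaves exactly one.

```python
import re

# Three or more line feeds in a row, each optionally preceded by spaces or
# tabs: the paragraph's line break plus at least two blank lines.
BLANK_RUN = re.compile(r"([ \t]*\n){3,}")


def squeeze_lf_blank_lines(text: str) -> str:
    """Collapse two or more blank LF lines into a single empty line."""
    return BLANK_RUN.sub("\n\n", text)


sample = "DANCE\n\nSix-week fees:   $30 \n\n \t\nAfrican Storytelling\n"
result = squeeze_lf_blank_lines(sample)
print(repr(result))
```

The match can start on the last text line of a paragraph, because [ \t]* also matches there. As a result, trailing spaces such as the one after $30 are removed too. A single empty line, as after DANCE, is left alone.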

树深时见影 2024-10-15 19:28:55

Try this Perl one-liner: perl -00pe0. If you want in-place editing, just add the -i option.

給妳壹絲溫柔 2024-10-15 19:28:55

For completeness, I want to reference the large post "Remove / delete blank and empty lines" in the UltraEdit user forums. At the bottom, after all the explanations for newbies, it gives the solution for reducing two or more lines that contain nothing (empty lines) or only whitespace (blank lines) to one empty line, independent of the line terminator type.

A few words on what Alan Moore wrote in his answer:

UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag that determines whether a dot matches all characters except newline characters such as carriage return (CR) and line feed (LF), or really all characters including CR and LF. This flag decides whether a Perl regular expression find/replace treats a text file as one large byte stream or as a sequence of lines. In UltraEdit the flag is set by default so that a dot in the search string does not match \r (CR) or \n (LF). This behavior can easily be changed, however, by starting the regular expression string with (?s), which changes the value of the flag match_not_dot_newline. This is described in the UltraEdit user forums topic "." in Perl regular expressions doesn't include CRLFs?
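The effect of the (?s) flag can be tried in Python's re module, which accepts the same inline flag. One hedge: by default Python's dot excludes only \n and still matches \r. So this illustrates the flag itself, not UltraEdit's exact default behavior.

```python
import re

text = "first line\nsecond line"

# Default: "." does not match "\n", so no match can cross the line break.
without_s = re.search(r"line.second", text)

# With the inline (?s) flag (DOTALL), "." also matches "\n".
with_s = re.search(r"(?s)line.second", text)

print(without_s)              # no match
print(repr(with_s.group(0)))  # the match spans both lines
```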

A Perl regular expression replace that works for files with

  • carriage return + line feed (DOS/Windows) or
  • only line feed (Unix, Mac OS 10.0 and later versions) or
  • only carriage return (Mac OS 9 and previous versions)

as line endings can be done with the search string \h*(\r?\n|\r)(?:\h*\1){2,} and \1\1 as the replace string. It handles optional trailing spaces and tabs at the end of a paragraph (one or more lines). It also handles two or more lines below the paragraph that are either empty or contain only whitespace (blank lines).

Explanation:

\h* matches zero or more horizontal whitespace characters, as defined by Unicode. This first part of the search expression matches horizontal whitespace at the end of a line. That includes horizontal tabs, normal spaces, no-break spaces and some other rarely used space characters.

Using \s is not a good idea here, because that character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.

(\r?\n|\r) ... is an OR expression with two alternatives inside a marking (capturing) group. The first alternative matches a line feed, optionally preceded by a carriage return. The second matches just a carriage return. So this expression correctly matches all three common types of line termination. For the rest of the search and for the replace, it is important that it always matches either CR+LF (both together), just LF, or just CR.

(?:\h*\1) ... is a non-marking group. It matches zero or more horizontal whitespace characters followed by the same line break found before, back-referenced with \1 (i.e. CR+LF, just LF, or just CR). So this part of the expression finds an empty or blank line.

{2,} ... is a multiplier for the preceding non-marking group, meaning at least two times. So after the end of a paragraph there must be two or more empty or blank lines. A single empty or blank line below a paragraph is not enough for the search expression to match.

The replace string \1\1 references the first found line break twice: once to end the paragraph's last line and once for the single empty line that remains.

The advantage of this regular expression over the others posted here is that the line ending type need not be known in advance. The search expression determines it, and the line ending it found is referenced in the replace string. In addition, if there are two or more empty or blank lines below a paragraph, this replace also removes two kinds of whitespace: any trailing whitespace at the end of the paragraph and the whitespace on those blank lines.
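Here is a minimal sketch of this search/replace in Python's re module, used as a stand-in for UltraEdit's engine. Python's re has no \h, so the class [^\S\r\n] (whitespace other than CR and LF) is swapped in as an approximation. It also admits a few characters that \h would not, such as form feed. The sample lines and helper names are hypothetical. The same code runs on CR+LF, LF and CR text without being told which one it gets.

```python
import re

# Python's re has no \h, so this class approximates it: any whitespace
# character except CR and LF (it also admits e.g. form feed, unlike \h).
H = r"[^\S\r\n]"

# \h*(\r?\n|\r)(?:\h*\1){2,} with \h swapped for the class above.
PATTERN = re.compile(H + r"*(\r?\n|\r)(?:" + H + r"*\1){2,}")


def squeeze_blank_lines(text: str) -> str:
    # \1\1 writes the found line break twice: one ends the paragraph's
    # last line, the other is the single remaining empty line.
    return PATTERN.sub(r"\1\1", text)


def make_text(lines, eol):
    """Hypothetical helper: join lines with eol and terminate the last one."""
    return eol.join(lines) + eol


before = ["DANCE", "", "Six-week fees: $30 ", "", " \t", "African Storytelling"]
after = ["DANCE", "", "Six-week fees: $30", "", "African Storytelling"]

# The same pattern is applied unchanged to CR+LF, LF-only and CR-only text.
results = {eol: squeeze_blank_lines(make_text(before, eol)) for eol in ("\r\n", "\n", "\r")}
for text in results.values():
    print(repr(text))
```

The single empty line after DANCE does not reach the {2,} threshold, so it is left untouched.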

{2,} can be replaced by + in the search string if this replace should also trim trailing whitespace at the end of a paragraph and on the next empty or blank line when only one such line follows. Note, however, that this variant also performs replacements that change nothing at all. That happens wherever a paragraph has no trailing whitespace and the next line is empty.
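Both effects of the + variant can be seen in the same Python setting, again with [^\S\r\n] standing in for \h. The sample strings are hypothetical. re.subn also reports how many replacements were made, which exposes the replacement that changes nothing.

```python
import re

H = r"[^\S\r\n]"  # approximation of \h: whitespace except CR and LF

# Same search string as before, but with + instead of {2,}.
PLUS = re.compile(H + r"*(\r?\n|\r)(?:" + H + r"*\1)+")

# Trailing space after $30 and ONE following line holding a space and a tab:
# both are trimmed, which the {2,} version would not do.
trimmed, n_trimmed = PLUS.subn(r"\1\1", "fees: $30 \n \t\nAfrican\n")

# No trailing whitespace and a truly empty next line: a match is still found
# and replaced, but the text comes out identical.
clean = "DANCE\n\nAdventures\n"
unchanged, n_unchanged = PLUS.subn(r"\1\1", clean)

print(repr(trimmed), n_trimmed)
print(unchanged == clean, n_unchanged)
```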

焚却相思 2024-10-15 19:28:55

Replacing

\n\s*\n\s*

with

\n\n

should do the trick
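Here is a quick check of this pattern with Python's re module, used as a stand-in for UltraEdit's engine; LF line endings are assumed and the sample is hypothetical. One side effect is worth knowing: the trailing \s* also swallows indentation at the start of the next non-empty line.

```python
import re

# Two line feeds with any whitespace (including more line feeds) between
# them, plus any whitespace after the second one, become one empty line.
PATTERN = re.compile(r"\n\s*\n\s*")

sample = "fees: $30 \n\n \t\nAfrican Storytelling\n\n    indented line\n"
result = PATTERN.sub("\n\n", sample)
print(repr(result))
```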

紙鸢 2024-10-15 19:28:55

In Vim, using

:%!cat -s

I find this the easiest way so far to delete extra empty lines.
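For comparison, here is a rough Python emulation of cat -s for LF text. It is my own sketch based on cat's documented "squeeze repeated empty lines" behavior, not cat's source; the function name and sample are hypothetical. Only truly empty lines count. A line holding spaces or tabs, like some of the blank lines in the question, is therefore not squeezed.

```python
def squeeze_empty_lines(text: str) -> str:
    """Rough emulation of `cat -s` for LF text: drop every empty line that
    directly follows another empty line. Lines containing spaces or tabs are
    not empty, so they are never dropped."""
    out = []
    newlines = 1  # treat the start of the text like the start of a line
    for ch in text:
        if ch == "\n":
            newlines += 1
            if newlines > 2:
                continue  # this "\n" would produce a repeated empty line
        else:
            newlines = 0  # any other character ends the run of empty lines
        out.append(ch)
    return "".join(out)


sample = "DANCE\n\n\n\nAdventures\n \t\n\nAfrican\n"
print(repr(squeeze_empty_lines(sample)))
```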

清晰传感 2024-10-15 19:28:55

I'm not sure what UltraEdit lets you get away with in the "replace" area, but if you cannot use a newline (I've had this problem before) but can use capture references, this might work:

Find    : \s*(\r\n)\s*(\r\n)\s*\r\n
Replace : $1$2

Not tested extensively, but seems to work on the sample you provided.
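Here is the same search/replace run through Python's re module, used as a stand-in for UltraEdit's engine. Python spells the $1$2 replacement as \1\2, and the sample is hypothetical. Both groups capture a CRLF, so the replacement writes exactly one line break plus one empty line.

```python
import re

# The answer's search pattern; "$1$2" becomes r"\1\2" in Python.
PATTERN = re.compile(r"\s*(\r\n)\s*(\r\n)\s*\r\n")

sample = "DANCE\r\n\r\nSix-week fees: $30 \r\n\r\n \t\r\nAfrican\r\n"
result = PATTERN.sub(r"\1\2", sample)
print(repr(result))
```

The leading \s* can start on the paragraph's last text line, so the trailing space after $30 is dropped as part of the match.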

伴随着你 2024-10-15 19:28:55

See this thread for what's causing the problem. As I understand it, UltraEdit regexes are greedy at the character level (i.e., within a line), but non-greedy at the line level (roughly speaking). I don't have access to UE, but I would try writing the regex so it has to match something concrete after the last blank line. For example:

search:   (\r\n[ \t]*){2,}(\S)
replace:  $1$2

This matches and captures two or more instances of a line separator and any horizontal whitespace that follows it, but it only retains the last one. The \S should force it to keep matching until it finds a line with at least one non-whitespace character.

I admit that I don't have a whole lot of confidence in this solution; UltraEdit's regex support is crippled by its line-based architecture. If you want an editor that does regexes right, and you don't want to learn a whole new regex syntax (like vim's), get EditPadPro.
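Running this search/replace through Python's re module, used as a stand-in for UltraEdit's engine, suggests why the author hedged. Python spells $1$2 as \1\2, and the sample is hypothetical. A repeated group keeps only its last iteration, so \1 holds a single line break, and the replacement leaves no empty line between paragraphs at all. Repeating the backreference, as in \1\1\2, is one possible adjustment, but it is an untested assumption as far as UltraEdit is concerned.

```python
import re

PATTERN = re.compile(r"(\r\n[ \t]*){2,}(\S)")

sample = "DANCE\r\n\r\nAdventures\r\nfees: $30 \r\n\r\n \t\r\nAfrican\r\n"

# As posted: group 1 holds only the last "\r\n[ \t]*" iteration, so every
# run of blank lines disappears completely.
as_posted = PATTERN.sub(r"\1\2", sample)

# Hypothetical adjustment: write the captured line break twice.
doubled = PATTERN.sub(r"\1\1\2", sample)

print(repr(as_posted))
print(repr(doubled))
```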

一直在等你来 2024-10-15 19:28:55

This should also work with spaces on blank lines:

  • Search - /\n^\s*\n/
  • Replace - \n\n
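Assuming LF line endings, you can check this in Python's re module, which stands in for the answer's regex flavor; the sample is hypothetical. Because \s* sits between the two line feeds, it absorbs empty and whitespace-only lines alike, while the indentation of the next text line is left untouched.

```python
import re

# The slashes in /\n^\s*\n/ are delimiters, not part of the pattern.
# re.MULTILINE lets ^ match right after the first "\n".
PATTERN = re.compile(r"\n^\s*\n", re.MULTILINE)

sample = "DANCE\n\nfees: $30 \n\n \t\nAfrican\n"
result = PATTERN.sub("\n\n", sample)
print(repr(result))
```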
唐婉 2024-10-15 19:28:55

In my IntelliJ IDE, what worked was to search for \n\n and replace it with \n.
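The effect of a plain \n\n to \n replacement can be sketched with Python's str.replace, which scans left to right without overlapping matches. That this is how the IDE's replace behaves is an assumption, and the sample is hypothetical. Each pair of line feeds becomes one, so a single empty line disappears while two empty lines shrink to one. Whitespace-only lines are not matched at all.

```python
sample = "DANCE\n\nAdventures\nfees: $30\n\n\nAfrican\n \t\n\nEnd\n"

# Every non-overlapping "\n\n" becomes "\n".
result = sample.replace("\n\n", "\n")
print(repr(result))
```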
