Regex: How can I remove whitespace only inside style attributes?

Posted 2024-09-18 14:50:17

For the last hour I have been trying to figure this out myself, but I am just not having any success and thought maybe you could help.

Basically, I have an HTML email document with a lot of style attributes for inline styling of elements. They look somewhat like this:

<th rowspan="10" style="font-weight: normal; vertical-align: top; text-align: left;" width="87">

Now what I need to do is remove all this whitespace so that it becomes:

<th rowspan="10" style="font-weight:normal;vertical-align:top;text-align:left;" width="87">

Playing around in http://www.gskinner.com/RegExr/ I have found this search expression

/style="([\w ;:\-0-9]+)"/gi

that matches only the style attribute with its contents, but I can't figure out how to replace the whitespace only within the $1 capture group.

Ultimately I will run this as a project-wide find and replace in TextMate, in case that matters.

In case you haven't noticed, I am a complete newbie to RegEx, so please try to explain your solution so I can learn from it for future reference.

Many thanks for reading,

Jannis

简单 2024-09-25 14:50:17

Watch out for shorthand properties. For example, in

style="background: #fff; border: 1px solid #ccc"

...you can safely remove the first three spaces, but the last two, separating the components of the border: shorthand value, must remain. Just for fun, here's a regex that removes any whitespace that's adjacent to the property names and the : and ; delimiters, but not within property values:

((?:\sstyle="|(?!\A)\G))\s*+([a-z]++(?>-[a-z]+)*+)\s*+:\s*+([^;]+?)\s*+;

Replace with:

$1$2:$3;

Testing it in EditPad Pro, it converts this (353 characters):

<th rowspan="10" style="font-weight: normal; vertical-align: top; text-align: left;" width="87"><input title="Search" value="" size=57 style="background: #fff; border: 1px solid #ccc ; border-bottom-color: #999; border-right-color:#999;color: #000; font: 18px arial,sans-serif bold; height: 25px; margin: 0; padding: 5px 8px 0 6px; vertical-align: top">

...to this (330 characters):

<th rowspan="10" style="font-weight:normal;vertical-align:top;text-align:left;" width="87"><input title="Search" value="" size=57 style="background:#fff;border:1px solid #ccc;border-bottom-color:#999;border-right-color:#999;color:#000;font:18px arial,sans-serif bold;height:25px;margin:0;padding:5px 8px 0 6px;vertical-align:top">

But I'm not recommending that you use this, or any regex solution; I'm just curious as to whether it works in TextMate like it does in EditPad. (TextMate uses the Oniguruma regex engine, which supports all the necessary features, so it should work, but I'm not in a position to test it myself.)

What you really should use for this job is a dedicated CSS compressor/minimizer/minifier; there are lots of them out there.
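
For readers who would rather script the change than run an editor search-and-replace, the same trimming rule (drop whitespace next to property names, ":" and ";", keep it inside values) can be sketched in Python with a replacement callback. This is only an illustration and swaps the \G-based pattern for a callback, since Python's re module has no \G anchor. The names tighten_declarations and minify_inline_styles are made up for this sketch, and it assumes double-quoted style attributes whose values contain no ";" or '"' characters.

```python
import re

# A double-quoted style attribute: opening part, declaration list, closing quote.
STYLE_ATTR = re.compile(r'(\sstyle=")([^"]*)(")', re.IGNORECASE)


def tighten_declarations(css: str) -> str:
    """Trim whitespace around ':' and ';' but keep spaces inside values."""
    declarations = []
    for chunk in css.split(";"):
        if not chunk.strip():
            continue  # skip the empty piece left by a trailing ';'
        name, sep, value = chunk.partition(":")
        if sep:
            declarations.append(f"{name.strip()}:{value.strip()}")
        else:
            declarations.append(chunk.strip())  # not a declaration; just trim it
    result = ";".join(declarations)
    if css.rstrip().endswith(";"):
        result += ";"  # keep a trailing semicolon if the source had one
    return result


def minify_inline_styles(html: str) -> str:
    """Apply tighten_declarations to every double-quoted style attribute."""
    return STYLE_ATTR.sub(
        lambda m: m.group(1) + tighten_declarations(m.group(2)) + m.group(3),
        html,
    )


print(minify_inline_styles('<input style="background: #fff; border: 1px solid #ccc ; color:#000">'))
# prints: <input style="background:#fff;border:1px solid #ccc;color:#000">
```

Splitting on ";" is the simplification here: a value such as a data: URL that contains a semicolon would be cut in the wrong place, which is one more reason to prefer a real CSS minifier.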

慕烟庭风 2024-09-25 14:50:17

It's a really tough question. I couldn't find a single regexp that does this, but you can use a sequence of regexps:

1. Break the lines so that style="blabla" appears on a separate line (mark the separated lines with special strings so you can rejoin them later).
2. Manipulate the style="blabla" lines.
3. Rejoin the lines.
4. Clean up the remaining special markers.

Here is how I did it with sed (hopefully the conversion to TextMate's regexp style is easy):

sed -e 's/\(.*\)\(style="[^"]*"\)\(.*\)/AAA\1\nBBB\2\nCCC\3/g' test.txt | sed '/BBB/s/ //g' | sed -e :a -e '$!N;s/\nBBB//;ta' -e 'P;D' | sed -e :a -e '$!N;s/\nCCC//;ta' -e 'P;D' | sed -e 's/AAA//g'

Explanation:

sed -e 's/\(.*\)\(style="[^"]*"\)\(.*\)/AAA\1\nBBB\2\nCCC\3/g' test.txt

This breaks each line containing style="..." into three lines and marks them with the special strings AAA, BBB and CCC.
The file then looks like this:

AAA line before style
BBB line with style=""
CCC line after style

Then we apply the next regexp:

sed '/BBB/s/ //g'

This removes the spaces in every line marked with BBB (i.e. the style lines).

Then we rejoin:

sed -e :a -e '$!N;s/\nBBB//;ta' -e 'P;D'

appends lines starting with BBB to the previous lines (and removes the string BBB)

And then:

sed -e :a -e '$!N;s/\nCCC//;ta' -e 'P;D'

appends lines starting with CCC to the previous lines.

Lastly:

sed -e 's/AAA//g'

removes special string AAA.

It's surely suboptimal, and it could be done with much less effort using methods other than regexps (there are even tools for auto-formatting source code).
Anyway, this is all I could do in an hour. I'm sure a single regexp exists that does what you want; it's just difficult to find.
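
For comparison, the split, edit and rejoin stages can be collapsed into one substitution in a language whose regex replace accepts a callback. The following is a minimal Python sketch of that idea, not a translation of the sed pipeline; the function name strip_spaces_in_style is made up for illustration, and it assumes double-quoted style attributes. Like the sed version, it removes every space inside the attribute, so it also squashes the spaces that shorthand values such as "1px solid #ccc" need.

```python
import re

# A double-quoted style attribute, matched as a whole.
STYLE_ATTR = re.compile(r'style="[^"]*"')


def strip_spaces_in_style(html: str) -> str:
    """Remove every space character inside each style="..." attribute."""
    return STYLE_ATTR.sub(lambda m: m.group(0).replace(" ", ""), html)


line = '<th rowspan="10" style="font-weight: normal; vertical-align: top; text-align: left;" width="87">'
print(strip_spaces_in_style(line))
# prints: <th rowspan="10" style="font-weight:normal;vertical-align:top;text-align:left;" width="87">
```

The callback receives each matched attribute and returns its replacement, so text outside the style attributes is never touched and no marker strings are needed.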
