Inserting double quotes multiple times in a string
I inherited a flat HTML file with hundreds of lines similar to this:
<blink>
<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>
</blink>
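So far I haven't been able to find a sed approach that inserts the missing closing double quote after each attribute value. Something other than sed may be needed. Can anyone suggest a simple way to do this? Thanks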
Explanation:
" - leading double quote
\([^" >]\+\) - one or more characters that are not a double quote, space or '>', grouped (into group 1)
\( \|>\) - terminating space or '>', grouped (into group 2)
We replace the match with '"<group1>"<group2>'.
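The sed command this explanation describes is reassembled below from the three parts listed. It is a sketch that assumes GNU sed, because `\+` and `\|` inside a basic regular expression are GNU extensions; the function name `close_quotes_sed` is hypothetical.

```sh
#!/bin/sh
# Reconstruction of the sed answer from its explanation (assumes GNU sed):
#   "              leading double quote
#   \([^" >]\+\)   group 1: one or more chars that are not ", space or >
#   \( \|>\)       group 2: the space or > that terminates the value
# Each match is rewritten as: quote, group 1, quote, group 2.
close_quotes_sed() {
  sed 's/"\([^" >]\+\)\( \|>\)/"\1"\2/g'
}
```

Piping the sample line from the question through `close_quotes_sed` should give `<td class="pagetxt" bordercolor="#666666" width="203" colspan="3" height="20">`. Values that are already quoted are left alone, because a closing quote is never followed by a non-space character. To edit a whole file in place with GNU sed and keep a backup, something like `sed -i.bak 's/"\([^" >]\+\)\( \|>\)/"\1"\2/g' file.html` should work (`file.html` is a placeholder).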
One solution that comes to mind is to parse each line of the file looking for a quote. When it finds one, set a flag to record that it is inside a quoted value, then keep scanning the line until it reaches the first space or >, and insert an additional " just before that character. Turn the flag off, then continue through the string looking for the next quote. Probably not a perfect solution, but perhaps a start.
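The scan described above can be sketched as a small shell function that lets awk walk each line one character at a time. The function name is hypothetical, and one detail is an assumption not in the answer: a quote met while the flag is already on is treated as an existing closing quote, so values that are already quoted pass through unchanged.

```sh
#!/bin/sh
# Flag-based scan, one character at a time (awk does the walking).
# "inside" is the flag: on after an opening quote, off once the value is closed.
# Assumption (not from the answer): a quote met while the flag is on is taken
# as an existing closing quote, so already-quoted values pass through unchanged.
close_quotes_scan() {
  awk '{
    out = ""; inside = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        inside = 1 - inside                 # opening quote: flag on; closing quote: flag off
      } else if (inside && (c == " " || c == ">")) {
        out = out "\""                      # insert the missing closing quote
        inside = 0
      }
      out = out c
    }
    print out
  }'
}
```

Usage would be along the lines of `close_quotes_scan < input.html > fixed.html` (file names are placeholders). Like the regex answers, it cannot tell a space that belongs inside a value from one that ends it.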
If all lines share the same structure, you could use a simple text editor to globally replace
with
(without the single quotes). This is independent of the field values and works the same way for the other fields. You still have to do some manual work, but if it's just one big file, I'd bite the bullet this time rather than probably waste more time working out a sed solution.
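Run through sed instead of an editor, the same idea becomes one literal substitution per attribute name, plus one more to close the last value before `>`. This is only a sketch: the attribute names are assumed from the sample line in the question, it presumes every line has that same structure (class first, height last), and the function name is hypothetical.

```sh
#!/bin/sh
# Text-editor approach as sed: a literal replacement per attribute name, so the
# values themselves never matter. Attribute names are assumed from the sample line;
# this presumes every line has the same structure (class first, height last).
close_quotes_by_field() {
  sed -e 's/ bordercolor=/" bordercolor=/g' \
      -e 's/ width=/" width=/g' \
      -e 's/ colspan=/" colspan=/g' \
      -e 's/ height=/" height=/g' \
      -e 's/\(="[^"]*\)>$/\1">/'   # close the last value, just before the final >
}
```

If a line started with, say, `width` instead of `class`, the literal replacement would put a stray quote after `<td`, which is the manual checking the answer warns about.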
This should do if your file is simple. It won't work if you have whitespace that should be inside the quotes; in that case more complex code would be needed, but it can be done along the same lines.
with bash
Regular expressions are your friend:
Find:
(="[^" >]+)([ >])
Replace:
\1"\2
After you've done that, make sure to run this one too:
Find:
</?blink>
Replace:
\n
(This won't fix an element with more than one class, like
<element class="class1 class2 id="jimmy">
)
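The same find/replace can be run non-interactively with `sed -E`, which accepts this extended-regex syntax in both GNU and BSD sed. The function name below is hypothetical.

```sh
#!/bin/sh
# The editor regex from this answer, run through sed -E (extended regex):
#   (="[^" >]+)  group 1: =" followed by the unquoted value
#   ([ >])       group 2: the space or > that ends the value
close_quotes_ere() {
  sed -E 's/(="[^" >]+)([ >])/\1"\2/g'
}
```

On the multi-class example above it produces `<element class="class1" class2 id="jimmy">`, which shows the limitation concretely: `class2` ends up outside the quotes.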