Inserting double quotes multiple times in a string

Posted 2024-08-14 01:25:47

I have inherited a flat html file with a few hundred lines similar to this:

<blink>
<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>
</blink>

So far I have not been able to work out a sed way of inserting the closing double quotes for each element. Probably needs something other than sed to do this. Can anyone suggest an easy way to do this?
Thanks

Comments (6)

浅笑轻吟梦一曲 2024-08-21 01:25:47

sed -i 's/"\([^" >]\+\)\( \|>\)/"\1"\2/g' file.html

Explanation:

  • " - leading double quote
  • \([^" >]\+\) - non-quote-or-space-or-'>' chars, grouped (into group 1)
  • \( \|>\) - terminating space or '>', grouped (into group 2)

We replace it with '"<group1>"<group2>'.
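
For anyone without GNU sed at hand, the same substitution can be sketched in Python. This is a minimal sketch, not part of the original answer: the function name close_quotes and the sample string (copied from the question) are only illustrative, and it assumes, as the sed command does, that no attribute value contains a space.

```python
import re

# Same pattern as the sed command, written in Python regex syntax:
# an opening quote, a run of characters that are not a quote, a space
# or '>', followed by the delimiter (space or '>') that ends the value.
UNCLOSED_VALUE = re.compile(r'"([^" >]+)( |>)')


def close_quotes(line: str) -> str:
    """Insert the missing closing quote after each attribute value."""
    return UNCLOSED_VALUE.sub(r'"\1"\2', line)


sample = '<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>'
print(close_quotes(sample))
```

On lines shaped like the sample, values that already have a closing quote are left alone, so running it twice should not add extra quotes.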

仙女山的月亮 2024-08-21 01:25:47

One solution that jumps out at me is to parse each line of the file looking for a quote. When you find one, set a flag to record that you are inside a quoted area, then keep scanning the line until you hit the first space or >, and insert an additional " just before it. Clear the flag, then continue through the string looking for the next quote. Probably not a perfect solution, but perhaps a start.

烟雨凡馨 2024-08-21 01:25:47

If all lines share the same structure, you could use a simple text editor to globally replace

' bordercolor'

with

'" bordercolor'

(without the single quotes). This is independent of the field values and works the same way for the other fields. You still have to do some manual work, but if it's just one big file, I'd bite the bullet this time rather than spend what is probably more time working out a sed solution.
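
If a script is preferred over a text editor, the same search-and-replace can be sketched in Python. This is only an illustration under assumptions: the attribute list is taken from the sample line in the question, the function name is made up, and the search string includes the trailing =" (slightly stricter than the plain ' bordercolor' above) so that ordinary words in the page text are less likely to match.

```python
# Attribute names taken from the sample line in the question; extend the
# list for whatever other attributes the real file uses.
ATTRIBUTE_NAMES = ["bordercolor", "width", "colspan", "height"]


def close_before_attributes(line: str) -> str:
    """Put a closing quote in front of each known attribute name."""
    for name in ATTRIBUTE_NAMES:
        line = line.replace(f' {name}="', f'" {name}="')
    return line


sample = '<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>'
print(close_before_attributes(sample))
```

The last value on the line (height="20> in the sample) is still unclosed afterwards, which is the manual work mentioned above. Running the replacement a second time would double the quotes, so it should be applied once to the original file.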

花间憩 2024-08-21 01:25:47

This should do if your file is simple. It won't work if you have whitespace that should be inside the quotes; in that case more complex code will be needed, but it can be done along the same lines.

#!/usr/bin/env python3

# Change "utf-8" below to your file's encoding.
with open("<myfile.html>", encoding="utf-8") as inputfile:
    data = inputfile.read()
new_data = []

inside_tag = False
inside_quotes = False
for char in data:
    if char == "<":
        inside_tag = True
    if inside_tag and char == '"':
        # An opening quote starts a value; an existing closing quote ends it.
        inside_quotes = not inside_quotes
    elif inside_tag and inside_quotes and (char.isspace() or char == ">"):
        # The value ended without a closing quote: add the missing one.
        new_data.append('"')
        inside_quotes = False
    if char == ">":
        inside_tag = False
    new_data.append(char)

with open("<mynewfile.html>", "w", encoding="utf-8") as outputfile:
    outputfile.write("".join(new_data))

网白 2024-08-21 01:25:47

With bash:

for file in *
do
    flag=0
    # IFS= keeps leading whitespace; -r keeps backslashes literal
    while IFS= read -r line
    do
        case "$line" in
            *"<blink>"*)
                flag=1
                ;;
        esac
        if [ "$flag" -eq 1 ]; then
            case "$line" in
                *class=\"pagetxt*">")
                    # closes only the last attribute value on the line
                    line="${line%>}\">"
                    flag=0
                    ;;
            esac
        fi
        printf '%s\n' "$line"
    done <"$file" > temp
    mv temp "$file"
done

清风挽心 2024-08-21 01:25:47

Regular expressions are your friend:

Find: (="[^" >]+)([ >])

Replace: \1"\2

After you've done that, make sure to run this one too:

Find: </?blink>

Replace: \n

(This won't fix more than one class on an element, like <element class="class1 class2 id="jimmy">)
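
One possible way around that limitation, sketched in Python: instead of stopping at the first space, let the value run until the next thing that looks like name=" or until the closing >. This is a heuristic under stated assumptions, not part of the answer above: the pattern and the function name are illustrative, attribute names are assumed to consist of letters, digits, underscores and hyphens, and values are assumed not to contain text that itself looks like name=".

```python
import re

# Lazily take the value up to the next ` name="` or the tag's closing '>'.
# The lookahead leaves that following text untouched.
UNCLOSED_VALUE = re.compile(r'="([^"<>]*?)(?=\s+[\w-]+="|\s*/?>)')


def close_quotes_with_spaces(line: str) -> str:
    """Close attribute values that may themselves contain spaces."""
    return UNCLOSED_VALUE.sub(r'="\1"', line)


multi_class = '<element class="class1 class2 id="jimmy">'
print(close_quotes_with_spaces(multi_class))

sample = '<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>'
print(close_quotes_with_spaces(sample))
```

Values that are already closed are skipped, because the character after them is a quote rather than a space or >. The pattern is not restricted to text inside tags, so it is worth diffing the result against the original before replacing the file.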
