
sed double-quotes

Insert double quotes multiple times in a string

Posted on 2024-08-14 01:25:47

I have inherited a flat html file with a few hundred lines similar to this:

<blink>
<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>
</blink>

So far I have not been able to work out a sed way of inserting the closing double quote for each attribute value. It probably needs something other than sed. Can anyone suggest an easy way to do this?
Thanks

浅笑轻吟梦一曲 2024-08-21 01:25:47

sed -i 's/"\([^" >]\+\)\( \|>\)/"\1"\2/g' file.html

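Explanation:

" - the opening double quote
\([^" >]\+\) - one or more characters that are not a double quote, space or '>', captured as group 1
\( \|>\) - the terminating space or '>', captured as group 2

We replace the match with '"<group1>"<group2>', which puts the missing closing quote between the value and its terminator. Note that \+ and \| in a basic regular expression are GNU sed extensions.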
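For comparison, here is a minimal sketch of the same substitution using Python's `re.sub`. It may help where GNU sed is not available. The function names `close_quotes` and `fix_file` are hypothetical. The sketch assumes the file is UTF-8 and small enough to read into memory.

```python
import re

# Same pattern as the sed command: a double quote, then one or more characters
# that are not a quote, space or '>' (group 1), then a space or '>' (group 2).
UNCLOSED_VALUE = re.compile(r'"([^" >]+)([ >])')


def close_quotes(text: str) -> str:
    """Insert the missing closing quote after each unterminated value."""
    return UNCLOSED_VALUE.sub(r'"\1"\2', text)


def fix_file(path: str) -> None:
    """Rewrite the file in place, similar to `sed -i` (assumes UTF-8)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(close_quotes(text))
```

Applied to the sample line, `close_quotes` should return `<td class="pagetxt" bordercolor="#666666" width="203" colspan="3" height="20">`. The regex leaves values that are already quoted alone. In `"x"` the run of non-quote characters ends at a `"`, not at a space or `>`, so the pattern cannot match there.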

仙女山的月亮 2024-08-21 01:25:47

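One solution I can think of is to parse each line of the file looking for a quote. When you find one, set a flag to track that you are inside a quoted region. Then keep scanning the line until you reach the first space or '>'. When it appears, insert an extra double quote (") in front of it, turn the flag off, and carry on looking for the next quote in the string. Probably not a perfect solution, but maybe a start.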
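Below is a minimal sketch of this flag-based scan in Python, applied one line at a time. The name `fix_line` is hypothetical. One behaviour goes beyond the approach as described: a second quote is treated as closing a value that was already well-formed. That way, attributes that are already quoted do not get an extra quote.

```python
def fix_line(line: str) -> str:
    """Close unterminated attribute values in one line by scanning with a flag."""
    out = []
    in_quote = False  # the flag: are we inside a quoted value?
    for ch in line:
        if ch == '"':
            # Assumption beyond the described approach: a second quote means
            # the value was already closed, so the flag is simply cleared.
            in_quote = not in_quote
        elif in_quote and (ch == " " or ch == ">"):
            out.append('"')   # insert the missing closing quote
            in_quote = False  # turn the flag off and keep scanning
        out.append(ch)
    return "".join(out)
```

To process a whole file, apply `fix_line` to each line and write the results to a new file. A value that runs to the end of a line without a following space or '>' is not closed by this sketch.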

烟雨凡馨 2024-08-21 01:25:47

If all lines share the same structure, you could use a simple text editor to globally replace

' bordercolor'

with

'" bordercolor'

(without the single quotes). This is independent of the field values and works the same way for the other fields. You will still have to do some manual work, but if it's just one big file, I'd bite the bullet this time rather than spend possibly more time working out a sed solution.
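The same search-and-replace can also be scripted instead of done by hand. This is a minimal sketch. It assumes the attribute names from the question's sample line are the only ones that follow an unterminated value. The list `FOLLOWING_ATTRS` and the function `close_before_attrs` are hypothetical. It matches ' name=' rather than just ' name', a small deviation meant to avoid hitting the bare word elsewhere in the text. Like the editor approach, it relies on every line sharing the same structure. It would add a stray quote after a value that was already closed.

```python
# Hypothetical list: attribute names that, in the question's sample line,
# directly follow an unterminated value. Extend it for your own file.
FOLLOWING_ATTRS = ["bordercolor", "width", "colspan", "height"]


def close_before_attrs(text: str) -> str:
    """Mimic a global search-and-replace of ' name=' with '" name='."""
    for name in FOLLOWING_ATTRS:
        text = text.replace(f" {name}=", f'" {name}=')
    return text
```

On the sample line this gives `<td class="pagetxt" bordercolor="#666666" width="203" colspan="3" height="20>`. The final value (`height`) still needs its closing quote before the '>', which is left as manual work here.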

花间憩 2024-08-21 01:25:47

This should work if your file is simple. It won't work if you have whitespace that should be inside the quotes. In that case more complex code will be needed, but it can be done along the same lines.

#!/usr/bin/env python3

# Change "utf-8" below to your file's encoding
with open("<myfile.html>", encoding="utf-8") as inputfile:
    data = inputfile.read()

new_data = []
inside_tag = False
inside_quotes = False
for char in data:
    if char == "<":
        inside_tag = True
    if inside_tag and char == '"':
        # An opening quote starts a value; a quote seen while already inside
        # a value closes it (that value was already well-formed)
        inside_quotes = not inside_quotes
    elif inside_tag and inside_quotes and (char.isspace() or char == ">"):
        # Unterminated value: add the closing quote before the whitespace or '>'
        new_data.append('"')
        inside_quotes = False
    if char == ">":
        inside_tag = False
    new_data.append(char)

with open("<mynewfile.html>", "w", encoding="utf-8") as outputfile:
    outputfile.write("".join(new_data))

网白 2024-08-21 01:25:47

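With bash:

for file in *
do
    [ -f "$file" ] || continue   # skip directories and other non-regular files
    flag=0
    # IFS= keeps leading whitespace; the -n test keeps a last line without a newline
    while IFS= read -r line || [ -n "$line" ]
    do
        case "$line" in
            *"<blink>"*)
                flag=1
                ;;
        esac
        if [ "$flag" -eq 1 ]; then
            case "$line" in
                *class=\"pagetxt*">" )
                    # Closes only the last value: the trailing '>' becomes '">'
                    line="${line%>}\">"
                    flag=0
                    ;;
            esac
        fi
        printf '%s\n' "$line"
    done < "$file" > temp
    mv temp "$file"
done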