Using sed to process HTML data
I'm having some problems using sed in combination with html. The following sample illustrates the problem:
HTML="<html><body>ENTRY</body><html>"
TABLE="<table></table>"
echo $HTML | sed -e s/ENTRY/$TABLE/
This outputs:
sed: -e expression #1, char 18: unknown option to `s'
If I leave out the / from $TABLE, so that it becomes <table><table>, it works OK.
Any ideas on how to fix it?
Update
Here's a sample that can reproduce the problem:
template.html:
<html>
<body>
<table>
ENTRIES
</table>
</body>
</html>
gui_template:
<tr>
<td class="td_tut_title">TITLE</td>
<td class="td_tut_content">
<a href="../tutorials/GUI/FILENAME"><img src="img/bbp.png" alt="bbp" /></a>
</td>
</tr>
genhtml.sh:
#!/bin/bash
HTML=`cat template.html`
ENTRIES=`cat gui_template | sed -e s/FILENAME/test/ | sed -e s/TITLE/title/`
DELIM=$'\377'
echo $HTML | sed -e "s${DELIM}ENTRIES${DELIM}$ENTRIES${DELIM}"
Output:
~/htmlgen $ ./genhtml.sh
sed: -e expression #1, char 14: unterminated `s' command
Comments (3)
Use a different delimiter, for example @.
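A minimal sketch of that suggestion, reusing the HTML and TABLE values from the question. The `result` variable is only there for illustration. It assumes the replacement text contains no `@`, `&`, or backslash, since sed would still treat those specially.

```shell
#!/bin/bash
# Sample values copied from the question.
HTML="<html><body>ENTRY</body><html>"
TABLE="<table></table>"

# '@' replaces '/' as the delimiter of the s command, so the '/' inside
# </table> is no longer read as the end of the replacement text.
result=$(echo "$HTML" | sed -e "s@ENTRY@$TABLE@")

echo "$result"
# prints: <html><body><table></table></body><html>
```

Quoting the whole sed expression also keeps the shell from word-splitting or glob-expanding the contents of $TABLE.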
Issuing these lines on FreeBSD console:
Result in:
You need to use a delimiter that can't appear in $TABLE, and if $TABLE is unpredictable enough this can be tricky. I'd suggest using a nonprinting character as a delimiter; it's easier to find one that's not going to show up in $TABLE and break everything. The only problem is they're harder to type in, so I'd suggest putting it in a variable and using that in the sed command:
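The command this paragraph leads into did not survive on this page. Something along these lines is probably what was meant, judging by the DELIM usage in the question's update. The `LC_ALL=C` prefix and the `result` variable are assumptions added here: the prefix makes sed treat its input as plain bytes, which some sed builds may need before accepting the byte 0xFF.

```shell
#!/bin/bash
# Sample values copied from the question.
HTML="<html><body>ENTRY</body><html>"
TABLE="<table></table>"

# Bash ANSI-C quoting: $'\377' is the single byte 0xFF (octal 377).
DELIM=$'\377'

# The variable supplies all three delimiters of the s command.
# LC_ALL=C is an assumption, not part of the original answer.
result=$(echo "$HTML" | LC_ALL=C sed -e "s${DELIM}ENTRY${DELIM}${TABLE}${DELIM}")

echo "$result"
```

This only moves the delimiter out of the way. An `&` or a backslash in $TABLE, or a literal newline as in the multi-line $ENTRIES from the update, is still special to sed in the replacement text.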
Note that the $'...' construct is a bash-only feature; if you need this to run under generic sh you'll have to do something messier, like DELIM="$(printf "\377")". Also, I chose \377 (that's FF in hex) because it's illegal in the UTF-8 encoding, so it should be safe if you're using UTF-8 for your HTML; if you're using something else, like Windows-1252, then \177 (the 'DEL' character) might be a safer choice. Oh, yeah, and if you ever try to debug this with bash -x, be prepared for comedy.
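A sketch of the portable variant mentioned above, combining the printf form with the \177 (DEL) alternative. The `result` variable is illustrative only, and the sample values are taken from the question.

```shell
#!/bin/sh
# Sample values copied from the question.
HTML="<html><body>ENTRY</body><html>"
TABLE="<table></table>"

# printf expands the octal escape, so this works in plain sh without $'...'.
# \177 is the ASCII DEL character, a valid single byte in most encodings.
DELIM="$(printf '\177')"

result=$(echo "$HTML" | sed -e "s${DELIM}ENTRY${DELIM}${TABLE}${DELIM}")

echo "$result"
```

Because DEL is plain ASCII, this form should not depend on the locale, at the cost of being a character that could in principle appear in the input.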