Transforming HTML using XQuery
I want to take the HTML generated by a QTextEdit editor and transform it into something a little more friendly for use in an actual web page. Unfortunately, the HTML generator that is part of the QTextEdit API is not public and cannot be modified. I'd rather not have to create a WYSIWYG HTML editor when most of what I need is already built in.
In a short discussion on the qt-interest mailing list, someone mentioned using XQuery via the QtXmlPatterns module.
For an example of the ugly HTML the editor outputs, it uses <span style=" font-weight:600"> for bold text, <span style=" font-weight:600; text-decoration: underline"> for bold and underline text, etc. Here's a sample:
<html>
<head>
</head>
<body style=" font-family:'Lucida Grande'; font-size:14pt; font-weight:400; font-style:normal;">
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-weight:600;">bold text</span></p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; font-weight:600;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-style:italic;">italics text</span></p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; font-style:italic;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" text-decoration: underline;">underline text</span></p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-weight:600; text-decoration: underline;">bold underline text</span></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-weight:600;">bold text </span><span style=" font-weight:600; text-decoration: underline;">bold underline text</span></p>
</body>
</html>
What I'd like to transform this into is something along the lines of this:
<body>
<p>plain text</p>
<p/>
<p>plain text <b>bold text</b></p>
<p/>
<p>plain text <em>italics text</em></p>
<p/>
<p>plain text <u>underline text</u></p>
<p/>
<p>plain text <b>bold text <u>bold underline text</u></b></p>
</body>
I've gotten around 90% of the way to where I need to be. I can correctly transform the first four cases, where each <span> style attribute has only one of the italic, bold, or underline properties. I'm having trouble when the span style has multiple properties, for instance when it has both font-weight:600 and text-decoration: underline.
Here's the XQuery code I have so far:
declare function local:process_span_data($node as node())
{
    for $n in $node
    return (
        for $attr in $n/@style
        return (
            if (contains($attr, 'font-weight:600')) then (
                <b>{data($n)}</b>
            )
            else if (contains($attr, 'text-decoration: underline')) then (
                <u>{data($n)}</u>
            )
            else if (contains($attr, 'font-style:italic')) then (
                <em>{data($n)}</em>
            )
            else (
                data($n)
            )
        )
    )
};

declare function local:process_p_data($data as node()+)
{
    for $d in $data
    return (
        if ($d instance of text()) then $d
        else local:process_span_data($d)
    )
};

let $doc := doc('myfile.html')
for $body in $doc/html/body
return
<body>
{
    for $p in $body/p
    return (
        if (contains($p/@style, '-qt-paragraph-type:empty;')) then (
            <p />
        )
        else (
            if (count($p/*) = 0) then (
                <p>{data($p)}</p>
            )
            else (
                <p>
                {for $data in $p/node()
                 return local:process_p_data($data)}
                </p>
            )
        )
    )
}</body>
This gives ALMOST the correct result:
<body>
<p>plain text</p>
<p/>
<p>plain text <b>bold text</b>
</p>
<p/>
<p>plain text <em>italics text</em>
</p>
<p/>
<p>plain text <u>underline text</u>
</p>
<p/>
<p>plain text <b>bold underline text</b>
</p>
<p>plain text <b>bold text </b>
<b>bold underline text</b> <!-- NOT UNDERLINED!! -->
</p>
</body>
Can anyone point me in the right direction of achieving my desired output? Thanks in advance from an XQuery n00b!
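The underline goes missing because the if / else if chain in local:process_span_data stops at the first branch that matches, so a span carrying both font-weight:600 and text-decoration: underline only ever gets wrapped in <b>. A minimal sketch of one way around that is to test each property independently and nest the wrappers. Here local:wrap is a hypothetical helper name, not part of the code above, and the style substrings are assumed to match exactly what QTextEdit emits in the sample:

```xquery
(: Hypothetical helper: wraps $content in one element per style flag found.
   Each property is tested on its own, so several can apply to the same span. :)
declare function local:wrap($style as xs:string, $content as item()*) as item()*
{
  (: innermost wrapper first: underline, then italic, then bold :)
  let $u := if (contains($style, 'text-decoration: underline'))
            then <u>{$content}</u> else $content
  let $i := if (contains($style, 'font-style:italic'))
            then <em>{$u}</em> else $u
  return
    if (contains($style, 'font-weight:600'))
    then <b>{$i}</b> else $i
};

(: Possible drop-in replacement for the original function of the same name.
   A node without a style attribute yields an empty style string and so
   comes back as plain text. :)
declare function local:process_span_data($node as node()) as item()*
{
  local:wrap(string($node/@style), string($node))
};

(: Example call; evaluates to <b><u>bold underline text</u></b> :)
local:process_span_data(
  <span style=" font-weight:600; text-decoration: underline;">bold underline text</span>
)
```

Note that this still emits one wrapper per span, so the last paragraph of the sample would come out as <b>bold text </b><b><u>bold underline text</u></b> rather than the merged form in the desired output. Merging neighbouring bold spans would need a further pass that groups adjacent spans before wrapping.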
Comments (2)
Your approach is correct, but the XQuery transformation logic is written in a somewhat non-functional style.
Check this out.
This XQuery (using the common identity function):
Output: