Transforming HTML with XQuery

Posted on 2024-10-15 12:14:33

I want to take the HTML generated by a QTextEdit editor and transform it into something friendlier for use in an actual web page. Unfortunately, the HTML generator that is part of the QTextEdit API is not public and cannot be modified. I'd rather not write a WYSIWYG HTML editor from scratch when most of what I need is already built in.

In a short discussion on the qt-interest mailing list, someone mentioned using XQuery via the QtXmlPatterns module.

As an example of the ugly HTML the editor outputs: it uses <span style=" font-weight:600"> for bold text, <span style=" font-weight:600; text-decoration: underline"> for bold and underlined text, and so on. Here's a sample:

<html>
  <head>
  </head>
  <body style=" font-family:'Lucida Grande'; font-size:14pt; font-weight:400; font-style:normal;">
    <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text</p>
    <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"></p>
    <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-weight:600;">bold text</span></p>
    <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; font-weight:600;"></p>
    <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-style:italic;">italics text</span></p>
    <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; font-style:italic;"></p>
    <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" text-decoration: underline;">underline text</span></p>
    <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"></p>
    <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-weight:600; text-decoration: underline;">bold underline text</span></p>
    <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text <span style=" font-weight:600;">bold text </span><span style=" font-weight:600; text-decoration: underline;">bold underline text</span></p>  
  </body>
</html>

What I'd like to transform this into is something along the lines of this:

<body>
   <p>plain text</p>
   <p/>
   <p>plain text <b>bold text</b></p>
   <p/>
   <p>plain text <em>italics text</em></p>
   <p/>
   <p>plain text <u>underline text</u></p>
   <p/>
   <p>plain text <b>bold text <u>bold underline text</u></b></p>
</body>

I've gotten around 90% of the way to where I need to be. I can correctly transform the first four cases, where each <span> style has only one of the italic, bold, or underline properties. I'm having trouble when the span style has multiple properties, for instance both font-weight:600 and text-decoration: underline.

Here's the XQuery code I have so far:

declare function local:process_span_data($node as node())
{
    for $n in $node
    return (
        for $attr in $n/@style
        return (
            if(contains($attr, 'font-weight:600')) then (
                <b>{data($n)}</b>
            )
            else if(contains($attr, 'text-decoration: underline')) then (
                <u>{data($n)}</u>
            )
            else if (contains($attr, 'font-style:italic')) then (
                <em>{data($n)}</em>
            )
            else (
                data($n)
            )
        )
    )
};

declare function local:process_p_data($data as node()+)
{
    for $d in $data
    return (
        if ($d instance of text()) then $d
        else local:process_span_data($d)
    )
};

let $doc := doc('myfile.html')

for $body in $doc/html/body
return
    <body>
    {
    for $p in $body/p
    return (
        if (contains($p/@style, '-qt-paragraph-type:empty;')) then (
            <p />
        )
        else (
            if (count($p/*) = 0) then (
                <p>{data($p)}</p>
            )
            else (
                <p>
                {for $data in $p/node()
                return local:process_p_data($data)}
                </p>
            )
        )
    )
    }</body>

Which gives ALMOST the correct result:

<body>
    <p>plain text</p>
    <p/>
    <p>plain text <b>bold text</b>
    </p>
    <p/>
    <p>plain text <em>italics text</em>
    </p>
    <p/>
    <p>plain text <u>underline text</u>
    </p>
    <p/>
    <p>plain text <b>bold underline text</b>
    </p>
    <p>plain text <b>bold text </b>
        <b>bold underline text</b> <!-- NOT UNDERLINED!! -->
    </p>
</body>

Can anyone point me in the right direction of achieving my desired output? Thanks in advance from an XQuery n00b!

Comments (2)

黑色毁心梦 2024-10-22 12:14:33

Your approach is on the right track, but the transformation logic does not follow XQuery's functional style.

Take a look at this:

xquery version '1.0-ml';
(: Note: '1.0-ml' selects MarkLogic's XQuery dialect; other processors will likely need "1.0" here. :)
declare namespace mittai = "mittai";

(: Process every child node of $n. :)
declare function mittai:parse-thru($n as node())
{
   for $z in $n/node()
     return mittai:dispatch($z)
};

(: Rebuild a node by type: p, span and body lose their attributes;
   any other element keeps them. :)
declare function mittai:dispatch($n as node())
{
   typeswitch($n)
      case text() return $n
      case element(p) return element{ fn:node-name($n) } {mittai:parse-thru($n)}
      case element(span) return element{ fn:node-name($n) } {mittai:parse-thru($n)}
      case element(body) return element{ fn:node-name($n) } {mittai:parse-thru($n)}
      default return element{ fn:node-name($n) } {$n/@*, mittai:parse-thru($n)}
};

let $d := doc('myfile.html')
return <html> {mittai:parse-thru($d)} </html>
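
The typeswitch above rebuilds the tree but does not yet translate the style attribute of a span into markup. What follows is only a minimal sketch of how the span case could be extended so that every matching style property contributes its own nested element. The function names (local:wrap, local:span-tags, local:dispatch, local:children) are hypothetical, the b/u/em mapping is taken from the desired output in the question, and the sketch assumes the input elements are in no namespace and that the file is named myfile.html as in the question.

```xquery
xquery version "1.0";

(: Wrap $content in one nested element per name, outermost first. :)
declare function local:wrap($names as xs:string*, $content as item()*) as item()*
{
  if (empty($names)) then $content
  else element { $names[1] } { local:wrap($names[position() > 1], $content) }
};

(: Collect an element name for EVERY property found in the style string,
   instead of stopping at the first match as an if/else-if chain does.
   Whitespace is stripped first, so 'text-decoration: underline' matches too. :)
declare function local:span-tags($style as xs:string) as xs:string*
{
  let $s := replace($style, '\s', '')
  return (
    if (contains($s, 'font-weight:600')) then 'b' else (),
    if (contains($s, 'text-decoration:underline')) then 'u' else (),
    if (contains($s, 'font-style:italic')) then 'em' else ()
  )
};

(: Text is copied, spans become nested b/u/em elements, and every other
   element is rebuilt without its attributes. Comments and processing
   instructions are dropped. :)
declare function local:dispatch($n as node()) as item()*
{
  typeswitch ($n)
    case text() return $n
    case element(span) return
      local:wrap(local:span-tags(string($n/@style)), local:children($n))
    case element() return
      element { node-name($n) } { local:children($n) }
    default return ()
};

(: Apply local:dispatch to each child node of $n. :)
declare function local:children($n as node()) as item()*
{
  for $c in $n/node() return local:dispatch($c)
};

local:dispatch(doc('myfile.html')/html/body)
```

With this sketch a span carrying both font-weight:600 and text-decoration: underline should come out as a u element nested inside a b element. Adjacent spans are still converted independently, so a bold span followed by a bold underlined span is not merged into a single b element.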
音栖息无 2024-10-22 12:14:33

This XQuery (using the common identity function):

declare variable $Prop as element()* :=
       (<prop name="em">font-style:italic</prop>,
        <prop name="strong">font-weight:600</prop>,
        <prop name="u">text-decoration:underline</prop>);

declare function local:copy($element as element()) {
  element {node-name($element)}
    {$element/@*,
     for $child in $element/node()
        return if ($child instance of element())
          then local:match($child)
          else $child
    }
};
declare function local:match($element as element()) {
  if ($element/self::span[@style])
  then local:replace($element)
  else local:copy($element)
};
declare function local:replace($element as element()) {
  let $prop := local:parse($element/@style)
  let $no-match := $prop[not(.=$Prop)]
  return element {node-name($element)}
           {$element/@* except $element/@style,
            if (exists($no-match))
            then attribute style
                   {string-join($no-match,';')}
            else (),
            local:nested($Prop[.=$prop]/@name,$element)}
};
declare function local:parse($string as xs:string) {
  for $property in tokenize($string,';')[.]
  return
    <prop>{
      replace(normalize-space($property),'( )?:( )?',':')
   }</prop>
};
declare function local:nested($names as xs:string*,
                              $element as element()) {
  if (exists($names))
  then element {$names[1]}
         {local:nested($names[position()>1],$element)}
  else for $child in $element/node()
       return if ($child instance of element())
          then local:match($child)
          else $child
};
local:match(*)

Output:

<html>
    <head>   </head>
    <body style=" font-family:'Lucida Grande'; font-size:14pt; font-weight:400; font-style:normal;">
        <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text</p>
        <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"/>
        <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text 
            <span>
                <strong>bold text</strong>
            </span>
        </p>
        <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; font-weight:600;"/>
        <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text 
            <span>
                <em>italics text</em>
            </span>
        </p>
        <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; font-style:italic;"/>
        <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text 
            <span>
                <u>underline text</u>
            </span>
        </p>
        <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"/>
        <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text 
            <span>
                <strong>
                    <u>bold underline text</u>
                </strong>
            </span>
        </p>
        <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">plain text 
            <span>
                <strong>bold text </strong>
            </span>
            <span>
                <strong>
                    <u>bold underline text</u>
                </strong>
            </span>
        </p>
    </body>
</html>
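
The query above can match 'text-decoration: underline' from the editor against the table entry 'text-decoration:underline' because local:parse normalizes each property before comparing. As a small illustration, this is the body of local:parse applied to one of the style strings from the sample input, without the prop wrapper element:

```xquery
(: Split the style string on ';', drop empty tokens with the [.] predicate,
   then remove the optional spaces around each ':'. :)
for $property in tokenize(' font-weight:600; text-decoration: underline;', ';')[.]
return replace(normalize-space($property), '( )?:( )?', ':')
```

This evaluates to the two strings font-weight:600 and text-decoration:underline, which are exactly the values stored in the $Prop table.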