通过将元素列入白名单来清理 SVG 文档

发布于 2024-10-15 10:24:10 字数 2408 浏览 6 评论 0原文

我想从一些 SVG 文档中提取大约 20 种元素类型来形成一个新的 SVG。 rectcirclepolygontextpolyline,基本上是一组视觉效果零件在白名单中。 JavaScript、评论、动画和外部链接都需要删除。

我想到了三种方法:

  1. 正则表达式:我完全熟悉,并且显然不想去那里。
  2. PHP DOM:大概一年前使用过一次。
  3. XSLT:我刚才第一眼看到了。

如果 XSLT 是适合该工作的工具,那么我需要什么 xsl:stylesheet? 否则,您会使用哪种方法?

输入示例:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" version="1.1" width="512" height="512" id="svg2">
<title>Mostly harmless</title>
  <metadata id="metadata7">Some metadata</metadata>

<script type="text/ecmascript">
<![CDATA[
alert('Hax!');
]]>
</script>
<style type="text/css">
<![CDATA[ svg{display:none} ]]>
</style>

  <defs id="defs4">
    <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/> 
  </defs>

  <g id="layer1">
  <a xlink:href="www.hax.ru">
    <use xlink:href="#my_circle" x="20" y="20"/>
    <use xlink:href="#my_circle" x="100" y="50"/>
  </a>
  </g>
  <text>
    <tspan>It was the best of times</tspan>
    <tspan dx="-140" dy="15">It was the worst of times.</tspan>
  </text>
</svg>

输出示例。显示完全相同的图像:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
  <defs>
    <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/> 
  </defs>
  <g id="layer1">
    <use xlink:href="#my_circle" x="20" y="20"/>
    <use xlink:href="#my_circle" x="100" y="50"/>
  </g>
  <text>
    <tspan>It was the best of times</tspan>
    <tspan dx="-140" dy="15">It was the worst of times.</tspan>
  </text>
</svg>

keeper 元素的大致列表为: g、rect、circle、ellipse、line、polyline、polygon、path、text、tspan、tref、textpath、linearGradient+stop、radialGradient、defs、clippath ,路径。

如果不是特指 SVG tiny,那么肯定是 SVG lite。

I want to extract around 20 element types from some SVG documents to form a new SVG.
rect, circle, polygon, text, polyline, basically a set of visual parts are in the white list.
JavaScript, comments, animations and external links need to go.

Three methods come to mind:

  1. Regex: I'm completely familiar with, and would rather not go there obviously.
  2. PHP DOM: Used once perhaps a year ago.
  3. XSLT: Took my first look just now.

If XSLT is the right tool for the job, what xsl:stylesheet do I need?
Otherwise, which approach would you use?

Example input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" version="1.1" width="512" height="512" id="svg2">
<title>Mostly harmless</title>
  <metadata id="metadata7">Some metadata</metadata>

<script type="text/ecmascript">
<![CDATA[
alert('Hax!');
]]>
</script>
<style type="text/css">
<![CDATA[ svg{display:none} ]]>
</style>

  <defs id="defs4">
    <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/> 
  </defs>

  <g id="layer1">
  <a xlink:href="www.hax.ru">
    <use xlink:href="#my_circle" x="20" y="20"/>
    <use xlink:href="#my_circle" x="100" y="50"/>
  </a>
  </g>
  <text>
    <tspan>It was the best of times</tspan>
    <tspan dx="-140" dy="15">It was the worst of times.</tspan>
  </text>
</svg>

Example output. Displays exactly the same image:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
  <defs>
    <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/> 
  </defs>
  <g id="layer1">
    <use xlink:href="#my_circle" x="20" y="20"/>
    <use xlink:href="#my_circle" x="100" y="50"/>
  </g>
  <text>
    <tspan>It was the best of times</tspan>
    <tspan dx="-140" dy="15">It was the worst of times.</tspan>
  </text>
</svg>

The approximate list of keeper elements is: g, rect, circle, ellipse, line, polyline, polygon, path, text, tspan, tref, textpath, linearGradient+stop, radialGradient, defs, clippath, path.

If not specifically SVG tiny, then certainly SVG lite.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

念﹏祤嫣 2024-10-22 10:24:10

此转换

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:s="http://www.w3.org/2000/svg"
 >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:copy-of select="namespace::xlink"/>

   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*">
  <xsl:attribute name="{name()}"
                 namespace="{namespace-uri()}">
   <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="s:a">
  <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match=
 "s:title|s:metadata|s:script|s:style|
  s:svg/@version|s:svg/@id"/>
</xsl:stylesheet>

应用于提供的 XML 文档时

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:cc="http://creativecommons.org/ns#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns="http://www.w3.org/2000/svg" version="1.1"
     width="512" height="512" id="svg2">
    <title>Mostly harmless</title>
    <metadata id="metadata7">Some metadata</metadata>
    <script type="text/ecmascript"><![CDATA[ alert('Hax!'); ]]></script>
    <style type="text/css"><![CDATA[ svg{display:none} ]]></style>
    <defs id="defs4">
        <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
    </defs>
    <g id="layer1">
        <a xlink:href="www.hax.ru">
            <use xlink:href="#my_circle" x="20" y="20"/>
            <use xlink:href="#my_circle" x="100" y="50"/>
        </a>
    </g>
    <text>
        <tspan>It was the best of times</tspan>
        <tspan dx="-140" dy="15">It was the worst of times.</tspan>
    </text>
</svg>

产生所需的正确结果

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
   <defs id="defs4">
      <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
   </defs>
   <g id="layer1">
      <use xlink:href="#my_circle" x="20" y="20"/>
      <use xlink:href="#my_circle" x="100" y="50"/>
   </g>
   <text>
      <tspan>It was the best of times</tspan>
      <tspan dx="-140" dy="15">It was the worst of times.</tspan>
   </text>
</svg>

说明

  1. 两个模板,具有类似于身份规则的组合效果,匹配所有“白名单节点并本质上复制它们(仅消除不需要的命名空间节点)。

  2. 没有正文的模板匹配所有“黑名单”节点(元素和一些属性)。这些已被有效删除。

  3. 必须有与特定“灰名单”节点匹配的模板(在我们的例子中,模板与 s:a 匹配)。 “灰名单节点不会被完全删除——它可能会被重命名或以其他方式修改,或者至少其内容可能仍包含在输出中。

  4. 很可能随着你对问题的理解越来越清晰,这三个列表会不断增长,因此黑名单删除模板的匹配模式将被修改以适应新发现的黑名单元素。新发现的白名单节点根本不需要任何工作。仅处理新的灰名单元素(如果确实找到了)将需要更多的工作。

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:s="http://www.w3.org/2000/svg"
 >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:copy-of select="namespace::xlink"/>

   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*">
  <xsl:attribute name="{name()}"
                 namespace="{namespace-uri()}">
   <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="s:a">
  <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match=
 "s:title|s:metadata|s:script|s:style|
  s:svg/@version|s:svg/@id"/>
</xsl:stylesheet>

when applied on the provided XML document:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:cc="http://creativecommons.org/ns#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns="http://www.w3.org/2000/svg" version="1.1"
     width="512" height="512" id="svg2">
    <title>Mostly harmless</title>
    <metadata id="metadata7">Some metadata</metadata>
    <script type="text/ecmascript"><![CDATA[ alert('Hax!'); ]]></script>
    <style type="text/css"><![CDATA[ svg{display:none} ]]></style>
    <defs id="defs4">
        <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
    </defs>
    <g id="layer1">
        <a xlink:href="www.hax.ru">
            <use xlink:href="#my_circle" x="20" y="20"/>
            <use xlink:href="#my_circle" x="100" y="50"/>
        </a>
    </g>
    <text>
        <tspan>It was the best of times</tspan>
        <tspan dx="-140" dy="15">It was the worst of times.</tspan>
    </text>
</svg>

produces the wanted, correct result:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
   <defs id="defs4">
      <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
   </defs>
   <g id="layer1">
      <use xlink:href="#my_circle" x="20" y="20"/>
      <use xlink:href="#my_circle" x="100" y="50"/>
   </g>
   <text>
      <tspan>It was the best of times</tspan>
      <tspan dx="-140" dy="15">It was the worst of times.</tspan>
   </text>
</svg>

Explanation:

  1. Two templates, having combined effect that is similar to the identity rule, match all "white-listed nodes and essentially copy them (only eliminating unwanted namespace nodes).

  2. A template with no body matches all "black-listed" nodes (elements and some attributes). These are effectively deleted.

  3. There must be templates that match specific "grey-listed" nodes (the template matching s:a in our case). A "grey-listed node will not be deleted completely -- it may be renamed or otherwize modified, or at least its contents may still be included in the output.

  4. It is likely that with your understanding of the problem becoming more and more clear, the three lists will continuously grow, so the match pattern for the black-list deleting template will be modified to accomodate the newly discovered black-listed elements. Newly-discovered white-listed nodes require no work at all. Only treating new grey-listed elements (if such are found at all) will require a little bit more work.

缪败 2024-10-22 10:24:10

Dimitre Novaatchev 的解决方案更加“干净”和优雅,但如果您需要一个“白名单”解决方案(因为您无法预测用户可能输入的内容需要“黑名单”),那么您需要充分充实“白名单”。

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:svg="http://www.w3.org/2000/svg">
    <xsl:output indent="yes" />

    <!--The "whitelist" template that will copy matched nodes forward and apply-templates
        for any attributes or child nodes -->
    <xsl:template match="svg:svg 
        | svg:defs  | svg:defs/text()
        | svg:g  | svg:g/text()
        | svg:a  | svg:a/text()
        | svg:use   | svg:use/text()
        | svg:rect  | svg:rect/text()
        | svg:circle  | svg:circle/text()
        | svg:ellipse  | svg:ellipse/text()
        | svg:line  | svg:line/text()
        | svg:polyline  | svg:polyline/text()
        | svg:polygon  | svg:polygon/text()
        | svg:path  | svg:path/text()
        | svg:text  | svg:text/text()
        | svg:tspan  | svg:tspan/text()
        | svg:tref  | svg:tref/text()
        | svg:textpath  | svg:textpath/text()
        | svg:linearGradient  | svg:linearGradient/text()
        | svg:radialGradient  | svg:radialGradient/text()
        | svg:clippath  | svg:clippath/text()
        | svg:text | svg:text/text()">
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates select="node()" />
        </xsl:copy>
    </xsl:template>

    <!--The "blacklist" template, which does nothing except apply templates for the 
        matched node's attributes and child nodes -->
    <xsl:template match="@* | node()">
        <xsl:apply-templates select="@* | node()" />
    </xsl:template>

</xsl:stylesheet>

Dimitre Novatchev's solution is more "clean" and elegant, but if you need a "whitelist" solution (because you can't predict what content users may input that you would need to "blacklist"), then you would need to fully flesh out the "whitelist".

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:svg="http://www.w3.org/2000/svg">
    <xsl:output indent="yes" />

    <!--The "whitelist" template that will copy matched nodes forward and apply-templates
        for any attributes or child nodes -->
    <xsl:template match="svg:svg 
        | svg:defs  | svg:defs/text()
        | svg:g  | svg:g/text()
        | svg:a  | svg:a/text()
        | svg:use   | svg:use/text()
        | svg:rect  | svg:rect/text()
        | svg:circle  | svg:circle/text()
        | svg:ellipse  | svg:ellipse/text()
        | svg:line  | svg:line/text()
        | svg:polyline  | svg:polyline/text()
        | svg:polygon  | svg:polygon/text()
        | svg:path  | svg:path/text()
        | svg:text  | svg:text/text()
        | svg:tspan  | svg:tspan/text()
        | svg:tref  | svg:tref/text()
        | svg:textpath  | svg:textpath/text()
        | svg:linearGradient  | svg:linearGradient/text()
        | svg:radialGradient  | svg:radialGradient/text()
        | svg:clippath  | svg:clippath/text()
        | svg:text | svg:text/text()">
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates select="node()" />
        </xsl:copy>
    </xsl:template>

    <!--The "blacklist" template, which does nothing except apply templates for the 
        matched node's attributes and child nodes -->
    <xsl:template match="@* | node()">
        <xsl:apply-templates select="@* | node()" />
    </xsl:template>

</xsl:stylesheet>
鹿港小镇 2024-10-22 10:24:10

svgfig 是完成这项工作的好工具。您可以加载 SVG 文件并挑选您喜欢的部分来制作新文档。或者您可以删除您不喜欢的部分并重新保存。

svgfig is a good tool for this job. You can load SVG files and pick out parts you like to make a new document. Or you can just remove parts you don't like and re-save.

爱的那么颓废 2024-10-22 10:24:10

作为已接受的 XSLT 答案的替代方案,您可以使用 RubyNokogiri

require 'nokogiri'
svg = Nokogiri::XML( IO.read( "myfile.svg" ) )
svg.xpath( '//*[not(name()="rect" or name()="circle" or ...)]' ).each do |node|
  node.remove
end
File.open( "myfile_clean.svg", "w" ) do |file|
  file << svg.to_xml
end

As an alternative to the accepted XSLT answer, you could use Ruby and Nokogiri:

require 'nokogiri'
svg = Nokogiri::XML( IO.read( "myfile.svg" ) )
svg.xpath( '//*[not(name()="rect" or name()="circle" or ...)]' ).each do |node|
  node.remove
end
File.open( "myfile_clean.svg", "w" ) do |file|
  file << svg.to_xml
end
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文