当前位置：文江博客话题详情

Node.toprettyxml() 在 Python 中向 DOCTYPE 添加换行符

发布于 2024-12-23 20:00:46 字数 2722 浏览 3 评论 0 原文

当使用prettify时，我的DOCTYPE被分成三行。我怎样才能让它保持在一根线上？

“损坏的”输出：

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE smil
  PUBLIC '-//W3C//DTD SMIL 2.0//EN'
  'http://www.w3.org/2001/SMIL20/SMIL20.dtd'>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta base="rtmp://cp23636.edgefcs.net/ondemand"/>
  </head>
  <body>
    <switch>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_256.mp4" system-bitrate="336000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_512.mp4" system-bitrate="592000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_768.mp4" system-bitrate="848000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_1128.mp4" system-bitrate="1208000"/>
    </switch>
  </body>
</smil>

脚本：

import csv
import sys
import os.path

from xml.etree import ElementTree
from xml.etree.ElementTree import Element, SubElement, Comment, tostring

from xml.dom import minidom

def prettify(doctype, elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = doctype + ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ", encoding = 'utf-8')

doctype = '<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">'

video_data = ((256, 336000),
              (512, 592000),
              (768, 848000),
              (1128, 1208000))


with open(sys.argv[1], 'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        root = Element('smil')
        root.set('xmlns', 'http://www.w3.org/2001/SMIL20/Language')
        head = SubElement(root, 'head')
        meta = SubElement(head, 'meta base="rtmp://cp23636.edgefcs.net/ondemand"')
        body = SubElement(root, 'body')

        switch_tag = ElementTree.SubElement(body, 'switch')

        for suffix, bitrate in video_data:
            attrs = {'src': ("mp4:soundcheck/{year}/{id}/{file_root_name}_{suffix}.mp4"
                             .format(suffix=str(suffix), **row)),
                     'system-bitrate': str(bitrate),
                     }
            ElementTree.SubElement(switch_tag, 'video', attrs)

        file_root_name = row["file_root_name"]
        year = row["year"]
        id = row["id"]
        path = year+'-'+id

        file_name = row['file_root_name']+'.smil'
        full_path = os.path.join(path, file_name)
        output = open(full_path, 'w')
        output.write(prettify(doctype, root))

原文

When using prettify my DOCTYPE is broken into three lines. How can I keep it on one line?

The "broken" output:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE smil
  PUBLIC '-//W3C//DTD SMIL 2.0//EN'
  'http://www.w3.org/2001/SMIL20/SMIL20.dtd'>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta base="rtmp://cp23636.edgefcs.net/ondemand"/>
  </head>
  <body>
    <switch>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_256.mp4" system-bitrate="336000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_512.mp4" system-bitrate="592000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_768.mp4" system-bitrate="848000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_1128.mp4" system-bitrate="1208000"/>
    </switch>
  </body>
</smil>

The script:

import csv
import sys
import os.path

from xml.etree import ElementTree
from xml.etree.ElementTree import Element, SubElement, Comment, tostring

from xml.dom import minidom

def prettify(doctype, elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = doctype + ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ", encoding = 'utf-8')

doctype = '<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">'

video_data = ((256, 336000),
              (512, 592000),
              (768, 848000),
              (1128, 1208000))


with open(sys.argv[1], 'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        root = Element('smil')
        root.set('xmlns', 'http://www.w3.org/2001/SMIL20/Language')
        head = SubElement(root, 'head')
        meta = SubElement(head, 'meta base="rtmp://cp23636.edgefcs.net/ondemand"')
        body = SubElement(root, 'body')

        switch_tag = ElementTree.SubElement(body, 'switch')

        for suffix, bitrate in video_data:
            attrs = {'src': ("mp4:soundcheck/{year}/{id}/{file_root_name}_{suffix}.mp4"
                             .format(suffix=str(suffix), **row)),
                     'system-bitrate': str(bitrate),
                     }
            ElementTree.SubElement(switch_tag, 'video', attrs)

        file_root_name = row["file_root_name"]
        year = row["year"]
        id = row["id"]
        path = year+'-'+id

        file_name = row['file_root_name']+'.smil'
        full_path = os.path.join(path, file_name)
        output = open(full_path, 'w')
        output.write(prettify(doctype, root))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

孤星 2024-12-30 20:00:46

查看了您当前的脚本以及您就该主题提出的其他问题后，我认为您可以通过使用字符串操作构建 smil 文件来使您的生活变得更加简单。

文件中的几乎所有 xml 都是静态的。您需要担心正确处理的唯一数据是 video 标记的属性值。为此，标准库中有一个方便的函数可以完全满足您的需求： xml.sax.saxutils.quoteattr。

因此，考虑到这些要点，下面的脚本应该更容易使用：

import sys, os, csv
from xml.sax.saxutils import quoteattr

smil_header = '''\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta base="rtmp://cp23636.edgefcs.net/ondemand"/>
  </head>
  <body>
    <switch>
'''
smil_video = '''\
      <video src=%s system-bitrate=%s/>
'''
smil_footer = '''\
    </switch>
  </body>
</smil>
'''

src_format = 'mp4:soundcheck/%(year)s/%(id)s/%(file_root_name)s_%(suffix)s.mp4'

video_data = (
    ('256', '336000'), ('512', '592000'),
    ('768', '848000'), ('1128', '1208000'),
    )

root = os.getcwd()
if len(sys.argv) > 2:
    root = sys.argv[2]

with open(sys.argv[1], 'rU') as stream:

    for row in csv.DictReader(stream):
        smil = [smil_header]
        for suffix, bitrate in video_data:
            row['suffix'] = suffix
            smil.append(smil_video % (
                quoteattr(src_format) % row, quoteattr(bitrate)
                ))
        smil.append(smil_footer)

        directory = os.path.join(root, '%(year)s-%(id)s' % row)
        try:
            os.makedirs(directory)
        except OSError:
            pass
        path = os.path.join(directory, '%(file_root_name)s.smil' % row)
        print ':: writing file:', path
        with open(path, 'wb') as stream:
            stream.write(''.join(smil))

Having looked over your current script and the other questions you've asked on this subject, I think you could make your life a lot simpler by building your smil files using string manipulation.

Almost all the xml in your files is static. The only data you need to worry about processing correctly is the attribute values for the video tag. And for that, there is a handy function in the standard library that does exactly what you want: xml.sax.saxutils.quoteattr.

So, with those points in mind, here is a script that should be a lot easier to work with:

import sys, os, csv
from xml.sax.saxutils import quoteattr

smil_header = '''\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta base="rtmp://cp23636.edgefcs.net/ondemand"/>
  </head>
  <body>
    <switch>
'''
smil_video = '''\
      <video src=%s system-bitrate=%s/>
'''
smil_footer = '''\
    </switch>
  </body>
</smil>
'''

src_format = 'mp4:soundcheck/%(year)s/%(id)s/%(file_root_name)s_%(suffix)s.mp4'

video_data = (
    ('256', '336000'), ('512', '592000'),
    ('768', '848000'), ('1128', '1208000'),
    )

root = os.getcwd()
if len(sys.argv) > 2:
    root = sys.argv[2]

with open(sys.argv[1], 'rU') as stream:

    for row in csv.DictReader(stream):
        smil = [smil_header]
        for suffix, bitrate in video_data:
            row['suffix'] = suffix
            smil.append(smil_video % (
                quoteattr(src_format) % row, quoteattr(bitrate)
                ))
        smil.append(smil_footer)

        directory = os.path.join(root, '%(year)s-%(id)s' % row)
        try:
            os.makedirs(directory)
        except OSError:
            pass
        path = os.path.join(directory, '%(file_root_name)s.smil' % row)
        print ':: writing file:', path
        with open(path, 'wb') as stream:
            stream.write(''.join(smil))

回复收藏 0 原文

開玄 2024-12-30 20:00:46

我认为您至少有三个选择：

只需接受换行符。它们可能不受欢迎且丑陋，但它们是完全合法的。

添加一个拼凑，用更好的 DOCTYPE 替换坏的 DOCTYPE。也许是这样的：

导入重新

Pretty_xml = 美化（doctype，elem）
m = re.search("()", Pretty_xml, re.DOTALL)
丑陋的文档类型 = m.group() 
fixed_xml = Pretty_xml.replace（ugly_doctype，doctype）

使用功能更丰富的 XML 包。 lxml 浮现在脑海中；它主要与 ElementTree 兼容。通过使用 lxml 的 tostring 函数，您赢得了不需要prettify 函数，DOCTYPE 就会按照您的需要显示出来。示例：

from lxml import etree 

doctype = ''

XML = '<正文><开关><视频src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_256.mp4" 系统比特率="336000"/><视频 src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_512.mp4" 系统比特率="592000" /><视频src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_768.mp4" 系统比特率="848000"/><视频 src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_1128.mp4" system-bitrate="1208000"/>'

elem = etree.fromstring(XML)
打印 etree.tostring(elem, doctype=doctype, Pretty_print=True,
                     xml_declaration=True，编码=“utf-8”）

输出：

>


  <头>
    />
  
  <正文>
    <开关>

I think you have at least three options:

Just accept the newlines. They may be undesired and ugly but they are perfectly legal.

Add a kludge that replaces the bad DOCTYPE with a nicer one. Perhaps something like this:

import re

pretty_xml = prettify(doctype, elem)
m = re.search("(<!.*dtd'>)", pretty_xml, re.DOTALL)
ugly_doctype = m.group() 
fixed_xml = pretty_xml.replace(ugly_doctype, doctype)

Use a more feature-rich XML package. lxml springs to mind; it is mostly compatible with ElementTree. By using lxml's tostring function, you won't need the prettify function and the DOCTYPE comes out as you want it. Example:

from lxml import etree 

doctype = '<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">'

XML = '<smil xmlns="http://www.w3.org/2001/SMIL20/Language"><head><meta base="rtmp://cp23636.edgefcs.net/ondemand"/></head><body><switch><video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_256.mp4" system-bitrate="336000"/><video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_512.mp4" system-bitrate="592000"/><video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_768.mp4" system-bitrate="848000"/><video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_1128.mp4" system-bitrate="1208000"/></switch></body></smil>'

elem = etree.fromstring(XML)
print etree.tostring(elem, doctype=doctype, pretty_print=True,
                     xml_declaration=True, encoding="utf-8")

Output:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta base="rtmp://cp23636.edgefcs.net/ondemand"/>
  </head>
  <body>
    <switch>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_256.mp4" system-bitrate="336000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_512.mp4" system-bitrate="592000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_768.mp4" system-bitrate="848000"/>
      <video src="mp4:soundcheck/1/clay_aiken/02_sc_ca_sorry_1128.mp4" system-bitrate="1208000"/>
    </switch>
  </body>
</smil>

回复收藏 0 原文

咽泪装欢 2024-12-30 20:00:46

我不相信可以删除由 Node.toprettyxml 为 DOCTYPE 生成的换行符，至少以 Pythonic 的方式是这样。

它是 DocumentType 类的 writexml 方法，该方法从 minidom 模块，插入有问题的换行符。插入的换行符字符串最初来自 Node.toprettyxml 方法，并通过 Document 类的 writexml 方法传递。相同的换行符字符串也传递给 Node 的各个其他子类的 writexml 方法。更改对 Node.prettyxml 的调用中的换行符字符串将更改整个输出 XML 中使用的换行符字符串。

有多种解决此问题的方法：修改 minidom 模块的本地副本、“monkey-patch”DocumentType 类的 writexml 方法或者对 XML 字符串进行后处理以删除不需要的换行符。然而，这些方法都不吸引我。

对我来说，最好的方法似乎就是保持现状。将 DOCTYPE 拆分为多行真的是一个严重的问题吗？