Why does the xml.sax parser in Jython 2.5.2 turn two-character attributes into tuples?

Posted 2024-12-06 09:22:09

Whenever I encounter a two-character attribute in my XML stream while parsing with xml.sax under Jython 2.5.2, the attribute name is converted to a tuple. No amount of coercion of that name lets me extract the attribute's value. I tried passing the tuple, and I tried converting it to a string and passing that. Both cases result in:

Traceback (most recent call last):
  File "test.py", line 18, in startElement
    print '%s = %s' % (k, attrs.getValue(k))
  File "/usr/local/Cellar/jython/2.5.2/libexec/Lib/xml/sax/drivers2/drv_javasax.py", line 266, in getValue
    value = self._attrs.getValue(_makeJavaNsTuple(name))
TypeError: getValue(): 1st arg can't be coerced to String, int

I've got some sample code you can run that shows the problem:

import xml
from xml import sax
from xml.sax import handler
import traceback

class MyXMLHandler(handler.ContentHandler):
    def __init__(self):
        handler.ContentHandler.__init__(self)

    def startElement(self, name, attrs):
        for k in attrs.keys():
            print 'type(k) = %s' % type(k)
            if isinstance(k, (list, tuple)):
                k = ''.join(k)
            print 'type(k) = %s' % type(k)
            print 'k = %s' % k
            try:
                print '%s = %s' % (k, attrs.getValue(k))
            except Exception, e:
                print '\nError:'
                traceback.print_exc()
                print ''

if __name__ == '__main__':
    s = '<TAG A="0" AB="0" ABC="0"/>'
    print '%s' % s
    xml.sax.parseString(s, MyXMLHandler())
    exit(0)

When run, the AB attribute name is returned as a tuple, while the A and ABC attribute names are unicode strings and work properly with the getValue() method on the attributes object. Under Jython 2.5.2 this outputs, for me:

>  jython test.py
<TAG A="0" AB="0" ABC="0"/>
type(k) = <type 'unicode'>
type(k) = <type 'unicode'>
k = A
A = 0
type(k) = <type 'tuple'>
type(k) = <type 'unicode'>
k = AB

Error:
Traceback (most recent call last):
  File "test.py", line 18, in startElement
    print '%s = %s' % (k, attrs.getValue(k))
  File "/usr/local/Cellar/jython/2.5.2/libexec/Lib/xml/sax/drivers2/drv_javasax.py", line 266, in getValue
    value = self._attrs.getValue(_makeJavaNsTuple(name))
TypeError: getValue(): 1st arg can't be coerced to String, int

type(k) = <type 'unicode'>
type(k) = <type 'unicode'>
k = ABC
ABC = 0

This code functions correctly under Python 2.7.2 on OS X and Python 2.4.3 on CentOS 5.6. I dug around Jython bugs but couldn't find anything similar to this issue.

Is it a known Jython xml.sax handling problem? Or have I messed up something in my Handler that's 2.5.2 incompatible?


Edit: this appears to be a Jython 2.5.2 bug. I found a reference to it: http://sourceforge.net/mailarchive/message.php?msg_id=27783080 -- suggestions for a workaround welcome.

Comments (1)

川水往事 2024-12-13 09:22:09

So, this is a reported bug in Jython. It took some digging but I found it in their bug archive:

http://bugs.jython.org/issue1768
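
As a guess at the mechanism (an assumption, not verified against the drv_javasax.py source): if the driver recognizes a (namespace, local name) pair only by checking for length 2, then a two-character string such as 'AB' passes that check and unpacks into ('A', 'B'). That would match the symptoms above. The sketch below only illustrates this Python pitfall. Both function names are hypothetical and are not Jython's code.

```python
def fix_tuple_by_length(name):
    """Hypothetical: treat anything of length 2 as a (namespace, local name) pair."""
    if len(name) == 2:
        ns, local = name  # a 2-character string unpacks just like a 2-tuple
        return (ns, local)
    return name


def fix_tuple_by_type(name):
    """Hypothetical safer variant: only real 2-tuples count as namespace pairs."""
    if isinstance(name, tuple) and len(name) == 2:
        ns, local = name
        return (ns, local)
    return name


for attr in ('A', 'AB', 'ABC'):
    print('%-3s length check: %r   type check: %r'
          % (attr, fix_tuple_by_length(attr), fix_tuple_by_type(attr)))
```

Only 'AB' comes back as a tuple from the length-based check, while the type-based check leaves all three names as strings.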

The second comment on the bug provides a work-around for the issue: use the _attrs.getValue() method to retrieve values off the attributes list. Like so:

attrs._attrs.getValue('id')

My re-written code works if I change the line:

print '%s = %s' % (k, attrs.getValue(k))

to:

print '%s = %s' % (k, attrs._attrs.getValue(k))

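A more flexible solution that works in both Python and Jython is to build a helper:

import sys

def _attrsGetValue(attrs, name, default=None):
    # Jython: bypass the buggy wrapper and query the underlying Java Attributes object.
    if 'jython' in sys.executable.lower():
        value = attrs._attrs.getValue(name)
        # Java returns null (None here) for a missing attribute.
        if value is None:
            value = default
    else:
        # CPython: the standard Attributes mapping interface works as documented.
        value = attrs.get(name, default)
    return value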
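
For completeness, here is a minimal sketch of how such a helper might be wired into a handler. It is a variant of the helper above, not a replacement for it. The names attrs_get_value and CollectingHandler are made up for illustration. Detecting Jython through sys.platform (which starts with 'java' there) is my own substitution for the sys.executable test. The attrs._attrs access relies on the private attribute mentioned in the bug report, so it may break in other Jython releases.

```python
import sys
import xml.sax
from xml.sax import handler


def attrs_get_value(attrs, name, default=None):
    """Hypothetical helper: look up an attribute by plain string name."""
    if sys.platform.startswith('java'):
        # Jython only: go straight to the wrapped Java Attributes object.
        value = attrs._attrs.getValue(name)  # Java null arrives as None
        if value is None:
            value = default
        return value
    # CPython: the regular mapping-style lookup works.
    return attrs.get(name, default)


class CollectingHandler(handler.ContentHandler):
    """Records the value found for each attribute name of interest."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.seen = {}

    def startElement(self, name, attrs):
        # Ask for names explicitly rather than iterating attrs.keys(),
        # since keys() is where the tuple shows up under Jython 2.5.2.
        for attr_name in ('A', 'AB', 'ABC', 'MISSING'):
            self.seen[attr_name] = attrs_get_value(attrs, attr_name, 'n/a')


collector = CollectingHandler()
xml.sax.parseString('<TAG A="0" AB="0" ABC="0"/>', collector)
print(collector.seen)
```

Looking names up explicitly avoids the tuple keys altogether. If the attribute names are not known in advance, joining a tuple key back into a string (as in the question's code) before calling the helper should work as well.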