Correct way to detect a sequence parameter?

Posted 2024-07-09 08:21:30 · 301 characters · 8 views · 0 comments

I want to write a function that accepts a parameter which can be either a sequence or a single value. The type of value is str, int, etc., but I don't want it to be restricted to a hardcoded list.
In other words, I want to know if the parameter X is a sequence or something I have to convert to a sequence to avoid special-casing later. I could do

type(X) in (list, tuple)

but there may be other sequence types I'm not aware of, and no common base class.

-N.

Edit: See my "answer" below for why most of these answers don't help me. Maybe you have something better to suggest.

Comments (12)

爱*していゐ 2024-07-16 08:21:30

As of 2.6, use abstract base classes.

>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance(0, collections.Sequence)
False
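
On Python 3 the container ABCs live in collections.abc (the collections.Sequence alias was removed in Python 3.10). A minimal sketch of the same check on a current interpreter; is_sequence is a hypothetical helper name used only for illustration:

```python
from collections.abc import Sequence


def is_sequence(obj):
    """Return True if obj is a Sequence (list, tuple, range, str, ...)."""
    return isinstance(obj, Sequence)


print(is_sequence([]))                # a list is a Sequence
print(is_sequence(0))                 # an int is not
print(is_sequence("abc"))             # note: str counts as a Sequence too
print(is_sequence(x for x in [1]))    # a generator is iterable, but not a Sequence
```

The str case is the one that usually needs separate handling when a single string should count as one value.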

Furthermore, ABCs can be customized to account for exceptions, such as not considering strings to be sequences. Here is an example (Python 2 syntax):

import abc
import collections

class Atomic(object):
    __metaclass__ = abc.ABCMeta
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(basestring)

After registration the Atomic class can be used with isinstance and issubclass:

assert isinstance("hello", Atomic) == True

This is still much better than a hard-coded list, because you only need to register the exceptions to the rule, and external users of the code can register their own.

Note that in Python 3 the syntax for specifying metaclasses changed and the basestring abstract superclass was removed. The ABCs also moved to collections.abc (the collections.Sequence alias was removed in Python 3.10), so something like the following is needed instead:

import abc
import collections.abc

class Atomic(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.abc.Sequence) or NotImplemented

Atomic.register(str)
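
To make the "register only the exceptions" idea concrete, here is a sketch for Python 3 only. Point and wrap_atomic are hypothetical names that are not part of the answer: Point stands in for a user-defined sequence type that callers want treated as a single value.

```python
import abc
from collections import namedtuple
from collections.abc import Sequence


class Atomic(metaclass=abc.ABCMeta):
    """Anything that should be treated as a single value."""

    @classmethod
    def __subclasshook__(cls, other):
        # Non-sequences are atomic; for sequences, defer to the registry.
        return not issubclass(other, Sequence) or NotImplemented


Atomic.register(str)

# Hypothetical user type: a sequence that should still count as one value.
Point = namedtuple("Point", "x y")
Atomic.register(Point)


def wrap_atomic(arg):
    """Wrap atomic values in a 1-tuple; pass other sequences through unchanged."""
    return (arg,) if isinstance(arg, Atomic) else arg


print(wrap_atomic("hello"))
print(wrap_atomic([1, 2]))
print(wrap_atomic(Point(1, 2)))
```

With this hook, anything that is not a Sequence (including generators and sets) is treated as atomic, which may or may not be what a given caller wants.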

If desired, it's possible to write code that is compatible with both Python 2.6+ and 3.x, but doing so requires a slightly more complicated technique that dynamically creates the needed abstract base class, thereby avoiding syntax errors due to the metaclass syntax difference. This is essentially what the with_metaclass() function in Benjamin Peterson's six module does.

import abc

try:
    from collections.abc import Sequence  # Python 3.3+
except ImportError:
    from collections import Sequence  # Python 2.6+

class _AtomicBase(object):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, Sequence) or NotImplemented

# Calling the metaclass directly avoids both version-specific metaclass syntaxes
class Atomic(abc.ABCMeta("NewMeta", (_AtomicBase,), {})):
    pass

try:
    unicode = unicode
except NameError:  # 'unicode' is undefined, assume Python >= 3
    Atomic.register(str)  # str includes unicode in Py3, make both Atomic
    Atomic.register(bytes)  # bytes will also be considered Atomic (optional)
else:
    # basestring is the abstract superclass of both str and unicode types
    Atomic.register(basestring)  # make both types of strings Atomic

In versions before 2.6, there are type checkers in the operator module (operator.isSequenceType was removed in Python 3).

>>> import operator
>>> operator.isSequenceType([])
True
>>> operator.isSequenceType(0)
False

[浮城] 2024-07-16 08:21:30

The problem with all of the above
mentioned ways is that str is
considered a sequence (it's iterable,
has __getitem__, etc.) yet it's
usually treated as a single item.

For example, a function may accept an
argument that can either be a filename
or a list of filenames. What's the
most Pythonic way for the function to
detect the first from the latter?

Based on the revised question, it sounds like what you want is something more like:

def to_sequence(arg):
    ''' 
    determine whether an arg should be treated as a "unit" or a "sequence"
    if it's a unit, return a 1-tuple with the arg
    '''
    def _multiple(x):  
        return hasattr(x,"__iter__")
    if _multiple(arg):  
        return arg
    else:
        return (arg,)

>>> to_sequence("a string")
('a string',)
>>> to_sequence( (1,2,3) )
(1, 2, 3)
>>> to_sequence( xrange(5) )
xrange(5)

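This isn't guaranteed to handle all types, but it handles the cases you mention quite well, and should do the right thing for most of the built-in types. Note that it relies on Python 2 behavior, where str has no __iter__ method; in Python 3, str defines __iter__, so strings need an explicit check there.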

When using it, make sure whatever receives the output of this can handle iterables.
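
For Python 3, where the hasattr(x, "__iter__") test no longer separates strings from other iterables, a sketch of the same idea might look like the following. Excluding bytes as well as str is an assumption about what should count as a "unit":

```python
from collections.abc import Iterable


def to_sequence(arg):
    """Return arg unchanged if it is a non-string iterable, else wrap it in a 1-tuple."""
    if isinstance(arg, Iterable) and not isinstance(arg, (str, bytes)):
        return arg
    return (arg,)


print(to_sequence("a string"))
print(to_sequence((1, 2, 3)))
print(to_sequence(range(5)))
print(to_sequence(42))
```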

旧人 2024-07-16 08:21:30

IMHO, the Python way is to pass the list as *list. As in:

myfunc(item)
myfunc(*items)
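
A minimal sketch of what the receiving side could look like. The body of myfunc is made up for illustration; only the *items signature matters:

```python
def myfunc(*items):
    """Accept any number of positional arguments; items is always a tuple."""
    return [str(item).upper() for item in items]


filenames = ["a.txt", "b.txt"]

print(myfunc("a.txt"))       # single value: items is a 1-tuple
print(myfunc(*filenames))    # unpacked list: items holds both names
```

Inside the function there is no special-casing: a lone string arrives as a one-element tuple, so it is never iterated character by character.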

前事休说 2024-07-16 08:21:30

Sequences are described here:
https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange

So sequences are not the same as iterable objects. I think a sequence must implement
__getitem__, whereas an iterable object must implement __iter__.
So, for example, strings are sequences and don't implement __iter__ (in Python 2; Python 3's str does define it), and xrange objects are sequences and don't implement __getslice__.

But from what you seem to want to do, I'm not sure you want sequences, but rather iterable objects.
So go for hasattr(X, "__getitem__") if you want sequences, but rather for hasattr(X, "__iter__") if you don't want strings, for example (this distinction only holds on Python 2).
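
To see how the two attribute checks behave on a current Python 3 interpreter, here is a small probe (probe is a hypothetical helper name). It suggests that neither attribute alone separates strings or mappings from "real" sequences:

```python
def probe(obj):
    """Return (has __getitem__, has __iter__) for obj."""
    return hasattr(obj, "__getitem__"), hasattr(obj, "__iter__")


for sample in ([1, 2], "abc", {"k": 1}, {1, 2}, iter([1]), 42):
    print(type(sample).__name__, probe(sample))
```

Lists, strings and dicts all report both attributes, while sets and iterators report only __iter__.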

笑着哭最痛 2024-07-16 08:21:30

In cases like this, I prefer to just always take the sequence type or always take the scalar. Strings won't be the only types that would behave poorly in this setup; rather, any type that has an aggregate use and allows iteration over its parts might misbehave.

似梦非梦 2024-07-16 08:21:30

The simplest method would be to check whether you can turn it into an iterator, i.e.

try:
    it = iter(X)
except TypeError:
    is_iterable = False  # Not iterable
else:
    is_iterable = True   # Iterable

If you need to ensure that it's a restartable or random-access sequence (i.e. not a generator etc.), however, this approach won't be sufficient.

As others have noted, strings are also iterable, so if you need to exclude them (particularly important if recursing through items, as list(iter('a')) gives ['a'] again), then you may need to exclude them explicitly with:

    not isinstance(X, basestring)  # basestring is Python 2 only; use str on Python 3
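
The recursion hazard mentioned above can be sketched with a small flattening generator for Python 3. flatten is a hypothetical name, and treating bytes like str is an assumption:

```python
def flatten(item):
    """Yield the leaves of arbitrarily nested iterables, treating strings as leaves."""
    if isinstance(item, (str, bytes)):
        # Without this guard, 'a' would recurse until RecursionError:
        # each character of a str is itself a one-character str.
        yield item
        return
    try:
        iterator = iter(item)
    except TypeError:
        yield item  # not iterable: a single value
        return
    for sub in iterator:
        yield from flatten(sub)


print(list(flatten([1, [2, ["ab", (3,)]], "cd"])))
```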

情绪失控 2024-07-16 08:21:30

I'm new here, so I don't know the correct way to do this. I want to respond to the answers:

The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has __getitem__, etc.) yet it's usually treated as a single item.

For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?

Should I post this as a new question? Edit the original one?

孤凫 2024-07-16 08:21:30

I think what I would do is check whether the object has certain methods that indicate it is a sequence. I'm not sure if there is an official definition of what makes a sequence. The best I can think of is, it must support slicing. So you could say:

is_sequence = '__getslice__' in dir(X)  # Python 2 only: __getslice__ does not exist in Python 3

You might also check for the particular functionality you're going to be using.

As pi pointed out in the comment, one issue is that a string is a sequence, but you probably don't want to treat it as one. You could add an explicit test that the type is not str.
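
Since __getslice__ is gone in Python 3, one way to approximate "supports slicing" there is to try an empty slice and see whether it fails. This is only a sketch: supports_slicing is a hypothetical helper, and probing by subscripting could have side effects on unusual types.

```python
def supports_slicing(obj):
    """Return True if taking an empty slice of obj succeeds."""
    try:
        obj[:0]
    except (TypeError, KeyError, IndexError):
        # Not subscriptable, or subscriptable by key only (e.g. dict).
        return False
    return True


for sample in ([1, 2], (1, 2), "abc", 5, {"a": 1}, {1, 2}):
    print(type(sample).__name__, supports_slicing(sample))
```

As the answer notes, strings still pass this test, so they need their own check if they should count as single values.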

獨角戲 2024-07-16 08:21:30

If strings are the problem, detect a sequence and filter out the special case of strings:

def is_iterable(x):
    if isinstance(x, str):  # treat strings as single values
        return False
    try:
        iter(x)
        return True
    except TypeError:
        return False

画中仙 2024-07-16 08:21:30

You're asking the wrong question. You don't try to detect types in Python; you detect behavior.

  1. Write one function that handles a single value (let's call it _use_single_val).
  2. Write another function that handles a sequence parameter (let's call it _use_sequence).
  3. Write a third, parent function that calls the two above (call it use_seq_or_val). Surround each call with an exception handler to catch an invalid parameter (i.e. not a single value or a sequence).
  4. Write unit tests that pass correct and incorrect parameters to the parent function to make sure it catches the exceptions properly.

    def _use_single_val(v):
        print(v + 1)  # raises TypeError if v does not support + 1

    def _use_sequence(s):
        print(s[0])   # raises TypeError if s is not indexable

    def use_seq_or_val(item):
        try:
            _use_single_val(item)
            return  # handled as a single value
        except TypeError:
            pass

        try:
            _use_sequence(item)
            return  # handled as a sequence
        except TypeError:
            pass

        raise TypeError("item not a single value or sequence")

EDIT: Revised to handle the "sequence or single value" asked about in the question.
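
Step 4 can be sketched with the standard unittest module. To make the behavior easy to assert on, this variant of use_seq_or_val returns its result instead of printing it; that change, and the test class name, are assumptions made for illustration:

```python
import io
import unittest


def use_seq_or_val(item):
    """Return item + 1 for number-like values, else the first element of item."""
    try:
        return item + 1
    except TypeError:
        pass
    try:
        return item[0]
    except TypeError:
        pass
    raise TypeError("item not a single value or sequence")


class UseSeqOrValTest(unittest.TestCase):
    def test_single_value(self):
        self.assertEqual(use_seq_or_val(41), 42)

    def test_sequence(self):
        self.assertEqual(use_seq_or_val(["first", "second"]), "first")

    def test_invalid(self):
        with self.assertRaises(TypeError):
            use_seq_or_val(object())


# Run the tests programmatically so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UseSeqOrValTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.wasSuccessful())
```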

っ〆星空下的拥抱 2024-07-16 08:21:30

Revised answer:

I don't know if your idea of "sequence" matches what the Python manuals call a "Sequence Type", but in case it does, you should look for the __contains__ method. That is the method Python uses to implement the check "if something in object:"

if hasattr(X, '__contains__'):
    print("X is a sequence")

My original answer:

I would check whether the object you received implements the iteration interface:

if hasattr(X, '__iter__'):
    print("X is a sequence")

For me, that's the closest match to your definition of sequence, since that would allow you to do something like:

for each in X:
    print(each)

﹏半生如梦愿梦如真 2024-07-16 08:21:30

You could pass your parameter to the built-in len() function and check whether this causes an error. As others have said, the string type requires special handling.

According to the documentation the len function can accept a sequence (string, list, tuple) or a dictionary.

You could check that an object is a string with the following code:

x.__class__ == "".__class__
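
Put together, the len() idea might look like this on Python 3. has_length is a hypothetical name, and excluding bytes alongside str is an assumption:

```python
def has_length(obj):
    """Return True if len(obj) works and obj is not a string."""
    if isinstance(obj, (str, bytes)):
        return False  # strings have a length but count as single values here
    try:
        len(obj)
    except TypeError:
        return False
    return True


for sample in ([1, 2], (1,), {"a": 1}, "abc", 5):
    print(type(sample).__name__, has_length(sample))
```

Generators and other lazy iterables have no len(), so this check treats them as single values.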