Validating detailed types in Python dataclasses

Posted on 2025-02-07 18:40:49

Python 3.7 was released a while ago, and I wanted to test some of the fancy new dataclass+typing features. Getting hints to work right is easy enough, with both native types and those from the typing module:

>>> import dataclasses
>>> import typing as ty
>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
>>> my_struct.a_str_list[0].  # IDE suggests all the string methods :)

But one other thing that I wanted to try was forcing the type hints as conditions during runtime, i.e. it should not be possible for a dataclass with incorrect types to exist. It can be implemented nicely with __post_init__:

>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...     def validate(self):
...         ret = True
...         for field_name, field_def in self.__dataclass_fields__.items():
...             actual_type = type(getattr(self, field_name))
...             if actual_type != field_def.type:
...                 print(f"\t{field_name}: '{actual_type}' instead of '{field_def.type}'")
...                 ret = False
...         return ret
...     def __post_init__(self):
...         if not self.validate():
...             raise ValueError('Wrong types')

This kind of validate function works for native types and custom classes, but not those specified by the typing module:

>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
Traceback (most recent call last):
  a_str_list: '<class 'list'>' instead of 'typing.List[str]'
  ValueError: Wrong types

Is there a better approach to validate an untyped list with a typing-typed one? Preferably one that doesn't include checking the types of all elements in any list, dict, tuple, or set that is a dataclass' attribute.

Revisiting this question after a couple of years, I've now moved to use pydantic in cases where I want to validate classes that I'd normally just define a dataclass for. I'll leave my mark with the currently accepted answer though, since it correctly answers the original question and has outstanding educational value.

北城孤痞 2025-02-14 18:40:49

Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so; you must use the "generic" version (typing.List). So you will be able to check the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
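
To make the distinction concrete, here is a minimal sketch of what isinstance accepts. It assumes Python 3.7 or later (where typing.List is still available); the variable names are only illustrative.

```python
import typing

values = ['t', 'e', 's', 't']

# The bare generic alias can be used directly in an instance check.
container_ok = isinstance(values, typing.List)

# The parametrized alias cannot: isinstance() raises TypeError.
try:
    isinstance(values, typing.List[str])
    parametrized_ok = True
except TypeError:
    parametrized_ok = False

# __origin__ gives back the plain class, which isinstance() accepts.
origin_ok = isinstance(values, typing.List[str].__origin__)

print(container_ok, parametrized_ok, origin_ok)
```

Only the container type is checked in all three cases; the element types are never inspected.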

Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:

# Python 3.6
>>> import typing
>>> typing.List.__origin__
>>> typing.List[int].__origin__
typing.List

# Python 3.7
>>> import typing
>>> typing.List.__origin__
<class 'list'>
>>> typing.List[int].__origin__
<class 'list'>

Python 3.8 introduces even better support with the typing.get_origin() introspection function:

# Python 3.8
>>> import typing
>>> typing.get_origin(typing.List)
<class 'list'>
>>> typing.get_origin(typing.List[int])
<class 'list'>

Notable exceptions are typing.Any, typing.Union and typing.ClassVar: anything that is a typing._SpecialForm does not define __origin__. Fortunately:

>>> isinstance(typing.Union, typing._SpecialForm)
True
>>> isinstance(typing.Union[int, str], typing._SpecialForm)
False
>>> typing.get_origin(typing.Union[int, str])
typing.Union

But parametrized types define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:

# Python 3.7
>>> typing.Union[int, str].__args__
(<class 'int'>, <class 'str'>)

# Python 3.8
>>> typing.get_args(typing.Union[int, str])
(<class 'int'>, <class 'str'>)

So we can improve type checking a bit:

for field_name, field_def in self.__dataclass_fields__.items():
    if isinstance(field_def.type, typing._SpecialForm):
        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
        continue
    try:
        actual_type = field_def.type.__origin__
    except AttributeError:
        # In case of non-typing types (such as <class 'int'>, for instance)
        actual_type = field_def.type
    # In Python 3.8 one would replace the try/except with
    # actual_type = typing.get_origin(field_def.type) or field_def.type
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…]
        actual_type = field_def.type.__args__

    actual_value = getattr(self, field_name)
    if not isinstance(actual_value, actual_type):
        print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
        ret = False

This is not perfect as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]] for instance, but it should get things started.
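
For reference, here is a minimal sketch of how that loop could sit inside the __post_init__ from the question. It assumes Python 3.8+ (for typing.get_origin and typing.get_args); the helper name runtime_types and the a_count field are made up for illustration, and Union members are assumed to be plain classes or generic aliases.

```python
import dataclasses
import typing


def runtime_types(hint):
    """Return a class or tuple of classes usable with isinstance(), or None to skip."""
    if hint is typing.Any:
        return None
    origin = typing.get_origin(hint)  # requires Python 3.8
    if origin is typing.Union:
        # Union[...] / Optional[...]: accept any member, checking containers only.
        return tuple(typing.get_origin(arg) or arg for arg in typing.get_args(hint))
    return origin or hint


@dataclasses.dataclass
class Structure:
    a_str: str
    a_str_list: typing.List[str]
    a_count: typing.Optional[int] = None  # hypothetical extra field

    def __post_init__(self):
        for field in dataclasses.fields(self):
            expected = runtime_types(field.type)
            value = getattr(self, field.name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{field.name}: {type(value)!r} instead of {field.type!r}"
                )


ok = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
also_ok = Structure(a_str='test', a_str_list=[1, 2])  # elements are not checked
```

Passing a tuple for a_str_list, or a string for a_count, raises TypeError, while a list of the wrong element type still goes through.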

Next is the way to apply this check.

Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:

import inspect
import typing
from contextlib import suppress
from functools import wraps

def enforce_types(callable):
    spec = inspect.getfullargspec(callable)

    def check_types(*args, **kwargs):
        parameters = dict(zip(spec.args, args))
        parameters.update(kwargs)  # also check arguments passed by keyword
        for name, value in parameters.items():
            with suppress(KeyError):  # Assume un-annotated parameters can be any type
                type_hint = spec.annotations[name]
                if isinstance(type_hint, typing._SpecialForm):
                    # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                    continue
                try:
                    actual_type = type_hint.__origin__
                except AttributeError:
                    # In case of non-typing types (such as <class 'int'>, for instance)
                    actual_type = type_hint
                # In Python 3.8 one would replace the try/except with
                # actual_type = typing.get_origin(type_hint) or type_hint
                if isinstance(actual_type, typing._SpecialForm):
                    # case of typing.Union[…] or typing.ClassVar[…]
                    actual_type = type_hint.__args__

                if not isinstance(value, actual_type):
                    raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            check_types(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)

Usage being:

import dataclasses

@enforce_types
@dataclasses.dataclass
class Point:
    x: float
    y: float

@enforce_types
def foo(bar: typing.Union[int, str]):
    pass

Apart from validating only some type hints, as noted in the previous section, this approach still has some drawbacks:

  • type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;

  • a default value which is not the appropriate type is not validated:

     def foo(bar: int = None):

    does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (thus forcing you to define def foo(bar: typing.Optional[int] = None));

  • variable number of arguments can't be validated as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping) and, as said at the beginning, we can only validate containers and not contained objects.
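
As an illustration of the second drawback, here is a small sketch of what inspect.Signature.bind sees before and after apply_defaults. The names before and after are only illustrative.

```python
import inspect


def foo(bar: int = None):
    pass


signature = inspect.signature(foo)

bound = signature.bind()        # foo() called without arguments
before = dict(bound.arguments)  # the default is not part of the binding yet

bound.apply_defaults()
after = dict(bound.arguments)   # now 'bar' maps to its default, None

print(before, after)
```

Without apply_defaults, a checker that only looks at the passed arguments never sees the None default, which is why the mismatch with the int hint goes unnoticed.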


After this answer got some popularity and a library heavily inspired by it got released, the need to lift the shortcomings mentioned above is becoming a reality. So I played a bit more with the typing module and will propose a few findings and a new approach here.

For starters, typing does a great job of detecting when an argument is optional:

>>> def foo(a: int, b: str, c: typing.List[str] = None):
...   pass
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}

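This is pretty neat and definitely an improvement over inspect.getfullargspec, so better use that instead, as it can also properly handle strings as type hints. (Since Python 3.11, typing.get_type_hints no longer adds Optional for a None default and returns the annotation unchanged.) But typing.get_type_hints will bail out for other kinds of default values: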

>>> def foo(a: int, b: str, c: typing.List[str] = 3):
...   pass
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}

So you may still need extra strict checking, even though such cases feel very fishy.

Next is the case of type hints used as arguments for a typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we will then need to filter out any typing._SpecialForm left.
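
To show what this recursion produces, here is a reduced sketch that only looks inside typing.Union (and therefore typing.Optional). The function name flatten_origins is made up for this example, and it assumes Python 3.8+ for typing.get_origin and typing.get_args.

```python
import collections.abc
import typing


def flatten_origins(hint):
    """Yield the runtime classes behind a hint, descending into Union members."""
    origin = typing.get_origin(hint)
    if origin is typing.Union:
        for arg in typing.get_args(hint):
            yield from flatten_origins(arg)
    elif hint is not typing.Any:
        # Generic aliases give their origin; plain classes are yielded as-is.
        yield origin or hint


# Optional[List[str]] is Union[List[str], None]: yields list and NoneType.
optional_list = tuple(flatten_origins(typing.Optional[typing.List[str]]))

# Bare generics resolve to their collections.abc counterparts.
seq_or_map = tuple(
    flatten_origins(typing.Union[typing.Sequence, typing.Mapping])
)

print(optional_list, seq_or_map)
```

The resulting tuples can be passed straight to isinstance, and an empty tuple (as for typing.Any) means there is nothing to check.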

Proposed improvements:

import inspect
import typing
from functools import wraps

def _find_type_origin(type_hint):
    if isinstance(type_hint, typing._SpecialForm):
        # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
        # typing.NoReturn, typing.Optional, or typing.Union without parameters
        return

    actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…] or …
        for origins in map(_find_type_origin, typing.get_args(type_hint)):
            yield from origins
    else:
        yield actual_type

def _check_types(parameters, hints):
    for name, value in parameters.items():
        type_hint = hints.get(name, typing.Any)
        actual_types = tuple(_find_type_origin(type_hint))
        if actual_types and not isinstance(value, actual_types):
            raise TypeError(
                    f"Expected type '{type_hint}' for argument '{name}'"
                    f" but received type '{type(value)}' instead"
            )

def enforce_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            parameters = dict(zip(signature.parameters, args))
            parameters.update(kwargs)  # also check arguments passed by keyword
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)

def enforce_strict_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

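        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()  # so that default values are validated too
            parameters = dict(zip(signature.parameters, bound.args))
            parameters.update(bound.kwargs)
            _check_types(parameters, hints)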

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
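
To tie this back to the original question, here is a compact, self-contained sketch of the strict variant applied to a dataclass. It is not the code above: check_init_types is a hypothetical helper that only checks hints resolving to plain classes and skips Union and other special forms. It assumes Python 3.8+.

```python
import dataclasses
import inspect
import typing
from functools import wraps


def check_init_types(cls):
    """Hypothetical helper: validate __init__ arguments of cls against its hints."""
    init = cls.__init__
    hints = typing.get_type_hints(init)
    signature = inspect.signature(init)

    @wraps(init)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # also validate default values
        for name, value in bound.arguments.items():
            if name not in hints:
                continue  # un-annotated parameters (such as self) are not checked
            expected = typing.get_origin(hints[name]) or hints[name]
            # Only plain classes are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"Expected {hints[name]!r} for argument {name!r}"
                    f" but received {type(value)!r}"
                )
        return init(*args, **kwargs)

    cls.__init__ = wrapper
    return cls


@check_init_types
@dataclasses.dataclass
class Point:
    x: float
    y: float = 0.0


origin = Point(0.0)
corner = Point(x=1.5, y=2.5)
```

Note that the check is strict in the isinstance sense: Point(1, 2.0) raises TypeError because an int is not an instance of float.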

Thanks to @Aran-Fey, who helped me improve this answer.

别低头,皇冠会掉 2025-02-14 18:40:49


Just found this question.

pydantic can do full type validation for dataclasses out of the box. (admission: I built pydantic)

Just use pydantic's version of the decorator; the resulting dataclass is completely vanilla.

from datetime import datetime
from pydantic.dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str = 'John Doe'
    signup_ts: datetime = None

print(User(id=42, signup_ts='2032-06-21T12:00'))
# prints: User(id=42, name='John Doe', signup_ts=datetime.datetime(2032, 6, 21, 12, 0))

User(id='not int', signup_ts='2032-06-21T12:00')

The last line will give:

pydantic.error_wrappers.ValidationError: 1 validation error
  value is not a valid integer (type=type_error.integer)

阪姬 2025-02-14 18:40:49

I created a tiny Python library for this purpose: https://github.com/tamuhey/dataclass_utils

This library can also be applied to dataclasses that hold another dataclass (nested dataclasses) and to nested container types (like Tuple[List[Dict...)

葬花如无物 2025-02-14 18:40:49


For typing aliases, you must check the annotation separately.
I did it like this:

