
Posted on 2025-01-16 21:36:15

Say you want to wrap the dataclass decorator like so:

from dataclasses import dataclass

def something_else(klass):
    return klass

def my_dataclass(klass):
    return something_else(dataclass(klass))

How should my_dataclass and/or something_else be annotated to indicate that the return type is a dataclass?
See the following example on how the builtin @dataclass works but a custom @my_dataclass does not:

@dataclass
class TestA:
    a: int
    b: str

TestA(0, "") # fine

@my_dataclass
class TestB:
    a: int
    b: str

TestB(0, "") # error: Too many arguments for "TestB" (from mypy)

━╋う一瞬間旳綻放 2025-01-23 21:36:15

There is no feasible way to do this prior to PEP 681.

A dataclass does not describe a type but a transformation. Its actual effects cannot be expressed by Python's type system: @dataclass is handled by a MyPy plugin which inspects the code, not just the types. The plugin is triggered by specific decorator names and does not look at how those decorators are implemented.

dataclass_makers: Final = {

While it is possible to provide custom MyPy plugins, this is generally out of scope for most projects. PEP 681 (Python 3.11) adds a generic "this decorator behaves like @dataclass" marker that can be used for all transformers from annotations to fields.

PEP 681 is available to earlier Python versions via typing_extensions.
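
As a minimal sketch of that fallback, the import can be attempted from typing first and from typing_extensions second. This assumes the typing_extensions package is installed on interpreters older than 3.11; the Point class is only illustrative.

```python
import dataclasses
from typing import Type, TypeVar

try:  # Python 3.11+
    from typing import dataclass_transform
except ImportError:  # older interpreters: assumes typing_extensions is installed
    from typing_extensions import dataclass_transform

T = TypeVar("T")


@dataclass_transform(field_specifiers=(dataclasses.Field, dataclasses.field))
def my_dataclass(klass: Type[T]) -> Type[T]:
    # At runtime this is plain @dataclass; the marker only informs type checkers.
    return dataclasses.dataclass(klass)


@my_dataclass
class Point:  # illustrative class, not from the original post
    x: int
    y: int


p = Point(1, 2)
```

At runtime the marker does nothing beyond recording its arguments on the decorated function, so the generated __init__ comes entirely from dataclasses.dataclass.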

Enforcing dataclasses

For a pure typing alternative, define your custom decorator to take a dataclass and modify it. A dataclass can be identified by its __dataclass_fields__ field.

from typing import Protocol, Any, TypeVar, Type, ClassVar
from dataclasses import Field

class DataClass(Protocol):
    __dataclass_fields__: ClassVar[dict[str, Field[Any]]]

DC = TypeVar("DC", bound=DataClass)

def my_dataclass(klass: Type[DC]) -> Type[DC]:
    ...

This allows the type checker to understand and verify that a dataclass is required.

@my_dataclass
@dataclass
class TestB:
    a: int
    b: str

TestB(0, "")  # note: Revealed type is "so_test.TestB"

@my_dataclass
class TestC:  # error: Value of type variable "DC" of "my_dataclass" cannot be "TestC"
    a: int
    b: str
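
The Protocol bound only helps the type checker, so a runtime guard can mirror it. The following is a sketch under that assumption: the body of my_dataclass here (an is_dataclass check that raises TypeError) is hypothetical and not part of the original answer.

```python
from dataclasses import Field, dataclass, is_dataclass
from typing import Any, ClassVar, Dict, Protocol, Type, TypeVar


class DataClass(Protocol):
    __dataclass_fields__: ClassVar[Dict[str, Field[Any]]]


DC = TypeVar("DC", bound=DataClass)


def my_dataclass(klass: Type[DC]) -> Type[DC]:
    # Hypothetical body: reject classes that @dataclass has not processed yet.
    if not is_dataclass(klass):
        raise TypeError(f"{klass.__name__} must be decorated with @dataclass first")
    return klass


@my_dataclass
@dataclass
class TestB:
    a: int
    b: str


class Plain:  # deliberately not a dataclass
    a: int
```

With this guard, a class that a type checker would reject (like TestC above) also fails at decoration time instead of silently passing through.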

Custom dataclass-like decorators

The PEP 681 dataclass_transform decorator is a marker for other decorators to show that they act "like" @dataclass. In order to match the behaviour of @dataclass, one has to use field_specifiers to indicate that fields are denoted the same way.

from typing import dataclass_transform, TypeVar, Type
import dataclasses

T = TypeVar("T")

@dataclass_transform(
    field_specifiers=(dataclasses.Field, dataclasses.field),
)
def my_dataclass(klass: Type[T]) -> Type[T]:
    return something_else(dataclasses.dataclass(klass))

The custom dataclass decorator may accept the same keywords as @dataclass. dataclass_transform can also declare their respective defaults, even when the decorator itself does not accept them as keywords.

止于盛夏 2025-01-23 21:36:15

The problem is that mypy understands the dataclass decorator and its magic
about __init__, but does not understand the my_dataclass function:

@dataclass
class Test:
    a: int

reveal_type(Test.__init__)
# Mypy: Revealed type is "def (self: Test, a: builtins.int)"

@my_dataclass
class Test2:
    a: int

reveal_type(Test2.__init__)
# Mypy: Revealed type is "def (self: builtins.object)"

As you can see, mypy believes the __init__ method of Test2 does not accept any arguments.
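
The mismatch is purely static. A quick runtime check, sketched here with inspect.signature and the unannotated wrapper from the question, shows that the generated __init__ does accept the field:

```python
import inspect
from dataclasses import dataclass


def something_else(klass):
    return klass


def my_dataclass(klass):
    return something_else(dataclass(klass))


@my_dataclass
class Test2:
    a: int


# Parameters of the constructor as seen at runtime (self is excluded).
params = list(inspect.signature(Test2).parameters)
instance = Test2(1)
```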

烟花易冷人易散 2025-01-23 21:36:15



The currently accepted answer and comments describe why this is currently problematic.

There are several, mostly independent, problems:

  1. Informing the type-checker about the __init__ signature.
  2. Informing the type-checker that a class is a dataclass.
  3. Explicitly annotating a dataclass via type-hints.
  4. Doing 3 in a generic way.

1 & 2 are handled by the dataclass and dataclass_transform decorators, or rather by how type-checkers treat them. For a human reader, however, this is not visible in type-hints or annotations; they would need to see the decorator.
Although these are not annotations, they solve the problem of the original question.

2 & 3 can only be done implicitly, with the overloaded type-guard from typeshed, is_dataclass(obj) -> TypeIs[DataclassInstance], which forms a conditional intersection type.

4 can only be done for human readers, via Annotated or TypeAliasType (see the bottom of this answer).

In the following I answer the question of how to annotate such a function or object explicitly, and what problems there are.

It could be done if Python had an explicit Intersection type to form a subclass between the DataclassInstance Protocol (see the end of this post) and a generic: (type[T]) -> type[DataclassInstance & T].
Currently the best that is possible is a conditional, implicit intersection through type-checkers, which is bounded by T and therefore very limited:

def my_dataclass(klass : type[T]):
    # -> type[<subclass of DataclassInstance and T.bound>]
    obj = cast(Any, do_something(dataclass(klass)))
    assert is_dataclass(obj) and issubclass(obj, klass)
    return obj
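
As a small illustration of using is_dataclass as a guard, the sketch below narrows an arbitrary object before calling dataclasses.fields. The helper field_names and the Point class are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, fields, is_dataclass


def field_names(obj: object) -> list[str]:
    # Inside the branch, type checkers that use typeshed's overloads treat
    # obj as a dataclass instance or class, so fields() is accepted.
    if is_dataclass(obj):
        return [f.name for f in fields(obj)]
    return []


@dataclass
class Point:
    x: int
    y: int
```

The narrowing only exists inside the if branch; nothing about it survives in the function's signature, which is the limitation described above.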

The type-guard mentioned above is an overload provided by Python's typeshed package.
typeshed is the official library used by most type-checkers; it provides type-hints beyond those written in the annotated modules themselves. _typeshed type-hints are not available at runtime and have to be guarded appropriately, e.g. with if TYPE_CHECKING, with forward references that wrap the type-hints in strings, or with from __future__ import annotations.
While it looks simple and you can actually do it, including _typeshed in real code is not encouraged; it is intended for stub files (.pyi), i.e. if you use it you should write a stub file for your function.

Your IDE's type-checker may already be able to work with it, but if needed you can install it via:

pip install types-dataclasses

NOTE: The following code shows that it can be done explicitly, but it is not a perfect solution, as it cannot be done in a generic way without problems.
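
A minimal sketch of the guarding pattern, assuming string annotations are used for the forward reference; the helper to_dict and the Point class are illustrative names only.

```python
from dataclasses import asdict, dataclass
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Only evaluated by type checkers; _typeshed does not exist at runtime.
    from _typeshed import DataclassInstance


def to_dict(obj: "DataclassInstance") -> "dict[str, Any]":
    # The quoted annotation is never evaluated at runtime, so the missing
    # import is harmless here.
    return asdict(obj)


@dataclass
class Point:
    x: int
    y: int
```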

import dataclasses
from typing import dataclass_transform
from typing import TYPE_CHECKING # not necessary in a stub file
if TYPE_CHECKING:                # not necessary in a stub file
   from _typeshed import DataclassInstance

# at best put this with @overload in a stub file!
@dataclass_transform(
    field_specifiers=(dataclasses.Field, dataclasses.field),
)
def my_dataclass(klass) -> type["DataclassInstance"]:
    ...

@my_dataclass
class Foo:
  x : int
  y : list[int]

foo = Foo(1, [2]) # ok

@my_dataclass
class Baz(Foo):
  z : str

baz = Baz(1, [2], "b") # ok, because of dataclass_transform

if not isinstance(baz, Foo): # deemed unnecessary
    assert_never(baz)  # ok

What does not work is:

# Does not work when applied as a plain function call

class Bar: ...

Bar_dc = my_dataclass(Bar)
assert is_dataclass(Bar_dc)    # error! Does not check for `type["DataclassInstance"]`
assert issubclass(Bar_dc, Bar) # Not True for type-checkers!

# ------

# Checking for a dataclass from the outside, when the exact type is known.
if is_dataclass(foo):
    print("yes")      # reachable
    assert_never(foo) # pyright: error, mypy: ok

if is_dataclass(Foo):
    print("yes")       # unreachable
    assert_never(Foo) # reachable -> error

Why do the last statements fail? Because type-checkers do not see Foo as a subclass of Foo & DataclassInstance. Currently no such type-guard or special case exists. For @dataclass it depends on how the type-checker interprets the class:

@dataclass
class Fox():
    a : int
if is_dataclass(Fox):
    reveal_type(Fox) # _pyright: unreachable | _mypy: Fox
else:
    reveal_type(Fox) # _pyright: Fox | _mypy: no-output

The _typeshed.DataclassInstance

As already answered, you can use a Protocol, and _typeshed.DataclassInstance is in fact one.

from dataclasses import Field
from typing import ClassVar, Any, Protocol

# experimental
# Might not work as expected for pyright, see
#   https://github.com/python/typeshed/pull/9362
#   https://github.com/microsoft/pyright/issues/4339
class DataclassInstance(Protocol):
    __dataclass_fields__: ClassVar[dict[str, Field[Any]]]

It is currently not possible to annotate objects explicitly as a DataclassInstance and keep their original class at the same time.

Generic TypeAliasType Annotation

The following code gives no great benefit to the type-checker, but it provides a generic annotation for human readers:

DataClass = TypeAliasType("DataClass", T, type_params=(T,))
"""
Describes a value that is a dataclass.

    This type-hint is similar to an annotation and does not
    provide any additional information to the type-checker, e.g. for `is_dataclass`.
"""

def make_dataclass(klass : type[T]) -> type[DataClass[T]]: ...

# However without an additional cast the type-hint is not revealed in most situations.

foo = cast(DataClass[Foo], Foo(1, "yey")) # however reportUnnecessaryCast warning
# This cast displays the type of the instance as `DataClass[Foo]` in the tooltip instead of just `Foo`.
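
The Annotated route mentioned earlier can be sketched in a similar way. This is an assumption-laden illustration rather than the answer's own code: the "dataclass" metadata string and the show function are arbitrary, and type checkers ignore the metadata entirely.

```python
from dataclasses import dataclass
from typing import Annotated, TypeVar, get_args

T = TypeVar("T")

# Generic alias: DataClass[X] is X plus a human-readable marker.
DataClass = Annotated[T, "dataclass"]


@dataclass
class Foo:
    x: int


def show(value: DataClass[Foo]) -> int:
    # For the type checker, value is simply a Foo.
    return value.x


alias_args = get_args(DataClass[Foo])
```

Like the TypeAliasType variant, this documents intent for readers and tooling that inspect annotations, but it does not make is_dataclass narrowing any stronger.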
作死小能手 2025-01-23 21:36:15

Existing answers don't seem to have much information on how to use dataclass_transform(), so here's a practical guide.

@dataclass_transform is a decorator, meant to be used on another decorator function, a class, or a metaclass. These decorated symbols will then become "dataclass factories".

Basic usages

# 1. Another decorator

@dataclass_transform()
def magic[T](cls: type[T]) -> type[T]: ...

@magic
class C:
    a: str
    b: int

reveal_type(C.__init__)  # (self: C, a: str, b: int) -> None

# 2. Another class

@dataclass_transform()
class Magic: ...

class C(Magic):
    a: str
    b: int

reveal_type(C.__init__)  # (self: C, a: str, b: int) -> None

# 3. A metaclass

@dataclass_transform()
class Magic(type): ...

class Base(metaclass=Magic):
    pass  # No fields

class C(metaclass=Magic):
    a: str
    b: int

class D(Base):
    a: str
    b: int

reveal_type(C.__init__)  # (self: C, a: str, b: int) -> None
reveal_type(D.__init__)  # (self: D, a: str, b: int) -> None

@dataclass_transform's parameters

Depending on how the factory behaves, you can provide corresponding keyword arguments to @dataclass_transform:


# This means, if the `frozen` keyword-only argument is not provided,
# the decorated symbol will produce "frozen" instances by default.
@dataclass_transform(frozen_default = True)
def magic[T](*, frozen: bool = True) -> Callable[[type[T]], type[T]]: ...
#               ^^^^^^^^^^^^^^^^^^^ Not necessary if only frozen instances are desired.

@magic()  # or @magic(frozen = True)
class C:
    a: str
    b: int

c = C(a = 'lorem', b = 42)
c.a = ''  # error: Cannot modify attribute

@magic(frozen = False)
class D:
    a: str
    b: int

d = D(a = 'lorem', b = 42)
d.a = ''  # fine

Fields and field specifiers

@dataclass_transform also has a parameter named field_specifiers. The argument given to this parameter must be a tuple of callables, each of which can be used to describe the behaviour of the field it is assigned to.

Here's a small example of how dataclasses.field can be used:

@dataclass
class C:
    # Defaults to 0 if not given
    a: int = field(default = 0)
    # Calls the lambda to retrieve the default value if not given
    b: list[int] = field(default_factory = lambda: [1, 2, 3])

c = C()
reveal_type(c.a)  # int / 0
reveal_type(c.b)  # list[int] / [1, 2, 3]

And here's how to do the same with a custom decorator and field specifier(s):


# Holder for field-related data.
# This should be read and handled by @magic internally.
@dataclass(kw_only = True)
class MagicField:
    default: Any = None
    default_factory: Callable[[], Any] | None = None

# Field specifier
def magic_field(
    default: Any = None,
    default_factory: Callable[[], Any] | None = None
) -> Any: ...
#    ^^^ This is important.
# At runtime, return `MagicField` so that @magic can read it.

# Can also be defined as multiple overloads
# (`default` and `default_factory` are mutually exclusive).
@overload
def magic_field[T](*, default: T | None = None) -> T: ...

@overload
def magic_field[T](*, default_factory: Callable[[], T] | None = None) -> T: ...

def magic_field[...](...) -> ...: ...  # Omitted here for brevity

#                                       vvvvvvvvvvvvvv
@dataclass_transform(field_specifiers = (magic_field,))
def magic(...) -> ...: ...  # Omitted here for brevity

@magic
class C:
    #        vvvvvv Prefix added, but nothing else
    a: int = magic_field(default = 0)
    b: list[int] = magic_field(default_factory = lambda: [1, 2, 3])

c = C()
reveal_type(c.a)  # int / 0
reveal_type(c.b)  # list[int] / [1, 2, 3]

A list of field specifier parameters can be found in the spec.

A small caveat:

@magic
class D:
    # Technically, parameters to field specifiers don't need
    # to be keyword-only. However, the corresponding arguments
    # must be passed as keyword arguments to be recognized.
    e: int = magic_field(0)  # silently ignored

d = D()  # error: Missing argument (0 is not recognized as default value)