Is this a good case for creating a Python metaclass?

Posted 2025-02-11 17:07:19


I am tasked with refactoring a bunch of old code. One thing I wanted to fix is that there's a class of objects which has methods for direct database access, even though the objects themselves are representations of Excel worksheets. So, in my mind there should be another class of objects to mediate database access.

We're using these helper functions called "handlers", which take a list of parameters and return data from the database. The ExcelWorksheet class implements a method for running handlers asynchronously, and each subclass uses one or more handlers. This can also be quite hard to follow imo:

data_handler = {
    # We are always going to use facebook here so add the arg here
    "metrics": lambda *args: metrics_handler(*["facebook"] + list(args)),
    "top_content": lambda *args: content_handler(*["facebook"] + list(args)),
}

@property
def data_requirements(self):
    return [
        ["metrics", self.start, self.end, [self.mapping_id], [self.category_id], self.metrics],
        ["top_content", self.start, self.end, "total_actions", "user", [self.mapping_id], 10],
        ["top_content", self.start, self.end, "total_actions", "category", [self.category_id], 10],
        ["top_content", self.start, self.end, "comments_count", "user", [self.mapping_id], 5],
        ["top_content", self.start, self.end, "comments_count", "category", [self.category_id], 5],
        ["top_content", self.start, self.end, "shares_count", "user", [self.mapping_id], 5],
        ["top_content", self.start, self.end, "shares_count", "category", [self.category_id], 5],
    ]

@staticmethod
def _call_async(fns):
    """
    Call the fns in a thread pool asynchronously and return a
    generator of results
    """
    futures = []
    with CloseConnectionThreadPoolExecutor(max_workers=4) as executor:
        for indx, f in enumerate(fns):
            # logging.Logger.warn is a deprecated alias of warning
            LOGGER.warning("Getting data dependency %s out of %s" % (indx + 1, len(fns)))

            future = executor.submit(f)
            futures.append(future)

    return (i.result() for i in futures)

def require_data(self):
    """
    Returns a generator of all the required data specified by the
    worksheets in this workbook in the order specified.

    Calls self.data_handler to get back a function for the given
    key. If the requirement is a list, the first argument is the
    key and subsequent items are passed as arguments to the function
    returned.

    Data requirement format:
    [
        'resource_1',
        'resource_2',
        ['resource_3', 'arg1', 'arg2'],
        ['resource_3', 'arg3', 'arg4']
    ]
    """
    if not self.data_handler:
        raise NotImplementedError("No data_handler specified")

    # For each requirement, route to the data handler
    data_fns = []
    for r in self.data_requirements:
        # A requirement can be specified as a list where the
        # first item is the resource and the rest are
        # arguments to the function to fulfill the requirement
        if isinstance(r, list):
            key = r[0]
            args = r[1:]
            fn = self.data_handler[key](*args)
        else:
            fn = self.data_handler[r]

        data_fns.append(fn)

    return self._call_async(data_fns)

So I came to the conclusion I should create a separate DataSource object. But another thing I don't like about this code is the way these handlers are called. It seems really complicated and hard to understand. So I would like to be able to use my DataSource object like this:

datasource.add_request_for_metrics(arg1,arg2,arg3...)
datasource.add_request_for_top_content(arg1,arg2,arg3...)
results = datasource.execute_requests()

I could define all these methods by hand, however there are a whole bunch of handlers and there could be even more added in the future. So I thought about using a metaclass here that would define an "add_request_for_x" method for every given handler. This is what I came up with so far:

class DataSourceMeta(type):
    """
    This is a metaclass for defining a datasource class.
    data_handler_mappings - a dict that defines which handlers to use when looking for data. For example:
    {"metrics": metrics_handler, "time_series": time_series_handler} - class objects will use metrics_handler
    to search for the "metrics" data category and time_series_handler to search for the "time_series" data category.

    For every item in the dictionary the metaclass will add an accordingly named method for adding a new request.
    In this case the methods will be named "add_request_for_metrics" and "add_request_for_time_series" ("add_request_for_{data_category}").
    These requests can then be executed by calling the "execute_requests" method, which should return a list of results.

    Each class using this meta should implement the following interface:
    add_request(): adds a request to be executed
    _run_requests(): defines how requests are run (e.g. sync vs async)
    _clean_up(): performs any necessary cleanup after running the "execute_requests" method (clear the list of requests?)
    """
    def __new__(cls, *args, **kwargs):
        data_handler_mappings = kwargs.get("data_handler_mappings", {})
        newclass = super(DataSourceMeta, cls).__new__(cls, *args)
        for data_category, data_category_handler in data_handler_mappings.items():
            method_name = "add_request_for_" + data_category.lower()

            # Bind the handler as a default argument: a plain closure would
            # capture the loop variable late and leave every generated
            # method calling the last handler in the dict.
            def method(self, *args, _handler=data_category_handler, **kwargs):
                self.add_request(_handler(*args, **kwargs))

            setattr(newclass, method_name, method)

        def execute_requests(self):
            results = self._run_requests()
            self._clean_up()
            return results

        newclass.execute_requests = execute_requests
        return newclass

Is this a good approach? Is there anything I can do better? I am only a junior programmer and this would be my first time using a metaclass for a serious project.


Answered by 世态炎凉, 2025-02-18 17:07:19


This answer is covering the option where you can avoid a metaclass.

My understanding from your question is that you want to improve the readability of your code, and also make it suitable for a production system ("serious project").

In my opinion, the case you describe can avoid a metaclass, and doing so is better suited to a production system, as non-dynamic classes give you better type safety. Additionally, I think building the class dynamically in your proposed way makes it harder to understand at a glance what a given "datasource" can request, i.e. the readability is low.

The main reason I think this is the case is that you have a dictionary of methods used to fetch data (the data_handlers). My understanding is that these methods are fixed and you are combining them in specific ways for different databases (Excel sheets).

My suggestion in this case would be to use a series of mixins.

There is a limitation here, judging from the code snippet provided: if different databases require different methods to get data (i.e. the data handlers need to be customised), this approach may not be ideal. But if only a handful of arguments are source-specific, they can be added as class variables on the mixin.
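That last point, source-specific values as class variables on the mixin, might look like this minimal sketch. All names are hypothetical, and the returned tuple merely stands in for a real queued request:

```python
class TopContentRequestMixin:
    # Per-source default; a concrete datasource overrides this class variable
    top_content_limit = 10

    def add_request_for_top_content(self, metric, group_by):
        # A real implementation would queue a handler call; a tuple stands in here
        return (metric, group_by, self.top_content_limit)


class FacebookTopContent(TopContentRequestMixin):
    top_content_limit = 5  # the Facebook-specific value lives on the class


print(FacebookTopContent().add_request_for_top_content("shares_count", "user"))
# ('shares_count', 'user', 5)
```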

import abc

class BaseDataSourceMixin(abc.ABC):

    @abc.abstractmethod
    def _run_requests(self):
        """Defines how requests are run (e.g. sync vs async)."""
        raise NotImplementedError()

    @abc.abstractmethod
    def _clean_up(self):
        """Performs any necessary cleanup after running requests (clear the list of requests?)."""
        raise NotImplementedError()

    def execute_requests(self):
        """Runs all pending requests, then cleans up and returns the results."""
        results = self._run_requests()
        self._clean_up()
        return results

class RequestMixin(abc.ABC):

    # abc.abstractproperty is deprecated; stack @property and @abc.abstractmethod
    @property
    @abc.abstractmethod
    def data_source_name(self) -> str:
        pass

class MetricsRequest(RequestMixin):

    def add_request_for_metrics(self, *args): # can use meaningful variable names
        pass


class TopContentRequest(RequestMixin):

    def add_request_for_top_content(self, *args):
        print(self.data_source_name) # can be used here, and the subclass will define it
        pass

class FacebookDataSource(
    BaseDataSourceMixin,
    # Now when looking at the code we can just see what we are inheriting
    # and know what methods to expect in this function
    MetricsRequest,
    TopContentRequest,
):

    @property
    def data_source_name(self) -> str:
        # This as an example returns a string, but I may suggest using a NamedTuple
        # especially if there is a specific interface you are using
        # that always requires the same number/type of variables
        return "facebook" 

    def _run_requests(self):
        return

    def _clean_up(self):
        return


datasource = FacebookDataSource()
datasource.add_request_for_metrics(arg1, arg2, arg3...)
datasource.add_request_for_top_content(arg1, arg2, arg3...)
results = datasource.execute_requests()

Using it in this way, we can easily get docstrings from the inherited methods, which our IDE can surface, making it easier for a developer reading and understanding the code:
(screenshot: VS Code showing a tooltip for an inherited method)
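The `_run_requests` and `_clean_up` bodies above are intentionally left as stubs. One way they might be filled in, shown self-contained with hypothetical names and the base mixin repeated so the sketch runs on its own, is to queue zero-argument callables and run them in order:

```python
import abc


class BaseDataSourceMixin(abc.ABC):
    """Repeated from above so this sketch is self-contained."""

    @abc.abstractmethod
    def _run_requests(self):
        raise NotImplementedError()

    @abc.abstractmethod
    def _clean_up(self):
        raise NotImplementedError()

    def execute_requests(self):
        results = self._run_requests()
        self._clean_up()
        return results


class DemoFacebookDataSource(BaseDataSourceMixin):
    def __init__(self):
        self._requests = []

    def add_request_for_metrics(self, start, end):
        # Queue a zero-argument callable; a real one would wrap a DB handler
        self._requests.append(lambda: ("metrics", start, end))

    def _run_requests(self):
        # Synchronous for simplicity; a thread pool could go here instead
        return [fn() for fn in self._requests]

    def _clean_up(self):
        self._requests.clear()


ds = DemoFacebookDataSource()
ds.add_request_for_metrics("2024-01-01", "2024-01-31")
print(ds.execute_requests())  # [('metrics', '2024-01-01', '2024-01-31')]
```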
