Python: deferToThread XMLRPC server - Twisted - CherryPy?

Posted on 2024-08-21 00:34:49


This question is related to others I have asked on here, mainly regarding sorting huge sets of data in memory.

Basically this is what I want / have:

A Twisted XMLRPC server is running. The server keeps several (32) instances of a Foo class in memory. Each Foo instance contains a list, bar (which will hold several million records). There is a service that retrieves data from a database and passes it to the XMLRPC server. The data is basically a dictionary whose keys correspond to each Foo instance and whose values are lists of dictionaries, like so:

data = {'foo1':[{'k1':'v1', 'k2':'v2'}, {'k1':'v1', 'k2':'v2'}], 'foo2':...}

Each Foo instance is then passed the value corresponding to its key, and its Foo.bar list is updated and sorted.

from twisted.web import xmlrpc
from twisted.internet import threads

class XMLRPCController(xmlrpc.XMLRPC):

    def __init__(self):
        ...
        self.foos = {'foo1':Foo(), 'foo2':Foo(), 'foo3':Foo()}
        ...

    def update(self, data):
        # iterate key/value pairs; a bare `for k, v in data` only
        # iterates the keys and raises a ValueError on unpacking
        for k, v in data.items():
            threads.deferToThread(self.foos[k].processData, v)

    def getData(self, fookey):
        # return first 10 records of specified Foo.bar
        return self.foos[fookey].bar[0:10]

class Foo(object):

    def __init__(self):
        self.bar = []   # `bar = []` alone would only create a local variable

    def processData(self, new_bar_data):
        for record in new_bar_data:
            # do processing, and add record, then sort
            # BUNCH OF PROCESSING CODE
            self.bar.sort(reverse=True)

The problem is that when the update function is called on the XMLRPCController with a lot of records (say 100K+), it stops responding to my getData calls until all 32 Foo instances have completed the processData method. I thought deferToThread would work, but I think I am misunderstanding where the problem is.

Any suggestions? I am open to using something else, like CherryPy, if it supports the required behavior.


EDIT

@Troy: This is how the reactor is set up

reactor.listenTCP(port_no, server.Site(XMLRPCController()))
reactor.run()

As far as the GIL goes, would it be a viable option to set sys.setcheckinterval() to a smaller value, so that the interpreter lock is released more often and the data can be read?

Comments (2)

回梦 2024-08-28 00:34:49


The easiest way to make the app responsive is to break the CPU-intensive processing into smaller chunks, letting the Twisted reactor run in between — for example, by calling reactor.callLater(0, process_next_chunk) to advance to the next chunk. You are effectively implementing cooperative multitasking yourself.

Another way would be to use separate processes to do the work; then you also benefit from multiple cores. Take a look at Ampoule: https://launchpad.net/ampoule — it provides an API similar to deferToThread.
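The chunking idea above can be sketched as follows. To keep this runnable without Twisted, a simple deque stands in for the reactor's call queue; in real code you would replace `pending.append(...)` with `reactor.callLater(0, ...)` and let the reactor drive the calls (the doubling step is a placeholder for the question's real processing):

```python
from collections import deque

# Stand-in for the Twisted reactor's pending-call queue.
# Real code: reactor.callLater(0, process_next_chunk) instead of append.
pending = deque()

def process_in_chunks(records, results, chunk_size=1000):
    """Process `records` one chunk at a time, yielding control between chunks."""
    def process_next_chunk(start=0):
        for record in records[start:start + chunk_size]:
            results.append(record * 2)   # placeholder for real processing
        if start + chunk_size < len(records):
            # Re-schedule ourselves; other events (e.g. getData calls)
            # get a chance to run between chunks.
            pending.append(lambda: process_next_chunk(start + chunk_size))
    pending.append(lambda: process_next_chunk())

results = []
process_in_chunks(list(range(5000)), results, chunk_size=1000)
while pending:            # drive the fake reactor until idle
    pending.popleft()()
print(len(results))       # 5000
```

Because each chunk returns to the event loop before the next is scheduled, getData calls interleave with the processing instead of waiting for all of it to finish.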

手长情犹 2024-08-28 00:34:49


I don't know how long your processData method runs, nor how you're setting up your Twisted reactor. By default, the reactor's thread pool has between 0 and 10 threads. You may be trying to defer as many as 32 long-running calculations to at most 10 threads, which is sub-optimal.
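If you stay with deferToThread, the pool ceiling mentioned above is adjustable via the reactor. A minimal sketch (note this won't buy CPU parallelism for pure-Python work, since the GIL still serializes it; it mainly avoids jobs queuing behind one another):

```python
from twisted.internet import reactor

# Raise the reactor's thread pool maximum (default is 10) so that
# 32 simultaneous deferToThread jobs each get a thread.
# Call this before the pool is first used.
reactor.suggestThreadPoolSize(32)
```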

You also need to ask what role the GIL is playing in updating all these collections.

Edit:
Before you make any serious changes to your program (like calling sys.setcheckinterval()), you should probably run it under the profiler or the Python trace module. These should tell you which methods are using all your time. Without the right information, you can't make the right changes.
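Profiling the suspected hot path is cheap with the stdlib. A minimal sketch using cProfile; the `process_data` function below is a stand-in mimicking the question's sort-inside-the-loop pattern, not the asker's actual code:

```python
import cProfile
import io
import pstats

def process_data(records):
    """Stand-in for the question's processData: append then re-sort each time."""
    bar = []
    for r in records:
        bar.append(r)
        bar.sort(reverse=True)   # suspected hot spot: sorting inside the loop
    return bar

profiler = cProfile.Profile()
profiler.enable()
process_data(list(range(2000)))
profiler.disable()

# Print functions ordered by cumulative time to find where time is spent.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

The report will show the cumulative cost of `list.sort` relative to everything else, which tells you whether sorting (rather than, say, the XML-RPC layer) is actually the bottleneck before you start tuning interpreter internals.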
