MongoDB: extremely large log file?

Published 2025-01-31 01:11:20

I set up MongoDB (on Windows) a while ago to listen to WebSockets. Recently, I checked the size of the dataset I have collected so far. The total size of my data folder, where the collections are stored, is about 7GB now. Then I discovered that, surprisingly, the size of the log file is 175GB (!!!). For now, I just used the rotate command to generate a new log file and deleted the old one. But of course, I am curious why this extremely large log file was generated. I checked the content of the file. Among the millions (or billions?) of lines, here are a few examples of how the messages typically look:

{"t":{"$date":"2022-01-27T01:52:07.723+01:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn298926","msg":"Interrupted operation as its client disconnected","attr":{"opId":33104664}}

{"t":{"$date":"2022-01-27T01:52:07.723+01:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn298926","msg":"Connection ended","attr":{"remote":"127.0.0.1:61724","uuid":"88da63c3-37dc-416f-a044-dcc3eae36d8b","connectionId":298926,"connectionCount":39}}

{"t":{"$date":"2022-01-27T01:52:07.726+01:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn298929","msg":"Interrupted operation as its client disconnected","attr":{"opId":33104670}}

{"t":{"$date":"2022-01-27T01:52:07.727+01:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn298929","msg":"Connection ended","attr":{"remote":"127.0.0.1:61727","uuid":"7fc3e5c7-5687-474f-aa0e-c83f501c16cc","connectionId":298929,"connectionCount":38}}

{"t":{"$date":"2022-01-27T01:52:07.737+01:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn298932","msg":"Interrupted operation as its client disconnected","attr":{"opId":33104677}}

{"t":{"$date":"2022-01-27T01:52:07.737+01:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn298932","msg":"Connection ended","attr":{"remote":"127.0.0.1:61730","uuid":"0c8c0ecf-7aef-4bab-980b-267a3a0c78b7","connectionId":298932,"connectionCount":37}}

...

It seems some connect-disconnect churn is going on (on various client ports?), but I have no idea where it comes from or what causes it.

BEFORE YOU READ ON: first, a simple question that may not require going through all the code below: if these messages are not "critical" (i.e. they do not indicate that something is going wrong), is it possible to simply turn them off in the logging? That would be a simple solution. From the dataset I have, I can see that my script is functioning correctly and the data is collected accurately, so I assume it could be sufficient to simply turn off log messages of this type. READ FURTHER FOR MORE DETAILS...
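These entries are all informational (severity `"I"`), so suppressing them seems like a reasonable option. One way to do that (a sketch, assuming a reasonably recent MongoDB; the log path is a placeholder, not my actual path) is the `systemLog.quiet` option in the Windows `mongod.cfg`, which suppresses routine connection accepted/ended entries:

```yaml
# mongod.cfg -- sketch, not a complete config; the path is a placeholder
systemLog:
  destination: file
  path: C:\MongoDB\log\mongod.log
  logAppend: true
  quiet: true   # suppress routine connection open/close log entries
```

The mongod service has to be restarted for this to take effect. An alternative that keeps the messages but caps disk usage is periodic rotation via `db.adminCommand({ logRotate: 1 })` from a scheduled task.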


My basic async Python code for connecting to the WebSockets and receiving messages looks (shortened) like this:

import asyncio
import threading
from time import sleep

import websockets


async def run_socket(manager, subscription=None):

    async with websockets.connect(manager.url) as socket:

        # receive messages and process
        while True:

            # receive message
            message = await socket.recv()

            # process data and store
            manager.process_message(message)


def run_stream(manager, subscription):
    while True:
        try:
            # create a new event loop for this attempt
            loop = asyncio.new_event_loop()
            loop.run_until_complete(run_socket(manager, subscription))
        except KeyboardInterrupt as e:
            print(e)
            loop.close()
            return
        except Exception as e:
            # reconnect in case of exception
            print(e)
            loop.close()
            sleep(5)


# MAIN LOOP

if __name__ == "__main__":

    # manager
    manager = create_websocket_manager(stream)

    threads = []
    for subscription in manager.generate_subscriptions():
        print(subscription)

        # create thread for the stream
        thread = threading.Thread(target=run_stream, args=(manager, subscription))
        thread.daemon = False

        # collect threads
        threads.append(thread)

        # run
        thread.start()

    for thread in threads:
        thread.join()

The function create_websocket_manager (not shown here) creates a custom "WebSocket manager" object that parses and stores the messages arriving from different streams. All these objects share the same base class, which (shortened) looks like this:

from pymongo import MongoClient


class WebsocketManager:

    def __init__(self, stream):
        # ... self.set_collection()
        return

    def set_collection(self):
        # ... uri_mongo, db_name, col_name
        self.col = MongoClient(uri_mongo)[db_name][col_name]

    def parse_message(self, message):  # -> list
        # ... message -> results
        return results

    def process_message(self, message):
        results = self.parse_message(message)
        if len(results) > 0:
            self.col.insert_many(results)

The manager sets a collection for the stream (self.col = MongoClient(uri_mongo)[db_name][col_name]), then parses the arriving messages and stores them in that collection.
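For illustration, a parse_message for a typical JSON WebSocket payload might look like the sketch below. The payload shape, the `data` key, and the field names are made up for the example; the real streams will differ:

```python
import json
from datetime import datetime, timezone


def parse_message(message):
    """Turn one raw websocket message into a list of documents to insert.

    Hypothetical payload shape: {"data": [{...}, {...}]} -- adapt the key
    and fields to the actual stream format.
    """
    payload = json.loads(message)
    events = payload.get("data", [])
    # tag each document with an arrival timestamp before insertion
    received_at = datetime.now(timezone.utc)
    return [{**event, "received_at": received_at} for event in events]
```

process_message then passes this list straight to insert_many, skipping empty results.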

The main loop shown further above runs several threads for the streams handled by the same manager object. Additionally, I run several Python instances of the main-loop file for different managers. Maybe the different cmd instances running the WebSockets are what causes the connect-disconnect churn?

If you read this far, thanks for going through this long question; I appreciate your feedback :)

Best, JZ
