Can Node.js miss HTTP success responses due to a (possibly) high volume of promises? - a growth problem

I am finding myself blind on how to understand the root cause of this problem; any direction is very welcome. I've made some tests as well, which I am going to show here.

A) --- General Scenario -----

I have two clusters on GCP:
Scheduler (2-15 servers) and Consumer (6-100 servers).
The Scheduler gets the tasks that need to be done and sends them to the Consumer.

There are about 2.2K messages per minute from Scheduler to Consumer, and as more Schedulers are created, the tasks are divided further among them.

The problem:

  1. The Consumer processes 100% of the requests in under 3 seconds
  2. In about 0.5% of the cases, the Scheduler logs show a timeout even though the Consumer processed the request in <2 seconds. I am sure of this because each request has a unique ID.
  3. The Axios timeout is set to 20 seconds
  4. When the errors are logged in the Scheduler, the elapsed time from the beginning of the request until the exception handling is about 45 seconds, so way more than the 20s Axios expiry time
  5. Being inside GCP's local network means, as a premise, that the network itself is not the problem
  6. Scheduler CPU rises but is not necessarily overwhelmed, i.e. less than 80%.
     The Scheduler will get up to 100 tasks, process them 15 at a time (Promise.all()), wait for the responses, then take the next 15, and so on (see the sketch after this list)
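
For reference, a minimal sketch of that batch-of-15 loop as described in item 6 (the Task type, endpoint URL and payload are hypothetical stand-ins, not the real Scheduler code):

import axios from 'axios';
import * as _ from 'lodash';

// Hypothetical stand-ins: the real task shape and Consumer endpoint differ.
type Task = { id: string };

async function processBatches(tasks: Task[], batchSize = 15): Promise<void> {
    // Take up to 100 tasks, split them into chunks of 15, and await each
    // chunk completely before starting the next one (item 6 above).
    for (const chunk of _.chunk(tasks, batchSize)) {
        await Promise.all(
            chunk.map((task) =>
                axios.post('http://consumer:4000/task', task, { timeout: 20000 })
            )
        );
    }
}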

What this scenario suggests to me is that the Scheduler's Node.js process is not able to keep up with the responses the Consumers send back to it, and somehow it is missing some replies.

So, the hypothesis goes like this:

The Scheduler is able to send a high volume of messages but unable to collect the results, because the Axios timeout expires before the event loop is able to process the 200 response from the Consumer.
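
If this hypothesis is right, the lag should be directly measurable on the Scheduler with Node's built-in perf_hooks while the batches are in flight; a minimal sketch (the 5-second reporting interval is an arbitrary choice):

import { monitorEventLoopDelay } from 'perf_hooks';

// Samples event-loop delay at ~20 ms resolution (built in since Node 11.10).
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
    // Histogram values are in nanoseconds; convert to milliseconds.
    console.log({
        meanMs: histogram.mean / 1e6,
        maxMs: histogram.max / 1e6,
    });
    // If maxMs ever approaches the 20s Axios timeout, the hypothesis holds.
    histogram.reset();
}, 5000);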

B) ------ The Tests ------

We could reproduce the problem in the tests. We created a sender and a receiver: the sender just sends an incremental number to the receiver, 100 at a time, 10,000 times in total.

There are several cases where the receiver says it processed a specific task (number) but the sender reports a timeout.

Here is sender and receiver

Sender

import Aigle from 'aigle';
import axios from 'axios';
import * as _ from 'lodash';

async function main(): Promise<void> {
    const TIMEOUT: number = 1000;
    const LIMIT: number = 10000;
    // Items are just the indices 0..LIMIT-1; each becomes one request.
    const items: unknown[] = _.range(LIMIT);
    // Fill in the receiver's address before running.
    let REMOTE_MACHINE: string = '';
    // Records the indices of the requests that failed (timed out).
    const hash: { [key in string]: true } = {};

    if (!REMOTE_MACHINE) throw new Error('No remote machine');

    try {
        // Fire the requests with a concurrency of 100.
        await Aigle
            .resolve(items)
            .mapLimit(100, async (item, index) => {
                try {
                    const response = await axios.get(`http://${REMOTE_MACHINE}:4000/${index}`, { timeout: TIMEOUT });
                    print({ type: 'response', index });
                    return response;
                } catch (err) {
                    hash[index] = true;
                    print({ type: 'err', index });
                    return err;
                }
            });

        console.log('Amount errors', Object.keys(hash).length);

        // A final request with index === LIMIT tells the receiver to dump its report.
        await axios.get(`http://${REMOTE_MACHINE}:4000/${LIMIT}`, { timeout: TIMEOUT });
    } catch (err) {
        console.log({ err });
    }

    function print(item: any) {
        return console.log(item);
    }
}
main();
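
A side note on the test setup: with Node's default HTTP agent, each request may open a fresh TCP connection, so at a concurrency of 100 over 10,000 requests, connection churn alone could contribute to the delays. A hedged variant of the sender's client that reuses sockets, in case that is a factor (the maxSockets value just mirrors the test's concurrency; both values are assumptions):

import http from 'http';
import axios from 'axios';

// Keep-alive agent: reuse TCP connections instead of opening one per request.
const agent = new http.Agent({ keepAlive: true, maxSockets: 100 });
const client = axios.create({ httpAgent: agent, timeout: 1000 });

// Usage: swap the bare axios.get(...) calls in the sender for client.get(...)
// so every request goes through the shared agent.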

Receiver

const express = require('express');
const fs = require('fs');

function main() {
    const app = express();
    let count = 0;
    const LIMIT = 10000;
    // Records every number the receiver has seen.
    const hash = {};

    app.get('/:num?', async (req, res) => {
        // if (req.params.num !== count.toString()) console.log(req.params.num)
        count++;
        hash[req.params.num] = true;
        // The final request (num === LIMIT) triggers the report:
        // `difference` lists the numbers that never arrived.
        if (Number(req.params.num) >= LIMIT) {
            const difference = Array(LIMIT).fill(0).map((_, index) => index).filter((_, index) => !hash[index]);

            console.log({ difference });
            fs.writeFileSync('out', JSON.stringify({ difference, hash }));
        }
        console.log({ count, body: req.params.num });
        res.send({ message: 'Out' });
    });
    app.listen(4000, () => console.log('listening'));
}

main();
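
One thing the receiver cannot distinguish as written is "the handler ran" from "the response bytes actually left the process". Since an Express response is a plain http.ServerResponse, logging its finish event would separate the two; a sketch (the log shape is arbitrary):

import express from 'express';

const app = express();

app.get('/:num?', (req, res) => {
    const started = Date.now();
    // 'finish' fires once the last response byte has been handed to the OS.
    // If a number is logged here but the sender still times out, the delay
    // is on the wire or inside the sender's process, not in the handler.
    res.on('finish', () => {
        console.log({ num: req.params.num, flushedAfterMs: Date.now() - started });
    });
    res.send({ message: 'Out' });
});

app.listen(4000);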

So the questions are:

  1. What could be the reasons the Scheduler falsely reports a timeout?
  2. Is there a safe parallelism threshold that we must respect? Is it bound to parallel processing and/or I/O?
  3. Is there any pattern I should use?
