MongoDB secondaries not keeping up with the primary



I have a replica set, and I am trying to upgrade the primary to a server with more memory and more disk space. So I set up a RAID across a couple of disks on the new server, rsync'd the data over from a secondary, and added it to the replica set. After checking rs.status(), I noticed that all of the secondaries are about 12 hours behind the primary. So when I try to force the new server into the primary spot it won't work, because it is not up to date.
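For context, "forcing" the new server into the primary spot would look roughly like the shell steps below. This is only a sketch: the hostname and member index are placeholders, it assumes a server version that allows priorities above 1, and the member cannot win an election until it has caught up to the other members' optimes.

// Add the new server as a secondary (placeholder address).
rs.add("<new-host>:27018")

// Give the new member the highest priority so it is preferred in elections.
cfg = rs.conf()
cfg.members[4].priority = 2   // assumed index of the new member -- verify against rs.conf()
rs.reconfig(cfg)              // only takes effect as primary once the member has caught up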

This seems like a big issue, because if the primary fails we are at least 12 hours behind, and in some cases almost 48 hours behind.

The oplogs all overlap and the oplog size is fairly large. The only thing I can figure is that I am performing a lot of writes/reads on the primary, which could keep the server locked and not allow the secondaries to catch up properly.
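One way to confirm the oplog window and the per-member lag from the mongo shell (a quick sketch; these helpers report the data behind the numbers shown below):

db.printReplicationInfo()       // oplog size and the time range it currently covers
db.printSlaveReplicationInfo()  // how far each secondary's optime is behind the primary
// On newer servers the same helpers are also exposed as rs.printReplicationInfo()
// and rs.printSecondaryReplicationInfo().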

Is there any way to force a secondary to catch up to the primary?

There are currently 5 servers; the last 2 are meant to replace 2 of the other nodes. The node with _id 6 is the one that is to replace the primary. The node that is furthest from the primary's optime is a little over 48 hours behind.

{
"set" : "gryffindor",
"date" : ISODate("2011-05-12T19:34:57Z"),
"myState" : 2,
"members" : [
    {
        "_id" : 1,
        "name" : "10******:27018",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 20231,
        "optime" : {
            "t" : 1305057514000,
            "i" : 31
        },
        "optimeDate" : ISODate("2011-05-10T19:58:34Z"),
        "lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
    },
    {
        "_id" : 2,
        "name" : "10******:27018",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 20231,
        "optime" : {
            "t" : 1305056009000,
            "i" : 400
        },
        "optimeDate" : ISODate("2011-05-10T19:33:29Z"),
        "lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
    },
    {
        "_id" : 3,
        "name" : "10******:27018",
        "health" : 1,
        "state" : 1,
        "stateStr" : "PRIMARY",
        "uptime" : 20229,
        "optime" : {
            "t" : 1305228858000,
            "i" : 422
        },
        "optimeDate" : ISODate("2011-05-12T19:34:18Z"),
        "lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
    },
    {
        "_id" : 5,
        "name" : "10*******:27018",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 20231,
        "optime" : {
            "t" : 1305058009000,
            "i" : 226
        },
        "optimeDate" : ISODate("2011-05-10T20:06:49Z"),
        "lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
    },
    {
        "_id" : 6,
        "name" : "10*******:27018",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "optime" : {
            "t" : 1305050495000,
            "i" : 384
        },
        "optimeDate" : ISODate("2011-05-10T18:01:35Z"),
        "self" : true
    }
],
"ok" : 1
}
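For reference, the lag figures above can be derived directly from rs.status(); a minimal shell sketch:

var status = rs.status();
var primary = status.members.filter(function (m) { return m.state === 1; })[0];
status.members.forEach(function (m) {
    // optimeDate subtraction yields milliseconds; convert to hours.
    var lagHours = (primary.optimeDate - m.optimeDate) / 1000 / 3600;
    print(m.name + " (" + m.stateStr + "): " + lagHours.toFixed(1) + " hours behind");
});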


Comments (2)

歌入人心 2024-11-13 19:02:54


I'm not sure why the syncing has failed in your case, but one way to brute-force a resync is to remove the data files on the replica and restart the mongod. It will initiate a resync. See http://www.mongodb.org/display/DOCS/Halted+Replication. It is likely to take quite some time, depending on the size of your database.
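Roughly, the procedure looks like this (a sketch; the dbpath is a placeholder, and it assumes the remaining members can carry the load while this member resyncs):

// 1. On the lagging secondary, shut mongod down cleanly.
db.getSiblingDB("admin").shutdownServer()

// 2. Outside the shell, with mongod stopped, move or delete everything
//    under that member's dbpath (placeholder: /data/db).

// 3. Restart mongod with the same --replSet name and the now-empty dbpath;
//    it rejoins the set and performs a full initial sync from another member.

// 4. Monitor progress from any member:
rs.status()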

宛菡 2024-11-13 19:02:54


After looking through everything I saw a single error, which led me back to a MapReduce that was run on the primary and had hit this issue: https://jira.mongodb.org/browse/SERVER-2861. So when replication was attempted, it failed to sync because of a faulty/corrupt operation in the oplog.
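If you need to track down which operation a secondary is choking on, one option (a sketch, using the legacy shell's Timestamp(seconds, increment) form; the values are converted from member _id 6's optime in the rs.status() output above) is to read the primary's oplog starting at the secondary's last applied optime:

// Run on the primary. Timestamp takes seconds, so 1305050495000 ms -> 1305050495 s.
var oplog = db.getSiblingDB("local").oplog.rs;
oplog.find({ ts: { $gte: new Timestamp(1305050495, 384) } })
     .sort({ $natural: 1 })
     .limit(5)
     .forEach(printjson);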
