关系型数据库可以利用一致性哈希的方式来做分区表吗?

发布于 2024-11-29 23:28:21 字数 382 浏览 1 评论 0原文

假设我们有一个用户表,按用户 id 作为整数 1,2,3...n 进行分区。我可以使用一致性哈希的方式来对表进行分区吗?

好处是如果分区数量增加或减少,旧索引可以相同。

问题A

使用一致性哈希算法做分区表是不是一个好主意?

问题B

任何关系数据库都内置支持此功能吗?

我猜一些 nosql 数据库已经使用它了。

但这里的数据库指的是关系型数据库。

我刚刚在面试中遇到了这个问题。在第一反应中,我只是回答了按长度取模,但随后受到挑战,如果将表划分为更多部分,那么就会导致问题。

Assume we have a user table to be partitioned by user id as integer 1,2,3...n . Can I use the way of consistent hashing used to partition the table?

The benefit would be if the number of partitions is increased or decreased, old index can be the same.

Question A.

Is it a good idea to use consistent hashing algorithm to do the partition table?

Question B.

Any relational database has this built in supported?

I guess some nosql database already use it.

But database here refer to relational database.

I just encountered this question in an interview. In the first reaction I just answered mod by length, but then be challenged if partitioning the table into more pieces, then it will cause problems.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

心病无药医 2024-12-06 23:28:21

在我研究了一些 wiki 参考页面如 分区(数据库) 之后,

我相信我的想法属于到复合分区

复合分区允许上述分区方案的某些组合,例如首先应用范围分区,然后应用散列分区。 一致散列可以被认为是散列和列表分区的组合,其中散列将密钥空间减少到可以列出的大小。

它还引入了一些概念,例如一致哈希< a href="http://en.wikipedia.org/wiki/Hash_table#Monotonic_keys" rel="nofollow">哈希表

但有些链接如 分区(数据库) 有点旧。如果有人能找到更多最新的参考资料那就更好了。我的回答确实不完整。希望有人能更好的解答!

更新

看起来 Jonathan Ellis 在他的博客中已经提到过,Cassandra 分布式数据库现在支持两种分区方案:传统的一致性哈希方案和保序分区器。
http://spyced.blogspot.com/2009/05 /confirm-hashing-vs-order-preserving.html

来自 Tom White 的博客。一致性哈希的 Java 实现示例

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas;
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
            Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }
}

After I researched some wiki reference pages like Partition (database)

I believe my idea belongs to Composite partitioning .

Composite partitioning allows for certain combinations of the above partitioning schemes, by for example first applying a range partitioning and then a hash partitioning. Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed.

It also introduces some concepts like Consistent hashing, and Hash table

But some link like Partition (database) is kind of old. If some one can find more latest reference that will be better. My answer is incomplete indeed. Hope some one can answer it better!

UPDATE

Looks like Jonathan Ellis already mentioned in his blog, The Cassandra distributed database supports two partitioning schemes now: the traditional consistent hashing scheme, and an order-preserving partitioner.
http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html

From Tom White's blog. A sample implemtation in java of consistent hashing

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas;
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
            Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }
}
﹉夏雨初晴づ 2024-12-06 23:28:21

关于oracle哈希分区,部分来自oracle帮助文档

经过一番研究,oracle实际上确实通过默认的哈希分区支持一致性哈希。尽管它是如何做到的仍然是一个秘密并且没有公开。但它实际上利用了HashMap的方式,只是隐藏了一些分区。因此,当您添加/删除分区时,oracle 调整不同分区中的数据的工作量非常少。该算法仅确保将数据均匀地拆分为 2 的数字幂(例如 4)的分区。因此,如果不是,则合并/拆分一些分区。

神奇的是,如果从四个分区增加到五个,它实际上将一个分区分成两个。如果从四个分区减少到三个,实际上就是将两个分区合并为一个。

如果有人有更多见解,请添加更详细的答案。

哈希分区
哈希分区根据 Oracle 应用于您识别的分区键的哈希算法将数据映射到分区。哈希算法在分区之间均匀分布行,使分区的大小大致相同。

哈希分区是跨设备均匀分布数据的理想方法。哈希分区也是范围分区的一种易于使用的替代方案,特别是当要分区的数据不是历史数据或没有明显的分区键时。

注意:

您无法更改分区使用的哈希算法。

关于MYSQL哈希分区,部分来自mysql帮助文档

它提供了两种分区功能
一种是HASH分区。
另一种是按KEY分区。

按键分区与按哈希分区类似,不同之处在于哈希分区采用用户定义的表达式,用于键分区的哈希函数由 MySQL 服务器提供。 MySQL Cluster 使用MD5()来实现此目的;对于使用其他存储引擎的表,服务器使用自己的内部哈希函数,该函数基于与 PASSWORD() 相同的算法。
CREATE TABLE ... PARTITION BY KEY 的语法规则与创建按哈希分区的表的语法规则类似。

主要区别如下:

• 使用KEY 而不是HASH。

•KEY 仅采用一个或多个列名称的列表。从 MySQL 5.1.5 开始,用作分区键的一列或多列必须包含表的部分或全部主键(如果表有主键)。

CREATE TABLE k1 (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;

如果没有主键但有唯一键,则使用唯一键作为分区键:

CREATE TABLE k1 (
    id INT NOT NULL,
    name VARCHAR(20),
    UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 2;

但是,如果唯一键列未定义为 NOT NULL,则前面的语句将失败。

但它没有告诉它如何分区,必须研究代码。

About oracle hash partition, part from oracle help doc

After some research, oracle actually do support consistent hashing by the default hash partitioning. Though how it did is a secret and not published. But it actually leverage the way HashMap, but hidden some partitions. So when you add/remove partition, very less work for oracle to adjust the data in different partitions. The algorithms only ensures evenly splitting data into partitions of numbers power of 2 such as 4. So if it's not, then merge/split some partitions.

The magic is like if to increase from four partitions to five, it actually spilts one partition into two. If to decrease from four partitions into three, it actually merges two partitions into one.

If anyone has more insight, add a more detailed answer.

Hash Partitioning
Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.

Hash partitioning is the ideal method for distributing data evenly across devices. Hash partitioning is also an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical or has no obvious partitioning key.

Note:

You cannot change the hashing algorithms used by partitioning.

About MYSQL hash partition, part from mysql help doc

It provides two partition function
One is partition by HASH.
The other is partition by KEY.

Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. MySQL Cluster uses MD5() for this purpose; for tables using other storage engines, the server employs its own internal hashing function which is based on the same algorithm as PASSWORD().
The syntax rules for CREATE TABLE ... PARTITION BY KEY are similar to those for creating a table that is partitioned by hash.

The major differences are listed here:

•KEY is used rather than HASH.

•KEY takes only a list of one or more column names. Beginning with MySQL 5.1.5, the column or columns used as the partitioning key must comprise part or all of the table's primary key, if the table has one.

CREATE TABLE k1 (
    id INT NOT NULL PRIMARY KEY,
    name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;

If there is no primary key but there is a unique key, then the unique key is used for the partitioning key:

CREATE TABLE k1 (
    id INT NOT NULL,
    name VARCHAR(20),
    UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 2;

However, if the unique key column were not defined as NOT NULL, then the previous statement would fail.

But it don't tell how it partitions, will have to look into code.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文