Migrating from MySQL to Cassandra

Posted 2024-11-01 22:03:46

Previously I used the alphaID class found here to convert a userID into a short, random-looking string.

From his blog:

Running:

alphaID(9007199254740989);

will return 'PpQXn7COf' and:

alphaID('PpQXn7COf', true);

will return '9007199254740989'
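
For readers without the linked class at hand, the general idea (a reversible mapping between an integer and a short string) can be sketched as a plain base-62 conversion. This is only an illustration and not the alphaID implementation: the function names `encode_id` and `decode_id` and the alphabet order are assumptions, so the strings it produces will differ from the 'PpQXn7COf' example above.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_id(number: int) -> str:
    """Turn a non-negative integer into a short base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were collected least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_id(text: str) -> int:
    """Invert encode_id: turn a base-62 string back into the integer."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number
```

Such an encoding is reversible by anyone who knows the alphabet, so it shortens the ID rather than hiding it.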

So the idea was that users could visit www.mysite.com/user/PpQXn7COf, and I would convert that back to a normal integer so that I could run this in MySQL:

"Select * from Users where userID=".alphaID('PpQXn7COf', true)

Now I have just started working with Cassandra and I'm looking for a replacement.

  1. I want URLs like www.mysite.com/user/PpQXn7COf, not like www.mysite.com/user/username1.
  2. The "PpQXn7COf" uuid must be as short as possible.

In the Twissandra example explained here: http://www.rackspace.com/cloud/blog/2010/05/12/cassandra-by-example/

They create a long UUID (I guess it is so long because that makes it almost 100 percent certain to be random).

In MySQL I just had a userID column with auto-increment, so when I used the alphaID() function I always got a very short, random-looking string.

Does anyone have an idea how to solve this as cleanly as possible?


Edit:

It is used for a social media site, so it must be persistent.
That's also why I don't want to use usernames or real names in URLs: users couldn't stay undetected by Google if they needed to.

I just had a simple idea, but I don't know how scalable it is:

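<?php
// createUUID() makes a roughly 14-char string (A-Z, a-z, 0-9) based on micro/milli/nanoseconds
// get_count($uuid) returns how many existing users already have that uuid
do {
  $uuid = createUUID();
} while (get_count($uuid) > 0); // retry until the uuid is unused

// insert username, pass, $uuid etc. into cassandra; $result holds the outcome
if ($result == "1") {
    header('Location: http://www.mysite.com/usercenter');
} else {
    echo "error";
}
?>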

When this grows to the size of, say, Twitter or Facebook:

  1. Will it execute in acceptable time?
  2. Will it still generate unique uuids fast enough that it doesn't get clogged up if 10000 users per second are registering?

Comments (1)

〃温暖了心ぐ 2024-11-08 22:03:46

Auto-increments are not suitable for a robust distributed system. You can only assign a unique ID if every node in your system is available, to ensure it's unique.

You can, of course, invent your own unique-ID generator, but you must then ensure that it generates unique IDs anywhere in your infrastructure.

For example, each node can keep a file holding a counter that it simply increments (with suitable locking, etc.), but you will also need to ensure that nodes don't clash, for instance by including the server ID in the generation algorithm.

This may be operationally nontrivial - your ops engineers will need to ensure that all the servers in the infrastructure are configured correctly with their own ID generators set up so that they don't generate the same ID. However, it's possible.

UUIDs are the reasonable alternative, because for all practical purposes they will be unique.

A UUID is 128 bits; if we store 6 bits per character (i.e. base64) then that takes 22 characters, which is quite a long URI. If you want it shorter, you will need to generate unique IDs a different way.
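
The 22-character figure can be checked with a small sketch using Python's standard `uuid` and `base64` modules. The helper names `uuid_to_slug` and `slug_to_uuid` are made up for this illustration, and the URL-safe base64 alphabet is an assumption about what would be acceptable in a URI.

```python
import base64
import uuid


def uuid_to_slug(value: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe base64 and drop the '==' padding."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def slug_to_uuid(slug: str) -> uuid.UUID:
    """Restore the padding and decode the slug back into a UUID."""
    padded = slug + "=" * (-len(slug) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


slug = uuid_to_slug(uuid.uuid4())
print(slug, len(slug))  # the length is always 22
```

The 16 bytes encode to 24 base64 characters, two of which are padding, which leaves the 22 characters mentioned above.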

Plus it all depends on "how unique" you actually need your IDs to be. If your IDs can safely be reused after a few months, you can probably do it in < 60 bits (depending also on the number of servers in your infrastructure, and how frequently you need to generate them).

We use

  • Server ID
  • Time (granularity = 2 seconds), but wraps after a few months
  • A per-server counter (which wraps frequently, but not within 2 seconds)

and stick all the bits together. This generates an ID that is shorter than 64 bits but is guaranteed to be unique for as long as it needs to be (which in our case is only a couple of months).
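
A minimal sketch of such a scheme follows. It is not the answerer's actual code: the class name `ShortIdGenerator` and all three bit widths are hypothetical, chosen only so that the arithmetic is easy to follow (a 23-bit slot counter at 2-second granularity wraps after about 194 days).

```python
import threading
import time

# Hypothetical bit widths, chosen only for illustration.
SERVER_BITS = 10    # up to 1024 servers
TIME_BITS = 23      # 2**23 two-second slots, about 194 days before wrapping
COUNTER_BITS = 16   # up to 65536 IDs per server per two-second slot


class ShortIdGenerator:
    def __init__(self, server_id: int, clock=time.time):
        if not 0 <= server_id < (1 << SERVER_BITS):
            raise ValueError("server_id out of range")
        self.server_id = server_id
        self.clock = clock          # injectable so the generator can be tested
        self.counter = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            # Time in 2-second slots, wrapping after 2**TIME_BITS slots.
            slot = (int(self.clock()) // 2) % (1 << TIME_BITS)
            # Per-server counter that wraps; it must not wrap within one slot.
            count = self.counter
            self.counter = (self.counter + 1) % (1 << COUNTER_BITS)
        # Stick the three fields together: server | time slot | counter.
        return (
            (self.server_id << (TIME_BITS + COUNTER_BITS))
            | (slot << COUNTER_BITS)
            | count
        )
```

With these widths an ID is 49 bits long, which fits in at most 9 base-62 characters. The sketch only stays unique while each server issues fewer than 65536 IDs per 2-second slot and its clock does not move backwards.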


Our algorithm will malfunction and generate a duplicate ID if:

  • The system clock on one of our nodes goes backwards by the same amount of time in which the counter wraps.
  • Our operations engineers make a mistake and assign the same server ID to two servers.
  • Eventually, after about 9 months.