If I use GUIDs as primary keys, are COMB GUIDs a good fit for Rails 3.1?

Posted on 2024-12-09 10:20:12

I'm using Rails 3.1 with PostgreSQL 8.4. Let's assume I want/need to use GUID primary keys. One potential drawback is index fragmentation. In MS SQL, a recommended solution for that is to use special sequential GUIDs. One approach to sequential GUIDs is the COMBination GUID that substitutes a 6-byte timestamp for the MAC address portion at the end of the GUID. This has some mainstream adoption: COMBs are available natively in NHibernate (NHibernate/Id/GuidCombGenerator.cs).

I think I've figured out how to create COMB GUIDs in Rails (with the help of the UUIDTools 2.1.2 gem), but it leaves some unanswered questions:

  • Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?
  • Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?
  • Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?

Thanks for your thoughts.


create_contacts.rb migration

class CreateContacts < ActiveRecord::Migration
  def up
    create_table :contacts, :id => false do |t|
      t.column :id, :uuid, :null => false # manually create :id with underlying DB type UUID
      t.string :first_name
      t.string :last_name
      t.string :email

      t.timestamps
    end
    execute "ALTER TABLE contacts ADD PRIMARY KEY (id);"
  end

  # Can't use a reversible migration because it would try to run 'execute' again
  def down
    drop_table :contacts # also drops primary key
  end
end

/app/models/contact.rb

class Contact < ActiveRecord::Base
  require 'uuid_helper' #rails 3 does not autoload from lib/*
  include UUIDHelper

  set_primary_key :id
end

/lib/uuid_helper.rb

require 'uuidtools'

module UUIDHelper
  def self.included(base)
    base.class_eval do
      include InstanceMethods
      attr_readonly :id       # writable only on a new record
      before_create :set_uuid
    end
  end

  module InstanceMethods
  private
    def set_uuid
      # MS SQL syntax:  CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

      # Get current Time object
      utc_timestamp = Time.now.utc

      # Convert to integer with milliseconds:  (Seconds since Epoch * 1000) + (6-digit microsecond fraction / 1000)
      utc_timestamp_with_ms_int = (utc_timestamp.tv_sec * 1000) + (utc_timestamp.tv_usec / 1000)

      # Format as hex, minimum of 12 digits, with leading zero.  Note that 12 hex digits handles to year 10889 (*).
      utc_timestamp_with_ms_hexstring = "%012x" % utc_timestamp_with_ms_int

      # If we supply UUIDTOOLS with a MAC address, it will use that rather than retrieving from system.
      # Use a regular expression to split into array, then insert ":" characters so it "looks" like a MAC address.
      UUIDTools::UUID.mac_address = utc_timestamp_with_ms_hexstring.scan(/.{2}/).join(":")

      # Generate Version 1 UUID (see RFC 4122).
      comb_guid = UUIDTools::UUID.timestamp_create().to_s 

      # Assign generated COMBination GUID to .id
      self.id = comb_guid

      # (*) A note on maximum time handled by 6-byte timestamp that includes milliseconds:
      # If utc_timestamp_with_ms_hexstring = "FFFFFFFFFFFF" (12 F's), then 
      # Time.at(Float(utc_timestamp_with_ms_hexstring.hex)/1000).utc.iso8601(10) = "10889-08-02T05:31:50.6550292968Z".
    end
  end
end
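
One thing to watch in the helper above: as far as I can tell, `UUIDTools::UUID.mac_address=` sets a value on the class itself, so the fake MAC address is shared by every caller in the process. A minimal sketch of an alternative that avoids shared state, using only Ruby's standard library, builds the GUID the same way the MS SQL expression in the comment does: 10 random bytes followed by a 6-byte millisecond timestamp. `CombGuid` and `generate` are hypothetical names, not part of UUIDTools or Rails, and the result does not set the RFC 4122 version or variant bits.

```ruby
require 'securerandom'

# Hypothetical helper, not part of UUIDTools or Rails.
module CombGuid
  # Returns a 36-character GUID string: 10 random bytes followed by a
  # 6-byte millisecond timestamp, most significant byte first.
  def self.generate(time = Time.now.utc)
    millis     = (time.tv_sec * 1000) + (time.tv_usec / 1000)
    random_hex = SecureRandom.hex(10)   # 20 hex digits
    time_hex   = "%012x" % millis       # 12 hex digits, zero padded
    hex        = random_hex + time_hex
    # Insert dashes in the usual 8-4-4-4-12 layout.
    [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
  end
end

puts CombGuid.generate
```

In the model this could replace the body of `set_uuid` with `self.id = CombGuid.generate`, assuming the column accepts any well-formed 36-character UUID string.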

Comments (1)

我不是你的备胎 2024-12-16 10:20:12
  • Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?

Yes, that is to be expected. But if you use the COMB strategy it won't happen: the rows will always be in order (that's not entirely true, but bear with me).

Also, the performance difference between the native PostgreSQL UUID type and VARCHAR is not that large, which is another point to consider.

  • Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?

In my tests I found that UUID1 (RFC 4122) is already sequential, because the generated UUID includes a timestamp. But yes, adding a timestamp in the last 6 bytes will reinforce that ordering. That's what I did anyway, because apparently the timestamp already present is no guarantee of order. More about COMB here
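
Whether a trailing timestamp actually keeps index inserts sequential depends on how the database compares UUID values. As far as I know, PostgreSQL compares `uuid` values byte by byte starting from the first byte, whereas SQL Server's `uniqueidentifier` ordering gives the most weight to the last 6 bytes; that is worth verifying for your version. A minimal sketch of the difference, with made-up GUIDs and a hypothetical `trailing_key` helper:

```ruby
# Made-up COMB-style GUIDs: random leading bytes, increasing trailing timestamp.
guids = [
  "f3a1c2d4-0000-4000-8000-018bcfe56800",  # generated first
  "0b7e9a11-0000-4000-8000-018bcfe56801",
  "9c2d4e6f-0000-4000-8000-018bcfe56802"   # generated last
]

# Hypothetical helper: sort key that looks only at the last 6 bytes.
def trailing_key(guid)
  guid[-12, 12]
end

# Byte-wise comparison from the first byte (plain string order for lowercase hex).
leading_order  = guids.sort
# Comparison that gives priority to the trailing 6 bytes.
trailing_order = guids.sort_by { |g| trailing_key(g) }

puts leading_order.inspect   # order follows the random leading bytes
puts trailing_order.inspect  # order follows generation time
```

Under byte-wise ordering from the first byte, it is a timestamp in the leading bytes that would keep new keys at the end of the index.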

  • Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?

I'm not using Rails, but I'll show you how I did it in Django:

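import time
import uuid

def uuid1_comb(obj):
    # obj is unused. The current time in milliseconds is passed as the
    # 48-bit node value, so the last 6 bytes of the UUID grow over time.
    return uuid.uuid1(node=int(time.time() * 1000))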

Where node is a 48-bit positive integer identifying the hardware address.
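
For the Rails side, the same idea can be sketched in Ruby with only the standard library: the millisecond timestamp has to fit in 48 bits to be used as the node value, and it can be read back from the last 12 hex digits of the GUID. `MAX_NODE`, `node_from_time` and `time_from_guid` are hypothetical names used only for illustration.

```ruby
MAX_NODE = (1 << 48) - 1   # largest value that fits in 6 bytes

# Hypothetical helper: millisecond timestamp to use as the 48-bit node value.
def node_from_time(time)
  millis = (time.tv_sec * 1000) + (time.tv_usec / 1000)
  unless (0..MAX_NODE).cover?(millis)
    raise ArgumentError, "timestamp does not fit in 48 bits"
  end
  millis
end

# Hypothetical helper: recover the UTC time from a GUID's last 12 hex digits.
def time_from_guid(guid)
  millis = guid[-12, 12].hex
  # Time.at takes whole seconds plus microseconds.
  Time.at(millis / 1000, (millis % 1000) * 1000).utc
end

t    = Time.at(1_700_000_000, 123_000).utc   # 123 ms past the second
node = node_from_time(t)
guid = "00000000-0000-1000-8000-" + ("%012x" % node)   # made-up leading bytes
puts time_from_guid(guid) == t
```

Being able to decode the timestamp is handy for debugging, but it also means the creation time of each row is visible to anyone who sees the key.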

About your implementation: one of the main advantages of UUIDs is that you can safely generate them outside the database, so using a helper class is a valid way to do it. You can always use an external service for UUID generation, such as snowflake, but that may be premature optimization at this point.
