Are COMB GUIDs a good fit for Rails 3.1 if I use GUIDs as primary keys?
I'm using Rails 3.1 with PostgreSQL 8.4. Let's assume I want/need to use GUID primary keys. One potential drawback is index fragmentation. In MS SQL, a recommended solution for that is to use special sequential GUIDs. One approach to sequential GUIDs is the COMBination GUID that substitutes a 6-byte timestamp for the MAC address portion at the end of the GUID. This has some mainstream adoption: COMBs are available natively in NHibernate (NHibernate/Id/GuidCombGenerator.cs).
I think I've figured out how to create COMB GUIDs in Rails (with the help of the UUIDTools 2.1.2 gem), but it leaves some unanswered questions:
- Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?
- Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?
- Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?
Thanks for your thoughts.
create_contacts.rb
migration
class CreateContacts < ActiveRecord::Migration
  def up
    create_table :contacts, :id => false do |t|
      t.column :id, :uuid, :null => false # manually create :id with underlying DB type UUID
      t.string :first_name
      t.string :last_name
      t.string :email
      t.timestamps
    end
    execute "ALTER TABLE contacts ADD PRIMARY KEY (id);"
  end

  # Can't use reversible migration because it will try to run 'execute' again
  def down
    drop_table :contacts # also drops primary key
  end
end
/app/models/contact.rb
class Contact < ActiveRecord::Base
  require 'uuid_helper' # Rails 3 does not autoload from lib/*
  include UUIDHelper
  set_primary_key :id
end
/lib/uuid_helper.rb
require 'uuidtools'

module UUIDHelper
  def self.included(base)
    base.class_eval do
      include InstanceMethods
      attr_readonly :id # writable only on a new record
      before_create :set_uuid
    end
  end

  module InstanceMethods
    private

    def set_uuid
      # MS SQL syntax: CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)
      # Get current Time object
      utc_timestamp = Time.now.utc
      # Convert to integer with milliseconds: (Seconds since Epoch * 1000) + (6-digit microsecond fraction / 1000)
      utc_timestamp_with_ms_int = (utc_timestamp.tv_sec * 1000) + (utc_timestamp.tv_usec / 1000)
      # Format as hex, minimum of 12 digits, with leading zeros. Note that 12 hex digits handles to year 10889 (*).
      utc_timestamp_with_ms_hexstring = "%012x" % utc_timestamp_with_ms_int
      # If we supply UUIDTools with a MAC address, it will use that rather than retrieving it from the system.
      # Note that this setter is class-level, so the value stays in place for later calls.
      # Use a regular expression to split into an array of byte pairs, then join with ":" so it "looks" like a MAC address.
      UUIDTools::UUID.mac_address = utc_timestamp_with_ms_hexstring.scan(/.{2}/).join(":")
      # Generate Version 1 UUID (see RFC 4122).
      comb_guid = UUIDTools::UUID.timestamp_create.to_s
      # Assign generated COMBination GUID to .id
      self.id = comb_guid
      # (*) A note on maximum time handled by 6-byte timestamp that includes milliseconds:
      # If utc_timestamp_with_ms_hexstring = "FFFFFFFFFFFF" (12 F's), then
      # Time.at(Float(utc_timestamp_with_ms_hexstring.hex)/1000).utc.iso8601(10) = "10889-08-02T05:31:50.6550292968Z".
    end
  end
end
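For comparison, here is a minimal sketch that follows the MS SQL expression quoted in the comment more literally: 10 random bytes followed by a 6-byte millisecond timestamp. It uses only Ruby's standard SecureRandom and does not assign the class-level UUIDTools::UUID.mac_address. The module name CombGuid and its generate method are hypothetical, and the result is a GUID-shaped string rather than an RFC 4122 version 1 UUID, so treat it as an illustration under those assumptions.

```ruby
require 'securerandom'

# Hypothetical helper (not part of UUIDTools or Rails): builds a COMB-style
# GUID from 10 random bytes followed by a 6-byte millisecond timestamp.
module CombGuid
  def self.generate(time = Time.now.utc)
    # Milliseconds since the Unix epoch, masked to 48 bits (6 bytes).
    millis = (time.tv_sec * 1000 + time.tv_usec / 1000) & 0xFFFFFFFFFFFF
    random_hex = SecureRandom.hex(10) # 20 hex digits of randomness
    time_hex   = "%012x" % millis     # 12 hex digits of timestamp
    hex = random_hex + time_hex       # 32 hex digits in total
    # Format as 8-4-4-4-12, the usual GUID text layout.
    [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
  end
end

puts CombGuid.generate
```

In a model this could be called from a before_create callback in place of set_uuid. Whether the trailing timestamp actually yields sequential index inserts depends on which bytes the database compares first, so that is worth verifying against PostgreSQL before relying on it.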
Comments (1)
Yes, that is to be expected. But if you're going to use the COMB strategy, it won't happen. The rows will always be in order (that's not entirely true, but bear with me).
Also, the performance difference between the native PostgreSQL UUID type and VARCHAR is not that large. That is another point to consider.
In my tests I found that UUID1 (RFC 4122) is sequential, since there is already a timestamp in the generated UUID. But yes, adding a timestamp in the last 6 bytes will ensure that ordering. That's what I did anyway, because apparently the timestamp already present is no guarantee of order. More about COMB here
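One way to see why the timestamp already inside a version 1 UUID is no guarantee of order: RFC 4122 places the low 32 bits of the 60-bit timestamp first (time_low), followed by time_mid and then time_hi_and_version. A left-to-right comparison is therefore dominated by the fastest-changing bits, which wrap around roughly every seven minutes. The sketch below lays out only those three fields; the helper name v1_time_fields and the sample timestamps are made up for illustration, and it assumes the comparison runs left to right over the text form.

```ruby
# Hypothetical helper: formats only the three timestamp fields of a
# version 1 UUID (time_low, time_mid, time_hi_and_version) for a given
# 60-bit timestamp.
def v1_time_fields(timestamp)
  time_low            = timestamp & 0xFFFFFFFF                  # low 32 bits come first
  time_mid            = (timestamp >> 32) & 0xFFFF              # next 16 bits
  time_hi_and_version = ((timestamp >> 48) & 0x0FFF) | 0x1000   # high 12 bits plus version 1
  "%08x-%04x-%04x" % [time_low, time_mid, time_hi_and_version]
end

# Two made-up timestamps one tick apart, chosen so that time_low wraps around.
earlier = (0x1e0 << 48) | (0x1234 << 32) | 0xFFFFFFFF
later   = earlier + 1

puts v1_time_fields(earlier) # ffffffff-1234-11e0
puts v1_time_fields(later)   # 00000000-1235-11e0

# The later timestamp sorts before the earlier one as a string.
puts v1_time_fields(later) < v1_time_fields(earlier) # true
```

Within a single time_low cycle the values do increase, which would be consistent with version 1 UUIDs looking sequential in a short test.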
I'm not using Rails, but I'll show you how I did it in Django:
Where node is a 48-bit positive integer identifying the hardware address.
About your implementation: one of the main advantages of using UUIDs is that you can safely generate them outside the database, so using a helper class is a valid way to do it. You can always use an external service such as Snowflake for UUID generation, but that may be premature optimization at this point.