Time UUID type in pycassa
I'm having problems using the time_uuid type as a key in my column family. I want to store my records and have them ordered by when they were inserted, so I figured that time_uuid would be a good way to go. This is how I've set up my column family:
sys.create_column_family("keyspace", "records", comparator_type=TIME_UUID_TYPE)
When I try to insert, I do this:
import datetime
import pycassa
q = pycassa.ColumnFamily(pycassa.connect("keyspace"), "records")
myKey = pycassa.util.convert_time_to_uuid(datetime.datetime.utcnow())
q.insert(myKey, {'somedata': 'somevalue'})
However, when I insert data, I always get an error:
Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number.
If I change the comparator_type to UTF8_TYPE, it works, but the items are not returned in the order they should be. What am I doing wrong?
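The comparator for a column family is used for ordering the columns within each row. You are seeing that error because 'somedata' is valid UTF-8 but not a valid UUID: with comparator_type=TIME_UUID_TYPE, every column name must be a version 1 UUID (or a datetime or number that pycassa can convert into one).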
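To make the comparator's role concrete, here is a small sketch that uses only Python's standard uuid module. The function time_uuid_sort_key is hypothetical and is not pycassa's or Cassandra's code; it only mimics the two behaviours described above: rejecting a column name such as 'somedata' that cannot be parsed as a UUID, and ordering valid version 1 UUIDs by their embedded timestamp.

```python
import uuid


def time_uuid_sort_key(name):
    """Hypothetical stand-in for a TimeUUID comparator.

    Validates that a column name is a version 1 UUID and returns a key that
    orders names by their embedded timestamp.
    """
    if isinstance(name, str):
        try:
            name = uuid.UUID(name)
        except ValueError:
            # 'somedata' ends up here: it is valid UTF-8 text, not a UUID
            raise ValueError(f"{name!r} is not a valid UUID") from None
    if not isinstance(name, uuid.UUID) or name.version != 1:
        raise ValueError(f"{name!r} is not a version 1 (time-based) UUID")
    # Timestamp first; the raw bytes only break ties (a simplification)
    return (name.time, name.bytes)


# Passing node and clock_seq makes CPython compute the timestamp itself,
# bumping repeated values so each UUID's timestamp is larger than the last
names = [uuid.uuid1(node=0x0123456789AB, clock_seq=0) for _ in range(3)]
shuffled = [names[2], names[0], names[1]]
print(sorted(shuffled, key=time_uuid_sort_key) == names)  # True

try:
    time_uuid_sort_key("somedata")
except ValueError as exc:
    print(exc)  # 'somedata' is not a valid UUID
```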
The ordering of the rows stored in Cassandra is determined by the partitioner. Most likely you are using RandomPartitioner, which distributes load evenly across your cluster but does not allow meaningful range queries: rows are returned in the order of the MD5 hashes of their keys, which is effectively random.
http://wiki.apache.org/cassandra/FAQ#range_rp
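The effect of RandomPartitioner on row order can be simulated in a few lines. This is a sketch under an assumption: it models the token as the MD5 digest of the row key read as a signed 128-bit integer and made non-negative, and the helper names are hypothetical. A full range scan visits rows in token order, so keys inserted chronologically come back shuffled, while plain byte ordering (what ByteOrderedPartitioner uses) would keep them in order.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Hypothetical model of a RandomPartitioner token: the MD5 digest of the
    row key read as a signed big-endian 128-bit integer, made non-negative."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def range_scan_order(keys):
    """Order in which a full range scan would return rows: by token, not by key."""
    return sorted(keys, key=random_partitioner_token)


# Row keys in the order they were inserted (zero-padded, so they also sort lexically)
inserted = [f"2012-01-01T00:00:{s:02d}".encode() for s in range(20)]

# ByteOrderedPartitioner compares raw key bytes, so it would preserve this order
print(sorted(inserted) == inserted)  # True

# Under RandomPartitioner the same rows come back in hash order instead
print(range_scan_order(inserted)[:3])
```

With 20 keys it is vanishingly unlikely that the hash order happens to match insertion order, which is why time-ordered data usually goes into column names instead of row keys.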
The problem is that in your data model, you are using the time as a row key. Although this is possible, you won't get a meaningful ordering unless you also use the ByteOrderedPartitioner.
For this reason, most people store time-ordered data by using the time as a column name rather than a row key. In this model, your insert statement would look something like:
q.insert(someKey, {datetime.datetime.utcnow(): 'somevalue'})
where someKey is a key that relates to the entire time series you're inserting (for example, a username). Note that you don't have to convert the time to a UUID yourself; pycassa does it for you. To store more than a single value per timestamp, use a supercolumn or a composite key.
If you really want to store the time in your row keys, then you need to specify key_validation_class, not comparator_type: comparator_type sets the type of the column names, while key_validation_class sets the type of the row keys. Remember that the rows will not be sorted unless you also use the ByteOrderedPartitioner.
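The time-as-column-name model can be sketched without a Cassandra cluster. Everything below is hypothetical: datetime_to_uuid1 stands in for the datetime-to-UUID conversion that pycassa performs for you (pycassa's real implementation may differ), and TimeSeriesCF is an in-memory stand-in for a column family with a TimeUUID comparator. One row holds the whole time series, and its columns come back in timestamp order regardless of insertion order.

```python
import datetime
import uuid

# Number of 100-ns intervals from the UUID epoch (1582-10-15) to the Unix epoch
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def datetime_to_uuid1(dt, clock_seq=0, node=0):
    """Hypothetical helper: build a version 1 UUID whose timestamp is dt.

    Naive datetimes are treated as UTC. Not pycassa's implementation.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    delta = dt - UNIX_EPOCH
    # Integer arithmetic avoids float rounding when converting to 100-ns ticks
    ticks = (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    ts = ticks + UUID_EPOCH_OFFSET
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi_and_version: version 1
        0x80 | ((clock_seq >> 8) & 0x3F),  # clock_seq_hi_and_reserved: RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        node & 0xFFFFFFFFFFFF,             # node
    ))


class TimeSeriesCF:
    """Hypothetical in-memory stand-in for a column family whose comparator
    is TIME_UUID_TYPE: each row is one time series keyed by UUID column names."""

    def __init__(self):
        self.rows = {}  # row key -> {version 1 UUID column name: value}

    def insert(self, row_key, columns):
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            if isinstance(name, datetime.datetime):
                # Callers may pass datetimes; convert them to time UUIDs
                name = datetime_to_uuid1(name)
            row[name] = value

    def get(self, row_key):
        # Order columns by embedded timestamp, as a TimeUUID comparator would
        return sorted(self.rows[row_key].items(), key=lambda item: item[0].time)


cf = TimeSeriesCF()
start = datetime.datetime(2012, 1, 1, 12, 0, 0)
# "user42" is a made-up row key that groups one user's time series
cf.insert("user42", {start + datetime.timedelta(seconds=2): "third"})
cf.insert("user42", {start: "first"})
cf.insert("user42", {start + datetime.timedelta(seconds=1): "second"})
print([value for _, value in cf.get("user42")])  # ['first', 'second', 'third']
```

Because this sketch uses a fixed clock_seq and node, two values inserted with the same datetime would overwrite each other; real version 1 UUID generators vary the clock sequence and node fields to keep names unique.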