The valueless column technique in Cassandra - database schema

Posted on 2024-11-27 03:18:11

I'm using Cassandra 0.8.2

I am attempting to use the "valueless column" technique to set up my Cassandra schema. The idea behind a valueless column is the following: the name of the column carries the relevant information, and the value of the "name/value" pair is empty. This is used to make queries faster, and is an example of denormalization. I want the name of the column to be the URL of the back link. The row key is a UUID of the target URL of the back link. Is this even a good idea/schema design?

I'm using a very basic example to get the point of my question across. Here's what I have set up using the Cassandra-Cli:

create column family ArticleBackLinks 
with comparator = UTF8Type
and key_validation_class = UTF8Type
and default_validation_class = UTF8Type
and column_metadata = 
[
{column_name: www.arstechnica.com, validation_class: UTF8Type},        
{column_name: www.apple.com, validation_class:UTF8Type},         
{column_name: www.cnn.com, validation_class: UTF8Type},      
{column_name: www.stackoverflow.com, validation_class: UTF8Type}, 
{column_name: www.reddit.com, validation_class: UTF8Type}
];

I get the error:

Command not found: `create column family ArticleBackLink...

I think my error is due to the period I am using in the column_name. In short, I would like to know if some of you have come across better ways to use the "valueless column" idea in Cassandra? Any good/better examples of the valueless column technique? Is my idea even the right way to use the valueless column technique?

Thanks in advance guys.

绳情 2024-12-04 03:18:11

I think Cassandra does not like the unquoted dot in column_name. The following, with the column names in single quotes, works:

[default@stackoverflow] create column family ArticleBackLinks with
...     comparator = UTF8Type and
...     default_validation_class = UTF8Type and
...     column_metadata =
...     [
...     {column_name: 'www.arstechnica.com', validation_class: UTF8Type},
...     {column_name: 'www.apple.com', validation_class:UTF8Type},
...     {column_name: 'www.cnn.com', validation_class: UTF8Type},
...     {column_name: 'www.stackoverflow.com', validation_class: UTF8Type},
...     {column_name: 'www.reddit.com', validation_class: UTF8Type}
...     ];
881b31f0-bc64-11e0-0000-242d50cf1ff7
Waiting for schema agreement...
... schemas agree across the cluster

By the way, since you are using Cassandra 0.8.2, you should take advantage of CQL.

A statement like this will be helpful in the future:

UPDATE <COLUMN FAMILY> [USING <CONSISTENCY> 
[AND TIMESTAMP <timestamp>] [AND TTL <timeToLive>]] 
SET name1 = value1, name2 = value2 WHERE <KEY> = keyname;

Refer to this.

Update: added more thoughts, as requested in the comments.

It's a good idea to keep grouped information in one place. It builds on the efficiency that Cassandra provides.

For example, in your case the category can be the row key and the URLs can be the column names. On your front end, you can then display a categorized view quickly, because you know that arstechnica and stackoverflow come under the technology group, which is a row key. This adds a tiny bit of extra work when you insert data.
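
As a rough illustration of that layout, here is a minimal in-memory Python sketch. It is not Cassandra client code: it only models a column family as a dict of rows, where each row maps column names (the URLs) to empty values. The category names and the helpers `add_backlink` and `urls_in` are hypothetical, made up for this example.

```python
# In-memory model of a "valueless column" layout; not a Cassandra client.
# Row key = category, column name = URL, column value = always empty.

column_family = {}  # row key -> {column name -> value}


def add_backlink(cf, category, url):
    """Insert a valueless column: the URL itself is the data."""
    cf.setdefault(category, {})[url] = ""


def urls_in(cf, category):
    """Return the column names of one row in sorted order.

    For plain ASCII names this matches the order a UTF8Type comparator gives.
    """
    return sorted(cf.get(category, {}))


add_backlink(column_family, "technology", "www.stackoverflow.com")
add_backlink(column_family, "technology", "www.arstechnica.com")
add_backlink(column_family, "news", "www.cnn.com")
# Re-inserting the same column name just overwrites it; no duplicate appears.
add_backlink(column_family, "technology", "www.arstechnica.com")

print(urls_in(column_family, "technology"))
```

Reading one row returns every URL in that category in a single lookup, which is the point of putting the information in the column name.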

I use Cassandra 0.6.x, so sadly I can't say much about the secondary indexes that Cassandra 0.7.0+ supports. Supposedly, though, you can achieve what is explained above by adding a column, say category, to the main CF, whose index is held by ArticleBackLink, and then just querying with CQL's select ... where ....

You might look into secondary indexes, which might remove the need for a separate 'index CF'. You may want to look into these:

Secondary Index in Cassandra 0.7
Cassandra Wiki FAQ
Q: Is there a difference between creating a secondary index vs creating an "index" CF manually such as "users_by_country"?
A: Yes. First, when creating your own index, a node may index data held by another node. Second, updates to the index and data are not atomic.
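
To make the second point of that FAQ answer concrete, here is a minimal in-memory Python sketch of a hand-maintained index CF. Again, this is not Cassandra client code; the names `articles`, `articles_by_category`, and `insert_article` are hypothetical. Keeping the index current takes two separate writes, and a reader that arrives between them sees the data row without its index entry.

```python
# In-memory model of a hand-maintained "index" column family.
# Both dicts stand in for column families; all names are made up for illustration.

articles = {}              # row key (url) -> {column name -> value}
articles_by_category = {}  # row key (category) -> {url -> ""} (valueless columns)


def insert_article(url, category, between_writes=None):
    """Write the data row, then the index row, as two independent steps."""
    articles[url] = {"category": category}
    if between_writes is not None:
        between_writes()  # simulates another client reading at this moment
    articles_by_category.setdefault(category, {})[url] = ""


seen = []


def reader():
    # Record what the index holds while the second write is still pending.
    seen.append(sorted(articles_by_category.get("technology", {})))


insert_article("www.arstechnica.com", "technology", between_writes=reader)

print(seen)  # the index as seen mid-update
print(sorted(articles_by_category["technology"]))  # the index after both writes
```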
