Get all families in HBase

Posted on 2024-11-04 13:06:51 · 110 words · 3 views · 0 comments

I have an HBase table with

Rows: word, Families: date

I want to get a scanner for all the words at the date 'd'. How can I do this? I don't want to specify the row value.

Comments (3)

花桑 2024-11-11 13:06:58

Your question doesn't make clear where you are trying to get a scanner from, so I'm going to treat it as if it's from the HBase command line. I've used the Thrift library to interact with HBase, and the CLI commands translate pretty obviously to that. I assume they will also translate well to any other interface you are getting a scanner for.

To get all the rows for a specific column family, you would use the following command:

scan 'table_name', {COLUMNS => 'col_family:'}

For your case (minus 'table_name', because I don't know that), it would look something like:

scan 'yourTable', {COLUMNS => 'd:'}

That will return all rows in the column family d.

If you also want to specify which row key to start at, it will look something like:

scan 'yourTable', {COLUMNS => 'd:', STARTROW => 'word'}

That command will START at the row key word and get all rows after that point. If you want to limit it to just the row key word, you will also have to add a STOPROW. The STOPROW isn't included in the results, so you CAN'T do scan 'yourTable', {COLUMNS => 'd:', STARTROW => 'word', STOPROW => 'word'}, as that will return nothing.

Specifying a STOPROW takes some knowledge of the row key values. I don't know your values, so it's hard to give a good example. What I often do is take my start row and replace its last character with the next character (in the ASCII set). In your example I'd try:

scan 'yourTable', {COLUMNS => 'd:', STARTROW => 'word', STOPROW => 'wore'}

I'm not going to promise this will work all the time, but it is likely to work in most cases. Perhaps all cases, I just haven't worked it out. :)
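The "next character" trick can be worked out in plain Java without any HBase dependency. The sketch below is only an illustration under stated assumptions: the class name StopRows and both helper methods are hypothetical, and it assumes row keys compare as unsigned bytes in lexicographic order. Note that a stop row of 'wore' covers every key that starts with 'word' (for example 'wordy'), so it is really a prefix range. If only the exact key is wanted, the smallest key strictly greater than 'word' is 'word' followed by a zero byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StopRows {
    /** Smallest key strictly greater than rowKey: rowKey followed by a 0x00 byte. */
    static byte[] stopRowForExactKey(byte[] rowKey) {
        // Arrays.copyOf pads the extra position with 0
        return Arrays.copyOf(rowKey, rowKey.length + 1);
    }

    /** Exclusive stop row that covers every key starting with prefix. */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // No upper bound exists; an empty stop row means "scan to the end of the table"
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte, e.g. 'd' becomes 'e'
        return stop;
    }

    public static void main(String[] args) {
        byte[] word = "word".getBytes(StandardCharsets.UTF_8);
        // Prefix range: start at "word", stop before "wore"
        System.out.println(new String(stopRowForPrefix(word), StandardCharsets.UTF_8));
        // Exact key: the stop row is one byte longer than the start row
        System.out.println(stopRowForExactKey(word).length);
    }
}
```

The resulting bytes would then be passed as the start and stop rows of whatever scanner interface you are using.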

Hopefully that helps.

A good resource for HBase shell commands is http://wiki.apache.org/hadoop/Hbase/Shell.

一场春暖 2024-11-11 13:06:58

I assume you are talking about using the scan command of the Java API.

If I understand your structure correctly, you don't currently have a way to retrieve words by date without a full table scan. You can setFilter on the scan, but it will still have to go to each row to check that.

You didn't specify, but I'd guess each word can occur on many dates (if you meant you have a family for each date, then note that it isn't recommended to have more than 2-3 families).

If you want a relatively efficient way to store that, I'd suggest you change your structure to
Key Word0xDate, store the date in the TimeStamp, and then store some 1-byte value as the data (so that a row will exist).
Storage-wise it would be the same as your current solution (plus 2 bytes, which you can offset by shortening the family and qualifier names), and you'd be able to scan for a timestamp or a range of timestamps (setTimestamp and setTimeRange respectively), which will be more efficient because HBase will skip files where irrelevant timestamps are stored.

白云悠悠 2024-11-11 13:06:58

Try this:

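     import org.apache.hadoop.hbase.client.*;
     import org.apache.hadoop.hbase.util.Bytes;

     // Assumes conf is your HBase Configuration; the second argument is the table name
     HTable t = new HTable(conf, "YourTable");
     ResultScanner scanner = t.getScanner(new Scan()); // full table scan
     for (Result rr = scanner.next(); rr != null; rr = scanner.next())
     {
           // getValue takes byte[] arguments and returns null when the cell is absent
           byte[] value = rr.getValue(Bytes.toBytes("YourFamily"), Bytes.toBytes("YourQualifier"));
           if (Bytes.equals(value, Bytes.toBytes("d"))) // compare contents, not references
           {
                Get g = new Get(rr.getRow()); // row key of the matching row
                Result row = t.get(g);
                System.out.println("" + row.toString()); //print all data from this row
           }
     }
     scanner.close();
     t.close();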