Get all families in HBase
I have an HBase table with
Rows: word, Families: date
I want to get a scanner for all the words at the date 'd'. How can I do this? I don't want to specify the row value.
Your question isn't clear about where you are trying to get a scanner from, so I'm going to treat it as if it's from the HBase command line. I've used the Thrift library to interact with HBase, and the CLI commands translate pretty obviously to that. I assume they will also translate well to any other interface you are getting a scanner for.
To get all the rows for a specific column family, you would use a command of the form
scan 'table_name', {COLUMNS => 'column_family:'}
For your case (minus 'table_name', because I don't know it) it would look something like
scan 'table_name', {COLUMNS => 'd:'}
That will return all rows in the column family d.
If you also want to specify which row key to start at, it will look something like
scan 'table_name', {COLUMNS => 'd:', STARTROW => 'word'}
That command will START at the row key word and get all rows after that point. If you want to limit it to just the row key word, you will also have to add a STOPROW. The STOPROW isn't included in the results, so you CAN'T do
scan 'yourTable', {COLUMNS => 'd:', STARTROW => 'word', STOPROW => 'word'}
as that will return nothing.
Specifying a STOPROW takes some knowledge of the row key values. I don't know your values, so it's hard to give a good example. What I often do is use the next character (in the ASCII set) for the last character of my start row. In your example I'd try
scan 'yourTable', {COLUMNS => 'd:', STARTROW => 'word', STOPROW => 'wore'}
I'm not going to promise this will work all the time, but it is likely to work in most cases. Perhaps all cases, I just haven't worked it out. :)
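The "next ASCII character" trick for the stop row can be sketched in plain Java. This is only an illustration under some assumptions: StopRowSketch and stopRowFor are hypothetical names (not part of the HBase API), the row keys are ASCII strings, and a TreeMap stands in for a table's sorted row keys so that the half-open range behaviour can be seen without a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the HBase client API.
class StopRowSketch {

    // Returns a copy of startRow with its last byte incremented by one,
    // e.g. "word" -> "wore". Rejects empty keys and keys ending in 0xFF,
    // where a simple increment would not produce a larger key.
    static byte[] stopRowFor(byte[] startRow) {
        if (startRow.length == 0 || startRow[startRow.length - 1] == (byte) 0xFF) {
            throw new IllegalArgumentException("cannot increment the last byte of this key");
        }
        byte[] stop = Arrays.copyOf(startRow, startRow.length);
        stop[stop.length - 1]++;
        return stop;
    }

    // Convenience overload; assumes the key is plain ASCII.
    static String stopRowFor(String startRow) {
        byte[] stop = stopRowFor(startRow.getBytes(StandardCharsets.US_ASCII));
        return new String(stop, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // In-memory stand-in for sorted row keys (assumption: ASCII keys,
        // so String ordering matches byte-wise ordering).
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("wood", "x");
        rows.put("word", "x");
        rows.put("words", "x");
        rows.put("work", "x");

        // Start inclusive, stop exclusive: identical start and stop select nothing.
        System.out.println(rows.subMap("word", "word").keySet()); // prints []

        // Stop at the incremented key instead.
        System.out.println(rows.subMap("word", stopRowFor("word")).keySet()); // prints [word, words]
    }
}
```

Note that the second range also picks up "words": incrementing the last character selects every key that begins with the start row, not just the exact key, which may be one reason the trick does not fit every case.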
Hopefully that helps.
A good resource for HBase shell commands is http://wiki.apache.org/hadoop/Hbase/Shell.
I assume you are talking about using the scan command of the Java API.
If I understand your structure correctly, you don't currently have a way to retrieve words by date without a full table scan. You can setFilter on the scan, but it will still have to go to each row to check it.
You didn't specify, but I'd guess each word can occur on many dates. (If you meant that you have a family for each date, note that having more than 2-3 families isn't recommended.)
If you want a relatively efficient way to store that, I'd suggest you change your structure to:
Key: Word0xDate, with the date stored in the timestamp and some 1-byte value as the data (so that a row will exist).
Storage-wise it would be the same as your current solution (plus 2 bytes, which you can offset by shortening the family and qualifier names), and you'd be able to scan for a timestamp or a range of timestamps (setTimestamp and setTimeRange respectively), which will be more efficient because HBase will skip files where irrelevant timestamps are stored.
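To scan by date through the timestamp, you need the date expressed as a numeric range. The sketch below computes one, assuming the cells were written with an epoch-millisecond timestamp and that a "date" means one UTC calendar day; both are assumptions, and DayRangeSketch and dayRange are hypothetical names. It uses only the Java standard library and returns the two values you would then hand to the scan's setTimeRange, which treats the minimum as inclusive and the maximum as exclusive (worth confirming against the client version you use).

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for illustration; not part of the HBase client API.
class DayRangeSketch {

    // Returns {minStamp, maxStamp} in epoch milliseconds for one UTC day:
    // minStamp is the first millisecond of the day (inclusive),
    // maxStamp is the first millisecond of the next day (exclusive).
    static long[] dayRange(LocalDate day) {
        long min = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long max = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        long[] range = dayRange(LocalDate.of(1970, 1, 2));
        System.out.println(range[0] + " .. " + range[1]); // prints 86400000 .. 172800000
    }
}
```

If the writer instead stores exactly one fixed timestamp per date (for example midnight), the minimum value alone would be enough for a single-timestamp scan.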
Try this: