Why doesn't a Mutation insert (update) values for existing columns?

Posted on 2024-11-27 02:49:04

I am loading initial data (a URL list for a crawler) into Cassandra with the status crawled=0. Then, using Hadoop, I crawl all the links and try to change crawled from 0 to something else, for example 1, 2, or 3. When I check in the Cassandra CLI with get ColumnFamily['www.somedomain.com'], the value of the crawled column remains unchanged. If I leave out the crawled column during the initial import, it is added correctly. This is only one part of the algorithm, and I need to update this column further in other Map/Reduce jobs.

The Thrift and Cassandra API documentation says there are only inserts and deletions, and that an insert should work as an update.

The crawled column is of UTF8 type.

The method that builds the Mutation looks like this:

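  import java.nio.ByteBuffer;
  import java.util.Arrays;

  import org.apache.cassandra.thrift.Column;
  import org.apache.cassandra.thrift.ColumnOrSuperColumn;
  import org.apache.cassandra.thrift.Mutation;
  import org.apache.hadoop.io.Text;

  // Builds a Thrift Mutation that sets the "crawled" column to crawledVal.
  private static Mutation getMutationCrawled(Text crawledVal)
  {
      Text column = new Text();
      column.set("crawled");

      Column c = new Column();

      // Text.getBytes() returns the backing array, which is valid only up to
      // getLength(), so copy exactly that many bytes for both name and value.
      c.setName(ByteBuffer.wrap(Arrays.copyOf(column.getBytes(), column.getLength())));
      c.setValue(ByteBuffer.wrap(Arrays.copyOf(crawledVal.getBytes(), crawledVal.getLength())));
      // Timestamp in milliseconds since the epoch.
      c.setTimestamp(System.currentTimeMillis());

      Mutation m = new Mutation();
      m.setColumn_or_supercolumn(new ColumnOrSuperColumn());
      m.column_or_supercolumn.setColumn(c);

      return m;
  }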
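Because the crawled column is of UTF8 type, its value should be exactly the UTF-8 bytes of the string. A separate point from the timestamp question concerns Hadoop's Text.getBytes(). It returns the backing array, which is valid only up to getLength(). Wrapping that array directly can therefore carry stale trailing bytes from a longer earlier value. Below is a minimal standalone sketch of copying only the valid bytes. The class Utf8Values and its helpers are hypothetical. A plain byte array stands in for a reused Text buffer, so no Hadoop dependency is needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Values {
    // Encode a String as UTF-8; the returned array has exactly the encoded length.
    public static ByteBuffer toUtf8Buffer(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    // Copy only the first `length` valid bytes of a possibly oversized backing
    // array (the situation Text.getBytes() / Text.getLength() can create).
    public static ByteBuffer toUtf8Buffer(byte[] backing, int length) {
        return ByteBuffer.wrap(Arrays.copyOf(backing, length));
    }

    // Decode the remaining bytes back to a String without moving the buffer's position.
    public static String fromUtf8Buffer(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Simulated reused buffer: valid value "1", followed by stale bytes "ld".
        byte[] reused = "1ld".getBytes(StandardCharsets.UTF_8);
        int validLength = 1;

        System.out.println(fromUtf8Buffer(ByteBuffer.wrap(reused)));           // prints 1ld
        System.out.println(fromUtf8Buffer(toUtf8Buffer(reused, validLength))); // prints 1
        System.out.println(fromUtf8Buffer(toUtf8Buffer("1")));                 // prints 1
    }
}
```

Copying with Arrays.copyOf costs one small allocation per value. That cost is usually negligible next to a network write. In exchange, the stored bytes are exactly the value that was intended.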

Comments (1)

栖迟 2024-12-04 02:49:04

Cassandra resolves conflicts using the timestamp of each mutation: the largest timestamp wins. You can set the timestamp to any value you want, but the convention is to use microseconds. In the example above, you set the timestamp with:

 c.setTimestamp(System.currentTimeMillis());

Most likely, the initial import code that populated the values sets timestamps in microseconds. For the same moment, a microsecond timestamp is 1000 times larger than a millisecond timestamp. Your updates therefore look older than the existing values and are ignored.
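The timestamp comparison can be made concrete with a minimal in-memory model of last-write-wins upserts. This is a sketch under stated assumptions. LastWriteWins, insert, get and currentTimeMicros are hypothetical names, not Cassandra or Thrift API. Ties are simply rejected here rather than resolved the way Cassandra resolves them. Deriving microseconds as System.currentTimeMillis() * 1000 is one common way to follow the microsecond convention, at millisecond resolution.

```java
import java.util.HashMap;
import java.util.Map;

public class LastWriteWins {
    // A stored cell: the value plus the timestamp supplied by the writer.
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical stand-in for a single row; NOT the Cassandra API.
    private final Map<String, Cell> columns = new HashMap<>();

    // Insert behaves as an upsert: the write is kept only if its timestamp is
    // strictly greater than the stored one (ties are rejected in this model).
    public void insert(String column, String value, long timestamp) {
        Cell current = columns.get(column);
        if (current == null || timestamp > current.timestamp) {
            columns.put(column, new Cell(value, timestamp));
        }
    }

    public String get(String column) {
        Cell cell = columns.get(column);
        return cell == null ? null : cell.value;
    }

    // Microseconds since the epoch (millisecond resolution, scaled by 1000).
    public static long currentTimeMicros() {
        return System.currentTimeMillis() * 1000L;
    }

    public static void main(String[] args) {
        LastWriteWins row = new LastWriteWins();
        long importMicros = 1_300_000_000_000_000L; // initial import, microseconds
        long updateMillis = 1_300_000_005_000L;     // 5 s later, but in milliseconds

        row.insert("crawled", "0", importMicros);
        row.insert("crawled", "1", updateMillis);   // smaller number: ignored
        System.out.println(row.get("crawled"));     // prints 0

        row.insert("crawled", "1", updateMillis * 1000L); // same moment in microseconds: wins
        System.out.println(row.get("crawled"));     // prints 1
    }
}
```

In the real mutation code, the corresponding change would be to pass a microsecond value to c.setTimestamp(...), for example via a helper like currentTimeMicros(). This assumes the import code also writes microseconds. Whatever unit is chosen, every writer to the column family must use the same one.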