Hadoop 似乎在对给定reduce 调用的值进行迭代期间修改了我的关键对象

发布于 2024-11-09 06:09:23 字数 2160 浏览 5 评论 0原文

Hadoop 版本：0.20.2（在 Amazon EMR 上）

问题：我有一个在映射阶段编写的自定义密钥，我在下面添加了该密钥。在reduce 调用期间，我对给定键的值进行一些简单的聚合。我面临的问题是，在减少调用中的值迭代期间，我的密钥发生了更改，并且我得到了该新密钥的值。

我的密钥类型：

 class MyKey implements WritableComparable<MyKey>, Serializable {
    private MyEnum type; //MyEnum is a simple enumeration.
    private TreeMap<String, String> subKeys;

    MyKey() {} //for hadoop
    public MyKey(MyEnum t, Map<String, String> sK) { type = t; subKeys = new TreeMap(sk); }

    public void readFields(DataInput in) throws IOException {
      Text typeT = new Text();
      typeT.readFields(in);
      this.type = MyEnum.valueOf(typeT.toString());

      subKeys.clear();
      int i = WritableUtils.readVInt(in);
      while ( 0 != i-- ) {
        Text keyText = new Text();
        keyText.readFields(in);

        Text valueText = new Text();
        valueText.readFields(in);

        subKeys.put(keyText.toString(), valueText.toString());
    }
  }

  public void write(DataOutput out) throws IOException {
    new Text(type.name()).write(out);

    WritableUtils.writeVInt(out, subKeys.size());
    for (Entry<String, String> each: subKeys.entrySet()) {
        new Text(each.getKey()).write(out);
        new Text(each.getValue()).write(out);
    }
  }

  public int compareTo(MyKey o) {
    if (o == null) {
        return 1;
    }

    int typeComparison = this.type.compareTo(o.type); 
    if (typeComparison == 0) {
        if (this.subKeys.equals(o.subKeys)) {
            return 0;
        }
        int x = this.subKeys.hashCode() - o.subKeys.hashCode();
        return (x != 0 ? x : -1);
    }
    return typeComparison;
  }
}

这个密钥的实现有什么问题吗？以下是我在减少调用中遇到键混合的代码：

reduce(MyKey k, Iterable<MyValue> values, Context context) {
   Iterator<MyValue> iterator = values.iterator();
   int sum = 0;
   while(iterator.hasNext()) {
        MyValue value = iterator.next();
        //when i come here in the 2nd iteration, if i print k, it is different from what it was in iteration 1.
        sum += value.getResult();
   }
   //write sum to context
}

对此的任何帮助将不胜感激。

原文

Hadoop Version: 0.20.2 (On Amazon EMR)

Problem: I have a custom key that i write during map phase which i added below. During the reduce call, I do some simple aggregation on values for a given key. Issue I am facing is that during the iteration of values in reduce call, my key got changed and i got values of that new key.

My key type:

 class MyKey implements WritableComparable<MyKey>, Serializable {
    private MyEnum type; //MyEnum is a simple enumeration.
    private TreeMap<String, String> subKeys;

    MyKey() {} //for hadoop
    public MyKey(MyEnum t, Map<String, String> sK) { type = t; subKeys = new TreeMap(sk); }

    public void readFields(DataInput in) throws IOException {
      Text typeT = new Text();
      typeT.readFields(in);
      this.type = MyEnum.valueOf(typeT.toString());

      subKeys.clear();
      int i = WritableUtils.readVInt(in);
      while ( 0 != i-- ) {
        Text keyText = new Text();
        keyText.readFields(in);

        Text valueText = new Text();
        valueText.readFields(in);

        subKeys.put(keyText.toString(), valueText.toString());
    }
  }

  public void write(DataOutput out) throws IOException {
    new Text(type.name()).write(out);

    WritableUtils.writeVInt(out, subKeys.size());
    for (Entry<String, String> each: subKeys.entrySet()) {
        new Text(each.getKey()).write(out);
        new Text(each.getValue()).write(out);
    }
  }

  public int compareTo(MyKey o) {
    if (o == null) {
        return 1;
    }

    int typeComparison = this.type.compareTo(o.type); 
    if (typeComparison == 0) {
        if (this.subKeys.equals(o.subKeys)) {
            return 0;
        }
        int x = this.subKeys.hashCode() - o.subKeys.hashCode();
        return (x != 0 ? x : -1);
    }
    return typeComparison;
  }
}

Is there anything wrong with this implementation of key? Following is the code where I am facing the mixup of keys in reduce call:

reduce(MyKey k, Iterable<MyValue> values, Context context) {
   Iterator<MyValue> iterator = values.iterator();
   int sum = 0;
   while(iterator.hasNext()) {
        MyValue value = iterator.next();
        //when i come here in the 2nd iteration, if i print k, it is different from what it was in iteration 1.
        sum += value.getResult();
   }
   //write sum to context
}

Any help in this would be greatly appreciated.

分享到QQ

分享到微博