Java: Equalator? (removing duplicates from a collection of objects)

Posted on 2024-08-16 11:06:15

I have a bunch of objects of a class Puzzle. I have overridden equals() and hashCode(). When it comes time to present the solutions to the user, I'd like to filter out all the Puzzles that are "similar" (by the standard I have defined), so the user only sees one of each.

Similarity is transitive.

Example:

Result of computations:
A    (similar to D)
B    (similar to C)
C
D

In this case, only one of A or D, and one of B or C, would be presented to the user, never two similar Puzzles. Two similar puzzles are equally valid; it is only important that they are not both shown to the user.

To accomplish this, I wanted to use an ADT that prohibits duplicates. However, I don't want to change the equals() and hashCode() methods to return a value about similarity instead. Is there some Equalator, like Comparator, that I can use in this case? Or is there another way I should be doing this?

The class I'm working on is a Puzzle that maintains a grid of letters (like Scrabble). If a Puzzle contains the same words, but in a different orientation, it is considered similar. So the following puzzle:

                                    (2, 2): A           
                                    (2, 1): C           
                                    (2, 0): T

Would be similar to:

                    (1, 2): A           
                    (1, 1): C           
                    (1, 0): T      

请持续率性 2024-08-23 11:06:15

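Okay, you have a way of measuring similarity between objects. If that measure behaves like a distance (non-negative, symmetric, and satisfying the triangle inequality), your objects form a Metric Space.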

The question is whether your space is also a Euclidean space, like ordinary three-dimensional space or the integers. If it is, then you could use a binary space partition in however many dimensions you've got.

(The question is, basically: is there a homomorphism between your objects and an n-dimensional real number vector? If so, then you can use techniques for measuring closeness of points in n-dimensional space.)

Now, if it's not a Euclidean space, then you've got a bigger problem. An example of a non-Euclidean space that programmers might be most familiar with is the Levenshtein distance between two strings.

If your problem is like checking how similar a string is to a list of already existing strings, then I don't know of any algorithm that does it in better than O(n^2) time overall. Maybe there are some out there.


But another important question is: how much time do you have, and how many objects are there? If you have the time, or your data set is small enough that an O(n^2) algorithm is practical, then for each new object you just iterate through the objects you have already kept and check whether its distance to any of them is below a certain threshold. If so, reject it.
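The quadratic threshold filter described above can be sketched as a standalone method. This is a minimal sketch, assuming a distance where smaller means more similar; `ThresholdFilter` and `rejectSimilar` are hypothetical names, and Levenshtein distance on strings is used only as an example metric:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class ThresholdFilter {
    private ThresholdFilter() {}

    /** Dynamic-programming Levenshtein (edit) distance, keeping two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // after the swap, prev holds the row just computed
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Keeps an item only if its distance to every item kept so far is at
     * least {@code threshold}: O(n^2) distance calls in the worst case.
     */
    static <T> List<T> rejectSimilar(List<T> items, ToDoubleBiFunction<T, T> distance, double threshold) {
        List<T> kept = new ArrayList<>();
        outer:
        for (T item : items) {
            for (T k : kept) {
                if (distance.applyAsDouble(item, k) < threshold) {
                    continue outer; // too close to something already kept
                }
            }
            kept.add(item);
        }
        return kept;
    }
}
```

With a threshold of 2, `rejectSimilar(List.of("cat", "cot", "dog", "cog"), (x, y) -> ThresholdFilter.levenshtein(x, y), 2)` keeps `cat` and `dog`. Because threshold similarity is not transitive in general, which items survive can depend on their input order.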

Just extend AbstractCollection and override the add method, storing the accepted items in an ArrayList or whatever. Your code would look something like this:

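import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

abstract class SimilarityRejector<T> extends AbstractCollection<T> {
    private final List<T> base = new ArrayList<>();
    private final double threshold;

    public SimilarityRejector(double threshold) {
        this.threshold = threshold;
    }

    // Distance-like measure: smaller values mean more similar.
    protected abstract double similarityComparison(T a, T b);

    // Rejects t if it is closer than the threshold to any item already kept.
    // This is O(n) per add, so O(n^2) for n items.
    @Override
    public boolean add(T t) {
        for (T compare : base) {
            if (similarityComparison(t, compare) < threshold) {
                return false;
            }
        }
        return base.add(t);
    }

    @Override
    public Iterator<T> iterator() {
        return base.iterator();
    }

    @Override
    public int size() {
        return base.size();
    }
}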

etc. Obviously, T would need to be something you can perform a similarity comparison on. If you have a Euclidean metric, then you can use a space partition rather than going through every other item.
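For the Euclidean case, one simple space partition is a uniform grid whose cell size equals the threshold: any point closer than the threshold must then lie in the same cell or an adjacent one, so only 9 cells need checking in 2D. This is a sketch assuming 2D points and a Euclidean distance; `GridRejector` is a hypothetical name, and a uniform grid is a swapped-in, simpler alternative to the binary space partition mentioned above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GridRejector {
    private final double threshold;
    // Grid cell (cx, cy) -> points kept in that cell; cells are threshold-sized squares.
    private final Map<List<Long>, List<double[]>> cells = new HashMap<>();
    private int size = 0;

    GridRejector(double threshold) {
        this.threshold = threshold;
    }

    /** Adds (x, y) unless a kept point lies closer than the threshold; returns whether it was added. */
    boolean add(double x, double y) {
        long cx = (long) Math.floor(x / threshold);
        long cy = (long) Math.floor(y / threshold);
        // A point closer than the threshold can only be in this cell or one of its 8 neighbours.
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                List<double[]> bucket = cells.get(List.of(cx + dx, cy + dy));
                if (bucket == null) continue;
                for (double[] p : bucket) {
                    if (Math.hypot(p[0] - x, p[1] - y) < threshold) {
                        return false;
                    }
                }
            }
        }
        cells.computeIfAbsent(List.of(cx, cy), k -> new ArrayList<>()).add(new double[] {x, y});
        size++;
        return true;
    }

    int size() {
        return size;
    }
}
```

Each `add` only inspects the points in 9 cells, so when points are spread out the total cost is far below the O(n^2) pairwise scan; in the worst case (everything in one cell) it degrades back to it.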

清风疏影 2024-08-23 11:06:15

I'd use a wrapper class that overrides equals and hashCode accordingly.

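private static class Wrapper {
    private final Puzzle puzzle; // the wrapped puzzle
    public Wrapper(Puzzle puzzle) { 
        this.puzzle = puzzle; 
    }
    @Override 
    public boolean equals(Object object) {
        // ... true if this.puzzle and ((Wrapper) object).puzzle are similar
    }
    @Override 
    public int hashCode() {
        // ... similar puzzles must produce the same hash code
    }
}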

and then you wrap all your puzzles, put them in a map, and get them out again…

public Collection<Collection<Puzzle>> method(Collection<Puzzle> puzzles) {
    // Similar puzzles produce equal wrappers, so they end up under the same key.
    Map<Wrapper, Collection<Puzzle>> map = new HashMap<Wrapper, Collection<Puzzle>>();
    for (Puzzle each : puzzles) {
        Wrapper wrapper = new Wrapper(each);
        Collection<Puzzle> coll = map.get(wrapper);
        if (coll == null) map.put(wrapper, coll = new ArrayList<Puzzle>());
        coll.add(each);
    }
    // Each inner collection is one group of similar puzzles; show one per group.
    return map.values();
}
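
A self-contained variant of the wrapper idea can be sketched for the grid puzzle from the question. This assumes, hypothetically, that a puzzle is just a set of lettered cells and that "similar" means "identical up to a translation", which covers the (2, y) versus (1, y) example; `Cell`, `Puzzle`, `PuzzleWrapper`, and `PuzzleGrouper` are illustrative stand-ins, not the asker's actual classes, and records plus pattern matching need Java 16+:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins: a puzzle is a set of letters at grid coordinates.
record Cell(int x, int y, char letter) {}

record Puzzle(Set<Cell> cells) {}

final class PuzzleWrapper {
    private final Puzzle puzzle;
    private final Set<Cell> normalized; // cells shifted so the minimum x and y are 0

    PuzzleWrapper(Puzzle puzzle) {
        this.puzzle = puzzle;
        int minX = puzzle.cells().stream().mapToInt(Cell::x).min().orElse(0);
        int minY = puzzle.cells().stream().mapToInt(Cell::y).min().orElse(0);
        Set<Cell> shifted = new HashSet<>();
        for (Cell c : puzzle.cells()) {
            shifted.add(new Cell(c.x() - minX, c.y() - minY, c.letter()));
        }
        this.normalized = shifted;
    }

    Puzzle puzzle() {
        return puzzle;
    }

    // Equal when the two puzzles differ only by a translation.
    @Override
    public boolean equals(Object o) {
        return o instanceof PuzzleWrapper w && normalized.equals(w.normalized);
    }

    // Consistent with equals(): equal normalized sets have equal hash codes.
    @Override
    public int hashCode() {
        return normalized.hashCode();
    }
}

final class PuzzleGrouper {
    /** Groups puzzles into classes of mutually similar ones. */
    static Collection<List<Puzzle>> group(Collection<Puzzle> puzzles) {
        Map<PuzzleWrapper, List<Puzzle>> map = new HashMap<>();
        for (Puzzle p : puzzles) {
            map.computeIfAbsent(new PuzzleWrapper(p), k -> new ArrayList<>()).add(p);
        }
        return map.values();
    }
}
```

Taking one element of each inner list (for example `list.get(0)`) then yields one representative per similarity class. Rotations and reflections are not handled here; supporting them would mean also mapping each puzzle to a canonical orientation before comparing.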
梦纸 2024-08-23 11:06:15
  1. Create a TreeSet using your Comparator
  2. Add all elements into the set
  3. All duplicates (elements for which the Comparator returns 0) are stripped out
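
The three steps above can be sketched as a small helper. This assumes the Comparator returns 0 exactly for elements you consider duplicates and is otherwise a consistent total order (for puzzles this usually means comparing some canonical form); `TreeSetDedup` is a hypothetical name, and `String.CASE_INSENSITIVE_ORDER` only stands in for a real similarity comparator:

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class TreeSetDedup {
    /**
     * Returns one representative per group of elements the comparator treats
     * as equal (compare(a, b) == 0). The first element added wins, because
     * TreeSet.add() does not replace an element that compares equal.
     */
    static <T> List<T> dedup(Collection<T> items, Comparator<? super T> comparator) {
        TreeSet<T> set = new TreeSet<>(comparator);
        set.addAll(items);
        return List.copyOf(set); // in comparator order, not input order
    }
}
```

For example, `dedup(List.of("Cat", "dog", "CAT", "Dog"), String.CASE_INSENSITIVE_ORDER)` keeps `Cat` and `dog`. The result comes back sorted by the comparator, which may or may not matter for display.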
难理解 2024-08-23 11:06:15

Normally "similarity" is not a transitive relationship. So the first step would be to think of this in terms of equivalence rather than similarity. Equivalence is reflexive, symmetric and transitive.

An easy approach here is to define a puzzle wrapper whose equals() and hashCode() methods are implemented according to the equivalence relation in question.

Once you have that, drop the wrapped objects into a java.util.Set, and the Set filters out the duplicates.
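
A close relative of the wrapper is a key function: compute an equivalence key per element and let a `HashSet` of keys decide what to keep, so the key's own equals() and hashCode() do the wrapper's job. A minimal sketch, assuming you can write a key whose values are equal exactly when two elements are equivalent; `KeyDedup` and `distinctByKey` are hypothetical names, and the string-normalizing key in the usage note is only illustrative:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class KeyDedup {
    /** Keeps the first element of each equivalence class, preserving input order. */
    static <T, K> List<T> distinctByKey(Collection<T> items, Function<? super T, ? extends K> key) {
        Set<K> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T item : items) {
            // Set.add returns false when an equal key is already present.
            if (seen.add(key.apply(item))) {
                result.add(item);
            }
        }
        return result;
    }
}
```

For example, `distinctByKey(List.of(" A", "b", "a ", "B"), s -> s.trim().toLowerCase())` keeps `" A"` and `"b"`. Unlike the TreeSet approach, this needs no ordering and keeps the input order, with expected O(n) time.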

海风掠过北极光 2024-08-23 11:06:15

IMHO, the most elegant way is the one Gili described (a TreeSet with a custom Comparator).

But if you'd like to do it yourself, this seems to be the easiest and clearest solution:

/**
 * Distinct input list values (removes duplicates, keeping the first occurrence)
 * @param items items to process
 * @param comparator comparator to recognize equal items
 * @return new collection with unique values
 */
public static <T> Collection<T> distinctItems(List<T> items, Comparator<T> comparator) {
    List<T> result = new ArrayList<>();

    for (int i = 0; i < items.size(); i++) {
        T item = items.get(i);

        boolean exists = false;
        for (int j = 0; j < result.size(); j++) {
            if (comparator.compare(result.get(j), item) == 0) {
                exists = true;
                break;
            }
        }

        if (!exists) {
            result.add(item);
        }
    }

    return result;
}
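
Since the loop above only ever checks `compare(...) == 0`, a `java.util.function.BiPredicate` is enough, which is about as close to an "Equalator" as the standard library gets. A minimal variant, assuming the predicate is an equivalence relation; `Equalators.distinctItems` is a hypothetical helper, not a library method:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.BiPredicate;

final class Equalators {
    /**
     * Keeps the first item of each group the predicate considers equivalent.
     * O(n^2) predicate calls in the worst case; needs no ordering and no hashCode.
     */
    static <T> List<T> distinctItems(Collection<T> items, BiPredicate<? super T, ? super T> equivalent) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            // Keep the item only if nothing kept so far is equivalent to it.
            if (result.stream().noneMatch(kept -> equivalent.test(kept, item))) {
                result.add(item);
            }
        }
        return result;
    }
}
```

For example, grouping fruit names by first letter, `distinctItems(List.of("apple", "avocado", "banana", "blueberry", "cherry"), (a, b) -> a.charAt(0) == b.charAt(0))` keeps `apple`, `banana`, and `cherry`.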