合并一些顺序未知的排序列表

发布于 2024-10-11 05:11:17 字数 537 浏览 0 评论 0原文

我有一些元素数量可变的列表。每个列表都已排序,但排序算法未知。我想将这些列表合并到一个大列表中,其中包含按相同顺序排列的所有列表,没有重复项。

输入示例:

  1. XS,M,L,XL
  2. S,M,XXL
  3. XXS,XS,S,L

预期结果:

  • XXS,XS,S,M,L,XL,XXL

预期结果是通过匹配输入序列获得的为了获得包含按正确顺序排列的每个输入序列的元素的合并结果,如下所示:

    XS   M L XL
       S M      XXL
XXS XS S   L
-------------------
XXS XS S M L XL XXL

如果存在位置不明确的元素,则该函数应发出通知。在这里,它将是 XXL(它可以保留在 M、L 或 XL 之后),我需要在 XL 之后手动指定其位置(因为在这里我知道排序算法并且可以提供帮助)。 我考虑过定义每两个元素对,每对按原始列表中的顺序排列。由此可以构建完整的列表。

I've some lists with variable number of elements. Each list is sorted, but the sorting algorithm is not known. I would like to merge the lists into one big list which contains all lists in same order, without duplicates.

Example Input:

  1. XS,M,L,XL
  2. S,M,XXL
  3. XXS,XS,S,L

Expected Result:

  • XXS,XS,S,M,L,XL,XXL

The expected result is obtained by matching up the input sequences in order to obtain a merged result that contains the elements of each input sequence in the correct order, like this:

    XS   M L XL
       S M      XXL
XXS XS S   L
-------------------
XXS XS S M L XL XXL

The function should notify, if there are elements which have ambiguous positions. Here, it would be XXL (it could stay after M,L or XL) and I need to specify its position manually after XL (because here I know the sorting algorithm and can help).
I thought about defining pairs of every two elements, each pair in order as in original list. From this one could build the complete list.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

时常饿 2024-10-18 05:11:17

这可以通过拓扑排序算法来解决。

如果您将每个输入序列视为通过有向图的路径,则拓扑排序将从左到右对节点集进行排序,以使每个有向边都指向右侧。
拓扑排序后的有向图的图

拓扑排序 包括这个算法,由 Arthur Kahn 在 1962 年首次描述:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else 
    return L (a topologically sorted order)

这个算法,如所写的,如果发现不明确的序列,实际上不会失败,但这很容易通过在循环的开头插入一个检查来添加,如下所示:

...
while S is non-empty do
    if S contains more than 1 item
        return error (inputs are ambiguous)
    remove a node n from S
    ...

我不知道您正在使用什么语言,但我已将这个 PHP 实现放在一起作为概念证明:

function mergeSequences($sequences, $detectAmbiguity = false) {

    // build a list of nodes, with each node recording a list of all incoming edges
    $nodes = array();
    foreach ($sequences as $seq) {
        foreach ($seq as $i => $item) {
            if (!isset($nodes[$item])) $nodes[$item] = array();
            if ($i !== 0) {
                $nodes[$item][] = $seq[$i-1];
            }
        }
    }

    // build a list of all nodes with no incoming edges
    $avail = array();
    foreach ($nodes as $item => $edges) {
        if (count($edges) == 0) {
            $avail[] = $item;
            unset($nodes[$item]);
        }
    }

    $sorted = array();
    $curr = '(start)';
    while (count($avail) > 0) {

        // optional: check for ambiguous sequence
        if ($detectAmbiguity && count($avail) > 1) {
           throw new Exception("Ambiguous sequence: {$curr} can be followed by " . join(' or ', $avail));
        }

        // get the next item and add it to the sorted list
        $curr = array_pop($avail);
        $sorted[] = $curr;

        // remove all edges from the currently selected items to all others
        foreach ($nodes as $item => $edges) {
            $nodes[$item] = array_diff($edges, array($curr));                
            if (count($nodes[$item]) == 0) {
                $avail[] = $item;
                unset($nodes[$item]);
            }
        }

    }

    if (count($nodes) > 0) {
        throw new Exception('Sequences contain conflicting information. Cannot continue after: ' . join(', ', $sorted));
    }

    return $sorted;
}

您可以像这样调用该函数:

$input = array(
    array('XS', 'M', 'L', 'XL'),
    array('S', 'M', 'XXL'),
    array('XXS', 'XS', 'S', 'L'),
);
echo(join(', ', mergeSequences($input)));
echo(join(', ', mergeSequences($input, true)));

得到以下输出:

XXS, XS, S, M, XXL, L, XL
Uncaught exception 'Exception' with message 'Ambiguous sequence: M can be followed by L or XXL'

This can be solved with a Topological Sort algorithm.

If you consider each of your input sequences to be a path through a directed graph, a topological sort will order your set of nodes from left to right in such a way that each directed edge points to the right.
Diagram of a directed graph after topological sorting

The wikipedia page on Topological Sorting includes this algorithm, first described by Arthur Kahn in 1962:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else 
    return L (a topologically sorted order)

This algorithm, as written, doesn't actually fail if it finds ambiguous sequences, but that's easy to add by inserting a check at the beginning of the loop, like this:

...
while S is non-empty do
    if S contains more than 1 item
        return error (inputs are ambiguous)
    remove a node n from S
    ...

I don't know what language you're working in, but I've thrown together this PHP implementation as a proof of concept:

function mergeSequences($sequences, $detectAmbiguity = false) {

    // build a list of nodes, with each node recording a list of all incoming edges
    $nodes = array();
    foreach ($sequences as $seq) {
        foreach ($seq as $i => $item) {
            if (!isset($nodes[$item])) $nodes[$item] = array();
            if ($i !== 0) {
                $nodes[$item][] = $seq[$i-1];
            }
        }
    }

    // build a list of all nodes with no incoming edges
    $avail = array();
    foreach ($nodes as $item => $edges) {
        if (count($edges) == 0) {
            $avail[] = $item;
            unset($nodes[$item]);
        }
    }

    $sorted = array();
    $curr = '(start)';
    while (count($avail) > 0) {

        // optional: check for ambiguous sequence
        if ($detectAmbiguity && count($avail) > 1) {
           throw new Exception("Ambiguous sequence: {$curr} can be followed by " . join(' or ', $avail));
        }

        // get the next item and add it to the sorted list
        $curr = array_pop($avail);
        $sorted[] = $curr;

        // remove all edges from the currently selected items to all others
        foreach ($nodes as $item => $edges) {
            $nodes[$item] = array_diff($edges, array($curr));                
            if (count($nodes[$item]) == 0) {
                $avail[] = $item;
                unset($nodes[$item]);
            }
        }

    }

    if (count($nodes) > 0) {
        throw new Exception('Sequences contain conflicting information. Cannot continue after: ' . join(', ', $sorted));
    }

    return $sorted;
}

You can call the function like this:

$input = array(
    array('XS', 'M', 'L', 'XL'),
    array('S', 'M', 'XXL'),
    array('XXS', 'XS', 'S', 'L'),
);
echo(join(', ', mergeSequences($input)));
echo(join(', ', mergeSequences($input, true)));

To get the following output:

XXS, XS, S, M, XXL, L, XL
Uncaught exception 'Exception' with message 'Ambiguous sequence: M can be followed by L or XXL'
小傻瓜 2024-10-18 05:11:17

您正在尝试合并部分有序集或偏序集。合并中不明确的部分称为反链。因此,您需要一种算法来合并偏序集并告诉您反链是什么。

这里有一篇论文描述了合并偏序集和检测反链的算法作为第一作者主页的链接,以防您想联系他看看是否有是否有可用的源代码。

You are trying to merge partially ordered sets, or posets. The ambiguous parts of the merge are called antichains. So, you want an algorithm that merges posets and tells you what the antichains are.

Here is a paper describing an algorithm for merging posets and detecting antichains, as well as a link to the first author's homepage in case you want to contact him to see if there is any source code available.

千柳 2024-10-18 05:11:17

这就是我要做的:

  1. 预处理列表:找出 ​​XXS 小于 XS 小于 S 小于 ... XXL 是一个[约束满足问题](http://en.wikipedia.org/wiki/Constraint_satisfaction_problem )。此类问题涉及在给定原始列表中定义的约束的情况下在所有元素中搜索正确的顺序。
  2. 完成步骤 1 后,创建从集合 {XXS, ..., XXL} 到集合 {1, ..., 6} 的双向映射。
  3. 对于每个列表,使用 2 中定义的映射创建另一个列表 使用修改
  4. 后的[合并排序](http://en.wikipedia.org/wiki/Merge_sort) 组合两个列表。修改合并算法,以便报告正在比较的两个项目是否相同(并忽略正在合并的项目之一)。
  5. 对每对列表执行步骤 4,直到有一个列表。
  6. 使用 2 中定义的映射创建列表的文本版本。

Here's what I would do:

  1. Preprocess the lists: figuring out that XXS is smaller than XS is smaller than S is smaller than ... XXL is a [constraint satisfaction problem](http://en.wikipedia.org/wiki/Constraint_satisfaction_problem). This type of problem involves searching for a correct ordering among all the elements given the constraints defined in the original lists.
  2. Create a bidirectional mapping from the set {XXS, ..., XXL} to the set {1, ..., 6}, after you have done step 1.
  3. For each list, create another list by using the mapping defined in 2.
  4. Use a modified [merge sort](http://en.wikipedia.org/wiki/Merge_sort) to combine two lists. Modify the merge algorithm so that it reports if two items being compared are identical (and disregards one of the items being merged).
  5. Do step 4 for each pair of lists until there is one list.
  6. Using the mapping defined in 2, create the text-version of the list.
流年里的时光 2024-10-18 05:11:17

对于排序部分,根据您的描述,我认为合并排序就足够了。需要修改的一件事是在合并期间,如果输入数组的第一个元素与结果数组相同,我们应该跳过输入数组中的元素。

如果我理解正确,您想要构建整个可能的输入元素的总顺序。输入数组中已经定义了一些偏序(因为它们已经排序),而其他偏序则需要用户指定。例如,在问题中,顺序

'S'<'M'<'XXL'

'XS'<'M'<'L'<'XL'

'XXS'<'XS'<'S '<'L'

定义明确。但算法仍然不知道 'XXL' 是否大于或小于 'XL'、'L'。
既然三个输入数组已排序,则必须存在输入元素的全序。因此,我的建议是向您的数据提供商询问所有可能的数据元素的有序列表。听起来很愚蠢,但这是一个简单的方法。

如果这个列表不可用,一个简单的处理方法是提示用户进行配对排序,然后检查这是否与现有的输入序列冲突,并在算法遇到不明确的配对时记住它。我认为拓扑排序比这个应用程序更强大。由于我们处理单个数据元素,因此必须存在全序。而拓扑排序是处理偏序的。

For sorting part, I think Merge Sort is enough according to your description. One thing need to modify is during merging, we should skip elements at the input array if the first element of the input array is the same as the result array.

If I understand correctly, you want to build a total order of the whole possible input elements. Some partial order is already defined in the input arraies(since they are already sorted), while others need to specified by users. For example in the question, order

'S'<'M'<'XXL'

'XS'<'M'<'L'<'XL'

'XXS'<'XS'<'S'<'L'

is well defined. But algorithm still doesn't know whether 'XXL' is bigger or smaller than 'XL', 'L'.
Well since the three input arraies is sorted, there must exist a total order of the input elements. So my suggestion is to ask your data provider for a ordered list of all possible data elements. Sounds stupid, but it's a easy way.

If this list is not available, a easy way to deal is to prompt for a pair sort for the user, then check whether this conflict with existing input sequence and remember it, when the algorithm encounter into an ambiguous pair. I think topology sorting is more powerful than this application. Since we deal with single data elements, there must exit a total order. While topology sorting is to deal with partial order.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文