查找 Mathematica 列表中大于阈值的第一个元素

发布于 2024-09-18 11:17:26 字数 94 浏览 3 评论 0原文

我想知道如何获得(已排序)列表中大于给定阈值的第一个元素。

我不太了解 Mathematica 中的列表操作函数,也许有人可以给我一个技巧来有效地做到这一点。

I was wondering how I could obtain the first element of a (already ordered) list that is greater than a given threshold.

I don't know really well the list manipulation function in Mathematica, maybe someone can give me a trick to do that efficiently.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

夜司空 2024-09-25 11:17:26

Select 满足您的需要,并且保持一致,尊重列表预先存在的顺序:

Select[list, # > threshold &, 1]

例如:

In[1]:= Select[{3, 5, 4, 1}, # > 3 &, 1]

Out[1]= {5}

您可以在第二个参数中提供所需的任何阈值或标准函数。

第三个参数指定您仅匹配一个(即第一个)元素。

希望有帮助!

Select does what you need, and will be consistent, respecting the pre-existing order of the list:

Select[list, # > threshold &, 1]

For example:

In[1]:= Select[{3, 5, 4, 1}, # > 3 &, 1]

Out[1]= {5}

You can provide whatever threshold or criterion function you need in the second argument.

The third argument specifies you only one (i.e., the first) element that matches.

Hope that helps!

夜声 2024-09-25 11:17:26

乔在他的答案中正确陈述 人们期望二分搜索技术比 Select 更快,即使列表已排序,它似乎也只是进行线性搜索:

ClearAll[selectTiming]
selectTiming[length_, iterations_] := Module[
    {lst},
    lst = Sort[RandomInteger[{0, 100}, length]];
    (Do[Select[lst, # == 2 &, 1], {i, 1, iterations}] // Timing // 
     First)/iterations
  ]

在此处输入图像描述

(出于演示目的,我随意将阈值设置为 2)。

但是,Combinatorica 中的 BinarySearch 函数 a) 不合适(它返回一个与请求的元素匹配的元素,但不是第一个(最左边的),这正是问题所要求的。

要获取最左边的元素大于阈值,给定一个有序列表,我们可以递归地进行:

binSearch[lst_,threshold_]:= binSearchRec[lst,threshold,1,Length@lst]

(*
return position of leftmost element greater than threshold
breaks if the first element is greater than threshold
lst must be sorted
*)
binSearchRec[lst_,threshold_,min_,max_] :=
    Module[{i=Floor[(min+max)/2],element},
        element=lst[[i]];
        Which[
            min==max,max,
            element <= threshold,binSearchRec[lst,threshold,i+1,max],
            (element > threshold) && ( lst[[i-1]] <= threshold ), i,
            True, binSearchRec[lst,threshold,min,i-1]
        ]
    ]

或迭代地进行:

binSearchIterative[lst_,threshold_]:=Module[
    {min=1,max=Length@lst,i,element},
    While[
        min<=max,
        i=Floor[(min+max)/2];
        element=lst[[i]];
        Which[
            min==max, Break[],
            element<=threshold, min=i+1,
            (element>threshold) && (lst[[i-1]] <= threshold), Break[],
            True, max=i-1
        ]
    ];
    i
]

递归方法更清晰,但我会坚持使用迭代方法

来测试其速度,

ClearAll[binSearchTiming]
binSearchTiming[length_, iterations_] := Module[
    {lst},
    lst = Sort[RandomInteger[{0, 100}, length]];
    (Do[binSearchIterative[lst, 2], {i, 1, iterations}] // Timing // 
     First)/iterations
  ]

这会产生

在此处输入图像描述

因此,速度更快并且具有更好的缩放行为。

实际上没有必要编译它,但我还是做了。

总之,不要对长列表使用Select

下面是

我 对手动或通过 Combinatorica 包进行二进制搜索的一些评论。与 Combinatorica 中的 BinarySearch 进行二分搜索 请注意,这不会执行问题所要求的操作(Combinatorica 中的 BinarySearch 也不会执行此操作。);我上面给出的代码确实如此。

二分搜索可以迭代地实现

binarySearch = Compile[{{arg, _Integer}, {list, _Integer, 1}},
           Module[ {min = 1, max = Length@list,
                 i, x},
               While[
                    min <= max,
                    i = Floor[(min + max)/2];
                    x = list[[i]];
                    Which[
                         x == arg, min = max = i; Break[],
                         x < arg, min = i + 1,
                         True, max = i - 1
                     ]
                ];
               If[ 0 == max,
                   0,
                   max
               ]
           ], 
        CompilationTarget -> "C", 
        RuntimeOptions -> "Speed"
];

,我们现在可以将其与 Combinatorica 中的 BinarySearch 进行比较。请注意,a) 列表必须已排序 b) 这不会返回第一个匹配元素,而是返回a匹配元素。

lst = Sort[RandomInteger[{0, 100}, 1000000]];

让我们比较两个二分查找例程。重复50000次:

Needs["Combinatorica`"]
Do[binarySearch[2, lst], {i, 50000}] // Timing
Do[BinarySearch[lst, 2], {i, 50000}] // Timing
(*
{0.073437, Null}
{4.8354, Null}
*)

所以手写的速度更快。现在,由于事实上二分搜索仅访问这些参数列表中的 6-7 个点(例如 {500000, 250000, 125000, 62500, 31250, 15625, 23437}),显然区别只是开销;例如,也许 BinarySearch 更通用,或者未编译。

Joe correctly states in his answer that one would expect a binary search technique to be faster than Select, which seem to just do a linear search even if the list is sorted:

ClearAll[selectTiming]
selectTiming[length_, iterations_] := Module[
    {lst},
    lst = Sort[RandomInteger[{0, 100}, length]];
    (Do[Select[lst, # == 2 &, 1], {i, 1, iterations}] // Timing // 
     First)/iterations
  ]

enter image description here

(I arbitrarily put the threshold at 2 for demonstration purposes).

However, the BinarySearch function in Combinatorica is a) not appropriate (it returns an element which does match the requested one, but not the first (leftmost), which is what the question is asking.

To obtain the leftmost element that is larger than a threshold, given an ordered list, we may proceed either recursively:

binSearch[lst_,threshold_]:= binSearchRec[lst,threshold,1,Length@lst]

(*
return position of leftmost element greater than threshold
breaks if the first element is greater than threshold
lst must be sorted
*)
binSearchRec[lst_,threshold_,min_,max_] :=
    Module[{i=Floor[(min+max)/2],element},
        element=lst[[i]];
        Which[
            min==max,max,
            element <= threshold,binSearchRec[lst,threshold,i+1,max],
            (element > threshold) && ( lst[[i-1]] <= threshold ), i,
            True, binSearchRec[lst,threshold,min,i-1]
        ]
    ]

or iteratively:

binSearchIterative[lst_,threshold_]:=Module[
    {min=1,max=Length@lst,i,element},
    While[
        min<=max,
        i=Floor[(min+max)/2];
        element=lst[[i]];
        Which[
            min==max, Break[],
            element<=threshold, min=i+1,
            (element>threshold) && (lst[[i-1]] <= threshold), Break[],
            True, max=i-1
        ]
    ];
    i
]

The recursive approach is clearer but I'll stick to the iterative one.

To test its speed,

ClearAll[binSearchTiming]
binSearchTiming[length_, iterations_] := Module[
    {lst},
    lst = Sort[RandomInteger[{0, 100}, length]];
    (Do[binSearchIterative[lst, 2], {i, 1, iterations}] // Timing // 
     First)/iterations
  ]

which produces

enter image description here

so, much faster and with better scaling behaviour.

Actually it's not necessary to compile it but I did anyway.

In conclusion, then, don't use Select for long lists.

This concludes my answer. There follow some comments on doing a binary search by hand or via the Combinatorica package.

I compared the speed of a (compiled) short routine to do binary search vs the BinarySearch from Combinatorica. Note that this does not do what the question asks (and neither does BinarySearch from Combinatorica); the code I gave above does.

The binary search may be implemented iteratively as

binarySearch = Compile[{{arg, _Integer}, {list, _Integer, 1}},
           Module[ {min = 1, max = Length@list,
                 i, x},
               While[
                    min <= max,
                    i = Floor[(min + max)/2];
                    x = list[[i]];
                    Which[
                         x == arg, min = max = i; Break[],
                         x < arg, min = i + 1,
                         True, max = i - 1
                     ]
                ];
               If[ 0 == max,
                   0,
                   max
               ]
           ], 
        CompilationTarget -> "C", 
        RuntimeOptions -> "Speed"
];

and we can now compare this and BinarySearch from Combinatorica. Note that a) the list must be sorted b) this will not return the first matching element, but a matching element.

lst = Sort[RandomInteger[{0, 100}, 1000000]];

Let us compare the two binary search routines. Repeating 50000 times:

Needs["Combinatorica`"]
Do[binarySearch[2, lst], {i, 50000}] // Timing
Do[BinarySearch[lst, 2], {i, 50000}] // Timing
(*
{0.073437, Null}
{4.8354, Null}
*)

So the handwritten one is faster. Now since in fact a binary search just visits 6-7 points in the list for these parameters (something like {500000, 250000, 125000, 62500, 31250, 15625, 23437} for instance), clearly the difference is simply overhead; perhaps BinarySearch is more general, for instance, or not compiled.

傾城如夢未必闌珊 2024-09-25 11:17:26
list /. {___, y_ /; y > 3, ___} :> {y}

例如

{3, 5, 4, 1} /. {___, y_ /; y > 3, ___} :> {y}

{5}
list /. {___, y_ /; y > 3, ___} :> {y}

For example

{3, 5, 4, 1} /. {___, y_ /; y > 3, ___} :> {y}

{5}
别理我 2024-09-25 11:17:26

使用 Select 可以解决问题,但如果您关心效率,那么这是一个糟糕的解决方案。 Select 会遍历列表中的所有元素,因此所花费的时间与列表的长度成线性关系。

既然你说列表是有序的,那么最好使用 BinarySearch< /code>,它将在列表大小的对数时间内工作。表达式(编辑:我做了一个小调整,因为我之前编写的表达式没有正确处理列表中重复出现的元素。另一个编辑:这仍然不起作用当阈值本身作为重复元素出现在列表中时,请参阅注释):

Floor[BinarySearch[list,threshold]+1]

将为您提供所需元素的索引。如果所有元素都小于阈值,您将得到列表的长度加一。
ps 在使用 BinarySearch 之前不要忘记调用 Needs["Combinatorica'"]

Using Select will solve the problem, but it is a poor solution if you care about efficiency. Select goes over all the elements of the list, and therefore will take time which is linear in the length of the list.

Since you say the list is ordered, it is much better to use BinarySearch, which will work in a time which is logarithmic in the size of the list. The expression (edit: I have made a small adjustment since the previous expression I wrote did not handle correctly recurring elements in the list. another edit: this still doesn't work when the threshold itself appears in the list as a recurring element, see comments):

Floor[BinarySearch[list,threshold]+1]

will give you the index of the desired element. If all the elements are smaller than the threshold, you'll get the length of the list plus one.
p.s. don't forget to call Needs["Combinatorica'"] before using BinarySearch.

撧情箌佬 2024-09-25 11:17:26

仅供将来参考,从 v10 开始,您可以使用 SelectFirst

它有一些额外的优点,例如返回 Missing[] 或默认值。

来自文档:

SelectFirst[{e1,e2,…}, crit] 给出第一个 ei,其中 crit[ei]True,或Misss["NotFound"](如果未找到)。

SelectFirst[{e1,e2,…}, crit, default] 如果没有 ei 使得 给出 default crit[ei]True

对于您的具体情况,您将使用:

SelectFirst[list, # > threshold &]

Just for future reference, starting from v10 you can use SelectFirst

It has some added niceties, such as returning Missing[] or default values.

From the docs:

SelectFirst[{e1,e2,…}, crit] gives the first ei for which crit[ei] is True, or Missing["NotFound"] if none is found.

SelectFirst[{e1,e2,…}, crit, default] gives default if there is no ei such that crit[ei] is True.

For your specific case, you would use:

SelectFirst[list, # > threshold &]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文