二分查找时如何处理空值?
在对 List
进行二分搜索期间处理 null
的最佳方法是什么(嗯,它将是 >List
如果我可以事先读出所有值)?
int previous = 0;
int direction = -1;
if (itemToCompare == null) {
previous = mid;
for (int tries = 0; tries < 2; tries++) {
mid += direction;
itemToCompare = GetItem(mid);
while (itemToCompare == null && insideInclusiveRange(min, max, mid)) {
mid += direction;
itemToCompare = GetItem(mid);
}
if (!insideInclusiveRange(min, max, mid)) {
/* Reached an endpoint without finding anything,
try the other direction. */
mid = previous;
direction = -direction;
} else if (itemToCompare != null) {
break;
}
}
}
我目前正在做类似上面的事情 - 如果遇到 null,则沿一个方向线性搜索,直到遇到 非 null 或 超越端点,如果没有成功则在其他方向重复。在实际代码中,我从之前的比较结果中获取direction
,并且GetItem()
缓存它检索到的值。有没有更简单的方法,无需创建非空值的中间列表(对于我的目的而言,花费太长时间,因为上面的 GetItem() 函数很慢)?
我想我是在问是否有比降级为线性搜索更智能的方法来处理空值。很可能只有一小部分空值 (1-5%),但可能存在数百个空值序列。
编辑 - 数据看起来像这样,
aa aaa b bb bbb c cc d ddd
其中每一行都是一个单独的对象,并且并非保证所有单元格都被填充。用户需要能够搜索整行(以便“bb”和“bbb”都匹配整个第二行)。查询每个对象的速度足够慢,线性搜索将无法工作。出于同样的原因,创建一个没有空值的新列表实际上并不可行。
What's the best way to go about handling null
s during a binary search over a List<string>
(well, it would be a List<string>
if I could read all the values out beforehand)?
int previous = 0;
int direction = -1;
if (itemToCompare == null) {
previous = mid;
for (int tries = 0; tries < 2; tries++) {
mid += direction;
itemToCompare = GetItem(mid);
while (itemToCompare == null && insideInclusiveRange(min, max, mid)) {
mid += direction;
itemToCompare = GetItem(mid);
}
if (!insideInclusiveRange(min, max, mid)) {
/* Reached an endpoint without finding anything,
try the other direction. */
mid = previous;
direction = -direction;
} else if (itemToCompare != null) {
break;
}
}
}
I'm currently doing something like the above - if null is encountered, then linearly search in a direction until either non-null or beyond endpoint is encountered, if no success then repeat in other direction. In the actual code I'm getting direction
from the previous comparison result, and GetItem()
caches the values it retrieves. Is there an easier way, without making an intermediate list of non-null values (takes far too long for my purposes because the GetItem()
function above is slow)?
I guess I'm asking if there's a smarter way to handle null values than to degrade to a linear search. In all likelihood there will only be a small percentage of nulls (1-5%), but it's possible for there to be sequences of 100s of null.
Edit - The data looks something like this
aa aaa b bb bbb c cc d ddd
where each row is a separate object, and not all cells are guaranteed to be filled. The user needs to be able to search across an entire row (so that both "bb" and "bbb" would match the entire second row). Querying each object is slow enough that a linear search will not work. For the same reason, creating a new list without nulls is not really feasible.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
除非有理由实际选择/查找
null
值(不确定这意味着什么,因为null
是单例,并且二分搜索通常是唯一的)值),请考虑根本不允许它们出现在列表中。[之前的答案:在对问题进行了更多思考之后,我决定 null 可能在问题空间中没有位置 - 酌情采取一些部分。]
如果需要 null 值,只需对列表进行排序,使 null 值位于第一个(或最后一个)并正确更新逻辑 - 然后确保不要对任何
null
值调用方法;-)这应该不会产生什么总体影响,因为已经需要进行排序。如果项目更改为
null
——这听起来像是一个令人讨厌的副作用! -- 然后只需“压缩”列表(例如“删除”空项目)。但是,除非有充分的理由,否则我不会修改排序列表。二分搜索仅真正设计/适合(完全)排序的数据。将其变成二元线性搜索是没有意义的。
快乐编码。
Unless there is a reason to actually select/find a
null
value (not sure what that means asnull
is a singleton and binary search is often most desirable on unique values), consider not allowing them in the list at all.[Previous answer: After reflecting on the question more I have decided that
null
s likely have no place in the problem-space -- take bits and parts as appropriate.]If nulls are desired, just sort the list such that null values are first (or last) and update the logic correctly -- then just make sure not to invoke a method upon any of the
null
values ;-)This should have little overall impact since a sort is already required. If items are changed to
null
-- which sounds like an icky side-effect! -- then just "compact" the List (e.g. "remove" the null item). I would, however, just not modify the sorted list unless there is a good reason.Binary search is only really designed/suitable for (entirely) sorted data. No point turning it into a binary-maybe-linear search.
Happy coding.