对未知数量的项目进行二分搜索
假设您不知道要搜索的元素数量,并且给定一个接受索引的 API,如果您超出范围,将返回 null(如此处使用 getWordFromDictionary 方法实现的),您如何执行二分搜索并实现客户端程序的 isWordInDictionary() 方法?
该解决方案有效,但我最终在找到初始高索引值的级别之上进行了串行搜索。对较低范围值的搜索受到此答案的启发。我还查看了 Reflector(C# 反编译器)中的 BinarySearch,但它的列表长度已知,因此仍在寻找填补空白。
private static string[] dictionary;
static void Main(string[] args)
{
dictionary = System.IO.File.ReadAllLines(@"C:\tmp\dictionary.txt");
Console.WriteLine(isWordInDictionary("aardvark", 0));
Console.WriteLine(isWordInDictionary("bee", 0));
Console.WriteLine(isWordInDictionary("zebra", 0));
Console.WriteLine(isWordInDictionaryBinary("aardvark"));
Console.WriteLine(isWordInDictionaryBinary("bee"));
Console.WriteLine(isWordInDictionaryBinary("zebra"));
Console.ReadLine();
}
static bool isWordInDictionaryBinary(string word)
{
// assume the size of the dictionary is unknown
// quick check for empty dictionary
string w = getWordFromDictionary(0);
if (w == null)
return false;
// assume that the length is very big.
int low = 0;
int hi = int.MaxValue;
while (low <= hi)
{
int mid = (low + ((hi - low) >> 1));
w = getWordFromDictionary(mid);
// If the middle element m you select at each step is outside
// the array bounds (you need a way to tell this), then limit
// the search to those elements with indexes small than m.
if (w == null)
{
hi = mid;
continue;
}
int compare = String.Compare(w, word);
if (compare == 0)
return true;
if (compare < 0)
low = mid + 1;
else
hi = mid - 1;
}
// punting on the search above the current value of hi
// to the (still unknown) upper limit
return isWordInDictionary(word, hi);
}
// serial search, works good for small number of items
static bool isWordInDictionary(string word, int startIndex)
{
// assume the size of the dictionary is unknown
int i = startIndex;
while (getWordFromDictionary(i) != null)
{
if (getWordFromDictionary(i).Equals(word, StringComparison.OrdinalIgnoreCase))
return true;
i++;
}
return false;
}
private static string getWordFromDictionary(int index)
{
try
{
return dictionary[index];
}
catch (IndexOutOfRangeException)
{
return null;
}
}
答案后的最终代码
static bool isWordInDictionaryBinary(string word)
{
// assume the size of the dictionary is unknown
// quick check for empty dictionary
string w = getWordFromDictionary(0);
if (w == null)
return false;
// assume that the number of elements is very big
int low = 0;
int hi = int.MaxValue;
while (low <= hi)
{
int mid = (low + ((hi - low) >> 1));
w = getWordFromDictionary(mid);
// treat null the same as finding a string that comes
// after the string you are looking for
if (w == null)
{
hi = mid - 1;
continue;
}
int compare = String.Compare(w, word);
if (compare == 0)
return true;
if (compare < 0)
low = mid + 1;
else
hi = mid - 1;
}
return false;
}
Assuming you don't know the number of elements you are searching and given an API that accepts an index and will return null if you are outside the bounds (as implemented here with the getWordFromDictionary method), how can you perform a binary search and implement the isWordInDictionary() method for client programs?
This solution works, but I ended up doing a serial search above the level where I found an initial high-index value. The search through the lower range of values was inspired by this answer. I also peeked at BinarySearch in Reflector (C# decompiler), but that has a known list length, so still looking to fill in the gaps.
private static string[] dictionary;
static void Main(string[] args)
{
dictionary = System.IO.File.ReadAllLines(@"C:\tmp\dictionary.txt");
Console.WriteLine(isWordInDictionary("aardvark", 0));
Console.WriteLine(isWordInDictionary("bee", 0));
Console.WriteLine(isWordInDictionary("zebra", 0));
Console.WriteLine(isWordInDictionaryBinary("aardvark"));
Console.WriteLine(isWordInDictionaryBinary("bee"));
Console.WriteLine(isWordInDictionaryBinary("zebra"));
Console.ReadLine();
}
static bool isWordInDictionaryBinary(string word)
{
// assume the size of the dictionary is unknown
// quick check for empty dictionary
string w = getWordFromDictionary(0);
if (w == null)
return false;
// assume that the length is very big.
int low = 0;
int hi = int.MaxValue;
while (low <= hi)
{
int mid = (low + ((hi - low) >> 1));
w = getWordFromDictionary(mid);
// If the middle element m you select at each step is outside
// the array bounds (you need a way to tell this), then limit
// the search to those elements with indexes small than m.
if (w == null)
{
hi = mid;
continue;
}
int compare = String.Compare(w, word);
if (compare == 0)
return true;
if (compare < 0)
low = mid + 1;
else
hi = mid - 1;
}
// punting on the search above the current value of hi
// to the (still unknown) upper limit
return isWordInDictionary(word, hi);
}
// serial search, works good for small number of items
static bool isWordInDictionary(string word, int startIndex)
{
// assume the size of the dictionary is unknown
int i = startIndex;
while (getWordFromDictionary(i) != null)
{
if (getWordFromDictionary(i).Equals(word, StringComparison.OrdinalIgnoreCase))
return true;
i++;
}
return false;
}
private static string getWordFromDictionary(int index)
{
try
{
return dictionary[index];
}
catch (IndexOutOfRangeException)
{
return null;
}
}
Final Code after answers
static bool isWordInDictionaryBinary(string word)
{
// assume the size of the dictionary is unknown
// quick check for empty dictionary
string w = getWordFromDictionary(0);
if (w == null)
return false;
// assume that the number of elements is very big
int low = 0;
int hi = int.MaxValue;
while (low <= hi)
{
int mid = (low + ((hi - low) >> 1));
w = getWordFromDictionary(mid);
// treat null the same as finding a string that comes
// after the string you are looking for
if (w == null)
{
hi = mid - 1;
continue;
}
int compare = String.Compare(w, word);
if (compare == 0)
return true;
if (compare < 0)
low = mid + 1;
else
hi = mid - 1;
}
return false;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您可以分两个阶段实施二分搜索。在第一阶段,您增加要搜索的区间的大小。一旦检测到超出范围,您就可以在找到的最新区间中进行正常的二分搜索。像这样的东西:
You can implement a binary search in two phases. In the first phase, you grow the size of the interval you're searching in. Once you detect you're outside the bounds, you can do a normal binary search in the latest interval you found. Something like this:
当然可以。从索引一开始,将查询索引加倍,直到遇到按字典顺序大于查询单词的内容(编辑:或 null)。然后你可以再次缩小搜索范围,直到找到索引,或者返回 false。
编辑:请注意,这不会增加您的渐近运行时间,并且它仍然是 O(logN),其中 N 是系列中的项目数。
Sure you can. Start at index one, and double your query index until you hit something that's lexographically larger than your query word(Edit: or null). Then you can narrow down your search space again until you find the index, or return false.
Edit: Note that this does NOT add to your asymptotic runtime, and it is still O(logN), where N is the number of items in the series.
因此,我不确定我是否完全理解您的描述中的问题,但我假设您正在尝试搜索未知长度的排序数组以查找特定字符串。我还假设实际数组中没有空值;如果您请求超出范围的索引,则该数组仅返回 null。
如果这些情况属实,那么解决方案应该只是标准的二分搜索,尽管您在整个整数空间中进行搜索,并且您只需将 null 视为查找位于您要查找的字符串之后的字符串即可。本质上,想象一下您的 N 个字符串的排序数组实际上是一个 INT_MAX 字符串的排序数组,最后以空值排序。
我不太明白的是,你似乎基本上已经做到了这一点(至少从粗略地看一下代码),所以我想我可能无法完全理解你的问题。
So, I'm not sure I entirely understand the problem from your description, but I'm assuming you're trying to search through a sorted array of unknown length to find a particular string. I'm also assuming that there are no nulls in the actual array; the array only returns null if you ask for an index that's out of bounds.
If those things are true, the solution should be just a standard binary search, albeit one where you search over the entire integer space, and you just treat null the same as finding a string that comes after the string you are looking for. Essentially just imagine that your sorted array of N strings is really a sorted array of INT_MAX strings sorted with nulls at the end.
What I don't quite understand is that you seem to basically have done that already (at least from a cursory look at the code), so I think I might not understand your problem completely.