Most efficient way to get the current word from a char array
Say I have a string "text" and a caret position "caret", and I want to find the current word (separated by spaces).
The way I'm currently doing it seems inefficient, and I was wondering if anyone has a more efficient way of doing it?
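#include <string.h> // for memcpy

const char* text;
int caret;
int start;
int end;
int count;
char word[256];
// text and caret values assigned here.
start = caret;
while(start > 0 && text[start - 1] != ' ') // back up to the first character of the word
{
    start--;
}
end = caret;
while(text[end] && text[end] != ' ') // advance to one past the last character
{
    end++;
}
count = end - start; // word length, without the delimiting spaces
if(count > (int)sizeof(word) - 1) // truncate so the copy fits in word
{
    count = (int)sizeof(word) - 1;
}
memcpy(word, text + start, count); // arrays cannot be assigned, so copy the bytes
word[count] = '\0';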
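For reference, the same scan can be packaged as a function. This is only a sketch: the name current_word and its parameters are made up for illustration, and it assumes caret lies between 0 and the length of text. A caret sitting on the space just after a word is treated as belonging to that word.

```c
#include <stddef.h>
#include <string.h>

/* Copies the space-delimited word around `caret` into `out` (capacity `cap`),
   truncating if needed. Returns the full length of the word.
   Assumes 0 <= caret <= strlen(text); that is not checked here. */
size_t current_word(const char *text, size_t caret, char *out, size_t cap)
{
    size_t start = caret;
    while (start > 0 && text[start - 1] != ' ')
        start--;                      /* first character of the word */

    size_t end = caret;
    while (text[end] != '\0' && text[end] != ' ')
        end++;                        /* one past the last character */

    size_t n = end - start;
    if (cap > 0) {
        size_t copy = n < cap - 1 ? n : cap - 1;
        memcpy(out, text + start, copy);
        out[copy] = '\0';             /* always NUL-terminate the result */
    }
    return n;
}
```

Each character of the word is visited once, so the cost is proportional to the word length and does not depend on the size of the whole text.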
By "seems inefficient", do you mean the code looks inefficient to you, or that you've measured it and found it too slow for your purposes?
Your method takes O(n) steps, where n is the length of the longest word in your input. That's pretty fast unless your words are the size of DNA strings.
A faster method, for some datasets, would be to use an index of word start and end positions. A binary search tree storing intervals would fit the bill, but at the expense of O(lg N) retrieval time, where N is the number of words in your input. Probably not worth it.
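To make the index idea concrete, here is a rough sketch that swaps the interval tree for a sorted array of word start offsets plus a binary search, which gives the same O(lg N) lookup. The function names are invented for this example, and the index would have to be rebuilt (or patched) whenever the text changes.

```c
#include <stddef.h>

/* Records the offset of each word's first character in `starts`.
   Returns the number of words indexed (at most `max`). */
size_t index_words(const char *text, size_t *starts, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; text[i] != '\0' && n < max; i++) {
        if (text[i] != ' ' && (i == 0 || text[i - 1] == ' '))
            starts[n++] = i;          /* a non-space after a space (or at 0) */
    }
    return n;
}

/* Returns the start offset of the last word that begins at or before `caret`,
   or (size_t)-1 if no word begins there. */
size_t word_start_at(const size_t *starts, size_t n, size_t caret)
{
    size_t lo = 0, hi = n;            /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] <= caret)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? (size_t)-1 : starts[lo - 1];
}
```

The end of the word still has to be found, either by scanning forward from the returned start or by storing end offsets alongside the starts, so this only pays off when words are long and lookups are frequent.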
I think it is an efficient approach. I would just suggest checking whether the character is a letter, rather than whether it is not a space.
This will catch other cases, e.g. when a word is terminated by a dot, a digit, a bracket, etc.
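A minimal sketch of that letter test, using the standard isalpha from <ctype.h>. The function name and the out-parameters are assumptions made for this example, and caret is again assumed to lie within the string.

```c
#include <ctype.h>
#include <stddef.h>

/* Finds the run of letters around `caret`. On return, *start is the offset of
   the first letter and *end is one past the last letter.
   Assumes 0 <= caret <= strlen(text). */
void letter_word_bounds(const char *text, size_t caret,
                        size_t *start, size_t *end)
{
    size_t s = caret;
    /* The cast avoids undefined behaviour for negative char values. */
    while (s > 0 && isalpha((unsigned char)text[s - 1]))
        s--;

    size_t e = caret;
    /* isalpha('\0') is false, so this also stops at the end of the string. */
    while (isalpha((unsigned char)text[e]))
        e++;

    *start = s;
    *end = e;
}
```

With the text "foo(bar1). baz" and the caret on the 'a' of "bar", this yields the range covering "bar" only, since the bracket and the digit both end the word.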