自动完成的算法?

发布于 2024-09-03 00:07:12 字数 251 浏览 6 评论 0原文

我指的是当用户在 Google 中输入搜索词时用于提供查询建议的算法。

我主要感兴趣的是: 1. 最重要的结果(最有可能的查询而不是任何匹配的结果) 2. 匹配子串 3.模糊匹配

我知道你可以使用Trie或广义trie来查找匹配,但它不能满足上述要求...

之前提出的类似问题这里

I am referring to the algorithm that is used to give query suggestions when a user types a search term in Google.

I am mainly interested in:
1. Most important results (most likely queries rather than anything that matches)
2. Match substrings
3. Fuzzy matches

I know you could use Trie or generalized trie to find matches, but it wouldn't meet the above requirements...

Similar questions asked earlier here

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

厌倦 2024-09-10 00:07:12

对于(呵呵)很棒的模糊/部分字符串匹配算法,请查看该死的酷算法:

这些不会取代尝试,而是阻止尝试在尝试中进行强力查找 - 这仍然是一个巨大的胜利。接下来,您可能需要一种方法来限制 trie 的大小:

  • 保留全局最近/前 N 个单词的 trie;
  • 对于每个用户,保留该用户最近/前 N 个单词的尝试。

最后,您希望尽可能防止查找...

  • 缓存查找结果:如果用户单击任何搜索结果,您可以非常快速地提供这些结果,然后异步获取完整的部分/模糊查找。
  • 预计算查找结果:如果用户输入了“appl”,他们很可能会继续输入“apple”、“apply”。
  • 预取数据:例如,Web 应用程序可以向浏览器发送较小的结果集,小到足以使 JS 中的强力搜索可行。

For (heh) awesome fuzzy/partial string matching algorithms, check out Damn Cool Algorithms:

These don't replace tries, but rather prevent brute-force lookups in tries - which is still a huge win. Next, you probably want a way to bound the size of the trie:

  • keep a trie of recent/top N words used globally;
  • for each user, keep a trie of recent/top N words for that user.

Finally, you want to prevent lookups whenever possible...

  • cache lookup results: if the user clicks through on any search results, you can serve those very quickly and then asynchronously fetch the full partial/fuzzy lookup.
  • precompute lookup results: if the user has typed "appl", they are likely to continue with "apple", "apply".
  • prefetch data: for instance, a web app can send a smaller set of results to the browser, small enough to make brute-force searching in JS viable.
优雅的叶子 2024-09-10 00:07:12

我只想说...
解决这个问题的一个好方法是结合三元搜索树以外的东西。
需要 Ngram 和 Shingles(短语)。还需要检测字边界错误。 “hell o”应该是“hello”......而“whitesocks”应该是“白袜子” - 这些是预处理步骤。如果您没有正确预处理数据,您将无法获得有价值的搜索结果。
三元搜索树是弄清楚什么是单词的有用组件,并且还可以在输入的单词不是索引中的有效单词时实现相关单词猜测。

谷歌算法执行短语建议和纠正。
谷歌算法也有一些上下文的概念......如果您搜索的第一个词与天气相关,并且您将它们组合起来“weatherforcst”与“monsoonfrcst”与“deskfrcst” - 我的猜测是幕后排名正在发生变化基于遇到的第一个单词的建议 - 预测和天气是相关的单词,因此预测在“您是不是想说”猜测中排名很高。

单词部分(ngram)、短语术语(shingles)、单词邻近度(单词聚类索引)、三元搜索树(单词查找)。

I'd just like to say...
A good solution to this problem is going to incorporate more than a Ternary Search Tree.
Ngrams, and Shingles (Phrases) are needed. Word-boundary errors also need to be detected. "hell o" should be "hello" ... and "whitesocks" should be "white socks" - these are pre-processing steps. If you don't preprocess the data properly you aren't going to get valuable search results.
Ternary search trees are a useful component in figuring out what is a word, and also for implementing related-word guessing when a word typed isn't a valid word in the index.

The google algorithm performs phrase suggestion and correction.
The google algorithm also has some concept of context... if the first word you search for is weather related and you combine them "weatherforcst" vs "monsoonfrcst" vs "deskfrcst" - my guess is behind the scenes rankings are being changed in the suggestion based on the first word encountered - forecast and weather are related words therefore forecast get's a high rank in the Did-You-Mean guess.

word-partials (ngrams), phrase-terms (shingles), word-proximity (word-clustering-index), ternary-search-tree (word lookup).

随梦而飞# 2024-09-10 00:07:12

Google 的具体算法尚不清楚,但据说 通过对用户输入进行统计分析来进行工作。这种方法不适合大多数情况。更常见的自动完成是使用以下之一实现的:

  • 。通过在树结构(前缀树、后缀树、dawg 等)中对可搜索文本进行索引,可以以内存存储为代价执行非常快速的搜索。树遍历可以适应近似匹配。
  • 模式分区。通过将文本划分为标记(ngram),我们可以使用一种简单的散列方案来执行模式出现的搜索。
  • 过滤。找到一组潜在的匹配项,然后应用顺序算法来检查每个候选项。

看一下 completely,这是一个实现了后面一些概念的 Java 自动完成库。

Google's exact algorithm is unknown, but it is said to work by statistical analysis of users input. An approach not suitable for most cases. More commonly auto completion is implemented using one of the following:

  • Trees. By indexing the searchable text in a tree structure (prefix tree, suffix tree, dawg, etc..) one can execute very fast searches at the expense of memory storage. The tree traversal can be adapted for approximate matching.
  • Pattern Partitioning. By partitioning the text into tokens (ngrams) one can execute searches for pattern occurrences using a simple hashing scheme.
  • Filtering. Find a set of potential matches and then apply a sequential algorithm to check each candidate.

Take a look at completely, a Java autocomplete library that implements some of the latter concepts.

瑾兮 2024-09-10 00:07:12

有一些工具,例如 soundex编辑距离,可用于查找特定范围内的模糊匹配。

Soundex 查找听起来相似的单词,levenshtein distance 查找与另一个单词在一定编辑距离内的单词。

There are tools like soundex and levenshtein distance that can be used to find fuzzy matches that are within a certain range.

Soundex finds words that sound similar and levenshtein distance finds words that are within a certain edit distance from another word.

你好,陌生人 2024-09-10 00:07:12

看看 Firefox 的 Awesome bar 算法

Google 建议很有用,因为它考虑了数百万个热门查询+您过去的相关查询。

但它没有良好的完成算法/用户界面:

  1. 不执行子字符串
  2. 似乎是一个相对简单的单词边界前缀算法。
    例如:尝试 tomcat tut -->正确建议“tomcat教程”。现在尝试 tomcat rial -->没有建议)-:
  3. 不支持“你是说吗?” - 就像谷歌搜索结果一样。

Take a look at Firefox's Awesome bar algorithm

Google suggest is useful, because it take the millions of popular queries + your past related queries into account.

It doesn't have a good completion algorithm / UI though:

  1. Doesn't do substrings
  2. Seems like a relatively simple word-boundary prefix algorithm.
    For example: Try tomcat tut --> correctly suggest "tomcat tutorial". Now try tomcat rial --> no suggestions )-:
  3. Doesn't support "did you mean?" - as in google search results.
乖乖兔^ω^ 2024-09-10 00:07:12

对于子字符串和模糊匹配,编辑距离算法对我来说效果相当好。尽管我承认它似乎并不像自动完成/建议的行业实现那么完美。我认为谷歌和微软的 Intellisense 都做得更好,因为他们改进了这个基本算法来权衡匹配不同字符串所需的编辑操作类型。例如,调换两个字符可能只算作 1 次操作,而不是 2 次操作(插入和删除)。

但即便如此,我发现这已经足够接近了。这是它在 C# 中的实现...

// This is the traditional Levenshtein Distance algorithem, though I've tweaked it to make
// it more like Google's autocomplete/suggest.  It returns the number of operations 
// (insert/delete/substitute) required to change one string into another, with the 
// expectation that userTyped is only a partial version of fullEntry.
// Gives us a measurement of how similar the two strings are.
public static int EditDistance(string userTyped, string fullEntry)
{
    if (userTyped.Length == 0) // all entries are assumed to be fully legit possibilities 
        return 0; // at this point, because the user hasn't typed anything.

    var inx = fullEntry.IndexOf(userTyped[0]);
    if (inx < 0) // If the 1st character doesn't exist anywhere in the entry, it's not
        return Int32.MaxValue; // a possible match.

    var lastInx = inx;
    var lastMatchCount = 0;
TryAgain:
    // Is there a better starting point?
    var len = fullEntry.Length - inx;
    var matchCount = 1;
    var k = 1;
    for (; k < len; k++)
    {
        if (k == userTyped.Length || userTyped[k] != fullEntry[k + inx])
        {
            if (matchCount > lastMatchCount)
            {
                lastMatchCount = matchCount;
                lastInx = inx;
            }
            inx = fullEntry.IndexOf(userTyped[0], inx + 1);
            matchCount = 0;
            if (inx > 0)
                goto TryAgain;
            else
                break;
        }
        else
            matchCount++;
    }
    if (k == len && matchCount > lastMatchCount)
        lastInx = inx;

    if (lastInx > 0)
        fullEntry = fullEntry.Substring(lastInx); // Jump to 1st character match, ignoring previous values 

    // The start of the Levenshtein Distance algorithem.
    var m = userTyped.Length;
    var n = Math.Min(m, fullEntry.Length);

    int[,] d = new int[m + 1, n + 1]; // "distance" - meaning number of operations.

    for (var i = 0; i <= m; i++)
        d[i, 0] = i; // the distance of any first string to an empty second string
    for (var j = 0; j <= n; j++)
        d[0, j] = j; // the distance of any second string to an empty first string

    for (var j = 1; j <= n; j++)
        for (var i = 1; i <= m; i++)
            if (userTyped[i - 1] == fullEntry[j - 1])
                d[i, j] = d[i - 1, j - 1];       // no operation required
            else
                d[i, j] = Math.Min
                           (
                             d[i - 1, j] + 1,  // a deletion
                             Math.Min(
                             d[i, j - 1] + 1,  // an insertion
                             d[i - 1, j - 1] + 1 // a substitution
                             )
                           );

    return d[m, n];
}

For substrings and fuzzy matches, the Levenshtein distance algorithm has worked fairly well for me. Though I will admit it does not seem to be as perfect as industry implementations of autocomplete/suggest. Both Google and Microsoft's Intellisense do a better job, I think because they've refined this basic algorithm to weigh the kind of edit operations it takes to match the dissimilar strings. E.g. transposing two characters should probably only count as 1 operation, not 2 (an insert & delete).

But even so I find this is close enough. Here is it's implementation in C#...

// This is the traditional Levenshtein Distance algorithem, though I've tweaked it to make
// it more like Google's autocomplete/suggest.  It returns the number of operations 
// (insert/delete/substitute) required to change one string into another, with the 
// expectation that userTyped is only a partial version of fullEntry.
// Gives us a measurement of how similar the two strings are.
public static int EditDistance(string userTyped, string fullEntry)
{
    if (userTyped.Length == 0) // all entries are assumed to be fully legit possibilities 
        return 0; // at this point, because the user hasn't typed anything.

    var inx = fullEntry.IndexOf(userTyped[0]);
    if (inx < 0) // If the 1st character doesn't exist anywhere in the entry, it's not
        return Int32.MaxValue; // a possible match.

    var lastInx = inx;
    var lastMatchCount = 0;
TryAgain:
    // Is there a better starting point?
    var len = fullEntry.Length - inx;
    var matchCount = 1;
    var k = 1;
    for (; k < len; k++)
    {
        if (k == userTyped.Length || userTyped[k] != fullEntry[k + inx])
        {
            if (matchCount > lastMatchCount)
            {
                lastMatchCount = matchCount;
                lastInx = inx;
            }
            inx = fullEntry.IndexOf(userTyped[0], inx + 1);
            matchCount = 0;
            if (inx > 0)
                goto TryAgain;
            else
                break;
        }
        else
            matchCount++;
    }
    if (k == len && matchCount > lastMatchCount)
        lastInx = inx;

    if (lastInx > 0)
        fullEntry = fullEntry.Substring(lastInx); // Jump to 1st character match, ignoring previous values 

    // The start of the Levenshtein Distance algorithem.
    var m = userTyped.Length;
    var n = Math.Min(m, fullEntry.Length);

    int[,] d = new int[m + 1, n + 1]; // "distance" - meaning number of operations.

    for (var i = 0; i <= m; i++)
        d[i, 0] = i; // the distance of any first string to an empty second string
    for (var j = 0; j <= n; j++)
        d[0, j] = j; // the distance of any second string to an empty first string

    for (var j = 1; j <= n; j++)
        for (var i = 1; i <= m; i++)
            if (userTyped[i - 1] == fullEntry[j - 1])
                d[i, j] = d[i - 1, j - 1];       // no operation required
            else
                d[i, j] = Math.Min
                           (
                             d[i - 1, j] + 1,  // a deletion
                             Math.Min(
                             d[i, j - 1] + 1,  // an insertion
                             d[i - 1, j - 1] + 1 // a substitution
                             )
                           );

    return d[m, n];
}
浅浅淡淡 2024-09-10 00:07:12

如果您正在寻找问题的整体设计,请尝试阅读 https://www 中的内容.interviewbit.com/problems/search-typeahead/

他们首先通过使用 trie 的简单方法构建自动完成功能,然后在此基础上进行构建。他们还解释了采样和离线更新等优化技术,以满足特定的用例。

为了保持解决方案的可扩展性,您必须智能地对 trie 数据进行分片。

If you are looking for an overall design for the problem, try reading the content at https://www.interviewbit.com/problems/search-typeahead/.

They start by building autocomplete through a naive approach of using a trie and then build upon it. They also explain optimization techniques like sampling and offline updates to cater to specific use cases.

To keep the solution scalable, you would have to shard your trie data intelligently.

逐鹿 2024-09-10 00:07:12

我认为构建一个专门的特里树可能会更好,而不是追求一种完全不同的数据结构。

我可以看到该功能体现在 trie 中,其中每个叶子都有一个字段,反映其相应单词的搜索频率。

搜索查询方法将显示具有最大值的后代叶节点,该最大值通过将到每个后代叶节点的距离乘以与每个后代叶节点相关联的搜索频率来计算。

谷歌使用的数据结构(以及相应的算法)可能要复杂得多,可能会考虑大量其他因素,例如来自您自己的特定帐户的搜索频率(以及一天中的时间......以及天气......季节) ...和月相...和...)。
然而,我相信通过在每个节点中包含附加字段并在搜索查询方法中使用这些字段,基本 trie 数据结构可以扩展到任何类型的专门搜索首选项。

I think that one might be better off constructing a specialized trie, rather than pursuing a completely different data structure.

I could see that functionality manifested in a trie in which each leaf had a field that reflected the frequency of searches of its corresponding word.

The search query method would display the descendant leaf nodes with the largest values calculated from multiplying the distance to each descendant leaf node by the search frequency associated with each descendant leaf node.

The data structure (and consequently the algorithm) Google uses are probably vastly more complicated, potentially taking into a large number of other factors, such as search frequencies from your own specific account (and time of day... and weather... season... and lunar phase... and... ).
However, I believe that the basic trie data structure can be expanded to any kind of specialized search preference by including additional fields to each of the nodes and using those fields in the search query method.

留蓝 2024-09-10 00:07:12

我不知道这是否能回答你的问题,但当时我用 C 语言做了一个非常简单的输入自动完成代码。我还没有在这方面实现机器学习和神经网络,所以它不会进行概率计算之类的。它的作用是使用子字符串检查算法检查与输入匹配的第一个索引。

您可以将匹配数据提供到“dict.txt”文件中。

/* Auto-complete input function in c
    @authors: James Vausch
    @date: 2018-5-23

    - This is a bona-fide self-created program which aims to
      stimulate an input auto-suggest or auto-complete function
      in C language. This is open source so you can use the code
      freely. However if you will use this, just acknowledge the
      creator as a sign of respect.
    - I'd also like to acknowledge Code with C team whom where I
      I got an answer how to have a colored output instead of 
      using system("color #"). Link down below

      https://www.codewithc.com/change-text-color-in-codeblocks-console-window/


    - THE GENERAL IDEA IS; WE READ A FILE WITH DICTIONARY WORDS
      OR SHALL WE SAY ALL WORDS. WE RUN A WHILE LOOP THAT WILL
      GET CHARACTER FROM THE USER USING "getch()" FUNCTION THEN
      STORE IT IN A CHARACTER ARRAY THEN IS PASSED ON A FUNCTION
      THAT CHECKS IF THE ANY DICTIONARY WORDS HAS A SUBSTRING
      THAT IS THE USER INPUT. IF YES(0), THE FUNCTION WILL COPY
      THE FOLLOWING STRING FROM THE DICTIONARY ARRAY THEN STORED
      IN A TEMP CHAR ARRAY AND PROCESSED. THE PROCESSING SHOULD
      BE SIMPLE. WE RUN A LOOP IN WHICH WILL CHECK THE AMOUNT OF
      CHARACTERS IN THE MATCHED STRING, THEN WE'LL RUN A LOOP
      THAT WILL SORT THE WORDS DECREMENTALLY BASED ON THE AMOUNT
      OF CHARACTERS OF THE INPUT SUBSTRING. THEN PRINT THE
      PROCESSED STRING ON THE FRONT OF THE INPUT STRING THEN RUN
      A LOOP BASED ON THE AMOUNT OF CHARACTERS PRESENT OR STRING
      LENGTH OF THE PROCESSED STRING PLUS 10 EXTRA CHARACTERS 
      ALONG WITH PRINTING SOME BACKWARD TRAVERSE CARET FUNCTION 
      TO MAKE THE CARET STAY WHERE IT SHOULD BE ALONG WITH 
      INPUTTING. SIMPLE.

    - <EXAMPLE>
        INPUT: COM
        AFTER LOOP RUN: MATCHED WITH WORD "COMMAND"
        AFTER LOOP RUN: INPUT HAS 3 CHARACTERS
        LOOP SEQUENCE:
            LOOP 0: OMMAND
            LOOP 1: MMAND
            LOOP 2: MAND
        AFTER LOOP: MAND
        PRINT: "MAND" AFTER INPUT BUT KEEP CARET ON THE INPUT "COM"

    NOTE:
    - You'll need the "dict.txt" file or you can create one and
      put some stuff there
    - Since C Programs run on.. say a terminal, I have not much of a way to
      efficiently make a way to use arrow keys for the job.
    - you should type your INPUT in LOWERCASE since pressing "Shift_Key + M"
      is equivalent to pressing the VK_Right(right arrow key) as well as
      the other arrow keys
    - the right arrow key has an ascii equivalent of <-32><77>, 77 = M
    - to complete the input, you'll need to press right arrow key
    - the left arrow key has an ascii equivalent of <-32><75>, 75 = K
    - to remove auto-complete suggestion, press left arrow key

    TO ADD:
    - UP arrow key and DOWN arrow key to cycle through suggestions
*/

//#include <headers.h> //My personal header file
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <windows.h>

void main(){
    SetColor(6);
    start();
}

void start(){
    int rep = 0;
    char dictFile[] = "dict.txt";
    loadDictionaryEntries(dictFile);

    char inp[50];
    printf("\nAuto Complete Program : C");
    while(rep == 0){
        printf("\nInput: ");
        autoCompleteInput(inp);
        if(strcasecmp(inp, "exit") == 0){
            break;
        }
        printf("\nOutput: %s", inp);
    }
    printf("\n");
    system("pause");
}

int dictEntryCount = 0;
struct allWords{
    char entry[100];
}dictionary[60000];

//============================================================================//
void loadDictionaryEntries(char directory[]){
    FILE *file;
    int dex = 0;
    char str[100];
    if(file = fopen(directory, "r")){
        printf("File accessed.\n");
        while(!feof(file)){
            fscanf(file, "%s", str);
            //UN-COMMENT line 109 to check if the program is reading from file
            //printf("Adding entry %d: \"%s\" to dictionary\n",dex + 1, str);
            strcpy(dictionary[dex].entry, str);
            dex++;
            dictEntryCount++;
        }
        fclose(file);
        printf("[ADDED %d WORDS TO DICTIONARY]\n", dictEntryCount);
    }else{
        printf(" File cannot be accessed.");
        fclose(file);
    }
}

void printArray(){
    for(int i = 0; i < dictEntryCount; i++){
        printf("Index %d: %s\n", i + 1, dictionary[i].entry);
    }
}

//============================================================================//
void autoCompleteInput(char input[]){
    char matchedWord[100]; //STORAGE FOR THE WORD THAT MATCHES INPUT
    char ch; //STORAGE FOR EACH CHARACTER THAT THE USER INPUTS
    int i = 0; //COUNTER
    int words;
    while(i != 200){ //LOOP TO GET EACH CHARACTER FROM KEYBOARD PRESS
        SetColor(6);
        ch = getch();
        clsx(strlen(matchedWord));
        if(ch == 13){ //CONDITION TO CHECK IF INPUT IS "ENTER" KEY
            break; //BREAKS LOOP IF "ENTER IS PRESSED"
        }else if(ch == 8){ //CONDITION TO CHECK IF INPUT IS "BACKSPACE"
            if(i == 0){ //IF INPUT IS NULL, DO NOTHING, DONT ERASE ANYTHING
                //DO NOTHING
            }else{ //IF INPUT IS NOT NULL, ENABLE ERASING
                clsx(strlen(matchedWord));
                bksp();
                i--;
                input[i] = '\0';
                if(i > 2){
                    if(matchToDictionary(input, matchedWord) == 0){
                        words = 0;
                        processMatchedWord(i, matchedWord);
                        SetColor(8);
                        printf("%s", matchedWord);
                        words = getArrSizeChar(matchedWord);
                        for(int x = 0; x < words; x++){
                            printf("\b");
                        }
                    }
                }
            }
        }else if(ch == 77){ //CONDITION TO CHECK IF INPUT IS RIGHT ARROW KEY
            printf("%s", matchedWord); //PRINT SUGESTED WORD WITH CARET AT FRONT
            strcat(input, matchedWord); //CONCATENATE SUGGESTION TO INPUT
            i = i + words - 1; //SETS INDEX AT THE END OF INPUT
            words = 0; //
        }else if(ch == 75){ //CONDITION TO CHECK IS INPUT IS LEFT ARROW KEY
            clsx(strlen(matchedWord)); //ERASE SUGGESTION
            i--; //DECREMENT INDEX
        }else{ //IF CONDITIONS ABOVE ARE NOT MET, DO THIS
            input[i] = ch; //INSERT CH AT THE INDEX OF INPUT
            printf("%c", ch); //PRINT CHARACTER
            input[i + 1] = '\0'; //SET END OF CURRENT INPUT TO NULL
            i++;
            if(i >= 2){
                if(matchToDictionary(input, matchedWord) == 0){
                    words = 0;
                    processMatchedWord(i, matchedWord);
                    SetColor(8);
                    printf("%s", matchedWord);
                    words = getArrSizeChar(matchedWord);
                    for(int x = 0; x < words; x++){
                        printf("\b");
                    }
                }else{
                    clsx(strlen(matchedWord));
                }
            }
        }

    }
    input[i] = '\0'; //NULL ENDING VALUE TO PREVENT UNNECESSARY CHARACTERS
}

int getArrSizeChar(char array[]){
    int size = 0;
    while(array[size] != '\0'){size++;}
    return size;
}

void clsx(int maxVal){
    for(int i = 0; i < maxVal + 10; i++){
        printf(" ");
    }
    for(int i = 0; i < maxVal + 10; i++){
        printf("\b");
    }
}

int matchToDictionary(char input[], char matchedWord[]){
    int found = 0;
    int dex = dictEntryCount; //LIMIT OF ARRAY / ARRAY BOUND/S
    //while(dictionary[dex] != '\0'){ //LOOP TO DETERMINE ARRAY BOUND
        //printf("%d", dex);
        //dex++; //INCREMENT IF INDEX OF ARRAY IS NOT NULL
    //}
    //printf("%d", dex);
    for(int i = 0; i < dex; i++){ //LOOP TROUGH ALL INDEXES OF DICTIONARY
        //CHECKS IF THE INDEX OF DICTIONARY HAS A SUBSTRING INPUT
        //printf(" Matching %s and %s\n", dictionary[i], input);
        if(containsIgnoreCase(dictionary[i].entry, input) == 0){
            //CHECKS IF THE INDEX OF DICTIONARY TOTALLY MATCHES THE INPUT
            //IT IS TO PREVENT ERRORS IN AUTO-COMPLETING PROCESS
            if(strcasecmp(dictionary[i].entry, input) == 1){
                //IF NOT, STORE INDEX OF DICTIONARY TO MATCHED WORD
                strcpy(matchedWord, dictionary[i].entry);
                found++;
                break; //BREAK LOOP
            }
        }
    }

    if(found == 1){
        return 0;
    }else{
        return 1;
    }
}

void processMatchedWord(int rep, char str[]){
    int lim = 0;
    int i;
    char temp[50];
    while(str[lim] != '\0'){
        lim++;
    }

    while(rep != 0){
        for(i = 0; i < lim; i++){
            str[i] = str[i + 1];
        }
        rep--;
    }
}

//===================================================================//
void bksp(){
    printf("\b "); //first backsapce to print an emtpy character
    printf("\b"); //second backspace to erase printed character
}

int containsIgnoreCase(char str1[], char str2[]){
    char tmp1[100];
    char tmp2[100];
    toLowerCase(tmp1, str1);
    toLowerCase(tmp2, str2);
    int i, j = 0, k;

    for(i = 0; tmp1[i]; i++){
        if(tmp1[i] == tmp2[j]){
            for(k = i, j = 0; tmp1[k] && tmp2[j]; j++, k++){
                if(tmp1[k] != tmp2[j]){
                    break;
                }
            }
            if(!tmp2[j]){
                return 0;
            }
        }
    }

    return 1;
}

void toLowerCase(char destination[], char source[]){
    int lim = 0;
    int i;

    while(source[lim] != '\0'){
        lim++;
    }

    for(i = 0; i < lim; i++){
        destination[i] = tolower(source[i]);
    }
    destination[i] = '\0';
}


/*Console Colors: Windows

    0 = Black        8 = Gray
    1 = Blue         9 = LBlue
    2 = Green       10 = LGreen
    3 = Aqua        11 = LAqua
    4 = Red         12 = LRed
    5 = Purple      13 = LPurple
    6 = Yellow      14 = LYellow
    7 = White       15 = Bright White
}*/

void SetColor(int ForgC){ //CODE SNIPPET FROM WWW.CODEWITHC.COM
    WORD wColor;
    //This handle is needed to get the current background attribute

    HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_SCREEN_BUFFER_INFO csbi;
    //csbi is used for wAttributes word

    if(GetConsoleScreenBufferInfo(hStdOut, &csbi)){
        //To mask out all but the background attribute, and to add the color
        wColor = (csbi.wAttributes & 0xF0) + (ForgC & 0x0F);
        SetConsoleTextAttribute(hStdOut, wColor);
    }
    return;
}

如果您指的是一个具有空初始化匹配的程序,然后将用户的输入保存到数组或文件中,那么当用户键入相同的单词时,程序会将其与先前的输入进行匹配,也许我可以解决这个问题。

I don't know if this will answer your question but I made a very simple input-autocomplete code using C language back then. I haven't implemented machine learning and neural networks on this so it won't make probability calculations and whatnot. What it does is check the very first index that matches the input using sub-string checking algorithm.

You can provide match data into a "dict.txt" file.

/* Auto-complete input function in c
    @authors: James Vausch
    @date: 2018-5-23

    - This is a bona-fide self-created program which aims to
      stimulate an input auto-suggest or auto-complete function
      in C language. This is open source so you can use the code
      freely. However if you will use this, just acknowledge the
      creator as a sign of respect.
    - I'd also like to acknowledge Code with C team whom where I
      I got an answer how to have a colored output instead of 
      using system("color #"). Link down below

      https://www.codewithc.com/change-text-color-in-codeblocks-console-window/


    - THE GENERAL IDEA IS; WE READ A FILE WITH DICTIONARY WORDS
      OR SHALL WE SAY ALL WORDS. WE RUN A WHILE LOOP THAT WILL
      GET CHARACTER FROM THE USER USING "getch()" FUNCTION THEN
      STORE IT IN A CHARACTER ARRAY THEN IS PASSED ON A FUNCTION
      THAT CHECKS IF THE ANY DICTIONARY WORDS HAS A SUBSTRING
      THAT IS THE USER INPUT. IF YES(0), THE FUNCTION WILL COPY
      THE FOLLOWING STRING FROM THE DICTIONARY ARRAY THEN STORED
      IN A TEMP CHAR ARRAY AND PROCESSED. THE PROCESSING SHOULD
      BE SIMPLE. WE RUN A LOOP IN WHICH WILL CHECK THE AMOUNT OF
      CHARACTERS IN THE MATCHED STRING, THEN WE'LL RUN A LOOP
      THAT WILL SORT THE WORDS DECREMENTALLY BASED ON THE AMOUNT
      OF CHARACTERS OF THE INPUT SUBSTRING. THEN PRINT THE
      PROCESSED STRING ON THE FRONT OF THE INPUT STRING THEN RUN
      A LOOP BASED ON THE AMOUNT OF CHARACTERS PRESENT OR STRING
      LENGTH OF THE PROCESSED STRING PLUS 10 EXTRA CHARACTERS 
      ALONG WITH PRINTING SOME BACKWARD TRAVERSE CARET FUNCTION 
      TO MAKE THE CARET STAY WHERE IT SHOULD BE ALONG WITH 
      INPUTTING. SIMPLE.

    - <EXAMPLE>
        INPUT: COM
        AFTER LOOP RUN: MATCHED WITH WORD "COMMAND"
        AFTER LOOP RUN: INPUT HAS 3 CHARACTERS
        LOOP SEQUENCE:
            LOOP 0: OMMAND
            LOOP 1: MMAND
            LOOP 2: MAND
        AFTER LOOP: MAND
        PRINT: "MAND" AFTER INPUT BUT KEEP CARET ON THE INPUT "COM"

    NOTE:
    - You'll need the "dict.txt" file or you can create one and
      put some stuff there
    - Since C Programs run on.. say a terminal, I have not much of a way to
      efficiently make a way to use arrow keys for the job.
    - you should type your INPUT in LOWERCASE since pressing "Shift_Key + M"
      is equivalent to pressing the VK_Right(right arrow key) as well as
      the other arrow keys
    - the right arrow key has an ascii equivalent of <-32><77>, 77 = M
    - to complete the input, you'll need to press right arrow key
    - the left arrow key has an ascii equivalent of <-32><75>, 75 = K
    - to remove auto-complete suggestion, press left arrow key

    TO ADD:
    - UP arrow key and DOWN arrow key to cycle through suggestions
*/

//#include <headers.h> //My personal header file
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <windows.h>

void main(){
    SetColor(6);
    start();
}

void start(){
    int rep = 0;
    char dictFile[] = "dict.txt";
    loadDictionaryEntries(dictFile);

    char inp[50];
    printf("\nAuto Complete Program : C");
    while(rep == 0){
        printf("\nInput: ");
        autoCompleteInput(inp);
        if(strcasecmp(inp, "exit") == 0){
            break;
        }
        printf("\nOutput: %s", inp);
    }
    printf("\n");
    system("pause");
}

int dictEntryCount = 0;
struct allWords{
    char entry[100];
}dictionary[60000];

//============================================================================//
void loadDictionaryEntries(char directory[]){
    FILE *file;
    int dex = 0;
    char str[100];
    if(file = fopen(directory, "r")){
        printf("File accessed.\n");
        while(!feof(file)){
            fscanf(file, "%s", str);
            //UN-COMMENT line 109 to check if the program is reading from file
            //printf("Adding entry %d: \"%s\" to dictionary\n",dex + 1, str);
            strcpy(dictionary[dex].entry, str);
            dex++;
            dictEntryCount++;
        }
        fclose(file);
        printf("[ADDED %d WORDS TO DICTIONARY]\n", dictEntryCount);
    }else{
        printf(" File cannot be accessed.");
        fclose(file);
    }
}

void printArray(){
    for(int i = 0; i < dictEntryCount; i++){
        printf("Index %d: %s\n", i + 1, dictionary[i].entry);
    }
}

//============================================================================//
void autoCompleteInput(char input[]){
    char matchedWord[100]; //STORAGE FOR THE WORD THAT MATCHES INPUT
    char ch; //STORAGE FOR EACH CHARACTER THAT THE USER INPUTS
    int i = 0; //COUNTER
    int words;
    while(i != 200){ //LOOP TO GET EACH CHARACTER FROM KEYBOARD PRESS
        SetColor(6);
        ch = getch();
        clsx(strlen(matchedWord));
        if(ch == 13){ //CONDITION TO CHECK IF INPUT IS "ENTER" KEY
            break; //BREAKS LOOP IF "ENTER IS PRESSED"
        }else if(ch == 8){ //CONDITION TO CHECK IF INPUT IS "BACKSPACE"
            if(i == 0){ //IF INPUT IS NULL, DO NOTHING, DONT ERASE ANYTHING
                //DO NOTHING
            }else{ //IF INPUT IS NOT NULL, ENABLE ERASING
                clsx(strlen(matchedWord));
                bksp();
                i--;
                input[i] = '\0';
                if(i > 2){
                    if(matchToDictionary(input, matchedWord) == 0){
                        words = 0;
                        processMatchedWord(i, matchedWord);
                        SetColor(8);
                        printf("%s", matchedWord);
                        words = getArrSizeChar(matchedWord);
                        for(int x = 0; x < words; x++){
                            printf("\b");
                        }
                    }
                }
            }
        }else if(ch == 77){ //CONDITION TO CHECK IF INPUT IS RIGHT ARROW KEY
            printf("%s", matchedWord); //PRINT SUGESTED WORD WITH CARET AT FRONT
            strcat(input, matchedWord); //CONCATENATE SUGGESTION TO INPUT
            i = i + words - 1; //SETS INDEX AT THE END OF INPUT
            words = 0; //
        }else if(ch == 75){ //CONDITION TO CHECK IS INPUT IS LEFT ARROW KEY
            clsx(strlen(matchedWord)); //ERASE SUGGESTION
            i--; //DECREMENT INDEX
        }else{ //IF CONDITIONS ABOVE ARE NOT MET, DO THIS
            input[i] = ch; //INSERT CH AT THE INDEX OF INPUT
            printf("%c", ch); //PRINT CHARACTER
            input[i + 1] = '\0'; //SET END OF CURRENT INPUT TO NULL
            i++;
            if(i >= 2){
                if(matchToDictionary(input, matchedWord) == 0){
                    words = 0;
                    processMatchedWord(i, matchedWord);
                    SetColor(8);
                    printf("%s", matchedWord);
                    words = getArrSizeChar(matchedWord);
                    for(int x = 0; x < words; x++){
                        printf("\b");
                    }
                }else{
                    clsx(strlen(matchedWord));
                }
            }
        }

    }
    input[i] = '\0'; //NULL ENDING VALUE TO PREVENT UNNECESSARY CHARACTERS
}

int getArrSizeChar(char array[]){
    int size = 0;
    while(array[size] != '\0'){size++;}
    return size;
}

void clsx(int maxVal){
    for(int i = 0; i < maxVal + 10; i++){
        printf(" ");
    }
    for(int i = 0; i < maxVal + 10; i++){
        printf("\b");
    }
}

int matchToDictionary(char input[], char matchedWord[]){
    int found = 0;
    int dex = dictEntryCount; //LIMIT OF ARRAY / ARRAY BOUND/S
    //while(dictionary[dex] != '\0'){ //LOOP TO DETERMINE ARRAY BOUND
        //printf("%d", dex);
        //dex++; //INCREMENT IF INDEX OF ARRAY IS NOT NULL
    //}
    //printf("%d", dex);
    for(int i = 0; i < dex; i++){ //LOOP TROUGH ALL INDEXES OF DICTIONARY
        //CHECKS IF THE INDEX OF DICTIONARY HAS A SUBSTRING INPUT
        //printf(" Matching %s and %s\n", dictionary[i], input);
        if(containsIgnoreCase(dictionary[i].entry, input) == 0){
            //CHECKS IF THE INDEX OF DICTIONARY TOTALLY MATCHES THE INPUT
            //IT IS TO PREVENT ERRORS IN AUTO-COMPLETING PROCESS
            if(strcasecmp(dictionary[i].entry, input) == 1){
                //IF NOT, STORE INDEX OF DICTIONARY TO MATCHED WORD
                strcpy(matchedWord, dictionary[i].entry);
                found++;
                break; //BREAK LOOP
            }
        }
    }

    if(found == 1){
        return 0;
    }else{
        return 1;
    }
}

void processMatchedWord(int rep, char str[]){
    int lim = 0;
    int i;
    char temp[50];
    while(str[lim] != '\0'){
        lim++;
    }

    while(rep != 0){
        for(i = 0; i < lim; i++){
            str[i] = str[i + 1];
        }
        rep--;
    }
}

//===================================================================//
void bksp(){
    printf("\b "); //first backsapce to print an emtpy character
    printf("\b"); //second backspace to erase printed character
}

int containsIgnoreCase(char str1[], char str2[]){
    char tmp1[100];
    char tmp2[100];
    toLowerCase(tmp1, str1);
    toLowerCase(tmp2, str2);
    int i, j = 0, k;

    for(i = 0; tmp1[i]; i++){
        if(tmp1[i] == tmp2[j]){
            for(k = i, j = 0; tmp1[k] && tmp2[j]; j++, k++){
                if(tmp1[k] != tmp2[j]){
                    break;
                }
            }
            if(!tmp2[j]){
                return 0;
            }
        }
    }

    return 1;
}

void toLowerCase(char destination[], char source[]){
    int lim = 0;
    int i;

    while(source[lim] != '\0'){
        lim++;
    }

    for(i = 0; i < lim; i++){
        destination[i] = tolower(source[i]);
    }
    destination[i] = '\0';
}


/*Console Colors: Windows

    0 = Black        8 = Gray
    1 = Blue         9 = LBlue
    2 = Green       10 = LGreen
    3 = Aqua        11 = LAqua
    4 = Red         12 = LRed
    5 = Purple      13 = LPurple
    6 = Yellow      14 = LYellow
    7 = White       15 = Bright White
}*/

void SetColor(int ForgC){ //CODE SNIPPET FROM WWW.CODEWITHC.COM
    WORD wColor;
    //This handle is needed to get the current background attribute

    HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_SCREEN_BUFFER_INFO csbi;
    //csbi is used for wAttributes word

    if(GetConsoleScreenBufferInfo(hStdOut, &csbi)){
        //To mask out all but the background attribute, and to add the color
        wColor = (csbi.wAttributes & 0xF0) + (ForgC & 0x0F);
        SetConsoleTextAttribute(hStdOut, wColor);
    }
    return;
}

If you're referring to a program that has a null initialized matches then saves an input from user into an array or file then when the user types the same word the program matches it to a previous input, maybe I can work into that.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文