检查一组数组中是否存在数组

发布于 2025-01-29 11:05:27 字数 748 浏览 2 评论 0原文

我有这样的固定长度阵列：

char x[10] = "abcdefghij";   // could also contain non-printable char

如何有效测试此阵列是否存在于相同长度的10,000个阵列中？（注意：这些阵列都是二进制数据阵列，或所有它们的无效终止字符串，这是

在Python中修复的，我将使用集合/hashtable/dict而不是列表（因为非常快的O（1）查找）：

s = "abcdefghij"
S = {"foo1234567", "bar1234567", ..., "baz9876543"}
print(s in S)  # True or False

如何做在C中的等效物？（不是c ++）

注意：链接的问题如何检查字符串是否在c？中是一种幼稚的方法，没有性能要求（它们在所有字符串上循环并使用strcmp < /代码>）。在这里，由于有10k数组，因此需要使用另一种方法来进行性能（也许是一个可标记？）。

原文

I have fixed-length arrays like this:

char x[10] = "abcdefghij";   // could also contain non-printable char

How to efficiently test if this array is present or not in a fixed set of 10,000 arrays of the same length? (note: these arrays are either all of them binary data arrays, or all of them null-terminated strings, this is fixed at the beginning)

In Python, I would use a set/hashtable/dict and not a list (because of very fast O(1) lookup):

s = "abcdefghij"
S = {"foo1234567", "bar1234567", ..., "baz9876543"}
print(s in S)  # True or False

How to do the equivalent in C? (not C++)

Note: the linked question How to check if a string is in an array of strings in C? is about a naive way to do it, with no performance requirement (they loop over all strings and use strcmp). Here it's different since there are 10k arrays, one needs to use another method for performance (a hashtable maybe?).

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

檐上三寸雪 2025-02-05 11:05:28

请注意，关于在您的问题中，将零终止的字符串和二进制数据存储在数组中

，您可以指定固定长度阵列可以代表

零终止的字符串或
二进制数据。

如果您的程序无法知道这两种可能性中的哪一种由数组内容表示（例如，使用函数 strncpy ）。仅编写单个无效终止字符是不够的。

这很重要，因为如果您不这样做，则您的程序将无法知道如何解释其余数据在遇到一个字节中使用值0。它不知道

是将此字节解释为字符串的零终止特征，而忽略了字符串的所有剩余字节，还是
将此字节解释为二进制数据，并将其余字节视为二进制数据（即不要忽略其余字节）。<<<<<<<<<<<<<<<<<<<<<<<<<< /p>

但是，如果您始终在终止带有空字符的字符串的null字符后填充所有剩余字节，则您的程序将不会担心数组内容是否代表字符串或二进制数据，因为它可以以相同的方式对处理。

因此，在我的答案的其余部分中，我假设如果数组内容代表一个字符串，那么字符串结束后的所有剩余字节都将带有值0的字节。这样，我可以假设不应忽略字节。

哈希表解决方案

在Python中，我将使用集合/hashtable/dict而不是列表（因为非常快的O（1）查找）：

也可以使用a hash表在C中，尽管您必须自己编程或使用已经存在的库。 C标准库不提供任何哈希功能。

以下代码将随机生成10,000个固定长度的数组（不是null终止的）长度10的字符串，字符a to z，a a 到z和0 to 9，但它也会将三个硬编码单词放入不同位置的数组，因此您可以稍后搜索这些单词，以测试搜索功能。然后，该程序将将所有单词插入哈希表中，然后在哈希表中执行几个硬编码的查找。

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <stdbool.h>

#define NUM_ARRAYS 10000
#define ARRAY_LENGTH 10
#define HASH_TABLE_SIZE 10000

struct hash_table_entry
{
    struct hash_table_entry *next;
    unsigned char data[ARRAY_LENGTH];
};

//function prototype declarations
void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] );
size_t hash_array( const unsigned char array[ARRAY_LENGTH] );
void build_hash_table_from_data( unsigned char data[][ARRAY_LENGTH], size_t data_length, struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );
void cleanup_hash_table( struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );
bool lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );
bool verbose_lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );

int main( void )
{
    //declare 2D array for the input data
    static unsigned char data[NUM_ARRAYS][ARRAY_LENGTH];

    //declare hash table and initialize all fields to zero
    static struct hash_table_entry *hash_table[HASH_TABLE_SIZE] = {NULL};

    //seed random number generator
    srand( (unsigned)time( NULL ) );

    //fill array "data" with random data
    printf( "Generating random data..." );
    fflush( stdout );
    fill_with_random_data( data );
    printf( "done\n" );
    fflush( stdout );

    //overwrite a few strings, so that we can search for
    //them later
    _Static_assert( NUM_ARRAYS > 500, "unexpected value" );
    _Static_assert( ARRAY_LENGTH == 10, "unexpected value" );
    memcpy( data[47], "abcdefghij", 10 );
    memcpy( data[218], "klmnopqrst", 10 );
    memcpy( data[419], "uvwxyz0123", 10 );

    //build hash table
    printf( "Building hash table..." );
    fflush( stdout );
    build_hash_table_from_data( data, NUM_ARRAYS, hash_table );
    printf( "done\n" );
    fflush( stdout );

    //perform lookups
    verbose_lookup( (unsigned char *)"uvwxyz0123", hash_table );
    verbose_lookup( (unsigned char *)"hsdkesldhg", hash_table );
    verbose_lookup( (unsigned char *)"abcdefghij", hash_table );
    verbose_lookup( (unsigned char *)"erlsodn3ls", hash_table );
    verbose_lookup( (unsigned char *)"klmnopqrst", hash_table );

    //cleanup
    printf( "Performing cleanup..." );
    fflush( stdout );
    cleanup_hash_table( hash_table );
    printf( "done\n" );
    fflush( stdout );
}

void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] )
{
    static const unsigned char random_chars[62] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    for ( size_t i = 0; i < NUM_ARRAYS; i++ )
    {
        for ( size_t j = 0; j < ARRAY_LENGTH; j++ )
        {
            data[i][j] = random_chars[rand()%sizeof random_chars];
        }
    }
}

//This is a simple hash function. Depending on the type of data, a
//more complex hashing function may be required, in order to prevent
//hash collisions.
size_t hash_array( const unsigned char array[ARRAY_LENGTH] )
{
    size_t hash = 0;

    for ( size_t i = 0; i < ARRAY_LENGTH; i++ )
    {
        hash = hash * 7 + array[i];
    }

    return hash % HASH_TABLE_SIZE;
}

void build_hash_table_from_data( unsigned char data[][ARRAY_LENGTH], size_t data_length, struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    for ( size_t i = 0; i < data_length; i++ )
    {
        struct hash_table_entry **pp, *p;

        //hash the array and store the result
        size_t hash = hash_array( data[i] );

        //perform hash table lookup
        pp = &hash_table[hash];

        //make pp point to the last null pointer in the list
        while ( ( p = *pp ) != NULL )
            pp = &p->next;

        //allocate memory for linked list node
        p = malloc( sizeof *p );
        if ( p == NULL )
        {
            fprintf( stderr, "Memory allocation failure!\n" );
            exit( EXIT_FAILURE );
        }

        //fill in data for new node
        p->next = NULL;
        memcpy( p->data, data[i], ARRAY_LENGTH );

        //insert new node into linked list
        *pp = p;
    }
}

void cleanup_hash_table( struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    for ( size_t i = 0; i < HASH_TABLE_SIZE; i++ )
    {
        struct hash_table_entry *p;

        p = hash_table[i];

        while ( p != NULL )
        {
            struct hash_table_entry *q = p;

            p = p->next;

            free( q );
        }
    }
}

bool lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    size_t hash;
    struct hash_table_entry *p;

    //find hash
    hash = hash_array( array );

    //index into hash table
    p = hash_table[hash];

    //attempt to find array in linked list
    while ( p != NULL )
    {
        if ( memcmp( array, p->data, ARRAY_LENGTH ) == 0 )
        {
            //array was found
            return true;
        }

        p = p->next;
    }

    //array was not found
    return false;
}

//This function serves as a wrapper function for the
//function "lookup". It prints to "stdout" what it is
//looking for and what the result of the lookup was.
bool verbose_lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    //print message
    printf( "Searching for %.*s...", ARRAY_LENGTH, array );
    fflush( stdout );

    if ( lookup( array, hash_table ) )
    {
        printf( "found\n" );
        fflush( stdout );
        return true;
    }
    else
    {
        printf( "not found\n" );
        fflush( stdout );
        return false;
    }
}

该程序具有以下输出：

Generating random data...done
Building hash table...done
Searching for uvwxyz0123...found
Searching for hsdkesldhg...not found
Searching for abcdefghij...found
Searching for erlsodn3ls...not found
Searching for klmnopqrst...found
Performing cleanup...done

如您所见，它发现了所有明确插入的3个字符串，并且找不到其他字符串。

在代码中，我使用无符号char而不是char，因为在c中，a char *通常用于传递null终止的字符串。因此，使用无符号char *用于传递可能是二进制或不终止的数据似乎更合适。

`bsearch`

用于比较的解决方案，以下是一种以相同方式生成输入数组的解决方案，但使用 bsearch 而不是用于搜索的哈希表：

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <stdbool.h>

#define NUM_ARRAYS 10000
#define ARRAY_LENGTH 10

//function prototype declarations
void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] );
int compare_func( const void *a, const void *b );
bool verbose_search( const unsigned char array[ARRAY_LENGTH], unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] );

int main( void )
{
    //declare 2D array for the input data
    static unsigned char data[NUM_ARRAYS][ARRAY_LENGTH];

    //seed random number generator
    srand( (unsigned)time( NULL ) );

    //fill array "data" with random data
    printf( "Generating random data..." );
    fflush( stdout );
    fill_with_random_data( data );
    printf( "done\n" );
    fflush( stdout );

    //overwrite a few strings, so that we can search for
    //them later
    _Static_assert( NUM_ARRAYS > 500, "unexpected value" );
    _Static_assert( ARRAY_LENGTH == 10, "unexpected value" );
    memcpy( data[47], "abcdefghij", 10 );
    memcpy( data[218], "klmnopqrst", 10 );
    memcpy( data[419], "uvwxyz0123", 10 );

    //sort the array
    printf( "Sorting array..." );
    fflush( stdout );
    qsort( data, NUM_ARRAYS, ARRAY_LENGTH, compare_func );
    printf( "done\n" );
    fflush( stdout );

    //perform lookups
    verbose_search( (unsigned char *)"uvwxyz0123", data );
    verbose_search( (unsigned char *)"hsdkesldhg", data );
    verbose_search( (unsigned char *)"abcdefghij", data );
    verbose_search( (unsigned char *)"erlsodn3ls", data );
    verbose_search( (unsigned char *)"klmnopqrst", data );
}

void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] )
{
    static const unsigned char random_chars[62] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    for ( size_t i = 0; i < NUM_ARRAYS; i++ )
    {
        for ( size_t j = 0; j < ARRAY_LENGTH; j++ )
        {
            data[i][j] = random_chars[rand()%sizeof random_chars];
        }
    }
}

int compare_func( const void *a, const void *b )
{
    return memcmp( a, b, ARRAY_LENGTH );
}

//This function serves as a wrapper function for the
//function "bsearch". It prints to "stdout" what it is
//looking for and what the result of the search was.
bool verbose_search( const unsigned char array[ARRAY_LENGTH], unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] )
{
    //print message
    printf( "Searching for %.*s...", ARRAY_LENGTH, array );
    fflush( stdout );

    if ( bsearch( array, data, NUM_ARRAYS, ARRAY_LENGTH, compare_func ) != NULL )
    {
        printf( "found\n" );
        fflush( stdout );
        return true;
    }
    else
    {
        printf( "not found\n" );
        fflush( stdout );
        return false;
    }
}

此程序具有以下输出：

Generating random data...done
Sorting array...done
Searching for uvwxyz0123...found
Searching for hsdkesldhg...not found
Searching for abcdefghij...found
Searching for erlsodn3ls...not found
Searching for klmnopqrst...found

哈希表解决方案可能比二进制搜索解决方案更快假设为输入选择了良好的哈希函数，因此哈希碰撞的数量很小。

Note about storing both null-terminated strings and binary data in an array

In your question, you specify that the fixed-length arrays could represent

null-terminated strings or
binary data.

If your program has no way of knowing which one of these two possibilities is represented by the array contents, then, if the array contents is supposed to represent a null-terminated string, you will have to fill the remainder of the array with null characters (for example using the function strncpy). It won't be sufficient to only write a single null terminating character.

This is important, because if you don't do this, your program will have no way of knowing how to interpret the remaining data after it encounters a byte with the value 0. It won't know whether to

interpret this byte as a null-terminating character of a string and ignore all remaining bytes of the string, or
interpret this byte as binary data and treat the remaining bytes also as binary data (i.e. not ignore the remaining bytes).

However, if you always fill all remaining bytes after a terminating null character of a string with null characters, then your program won't habe to worry about whether the array contents represents a string or binary data, because it can treat both the same way.

Therefore, in the remainder of my answer, I will assume that if the array contents represents a string, then all remaining bytes after the end of the string will be filled with bytes with the value 0. That way, I can assume that no bytes should ever be ignored.

Hash table solution

In Python, I would use a set/hashtable/dict and not a list (because of very fast O(1) lookup):

It is also possible to use a hash table in C, although you will have to program it yourself or use an already existing library. The C standard library does not provide any hashing functions.

The following code will randomly generate an array of 10,000 fixed-length (not null-terminated) strings of length 10 with the characters a to z, A to Z and 0 to 9, but it will also place three hard-coded words into the array in different places, so you can search for these words later, in order to test the search function. The program will then insert all words into a hash table, and then perform several hard-coded lookups into the hash table.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <stdbool.h>

#define NUM_ARRAYS 10000
#define ARRAY_LENGTH 10
#define HASH_TABLE_SIZE 10000

struct hash_table_entry
{
    struct hash_table_entry *next;
    unsigned char data[ARRAY_LENGTH];
};

//function prototype declarations
void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] );
size_t hash_array( const unsigned char array[ARRAY_LENGTH] );
void build_hash_table_from_data( unsigned char data[][ARRAY_LENGTH], size_t data_length, struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );
void cleanup_hash_table( struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );
bool lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );
bool verbose_lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] );

int main( void )
{
    //declare 2D array for the input data
    static unsigned char data[NUM_ARRAYS][ARRAY_LENGTH];

    //declare hash table and initialize all fields to zero
    static struct hash_table_entry *hash_table[HASH_TABLE_SIZE] = {NULL};

    //seed random number generator
    srand( (unsigned)time( NULL ) );

    //fill array "data" with random data
    printf( "Generating random data..." );
    fflush( stdout );
    fill_with_random_data( data );
    printf( "done\n" );
    fflush( stdout );

    //overwrite a few strings, so that we can search for
    //them later
    _Static_assert( NUM_ARRAYS > 500, "unexpected value" );
    _Static_assert( ARRAY_LENGTH == 10, "unexpected value" );
    memcpy( data[47], "abcdefghij", 10 );
    memcpy( data[218], "klmnopqrst", 10 );
    memcpy( data[419], "uvwxyz0123", 10 );

    //build hash table
    printf( "Building hash table..." );
    fflush( stdout );
    build_hash_table_from_data( data, NUM_ARRAYS, hash_table );
    printf( "done\n" );
    fflush( stdout );

    //perform lookups
    verbose_lookup( (unsigned char *)"uvwxyz0123", hash_table );
    verbose_lookup( (unsigned char *)"hsdkesldhg", hash_table );
    verbose_lookup( (unsigned char *)"abcdefghij", hash_table );
    verbose_lookup( (unsigned char *)"erlsodn3ls", hash_table );
    verbose_lookup( (unsigned char *)"klmnopqrst", hash_table );

    //cleanup
    printf( "Performing cleanup..." );
    fflush( stdout );
    cleanup_hash_table( hash_table );
    printf( "done\n" );
    fflush( stdout );
}

void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] )
{
    static const unsigned char random_chars[62] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    for ( size_t i = 0; i < NUM_ARRAYS; i++ )
    {
        for ( size_t j = 0; j < ARRAY_LENGTH; j++ )
        {
            data[i][j] = random_chars[rand()%sizeof random_chars];
        }
    }
}

//This is a simple hash function. Depending on the type of data, a
//more complex hashing function may be required, in order to prevent
//hash collisions.
size_t hash_array( const unsigned char array[ARRAY_LENGTH] )
{
    size_t hash = 0;

    for ( size_t i = 0; i < ARRAY_LENGTH; i++ )
    {
        hash = hash * 7 + array[i];
    }

    return hash % HASH_TABLE_SIZE;
}

void build_hash_table_from_data( unsigned char data[][ARRAY_LENGTH], size_t data_length, struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    for ( size_t i = 0; i < data_length; i++ )
    {
        struct hash_table_entry **pp, *p;

        //hash the array and store the result
        size_t hash = hash_array( data[i] );

        //perform hash table lookup
        pp = &hash_table[hash];

        //make pp point to the last null pointer in the list
        while ( ( p = *pp ) != NULL )
            pp = &p->next;

        //allocate memory for linked list node
        p = malloc( sizeof *p );
        if ( p == NULL )
        {
            fprintf( stderr, "Memory allocation failure!\n" );
            exit( EXIT_FAILURE );
        }

        //fill in data for new node
        p->next = NULL;
        memcpy( p->data, data[i], ARRAY_LENGTH );

        //insert new node into linked list
        *pp = p;
    }
}

void cleanup_hash_table( struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    for ( size_t i = 0; i < HASH_TABLE_SIZE; i++ )
    {
        struct hash_table_entry *p;

        p = hash_table[i];

        while ( p != NULL )
        {
            struct hash_table_entry *q = p;

            p = p->next;

            free( q );
        }
    }
}

bool lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    size_t hash;
    struct hash_table_entry *p;

    //find hash
    hash = hash_array( array );

    //index into hash table
    p = hash_table[hash];

    //attempt to find array in linked list
    while ( p != NULL )
    {
        if ( memcmp( array, p->data, ARRAY_LENGTH ) == 0 )
        {
            //array was found
            return true;
        }

        p = p->next;
    }

    //array was not found
    return false;
}

//This function serves as a wrapper function for the
//function "lookup". It prints to "stdout" what it is
//looking for and what the result of the lookup was.
bool verbose_lookup( const unsigned char array[ARRAY_LENGTH], struct hash_table_entry *hash_table[HASH_TABLE_SIZE] )
{
    //print message
    printf( "Searching for %.*s...", ARRAY_LENGTH, array );
    fflush( stdout );

    if ( lookup( array, hash_table ) )
    {
        printf( "found\n" );
        fflush( stdout );
        return true;
    }
    else
    {
        printf( "not found\n" );
        fflush( stdout );
        return false;
    }
}

This program has the following output:

Generating random data...done
Building hash table...done
Searching for uvwxyz0123...found
Searching for hsdkesldhg...not found
Searching for abcdefghij...found
Searching for erlsodn3ls...not found
Searching for klmnopqrst...found
Performing cleanup...done

As you can see, it found all 3 strings that were explicitly inserted, and no other strings were found.

In the code, I am using unsigned char instead of char, because in C, a char * is usually used for passing null-terminated strings. Therefore, it seemed more appropriate to use unsigned char * for passing data that could be binary or not null-terminated.

`bsearch` solution

For comparison, here is a solution which generates the input array in the same way, but uses bsearch instead of a hash table for searching it:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <stdbool.h>

#define NUM_ARRAYS 10000
#define ARRAY_LENGTH 10

//function prototype declarations
void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] );
int compare_func( const void *a, const void *b );
bool verbose_search( const unsigned char array[ARRAY_LENGTH], unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] );

int main( void )
{
    //declare 2D array for the input data
    static unsigned char data[NUM_ARRAYS][ARRAY_LENGTH];

    //seed random number generator
    srand( (unsigned)time( NULL ) );

    //fill array "data" with random data
    printf( "Generating random data..." );
    fflush( stdout );
    fill_with_random_data( data );
    printf( "done\n" );
    fflush( stdout );

    //overwrite a few strings, so that we can search for
    //them later
    _Static_assert( NUM_ARRAYS > 500, "unexpected value" );
    _Static_assert( ARRAY_LENGTH == 10, "unexpected value" );
    memcpy( data[47], "abcdefghij", 10 );
    memcpy( data[218], "klmnopqrst", 10 );
    memcpy( data[419], "uvwxyz0123", 10 );

    //sort the array
    printf( "Sorting array..." );
    fflush( stdout );
    qsort( data, NUM_ARRAYS, ARRAY_LENGTH, compare_func );
    printf( "done\n" );
    fflush( stdout );

    //perform lookups
    verbose_search( (unsigned char *)"uvwxyz0123", data );
    verbose_search( (unsigned char *)"hsdkesldhg", data );
    verbose_search( (unsigned char *)"abcdefghij", data );
    verbose_search( (unsigned char *)"erlsodn3ls", data );
    verbose_search( (unsigned char *)"klmnopqrst", data );
}

void fill_with_random_data( unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] )
{
    static const unsigned char random_chars[62] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    for ( size_t i = 0; i < NUM_ARRAYS; i++ )
    {
        for ( size_t j = 0; j < ARRAY_LENGTH; j++ )
        {
            data[i][j] = random_chars[rand()%sizeof random_chars];
        }
    }
}

int compare_func( const void *a, const void *b )
{
    return memcmp( a, b, ARRAY_LENGTH );
}

//This function serves as a wrapper function for the
//function "bsearch". It prints to "stdout" what it is
//looking for and what the result of the search was.
bool verbose_search( const unsigned char array[ARRAY_LENGTH], unsigned char data[NUM_ARRAYS][ARRAY_LENGTH] )
{
    //print message
    printf( "Searching for %.*s...", ARRAY_LENGTH, array );
    fflush( stdout );

    if ( bsearch( array, data, NUM_ARRAYS, ARRAY_LENGTH, compare_func ) != NULL )
    {
        printf( "found\n" );
        fflush( stdout );
        return true;
    }
    else
    {
        printf( "not found\n" );
        fflush( stdout );
        return false;
    }
}

This program has the following output:

Generating random data...done
Sorting array...done
Searching for uvwxyz0123...found
Searching for hsdkesldhg...not found
Searching for abcdefghij...found
Searching for erlsodn3ls...not found
Searching for klmnopqrst...found

The hash table solution is probably faster than the binary search solution though, assuming that a good hash function is selected for the input, so that the number of hash collisions is minimal.

回复收藏 0 原文

小红帽 2025-02-05 11:05:28

bsearch函数的签名如下：

   void *bsearch(const void *key, const void *base,
                 size_t nmemb, size_t size,
                 int (*compar)(const void *, const void *));

键将是t1或t2和基本将是t。 NMEMB和size分别为9和4。

比较是指向回调函数进行比较的指针。这可以是围绕memcmp的包装器：

int compare_char4(const void *p1, const void *p2)
{
    return memcmp(p1, p2, 4);
}

然后，您使用这些参数调用bsearch：

printf("%s\n", bsearch(t1, T, 9, 4, compare_char4) ? "found" : "not found");
printf("%s\n", bsearch(t2, T, 9, 4, compare_char4) ? "found" : "not found");

The signature of the bsearch function is as follows:

   void *bsearch(const void *key, const void *base,
                 size_t nmemb, size_t size,
                 int (*compar)(const void *, const void *));

Here, key would be either t1 or t2 and base would be T. nmemb and size would be 9 and 4 respectively.

compar is a pointer to a callback function to do the comparison. This can just be a wrapper around memcmp:

int compare_char4(const void *p1, const void *p2)
{
    return memcmp(p1, p2, 4);
}

Then you call bsearch with these parameters:

printf("%s\n", bsearch(t1, T, 9, 4, compare_char4) ? "found" : "not found");
printf("%s\n", bsearch(t2, T, 9, 4, compare_char4) ? "found" : "not found");

回复收藏 0 原文

走走停停 2025-02-05 11:05:28

我认为以下二进制搜索起作用（感谢@weathervane），如果对数组的数组进行排序，则复杂性为O（log n）。我们可以使用memcmp进行比较。

int binarysearch(unsigned char *T, unsigned char *t, int num, int size)  // https://en.wikipedia.org/wiki/Binary_search_algorithm#Algorithm
{
    int a = 0;
    int b = num-1;
    int m, res;
    while (a <= b) {
        m = (a + b) / 2;
        res = memcmp(&T[0] + m*size, &t[0], size);
        if (res < 0) {
            a = m + 1;
        } else if (res > 0) {
            b = m - 1;
        } else { 
            return 1;
        } 
    }
    return 0;
}
int main()
{
    unsigned char T[][4] = { 
        { 0x00, 0x6e, 0xab, 0xcd },
        { 0xea, 0x6e, 0xab, 0xcd },
        { 0xeb, 0x6e, 0xab, 0xcd },
        { 0xec, 0x6e, 0xab, 0xcd },
        { 0xed, 0x6e, 0xab, 0xcd },
        { 0xef, 0x6e, 0xab, 0xcd },
        { 0xfa, 0x6e, 0xab, 0xcd },
        { 0xfb, 0x6e, 0xab, 0xcd },
        { 0xff, 0x6e, 0xab, 0xcd }
    };
    unsigned char t1[4] = { 0x33, 0x6e, 0xab, 0xcd };
    unsigned char t2[4] = { 0xfb, 0x6e, 0xab, 0xcd };
    printf("%s\n", binarysearch(T[0], t1, 9, 4) ? "found" : "not found");
    printf("%s\n", binarysearch(T[0], t2, 9, 4) ? "found" : "not found");
}

结果：

找不到
找到

I think the following binary search works (thanks to @WeatherVane) if the array of arrays is sorted, then the complexity is O(log n). We can use memcmp for the comparison.

int binarysearch(unsigned char *T, unsigned char *t, int num, int size)  // https://en.wikipedia.org/wiki/Binary_search_algorithm#Algorithm
{
    int a = 0;
    int b = num-1;
    int m, res;
    while (a <= b) {
        m = (a + b) / 2;
        res = memcmp(&T[0] + m*size, &t[0], size);
        if (res < 0) {
            a = m + 1;
        } else if (res > 0) {
            b = m - 1;
        } else { 
            return 1;
        } 
    }
    return 0;
}
int main()
{
    unsigned char T[][4] = { 
        { 0x00, 0x6e, 0xab, 0xcd },
        { 0xea, 0x6e, 0xab, 0xcd },
        { 0xeb, 0x6e, 0xab, 0xcd },
        { 0xec, 0x6e, 0xab, 0xcd },
        { 0xed, 0x6e, 0xab, 0xcd },
        { 0xef, 0x6e, 0xab, 0xcd },
        { 0xfa, 0x6e, 0xab, 0xcd },
        { 0xfb, 0x6e, 0xab, 0xcd },
        { 0xff, 0x6e, 0xab, 0xcd }
    };
    unsigned char t1[4] = { 0x33, 0x6e, 0xab, 0xcd };
    unsigned char t2[4] = { 0xfb, 0x6e, 0xab, 0xcd };
    printf("%s\n", binarysearch(T[0], t1, 9, 4) ? "found" : "not found");
    printf("%s\n", binarysearch(T[0], t2, 9, 4) ? "found" : "not found");
}

Result: