Checking for similar elements in a data array in C

Posted 2024-08-13 17:45:04

I created an "address" structure. Each address (xx.yy.zz.mm) consists of an xx, yy, zz and mm element, all of which are ints. Each address also has a "name" element associated with it.

I have an array of up to 100 addresses called "network". Here is an example of some elements in network:

186.88.1.21 Tyler
186.88.9.11 Bob
101.21.0.13 Tom
111.11.3.89 Chuck
101.21.5.99 Luke

I need to check each address and see if there are other addresses from the same location. Two addresses are from the same location if elements xx and yy are identical. If there are 1 or more addresses from the same location, I need to output this information.

Below is some code I wrote to try to do this:

char temp[11];
int nameCount;
for (i = 0; i < count; i++)
{
    char names[100][10] = {};
    strcpy(temp, network[i].name);
    temp[11] = '\0';
    nameCount = 0;
    for (j = i + 1; j < count; j++)
    {
        if (network[i].xx == network[j].xx && network[i].yy == network[j].yy)
        {
            strcpy(names[nameCount], network[j].name);
            nameCount++;
        } 
    }
    if (nameCount == 0)
        printf("No matches for %s.\n", temp);
    else
    {
        printf("%s ", temp);
        for (j = 0; j < nameCount; j++)
            printf("and %s ", names[i]);
        printf("are from the same location.\n\n");
    }
}

This code works for the first two addresses in the array which are from the same location, but it doesn't work for the rest (although it looks like it almost does -- it's printing blanks instead of names, but it has the right number of blanks). The output for the addresses I listed above is (sorry for the bad formatting):

Tyler  
 and Bob  
 are from the same location.  

No matches for Bob  
.  
Tom  
 and [space] and [space] are from the same location.  

No matches for Chuck  
.  
Luke  
 and [space] are from the same location.  

No matches for Nick  
.

It also seems like there is a newline character that has been added to the end of each name.

枉心 2024-08-20 17:45:05

There are at least several problems here.

0: temp[11] is the twelfth element in a char array you've defined to be 11 elements long. This is a buffer overrun.

1: names[100][10] should be names[100][11], so that each element is large enough to store a value from temp.

2: you're using strcpy() and then inserting a terminating character, presumably in case more than 10 characters were copied. By that point the overflow has already happened. You want to use strncpy(), and then terminate the string. Replace

strcpy(temp, network[i].name);
temp[11] = '\0';

with

strncpy(temp, network[i].name, sizeof(temp) - 1);
temp[sizeof(temp) - 1] = '\0';

and replace

        strcpy(names[nameCount], network[j].name);
        nameCount++;

with

        strncpy(names[nameCount], network[j].name, sizeof(names[nameCount]) - 1);
        names[nameCount][sizeof(names[nameCount]) - 1] = '\0';
        nameCount++;

3: the loop where you print the "and %s " list is dereferencing the array using the wrong variable. You're iterating using 'j', but pulling the 'i'th element out.

4: as far as the newline goes, it's very likely the case that network[i].name (for any i) contains a newline character that you're copying in.

5: if you have three things from the same location, you'll probably list them in a way you may not want.

1.1.1.1 chuck
1.1.2.2 larry
1.1.3.3 biff

will likely output (once the other bugs are fixed)

chuck and larry and biff are from the same location
larry and biff are from the same location
No matches for biff.

Fixing that problem is left as an exercise.
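
One possible way to approach that exercise is sketched below. It is only an illustration: the `struct address` layout, `MAX_ADDRESSES`, `find_groups` and `print_groups` are all assumed names that do not appear in the question. The idea is to record, for every address, the index of the first address that shares its xx and yy, and then print one line per location, led by that first address.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ADDRESSES 100
#define NAME_LEN 11 /* 10 characters plus the terminating '\0' */

/* Hypothetical layout; the question does not show the real struct. */
struct address {
    int xx, yy, zz, mm;
    char name[NAME_LEN];
};

/* For each address, store the index of the first address that shares its
 * xx and yy. An address leads its location when group[i] == i. */
void find_groups(const struct address *network, int count, int *group)
{
    assert(network != NULL && group != NULL);
    for (int i = 0; i < count; i++) {
        group[i] = i;
        for (int j = 0; j < i; j++) {
            if (network[j].xx == network[i].xx &&
                network[j].yy == network[i].yy) {
                group[i] = j; /* earliest address from the same location */
                break;
            }
        }
    }
}

/* Print one line per location instead of one line per address. */
void print_groups(const struct address *network, int count, FILE *out)
{
    int group[MAX_ADDRESSES];
    assert(count >= 0 && count <= MAX_ADDRESSES);
    find_groups(network, count, group);

    for (int i = 0; i < count; i++) {
        if (group[i] != i)
            continue; /* already listed under its leader */

        int matches = 0;
        for (int j = i + 1; j < count; j++) {
            if (group[j] == i) {
                if (matches == 0)
                    fprintf(out, "%s", network[i].name);
                fprintf(out, " and %s", network[j].name);
                matches++;
            }
        }
        if (matches == 0)
            fprintf(out, "No matches for %s.\n", network[i].name);
        else
            fprintf(out, " are from the same location.\n");
    }
}
```

With the three addresses listed above, calling `print_groups(network, 3, stdout)` would print a single line, `chuck and larry and biff are from the same location.`, because larry and biff are skipped once they have been listed under chuck.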

美人如玉 2024-08-20 17:45:05

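I would approach this a little differently. I'd first sort the array of addresses/names by their xx and yy values. Then you can walk through the array, and everyone from the same location will be right next to each other...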
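
A minimal sketch of that sort-then-scan idea follows. The `struct address` layout and the names `by_location`, `run_length` and `print_sorted_groups` are assumptions made for illustration; only `qsort()` is a real library call. Note that sorting reorders the caller's array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout; the question does not show the real struct. */
struct address {
    int xx, yy, zz, mm;
    char name[11];
};

/* qsort() comparator: order by xx first, then by yy. */
static int by_location(const void *a, const void *b)
{
    const struct address *l = a, *r = b;
    if (l->xx != r->xx)
        return (l->xx > r->xx) - (l->xx < r->xx);
    return (l->yy > r->yy) - (l->yy < r->yy);
}

/* Length of the run of addresses sharing a location that begins at index
 * start. The array must already be sorted with by_location. */
int run_length(const struct address *network, int count, int start)
{
    assert(start >= 0 && start < count);
    int end = start + 1;
    while (end < count && by_location(&network[start], &network[end]) == 0)
        end++;
    return end - start;
}

/* Sort in place, then print one line per run of equal locations. */
void print_sorted_groups(struct address *network, int count, FILE *out)
{
    qsort(network, (size_t)count, sizeof network[0], by_location);

    for (int i = 0; i < count; ) {
        int len = run_length(network, count, i);
        if (len == 1) {
            fprintf(out, "No matches for %s.\n", network[i].name);
        } else {
            fprintf(out, "%s", network[i].name);
            for (int j = i + 1; j < i + len; j++)
                fprintf(out, " and %s", network[j].name);
            fprintf(out, " are from the same location.\n");
        }
        i += len; /* jump to the first address of the next location */
    }
}
```

Because `qsort()` is not guaranteed to be stable, the order of names within one location may differ from the input order.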

旧时浪漫 2024-08-20 17:45:05

It also seems like there is a newline character that has been added to the end of each name.

Apparently, you use fgets() to read the data from the file. fgets() retains the final newline. You can remove it with, for example:

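if (fgets(buf, sizeof buf, file) != NULL) {
    size_t len = strlen(buf);
    /* only drop the last character if it really is the newline */
    if (len > 0 && buf[len - 1] == '\n') buf[len - 1] = '\0';
}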

Your other problem is a wrong index:

    for (j = 0; j < nameCount; j++)
        printf("and %s ", names[i]);
    /*                         ^^^ should be j */
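
The newline removal described above can also be wrapped in a small helper. This is a sketch only; `strip_newline` is a made-up name, while `strcspn()` is the standard `<string.h>` function that returns the length of the leading part of a string containing none of the given characters.

```c
#include <assert.h>
#include <string.h>

/* Cut the string at the first '\r' or '\n'. For a line read by fgets()
 * that is the trailing line ending; a string with no line ending is
 * left unchanged. */
void strip_newline(char *s)
{
    assert(s != NULL);
    s[strcspn(s, "\r\n")] = '\0';
}
```

It would typically be called right after a successful read, for example `if (fgets(buf, sizeof buf, file) != NULL) strip_newline(buf);`.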

以酷 2024-08-20 17:45:05

Avoid using strcpy and use strncpy instead, terminating the result yourself because strncpy does not add a '\0' when it truncates. This protects you from buffer overflows, which is what I think is happening here.

Array temp has size 11, so a 10-character string and its trailing '\0' fit (although the terminator belongs at temp[10]; temp[11] is one past the end). The elements of names[100][] are only 10 characters long, so when you write a 10-character string into one you will write a NUL character into the first character of the next array element. When you later try to read that element, it will appear empty (which would explain the blank names you are seeing).
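
As a rough illustration of a copy that cannot run past the destination, here is one possible helper. The name `copy_name` is hypothetical, and it swaps strncpy for `snprintf()`, which always writes a terminating '\0' when the capacity is non-zero.

```c
#include <assert.h>
#include <stdio.h>

/* Copy src into dst (capacity cap bytes), always NUL-terminating.
 * Returns 1 if the whole string fit, 0 if it had to be truncated. */
int copy_name(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    int n = snprintf(dst, cap, "%s", src);
    return n >= 0 && (size_t)n < cap;
}
```

With `char names[100][11]`, a call such as `copy_name(names[nameCount], sizeof names[nameCount], network[j].name)` would keep each name inside its own row, and the return value tells you whether anything was cut off.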

Regarding the extra newlines, I would re-examine the way you are reading in your data. If you are reading it from a text file, you are probably also reading the newline at the end of each line of the file. A way around this is to replace the newline with a NUL character (since that is where the string should end) with something like

char* pEndl = strchr(input_string,'\n');
if (pEndl != NULL)
  *pEndl = '\0';

对岸观火 2024-08-20 17:45:05

Here are a few successive versions I went through while modifying your code. I haven't run any of them, but I expect them to be mostly right (except perhaps the last one; I haven't touched the C qsort() function in a long time). The first two have complexity O(n^2), while the last is O(n*log(n)). This would matter on "large" networks.

Unless you have a particular need to make all those copies of the names, you really should avoid making them.

The last version of the code below also modifies the order of the array. (It sorts it).

for (int i = 0; i < count; i++) { 
    bool any_matches = false;

    for (int j = i + 1; j < count; j++) {
        if (network[i].xx == network[j].xx && network[i].yy == network[j].yy) {
            if (!any_matches) {
                 printf("%s ", network[i].name);               
                 any_matches = true;
            }

            printf("and %s ", network[j].name);
        }
    }

    if (any_matches == false)
        printf("No matches for %s.\n", network[i].name);
    else
        printf("are from the same location.\n\n");
}

for (int i = 0; i < count; i++) {
    /* Print the heading once per address, not once per comparison. */
    printf("%s matches: ", network[i].name);

    for (int j = i + 1; j < count; j++) {
        if (network[i].xx == network[j].xx && network[i].yy == network[j].yy)
            printf("%s, ", network[j].name);
    }
    printf("\n");
}

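/* qsort() (from <stdlib.h>) requires a comparator taking const void pointers. */
int compare_networks(const void *a, const void *b) {
    const struct Network *left = a;
    const struct Network *right = b;
    if (left->xx < right->xx)
        return -1;
    if (left->xx > right->xx)
        return 1;
    if (left->yy < right->yy)
        return -1;
    if (left->yy > right->yy)
        return 1;
    return 0;
}

// Sort the list; the element size is that of one entry, not of the whole array
qsort(network, count, sizeof network[0], compare_networks);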

printf("%s matches: ", network[0].name);
for (int i=1; i<count; ++i) {
    if (network[i-1].xx == network[i].xx && network[i-1].yy == network[i].yy)
        printf("%s, ", network[i].name);
    else
        printf("\n%s matches: ", network[i].name);
}
