C 代码 - 从 2D 字符数组中删除重复字的逻辑

发布于 2024-11-25 01:31:02 字数 898 浏览 0 评论 0原文

您好,有一个 C 代码,其中有一个 2D 字符数组 -

names[100][20] //Currently maximum 100 names, each of 19 characters supported

该数组由一些带有名称的逻辑填充。我在变量名称_found 中跟踪实际找到的名称总数(可能少于 100 个名称)。

现在我想删除可能存在的重复名称。我打算做的是类似的事情。

for(i=0;i<names_found;i++)
{
    for(j=i+1;j<names_found;j++)
    {
       //Then compare(strcmp) each string/name with every other.
       //e.g. if there were 4 names the comparisons done would be
       //{name[0],name[1]},{name[0],name[2]},{name[0],name[3]}
       //{name[1],name[2]} , {name[1],name[3]}
       //& {name[2],name[3]}
       //And then some more logic to remove duplicate based on result of strcmp    results. Don't know what this logic would look like to store the result in place, in same 2D character buffer?

     }

}
我正在做的这种重复词删除的逻辑在功能上正确吗?

我怎样才能优化它的速度。

任何更好/更快的解决方案。

Hi have a C code in which I have a 2D char array as -

names[100][20] //Currently maximum 100 names, each of 19 characters supported

This array gets filled by some logic with names. I keep track of total number of names found actually(it could be less than 100 names) in a variable names_found.

Now I want to remove duplicate names which might be present. What I plan to do is something like.

for(i=0;i<names_found;i++)
{
    for(j=i+1;j<names_found;j++)
    {
       //Then compare(strcmp) each string/name with every other.
       //e.g. if there were 4 names the comparisons done would be
       //{name[0],name[1]},{name[0],name[2]},{name[0],name[3]}
       //{name[1],name[2]} , {name[1],name[3]}
       //& {name[2],name[3]}
       //And then some more logic to remove duplicate based on result of strcmp    results. Don't know what this logic would look like to store the result in place, in same 2D character buffer?

     }

}
Is this logic of duplicate word removal, what I am doing correct, functionally?

How can I optimize it for speed.

Any better/faster solution.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

¢好甜 2024-12-02 01:31:02

这是一个简单的方法。它假设名称的顺序并不重要:

for (i = 0; i < names_found; i ++)
{
    j = i + 1;
    while (j < names_found)
    {
        if (strcmp(names[i], names[j]) == 0)
        {
            memmove(names + j, names + (names_found - 1), sizeof(names[0]));
            -- names_found;
        }
        else
            ++ j;
    }
}

This is a simple approach. It assumes that the order of the names isn't important:

for (i = 0; i < names_found; i ++)
{
    j = i + 1;
    while (j < names_found)
    {
        if (strcmp(names[i], names[j]) == 0)
        {
            memmove(names + j, names + (names_found - 1), sizeof(names[0]));
            -- names_found;
        }
        else
            ++ j;
    }
}
情话已封尘 2024-12-02 01:31:02

有很多方法可以更快地做到这一点,但不一定适用于这么小的集合。此外,删除名称的逻辑可能会比您想象的要花费更长的时间,因为它要么会导致您必须解决的数组中存在空白,要么您需要 memmove() 您的答案来填补空白。

即时的 Boyer-Moore 类型搜索可能会加快速度,但根据 strcmp 函数的速度,由于设置查找等方面的开销,您可能不会从中获得任何好处。如果设置正确,您可能可以使用 strstr() 来代替搜索,这可能确实使用更高级的搜索算法。

基本上,您的集合非常小,因此优化可能有点不成熟。

There are ways and ways to do this faster, but not necessarily for such a small set. Also, your logic for removing the names will probably take longer than you think because it will either cause gaps in the array you will have to work around, or you will need to memmove() your answers back down to fill the gap.

Off hand a Boyer-Moore type search might speed things up, but depending on how fast the strcmp function is, you might not get any benefit from this due to overhead in setting up the lookups and such. If you set things up right, you might be able to use strstr() instead for your search, which probably does use a more advanced search algo.

Basically, your set is so small that optimizations may be a bit premature here.

似狗非友 2024-12-02 01:31:02

逻辑上是可以的:对于每个数组元素,查找后面的元素是否存在其他相等的,如果有,则将其删除;但你需要动态改变数组大小;例如,如果删除第一个元素的 3 个重复项,则剩余元素的数量小于 names_found,因此您需要相应地更新它。

如果对数组进行排序(使用快速排序算法,但这可能取决于数据的大小),然后重复项全部“并排”,则可以使其速度更快。使用目标数组会更快,因为如果找到 N 个重复项,则不需要将所有其他数组元素向后移动 N 个位置(在最坏的情况下,您需要一个与源数组大小相同的数组)。

另一种方法是使用哈希容器;在这种情况下,您需要一个库(例如,glib 有一个哈希表“对象”),并且您的代码看起来会有所不同(例如,当您填充哈希表时,您可以“跳过”重复项)。

It is logically ok: for each array element, search if there are other equal in the following elements, and if it is so, remove them; but you need to change dynamically the array size; e.g. if you remove 3 duplicates of the first element, then the remaining number of elements is smaller than names_found, so that you need to update it accordingly.

You can make it faster if you sort the array (with a fast sorting algorithm, but it may depend on the size of the data) and then duplicates are all "side by side". Using a destination array would be faster, since if you find N duplicates, you do not need to move all the other array element back by N positions (in the worst case you need an array of the same size of the source array).

Another approach is to use a hash container; in this case you need a library (e.g. glib has an hashtable "object"), and your code would look different (e.g. you can "skip" duplicates when you populate the hashtable).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文