差异算法 C++

发布于 2024-11-10 14:50:55 字数 938 浏览 3 评论 0原文

我正在尝试用 C++ 创建一个能够比较两个 .txt 文件的程序。

struct line
{
    string text;
    size_t num;
    int status;
};

void compareFiles(vector<line> &buffer_1, vector<line> &buffer_2, size_t index_1, size_t index_2)
{
    while(index_1 < buffer_1.size())
    {
         while(index_2 < buffer_2.size())
         {  
             X = buffer_1[index_1].text;
             Y = buffer_2[index_2].text;
             if(X == Y)
             {
                 ++index_1;
                 ++index_2;
             }
             else
             {
                 LCS();
                 string lcs = printLCS(X.length(), Y.length());

                 /*
                 * Here's my problem
                 */

             }
         }
     }
 }

如您所见，我有两个缓冲区（行向量），之前加载了文件内容。我还有功能齐全的 LCS 算法（经过测试）。 LCS 适用于全局定义的字符串 X 和 Y。

所以，我真正需要做的是将缓冲区与 LCS 进行逐行比较，但我没有找到一种方法来做到这一点。

请你帮助我好吗？

原文

I'm trying to create a program in C++ that could be able to diff two .txt files.

struct line
{
    string text;
    size_t num;
    int status;
};

void compareFiles(vector<line> &buffer_1, vector<line> &buffer_2, size_t index_1, size_t index_2)
{
    while(index_1 < buffer_1.size())
    {
         while(index_2 < buffer_2.size())
         {  
             X = buffer_1[index_1].text;
             Y = buffer_2[index_2].text;
             if(X == Y)
             {
                 ++index_1;
                 ++index_2;
             }
             else
             {
                 LCS();
                 string lcs = printLCS(X.length(), Y.length());

                 /*
                 * Here's my problem
                 */

             }
         }
     }
 }

As you can see, I have two buffers (vectors of lines) previously loaded with files contents . I also have the LCS algorithm fully functional (tested). LCS works on strings X and Y globally defined.

So, what I really need to do is compare buffers line by line with LCS, but I didn't manage a way to do it.

Could you please help me?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初吻给了烟 2024-11-17 14:50:55

当有疑问时，我通常会听取以前做过这件事的人的意见。古老的 diff 程序一直存在，并且可以做您想做的事情。此外，它是开源的，因此请访问 ftp://mirrors。 kernel.org/gnu/diffutils/diffutils-3.0.tar.gz 并查看它。

解压存档后，打开 src/analyze.c。 diff_2_files 函数从第 472 行开始。执行实际比较的代码从第 512 - 537 行运行。它们复制如下：

for (;; cmp->file[0].buffered = cmp->file[1].buffered = 0)
{
    /* Read a buffer's worth from both files.  */
    for (f = 0; f < 2; f++)
        if (0 <= cmp->file[f].desc)
            file_block_read (&cmp->file[f],
                buffer_size - cmp->file[f].buffered);

    /* If the buffers differ, the files differ.  */
    if (cmp->file[0].buffered != cmp->file[1].buffered
            || memcmp (cmp->file[0].buffer,
                    cmp->file[1].buffer,
                    cmp->file[0].buffered))
    {
        changes = 1;
        break;
    }

    /* If we reach end of file, the files are the same.  */
    if (cmp->file[0].buffered != buffer_size)
    {
        changes = 0;
        break;
    }
}

这里的想法是加载两个相同大小的缓冲区，然后加载每个文件进入缓冲区。使用 memcmp 一次比较两个文件的一个缓冲区，并查看是否有任何缓冲区与另一个不同。如果任何缓冲区比较未返回相等，则这两个文件不同。同样重要的是要注意，您永远不必一次读取超过两个缓冲区的数据，因此这种方法也适用于大文件。

When in doubt, I usually defer to someone who's done it before. The venerable diff program has been around forever, and does what you want to do. Furthermore, it is open source, so head over to ftp://mirrors.kernel.org/gnu/diffutils/diffutils-3.0.tar.gz and check it out.

Once you unzip the archive, open up src/analyze.c. The diff_2_files function begins on line 472. The code that does the actual comparison runs from lines 512 - 537. They are reproduced below:

for (;; cmp->file[0].buffered = cmp->file[1].buffered = 0)
{
    /* Read a buffer's worth from both files.  */
    for (f = 0; f < 2; f++)
        if (0 <= cmp->file[f].desc)
            file_block_read (&cmp->file[f],
                buffer_size - cmp->file[f].buffered);

    /* If the buffers differ, the files differ.  */
    if (cmp->file[0].buffered != cmp->file[1].buffered
            || memcmp (cmp->file[0].buffer,
                    cmp->file[1].buffer,
                    cmp->file[0].buffered))
    {
        changes = 1;
        break;
    }

    /* If we reach end of file, the files are the same.  */
    if (cmp->file[0].buffered != buffer_size)
    {
        changes = 0;
        break;
    }
}

The idea there is to load two buffers of identical size, then load each file into a buffer. Compare the two files one buffer at a time using memcmp, and see if any buffer differs from the other. If any buffer comparison does not return equal, then the two files are different. It's also important to note that you never have to read more than two buffers worth of data at a time, so this approach works on huge files too.

回复收藏 0 原文

路弥 2024-11-17 14:50:55

首先，我会重写 LCS() 以两行作为参数并返回最长的公共序列 - 我想象一个像 std::string LCS(const line& lhs, const 行& rhs)。然后我将修改您的 while 循环，如下所示。

for(int i = 0; i < buffer_1.size(); ++i)
{
    for(int j = 0; j < buffer_2.size(); ++j)
    {  
        std::string lcs = LCS(buffer_1[i].text, buffer_2[j].text);
        std::cout << "LCS[" << i << "][" << j << "]: " << lcs << std::endl;
    }
}

这将为 buffer_1 和 buffer_2 中的每个行组合找到并打印最长的公共序列。这是你想做的事吗？我正确理解你的问题吗？

First of all, I would rewrite LCS() to take two lines as parameters and return the longest common sequence—I imagine a function signature like std::string LCS(const line& lhs, const line& rhs). Then I would modify your while loops as follows.

for(int i = 0; i < buffer_1.size(); ++i)
{
    for(int j = 0; j < buffer_2.size(); ++j)
    {  
        std::string lcs = LCS(buffer_1[i].text, buffer_2[j].text);
        std::cout << "LCS[" << i << "][" << j << "]: " << lcs << std::endl;
    }
}

This will find and print the longest common sequence for each combination of lines in buffer_1 and buffer_2. Is this want you want to do? Did I understand your question correctly?

回复收藏 0 原文

~没有更多了~