Downloading all files at every revision from an SVN repository using SVNKit - please help

Posted on 2024-12-05 01:34:24

Here is my problem:

I am working on a project as part of my diploma thesis. I am trying to connect to different open-source project repositories and extract information from their source files. We analyze the code of these projects and the changes made to it over time. In other words, we want to see how the software evolves and characterize the changes made to it. Therefore, we need to connect to a repository using SVNKit and download, for each source file, its contents at every revision in which it changed.

For example, let's say we have a project with this initial directory structure:

dirA/
-- file1.java
-- file2.java

The first commit makes changes to dirA/file1.java, and the second to dirA/file2.java and dirA/file1.java. We want to analyze the code of the two files (file1.java and file2.java) in their initial state, then the changes made to file1.java in the first and second commits and the changes made to file2.java in the second commit.

The third commit creates directories and files:

dirB/
-- file3.java
dirA/dirC
-- file4.java

In the same way as described above, we want to analyze the code of dirB/file3.java and dirA/dirC/file4.java, and we also want to analyze how the (main) directory structure has changed.

The fourth commit copies file3.java to the dirA/dirC/ directory and makes changes to this file. Again, we want to analyze how the copy operation changed the directory structure, and to analyze the contents of file3.java before and after the commit.

Because we are code-oriented, we want to get all of the source files from the repository, at all of their revisions. For each revision of a particular file, we want the contents at that revision and at the previous one, from the very first revision through the last. Because a file does not necessarily change in every commit (it might only be copied or deleted), there is no need to download a duplicate file with the same contents.
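To make the "no duplicates" requirement concrete, here is a minimal sketch of how identical contents could be detected by hashing. It uses only the Java standard library; `ContentStore` and its methods are hypothetical names made up for illustration, not part of SVNKit. Note that it only avoids storing and analyzing the same contents twice; by itself it does not avoid the download.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps one copy of each distinct file content. */
final class ContentStore {
    // hex digest -> content; a real tool would probably write to disk instead
    private final Map<String, byte[]> byDigest = new HashMap<>();

    /** Stores the content if it has not been seen before and returns its SHA-256 key. */
    String put(byte[] content) {
        String key = sha256Hex(content);
        byDigest.putIfAbsent(key, content.clone());
        return key;
    }

    /** Returns a copy of the stored content, or null if the key is unknown. */
    byte[] get(String key) {
        byte[] content = byDigest.get(key);
        return content == null ? null : content.clone();
    }

    /** Number of distinct contents stored so far. */
    int distinctCount() {
        return byDigest.size();
    }

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }
}
```

Each (path, revision) pair would then map to a key, so a file that is copied without changes (like file3.java before it is edited) points at contents that are already stored.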

I know that a file's earlier states can be recovered from its contents at the last revision by repeatedly applying diffs backward. For example, given the contents of dirA/file1.java at the last revision (the one created by the second commit) and the diff output for that commit, we can recover the file as it was before that revision (before the second commit). This way there is no need to download each file's full contents at every revision. The same idea works in the other direction: download the contents of a file at its very first revision, then the diff output (if any) for each later revision, and apply the diffs forward to get the state after each commit.

Explanation:

1 - At revision 1, file1.java has this content:

"Content at revision 1 (initial state)"

2 - At revision 2, the file is modified and has the following content:

"Content at revision 1 (initial state)
 Modification at revision 2 (line added)"

3 - At revision 3, the file is modified again and has the following content:

"Modification at revision 2 (line added)
 Modification at revision 3 (line added)
 First line from revision 1 was removed"
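The backward replay I have in mind for these three revisions could look roughly like the sketch below. It is plain Java with a made-up, line-based edit format (`LineEdit` and `DeltaReplay` are hypothetical names); it is not the delta format that Subversion or SVNKit actually sends, only an illustration of rebuilding older revisions from the newest one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical line-based edit: at 'index', drop 'deleteCount' lines, then insert 'insert'. */
final class LineEdit {
    final int index;
    final int deleteCount;
    final List<String> insert;

    LineEdit(int index, int deleteCount, List<String> insert) {
        this.index = index;
        this.deleteCount = deleteCount;
        this.insert = insert;
    }
}

final class DeltaReplay {
    /**
     * Applies non-overlapping edits whose indexes refer to the input lines
     * and returns the resulting lines as a new list.
     */
    static List<String> apply(List<String> lines, List<LineEdit> edits) {
        List<String> result = new ArrayList<>(lines);
        List<LineEdit> ordered = new ArrayList<>(edits);
        // Work from the bottom of the file upward so earlier indexes stay valid.
        ordered.sort(Comparator.comparingInt((LineEdit e) -> e.index).reversed());
        for (LineEdit e : ordered) {
            result.subList(e.index, e.index + e.deleteCount).clear();
            result.addAll(e.index, e.insert);
        }
        return result;
    }
}
```

For the example above, the backward edit from revision 3 to revision 2 would delete the last two lines and re-insert "Content at revision 1 (initial state)" at the top; the backward edit from revision 2 to revision 1 would delete the second line. Only the revision 3 contents and these two small edits would have to be transferred.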

If we get the log for file1.java, we will have three entries, one for each modification (corresponding to revisions 1, 2, and 3). We want to retrieve the file contents for all three revisions, because we analyze the code modification each time a commit changes a source file.
We know how to do it the simple way: SVNRepository.getFile(...). The problem with this approach is that a file that has been modified 1000 times has to be downloaded 1000 times (each time at a different revision number). That is, for a small project with 100 source files and approximately 1000 modifications per file, we would download 100,000 different contents! Another approach is to get the contents of the file at the very last revision and, for each previous revision, get only the diff output. Then we can apply the diff output backward to recover the contents of the file at all previous revisions. That is, we minimize bandwidth.
This is the solution I am looking for, but if there is a better one, you are welcome to suggest it.
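As a rough back-of-the-envelope check of the bandwidth argument, the estimate can be written down as below. The average file size and average diff size are made-up inputs, not measurements from any repository, and `TransferEstimate` is a hypothetical name.

```java
/** Back-of-the-envelope transfer estimate; all sizes are made-up inputs. */
final class TransferEstimate {
    /** Number of full-content downloads when every revision of every file is fetched. */
    static long fullDownloads(long files, long revisionsPerFile) {
        return Math.multiplyExact(files, revisionsPerFile);
    }

    /** Bytes transferred when every revision of every file is fetched in full. */
    static long fullBytes(long files, long revisionsPerFile, long avgFileBytes) {
        return Math.multiplyExact(fullDownloads(files, revisionsPerFile), avgFileBytes);
    }

    /** Bytes transferred with one full copy per file plus one diff per remaining revision. */
    static long diffBytes(long files, long revisionsPerFile, long avgFileBytes, long avgDiffBytes) {
        long perFile = avgFileBytes + Math.multiplyExact(revisionsPerFile - 1, avgDiffBytes);
        return Math.multiplyExact(files, perFile);
    }
}
```

With 100 files and 1000 revisions per file, `fullDownloads` gives the 100,000 downloads mentioned above. Assuming, purely for illustration, 10,000 bytes per file and 500 bytes per diff, `fullBytes` gives 1,000,000,000 bytes while `diffBytes` gives 50,950,000 bytes.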

Can you please give me some help on how to implement such functionality with SVNKit? It would be very useful if you could provide a short code example and/or tell me which classes and methods I have to use, so I can read the Javadoc. Any help will be appreciated.

Thank you in advance,
Elvis.
