Finding svn content by hash

Posted 2024-12-06 13:19:45

Content in the svn repository is uniquely identified using two pieces of information:

  • repository path
  • revision number

I am looking for a way to recover that information from a fixed-length message (say, 8 or 16 bytes). Storing only the revision number in the message is not enough to identify content in the repository, and the path is variable-length, so it cannot fit in the message.

However, I was wondering whether svn path+revision pairs can be accessed by hash, the way Git does it. Is there a mechanism for this already built into svn?

It would suffice if the path alone were accessible by hash; I could then store the revision number independently in the fixed-length message.

Would I have to keep an external database of used paths and their hashes, or does SVN provide a fast way to list all paths extant across all revisions that I can query on-demand?
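
To make the external-database option concrete, here is a minimal sketch, not an svn feature. It assumes a 16-byte message split into a 12-byte truncated SHA-256 of the path plus a 4-byte big-endian revision number. The names `PathIndex`, `path_key`, and `PATH_HASH_LEN` are hypothetical, and the index would still have to be populated from the repository's history, for example from the changed paths printed by `svn log -v`.

```python
import hashlib
import struct

# Assumed layout (hypothetical): 12-byte path hash + 4-byte revision = 16 bytes.
PATH_HASH_LEN = 12


def path_key(path: str) -> bytes:
    """Return a fixed-length key for a repository path (truncated SHA-256)."""
    return hashlib.sha256(path.encode("utf-8")).digest()[:PATH_HASH_LEN]


class PathIndex:
    """In-memory stand-in for the external path database."""

    def __init__(self):
        self._by_key = {}

    def add(self, path: str) -> bytes:
        """Register a path and return its key, refusing silent collisions."""
        key = path_key(path)
        existing = self._by_key.setdefault(key, path)
        if existing != path:
            raise ValueError(f"hash collision between {existing!r} and {path!r}")
        return key

    def encode(self, path: str, revision: int) -> bytes:
        """Pack a (path, revision) pair into a 16-byte message."""
        return self.add(path) + struct.pack(">I", revision)

    def decode(self, message: bytes):
        """Recover (path, revision) from a message produced by encode()."""
        if len(message) != PATH_HASH_LEN + 4:
            raise ValueError("unexpected message length")
        key, rev_bytes = message[:PATH_HASH_LEN], message[PATH_HASH_LEN:]
        (revision,) = struct.unpack(">I", rev_bytes)
        return self._by_key[key], revision  # KeyError if the path was never indexed
```

The hash only works as a lookup key: the path itself cannot be recovered from it, which is why the index has to be kept somewhere. An 8-byte message would leave only 4 bytes for the path hash under this layout, making collisions much more likely.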


Edit: This is practically the same question, but is inconclusive: SVN: translation between path and node ids?


Comments (1)

虫児飞 2024-12-13 13:19:45

SVN doesn't store files, it stores file systems. As such, the revision is used to access the correct revision of the file system, and then a portion of the path is used to access the file in question.

Internally, SVN versions inodes, each with its own node id. However, such "direct to the inode" access is typically not supported, because an inode lacks certain information that is generally necessary (like the file's name, owner, group, permissions, etc.).

Git, on the other hand, stores files, so it makes sense to use a better file id than the file name (which might stay the same across multiple revisions of the file). Git therefore uses a hash of the file's contents. Because Git is file oriented, it's not uncommon to pull a file using its id (the hash).
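
As an illustration of the content-derived id described above, here is a small sketch of how a blob id is computed in a SHA-1 Git repository: the hash covers a short header (`blob`, the byte length, a NUL byte) followed by the file's contents. The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to a blob in a SHA-1 repository."""
    # Git hashes "blob <size>\0" followed by the raw file contents.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The id depends only on the contents, not on any file name or path.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the path is not part of the input, two files with identical contents share one id, which is the opposite of the path+revision addressing the question starts from.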

Unfortunately, there's no equivalent way to pull a file system by hash, because the hash's input would have to be the contents of the inode, computed separately for each version of that inode. That would require a way of hashing a tree's contents, which would be possible. Such a system would provide fast access to a particular historical version of an inode.
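
A rough sketch of what "hashing a tree's contents" could look like, assuming a toy file system represented as nested dicts (directories) and bytes (files). This is neither SVN's nor Git's actual format; `node_hash` and the `file`/`dir` prefixes are invented for illustration.

```python
import hashlib


def node_hash(node) -> str:
    """Hash a file (bytes) or a directory (dict of name -> node) recursively."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file\0" + node).hexdigest()
    h = hashlib.sha256(b"dir\0")
    # Sort entries so the hash does not depend on insertion order.
    for name in sorted(node):
        child = node_hash(node[name])
        h.update(name.encode("utf-8") + b"\0" + child.encode("ascii") + b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    rev1 = {"trunk": {"a.txt": b"one", "lib": {"b.txt": b"two"}}}
    rev2 = {"trunk": {"a.txt": b"ONE", "lib": {"b.txt": b"two"}}}
    # A change to one file changes every hash on the way up to the root...
    print(node_hash(rev1) != node_hash(rev2))
    # ...while an untouched subtree keeps the same hash across revisions.
    print(node_hash(rev1["trunk"]["lib"]) == node_hash(rev2["trunk"]["lib"]))
```

In such a scheme each version of a directory or file gets its own id, so a fixed-length hash would identify one historical version of a node without needing a path or a revision number.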

Probably the main reason it wasn't done this way is that fast client access to inodes isn't much of a concern in SVN. The SVN server already has the pointers and data structures to access the inodes on the server side, and it has knowledge of the remote repository's filesystem as transmitted by the client. This allows SVN to transmit only the differences between the file systems to the client (not a full copy of the file system). Without a need to consistently pull full file systems, a fast path for pulling a full file system isn't a priority.
