URL shortening: using inodes as short names?

Posted on 2024-08-03 05:26:02

The site I am working on wants to generate its own shortened URLs rather than rely on a third party like tinyurl or bit.ly.

Obviously I could keep a running count of new URLs as they are added to the site and use that to generate the short URLs. But I am trying to avoid that if possible, since it seems like a lot of work just to make this one thing work.
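For comparison, a running count need not be much code. This is a minimal sketch, not a drop-in solution: it assumes a single JSON file as the store, a hypothetical helper named `assign_short_name`, and that entries are never deleted (the next code is simply the number of entries so far, written in base 36). Because the mapping is stored explicitly, the codes stay valid if the files move to another disk or server, as long as the map file goes with them.

```php
<?php
// Hypothetical sketch of a counter-based shortener backed by a JSON file.
// The file format and the function name are assumptions for illustration.
function assign_short_name(string $mapFile, string $path): string
{
    $fh = fopen($mapFile, 'c+');           // create if missing, never truncate on open
    if ($fh === false) {
        throw new RuntimeException("cannot open $mapFile");
    }
    flock($fh, LOCK_EX);                   // serialize concurrent writers
    $json = stream_get_contents($fh);
    $map = $json === '' ? [] : json_decode($json, true);

    // Reuse the existing code if this path was already shortened.
    $code = array_search($path, $map, true);
    if ($code === false) {
        // Next code = number of entries so far, in base 36 (assumes no deletions).
        $code = base_convert((string) count($map), 10, 36);
        $map[$code] = $path;
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, json_encode($map));
        fflush($fh);
    }
    flock($fh, LOCK_UN);
    fclose($fh);
    return (string) $code;
}
```

Calling it twice with the same path returns the same code; a new path gets the next one.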

Since the things that need short URLs are all real physical files on the web server, my current solution is to use their inode numbers, which are already generated for me, ready to use, and guaranteed to be unique.

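function short_name($file) {
   // fileinode() returns the file's inode number, or false on failure
   $ino = @fileinode($file);
   if ($ino === false) {
      return false;
   }
   // base 36 uses the digits 0-9 and the letters a-z
   return base_convert($ino, 10, 36);
}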

This seems to work. The question is: what can I do to make the short URL even shorter?

On the system where this is being used, the inodes for newly added files are in a range that makes the function above return a string 7 characters long.
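To see where the 7 characters come from: a number needs exactly 7 base-36 digits when it lies between 36^6 = 2,176,782,336 and 36^7 - 1, so the inodes in question are at least 36^6. A quick check (`base36_length` is a hypothetical helper):

```php
<?php
// Length of the base-36 string that base_convert() produces for $n.
function base36_length(int $n): int
{
    return strlen(base_convert((string) $n, 10, 36));
}

$sevenDigitStart = 36 ** 6;                     // 2176782336
echo base36_length($sevenDigitStart - 1), "\n"; // 6  ("zzzzzz")
echo base36_length($sevenDigitStart), "\n";     // 7  ("1000000")
echo base36_length(36 ** 7 - 1), "\n";          // 7  ("zzzzzzz")
```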

Can I safely throw away some (half?) of the bits of the inode? And if so, should it be the high bits or the low bits?
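Whichever half is dropped, each short value then stands for many possible inodes, so neither choice is collision-free in general. Dropping the high bits collides inodes that share their low bits; dropping the low bits collides inodes that are close together. A small illustration with made-up inode values (the helper names are hypothetical):

```php
<?php
// Keep only the low $bits bits ("throw away the high bits").
function keep_low_bits(int $ino, int $bits): int
{
    return $ino & ((1 << $bits) - 1);
}

// Keep only the high bits by shifting the low $bits bits away.
function keep_high_bits(int $ino, int $bits): int
{
    return $ino >> $bits;
}

// Distinct inodes that differ only in their high bits...
var_dump(keep_low_bits(0x12340005, 16) === keep_low_bits(0x56780005, 16));   // bool(true)

// ...and distinct inodes that differ only in their low bits.
var_dump(keep_high_bits(0x12340005, 16) === keep_high_bits(0x12340009, 16)); // bool(true)
```

Truncation is only safe if you can show that the inodes actually in use never agree on the bits you keep.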

I thought of using the crc32 of the filename, but that actually makes my short names longer than using the inode.
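A CRC32 is a 32-bit value, so its largest possible value is 2^32 - 1 = 4,294,967,295. That is 8 digits in hex and 7 in base 36, i.e. no shorter than the current inode-based names, and unlike inode numbers two different filenames can share a CRC32. The path used below is a made-up example:

```php
<?php
$maxCrc = 0xFFFFFFFF;                              // 4294967295
echo base_convert((string) $maxCrc, 10, 36), "\n"; // "1z141z3" (7 chars)
echo strlen(dechex($maxCrc)), "\n";                // 8 hex chars

// The checksum of a (hypothetical) filename in both encodings.
// crc32() is non-negative on 64-bit PHP but may be negative on 32-bit builds.
$crc = crc32('/docs/example.html');
echo dechex($crc), ' ', base_convert((string) $crc, 10, 36), "\n";
```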

Would something like this have any risk of collisions? I've been able to get down to single digits by picking the right value of "$referencefile".

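function short_name($file, $referencefile) {
   $ino = @fileinode($file);
   // arbitrarily selected pre-existing file; assumes
   // all newer files will have higher inodes
   $ref = @fileinode($referencefile);
   if ($ino === false || $ref === false || $ino < $ref) {
      // missing file, or an inode below the reference: base_convert()
      // would ignore the minus sign and collide with another file
      return false;
   }
   return base_convert($ino - $ref, 10, 36);
}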
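On the collision question: subtracting a fixed reference value is one-to-one, so by itself it cannot give two files with different inodes the same name. The risks lie elsewhere: inodes below the reference produce negative numbers, which `base_convert()` does not handle (it ignores the `-` sign), and inode numbers of deleted files can be reused by new files. A sketch on plain integers, so it can be tried without files (`offset_short_name` and the reference value are hypothetical):

```php
<?php
// Offset-then-encode on plain integers; $refIno stands in for the
// inode of the chosen reference file.
function offset_short_name(int $ino, int $refIno): ?string
{
    $delta = $ino - $refIno;
    if ($delta < 0) {
        // base_convert() would drop the minus sign and collide with
        // the name for +|delta|, so reject such inodes instead.
        return null;
    }
    return base_convert((string) $delta, 10, 36);
}

$ref = 2176782336;                              // illustrative reference inode
echo offset_short_name($ref + 35, $ref), "\n";  // "z"
echo offset_short_name($ref + 36, $ref), "\n";  // "10"
var_dump(offset_short_name($ref - 1, $ref));    // NULL
```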

Comments (3)

听你说爱我 2024-08-10 05:26:02

Not sure this is a good idea: if you have to change server, or change disk / reformat it, the inode numbers of your files will most probably change... and all your short URLs will be broken / lost!

Same thing if, for any reason, you need to move your files to another partition of your disk, btw.

Another idea might be to calculate some crc/md5/whatever of the file's name, like you suggested, and use some algorithm to "shorten" it.
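The answer does not name a specific "shortening" algorithm; one common option, assumed here, is to truncate the hash and re-encode it. Truncating a hash makes collisions possible, so the sketch checks each candidate against the codes already issued and retries with a salt. Like the inode approach, the result depends only on the filename, so it survives moving disks but changes if a file is renamed. `hashed_short_name` and the `$issued` map are hypothetical:

```php
<?php
// Take the first 8 hex digits (32 bits) of md5(name), re-encode in base 36,
// and retry with a salt until the code is unused or already maps to $name.
function hashed_short_name(string $name, array $issued): string
{
    for ($salt = 0; ; $salt++) {
        $input = $salt === 0 ? $name : $name . '#' . $salt;
        $code = base_convert(substr(md5($input), 0, 8), 16, 36); // at most 7 chars
        if (!isset($issued[$code]) || $issued[$code] === $name) {
            return $code;
        }
    }
}

$issued = [];
$code = hashed_short_name('/docs/example.html', $issued);
$issued[$code] = '/docs/example.html';   // record it so later calls see it
```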

Here are a couple of articles about that:

听不够的曲调 2024-08-10 05:26:02

Rather clever use of the filesystem there. If you are guaranteed that inode IDs are unique, it's a quick way of generating unique numbers. I wonder if this could work consistently over NFS, because obviously different machines will have different inode numbers. You'd then just serialize the link info in the file you create there.

To shorten the URLs a bit, you might take case sensitivity into account and use one of the safe encodings (you'll get about base62 out of it: 10 [0-9] + 26 (a-z) + 26 (A-Z), or less if you remove some of the 'conflict' characters like I vs l vs 1... there are plenty of examples/libraries out there).
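A minimal base-62 encoder along those lines. The alphabet order (digits, then lowercase, then uppercase) is an assumption; any fixed order works, and dropping look-alike characters just means passing a shorter alphabet. The function names are hypothetical:

```php
<?php
// Digits, then lowercase, then uppercase: index 61 is 'Z'.
const ALPHABET62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function encode_base(int $n, string $alphabet = ALPHABET62): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('negative values are not supported');
    }
    $base = strlen($alphabet);
    if ($n === 0) {
        return $alphabet[0];
    }
    $out = '';
    while ($n > 0) {
        $out = $alphabet[$n % $base] . $out;  // prepend the least significant digit
        $n = intdiv($n, $base);
    }
    return $out;
}

function decode_base(string $s, string $alphabet = ALPHABET62): int
{
    $base = strlen($alphabet);
    $n = 0;
    foreach (str_split($s) as $ch) {
        $digit = strpos($alphabet, $ch);
        if ($digit === false) {
            throw new InvalidArgumentException("invalid character '$ch'");
        }
        $n = $n * $base + $digit;
    }
    return $n;
}
```

Since 62^6 = 56,800,235,584, any inode from 36^6 up to 62^6 - 1 takes 7 characters in base 36 but only 6 in base 62.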

You'll also want to 'home' your IDs with an offset, like you said. You will also need to figure out how to keep the creation of temp files, log files, etc. from eating up your keyspace.

祁梦 2024-08-10 05:26:02

Check out Lessn by Shaun Inman; I haven't played with it yet, but it's a self-hosted, roll-your-own URL shortener.
