How should I sanitize URLs so people don't put 漢字, á, or other such characters in them?

Posted on 2024-08-18 19:02:59

How should I sanitize URLs so that people can't put 漢字 or other such characters in them?

EDIT: I'm using Java. The URL will be generated from a question the user asks on a form. It seems StackOverflow just removes the offending characters, but it also turns an á into an a.

Is there a standard convention for doing this? Or does each developer just write their own version?

Comments (3)

风追烟花雨 2024-08-25 19:03:00

The process you're describing is usually called slugifying (generating a "slug"). There's no fixed mechanism for doing it; every framework handles it in its own way.
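Since the question mentions Java and the "á becomes a" behaviour, here is a minimal sketch of one way to slugify a title. It is an assumption about what a slug function might look like, not StackOverflow's actual code: the class name `SlugSketch` and method `slugify` are made up for illustration, and the rules (lowercase, hyphens, ASCII letters and digits only) are one common choice among many.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class SlugSketch {

    static String slugify(String title) {
        // Decompose accented letters: "á" becomes "a" plus a combining accent.
        String decomposed = Normalizer.normalize(title, Normalizer.Form.NFD);
        // Drop the combining marks so only the base letters remain.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Lowercase, then turn every run of characters outside [a-z0-9] into one hyphen.
        // This is the step that removes characters such as 漢字 entirely.
        String hyphenated = withoutMarks.toLowerCase(Locale.ROOT)
                                        .replaceAll("[^a-z0-9]+", "-");
        // Trim hyphens left over at either end.
        return hyphenated.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        System.out.println(slugify("Café á la carte"));
    }
}
```

With these rules, "Café á la carte" becomes "cafe-a-la-carte". A title made up only of characters like 漢字 becomes an empty string, so the caller needs a fallback for that case.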

夜灵血窟げ 2024-08-25 19:03:00

Yes, I would sanitize/remove them. Otherwise the URL will either be inconsistent or look ugly once encoded.

If you're using Java, see the URLEncoder API docs.
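For reference, a small sketch of what `java.net.URLEncoder` does with such input. The wrapper class `EncodeSketch` and its `encode` method are hypothetical names; only `URLEncoder.encode(String, String)` comes from the JDK. Note that `URLEncoder` produces `application/x-www-form-urlencoded` output, so a space becomes `+` rather than `%20`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical wrapper, for illustration only.
final class EncodeSketch {

    static String encode(String text) {
        try {
            // Percent-encodes the UTF-8 bytes of every character outside the safe set.
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("漢字"));
        System.out.println(encode("á b"));
    }
}
```

Here "漢字" encodes to "%E6%BC%A2%E5%AD%97" and "á b" to "%C3%A1+b", which illustrates why the encoded form is hard to read in a link.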

Be careful! If you remove elements such as odd characters, two distinct inputs could unintentionally yield the same stripped URL.

The specification for URLs (RFC 1738, Dec. '94) poses a problem, in that it limits the characters allowed in URLs to a small subset of the US-ASCII character set.

This means it will get encoded. URLs should be readable. Standards tend to be English biased (what's that? Langist? Languagist?).

Not sure what the convention is in other countries, but if I saw tons of encoding in a URL sent to me, I would think it was stupid or suspicious ...

Unless the link is displayed properly, encoded by the browser and decoded at the other end ... but do you want to take that risk?

StackOverflow seems to just remove those chars from the URL altogether :)

StackOverflow can afford to remove the characters because it includes the question ID in the URL. The slug containing the question title is for convenience, and isn't actually used by the site, AFAIK. For example, you can remove the slug and the link will still work fine: the question ID is what matters and is a simple mechanism for making links unique, even if two different question titles generate the same slug. Actually, you can verify this by trying to go to stackoverflow.com/questions/2106942/ and it will just take you back to this page.

Thanks Mike Spross
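The ID-plus-slug idea from the quote can be sketched as follows. This is an illustration under assumptions, not how StackOverflow is implemented: the `/questions/<id>/<slug>` layout is inferred from the link quoted above, and `QuestionLinks`, `buildPath`, and `parseId` are hypothetical names. The point is that lookups use only the numeric ID, so a lossy or colliding slug does no harm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class QuestionLinks {

    // "/questions/<digits>", optionally followed by "/<anything>" (the slug).
    private static final Pattern PATH =
            Pattern.compile("^/questions/(\\d+)(?:/.*)?$");

    // The slug is appended purely for readability; it may be empty.
    static String buildPath(long id, String slug) {
        return slug.isEmpty()
                ? "/questions/" + id
                : "/questions/" + id + "/" + slug;
    }

    // Only the ID is read back; whatever follows it is ignored.
    static long parseId(String path) {
        Matcher m = PATH.matcher(path);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a question path: " + path);
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(parseId(buildPath(2106942L, "any-slug-at-all")));
    }
}
```

Two titles that collapse to the same slug still get distinct paths because their IDs differ, and a path with the slug removed resolves to the same ID.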

一场信仰旅途 2024-08-25 19:03:00

Which language are you talking about?
In PHP, I think this is the easiest option and would take care of everything:

https://www.php.net/manual/en/function.urlencode.php
