What is a good complete regular expression or some other process that would take the title:
How do you change a title to be part of the URL like Stack Overflow?
and turn it into
how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow
that is used in the SEO-friendly URLs on Stack Overflow?
The development environment I am using is Ruby on Rails, but if there are some other platform-specific solutions (.NET, PHP, Django), I would love to see those too.
I am sure I (or another reader) will come across the same problem on a different platform down the line.
I am using custom routes, and I mainly want to know how to alter the string so that all special characters are removed, it's all lowercase, and all whitespace is replaced.
Comments (21)
There is a small Ruby on Rails plugin called PermalinkFu that does this. Its escape method transforms a string into one that is suitable for a URL. Have a look at the code; that method is quite simple.
To remove non-ASCII characters, it uses the iconv library to translate from 'utf-8' to 'ascii//ignore//translit'. Spaces are then turned into dashes, everything is downcased, etc.
Brian's code, in Ruby:
downcase turns the string to lowercase, strip removes leading and trailing whitespace, the first gsub call globally substitutes spaces with dashes, and the second removes everything that isn't a letter or a dash.
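A minimal Ruby sketch of the chain described above. The method name slugify is hypothetical, and keeping digits as well as letters is an assumption; this is not necessarily Brian's original code.

```ruby
# Hypothetical helper; the name and the exact character class are assumptions.
def slugify(title)
  title.downcase                 # lowercase everything
       .strip                    # drop leading and trailing whitespace
       .gsub(/\s+/, "-")         # replace each run of whitespace with one dash
       .gsub(/[^a-z0-9-]/, "")   # remove anything that is not a letter, digit, or dash
end

puts slugify("How do you change a title to be part of the URL like Stack Overflow?")
# prints "how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow"
```

Note that this drops accented and non-Latin characters entirely, which is the limitation the later answers discuss.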
No, no, no. You are all so very wrong. Except for the diacritics-fu stuff, you're getting there, but what about Asian characters (shame on Ruby developers for not considering their nihonjin brethren).
Firefox and Safari both display non-ASCII characters in the URL, and frankly they look great. It is nice to support links like 'http://somewhere.com/news/read/お前たちはアホじゃないかい'.
So here's some PHP code that'll do it, but I just wrote it and haven't stress tested it.
Example:
Outputs:
コリン-and-トーマス-and-アーノルド
The '-and-' is because &'s get changed to '-and-'.
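The same idea can be sketched in Ruby rather than PHP. This is an illustration, not the answer's code: the method name unicode_slug is hypothetical, and the input string is an assumed one chosen to reproduce the output quoted above.

```ruby
# Hypothetical helper: keeps letters, marks, and digits from any script.
def unicode_slug(title)
  title.gsub("&", " and ")                    # spell out ampersands
       .downcase                              # Unicode-aware in Ruby 2.4+
       .gsub(/[^\p{L}\p{M}\p{N}]+/, "-")      # every other run of characters becomes one dash
       .gsub(/\A-+|-+\z/, "")                 # trim dashes left at either end
end

puts unicode_slug("コリン & トーマス & アーノルド")
# prints "コリン-and-トーマス-and-アーノルド"
```

The resulting slug still has to be percent-encoded when it is sent over the wire; browsers that display Unicode URLs do that step for the reader.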
I ported the code to TypeScript. It can easily be adapted to JavaScript.
I am adding a .contains method to the String prototype; if you're targeting the latest browsers or ES6, you can use .includes instead.
Rewrite of Jeff's code to be more concise
I liked the way this is done without using regular expressions, so I ported it to PHP. I just added a function called is_between to check characters:
The Stack Overflow solution is great, but modern browsers (excluding IE, as usual) now handle UTF-8 encoding nicely:
So I upgraded the proposed solution:
Full Code on Pastebin
Edit: Here's the code for the RemapInternationalCharToAscii method (which is missing from the Pastebin).
Here's my (slower, but fun to write) version of Jeff's code:
My test string:
" I love C#, F#, C++, and... Crème brûlée!!! They see me codin'... they hatin'... tryin' to catch me codin' dirty... "
You can use the following helper method. It can convert Unicode characters.
All browsers now handle UTF-8 encoding nicely, so you can use the WebUtility.UrlEncode method. It is like HttpUtility.UrlEncode, used by @giamin, but it works outside of a web application.
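For readers on Ruby, a rough counterpart of this approach is percent-encoding with the standard library. This sketch uses URI.encode_www_form_component as a stand-in for the .NET method; that pairing is an assumption, and the two do not necessarily encode every character identically.

```ruby
require "uri"

title = "Crème brûlée & C#"

# Percent-encode the title; spaces become "+" and non-ASCII bytes become %XX.
encoded = URI.encode_www_form_component(title)
puts encoded
# prints "Cr%C3%A8me+br%C3%BBl%C3%A9e+%26+C%23"

# The encoding is reversible, unlike a slug.
puts URI.decode_www_form_component(encoded)
# prints "Crème brûlée & C#"
```

The result is a safe URL component, but it keeps punctuation and case, so it is not a readable slug on its own.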
Here is my version of Jeff's code. I've made the following changes:
The case conversion is now also optional.
For more details, the unit tests, and an explanation of why Facebook's URL scheme is a little smarter than Stack Overflow's, I've got an expanded version of this on my blog.
T-SQL implementation, adapted from dbo.UrlEncode:
I don't know much about Ruby or Rails, but in Perl, this is what I would do:
I just did a quick test and it seems to work. Hopefully this is relatively easy to translate to Ruby.
I am not familiar with Ruby on Rails, but the following is (untested) PHP code. You can probably translate this very quickly to Ruby on Rails if you find it useful.
I hope this helps.
Here's how we do it. Note that there are probably more edge conditions than you realize at first glance.
This is the second version, unrolled for 5x more performance (and yes, I benchmarked it). I figured I'd optimize it because this function can be called hundreds of times per page.
To see the previous version of the code, which this one replaced (it is functionally equivalent, but this version is 5x faster), view the revision history of this post (click the date link).
Also, the RemapInternationalCharToAscii method source code can be found here.
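To give a feel for the edge conditions mentioned above, here is a hedged Ruby sketch of a single-pass, regex-free slug builder. It is not the code from this answer: the method name url_friendly, the 80-character cap, and the separator set are assumptions, and non-ASCII characters are simply dropped instead of being remapped.

```ruby
# Hypothetical single-pass slug builder; all names and limits are assumptions.
def url_friendly(title, max_length = 80)
  return "" if title.nil?

  out = +""
  prev_dash = false
  title.each_char do |c|
    if c.between?("a", "z") || c.between?("0", "9")
      out << c
      prev_dash = false
    elsif c.between?("A", "Z")
      out << c.downcase
      prev_dash = false
    elsif " ,./\\-_=".include?(c)
      # Collapse a run of separators into one dash, and never start with one.
      unless prev_dash || out.empty?
        out << "-"
        prev_dash = true
      end
    end
    # Anything else (other punctuation, non-ASCII) is dropped in this sketch.
    break if out.length >= max_length
  end
  # Avoid ending on a dash, including after truncation.
  prev_dash ? out.chomp("-") : out
end

puts url_friendly("  --Hello__World--  ")
# prints "hello-world"
```

The edge cases handled here are leading separators, repeated separators, a trailing dash, a nil title, and overlong titles.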
I know it's a very old question, but since most browsers now support Unicode URLs, I found a great solution in XRegExp that converts everything except letters (in all languages) to '-'.
That can be done in several programming languages.
The pattern is \\p{^L}+ and then you just need to use it to replace all non-letters with '-'.
Working example in Node.js with the xregexp module.
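Ruby's built-in regular expressions support Unicode properties, so the same pattern can be tried without an extra library. A small sketch (the method name is hypothetical):

```ruby
# Replace every run of non-letter characters with a single dash.
# \P{L} is the negated form of \p{L}, playing the role of \p{^L} above.
def letters_only_slug(title)
  title.gsub(/\P{L}+/, "-")
end

puts letters_only_slug("Привет, мир 2024")
# prints "Привет-мир-"
```

As the output shows, digits count as non-letters under this pattern and a trailing dash can remain, so in practice you may want to allow \p{N} as well and trim the ends.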
You will want to set up a custom route to point the URL to the controller that will handle it. Since you are using Ruby on Rails, here is an introduction to using its routing engine.
In Ruby, you will need a regular expression, as you already know. Here is the regular expression to use:
You can also use this JavaScript function for in-form generation of the slug (this one is based on/copied from Django):
For good measure, here's the PHP function in WordPress that does it... I'd think that WordPress is one of the more popular platforms that uses fancy links.
This function as well as some of the supporting functions can be found in wp-includes/formatting.php.
If you are using Rails edge, you can rely on Inflector.parameterize. Here's the example from the documentation:
Also, if you need to handle more exotic characters such as accents (éphémère) in previous versions of Rails, you can use a mixture of PermalinkFu and DiacriticsFu:
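If neither plugin is available, accent stripping can be approximated in plain Ruby with Unicode normalization. This is a sketch under that assumption, not the PermalinkFu or DiacriticsFu implementation, and both method names are hypothetical.

```ruby
# Hypothetical helper: decompose accented letters, then drop the accents.
def strip_accents(text)
  text.unicode_normalize(:nfd)   # split "é" into "e" plus a combining accent
      .gsub(/\p{Mn}/, "")        # remove the combining marks
end

# Hypothetical helper: ASCII-only slug built on top of strip_accents.
def ascii_slug(text)
  strip_accents(text).downcase
                     .gsub(/[^a-z0-9]+/, "-")   # other runs become one dash
                     .gsub(/\A-|-\z/, "")       # trim a dash at either end
end

puts ascii_slug("Éphémère Crème brûlée")
# prints "ephemere-creme-brulee"
```

Letters that have no decomposition, such as "ø" or "ß", are not transliterated by this approach and would be removed by the second helper.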
Assuming that your model class has a title attribute, you can simply override the to_param method within the model, like this:
This Railscast episode has all the details. You can also ensure that the title only contains valid characters using this:
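As a rough illustration of the to_param override, here is a plain Ruby stand-in for a model. The Post class, the id prefix, and the slug rule are assumptions for this sketch; in a real application the class would be an ActiveRecord model.

```ruby
# Plain Ruby stand-in for a model class (hypothetical).
class Post
  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  # Rails calls to_param when it builds a URL for a record.
  def to_param
    slug = title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
    "#{id}-#{slug}"
  end
end

post = Post.new(42, "How do you change a title?")
puts post.to_param
# prints "42-how-do-you-change-a-title"
```

Prefixing the id is one possible design: "42-how-do-you-change-a-title".to_i is 42 in Ruby, so the numeric key can still be recovered from the URL segment even if the title changes later.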