I am working on a Rails 3.1 app with Ruby 1.9.3 and Mongoid as my ORM, and I am facing an annoying issue. I would like to truncate the content of a post like this:
<%= raw truncate(strip_tags(post.content), :length => 200) %>
I am using `raw` and `strip_tags` because my `post.content` is actually produced by a rich text editor.
I have a serious issue with non-ASCII characters. Imagine my post content is the following:
éééé éééé éééé éééé éééé éééé éééé éééé
The naive approach above produces this:
éééé éééé éééé éééé éééé &eac...
It looks like truncate sees every word of the string as HTML entities, like `&eacute;&eacute;&eacute;&eacute;`.
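The cut ending in `&eac...` suggests the stored content contains named entities such as `&eacute;` rather than literal `é` characters, and that `strip_tags` leaves those entities in place. Each accented letter then counts as 8 characters, which would explain why far fewer than 200 visible characters survive. The plain-Ruby sketch below only illustrates the counting. `decode_french_entities` and its table are hypothetical and cover just a few French letters; a real app would need a complete entity decoder.

```ruby
# encoding: utf-8
# Hypothetical entity table: only a few letters common in French text.
# A real application would use a complete HTML entity decoder instead.
FRENCH_ENTITIES = {
  "&eacute;" => "é", "&egrave;" => "è", "&ecirc;" => "ê",
  "&agrave;" => "à", "&ccedil;" => "ç", "&ugrave;" => "ù",
  "&amp;"    => "&"
}

# Replaces each known named entity with its character; unknown ones are kept.
def decode_french_entities(text)
  text.gsub(/&[a-z]+;/) { |entity| FRENCH_ENTITIES.fetch(entity, entity) }
end

# What the editor may store for "éééé éééé ... éééé" (8 words).
stored = (["&eacute;" * 4] * 8).join(" ")

puts stored.length                          # 263: entities counted char by char
puts decode_french_entities(stored).length  # 39: each "é" counts once
```

If you decode entities before truncating, consider dropping `raw` so ERB escapes the decoded text again when rendering it.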
Is there a way to either:
- Have truncate handle actual UTF-8 strings, where 'é' counts as a single character? That would be my favorite approach.
- Hack the above instruction so that the result is better, e.g. force Rails to truncate between two words?
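For the second option, Rails' `truncate` helper also accepts a `:separator` option (e.g. `:separator => ' '`) so that the cut falls on a space. The sketch below shows the same idea in plain Ruby. `truncate_on_word` is a hypothetical helper, and it assumes the text has already been decoded to real UTF-8 characters; otherwise entity characters are counted as well.

```ruby
# encoding: utf-8
# Hypothetical helper: truncates to at most `length` characters (omission
# included), cutting at the last space so that no word is split.
# For UTF-8 strings, String#length counts characters, not bytes.
def truncate_on_word(text, length, omission = "...")
  raise ArgumentError, "length must exceed omission" if length <= omission.length
  return text if text.length <= length

  room  = length - omission.length          # characters left for the text itself
  space = text.rindex(" ", room)            # last space at or before index `room`
  stop  = space && space > 0 ? space : room # hard cut if there is no usable space
  text[0, stop].rstrip + omission
end

content = "éééé éééé éééé éééé éééé éééé éééé éééé"
puts truncate_on_word(content, 20)  # éééé éééé éééé...
```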
I am asking this question because I have not found any solution so far. This is the only place in my app where I have problems with such characters, and it is a major issue since the whole content of the website is in French, so it contains a lot of é, ç, à and ù.
Also, I think this behavior is quite unfortunate for the `truncate` helper, because in my case it does not truncate to 200 characters at all, but to approximately 25!
Probably too late to help with your issue, but...
You can use the ActiveSupport::Multibyte::Chars limit method, like so:
See http://api.rubyonrails.org/v3.1.1/classes/ActiveSupport/Multibyte/Chars.html#method-i-limit
I was having a very similar issue (truncating strings in different languages) and this worked for my case. This is after making sure the encoding is set to UTF-8 everywhere: rails config, database config and/or database table definitions, and any html templates.
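Note that `Chars#limit` limits the *byte* size of the string without breaking characters, so 200 bytes of French text hold fewer than 200 characters. In Rails this would presumably be called as something like `post.content.mb_chars.limit(200).to_s`. The plain-Ruby sketch below shows the same idea without ActiveSupport; `limit_bytes` is a hypothetical name, not the library's code.

```ruby
# encoding: utf-8
# Keeps as many whole characters as fit in `max_bytes` bytes, so a multibyte
# character such as "é" (2 bytes in UTF-8) is never cut in half.
def limit_bytes(text, max_bytes)
  used  = 0
  count = 0
  text.each_char do |char|
    break if used + char.bytesize > max_bytes
    used  += char.bytesize
    count += 1
  end
  text[0, count]
end

s = "éééé éééé"
puts s.bytesize           # 17 bytes for 9 characters
puts limit_bytes(s, 5)    # éé (a third "é" would need bytes 5 and 6)
```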
If your string is HTML, then I would suggest you check out the truncate_html gem. I've not used it with characters like this, but it should be aware of where it can safely truncate the string.
There is a simple way, but it is not a nice solution. First, you have to make sure the content you save is UTF-8 (this might not be necessary).
Then, when you read it, you can force it back to UTF-8.
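Assuming the string comes back tagged with the wrong encoding (for example binary, `ASCII-8BIT`) while its bytes really are UTF-8, `String#force_encoding` relabels it without changing the bytes, after which `length` counts characters. The binary string is simulated here. This does not help if the content actually contains entities such as `&eacute;`.

```ruby
# encoding: utf-8
# Simulate content read back as raw bytes (ASCII-8BIT) although it is UTF-8.
raw = "éééé".dup.force_encoding(Encoding::ASCII_8BIT)
puts raw.length             # 8: each byte counted as a character

fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed.valid_encoding?  # true: the bytes really are UTF-8
puts fixed.length           # 4: "é" is now one character
```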
I've written strings to help truncate, align, and wrap multibyte text, with support for languages that don't use whitespace (Japanese, Chinese, etc.).