Rails: truncating UTF-8 strings that contain é (for example)

Posted on 2025-01-03 21:59:45

I am working on a Rails 3.1 app with Ruby 1.9.3 and Mongoid as my ORM, and I am facing an annoying issue. I would like to truncate the content of a post like this:

<%= raw truncate(strip_tags(post.content), :length => 200) %>

I am using raw and strip_tags because my post.content is actually handled with a rich text editor.

I have a serious issue with non-ASCII characters. Imagine my post content is the following:

éééé éééé éééé éééé éééé éééé éééé éééé

The naive approach above produces this:

éééé éééé éééé éééé éééé &eac...

It looks like truncate is seeing every word of the string as &eacute;&eacute;&eacute;&eacute;, i.e. each é as the eight-character HTML entity rather than as one character.

Is there a way to either:

Have truncate handle actual UTF-8 strings, where 'é' counts as a single character? That would be my preferred approach.
Hack the above instruction so that the result is better, for example by forcing Rails to truncate between two words?

I am asking because I have not found any solution so far. This is the only place in my app where I have problems with such characters, and it is a major issue since the whole content of the website is in French and therefore contains a lot of é, ç, à and ù.

I also think this behavior is quite unfortunate for the truncate helper, because in my case it does not truncate at 200 characters at all, but at approximately 25 characters!
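The "&eac..." tail and the figure of roughly 25 visible characters are both consistent with the content being stored as named HTML entities: &eacute; is 8 characters long, so a 200-character budget covers about 25 of them. Below is a minimal plain-Ruby sketch of the two ideas asked about, decoding entities before counting and cutting between words. The ENTITIES table and the helpers decode_entities and truncate_words are hypothetical names introduced for illustration only; they are not part of Rails, and a real application would want a complete entity decoder.

```ruby
# Hypothetical, deliberately tiny entity table (assumption: the rich text
# editor stores accented letters as named entities such as &eacute;).
ENTITIES = {
  "&eacute;" => "é",
  "&ccedil;" => "ç",
  "&agrave;" => "à",
  "&ugrave;" => "ù",
  "&amp;"    => "&"
}.freeze

# Replace known named entities with their characters; leave unknown ones alone.
def decode_entities(text)
  text.gsub(/&[a-z]+;/) { |entity| ENTITIES.fetch(entity, entity) }
end

# Truncate to at most max_chars characters (omission included),
# backing up to the previous space when the cut would land mid-word.
def truncate_words(text, max_chars, omission = "...")
  return text if text.length <= max_chars

  budget = [max_chars - omission.length, 0].max
  cut = text[0, budget]
  if text[budget] != " " && (idx = cut.rindex(" "))
    cut = cut[0, idx]
  end
  cut.rstrip + omission
end

stored  = "&eacute;&eacute;&eacute;&eacute; &eacute;&eacute;&eacute;&eacute; &eacute;&eacute;&eacute;&eacute;"
decoded = decode_entities(stored)   # => "éééé éééé éééé"
truncate_words(decoded, 12)         # => "éééé éééé..."
truncate_words(decoded, 10)         # => "éééé..."
```

In the view from the question, the same order of operations would apply: strip the tags, decode the entities, and only then truncate, so that the length is counted on the characters the reader actually sees.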

雄赳赳气昂昂 2025-01-10 21:59:45

Probably too late to help with your issue, but...
You can use the ActiveSupport::Multibyte::Chars limit method, like so:

post.content.mb_chars.limit(200).to_s

See http://api.rubyonrails.org/v3.1.1/classes/ActiveSupport/Multibyte/Chars.html#method-i-limit

I was having a very similar issue (truncating strings in different languages) and this worked in my case. This was after making sure the encoding is set to UTF-8 everywhere: the Rails config, the database config and/or database table definitions, and any HTML templates.
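One detail to keep in mind with limit: per the ActiveSupport documentation, it caps the byte size of the string without breaking a character, so limit(200) on French text can return noticeably fewer than 200 characters. The plain-Ruby sketch below illustrates the difference between a byte budget and a character budget. limit_bytes is a hypothetical helper written for this illustration, not the ActiveSupport implementation.

```ruby
# Hypothetical helper: keep whole characters while staying within max_bytes.
def limit_bytes(str, max_bytes)
  out = +""                      # unfrozen, empty UTF-8 string
  str.each_char do |ch|
    break if out.bytesize + ch.bytesize > max_bytes
    out << ch
  end
  out
end

text = "éééé éééé"
text.length            # => 9   (characters)
text.bytesize          # => 17  (each é takes 2 bytes in UTF-8)

limit_bytes(text, 5)   # => "éé"     (4 bytes; a third é would exceed 5)
text[0, 5]             # => "éééé "  (String#[] counts characters)
```

If the goal is a fixed number of visible characters rather than a storage limit, slicing by characters is the closer fit.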

吾家有女初长成 2025-01-10 21:59:45

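If your string is HTML, I would suggest taking a look at the truncate_html gem. I have not used it with characters like these, but it should know where it is safe to truncate the string.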

短叹 2025-01-10 21:59:45

There is a simple way, though it is not a nice solution. First, make sure the content you save is UTF-8. This might not be necessary.

content = "éééé"
post.content = content.force_encoding('utf-8') unless content.encoding.to_s == "UTF-8"

Then, when you read it, you can force the encoding again:

<%= raw truncate(strip_tags(post.content.force_encoding('utf-8')), :length => 200) %>
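For context on what force_encoding does and does not do: it only changes the encoding label attached to the bytes; it does not convert them, and it does not decode HTML entities. The sketch below shows the effect on character counting when UTF-8 bytes have been tagged as binary. relabel_as_utf8 is a hypothetical helper name, and the binary-tagged input is an assumed scenario, not something the question confirms.

```ruby
# Hypothetical helper: relabel a string as UTF-8 without touching its bytes.
def relabel_as_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.dup.force_encoding(Encoding::UTF_8)
end

binary = "éééé".b                 # same bytes, tagged ASCII-8BIT
binary.length                     # => 8  (every byte counts as a character)

utf8 = relabel_as_utf8(binary)
utf8.length                       # => 4  (bytes now read as UTF-8 characters)
utf8.valid_encoding?              # => true
utf8 == "éééé"                    # => true
```

Checking valid_encoding? after relabelling is a cheap way to catch content whose bytes were never UTF-8 in the first place.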

|煩躁 2025-01-10 21:59:45

I've written strings, a library that helps truncate, align and wrap multibyte text, with support for languages that do not use whitespace (Japanese, Chinese, etc.):

Strings.truncate('ラドクリフ、マラソン五輪代表に1万m出場にも含み', 12)
# => "ラドクリフ…"

About the author: 生死何惧