Ruby on Rails - Stripping weird characters from the body in a rake task
On my company's Rails website, we have a Twitter area where tweets from our social media team are displayed. A rake task uses the Twitter gem to import any new tweets into the database on a regular basis, and the site displays them from there. URLs in the tweets are converted to HTML links using the auto_link helper.
This always worked fine, until now. All of a sudden, the links are broken, and the word right before the URL is even being highlighted by mistake. For an example tweet that should look like this: "Please be safe St. Louis. Heat warning extended through August http://bit.ly/...", the word "August" is linked and the URL that follows is broken, as if something between the last word and the link were breaking it...
I investigated the helpers, looked at the tweet's text field in the database to see if there was anything strange, and even used the Rails console to pull up the tweets manually, but everything looked okay. It wasn't until I went all the way down to a hex dump of the tweet body that I saw...
Please be safe S
t. Louis. Heat w
arning extended
through August.
 http://bit.ly/
r5fXlz #heatpoca
lypse
So the culprit was a stray character sitting where the space before the URL should be. When I deleted that space and re-added it manually in the database, the issue cleared up.
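For anyone trying to reproduce this kind of inspection from Ruby rather than from a hex editor, here is a minimal sketch. It assumes, purely for illustration, that the stray character is a no-break space (U+00A0); `suspicious_chars` is a hypothetical helper name, not something from the Twitter gem or Rails.

```ruby
# Hypothetical helper: list every character outside printable ASCII,
# together with its position and code point.
def suspicious_chars(text)
  text.each_char.with_index
      .reject { |ch, _i| ch.ord.between?(0x20, 0x7E) }
      .map { |ch, i| [i, format("U+%04X", ch.ord)] }
end

# Assumed sample: U+00A0 stands in for whatever character was really there.
body = "through August.\u00A0http://bit.ly/..."

p suspicious_chars(body)   # => [[15, "U+00A0"]]
p body.bytes.last(20)      # raw byte values, comparable to a hex dump
```

A character like this renders as an ordinary space in most consoles, which would be consistent with the text looking fine in the Rails console.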
The only problem is, I don't understand why the tweet body is being imported like that, especially when it looks fine via the Rails console. This is an older database, and I noticed it was still using latin1 encoding in some areas and utf8 in others. I was certain that converting all of it to UTF-8 would fix the problem, but it did not.
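One way to tell a mis-encoded string apart from a correctly encoded string that merely contains an invisible character is to ask Ruby how it sees the string. The sketch below is only illustrative: `describe_encoding` is a hypothetical helper, and the second sample assumes the stray character is U+00A0.

```ruby
# Hypothetical helper: summarise how Ruby sees a string's encoding.
def describe_encoding(text)
  {
    encoding: text.encoding.name,     # the encoding the string is tagged with
    valid: text.valid_encoding?,      # false if the bytes are malformed for it
    ascii_only: text.ascii_only?      # false if any non-ASCII character is present
  }
end

clean = "Heat warning extended through August."
odd   = "August.\u00A0http://bit.ly/..."   # assumed example containing U+00A0

p describe_encoding(clean)
p describe_encoding(odd)
```

If the second kind of string reports `valid: true`, the character is legitimate UTF-8, in which case converting the tables to UTF-8 would not be expected to remove it.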
I even went as far as running a sanitization helper on the body before it was imported, but that didn't work either.
I also tried a Ruby gsub to strip the character out, but it didn't work.
Does anyone have any insight on how to solve this odd problem?
Comments (1)
I was finally able to solve this by running the following specifically on the body string in the rake task...
Odd, but it works. More information on using the above can be found here: ruby (1.8.7): How to get rid of non-printable chars while scraping?
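The following is not the command from the answer above, only a minimal sketch of one common way to clean such a string on a modern Ruby (1.9 or later). `normalize_tweet_body` is a hypothetical helper name, and the sample input assumes the stray character is a no-break space (U+00A0).

```ruby
# Hypothetical helper, not the command used in the original answer.
def normalize_tweet_body(text)
  text
    .gsub(/[[:space:]]+/, " ")   # any run of whitespace, including U+00A0 and newlines, becomes one ASCII space
    .gsub(/[^[:print:]]/, "")    # drop remaining non-printable characters such as control codes
    .strip
end

dirty = "through August.\u00A0http://bit.ly/..."
puts normalize_tweet_body(dirty)
```

The POSIX bracket class `[[:space:]]` is used here because Ruby's `\s` only matches ASCII whitespace, so a pattern built on `\s` or a literal space would leave a no-break space untouched. In a rake task, a helper like this would presumably be applied to the body just before the record is saved.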