Encoding problem with JDBC and MySQL
I grab data from RSS channels, sanitize it, and save it in the database. I use Java, Tidy, MySQL, and JDBC.
Steps:
- I grab the RSS records. This works fine.
- I sanitize the HTML with Tidy. Here is one transformation: Tidy automatically converts strings like "So it’s unlikely" to "So it’s unlikely".
- I save this string to the table.
The MySQL schema is
CREATE TABLE IF NOT EXISTS `rss_item_safe_texts` (
`id` int(10) unsigned NOT NULL,
`title` varchar(1000) NOT NULL,
`link` varchar(255) NOT NULL,
`description` mediumtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The JDBC connection URL is
connUrl = "jdbc:mysql://" + host + "/" + database + "?user=" + username + "&password=" + password + "&useUnicode=true&characterEncoding=UTF-8";
The Java code is
PreparedStatement updateSafeTextSt = conn.prepareStatement("UPDATE `rss_item_safe_texts` SET `title` = ?, `link` = ?, `description` = ? WHERE `id` = ?");
updateSafeTextSt.setString(1, EscapingUtils.escapeXssInjection(title));
updateSafeTextSt.setString(2, link);
updateSafeTextSt.setString(3, EscapingUtils.escapeXssInjection(description));
updateSafeTextSt.setInt(4, itemId);
updateSafeTextSt.execute();
updateSafeTextSt.close();
As a result, I see broken characters in the database, such as "So it'? unlikely". I see the same broken text when it is output on the web page (a UTF-8 page).
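A "?" in place of a typographic apostrophe is typically what appears when some layer encodes the text into a charset that cannot represent U+2019. The following is a minimal sketch of that effect in plain Java, with no database involved. It assumes the symptom comes from such a lossy conversion, which the question itself does not confirm, and the class and method names (`EncodingRoundTrip`, `roundTrip`) are hypothetical helpers, not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the original code.
class EncodingRoundTrip {

    // Encodes the text into bytes with the given charset and decodes it back.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark (typographic apostrophe).
        String text = "So it\u2019s unlikely";

        // UTF-8 can represent U+2019, so the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // prints true

        // ISO-8859-1 cannot represent U+2019, so it is replaced.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // prints So it?s unlikely
    }
}
```

If the stored value shows a similar substitution, it suggests that one step between Tidy and the table is not using UTF-8. The sketch does not identify which step that is.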
Comments (1)
Don't forget that there are many other places where the encoding can be set differently. Check, for example, whether your database, table, and columns have the correct encoding to begin with. Also, I usually set everything I can to utf8 in MySQL: