Line feeds and carriage returns in data: 0D 0A
I am writing a data cleanup script (MS smart quotes, etc.) that will operate on MySQL tables encoded in Latin1. While scanning the data, I noticed a ton of 0D 0A byte pairs where the line breaks are.
Since I am cleaning the data anyway, should I also remove all of the 0D bytes? Is there ever a good reason to keep 0D (carriage return) anymore?
Thanks!
Comments (3)
0D 0A (\r\n) and 0A (\n) are both line terminators; \r\n is mostly used on Windows, \n on Unix systems.
Is there ever a good reason to keep 0D anymore?
I think you should answer this question yourself.
You could remove '\r' from the data, but first make sure that the programs that will use this data treat a bare '\n' as the end of a line. Most do, but check just in case.
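As a rough illustration of the removal described above, here is a minimal Python sketch. The function name `normalize_newlines` and its `lone_cr_too` flag are made up for this example, and it assumes the column value has already been decoded from Latin1 into a Python string. It only rewrites the 0D 0A pairs unless asked otherwise, so a stray 0D that is not part of a line ending is left alone.

```python
def normalize_newlines(text: str, lone_cr_too: bool = False) -> str:
    """Convert CRLF (0D 0A) line endings to LF (0A).

    Hypothetical helper for illustration only.
    """
    # Replace the two-byte Windows terminator with a single line feed.
    text = text.replace("\r\n", "\n")
    if lone_cr_too:
        # Optionally treat a carriage return on its own as a line break too.
        text = text.replace("\r", "\n")
    return text


# Sample value as it might come out of a Latin1 column (assumed data).
raw_bytes = b"first line\r\nsecond line\r\n"
cleaned = normalize_newlines(raw_bytes.decode("latin-1"))
print(repr(cleaned))
```

Whether to pass `lone_cr_too=True` depends on the data: it is only worth it if some rows use a bare 0D as their line break.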
The CR/LF combination is a Windows thing; *NIX operating systems use just LF. So, depending on the application that uses your data, you'll need to decide whether you want or need to filter out the CRs. See the Wikipedia entry on newline for more info.
Python's readline() returns each line with its trailing \012 included. \012 is an octal escape: octal 12 is decimal 10. You can see on the ASCII table that decimal 10 is NL or LF (newline or line feed).
It is the standard end-of-line in a Unix text or script file.
http://www.asciitable.com/
So be aware that len() counts the NL: the length of a line returned by readline() is never zero unless you read past EOF, where readline() returns an empty string.
Therefore, if you INSERT a line of text obtained from Python's readline() into a MySQL table, it will by default include the NL character at the end.
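The behaviour above can be checked with a small Python sketch. It uses `io.StringIO` as a stand-in for a real file, and the helper name `read_clean_lines` is invented for this example. The idea is to strip the trailing CR/LF before the value goes anywhere near an INSERT.

```python
import io


def read_clean_lines(stream):
    """Yield each line from stream without its trailing CR/LF.

    Hypothetical helper for illustration only.
    """
    while True:
        line = stream.readline()
        if line == "":
            # readline() returns an empty string only at EOF.
            break
        # Drop any trailing carriage returns and line feeds.
        yield line.rstrip("\r\n")


# In-memory stand-in for a file: one CRLF line, one blank line, one LF line.
sample = "alpha\r\n\r\nbeta\n"

first_raw = io.StringIO(sample).readline()
lines = list(read_clean_lines(io.StringIO(sample)))

print(len(first_raw))  # counts the CR and LF as well as the five letters
print(lines)
```

Note that the blank line in the sample still comes back from readline() with a nonzero length, because its terminator is counted; it only becomes an empty string after stripping.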