Weird disappearance of CR in strings that come from a copy of a file's content passed to raw_input()

Posted on 2024-10-18 02:32:07

While trying to track down the cause of what looked like a bug, I ran into some odd behaviour of the raw_input() function in Python 2.7:

it removes the CR character from each CR LF pair, but only in strings that come from manually copying (via the clipboard) a file's content. Strings passed to raw_input() that are copies of an on-screen display of the same text do not lose their CR characters.
A lone CR is left untouched in every case. (A CR, carriage return, is the \r character.)

Rather than give a muddled description, here is a script that shows what has to be done to observe the behaviour; you only need to run it and follow its prompts.

The key point is the Text object: it has 7 characters instead of the 8 that were passed to raw_input() to create it.
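To pin down exactly which character is missing from Text, the two values can be aligned with difflib. This is a Python 3 sketch: `find_dropped_chars` is a hypothetical helper, and the 7-character value of Text is assumed to be the original with the CR of the CR LF pair removed, as described above.

```python
import difflib

def find_dropped_chars(original, received):
    """Return (index, char) pairs for characters of `original` that have
    no counterpart in `received`, according to difflib's alignment."""
    dropped = []
    matcher = difflib.SequenceMatcher(None, original, received, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('delete', 'replace'):
            dropped.extend((i, original[i]) for i in range(i1, i2))
    return dropped

written = 'A\rB\nC\r\nD'   # what the script writes to PRIM.txt (8 chars)
received = 'A\rB\nC\nD'    # assumed value of Text (7 chars)

print(len(written), len(received))            # 8 7
print(find_dropped_chars(written, received))  # [(5, '\r')]
```

Index 5 is the CR of the only CR LF pair; the lone CR at index 1 survives.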

To verify that the argument passed to raw_input() really had 8 characters, I created another file, PASTED.txt, by pasting the same content into it. It is genuinely awkward to be sure of anything in this problem, as pasting into a Notepad++ window showed me: every kind of line ending (\r, \n, \r\n) is displayed as CR LF at the ends of the lines in such a window.

Using Ctrl-A to select the whole content of a file is recommended.
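Since Notepad++ displays every line ending as CR LF, a more reliable check is to read the file in binary mode and classify its line endings directly. A minimal Python 3 sketch; `count_line_endings` and `count_file_line_endings` are hypothetical helper names:

```python
import re
from collections import Counter

# CR LF must come first in the alternation so a pair is not split into CR + LF.
_EOL = re.compile(rb'\r\n|\r|\n')
_NAMES = {b'\r\n': 'CRLF', b'\r': 'CR', b'\n': 'LF'}

def count_line_endings(data):
    """Count CR LF pairs, lone CRs and lone LFs in a bytes object."""
    return Counter(_NAMES[m.group()] for m in _EOL.finditer(data))

def count_file_line_endings(path):
    # Binary mode ('rb'), so no newline translation happens while reading.
    with open(path, 'rb') as f:
        return count_line_endings(f.read())

print(sorted(count_line_endings(b'A\rB\nC\r\nD').items()))
# [('CR', 1), ('CRLF', 1), ('LF', 1)]
```

Running `count_file_line_endings('PASTED.txt')` after the paste would show whether all three kinds of ending survived.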

I am puzzled, and I wonder whether I made a coding mistake or misunderstood something, or whether this is a real feature of Python.

I hope you can offer comments and shed some light on this.

with open('PRIM.txt','wb') as f:
    f.write('A\rB\nC\r\nD')
print "  1) A file with name 'PRIM.txt' has just been created with content A\\rB\\nC\\r\\nD"
raw_input("  Open this file and copy manually its CONTENT in the clipboard.\n"+\
          "    --when done, press Enter to continue-- ")


print "\n  2) Paste this CONTENT in a Notepad++ window "+\
      "     and see the symbols at the extremities of the lines."
raw_input("    --when done, press Enter to continue-- ")


Text = raw_input("\n  3) Paste this CONTENT here and press a key : ")
print ("     An object Text has just been created with this pasted value of CONTENT.")


with open('PASTED.txt','wb') as f:
    f.write('')
print "\n  4) An empty file 'PASTED.txt' has just been created."
print "     Paste manually in this file the PRIM's CONTENT and shut this file."
raw_input("     --when done, press Enter to continue-- ")


print "\n  5) Enter the copy of this display of A\\rB\\nC\\r\\nD : \nA\rB\nC\r\nD"
DSP = raw_input('please, enter it on the following line :\n')
print "    An object DSP has just been created with this pasted value of this copied display"


print '\n----------'
with open('PRIM.txt','rb') as fv:
    verif = fv.read()
print "The read content of the file 'PRIM.txt' obtained by open() and read() : "+repr(verif)
print "len of the read content of the file 'PRIM.txt'  ==",len(verif)


print '\n----------'
print "The file PASTED.txt received by pasting the manually copied CONTENT of PRIM.txt"
with open('PASTED.txt','rb') as f:
    cpd = f.read()
    print "The read content of the file 'PASTED.txt' obtained by open() and read() "+\
          "is now : "+repr(cpd)
    print "its len is==",len(cpd)


print '\n----------'
print 'The object Text received through raw_input() the manually copied CONTENT of PRIM.txt'
print "value of Text=="+repr(Text)+\
      "\nText.split('\\r\\n')==",Text.split('\r\n')
print 'len of Text==',len(Text)


print '\n----------'
print "The object DSP received  through raw_input() the copy of the display of A\\rB\\nC\\r\\nD" 
print "value of DSP==",repr(DSP)
print 'len of DSP==',len(DSP)

My OS is Windows. I wonder if the same is observed on other operating systems.
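On the file-reading side at least, the answer depends on the platform's native line ending, which Python exposes as os.linesep. A short Python 3 sketch; the conclusion it prints concerns Python 2's text mode, which delegates to the C runtime, and says nothing about raw_input() itself:

```python
import os
import sys

# os.linesep is the platform's native line terminator:
# '\r\n' on Windows, '\n' on Linux and macOS.
print(sys.platform, repr(os.linesep))

# Python 2's text mode ('r') relies on the C runtime, which only translates
# line endings where the native terminator is CR LF (i.e. on Windows).
# On POSIX systems such as Linux and macOS, a Python 2 'r' read returns
# the same data as an 'rb' read.
py2_text_mode_translates_crlf = (os.linesep == '\r\n')
print('Python 2 text-mode reads would turn \\r\\n into \\n here:',
      py2_text_mode_translates_crlf)
```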


Comments (2)

剪不断理还乱 2024-10-25 02:32:07


sys.stdin is opened in text mode (you can check this by displaying sys.stdin.mode and seeing that it is 'r'). If you open any file in text mode in Python, then the platform native line ending (\r\n for Windows) will be converted to a simple line feed (\n) in the Python string.

You can see this in operation by opening your PASTED.txt file using mode 'r' instead of 'rb'.
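A Python 3 sketch of the suggested experiment, using a temporary file instead of PASTED.txt (`read_both_ways` is a hypothetical helper). Note that Python 3's default text mode uses universal newlines, so it also converts the lone CR; it goes further than the Python 2 'r' mode this answer describes, which only translates the platform-native \r\n.

```python
import os
import tempfile

CONTENT = b'A\rB\nC\r\nD'  # same bytes the question's script writes to PRIM.txt

def read_both_ways(data):
    """Write `data` to a temporary file, then read it back in binary mode and
    in Python 3's default text mode. Returns (binary_result, text_result)."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        with open(path, 'rb') as f:
            as_bytes = f.read()
        # newline=None (the default) enables universal newlines:
        # '\r', '\n' and '\r\n' all come back as '\n'.
        with open(path, 'r', encoding='ascii', newline=None) as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_bytes, as_text

raw, text = read_both_ways(CONTENT)
print(repr(raw), len(raw))    # b'A\rB\nC\r\nD' 8
print(repr(text), len(text))  # 'A\nB\nC\nD' 7
```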

小瓶盖 2024-10-25 02:32:07

After posting, I looked at my code again and noticed that the change made to data copied from a file and passed to raw_input() is the same as the newline translation Python performs when it reads a file directly, as this shows:

with open("TestWindows.txt", 'wb') as f:
    f.write("PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  ")

print "\n- Following string have been written in TestWindows.txt in mode 'wb' :\n"+\
      "PACIFIC \\r  ARCTIC \\n  ATLANTIC \\r\\n  "


print "\n- data got by reading the file TestWindows.txt in 'rb' mode :"
with open("TestWindows.txt", 'rb') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'r' mode :"
with open("TestWindows.txt", 'r') as f:
    print "    repr(data)==",repr(f.read())

print "\n- data got by reading the file TestWindows.txt in 'rU' mode :"
with open("TestWindows.txt", 'rU') as f:
    print "    repr(data)==",repr(f.read())

Result:

- Following string have been written in TestWindows.txt in mode 'wb' :
PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  

- data got by reading the file TestWindows.txt in 'rb' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  '

- data got by reading the file TestWindows.txt in 'r' mode :
    repr(data)== 'PACIFIC \r  ARCTIC \n  ATLANTIC \n  '

- data got by reading the file TestWindows.txt in 'rU' mode :
    repr(data)== 'PACIFIC \n  ARCTIC \n  ATLANTIC \n  '
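The three results above can be reproduced in memory, which makes the rules explicit. This Python 3 sketch is an emulation, not a call into the Windows C runtime: the function names are hypothetical, and `like_r_on_windows` simply applies the CR LF to LF rule visible in the 'r' line of the output.

```python
import io

written = 'PACIFIC \r  ARCTIC \n  ATLANTIC \r\n  '

def like_rb(s):
    # Binary mode: the data comes back exactly as written.
    return s

def like_r_on_windows(s):
    # Emulated Python 2 text mode on Windows: only CR LF pairs become LF.
    return s.replace('\r\n', '\n')

def like_rU(s):
    # Universal newlines: CR LF, lone CR and LF all become LF. Python 3's
    # io.TextIOWrapper with newline=None applies the same rule.
    stream = io.TextIOWrapper(io.BytesIO(s.encode('ascii')),
                              encoding='ascii', newline=None)
    return stream.read()

for name, func in [('rb', like_rb), ('r', like_r_on_windows), ('rU', like_rU)]:
    print(name, repr(func(written)))
```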

First, the file PASTED.txt has the same content as PRIM.txt; it was produced by copying PRIM.txt's content and pasting it into PASTED.txt, without passing through a Python string. So data that goes from one file to another through the clipboard alone is not modified. This shows that PRIM.txt's content arrives intact in the clipboard when it is copied.
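Rather than comparing the two files by eye, their bytes can be compared directly. A Python 3 sketch using the standard filecmp module; `same_bytes` is a hypothetical helper, and the demo writes temporary stand-ins for PRIM.txt and PASTED.txt, assuming the paste reproduced every byte:

```python
import filecmp
import os
import tempfile

def same_bytes(path_a, path_b):
    """Return True if both files have identical contents.

    shallow=False makes filecmp read and compare the bytes instead of
    trusting matching os.stat() signatures (size, modification time).
    """
    return filecmp.cmp(path_a, path_b, shallow=False)

# Demo with temporary stand-ins for PRIM.txt and PASTED.txt.
with tempfile.TemporaryDirectory() as tmpdir:
    prim = os.path.join(tmpdir, 'PRIM.txt')
    pasted = os.path.join(tmpdir, 'PASTED.txt')
    with open(prim, 'wb') as f:
        f.write(b'A\rB\nC\r\nD')
    with open(pasted, 'wb') as f:
        f.write(b'A\rB\nC\r\nD')  # assumed: the paste reproduced every byte
    result = same_bytes(prim, pasted)

print(result)  # True
```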

Second, data going from a file to a Python string via the clipboard and raw_input() is modified, so the modification happens between the clipboard and the Python string. I therefore thought that raw_input() might interpret data received from the clipboard in the same way the Python interpreter interprets data read from a file.

I then elaborated on the idea that \r\n is replaced with \n because data of a "Windows nature" becomes data of a "Python nature", and that the clipboard does not modify the data because it is under the control of the Windows operating system.

Unfortunately, this little theory is broken by the fact that data copied from the screen and passed to raw_input() does not have its \r\n newlines transformed, even though it also passes through the Windows clipboard.

Next, I thought that Python might recognise the nature of the data not from its source but from information carried with the data, namely a 'format'. I found the following page about the Windows clipboard, and the clipboard can indeed record information in several formats:

http://msdn.microsoft.com/en-us/library/ms648709(v=vs.85).aspx

Perhaps the way Python modifies \r\n is linked to these clipboard formats, perhaps not. I don't understand all of this well enough to be sure.

Can anybody explain all of the observations above?


Thank you for your answer, ncoghlan. But I don't think it's the reason:

  • sys.stdin has no attribute mode

  • sys.stdin refers to the keyboard, as far as I understand. However, in my code the data doesn't come from typing on the keyboard but from pasting via the clipboard. That's different.
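Whether sys.stdin has a mode attribute depends on which object is installed as sys.stdin, so it can be probed without raising AttributeError. A small Python 3 sketch (`stream_mode` is a hypothetical helper); one possible reason for a missing attribute is that an IDE or other host replaced the standard stdin, though the thread does not confirm this:

```python
import sys

def stream_mode(stream):
    """Return `stream.mode` if the object has one, else None.

    File objects created by open() record the mode they were opened with
    ('r', 'rb', ...). An object installed in place of the real stdin (for
    example by an IDE or a test harness) may not define `mode` at all.
    """
    return getattr(stream, 'mode', None)

print('sys.stdin:', type(sys.stdin).__name__, 'mode =', stream_mode(sys.stdin))
```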

The key point is that I don't understand how the Python interpreter could tell apart clipboard data that was copied from a file and clipboard data that was copied from the screen.
