Downloading text files with Python and ftplib.FTP from z/OS
I'm trying to automate downloading of some text files from a z/OS PDS, using Python and ftplib.
Since the host files are EBCDIC, I can't simply use FTP.retrbinary().
FTP.retrlines(), when used with open(file, 'w').writelines as its callback, doesn't, of course, provide EOLs.
So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.
Many thanks.
#!python.exe
from ftplib import FTP

class xfile (file):
    def writelineswitheol(self, sequence):
        for s in sequence:
            self.write(s+"\r\n")

sess = FTP("zos.server.to.be", "myid", "mypassword")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
sess.cwd("'FOO.BAR.PDS'")
a = sess.nlst("RTB*")
for i in a:
    sess.retrlines("RETR "+i, xfile(i, 'w').writelineswitheol)
sess.quit()
Update: Python 3.0, platform is MinGW under Windows XP.
z/OS PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/OS FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.
Closing update:
Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):
import ftplib
import os
from sys import exc_info

sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
for dir in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
    sess.cwd("'ZLTALM.PREP.%s'" % dir)
    try:
        filelist = sess.nlst()
    except ftplib.error_perm as x:
        if (x.args[0][:3] != '550'):
            raise
    else:
        try:
            os.mkdir(dir)
        except:
            continue
        for hostfile in filelist:
            lines = []
            sess.retrlines("RETR "+hostfile, lines.append)
            pcfile = open("%s/%s"% (dir,hostfile), 'w')
            for line in lines:
                pcfile.write(line+"\n")
            pcfile.close()
        print ("Done: " + dir)
sess.quit()
My thanks to both John and Vinay
Just came across this question as I was trying to figure out how to recursively download datasets from z/OS. I've been using a simple Python script for years now to download EBCDIC files from the mainframe. It effectively just does this:
You should be able to download the file as binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You need to know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):

Update: If you need to use retrlines to get the lines, and your lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. So in the callback, sequence will be the line, and your for loop will write the individual characters of the line to the output, each on its own line. So you probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though; it probably needs to be in a different class in your bells-and-whistles version.
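The binary route can be sketched roughly as follows. This is only an illustration under stated assumptions: cp500 is the code page the answer mentions as an example and may not match your system, the helper names fetch_ebcdic_text and split_fixed_records are made up for this sketch, and the record length of 80 is an assumed value. A binary transfer of a fixed-record data set carries no line terminators, so the sketch cuts the decoded text by record length.

```python
import io


def fetch_ebcdic_text(ftp, remote_name, codepage="cp500"):
    """Download remote_name in binary mode and decode it from EBCDIC.

    ftp is an ftplib.FTP-like object that is already logged in.
    codepage is an assumption; use the code page of your z/OS system.
    """
    buf = io.BytesIO()
    # retrbinary calls buf.write once for every block of bytes received.
    ftp.retrbinary("RETR " + remote_name, buf.write)
    return buf.getvalue().decode(codepage)


def split_fixed_records(text, lrecl=80):
    """Cut text into lrecl-character records and drop trailing padding."""
    return [text[i:i + lrecl].rstrip() for i in range(0, len(text), lrecl)]
```

The resulting list of records can then be written out with an explicit output encoding, for example through open(name, "w", encoding="utf-8").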
Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.
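To see the stray '\r' without needing a Windows machine, here is a small sketch that imitates text-mode newline translation with io.TextIOWrapper. The helper name bytes_written is made up for this illustration; newline="\r\n" stands in for what a Windows text-mode file does on write.

```python
import io


def bytes_written(text, newline):
    # Return the raw bytes a text-mode stream produces for `text`.
    # newline="\r\n" mimics Windows text mode, newline="\n" a Unix-like one.
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="latin-1", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave `raw` open when the wrapper is discarded
    return data


print(bytes_written("abc\r\n", "\r\n"))  # b'abc\r\r\n' (doubled carriage return)
print(bytes_written("abc\n", "\r\n"))    # b'abc\r\n'
```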
Proper error handling should not be relegated to a "bells and whistles" version. You should set up your callback so that your file open() is in a try/except and retains a reference to the output file handle, your write call is in a try/except, and you have a callback_obj.close() method, which you call when retrlines() returns, to explicitly do file_handle.close() (in a try/except). That way you get explicit error handling, e.g. messages like "can't (open|write to|close) file X because Y", and you save having to think about when your files are going to be implicitly closed and whether you risk running out of file handles.
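One possible shape for such a callback object is sketched below. The class name LineSink, the choice of RuntimeError, and the latin-1 default are assumptions made for this illustration, not something taken from the question.

```python
class LineSink:
    """Callable to pass to FTP.retrlines(); reports file errors explicitly."""

    def __init__(self, path, encoding="latin-1"):
        self.path = path
        try:
            self.handle = open(path, "w", encoding=encoding)
        except OSError as exc:
            raise RuntimeError("can't open file %s because %s" % (path, exc)) from exc

    def __call__(self, line):
        # retrlines() calls this once per line, with the line ending stripped.
        try:
            self.handle.write(line + "\n")
        except (OSError, UnicodeEncodeError) as exc:
            raise RuntimeError("can't write to file %s because %s" % (self.path, exc)) from exc

    def close(self):
        try:
            self.handle.close()
        except OSError as exc:
            raise RuntimeError("can't close file %s because %s" % (self.path, exc)) from exc
```

With a logged-in session sess, usage would look like: sink = LineSink("MEMBER.txt"), then sess.retrlines("RETR MEMBER", sink) inside a try block, with sink.close() in the matching finally.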
Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
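A rough sketch of how such test data could be generated locally follows. The function name make_ebcdic_test_data is hypothetical, and cp500 is an assumption standing in for whatever code page the host actually uses.

```python
def make_ebcdic_test_data(codepage="cp500"):
    """Return (raw, text): all 256 byte values and their decoded form."""
    raw = bytes(range(256))
    text = raw.decode(codepage)
    return raw, text


raw, text = make_ebcdic_test_data()
# Every byte should survive a decode/encode round trip unchanged.
assert text.encode("cp500") == raw
```

Uploading raw to the host as a test member and comparing what comes back through retrlines() against text is one way to check the whole translation chain.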
[a few "sanitation" remarks]
You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.
To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)
Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.
When you use ftplib's retrlines to download a file from z/OS, the lines come back without a trailing '\n'.
This differs from the Windows ftp command 'get xxx'.
We can rewrite the function 'retrlines' as 'retrlines_zos' in ftplib.py.
Just copy the whole code of retrlines, and change the 'callback' line to:
...
callback(line + "\n")
...
I tested and it worked.
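If you would rather not edit ftplib.py, a similar effect can probably be had with a small wrapper. This is a sketch of a different technique from the one described above (wrapping the callback instead of copying the library code); the eol parameter is an addition for illustration.

```python
def retrlines_zos(ftp, cmd, callback, eol="\n"):
    """Run ftp.retrlines(cmd) and pass each line to callback with eol appended."""

    def add_eol(line):
        callback(line + eol)

    # retrlines() returns the server's final response string; pass it through.
    return ftp.retrlines(cmd, add_eol)
```

For example, with an open text file f and a logged-in session sess: retrlines_zos(sess, "RETR MEMBER", f.write).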
You want a lambda function and a callback. Like so:
This will download file 'zOsFile' and write it to 'newfilename'
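The snippet from this answer is not preserved here. One way a lambda callback of this kind might look is sketched below; the function name download_text and the latin-1 default are assumptions, and this is not necessarily what the answer originally showed.

```python
def download_text(ftp, remote_name, local_name, encoding="latin-1"):
    """Write each line delivered by retrlines() to local_name, adding a newline."""
    with open(local_name, "w", encoding=encoding) as out:
        # The lambda receives one line at a time, without its line ending.
        ftp.retrlines("RETR " + remote_name, lambda line: out.write(line + "\n"))
```

With a logged-in session sess, download_text(sess, "zOsFile", "newfilename") would fetch 'zOsFile' and write it to 'newfilename'.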