Tkinter GUI to convert fixed-width files into delimited files
I am writing a converter for our Data Department that turns fixed-width files into delimited files. Normally we import the file into Excel, use the text import wizard to set the field lengths, and then save it as a CSV. However, we have started receiving files that are millions of records long, which Excel cannot import. The files do not always have spaces between the fields, especially between value fields such as phone numbers or zip codes. The headers are also often filled in completely, with no spaces.
A sample of a typical fixed width file we are dealing with:
SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar
000001T1 P1 Sample A Sample 123 Any Street Anytown 12345-6789 12345678900
000002T1 P1 Sample A Sample Director of Medicine 123 Any Street Po Box 1234 Anytown 12345-6789 12345678900
The program needs to break the file into the following delimited fields:
Sequen
Sack and Pa
Full name
Job Title
Hosp Name
Delivery Address
Alternate Address 1
Calculated Text
POSTNET Bar
Each file has slightly different field widths depending on the rest of the job. What I am looking for is a GUI-oriented way to set the field boundaries, much like the Excel import wizard for fixed-width files. I am writing this tool in Python as part of a larger tool that does many other file operations, such as breaking a file up into multiple files, reversing a file, converting from delimited to fixed width, and check digit checking. I am using Tkinter for the rest of the tools, and it would be ideal if the solution used it as well.
Any help appreciated
Comments (2)
If I understand the problem correctly (and there's a good chance I don't...), the simplest solution might be to use a text widget.
Make the first line a series of spaces the same length as the row. Use a couple of alternating tags (e.g. "even" and "odd") to give each character an alternate color so they stand out from one another. The second line would be the header, and any remaining lines would be a few lines of sample data.
Then, set up bindings on the first row to convert a space into an "x" when the user clicks on a character. If they click on an "x", convert it back to a space. They can then go and click on the character that is the start of each column. When the user is done, you can get the first line of the text widget and it will have an "x" for each column. You then just need a little function that translates that into whatever format you need.
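As a rough illustration of that "little function", here is a minimal sketch that turns the marker line into column start offsets and then streams a file through the `csv` module one record at a time, so a multi-million-record file never has to fit in memory. The names `starts_from_ruler`, `split_fixed`, and `convert` are hypothetical, and two things are assumptions on my part rather than requirements from the question: column 0 always starts a field, and padding is stripped from each value.

```python
import csv
import io


def starts_from_ruler(ruler):
    """Return sorted column start offsets taken from the "x" markers.

    Assumption: offset 0 always starts a field, marked or not.
    """
    return sorted({0} | {i for i, ch in enumerate(ruler) if ch == "x"})


def split_fixed(line, starts):
    """Slice one fixed-width record into field values.

    Assumption: surrounding padding is not wanted, so it is stripped.
    """
    bounds = zip(starts, starts[1:] + [None])  # the last field runs to end of line
    return [line[a:b].strip() for a, b in bounds]


def convert(src, dst, starts, delimiter=","):
    """Stream records from src to dst, one line at a time."""
    writer = csv.writer(dst, delimiter=delimiter)
    for line in src:
        writer.writerow(split_fixed(line.rstrip("\r\n"), starts))


# Small in-memory demo with a made-up three-field layout.
demo_ruler = "x" + " " * 5 + "x" + " " * 10 + "x" + " " * 5
demo_line = "000001" + "T1 P1".ljust(11) + "Sample A Sample"
demo_out = io.StringIO()
convert(io.StringIO(demo_line + "\n"), demo_out, starts_from_ruler(demo_ruler))
print(demo_out.getvalue())
```

With real files, pass open file objects instead of `io.StringIO`, and open the output with `newline=""` as the `csv` module documentation recommends. Because the `csv` writer quotes any value that contains the delimiter, addresses with commas should survive the conversion.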
It would look roughly like this (though obviously the colors would be different than what appears on this website)
Here's a quick hack to illustrate the general idea. It's a little sloppy but I think it illustrates the technique. When you run it, click on an area in the first row to set or clear a marker. This will cause the header to be highlighted in alternate colors for each marker.
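For readers who want something to start from, the following is a separate minimal sketch of the same technique, not the code this answer originally referred to. The names `toggle_marker` and `build_marker_widget` are hypothetical, the colors are arbitrary, and for simplicity the marker row gets one background color instead of alternating per-character tags. `tkinter` is imported inside the function only so the pure helper can be used without a display.

```python
def toggle_marker(ruler, col):
    """Return the ruler with the character at col flipped between " " and "x"."""
    if not 0 <= col < len(ruler):
        return ruler  # click landed past the end of the row
    flipped = " " if ruler[col] == "x" else "x"
    return ruler[:col] + flipped + ruler[col + 1:]


def build_marker_widget(parent, header, samples):
    """Build a Text widget: line 1 is the marker row, line 2 the header."""
    import tkinter as tk  # lazy import; see the note above

    width = max(len(s) for s in [header, *samples])
    text = tk.Text(parent, wrap="none", font="TkFixedFont",
                   height=len(samples) + 2, width=min(width, 100))
    text.tag_configure("ruler", background="#dddddd")
    text.tag_configure("even", background="#ffe9a8")
    text.tag_configure("odd", background="#bfe3ff")
    text.insert("1.0", " " * width, "ruler")
    text.insert("end", "\n" + "\n".join([header, *samples]))

    def recolor():
        """Highlight header fields in alternating colors, one per marker."""
        ruler = text.get("1.0", "1.end")
        for tag in ("even", "odd"):
            text.tag_remove(tag, "2.0", "2.end")
        starts = sorted({0} | {i for i, ch in enumerate(ruler) if ch == "x"})
        for n, start in enumerate(starts):
            end = f"2.{starts[n + 1]}" if n + 1 < len(starts) else "2.end"
            text.tag_add("even" if n % 2 == 0 else "odd", f"2.{start}", end)

    def on_click(event):
        line, col = map(int, text.index(f"@{event.x},{event.y}").split("."))
        if line == 1:
            ruler = toggle_marker(text.get("1.0", "1.end"), col)
            text.configure(state="normal")  # edits are ignored while disabled
            text.delete("1.0", "1.end")
            text.insert("1.0", ruler, "ruler")
            text.configure(state="disabled")
            recolor()
        return "break"  # suppress the default cursor and selection handling

    text.bind("<Button-1>", on_click)
    text.configure(state="disabled")  # keep the user from typing in the widget
    recolor()
    return text


# Usage (needs a display):
#   import tkinter as tk
#   root = tk.Tk()
#   widget = build_marker_widget(root, header_line, sample_lines)
#   widget.pack(fill="both", expand=True)
#   root.mainloop()
# Afterwards, widget.get("1.0", "1.end") returns the marker row.
```

Keeping the toggle logic in a plain function makes it easy to check without opening a window; the widget code only decides which character was clicked and redraws the tags.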
Edit: I now see that you are looking for a GUI. I'll leave this incorrect answer for posterity.