Tkinter GUI to convert fixed-width files to delimited files

Posted on 2024-12-26 11:53:15

I am writing a converter for our Data Department to turn fixed-width files into delimited files. Normally we import the file into Excel, use the Text Import Wizard to set the field lengths, and then save it as a CSV. However, we have started receiving files that are millions of records long, which is more than Excel can import. The files do not always have spaces between the fields, especially between value fields such as phone numbers or zip codes. The headers are also often completely filled, with no spaces.

A sample of a typical fixed width file we are dealing with:

SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar
000001T1  P1     Sample A Sample                                                                                         123 Any Street                                                                  Anytown 12345-6789                                12345678900
000002T1  P1     Sample A Sample                       Director of Medicine                                              123 Any Street                          Po Box 1234                             Anytown 12345-6789                                12345678900

The program needs to break the file into the following delimited fields:

Sequen
Sack and Pa
Full name
Job Title
Hosp Name
Delivery Address
Alternate Address 1
Calculated Text
POSTNET Bar

Each file has slightly different field widths depending on the rest of the job. What I am looking for is a GUI-oriented delimiter tool, much like the Excel import wizard for fixed-width files. I am writing this tool in Python as part of a larger tool that does many other file operations, such as breaking files up into multiple files, reversing a file, converting from delimited to fixed width, and check-digit checking. I am using Tkinter for the rest of the tools, and it would be ideal if the solution used it as well.

Any help is appreciated.

下雨或天晴 2025-01-02 11:53:15

If I understand the problem correctly (and there's a good chance I don't...), the simplest solution might be to use a text widget.

Make the first line a series of spaces the same length as the row. Use a couple of alternating tags (e.g. "even" and "odd") to give each character an alternating background color, so that adjacent characters stand out from one another. The second line would be the header, and any remaining lines would be a few lines of sample data.

Then, set up bindings on the first row to convert a space into an "x" when the user clicks on a character. If they click on an "x", convert it back to a space. They can then go and click on the character that is the start of each column. When the user is done, you can get the first line of the text widget and it will have an "x" for each column. You then just need a little function that translates that into whatever format you need.

It would look roughly like this (though obviously the colors would be different from what appears on this website):

      x          x                                     x  ...
SequenSack and PaFull Name****************************]JOB...
000001T1  P1     Sample A Sample                          ...

Here's a quick hack to illustrate the general idea. It's a little sloppy but I think it illustrates the technique. When you run it, click on an area in the first row to set or clear a marker. This will cause the header to be highlighted in alternate colors for each marker.

# Python 3 module names; on Python 2 these were Tkinter and tkFont
import tkinter as tk
import tkinter.font as tkfont

class SampleApp(tk.Tk):
    def __init__(self, *args, **kwargs):
        tk.Tk.__init__(self, *args, **kwargs)

        header = "SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar"
        sample = "000001T1  P1     Sample A Sample                                                                                         123 Any Street                                                                  Anytown 12345-6789                                12345678900"
        widget = DelimiterWidget(self, header, sample)
        hsb = tk.Scrollbar(self, orient="horizontal", command=widget.xview)
        widget.configure(xscrollcommand=hsb.set)
        hsb.pack(side="bottom", fill="x")
        widget.pack(side="top", fill="x")

class DelimiterWidget(tk.Text):
    def __init__(self, parent, header, samplerow):
        fixed_font = tkfont.nametofont("TkFixedFont")
        tk.Text.__init__(self, parent, wrap="none", height=3, font=fixed_font)
        self.configure(cursor="left_ptr")
        # Tags created later take priority, so header_even/header_odd
        # override the plain gray "header" background.
        self.tag_configure("header", background="gray")
        self.tag_configure("even", background="#ffffff")
        self.tag_configure("header_even", background="bisque")
        self.tag_configure("header_odd", background="lightblue")
        self.tag_configure("odd", background="#eeeeee")
        # Line 1: one blank marker cell per header character, in alternating colors
        for i in range(len(header)):
            tag = "even" if i % 2 == 0 else "odd"
            self.insert("end", " ", (tag,))
        self.insert("end", "\n")
        self.insert("end", header + "\n", "header")
        self.insert("end", samplerow, "sample")
        self.configure(state="disabled")
        self.bind("<1>", self.on_click)
        self.bind("<Double-1>", self.on_click)
        self.bind("<Triple-1>", self.on_click)

    def on_click(self, event):
        '''Toggle the marker under the mouse pointer'''
        index = self.index("@%s,%s" % (event.x, event.y))
        (line, column) = index.split(".")
        current = self.get(index)
        # Only the first line holds markers; ignore clicks on the header,
        # the sample row, or the newline at the end of the marker line.
        if line != "1" or current == "\n":
            return "break"
        self.configure(state="normal")
        self.delete(index)
        tag = "even" if int(column) % 2 == 0 else "odd"
        char = " " if current == "x" else "x"
        self.insert(index, char, tag)
        self.configure(state="disabled")
        self.highlight_header()
        return "break"

    def highlight_header(self):
        '''Highlight the header based on marker positions'''
        self.tag_remove("header_even", "1.0", "end")
        self.tag_remove("header_odd", "1.0", "end")
        markers = self.get("1.0", "1.0 lineend")

        i = 0
        start = "2.0"
        tag = "header_even"
        while True:
            try:
                # Searching from i+1 skips column 0, which always starts a field
                i = markers.index("x", i + 1)
            except ValueError:
                break
            end = "2.%s" % i
            self.tag_add(tag, start, end)
            start = end
            tag = "header_even" if tag == "header_odd" else "header_odd"
        # Color the last field as well, once at least one marker is set
        if start != "2.0":
            self.tag_add(tag, start, "2.0 lineend")

if __name__ == "__main__":
    app = SampleApp()
    app.mainloop()
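
As a rough sketch of the "little function" mentioned above, the marker row can be turned into (start, width) pairs and used to slice a record. The names `markers_to_fields` and `split_line` are hypothetical, and the short record below is a trimmed stand-in for the sample data; in the widget, the marker row would come from `widget.get("1.0", "1.0 lineend")`.

```python
def markers_to_fields(markers, line_length=None):
    """Turn a marker row such as '      x    x' into (start, width) pairs.

    Each 'x' marks the first character of a column. Column 0 always
    starts a field, whether or not it is marked.
    """
    if line_length is None:
        line_length = len(markers)
    starts = sorted({0} | {i for i, ch in enumerate(markers) if ch == "x"})
    ends = starts[1:] + [line_length]
    return [(start, end - start) for start, end in zip(starts, ends)]


def split_line(line, fields):
    """Slice one fixed-width record into stripped field values."""
    return [line[start:start + width].strip() for start, width in fields]


# Hypothetical data: markers at columns 6 and 17, matching the first
# three fields of the sample record (Sequen, Sack and Pa, Full Name).
markers = " " * 6 + "x" + " " * 10 + "x"
record = "000001" + "T1  P1".ljust(11) + "Sample A Sample"

fields = markers_to_fields(markers, len(record))
print(fields)
print(split_line(record, fields))
```

Passing the record length separately matters because the last field runs to the end of the line, which may be longer than the marker row if the user never scrolls that far.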

惯饮孤独 2025-01-02 11:53:15

Edit: I now see that you are looking for a GUI. I'll leave this incorrect answer for posterity.

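import csv

def fixedwidth2csv(fw_name, csv_name, field_info, headings=None):
    # field_info is a sequence of (pos, width) pairs, one per output column
    with open(fw_name, 'r') as fw_in:
        # The output file must be opened for writing. On Python 3 the csv
        # module expects newline=''; on Python 2 use open(csv_name, 'wb').
        with open(csv_name, 'w', newline='') as csv_out:
            wtr = csv.writer(csv_out)
            if headings:
                wtr.writerow(headings)
            for line in fw_in:
                # Slice each field out of the record and trim the padding
                wtr.writerow([line[pos:pos+width].strip() for pos, width in field_info])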
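
For what it's worth, here is a small sketch of the same loop working on already-open streams, so it can be tried without files on disk. `convert_stream` is a hypothetical name, and the two records and the three-field `field_info` are made-up, trimmed versions of the sample data. Because it handles one record at a time, memory use should stay flat even for files with millions of records.

```python
import csv
import io


def convert_stream(fw_in, csv_out, field_info, headings=None):
    """Copy fixed-width records from fw_in to csv_out, one line at a time.

    field_info is a list of (pos, width) pairs. Returns the record count.
    """
    writer = csv.writer(csv_out)
    if headings:
        writer.writerow(headings)
    count = 0
    for line in fw_in:
        writer.writerow([line[pos:pos + width].strip()
                         for pos, width in field_info])
        count += 1
    return count


# Hypothetical input: two short records with three fields each.
rows = [
    "000001" + "T1  P1".ljust(11) + "Sample A Sample",
    "000002" + "T1  P1".ljust(11) + "Sample B Sample",
]
source = io.StringIO("\n".join(rows) + "\n")
target = io.StringIO()

n = convert_stream(
    source,
    target,
    field_info=[(0, 6), (6, 11), (17, 15)],
    headings=["Sequen", "Sack and Pa", "Full Name"],
)
print(n)
print(target.getvalue())
```

With real files, the same function can be given the objects returned by `open(fw_name, 'r')` and `open(csv_name, 'w', newline='')`.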
