I want to convert a report in text format to an XLSX document, but the data in the text file has some missing column values

Posted on 2025-02-05 04:21:55

Typical report data looks like this:
[Image: report in txt format]

A simple approach I wanted to follow was to use a space as the delimiter, but the data is not well structured: some rows are missing values in some columns.
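To see why a plain space delimiter fails, here is a hypothetical three-column sample (the column names and values are invented for illustration, not taken from the report) in which one row has no Qty value:

```python
# Hypothetical sample: the "Cup" row is missing its "Qty" value.
header = "Item    Qty   Colour"
row_full = "Pen     10    red"
row_missing = "Cup           blue"

# str.split() with no argument collapses every run of whitespace,
# so an empty field leaves no trace in the result.
print(header.split())       # ['Item', 'Qty', 'Colour']
print(row_full.split())     # ['Pen', '10', 'red']
print(row_missing.split())  # ['Cup', 'blue']
```

The missing value simply disappears, so "blue" would land under "Qty" and every later field shifts one column to the left. Only the character positions of the fields still tell you which column each value belongs to.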


緦唸λ蓇 2025-02-12 04:21:56

You are correct that exporting from text to CSV is not a practical starting point; however, a well-structured text file is a good route for importing. So here is your source text, restructured so that it is 100% well structured, to be saved as plain text.

[Image: the restructured source text]

And here is the import into Excel:

[Image: the text imported into Excel]
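The restructured text above is only shown as a screenshot, so its exact layout is unknown. As a hedged sketch, assuming the data has already been split into fields and that a tab-delimited layout is acceptable, Python's standard `csv` module can write a file that Excel's text import reads without ambiguity. The file name `report_clean.txt` and the rows are hypothetical:

```python
import csv

# Hypothetical rows already split into fields; None marks a missing value.
rows = [
    ["Item", "Qty", "Colour"],
    ["Pen", "10", "red"],
    ["Cup", None, "blue"],
]

# csv.writer writes None as an empty field, so the missing "Qty" becomes
# two adjacent tabs and every later value stays in its own column.
with open("report_clean.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(rows)
```

A tab is used instead of a space because it is unlikely to appear inside a value. When importing the file into Excel, choose Tab as the delimiter.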

℡Ms空城旧梦 2025-02-12 04:21:56

You can use Google Lens to extract the data from this picture, then copy and paste it into an Excel file. This is the easiest way.

Alternatively, first convert it into a PDF and then use Google Lens: go to File, scroll to the Print option, select Microsoft Print to PDF as the printer, and press Print. It will ask you for a location; choose one, then use the saved PDF.

岁月静好 2025-02-12 04:21:55

Read the first line of the file (the header) and split it into columns wherever there is more than one consecutive space. At the same time, record the position at which each column starts.
After that, go through the remaining data rows and extract each value by slicing the line from the start of its column to the start of the next one; a blank slice means the value is missing.

(And please don't post images of text on Stack Overflow; the actual text is better.)

EDIT:
Python implementation:

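import os
import re

import pandas as pd

file = "path/to/file.txt"

with open(file, "r") as f:
    header = f.readline().rstrip("\n")
    # A column name is a run of words separated by single spaces;
    # two or more spaces mark the boundary between columns.
    matches = list(re.finditer(r"\S+(?: \S+)*", header))
    columns = [m.group() for m in matches]
    column_starts = [m.start() for m in matches]
    # None as the last boundary makes the final slice run to the end of the line
    column_starts.append(None)

    # skip the "------" separator line under the header
    f.readline()

    rows = []
    for line in f:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        row = []
        for i in range(len(columns)):
            value = line[column_starts[i]:column_starts[i + 1]].strip()
            # a blank slice means the value is missing in the report
            row.append(value if value else None)
        rows.append(row)

df = pd.DataFrame(data=rows, columns=columns)

print(df)

# writing .xlsx requires an Excel writer engine such as openpyxl
df.to_excel(os.path.splitext(file)[0] + ".xlsx", index=False)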
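A different technique from the one in this answer, offered as an alternative: pandas provides `read_fwf` for fixed-width text. With its default `colspecs="infer"`, it guesses the column boundaries from where non-blank characters appear in the first rows and turns blank fields into NaN. This is a minimal sketch, assuming the report's columns are aligned and that the second line is a dashed separator; the sample text is hypothetical:

```python
import io

import pandas as pd

# Hypothetical aligned report; the "Qty" value is missing for "Cup".
text = (
    "Item    Qty   Colour\n"
    "------  ----  ------\n"
    "Pen     10    red\n"
    "Cup           blue\n"
)

# colspecs="infer" derives column edges from the non-blank character
# positions; skiprows=[1] drops the dashed separator line.
df = pd.read_fwf(io.StringIO(text), colspecs="infer", skiprows=[1])

print(df)
```

If a value ever spills into the gap between two columns, inference can merge those columns. In that case, pass explicit `colspecs` (a list of `(start, end)` pairs) instead. Writing the result with `df.to_excel(...)` still needs an Excel engine such as openpyxl.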