
I want to convert a report in text format into an XLSX document, but some column values are missing from the data in the text file

Posted on 2025-02-05 04:21:55

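Typical report data looks like this:

One simple approach I wanted to follow is to use spaces as the delimiter, but the data is not well structured enough for that.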
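
A minimal sketch of why splitting on spaces breaks down when a value is missing. The report itself was only posted as an image, so the column names and values below are hypothetical sample data, not the real report:

```python
# Hypothetical fixed-width report lines; the real report's columns are not known.
header = "ID    NAME      CITY      AMOUNT"
full = "1     Alice     Paris     10.50"
gappy = "2     Bob                 7.25"  # the CITY value is missing


def split_on_spaces(line):
    # Naive approach: treat any run of whitespace as one delimiter.
    return line.split()


full_fields = split_on_spaces(full)
gappy_fields = split_on_spaces(gappy)

print(full_fields)
print(gappy_fields)
```

The second row comes back with only three fields, so the amount would land in the CITY column. The gap left by the missing value is indistinguishable from ordinary column padding, which is why the values need to be located by character position instead.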


緦唸λ蓇 2025-02-12 04:21:56

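You are right that exporting from text to CSV is not a practical starting point, but it is useful for importing. So here is your 100% well-structured source text, which can be saved as plain text.

And here it is in Excel: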


℡Ms空城旧梦 2025-02-12 04:21:56

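You can use Google Lens to extract the data from this picture, then copy and paste it into an Excel file. That is the easiest way.

Alternatively, convert it to PDF first and then use Google Lens. Go to File, scroll down to Print, and in the print settings select the Microsoft Print to PDF option. Press Print, choose a save location when asked, and then use the resulting PDF.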


岁月静好 2025-02-12 04:21:55

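Read the first line of the file and split it into column names wherever there is more than one whitespace character. While doing that, record where each column starts, which also tells you how wide each column is.
After that, you can go through the remaining data rows and extract each value by slicing the line at those column positions.

(And please don't post images of text on Stack Overflow; actual text is better.)

EDIT:
Python implementation: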

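import re
from pathlib import Path

import pandas as pd

file = "path/to/file.txt"

with open(file, "r") as f:
    # Column names in the header line are separated by two or more spaces.
    header = f.readline().rstrip("\n")
    matches = list(re.finditer(r"\S+(?: \S+)*", header))
    columns = [m.group() for m in matches]

    # Each column spans from its own start offset to the next column's start;
    # the last column runs to the end of the line (end=None).
    starts = [m.start() for m in matches]
    bounds = list(zip(starts, starts[1:] + [None]))

    # Skip the "------" separator line under the header.
    f.readline()

    rows = []
    for line in f:
        line = line.rstrip("\n")
        # Slice by position so a missing value becomes an empty string
        # instead of shifting the remaining columns.
        rows.append([line[start:end].strip() for start, end in bounds])

df = pd.DataFrame(data=rows, columns=columns)

print(df)

# Write the workbook next to the input file (needs an Excel writer such as openpyxl).
df.to_excel(Path(file).with_suffix(".xlsx"))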
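
As a possible alternative to slicing by hand, pandas ships a fixed-width reader, `pandas.read_fwf`, that accepts explicit column widths. The sketch below assumes the column widths are known in advance; the widths, column names, and sample rows are all hypothetical, since the real report layout was never posted as text:

```python
import io

import pandas as pd

# Hypothetical column widths and sample data; the real report layout is unknown.
widths = [6, 10, 10, 6]


def fixed(*cells):
    # Pad each cell to its column width, the way a fixed-width report would.
    return "".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()


text = "\n".join([
    fixed("ID", "NAME", "CITY", "AMOUNT"),
    "-" * sum(widths),                      # separator line under the header
    fixed("1", "Alice", "Paris", "10.50"),
    fixed("2", "Bob", "", "7.25"),          # the CITY value is missing
])

# widths= tells read_fwf where each contiguous column ends;
# skiprows=[1] drops the dashed separator line.
df = pd.read_fwf(io.StringIO(text), widths=widths, skiprows=[1])
print(df)
```

With this approach the missing CITY value is read as NaN rather than shifting the amount into the wrong column, and the resulting DataFrame can be written out with `df.to_excel` in the same way as above.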

