How to read a .dat file from AWS S3 using mdfreader

Posted on 2025-01-29 20:23:25

I'm using Python 3.7 and trying to read a .dat file from AWS S3 and convert it into one or more CSV files based on certain logic. We're using the mdfreader library in Python.

import mdfreader
import pandas as pd

def convert_mdf_to_csvs(file_name, output_file_loc):
    # Parse the MDF (.dat) file from a local file path
    yop = mdfreader.Mdf(file_name)

    # Convert the channels into pandas DataFrames, stored under keys ending in "group"
    yop.convert_to_pandas()
    all_groups_keys = [key for key in yop.keys() if key.endswith("group")]
    for key in all_groups_keys:
        print(yop[key])
        # Name each CSV after the part of the key that precedes "group"
        timeframe = key.split("group")[0]
        yop[key].to_csv(str(output_file_loc) + str(timeframe) + ".csv")

The code above works fine on a local machine, but AWS S3 is object storage, so the read will have to go through boto3. Because the mdfreader documentation is sparse, I'm not sure how to pass that read stream into the "yop=mdfreader.Mdf(file_name)" call. The Mdf constructor seems to accept only a full file path. I know I can copy the object to Lambda's /tmp and read it from there, but that feels like a hack, so I'd rather not do it.
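For reference, the temp-file workaround mentioned above could look roughly like the following minimal sketch. It assumes a boto3 S3 client (whose download_file(Bucket, Key, Filename) method writes the object to a local path) and a reader that only accepts a file path. The helper name with_s3_object_as_local_file, the bucket, the key, and the output prefix are all hypothetical and chosen only for illustration.

```python
import os
import tempfile


def with_s3_object_as_local_file(s3_client, bucket, key, reader):
    """Download s3://bucket/key to a temporary file, call reader(path), then clean up.

    s3_client is assumed to expose download_file(bucket, key, filename),
    as a boto3 S3 client does. reader is any callable that takes a file path.
    """
    # Keep the original extension in case the reader inspects it
    suffix = os.path.splitext(key)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the client reopens the file itself
    try:
        s3_client.download_file(bucket, key, tmp_path)
        return reader(tmp_path)
    finally:
        # Remove the temporary copy even if the reader raises
        os.remove(tmp_path)


# Hypothetical usage inside a Lambda handler (names are placeholders):
#
#   import boto3
#   s3 = boto3.client("s3")
#   with_s3_object_as_local_file(
#       s3, "my-bucket", "logs/run1.dat",
#       lambda path: convert_mdf_to_csvs(path, "/tmp/out_"),
#   )
```

The S3 client is passed in rather than created inside the helper so the function can be exercised with a stub client, without touching AWS.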

I searched quite a bit on SO Q/A but couldn't find a clear answer for reading the .dat file type from AWS S3.

Also, is there a better way to solve this, maybe using the simple csv library or something else?

Any help?
