How do I upload an Excel workbook straight from a Jupyter notebook to Amazon S3?

Posted 2025-02-13 06:31:36

Aim

I have a few different dataframes created with pandas in Python (in a Jupyter notebook). I want to upload them as separate sheets of a single Excel workbook, straight to Amazon S3.

Reprex

import pandas as pd

## Creating two example dataframes

data1 = {'first_column':  ['first_value', 'second_value'],
         'second_column': ['first_value', 'second_value']}
df1 = pd.DataFrame(data1)

data2 = {'first_column':  ['xvalue', 'yvalue'],
         'second_column': ['xavalue', 'yavalue']}
df2 = pd.DataFrame(data2)

## Converting them into an Excel workbook and storing it locally

with pd.ExcelWriter('fake_file.xlsx') as writer:  
    df1.to_excel(writer, sheet_name='df1')
    df2.to_excel(writer, sheet_name='df2')

## Uploading the locally stored Excel Workbook onto S3

import boto3
import pathlib
import os

s3 = boto3.client("s3")
bucket_name = "my_bucket_name"
object_name = "final_fake.xlsx"
__file__ = "my_python_script.ipynb"
file_name = os.path.join(pathlib.Path(__file__).parent.resolve(), "fake_file.xlsx")

s3.upload_file(file_name, bucket_name, object_name)

Solution sought

How can I create an Excel workbook on S3 from my different dataframes, without first saving it to local storage from the Jupyter notebook?

Below, I was able to upload a dataframe straight to S3 as a CSV. How can I do the same, but send the dataframes as sheets of an Excel workbook?

## Sending one of my dataframes straight to S3 as a CSV

from io import StringIO 
import boto3

bucket = "my_bucket_name"
csv_buffer = StringIO()
df1.to_csv(csv_buffer)

s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df1.csv').put(Body=csv_buffer.getvalue())


Comments (1)

静若繁花 2025-02-20 06:31:36

You can use a BytesIO object to accomplish the same basic idea when saving to an xlsx file: save it to an in-memory buffer first, and then upload that data to S3:

import pandas as pd
import boto3
import io

## Creating two example dataframes
df1 = pd.DataFrame({'first_column': ['first_value','second_value'], 'second_column': ['first_value', 'second_value']})
df2 = pd.DataFrame({'first_column': ['xvalue', 'yvalue'], 'second_column': ['xavalue', 'yavalue']})

## Convert them into Excel Workbook in memory
with io.BytesIO() as xlsx_data:
    with pd.ExcelWriter(xlsx_data) as writer:  
        df1.to_excel(writer, sheet_name='df1')
        df2.to_excel(writer, sheet_name='df2')

    ## Upload the in-memory data to S3
    s3 = boto3.client("s3")
    bucket_name = "-example-"
    object_name = "final_fake.xlsx"
    s3.put_object(Bucket=bucket_name, Key=object_name, Body=xlsx_data.getvalue())
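
A small variation on the same idea, in case it is useful: instead of copying the buffer out with getvalue(), you can rewind it and stream it with boto3's upload_fileobj, which hands larger uploads to the S3 transfer manager automatically. This is a minimal sketch reusing the placeholder bucket and key names from the question:

import io

import boto3
import pandas as pd

## Same two example dataframes as above
df1 = pd.DataFrame({'first_column': ['first_value', 'second_value'],
                    'second_column': ['first_value', 'second_value']})
df2 = pd.DataFrame({'first_column': ['xvalue', 'yvalue'],
                    'second_column': ['xavalue', 'yavalue']})

with io.BytesIO() as xlsx_data:
    with pd.ExcelWriter(xlsx_data) as writer:
        df1.to_excel(writer, sheet_name='df1')
        df2.to_excel(writer, sheet_name='df2')

    ## Rewind to the start of the buffer so upload_fileobj reads the
    ## whole workbook, then stream it to S3 without an extra copy
    xlsx_data.seek(0)
    s3 = boto3.client("s3")
    s3.upload_fileobj(xlsx_data, "my_bucket_name", "final_fake.xlsx")

Note that writing .xlsx from pandas requires an Excel engine such as openpyxl or xlsxwriter to be installed, and that recent pandas versions (1.2+) with s3fs installed can also write straight to an "s3://bucket/key.xlsx" path, which skips the manual buffer handling entirely.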