MemoryError when using the read() method on a zipped shapefile from Amazon S3

Published 2025-01-20 00:44:58

I'm new to AWS and Shapefiles so bear with me.

I'm currently trying to load a shapefile from Amazon S3 into an RDS Postgres database. I'm struggling to read the file because it throws a memory error. What I've tried so far:

from io import StringIO, BytesIO
from zipfile import ZipFile
from urllib.request import urlopen

import shapefile
import geopandas as gpd
from shapely.geometry import shape  

import pandas as pd
import requests
import boto3
def lambda_handler(event,context):


    """Accessing the S3 buckets using boto3 client"""
    s3_client =boto3.client('s3')
    s3_bucket_name='myshapeawsbucket'
    s3 = boto3.resource('s3',aws_access_key_id="my_id", aws_secret_access_key="my_key")
        
    my_bucket=s3.Bucket(s3_bucket_name)
    bucket_list = []
    for file in my_bucket.objects.filter():
        print(file.key)
        bucket_list.append(file.key)
    for file in bucket_list:
        obj = s3.Object(s3_bucket_name,file)
        data=obj.get()['Body'].read()


    return {
        
        'message':"Success!"
    }

As soon as the code tries to execute obj.get()['Body'].read() I get the following error:

Response
{
  "errorMessage": "",
  "errorType": "MemoryError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 27, in lambda_handler\n    data=obj.get()['Body'].read()\n",
    "  File \"/var/runtime/botocore/response.py\", line 82, in read\n    chunk = self._raw_stream.read(amt)\n",
    "  File \"/opt/python/urllib3/response.py\", line 518, in read\n    data = self._fp.read() if not fp_closed else b\"\"\n",
    "  File \"/var/lang/lib/python3.8/http/client.py\", line 472, in read\n    s = self._safe_read(self.length)\n",
    "  File \"/var/lang/lib/python3.8/http/client.py\", line 613, in _safe_read\n    data = self.fp.read(amt)\n"
  ]
}

Function Logs
START RequestId: 7b70c331-ad7a-4964-b841-91da345b5174 Version: $LATEST
Roads_shp.zip
[ERROR] MemoryError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 27, in lambda_handler
    data=obj.get()['Body'].read()
  File "/var/runtime/botocore/response.py", line 82, in read
    chunk = self._raw_stream.read(amt)
  File "/opt/python/urllib3/response.py", line 518, in read
    data = self._fp.read() if not fp_closed else b""
  File "/var/lang/lib/python3.8/http/client.py", line 472, in read
    s = self._safe_read(self.length)
  File "/var/lang/lib/python3.8/http/client.py", line 613, in _safe_read
    data = self.fp.read(amt)END RequestId: 7b70c331-ad7a-4964-b841-91da345b5174
REPORT RequestId: 7b70c331-ad7a-4964-b841-91da345b5174  Duration: 3980.11 ms    Billed Duration: 3981 ms    Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 2334.01 ms

Request ID
7b70c331-ad7a-4964-b841-91da345b5174

I'm following this tutorial: Reading Shapefiles from a URL into GeoPandas

I have looked into similar questions but couldn't find an answer specific to shapefiles.

Links I looked at:

  1. MemoryError when Using the read() Method in Reading a Large Size of JSON file from Amazon S3
  2. Importing Large Size of Zipped JSON File from Amazon S3 into AWS RDS-PostgreSQL Using Python

Comments (1)

活雷疯 2025-01-27 00:44:58

I have fixed the problem myself. It has nothing to do with the logic; it is the RAM limit AWS sets for Lambda functions: https://aws.amazon.com/lambda/pricing/?icmpid=docs_console_unmapped. Increase the memory from the default 128 MB to whatever you need.
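For reference, the memory setting can also be raised from the command line; a sketch of the configuration change (the function name `my-shapefile-loader` is a placeholder for your actual function):

```shell
# Raise the Lambda's memory allocation from the 128 MB default to 1024 MB.
# Replace my-shapefile-loader with your function's actual name.
aws lambda update-function-configuration \
    --function-name my-shapefile-loader \
    --memory-size 1024
```

The REPORT line in the function logs above (`Memory Size: 128 MB  Max Memory Used: 128 MB`) shows the function was hitting its ceiling, so any value comfortably above the zipped file's size should avoid the MemoryError.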
