How can I delete or purge old files on S3?

Posted 2024-12-28 10:29:41

Are there existing solutions to delete any files older than x days?

6 Answers

耳钉梦 2025-01-04 10:29:42

Amazon has recently introduced Object Expiration.

Amazon S3 Announces Object Expiration

Amazon S3 announced a new feature, Object Expiration, that allows you to schedule the deletion of your objects after a pre-defined time period. Using Object Expiration to schedule periodic removal of objects eliminates the need for you to identify objects for deletion and submit delete requests to Amazon S3.

You can define Object Expiration rules for a set of objects in your bucket. Each Object Expiration rule allows you to specify a prefix and an expiration period in days. The prefix field (e.g. logs/) identifies the object(s) subject to the expiration rule, and the expiration period specifies the number of days from creation date (i.e. age) after which object(s) should be removed. Once the objects are past their expiration date, they will be queued for deletion. You will not be billed for storage for objects on or after their expiration date.
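
Object Expiration has since been folded into S3 lifecycle configuration. As a minimal boto3 sketch of the rule described above (the bucket name, rule ID, and values here are hypothetical, and credentials are assumed to come from the environment):

import boto3

s3 = boto3.client('s3')

# Expire objects under the logs/ prefix 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-old-logs',
                'Filter': {'Prefix': 'logs/'},  # objects subject to the rule
                'Status': 'Enabled',
                'Expiration': {'Days': 30},     # days from creation date
            }
        ]
    },
)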

太阳哥哥 2025-01-04 10:29:42

Here is some information on how to do it...

Lifecycle configuration elements

笑叹一世浮沉 2025-01-04 10:29:42

Here is how to implement it using a CloudFormation template:

  JenkinsArtifactsBucket:
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: !Sub "jenkins-artifacts"
      LifecycleConfiguration:
        Rules:
          - Id: "remove-old-artifacts"
            ExpirationInDays: 3
            NoncurrentVersionExpirationInDays: 3
            Status: Enabled

This creates a lifecycle rule as explained by Ravi Bhatt: ExpirationInDays removes current objects a set number of days after they are created, while NoncurrentVersionExpirationInDays cleans up superseded object versions in versioned buckets.

Read more on that: AWS::S3::Bucket Rule

How object lifecycle management works: Managing your storage lifecycle
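
If it helps, a template containing this resource (under its Resources section) could be deployed with the AWS CLI; the template file name and stack name here are hypothetical:

aws cloudformation deploy --template-file bucket.yaml --stack-name jenkins-artifacts-bucket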

半衾梦 2025-01-04 10:29:42

You can use AWS S3 lifecycle rules to expire the files and delete them. All you have to do is select the bucket, click on the "Add lifecycle rules" button, and configure it; AWS will take care of it for you.

You can refer to the blog post below from Joe for step-by-step instructions. It's actually quite simple:

Amazon S3 – How to delete files older than x days
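
After the rule is configured, a quick way to confirm it took effect is to read it back with boto3 (a sketch; the bucket name is hypothetical):

import boto3

s3 = boto3.client('s3')

# Print the lifecycle rules attached to the bucket; the call raises a
# ClientError (NoSuchLifecycleConfiguration) if no rules are configured.
response = s3.get_bucket_lifecycle_configuration(Bucket='my-bucket')
for rule in response['Rules']:
    print(rule['ID'], rule['Status'], rule.get('Expiration'))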

不喜欢何必死缠烂打 2025-01-04 10:29:42

Here is a Python script that deletes objects older than N days (it lists object versions, so it also cleans up noncurrent versions in versioned buckets):

from boto3 import client
from botocore.exceptions import ClientError
from datetime import datetime, timezone
import argparse

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    parser.add_argument('--access_key_id', required=True)
    parser.add_argument('--secret_access_key', required=True)
    parser.add_argument('--delete_after_retention_days', required=False, default=15)
    parser.add_argument('--bucket', required=True)
    parser.add_argument('--prefix', required=False, default="")
    parser.add_argument('--endpoint', required=True)

    args = parser.parse_args()

    access_key_id = args.access_key_id
    secret_access_key = args.secret_access_key
    delete_after_retention_days = int(args.delete_after_retention_days)
    bucket = args.bucket
    prefix = args.prefix
    endpoint = args.endpoint

    # Get current date
    today = datetime.now(timezone.utc)

    # Create an S3 client (endpoint_url allows S3-compatible stores such as Wasabi)
    s3_client = client(
        's3',
        endpoint_url=endpoint,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key)

    try:
        # List all the buckets under the account
        list_buckets = s3_client.list_buckets()
    except ClientError:
        # Invalid access keys
        raise Exception("Invalid Access or Secret key")

    # Create a paginator for all objects.
    object_response_paginator = s3_client.get_paginator('list_object_versions')
    if len(prefix) > 0:
        operation_parameters = {'Bucket': bucket,
                                'Prefix': prefix}
    else:
        operation_parameters = {'Bucket': bucket}

    # Instantiate temp variables.
    delete_list = []
    count_current = 0
    count_non_current = 0

    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):
        # Buckets that were never versioned may return no 'Versions' key
        for version in object_response_itr.get('Versions', []):
            if version['IsLatest']:
                count_current += 1
            else:
                count_non_current += 1
            if (today - version['LastModified']).days > delete_after_retention_days:
                delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

    # Print objects count
    print("-" * 20)
    print("$ Before deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)

    # Delete objects 1000 at a time
    print("$ Deleting objects from bucket " + bucket)
    for i in range(0, len(delete_list), 1000):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={
                'Objects': delete_list[i:i + 1000],
                'Quiet': True
            }
        )
        print(response)

    # Reset counts
    count_current = 0
    count_non_current = 0

    # Paginate and recount
    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):
        if 'Versions' in object_response_itr:
            for version in object_response_itr['Versions']:
                if version["IsLatest"] is True:
                    count_current += 1
                elif version["IsLatest"] is False:
                    count_non_current += 1

    # Print objects count
    print("-" * 20)
    print("$ After deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)
    print("$ task complete")

And here is how I run it:

python s3_cleanup.py --access_key_id="access-key" --secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="" --delete_after_retention_days=5

If you want to delete files only from a specific folder, then use the prefix parameter.
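
For example, to clean up only a hypothetical logs/ folder:

python s3_cleanup.py --access_key_id="access-key" --secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="logs/" --delete_after_retention_days=5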

纵山崖 2025-01-04 10:29:42

You can use the following PowerShell script to delete objects that are more than x days old.

[CmdletBinding()]
Param(
  [Parameter(Mandatory=$True)]
  [string]$BUCKET_NAME,             # Name of the Bucket

  [Parameter(Mandatory=$True)]
  [string]$OBJ_PATH,                # Key prefix of s3 object (directory path)

  [Parameter(Mandatory=$True)]
  [int]$EXPIRY_DAYS                 # Age threshold in days
)

$CURRENT_DATE = Get-Date
$OBJECTS = Get-S3Object $BUCKET_NAME -KeyPrefix $OBJ_PATH
Foreach($OBJ in $OBJECTS){
    IF($OBJ.key -ne $OBJ_PATH){
        IF(($CURRENT_DATE - $OBJ.LastModified).Days -gt $EXPIRY_DAYS){
            Write-Host "Deleting Object= " $OBJ.key
            Remove-S3Object -BucketName $BUCKET_NAME -Key $OBJ.Key -Force
        }
    }
}
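
A possible invocation, assuming the AWS Tools for PowerShell are installed and credentials are configured (the script file name and values here are hypothetical):

.\s3_cleanup.ps1 -BUCKET_NAME "my-bucket" -OBJ_PATH "logs/" -EXPIRY_DAYS 30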