Reading a file line by line in Amazon S3?

Posted on 2024-10-31 14:28:31

Is it possible to read a file line by line with Amazon S3? I'm looking to let people upload large files somewhere, then have some code (probably running on Amazon) read their file line by line and do something with it, probably in a map-reduce, multithreaded fashion. Or maybe just be able to load 1000 lines at a time... Any suggestions?

Comments (3)

流绪微梦 2024-11-07 14:28:31

Amazon S3 does support range requests, but it's not designed to read a file line by line.

However, it looks like Amazon Elastic MapReduce might be a good fit for what you are looking for. Transfers between S3 and the EC2 instances used will be very fast, and then you can divide up the work in any way you please.
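To make the range-request point concrete, here is a minimal Python sketch (not part of the original answer) of how lines could be reassembled from fixed-size byte ranges. The function name `iter_lines_by_range` and the `fetch_range` callable are hypothetical; the example uses an in-memory stand-in rather than a real bucket, so it runs without credentials.

```python
from typing import Callable, Iterator


def iter_lines_by_range(
    fetch_range: Callable[[int, int], bytes],
    size: int,
    chunk_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield lines one at a time, fetching the object chunk by chunk.

    fetch_range(start, end) is a hypothetical callable returning bytes
    start..end inclusive, mirroring the HTTP header "Range: bytes=start-end".
    """
    pending = b""  # tail of the previous chunk that did not end in a newline
    start = 0
    while start < size:
        end = min(start + chunk_size, size) - 1
        pending += fetch_range(start, end)
        start = end + 1
        # Everything before the last newline is complete; keep the rest.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline


# Stand-in for S3: an in-memory object served through the same interface.
# With boto3, fetch_range could instead wrap (assumption, not run here):
#   s3.get_object(Bucket=b, Key=k, Range=f"bytes={start}-{end}")["Body"].read()
data = b"alpha\nbeta\ngamma\ndelta"
fake_fetch = lambda start, end: data[start : end + 1]

lines = list(iter_lines_by_range(fake_fetch, len(data), chunk_size=4))
```

The caller has to stitch partial lines together across chunk boundaries, which is the sense in which S3 is not line-oriented: it only knows about byte offsets.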

岁月无声 2024-11-07 14:28:31

Here is a simple example, using PHP 7 and Laravel 5, of how to read a file line by line from Amazon S3:

S3StreamReader.php

<?php
declare(strict_types=1);

namespace App\Helpers\Json;

use App\Helpers\S3StreamFactory;
use Generator;
use SplFileObject;

final class S3StreamReader
{
    /**
     * @var \App\Helpers\S3StreamFactory
     */
    private $streamFactory;


    /**
     * @param \App\Helpers\S3StreamFactory $s3StreamFactory
     */
    public function __construct(S3StreamFactory $s3StreamFactory)
    {
        $this->streamFactory = $s3StreamFactory;
    }

    /**
     * @param string $filename
     * @return \Generator
     */
    public function get(string $filename): Generator
    {
        $file = new SplFileObject($this->streamFactory->create($filename), 'r');

        while (!$file->eof()) {
            yield $file->fgets();
        }
    }
}

S3StreamFactory.php

<?php
declare(strict_types=1);

namespace App\Helpers;

use League\Flysystem\AwsS3v3\AwsS3Adapter;

final class S3StreamFactory
{
    /**
     * @var \League\Flysystem\AwsS3v3\AwsS3Adapter
     */
    private $adapter;


    /**
     * @param \League\Flysystem\AwsS3v3\AwsS3Adapter $adapter
     */
    public function __construct(AwsS3Adapter $adapter)
    {
        $this->adapter = $adapter;
        $adapter->getClient()->registerStreamWrapper();
    }

    /**
     * @param string $filename
     * @return string
     */
    public function create(string $filename): string
    {
        return "s3://{$this->adapter->getBucket()}/{$filename}";
    }
}

Example of usage:

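use App\Helpers\Json\S3StreamReader;
use App\Helpers\S3StreamFactory;
use Illuminate\Support\Facades\Storage;

// The class defined above is S3StreamReader (not S3JsonReader).
$reader = new S3StreamReader(new S3StreamFactory(Storage::disk('s3')->getAdapter()));

// The generator reads one line at a time, so the whole file is never held in memory.
foreach ($reader->get($sourceFile) as $line) {
    // do something with the current line...
}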

Even if you don't use Laravel, you can still use this code, since Laravel just uses the league/flysystem-aws-s3-v3 package.

梦幻的味道 2024-11-07 14:28:31

Here's an example snippet in PHP that seems to do what you're asking (it grabs the first 1000 lines of file.txt and concatenates them). It's a bit contrived, but the idea can be implemented in other languages or with other techniques. The key is to treat S3 the same as you would any other file system, such as Windows or Linux; the only difference is that you use your S3 key credentials and set the file path to "s3://your_directory_tree/your_file.txt":

<?php
    set_time_limit(0);
    include("gs3.php");
    /* fake keys!, please put yours */
    define('S3_KEY', 'DA5S4D5A6S4D');
    define('S3_PRIVATE','adsadasd');

    $c = "";
    $d = 0;

    $handle = @fopen('s3://mydir/file.txt', "r");
    if ($handle) {
        /* check the count first so no extra line is read once the limit is hit */
        while ($d < 1000 && ($buffer = fgets($handle)) !== false) {
            $c .= $buffer; /* concatenate the string (newlines attached) */
            $d += 1; /* increment the count */
        }
        /* stopping early without reaching 1000 lines or EOF means fgets() failed */
        if ($d < 1000 && !feof($handle)) {
            echo "Error: unexpected fgets() fail\n";
        } else {
            print $c;
        }

        fclose($handle);
    }
?>
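
As the answer notes, the same idea carries over to other languages. For the "load 1000 lines at a time" part of the question, a minimal Python sketch (not from the original answers) could group any line iterator into fixed-size batches. The helper name `batched_lines` is hypothetical, and a generated sequence stands in for a streamed S3 object.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_lines(lines: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Group any line iterator into lists of at most batch_size lines."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


# Stand-in for a streamed S3 object: any iterable of lines works here.
sample = (f"row {i}\n" for i in range(2500))
sizes = [len(batch) for batch in batched_lines(sample, batch_size=1000)]
```

Because only one batch is materialized at a time, memory use is bounded by the batch size rather than by the file size, and each batch could be handed to a separate worker.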