Delay when running a sagemaker.sklearn.processing.SKLearnProcessor.run job in SageMaker

Posted on 2025-01-09 22:13:31


I use SageMaker's SKLearnProcessor.run to run my processing job. There is a delay of 4-5 minutes between the moment the processing job starts and the moment the first line of code in my processing.py script runs.
Once the script is running, the job finishes quickly no matter how large the input files are, which is what I expect from SageMaker Processing.

My question is: is there a way to reduce the time it takes before my processing.py script starts executing?

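```python
import os

from sagemaker.processing import ProcessingInput, ProcessingOutput

# sklearn_job is the SKLearnProcessor created earlier; bucket, code_prefix,
# input_prefix and output_prefix are S3 names defined elsewhere in the notebook.
sklearn_job.run(
    code=os.path.join('s3://', bucket, code_prefix, 'preprocessing_v2.py'),
    inputs=[
        ProcessingInput(
            input_name='raw1',
            source=os.path.join('s3://', bucket, input_prefix, 'file1.csv'),
            destination='/opt/ml/processing/input1'),
        ProcessingInput(
            input_name='raw2',
            source=os.path.join('s3://', bucket, input_prefix, 'file2.csv'),
            destination='/opt/ml/processing/input2'),
    ],
    outputs=[
        ProcessingOutput(
            output_name='sample_file',
            source='/opt/ml/processing/dataset',
            destination=os.path.join('s3://', bucket, output_prefix)),
    ],
    # Passed to the script as command-line arguments.
    arguments=["--train_size", "0.8", "--test_size", "0.2"],
    wait=True,  # block until the job finishes
    logs=True,  # stream the job's logs into this session
)
```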
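To see how the wait splits up on the SageMaker side, one option is to read the job's timestamps from the DescribeProcessingJob API (boto3's `describe_processing_job`). The sketch below assumes the response fields `CreationTime`, `ProcessingStartTime` and `ProcessingEndTime`; `startup_breakdown` is a hypothetical helper. Exactly which phases (instance provisioning, container image download, copying the S3 inputs) fall into each interval should be checked against the AWS documentation rather than inferred from this sketch.

```python
from typing import Any, Dict, Mapping


def startup_breakdown(desc: Mapping[str, Any]) -> Dict[str, float]:
    """Split a DescribeProcessingJob response into two intervals, in seconds.

    created_to_start_s: CreationTime -> ProcessingStartTime
    start_to_end_s:     ProcessingStartTime -> ProcessingEndTime
    Intervals whose timestamps are not present yet are omitted.
    """
    created = desc["CreationTime"]
    started = desc.get("ProcessingStartTime")
    ended = desc.get("ProcessingEndTime")
    out: Dict[str, float] = {}
    if started is not None:
        out["created_to_start_s"] = (started - created).total_seconds()
        if ended is not None:
            out["start_to_end_s"] = (ended - started).total_seconds()
    return out


# Usage (requires AWS credentials; not executed here):
#   import boto3
#   sm = boto3.client("sagemaker")
#   desc = sm.describe_processing_job(ProcessingJobName=sklearn_job.latest_job.job_name)
#   print(startup_breakdown(desc))
```

Comparing these intervals with the "script start" timestamp printed from inside the script shows whether the minutes are spent before the job is marked as started or between that point and the script's first line.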