Pig Execution

Published 2024-06-23 16:54:50

The previous chapter explained how to install Apache Pig. This chapter discusses how to run it.

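Apache Pig Execution Modes

You can run Apache Pig in two modes: local mode and MapReduce mode.

Local Mode

In this mode, all files are installed and run from your local host and local file system. Neither Hadoop nor HDFS is required. This mode is generally used for testing.

MapReduce Mode

MapReduce mode is where we use Apache Pig to load or process data stored in the Hadoop Distributed File System (HDFS). In this mode, whenever we execute Pig Latin statements to process the data, a MapReduce job is invoked in the back end to perform the operation on the data in HDFS.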
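On the command line, the choice between the two modes comes down to the value passed to the -x option. The following is a minimal shell sketch, not part of Pig itself: the helper name pig_command is hypothetical, and it only prints the command line instead of running Pig, so it can be tried without an installation. It accepts only the two modes covered in this chapter.

```shell
# Hypothetical helper: print the pig command line for a given execution mode.
# It does not start Pig; it only assembles and validates the arguments.
pig_command() {
  mode="$1"     # "local" or "mapreduce"
  script="$2"   # optional path to a .pig script
  case "$mode" in
    local|mapreduce) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  if [ -n "$script" ]; then
    echo "pig -x $mode $script"
  else
    echo "pig -x $mode"
  fi
}
```

For example, pig_command local prints the command that opens the Grunt shell in local mode, and pig_command mapreduce Sample_script.pig prints the command that runs a script against HDFS.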

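Apache Pig Execution Mechanisms

Apache Pig scripts can be executed in three ways: interactive mode, batch mode, and embedded mode.

Interactive mode (Grunt shell): You can run Apache Pig interactively using the Grunt shell. In this shell you can enter Pig Latin statements and get the output (using the Dump operator).
Batch mode (script): You can run Apache Pig in batch mode by writing the Pig Latin script in a single file with the .pig extension.
Embedded mode (UDF): Apache Pig lets us define our own functions (User Defined Functions) in programming languages such as Java and use them in our scripts.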

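Invoking the Grunt Shell

You can invoke the Grunt shell in the desired mode (local/MapReduce) using the -x option, as shown below.

Local Mode

The command for local mode is as follows: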

$ pig -x local
2021-01-11 15:09:08,420 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-01-11 15:09:08,690 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2021-01-11 15:09:08,690 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2021-01-11 15:09:08,825 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2021-01-11 15:09:08,825 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/jc2182/pig/pig_1610348948819.log
2021-01-11 15:09:08,939 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/jc2182/.pigbootup not found
2021-01-11 15:09:09,131 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2021-01-11 15:09:09,133 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2021-01-11 15:09:09,537 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-01-11 15:09:09,618 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-47ff99a2-5aab-497f-9966-0ffebd44f115
2021-01-11 15:09:09,618 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt>

MapReduce Mode

The command for MapReduce mode is as follows:

$ pig -x mapreduce
2021-01-11 15:11:00,724 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2021-01-11 15:11:00,726 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
2021-01-11 15:11:00,726 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2021-01-11 15:11:00,816 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2021-01-11 15:11:00,816 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/jc2182/pig/pig_1610349060803.log
2021-01-11 15:11:00,856 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/jc2182/.pigbootup not found
2021-01-11 15:11:01,150 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-01-11 15:11:01,179 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2021-01-11 15:11:01,179 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2021-01-11 15:11:02,161 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-424e89f8-dc16-49eb-89dc-5584cb0c47f7
2021-01-11 15:11:02,161 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
grunt>

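Note: MapReduce mode requires Hadoop to be running.

Both commands give you the Grunt shell prompt, as shown below.

grunt>

You can exit the Grunt shell with Ctrl + D.

After invoking the Grunt shell, you can execute Pig Latin statements by typing them directly into it.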

grunt> customers = LOAD 'customers.txt' USING PigStorage(',');

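Executing Apache Pig in Batch Mode

You can write an entire Pig Latin script in a file and execute it by passing the file to the pig command together with the -x option. Suppose we have a Pig script in a file named Sample_script.pig, as shown below.
Script file: Sample_script.pig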

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);
Dump student;

Now you can execute the script in the above file as shown below.

Local mode: $ pig -x local Sample_script.pig
MapReduce mode: $ pig -x mapreduce Sample_script.pig
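The batch workflow above can be sketched end to end in the shell. This is only an illustration under stated assumptions: the scratch directory created with mktemp -d and the RUN_PIG switch are hypothetical conveniences, not part of Pig, and the HDFS path is the one used in the sample script. By default the sketch only prints the command it would run; set RUN_PIG=yes on a machine where Pig and Hadoop are available to actually execute it.

```shell
# Write the sample script into a scratch directory
# (assumption: mktemp -d is available; any writable directory works).
workdir="$(mktemp -d)"
script="$workdir/Sample_script.pig"

cat > "$script" <<'EOF'
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') AS (id:int, name:chararray, city:chararray);
DUMP student;
EOF

# RUN_PIG is a hypothetical switch for this sketch: only run Pig when asked to.
if [ "${RUN_PIG:-no}" = "yes" ]; then
  pig -x mapreduce "$script"
else
  echo "would run: pig -x mapreduce $script"
fi
```

Because the script loads its input from an hdfs:// URI, MapReduce mode with a running Hadoop instance is the natural fit; for local mode the LOAD path would normally point at a file on the local file system instead.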

Note: We will discuss in detail how to run Pig scripts in batch mode and embedded mode in subsequent chapters.
