Unable to extract values from a map in Apache Pig

Posted on 2024-12-26 03:02:49

I have a simple relation, v, in Apache Pig:

dump v;

(151364,[ 'ref'#'R813','highway'#'secondary', 'name:ga'#'Lána Chairdif', 'name'#'Cardiff Lane'],(31015271, 31053762))
(151368,[ 'ref'#'N1', 'oneway'#'yes','designation'#'Buses Only', 'highway'#'trunk', 'motor_vehicle'#'designated', 'name:ga'#'Cearnóg Pharnell Thoir', 'maxspeed'#'30', 'name'#'Parnell Square East'],(389365, 540403072))
(151596,[ 'name:en'#'Liffey', 'boundary'#'administrative', 'name:ga'#'An Life','admin_level'#'8', 'name'#'Liffey', 'waterway'#'river'],(1347749, 1426049020, 1347745, 1426049019, 1347742, 900075612))
(367947,[ 'maxspeed'#'80', 'ref'#'L2223','highway'#'tertiary'],(13259933, 2384217, 335978958))
(367952,['created_by'#'YahooApplet 1.0', 'name'#'Charnwood Avenue', 'highway'#'residential'],(2384386, 25963471, 14949594, 2384385, 6146344, 2384254))
(508603,[ 'ref'#'L3018','highway'#'tertiary', 'maxspeed'#'50', 'name'#'Shelerin Road'],(2854184, 2854168, 335978984, 2853307, 2384254, 335978978, 335978975, 2655735, 2655703, 392675957, 11676198, 920037194, 244531387, 2655952, 11675077))
(727153,[ 'ref'#'N8','highway'#'trunk', 'name'#'Merchants' Quay'],(354153, 453344873))
(727157,['highway'#'unclassified', 'oneway'#'yes', 'maxspeed'#'30', 'name'#'Kyle Street'],(354168, 354167))
(727159,['highway'#'unclassified', 'oneway'#'yes', 'maxspeed'#'30', 'name'#'North Main Street'],(354178, 465226768, 354167, 413995429, 72219131, 685537307, 1232381779, 354164))
(727161,[ 'maxspeed'#'30','highway'#'pedestrian', 'name'#'Maylor Street'],(1486492976, 1515360721, 1515360722, 1515345383, 1515344226, 1515344227, 1515344228, 1515344231))

On @orangeoctopus's advice, I tried regenerating my data without any ' in the key names, and I now have this data:

(151364,[ ref#'R813', name:ga#'Lána Chairdif', name#'Cardiff Lane',highway#'secondary'],(31015271, 31053762))
(151368,[ motor_vehicle#'designated', name#'Parnell Square East', highway#'trunk', oneway#'yes',designation#'Buses Only', maxspeed#'30', name:ga#'Cearnóg Pharnell Thoir', ref#'N1'],(389365, 540403072))
(151596,[ name:en#'Liffey', boundary#'administrative', waterway#'river', name:ga#'An Life',admin_level#'8', name#'Liffey'],(1347749, 1426049020, 1347745, 1426049019, 1347742, 900075612))
(367947,[highway#'tertiary', maxspeed#'80', ref#'L2223'],(13259933, 2384217, 335978958))
(367952,[ name#'Charnwood Avenue',created_by#'YahooApplet 1.0', highway#'residential'],(2384386, 25963471, 14949594, 2384385, 6146344, 2384254))
(508603,[ maxspeed#'50', ref#'L3018', name#'Shelerin Road',highway#'tertiary'],(2854184, 2854168, 335978984, 2853307, 2384254, 335978978, 335978975, 2655735, 2655703, 392675957, 11676198, 920037194, 244531387, 2655952, 11675077))
(727153,[highway#'trunk', name#'Merchants' Quay', ref#'N8'],(354153, 453344873))
(727157,[ oneway#'yes', maxspeed#'30', name#'Kyle Street',highway#'unclassified'],(354168, 354167))
(727159,[ oneway#'yes', maxspeed#'30', name#'North Main Street',highway#'unclassified'],(354178, 465226768, 354167, 413995429, 72219131, 685537307, 1232381779, 354164))
(727161,[highway#'pedestrian', name#'Maylor Street', maxspeed#'30'],(1486492976, 1515360721, 1515360722, 1515345383, 1515344226, 1515344227, 1515344228, 1515344231))

In both cases v has the same schema/structure:

grunt> describe v;
2012-01-09 22:55:34,271 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
v: {id: int,tags: map[ ],nodes: (null)}

Then I try to extract out just one value from the tags map:

grunt> w = foreach v generate tags#'ref';    
dump w;

But it only gives me empty data, even though some elements have data here.

()
()
()
()
()
()
()
()
()
()

With the old 'quoted' keys I tried the following (as per @orangeoctopus's solution):

w = foreach v generate tags#'\'ref\''; 

That gave me the same 'empty' data, so it didn't work either. (I also tried other combinations of ' and ", like "'ref'"/'"ref"'/etc., but everything except '\'ref\'' was invalid Pig Latin syntax.)

What's going on? If I try to filter on a tag value (e.g. filter v by tags#'highway' != ''), I get nothing, which is consistent with the problem above of not being able to extract data from the map. Am I doing something wrong?

千と千尋 2025-01-02 03:02:49

Very tricky!

Your problem is that your literal data includes single quotes. Your string is not ref (3 characters long), it is 'ref' (5 characters long). I realized this because the dump of a map containing strings does not typically have the quotes there.

Therefore, your lookup key needs to include those quotes (you have to escape them with \):

grunt> w = foreach v generate tags#'\'ref\'';    

Your other option would be to change the way your data is loaded so that the single quotes are stripped out of the strings themselves. PigStorage doesn't do this for you, but you could use something like REPLACE or your own UDF to do it.
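
A rough sketch of the REPLACE route, under some assumptions: the file names data.txt and data_noquotes are made up, the input is assumed to be tab-separated, and the nodes column is simply read back as a chararray. Since REPLACE works on strings rather than maps, this cleans each raw line first and reloads the result with the map schema. Note that it is blunt: it also removes apostrophes inside values such as Merchants' Quay.

```pig
-- Hypothetical file names; data.txt stands in for the original input file.
-- Read each line as one plain string so that REPLACE can be applied to it.
raw = LOAD 'data.txt' USING TextLoader() AS (line:chararray);

-- REPLACE(string, regex, replacement): remove every single quote in the line.
-- The quote is escaped with \ only because it sits inside a Pig string literal.
clean = FOREACH raw GENERATE REPLACE(line, '\'', '') AS line;

STORE clean INTO 'data_noquotes';

-- Second pass: reload the cleaned data with the map schema
-- and look up the key without quotes.
v = LOAD 'data_noquotes' AS (id:int, tags:map[], nodes:chararray);
w = FOREACH v GENERATE tags#'ref';
DUMP w;
```

It is probably safest to run the STORE and the second LOAD as separate steps, so that the cleaned output exists before it is read back.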

这个俗人 2025-01-02 03:02:49

Are you loading the data correctly, too? It is odd that there is a space after the [ and before the ] when you dump your map.

Also, it is simpler to drop all the quotes from the keys and values in the input data. For example:

Input file

151364  [ref#R813,highway#secondary]

Pig

a = LOAD 'data.txt' AS (id:INT, m:MAP[]);
DUMP a;
b = FOREACH a GENERATE m#'ref';
DUMP b;

Output

(151364,[highway#secondary,ref#R813])

(R813)
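
On the filter mentioned in the question, here is a minimal sketch that assumes the same data.txt and schema as above (the relation name with_highway is made up). When a map lookup finds no matching key it returns null, and comparing null with '' gives null rather than true, so FILTER drops the row. That would be consistent with a filter returning nothing when the keys do not match. Testing for null explicitly makes the intent clearer:

```pig
a = LOAD 'data.txt' AS (id:INT, m:MAP[]);

-- A lookup on a missing key yields null, and null != '' is null,
-- so a filter written as m#'highway' != '' rejects that row.
-- Checking for null directly keeps only rows that carry the key.
with_highway = FILTER a BY m#'highway' IS NOT NULL;
DUMP with_highway;
```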