PySpark: get the index of an array element based on a substring
I have the following DataFrame, which contains a column of arrays (col1). I need to get the index of the element that contains a certain substring ("58=").
+-----------------------------------------------------------+-----+
|col1                                                       |a_pos|
+-----------------------------------------------------------+-----+
|[8=FIX.4.4, 55=ITUBD264, 58=AID[43e39b2e-c6e2-4947]        |    0|
+-----------------------------------------------------------+-----+
I've tried array_position(col1, "58="), but it seems to work only with exact matches, not substrings.
In pandas I'm doing exactly this with the following code:
df['idx'] = [max(range(len(l)), key=lambda x: '58=' in l[x]) for l in df['col1']]
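For reference, here is a plain-Python sketch of what that one-liner computes for a single row. The helper name first_index_containing is made up for illustration. One thing worth noting: the max(..., key=...) form returns 0 both when the first element matches and when nothing matches, so the sketch returns -1 for the no-match case instead.

```python
def first_index_containing(items, needle):
    """Return the 0-based index of the first element containing `needle`, or -1."""
    return next((i for i, s in enumerate(items) if needle in s), -1)


row = ["8=FIX.4.4", "55=ITUBD264", "58=AID[43e39b2e-c6e2-4947]"]

# Same result as the max(..., key=...) version when a match exists.
idx = first_index_containing(row, "58=")
via_max = max(range(len(row)), key=lambda x: "58=" in row[x])

# With no match, max() falls back to index 0, which is ambiguous.
no_match_via_max = max(range(len(row)), key=lambda x: "99=" in row[x])
no_match = first_index_containing(row, "99=")
```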
Check for the existence of 58 using the rlike function inside a higher-order function, then determine the position using array_position.
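A minimal sketch of how that approach might look, not necessarily the answerer's exact code. It assumes Spark 2.4 or later (for the SQL higher-order function transform); the helper substring_position_expr and the demo function are hypothetical names introduced here. The Spark imports sit inside demo so the expression itself can be inspected without a Spark runtime.

```python
def substring_position_expr(array_col: str, pattern: str) -> str:
    """Build a Spark SQL expression giving the 1-based position of the first
    element of `array_col` that matches the regex `pattern` (0 if none match).

    `pattern` is interpolated as-is, so it must not contain a single quote.
    """
    # transform(...) maps each element to a boolean via rlike;
    # array_position(..., true) then finds the first true in that boolean array.
    return f"array_position(transform({array_col}, x -> x rlike '{pattern}'), true)"


def demo():
    """Hypothetical end-to-end run; needs pyspark and a local Spark runtime."""
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [(["8=FIX.4.4", "55=ITUBD264", "58=AID[43e39b2e-c6e2-4947]"],)],
        ["col1"],
    )
    # '58=' is a regex here; '^58=' would avoid also matching a tag such as '158='.
    df = df.withColumn("a_pos", F.expr(substring_position_expr("col1", "58=")))
    df.show(truncate=False)
    spark.stop()
```

Call demo() where Spark is available. array_position is 1-based and returns 0 when nothing matches, so the sample row should come out as position 3; subtract 1 if you want it to line up with the 0-based pandas index.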