PySpark UDF to detect an "actor"

Posted 2025-01-29 17:52:00 · 1054 characters · 3 views · 0 comments

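I have a matrix (DataFrame), and I want to find every row where the row and column intersect at a '1', that is, where the row's 'character' value names a column and that column holds '1' in the same row.

Example: Sam is an actor. He has a '1' in the 'actor' column, and his row's 'character' value is 'actor', so this is a row I would want returned.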

df = spark.createDataFrame(
    [
        ("actor", "sam", "1", "0", "0", "0", "0"),  
        ("villan", "jack", "0", "0", "0", "0", "0"),
        ("actress", "rose", "0", "0", "0", "1", "0"),
        ("comedian", "mike", "0", "1", "1", "0", "1"),
        ("musician", "young", "1", "1", "1", "1", "0")
    ],
    ["character", "name", "actor", "villan", "comedian", "actress", "musician"]  
)
df.show()
+---------+-----+-----+------+--------+-------+--------+
|character| name|actor|villan|comedian|actress|musician|
+---------+-----+-----+------+--------+-------+--------+
|    actor|  sam|    1|     0|       0|      0|       0|
|   villan| jack|    0|     0|       0|      0|       0|
|  actress| rose|    0|     0|       0|      1|       0|
| comedian| mike|    0|     1|       1|      0|       1|
| musician|young|    1|     1|       1|      1|       0|
+---------+-----+-----+------+--------+-------+--------+

Comments (1)

如何视而不见 2025-02-05 17:52:00
from pyspark.sql import functions as f
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# create function: return the value of the column named by the row's "character"
def myMatch(needle, haystack):
    return haystack[needle]

# create udf
matched = udf(myMatch, StringType())  # your existing data is strings

# apply udf
df.select(
    df.name,
    matched(
        df.character,
        # shortcut to add all columns to a struct so it can be passed to the udf
        f.struct(*[df[col] for col in df.columns]),
    ).alias("IsPlayingCharacter"),
).show()
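To see what the lookup does per row, and how it would lead to the rows the question asks for, here is a minimal sketch in plain Python with no Spark involved. The dicts stand in for DataFrame rows, and the names `my_match`, `rows`, and `matching` are just for illustration; this is not part of the answer above.

```python
# Plain-Python stand-in for the DataFrame: one dict per row (illustration only).
columns = ["character", "name", "actor", "villan", "comedian", "actress", "musician"]
data = [
    ("actor", "sam", "1", "0", "0", "0", "0"),
    ("villan", "jack", "0", "0", "0", "0", "0"),
    ("actress", "rose", "0", "0", "0", "1", "0"),
    ("comedian", "mike", "0", "1", "1", "0", "1"),
    ("musician", "young", "1", "1", "1", "1", "0"),
]
rows = [dict(zip(columns, values)) for values in data]


def my_match(needle, haystack):
    """Return the value stored under the column named by `needle`."""
    return haystack[needle]


# Keep only the rows whose own "character" column holds a "1".
matching = [row["name"] for row in rows if my_match(row["character"], row) == "1"]
print(matching)  # ['sam', 'rose', 'mike']
```

In Spark the same idea would mean filtering on the UDF's result being "1" rather than only selecting it as a column.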