PySpark UDF to detect "actor"

Posted on 2025-01-29 17:52:00

I have a matrix (DataFrame) and want to find all rows where the row and column intersect at a '1', that is, rows whose 'character' value names a column that holds '1' in that same row.

Example: Sam is an actor. (His row has the 'character' value 'actor' and a '1' in the 'actor' column.) This is a row I want returned.

from pyspark.sql import SparkSession

# Reuse the active session if there is one, otherwise create it
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        ("actor", "sam", "1", "0", "0", "0", "0"),
        ("villan", "jack", "0", "0", "0", "0", "0"),
        ("actress", "rose", "0", "0", "0", "1", "0"),
        ("comedian", "mike", "0", "1", "1", "0", "1"),
        ("musician", "young", "1", "1", "1", "1", "0")
    ],
    ["character", "name", "actor", "villan", "comedian", "actress", "musician"]
)
df.show()
+---------+-----+-----+------+--------+-------+--------+
|character| name|actor|villan|comedian|actress|musician|
+---------+-----+-----+------+--------+-------+--------+
|    actor|  sam|    1|     0|       0|      0|       0|
|   villan| jack|    0|     0|       0|      0|       0|
|  actress| rose|    0|     0|       0|      1|       0|
| comedian| mike|    0|     1|       1|      0|       1|
| musician|young|    1|     1|       1|      1|       0|
+---------+-----+-----+------+--------+-------+--------+


Answer by 如何视而不见, posted 2025-02-05 17:52:00

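from pyspark.sql import functions as f
from pyspark.sql.types import StringType

# Create the function: return the struct field whose name equals the row's "character" value
def myMatch(needle, haystack):
    return haystack[needle]

# Create the UDF; your existing data is strings, so it returns StringType
matched = f.udf(myMatch, StringType())

# Apply the UDF. f.struct(...) is a shortcut that packs all columns into one
# struct so the whole row can be passed to the UDF.
df.select(
    df.name,
    matched(
        df.character,
        f.struct(*[df[col] for col in df.columns]),
    ).alias("IsPlayingCharacter"),
).show()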
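
To make the per-row lookup concrete, here is a minimal plain-Python sketch of what the UDF computes, using the question's sample data and no Spark at all. The names `my_match`, `rows`, `flags`, and `playing` are illustrative assumptions, not part of the answer; each row is modelled as a dict where Spark would pass a struct.

```python
# Plain-Python illustration of the lookup performed per row; no Spark required.
columns = ["character", "name", "actor", "villan", "comedian", "actress", "musician"]
data = [
    ("actor", "sam", "1", "0", "0", "0", "0"),
    ("villan", "jack", "0", "0", "0", "0", "0"),
    ("actress", "rose", "0", "0", "0", "1", "0"),
    ("comedian", "mike", "0", "1", "1", "0", "1"),
    ("musician", "young", "1", "1", "1", "1", "0"),
]


def my_match(needle, haystack):
    """Return the value stored under the key named by `needle`."""
    return haystack[needle]


# Model each row as a dict keyed by column name (stand-in for the Spark struct).
rows = [dict(zip(columns, values)) for values in data]

# Flag per person: the value in the column named by that row's "character".
flags = {row["name"]: my_match(row["character"], row) for row in rows}

# Keep only the rows where that intersection holds "1".
playing = [row["name"] for row in rows if my_match(row["character"], row) == "1"]

print(flags)    # {'sam': '1', 'jack': '0', 'rose': '1', 'mike': '1', 'young': '0'}
print(playing)  # ['sam', 'rose', 'mike']
```

The answer's `select` yields the flag for every row; to return only the matching rows, as the question asks, the same UDF expression could presumably be compared against "1" inside a `filter` on the DataFrame.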

