Why does every DF value change when using Spark regexp_replace()?

Posted 2025-02-12 13:29:16


I want to use regexp_replace() in PySpark to convert all question marks and backslashes in my DataFrame to null values. This is the code I used:

question = "?"
empty_str = "\\\"\\\""

for column in df.columns:
    df = df.withColumn(column, regexp_replace(column, question, None))
    df = df.withColumn(column, regexp_replace(column, empty_str, None))

However, when I use this code, all the values in my DataFrame turn into null values - not just the question marks and backslashes. Is there a way I can change my code to fix this?


Comments (2)

过气美图社 2025-02-19 13:29:16


With regexp_replace you cannot replace values with null; you will need another method, e.g. DataFrame.replace:

from pyspark.sql import functions as F
df = spark.createDataFrame([("?",), ("\\",), ("b",)], ["col_name"])
df.show()
# +--------+
# |col_name|
# +--------+
# |       ?|
# |       \|
# |       b|
# +--------+

pattern = r"^[?\\]+$"
df = df.withColumn("col_name", F.regexp_replace("col_name", pattern, "")) \
       .replace("", None, "col_name")
df.show()
# +--------+
# |col_name|
# +--------+
# |    null|
# |    null|
# |       b|
# +--------+

In your attempt, every value changed to null because you provided None as the replacement argument instead of a str. Only str is accepted, according to the documentation:

pyspark.sql.functions.regexp_replace(str: ColumnOrName, pattern: str, replacement: str) → pyspark.sql.column.Column
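Since Spark's regexp_replace uses Java regex, the anchored pattern r"^[?\\]+$" can be sanity-checked with Python's re module, whose escaping rules are the same for these characters. This is an illustrative stand-in for the Spark pipeline above (replace-with-empty, then map empty to None), not Spark code itself:

```python
import re

# Anchored character class: matches only strings made up entirely of '?' and '\'
pattern = re.compile(r"^[?\\]+$")

def scrub(value):
    """Mimic regexp_replace(col, pattern, '') followed by replace('', None)."""
    cleaned = pattern.sub("", value)
    return None if cleaned == "" else cleaned

print(scrub("?"))    # None
print(scrub("\\"))   # None
print(scrub("b"))    # b
print(scrub("a?b"))  # a?b  (not made up entirely of '?'/'\', so left untouched)
```

Note how the anchors ^ and $ keep mixed values like "a?b" intact; without them the pattern would also strip characters out of otherwise valid strings.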

请止步禁区 2025-02-19 13:29:16


It works like this: you have to escape the backslash as \\\\ and the question mark as \? in the pattern to strip them out (here the replacement is an empty string, not null):

>>> df.show(truncate=False)
+-------------------------------------------------+
|_c0                                              |
+-------------------------------------------------+
|"{""id"\":""e5?2f247c-f46c-4021-bc62-e28e56db1ad8|
+-------------------------------------------------+

>>> df.withColumn("_c0",regexp_replace('_c0','\\\\','')).withColumn("_c0",regexp_replace('_c0','\?','')).show(truncate=False)
+-----------------------------------------------+
|_c0                                            |
+-----------------------------------------------+
|"{""id"":""e52f247c-f46c-4021-bc62-e28e56db1ad8|
+-----------------------------------------------+