How to handle escape characters in PySpark: replacing an escape sequence with NULL

Posted 2025-01-12 00:58:43

I'm trying to replace an escape character with NULL in a PySpark DataFrame. The data in the DataFrame looks like this:

Col1|Col2|Col3 
1|\026\026|026|abcd026efg.

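Col2 contains garbage data, and I want to replace it with NULL. I tried the replace and regexp_replace functions to replace '\026' with a NULL value, but because of the escape character ("\") the data is not replaced.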

 replace(col2, "026",  'abcd') 
 replace(Col2, "\026",  'abcd')
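One possible reason the second attempt does not match, sketched under the assumption that the column holds a literal backslash followed by `026`: in an ordinary Python string literal, `"\026"` is an octal escape for a single control character (code point 22), not the four characters `\`, `0`, `2`, `6`. A raw string or a doubled backslash keeps the backslash. The variable names below are illustrative only.

```python
import re

octal_escape = "\026"   # octal escape: one control character
raw_literal = r"\026"   # raw string: backslash, '0', '2', '6'
doubled = "\\026"       # doubled backslash: the same four characters

cell = r"\026\026"      # assumed cell content with literal backslashes

# A regex needs the backslash escaped once more to match it literally;
# re.escape takes care of that here.
pattern = re.escape(raw_literal)
cleaned = re.sub(pattern, "", cell)

print(len(octal_escape), ord(octal_escape))  # 1 22
print(raw_literal == doubled)                # True
print(octal_escape in cell)                  # False
print(repr(cleaned))                         # ''
```

If the column actually contains the control character rather than a literal backslash, the opposite holds and the plain `"\026"` literal is the one that matches.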

Finally, I want my data to look like this:

Col1|Col2|Col3 
1|NULL|026|abcd026efg.

Any thoughts on resolving this would be highly appreciated.

Thanks
-EVR



哆兒滾 2025-01-19 00:58:43

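Use regexp_replace to match each run of digits together with the non-digit character that precedes it:

 import pyspark.sql.functions as F
 # Raw string so the backslashes reach the regex engine unchanged
 df.withColumn('col2', F.regexp_replace('col2', r'\D\d+', None)).show()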

+----+----+-----------+
|col1|col2|       col3|
+----+----+-----------+
|   1|null|abcd026efg.|
+----+----+-----------+
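
If the goal is to set only the garbage cells to NULL and leave other values alone, a conditional expression may be a more explicit alternative to passing None as the replacement. The sketch below assumes col2 holds a literal backslash followed by `026`, repeated one or more times. The helper name `null_out_garbage` and the constant `GARBAGE_PATTERN` are hypothetical, and the pattern would need adjusting if the column contains the control character instead.

```python
import re

# Assumed garbage format: one or more literal "\026" groups and nothing else.
GARBAGE_PATTERN = r"^(\\026)+$"


def null_out_garbage(df, column="col2"):
    """Return df with `column` set to NULL wherever it matches GARBAGE_PATTERN."""
    # Imported here so the pattern can be checked without a Spark installation.
    from pyspark.sql import functions as F

    return df.withColumn(
        column,
        F.when(F.col(column).rlike(GARBAGE_PATTERN), F.lit(None))
        .otherwise(F.col(column)),
    )


# Quick check of the pattern with Python's re module (no Spark needed).
print(bool(re.match(GARBAGE_PATTERN, r"\026\026")))   # True
print(bool(re.match(GARBAGE_PATTERN, "abcd026efg.")))  # False
```

With a DataFrame `df` shaped like the one in the question, the call would be `null_out_garbage(df).show()`. No output is shown here because it depends on how the data was loaded.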

