How to remove spaces and dots and convert to lowercase
I have a PySpark DataFrame with names like:
N. Plainfield
North Plainfield
West Home Land
NEWYORK
newyork
So. Plainfield
S. Plainfield
Some of them contain dots and spaces between initials, and some do not. How can they be converted to:
n plainfield
north plainfield
west homeland
newyork
newyork
so plainfield
s plainfield
(that is, lowercase, with no dots, and with exactly one space between the initial and the name)
I tried using the following, but it only replaces dots and doesn't remove spaces between initials:
names_modified = names.withColumn("name_clean", regexp_replace("name", r"\.",""))
After removing the whitespace and dots, is there any way to get the distinct values, like this?
north plainfield
west homeland
newyork
so plainfield
Comments (1)
I think you should split this into separate steps:
convert the names from uppercase to lowercase
replace the dots using the regexp_replace function
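The steps above could be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: the helper names clean_name and add_clean_column are hypothetical, and the column names "name" and "name_clean" are taken from the question. It lowercases the value, removes dots, collapses runs of whitespace into a single space, and trims the ends. It does not merge separate words, so "West Home Land" would become "west home land" rather than "west homeland"; that case would need an extra rule of its own.

```python
import re


def clean_name(name: str) -> str:
    """Plain-Python reference (hypothetical helper): lowercase, drop dots,
    collapse whitespace runs to one space, strip the ends."""
    lowered = name.lower()
    no_dots = re.sub(r"\.", "", lowered)
    return re.sub(r"\s+", " ", no_dots).strip()


def add_clean_column(df, src="name", dst="name_clean"):
    """Apply the same steps to a PySpark DataFrame column (hypothetical helper).

    pyspark is imported inside the function so that clean_name() above can be
    tried without a Spark installation.
    """
    from pyspark.sql import functions as F

    lowered = F.lower(F.col(src))                           # step 1: lowercase
    no_dots = F.regexp_replace(lowered, r"\.", "")          # step 2: remove dots
    single_spaced = F.regexp_replace(no_dots, r"\s+", " ")  # step 3: collapse spaces
    return df.withColumn(dst, F.trim(single_spaced))
```

With the names DataFrame from the question, something like add_clean_column(names).select("name_clean").distinct() should then return each cleaned value once.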