How to trim leading and trailing whitespace?
I am having some trouble with leading and trailing white space in a data.frame.
For example, I look at a specific row in a data.frame based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper country dummyLI dummyLMI dummyUMI
[6] dummyHInonOECD dummyHIOECD dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn't get the expected output, since the country Austria obviously exists in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
codeHelper country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18 AUT Austria 0 0 0 0 1
dummyOECD
18 1
The only change I made to the command was an additional space after Austria.
Further annoying problems obviously arise, for example when I want to merge two data frames based on the country column. One data.frame uses "Austria " while the other has "Austria", so the matching doesn't work.
- Is there a nice way to 'show' the white space on my screen so that I am aware of the problem?
- And can I remove the leading and trailing white space in R?
So far I have been using a simple Perl script to remove the white space, but it would be nice if I could do it inside R.
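A minimal reproduction of the merge problem, assuming two small hypothetical data frames (df_a, df_b and their columns are made up for illustration). The keys print alike, but nchar() exposes the extra character and merge() silently drops the unmatched row:

```r
# Two hypothetical data frames whose keys differ only by a trailing space
df_a <- data.frame(country = c("Austria ", "Spain"), gdp = c(1, 2),
                   stringsAsFactors = FALSE)
df_b <- data.frame(country = c("Austria", "Spain"), oecd = c(1, 1),
                   stringsAsFactors = FALSE)

# The values look the same on screen, but their lengths differ
nchar(df_a$country)  # 8 5
nchar(df_b$country)  # 7 5

# merge() compares keys exactly, so only "Spain" survives the join
merged <- merge(df_a, df_b, by = "country")
print(merged)
```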
As of R 3.2.0, a new function, trimws(), was introduced for removing leading/trailing white space.
See: Remove Leading/Trailing Whitespace
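A short sketch of trimws() on a made-up vector. The which argument selects the side to trim. The default whitespace class is "[ \t\r\n]", so characters such as the non-breaking space (U+00A0) are not removed by default; the whitespace argument (added in R 3.6.0) lets you widen that class:

```r
x <- c("  Austria ", "\tSpain\n", "USA")

# Trim both ends (default), or only one side via 'which'
trimws(x)                   # "Austria" "Spain" "USA"
trimws(x, which = "left")   # "Austria " "Spain\n" "USA"
trimws(x, which = "right")  # "  Austria" "\tSpain" "USA"

# A non-breaking space is outside the default "[ \t\r\n]" class ...
nbsp <- "\u00a0Austria\u00a0"
trimws(nbsp) == nbsp                   # TRUE: nothing was removed
# ... but a Perl-style class covering horizontal/vertical space catches it
trimws(nbsp, whitespace = "[\\h\\v]")  # "Austria"
```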
Probably the best way is to handle the leading and trailing white space when you read your data file. If you use read.csv or read.table, you can set the parameter strip.white = TRUE. If you want to clean strings afterwards, you could use a trimming function and apply it to myDummy$country.
To 'show' the white space, print the strings surrounded by quotation marks ("), which makes white space easier to spot.
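A minimal sketch of both ideas, assuming a small made-up CSV passed through read.csv()'s text argument. Note that strip.white only affects unquoted character fields. The quote-wrapping uses sprintf() and is just one way to make the padding visible:

```r
# Hypothetical CSV text with padded, unquoted country names
csv <- "code,country\nAUT,Austria \nESP,  Spain"

raw   <- read.csv(text = csv, stringsAsFactors = FALSE)
clean <- read.csv(text = csv, stringsAsFactors = FALSE, strip.white = TRUE)

# Wrap each value in quotes so leading/trailing spaces become visible
cat(sprintf('"%s"', raw$country), sep = "\n")    # prints "Austria " and "  Spain"
cat(sprintf('"%s"', clean$country), sep = "\n")  # prints "Austria" and "Spain"
```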
To manipulate the white space, use str_trim() in the stringr package.
The package has a manual dated Feb 15, 2013 and is on CRAN.
The function is vectorized, so it also works on character vectors.
(Credit goes to commenter: R. Cotton)
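A small sketch of str_trim() and its side argument on a made-up vector. It is wrapped in requireNamespace() so it only runs when stringr is installed:

```r
# Runs only if the stringr package is available
if (requireNamespace("stringr", quietly = TRUE)) {
  x <- c("  Austria ", "Spain  ", "  USA")
  both  <- stringr::str_trim(x)                  # "Austria" "Spain" "USA"
  left  <- stringr::str_trim(x, side = "left")   # "Austria " "Spain  " "USA"
  right <- stringr::str_trim(x, side = "right")  # "  Austria" "Spain" "  USA"
  print(both)
}
```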
A simple function to remove leading and trailing whitespace:
Usage:
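A minimal sketch of such a helper, assuming a regex-based approach (the name trim and the sample data are hypothetical). A single pattern anchored at both ends is used, which is why gsub() rather than sub() is needed: sub() would replace only the first match:

```r
# Hypothetical helper: strip leading and trailing white space in one pass.
# \\s+ also covers tabs and newlines.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

countries <- c("Austria ", "  Spain", " USA ")
trim(countries)  # "Austria" "Spain" "USA"

# Applied to a (hypothetical) data frame column
df <- data.frame(country = c("Austria ", "Austria"), stringsAsFactors = FALSE)
df$country <- trim(df$country)
unique(df$country)  # "Austria"
```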
Ad 1) To see white spaces you could directly call print.data.frame with modified arguments. See also ?print.data.frame for other options.
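A sketch of the print.data.frame() route, assuming the quote argument is what the answer had in mind (the data frame is hypothetical). Printing the distinct values of the character column is an additional, more direct check, because print() on a character vector always wraps each element in quotes:

```r
# Hypothetical data frame mirroring the question
df <- data.frame(country = c("Austria ", "Austria"), dummyOECD = c(1, 1),
                 stringsAsFactors = FALSE)

# quote = TRUE asks print.data.frame() to quote the entries
print(df, quote = TRUE)

# Distinct values of the column, each shown in quotes,
# so "Austria " and "Austria" can be told apart
print(unique(df$country))
```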
Use grep or grepl to find observations with white spaces and sub to get rid of them.
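A minimal sketch of the grep/grepl/sub approach on a made-up vector. grepl() flags the affected values, grep() returns their positions, and because sub() replaces only the first match, it is called once for each end:

```r
x <- c("Austria ", "Spain", " USA", "France")

# grepl(): TRUE for values with leading or trailing white space
has_ws <- grepl("^\\s|\\s$", x)
x[has_ws]                 # "Austria " " USA"

# grep(): positions of the same values
grep("^\\s|\\s$", x)      # 1 3

# sub() removes one match per call: first the leading, then the trailing run
cleaned <- sub("\\s+$", "", sub("^\\s+", "", x))
cleaned                   # "Austria" "Spain" "USA" "France"
```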
Removing leading and trailing blanks might be achieved through the trim() function from the gdata package as well:
Usage example:
I'd prefer to add this answer as a comment on user56's, but I can't yet, so I'm writing it as an independent answer.
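A small usage sketch for gdata::trim() on a character vector, guarded so it only runs when gdata is installed. The example sticks to plain spaces; whether tabs or newlines are also stripped depends on the package's implementation, so check ?gdata::trim:

```r
# Runs only if the gdata package is available
if (requireNamespace("gdata", quietly = TRUE)) {
  x <- c("  Austria ", "Spain  ")
  trimmed <- gdata::trim(x)
  print(trimmed)  # "Austria" "Spain"
}
```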
Another option is to use the stri_trim function from the stringi package, which defaults to removing leading and trailing whitespace. For only removing leading whitespace, use stri_trim_left. For only removing trailing whitespace, use stri_trim_right. When you want to remove other leading or trailing characters, use pattern = to specify the class of characters to keep; everything outside that class is trimmed from the ends. See also ?stri_trim for more info.
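A sketch of the stringi variants on made-up strings, guarded so it only runs when stringi is installed. The last call illustrates pattern: "\\p{L}" (any letter) is the class to keep, so the hyphens at both ends are dropped:

```r
# Runs only if the stringi package is available
if (requireNamespace("stringi", quietly = TRUE)) {
  x <- c("  Austria ", "\tSpain\n")
  both  <- stringi::stri_trim(x)        # "Austria" "Spain"
  left  <- stringi::stri_trim_left(x)   # "Austria " "Spain\n"
  right <- stringi::stri_trim_right(x)  # "  Austria" "\tSpain"

  # 'pattern' names the characters to KEEP at the boundaries
  dashes <- stringi::stri_trim("--Austria--", pattern = "\\p{L}")  # "Austria"
}
```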
I created a trim.strings() function to trim leading and/or trailing whitespace.
For illustration:
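The function body itself is not reproduced here, so the following is only a hypothetical sketch of what a trim.strings() with a side selector could look like. The side argument, its values, and the regexes are assumptions:

```r
# Hypothetical sketch: 'side' chooses which end(s) to trim
trim.strings <- function(x, side = c("both", "leading", "trailing")) {
  side <- match.arg(side)
  if (side %in% c("both", "leading"))  x <- sub("^\\s+", "", x)
  if (side %in% c("both", "trailing")) x <- sub("\\s+$", "", x)
  x
}

s <- c("  Austria ", " Spain")
trim.strings(s)                     # "Austria" "Spain"
trim.strings(s, side = "leading")   # "Austria " "Spain"
trim.strings(s, side = "trailing")  # "  Austria" " Spain"
```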
Another related problem occurs if you have multiple spaces in between inputs.
You can then easily split such a string into "real" tokens by passing a regular expression to the split argument of strsplit(). Note that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
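A short sketch on a made-up string showing both effects described above: the leading match produces an empty first token, while the trailing match produces nothing. Trimming first avoids the empty token:

```r
a <- "  a b   c d "

# One or more spaces act as a single separator
strsplit(a, split = " +")[[1]]          # "" "a" "b" "c" "d"

# Trim first so no empty leading token is produced
strsplit(trimws(a), split = " +")[[1]]  # "a" "b" "c" "d"
```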
Use dplyr/tidyverse mutate_all with str_trim to trim the entire data frame.
Created on 2021-05-07 by the reprex package (v0.3.0)
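A hedged sketch of the same idea on a made-up data frame, guarded so it only runs when dplyr and stringr are installed. One caution: mutate_all() applies str_trim() to every column, and str_trim() returns character, so numeric columns would be converted. The sketch therefore uses mutate_if() to restrict trimming to character columns. Both verbs are superseded by across() in dplyr 1.0.0 and later, but they still work:

```r
# Runs only if both dplyr and stringr are available
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("stringr", quietly = TRUE)) {
  df <- data.frame(country = c("Austria ", " Spain"), gdp = c(1.5, 2),
                   stringsAsFactors = FALSE)

  # Trim only character columns, so 'gdp' stays numeric
  trimmed <- dplyr::mutate_if(df, is.character, stringr::str_trim)
  str(trimmed)
}
```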
The best method is trimws().
The following code will apply this function to the entire dataframe.
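A base-R sketch of applying trimws() to a whole (hypothetical) data frame. Assigning into df[] keeps the data.frame structure, and non-character columns are passed through unchanged. Note that trimws() on a factor column would return a character vector, which is why only character columns are touched here:

```r
df <- data.frame(country = c("Austria ", " Spain"), code = c("AUT", " ESP "),
                 gdp = c(1.5, 2), stringsAsFactors = FALSE)

# Trim character columns only; other columns are returned as-is
df[] <- lapply(df, function(col) if (is.character(col)) trimws(col) else col)
str(df)
```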
After this, you'll need to force R not to recognize "Austria " as a level. Let's pretend you also have "USA" and "Spain" as levels.
It is a little less intimidating than the highest-voted response, but it should still work.
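A minimal sketch of this step, assuming country is a factor that still carries the padded level (the data are made up). The factor is rebuilt from the trimmed values with an explicit level set, so "Austria " no longer exists as a level:

```r
country <- factor(c("Austria ", "USA", "Spain", "Austria"))
levels(country)   # four levels: "Austria" and "Austria " are distinct

# Rebuild from trimmed values (trimws() returns character) with explicit levels
country <- factor(trimws(country), levels = c("Austria", "USA", "Spain"))
levels(country)   # "Austria" "USA" "Spain"
table(country)    # Austria 2, USA 1, Spain 1
```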
Benchmarking of the main approaches in this thread. This does not capture all edge cases, but so far we still lack an example where str_trim removes whitespace and trimws doesn't (see Richard Telford's comment on this answer). It doesn't seem to matter much: the gsub option seems to be the fastest :)
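The benchmark code and numbers are not reproduced here. The following is only a rough base-R timing sketch under stated assumptions: made-up data, a gsub() helper versus trimws(), and single runs with system.time(). Timings vary by machine and R version, so no results are claimed. A dedicated benchmarking package with repeated runs gives more reliable comparisons:

```r
set.seed(1)
# Hypothetical data: 100,000 copies of "Austria" padded with 0-3 spaces
x <- paste0(strrep(" ", sample(0:3, 1e5, TRUE)), "Austria",
            strrep(" ", sample(0:3, 1e5, TRUE)))

gsub_trim <- function(s) gsub("^\\s+|\\s+$", "", s)

# Single-run elapsed times in seconds
t_gsub   <- system.time(r1 <- gsub_trim(x))
t_trimws <- system.time(r2 <- trimws(x))
print(c(gsub = t_gsub[["elapsed"]], trimws = t_trimws[["elapsed"]]))

# Sanity check: both approaches give the same result on this data
stopifnot(identical(r1, r2))
```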
I tried trim(). It works well with white spaces as well as the '\n'.