R: Plotting the frequency of strings that match a certain pattern
I have a data frame with a column that contains strings, and I would like to plot the frequency of the strings that match a certain pattern. For example:
strings <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
df <- as.data.frame(strings)
df
     strings
1       abcd
2       defd
3    hfjfjcd
4 kgjgcdjrye
5   yryriiir
6  twtettecd
I would like to plot the frequency of the strings that contain the pattern `"cd"`.
Does anyone have a quick solution?
I presume from your question that you meant to have some entries that appear more than once, so I've added one duplicate string.
To find only those strings that contain a specific pattern, use `grep` or `grepl`. To get a table of frequencies, you can use `table`, and you can plot the result using `plot` or `barplot`.
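The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the steps it describes, not the answerer's exact code. It assumes the duplicate entry is a second `"abcd"` and that the pattern should be matched literally (hence `fixed = TRUE`); the variable names `matches` and `freq` are illustrative.

```r
# Sample data from the question, plus one assumed duplicate ("abcd")
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd", "abcd")
df <- data.frame(strings = strings, stringsAsFactors = FALSE)

# grepl() returns a logical vector: TRUE where the string contains "cd"
matches <- df$strings[grepl("cd", df$strings, fixed = TRUE)]

# table() counts how often each matching string occurs
freq <- table(matches)

# Draw one bar per distinct matching string
barplot(freq, las = 2, ylab = "Frequency")
```

Using `grep("cd", df$strings, value = TRUE)` instead of the `grepl` subset should give the same vector of matching strings.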
Others have already mentioned `grepl`. Here is an alternative that uses `grep` to get the positions of the matches and `plot.density` to plot them.
If you don't like the way the density plot extends beyond the range of the data, there are other methods in the 'logspline' package that give a sharp border at the range extremes. Searching with `RSiteSearch` may turn up further options.
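The answer's own code is missing from this page. As a hedged sketch of the density approach it describes, the following uses `grep` to find the row positions of the matches and passes them to `density`; the names `pos` and `dens` and the plot title are assumptions made for illustration.

```r
# Sample data from the question
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")

# grep() returns the indices of the elements that contain "cd"
pos <- grep("cd", strings, fixed = TRUE)

# Kernel density estimate of where in the vector the matches occur
dens <- density(pos)

# plot() dispatches to plot.density for objects of class "density"
plot(dens, main = "Density of match positions for \"cd\"")
rug(pos)  # mark the actual match positions along the x-axis
```

With only a handful of matches the curve is very smooth and spills past the first and last positions, which is the extension beyond the range that the answer mentions.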
Check the "kernlab" package.
You can define a kernel (pattern), which could be any kind of string, and count the matches later on.