R: plotting the frequency of strings that contain a certain pattern

Posted on 2024-11-17 00:27:47 · Length: 316 characters · Views: 6 · Comments: 0

I have a data frame with a column that contains strings, and I would like to plot the frequency of the strings that contain a certain pattern. For example:

strings  <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
df <- as.data.frame(strings)
df
     strings
1       abcd
2       defd
3    hfjfjcd
4 kgjgcdjrye
5   yryriiir
6  twtettecd

I would like to plot the frequency of the strings that contain the pattern "cd".
Does anyone have a quick solution?

Comments (3)

情仇皆在手 2024-11-24 00:27:47

I presume from your question that you meant to have some entries that appear more than once, so I've added one duplicate string:

x <- c("abcd","abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")

To find only those strings that contain a specific pattern, use grep or grepl:

y <- x[grepl("cd", x)]

To get a table of frequencies, you can use table:

table(y)

y
      abcd    hfjfjcd kgjgcdjrye  twtettecd 
         2          1          1          1 

And you can plot it using plot or barplot as follows:

barplot(table(y))

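Applied to the data frame from the question, the same idea can be sketched as follows. This is only a minimal sketch, not the answerer's code: the names `has_cd` and `counts` are made up for illustration, and it counts matching versus non-matching rows instead of tabulating each distinct string.

```r
# Rebuild the data frame from the question.
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")
df <- data.frame(strings = strings, stringsAsFactors = FALSE)

# TRUE where the string contains the literal text "cd".
has_cd <- grepl("cd", df$strings, fixed = TRUE)

# Count matches vs. non-matches; the explicit factor levels keep both
# bars in the plot even if one of the counts is zero.
counts <- table(factor(has_cd, levels = c(FALSE, TRUE),
                       labels = c("no cd", "contains cd")))

barplot(counts, ylab = "Number of strings")
```

Passing `fixed = TRUE` treats "cd" as plain text; drop it if the pattern is meant to be a regular expression.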

柒七 2024-11-24 00:27:47

Others have already mentioned grepl. Here is an implementation with plot.density, using grepl to mark which positions match:

plot( density(0+grepl("cd", strings)) )

If you don't like the density plot extending beyond the range of the data, there are other methods in the 'logspline' package that give sharp borders at the range extremes. Searching RSiteSearch
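If installing another package is not an option, one rough base-R workaround (not the logspline approach mentioned above) is to evaluate the density only on the interval [0, 1] through the `from` and `to` arguments of `density()`. A minimal sketch, where `hits` and `d` are illustrative names; it only trims the plotted curve and does not renormalise the estimate, so it is not a true boundary correction:

```r
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")

# 1 where the string contains "cd", 0 otherwise.
hits <- 0 + grepl("cd", strings)

# Evaluate the kernel density estimate only between 0 and 1.
d <- density(hits, from = 0, to = 1)

plot(d, main = "Density of the match indicator")
```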

稚然 2024-11-24 00:27:47

Check the "kernlab" package.
You can define a kernel (pattern), which could be any kind of string, and count the matches later on.
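For a plain substring such as "cd", counting can also be sketched in base R without kernlab. This swaps in `gregexpr()` and `regmatches()` in place of a string kernel, so it illustrates the counting step only and is not the kernlab API; `n_cd` is an illustrative name.

```r
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")

# gregexpr() locates every occurrence of "cd" in each string;
# regmatches() extracts them, and lengths() counts them per string.
n_cd <- lengths(regmatches(strings, gregexpr("cd", strings, fixed = TRUE)))
names(n_cd) <- strings

# One bar per string, showing how many times "cd" occurs in it.
barplot(n_cd, las = 2, ylab = "Occurrences of \"cd\"")
```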
