Ordering entries in a data frame based on a user-defined function in R
I have a data frame containing probability values like this:
#   Spot 1 Spot 2
# 1  0.140   0.12
# 2  0.220   0.50
# 3  0.154   0.40
# 4  0.300   0.12
# 5  0.220   0.60
# 6  0.400   0.23
# 7  0.550   0.40
# 8  0.600   0.56
The row names are actually cell types: for example, row 1 is cell type 1, and so on. A cell type counts as present in a spot if its probability value is >= a threshold t. I wrote the following function to identify which cell types are present in each spot:
solution_matrix <- function(probability_matrix, threshold, n=0) {
  #parameter n is the number of cell type ranking we want. For example, n=3 means we choose the highest ranking of 3 cell types to be present in a spot
  probability_matrix <- probability_matrix
  for (i in 1:nrow(probability_matrix)) {
    for (j in 1:ncol(probability_matrix)) {
      if (probability_matrix[i, j] >= threshold) {
        probability_matrix[i, j] <- probability_matrix[i, j]
      } else {
        probability_matrix[i, j] <- 0
      }
    }
  }
  probability_matrix <- probability_matrix >= threshold
  solution_matrix <- probability_matrix
  for (i in 1:nrow(solution_matrix)) {
    for (j in 1:ncol(solution_matrix)) {
      if (solution_matrix[i, j] == TRUE) {
        solution_matrix[i, j] <- i - 1 #We minus 1 since the cluster numbers start from 0 to 9 (there are 10 clusters)
      } else {
        solution_matrix[i, j] <- NA
      }
    }
  }
  for (j in 1:ncol(solution_matrix)) {
    solution_matrix[, j] <- sort(as.numeric(solution_matrix[, j]), na.last=TRUE) #This is to make the NA entries to be filled after the number entries
  }
  solution_matrix <- solution_matrix
  colnames(solution_matrix) <- gsub("\\.", "-", colnames(solution_matrix)) #We need to rename the spot IDs in estimated_solution from using . to - to match it to true_solution obtained from the synthetic_data
  return(list(probability_matrix, solution_matrix))
}#solution_matrix#
solution_matrix(mydata, 0.2, 3)
This produces the following output:
[[1]]
  Spot 1 Spot 2
1  FALSE  FALSE
2   TRUE   TRUE
3  FALSE   TRUE
4   TRUE  FALSE
5   TRUE   TRUE
6   TRUE   TRUE
7   TRUE   TRUE
8   TRUE   TRUE

[[2]]
  Spot 1 Spot 2
1      1      1
2      3      2
3      4      4
4      5      5
5      6      6
6      7      7
7     NA     NA
8     NA     NA
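The thresholding rule described above can also be written without explicit loops. The following is a minimal sketch, assuming the data shown in the question; `presence_matrix` is a hypothetical helper name, not part of the original function.

```r
# Example data copied from the question.
mydata <- data.frame(
  `Spot 1` = c(0.14, 0.22, 0.154, 0.3, 0.22, 0.4, 0.55, 0.6),
  `Spot 2` = c(0.12, 0.5, 0.4, 0.12, 0.6, 0.23, 0.4, 0.56),
  check.names = FALSE
)

# Hypothetical helper: TRUE where a cell type's probability reaches the
# threshold in a spot, FALSE otherwise. The comparison is vectorised over
# the whole matrix, so no nested loops are needed.
presence_matrix <- function(probability_matrix, threshold) {
  as.matrix(probability_matrix) >= threshold
}

present <- presence_matrix(mydata, 0.2)

# Number of cell types present in each spot (6 in each spot for this data).
colSums(present)
```

This gives the same TRUE/FALSE pattern as the first element of the list returned by the original function.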
I want to order the cell types present in each spot by their probability values, so that the highest value is rank 1, the next is rank 2, and so on. However, I do not know how to rank the cell types while keeping a record of which cell type is present in which spot. The current output does not show which cell types have a stronger presence in a spot. The parameter n should control how many of the top-ranked cell types are kept. For example, spot 1 currently has 6 cell types present; with n = 3, I want only the 3 with the highest probabilities instead of all 6. Any ideas?
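One possible way to get such a ranking is to order the row indices of each column by decreasing probability, drop those below the threshold, and keep the first n. The sketch below is only an illustration under stated assumptions: `top_cell_types` is a hypothetical name, n = 0 is assumed to mean "no limit", and the labels follow the question's `i - 1` convention (cluster numbers starting at 0).

```r
# Example data copied from the question.
mydata <- data.frame(
  `Spot 1` = c(0.14, 0.22, 0.154, 0.3, 0.22, 0.4, 0.55, 0.6),
  `Spot 2` = c(0.12, 0.5, 0.4, 0.12, 0.6, 0.23, 0.4, 0.56),
  check.names = FALSE
)

# Hypothetical helper, not the asker's original function.
# Returns a matrix with one column per spot; row k holds the label of the
# cell type with the k-th highest probability at or above the threshold.
top_cell_types <- function(probability_matrix, threshold, n = 0) {
  prob <- as.matrix(probability_matrix)
  # Assumption: n = 0 means "no limit", so all cell types may be ranked.
  n_rows <- if (n > 0) min(n, nrow(prob)) else nrow(prob)

  ranked <- vapply(seq_len(ncol(prob)), function(j) {
    p <- prob[, j]
    idx <- order(p, decreasing = TRUE)  # row indices, highest probability first
    idx <- idx[p[idx] >= threshold]     # keep only cell types that are present
    labels <- idx - 1                   # question's convention: clusters start at 0
    # Pad with NA so every column has exactly n_rows entries.
    c(labels, rep(NA_real_, nrow(prob)))[seq_len(n_rows)]
  }, numeric(n_rows))

  matrix(ranked, nrow = n_rows,
         dimnames = list(paste0("rank", seq_len(n_rows)), colnames(prob)))
}

top3 <- top_cell_types(mydata, 0.2, 3)
top3
# Spot 1: 7, 6, 5 (probabilities 0.60, 0.55, 0.40)
# Spot 2: 4, 7, 1 (probabilities 0.60, 0.56, 0.50)
```

Because each row of the result is a rank and each entry is a cell type label, the output keeps track of which cell type holds which rank in which spot. Tied probabilities are returned in whatever order `order()` gives them; if ties matter, an explicit tie-breaking rule would be needed.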
Data:
mydata <- structure(list(`Spot 1` = c(0.14, 0.22, 0.154, 0.3, 0.22, 0.4,
0.55, 0.6), `Spot 2` = c(0.12, 0.5, 0.4, 0.12, 0.6, 0.23, 0.4,
0.56)), class = "data.frame", row.names = c(NA, 8L))