Ordering entries in a data frame based on a defined function in R

Posted on 2025-01-31 00:26:43

I have a data frame containing probability values like this:

#   Spot 1 Spot 2
# 1  0.140   0.12
# 2  0.220   0.50
# 3  0.154   0.40
# 4  0.300   0.12
# 5  0.220   0.60
# 6  0.400   0.23
# 7  0.550   0.40
# 8  0.600   0.56

The row names are cell types: row 1 is cell type 1, and so on. A cell type counts as present in a spot if its probability value is >= a threshold t. I wrote the following function to determine which cell types are present in each spot:

solution_matrix <- function(probability_matrix, threshold, n=0) {
  #parameter n is the number of cell type ranking we want. For example, n=3 means we choose the highest ranking of 3 cell types to be present in a spot
  probability_matrix <- probability_matrix
  for (i in 1:nrow(probability_matrix)) {
    for (j in 1:ncol(probability_matrix)) {
      if (probability_matrix[i, j] >= threshold) {
        probability_matrix[i, j] <- probability_matrix[i, j]
      } else {
        probability_matrix[i, j] <- 0
      }
    }
  }
  
  probability_matrix <- probability_matrix >= threshold  
  
  solution_matrix <- probability_matrix
  for (i in 1:nrow(solution_matrix)) {
    for (j in 1:ncol(solution_matrix)) {
      if (solution_matrix[i, j] == TRUE) {
        solution_matrix[i, j] <- i - 1 #We minus 1 since the cluster numbers start from 0 to 9 (there are 10 clusters)
      } else {
        solution_matrix[i, j] <- NA
      }
    }
  }
  for (j in 1:ncol(solution_matrix)) {
    solution_matrix[, j] <- sort(as.numeric(solution_matrix[, j]), na.last=TRUE) #This is to make the NA entries to be filled after the number entries
  }
  
  solution_matrix <- solution_matrix
  colnames(solution_matrix) <- gsub("\\.", "-", colnames(solution_matrix)) #We need to rename the spot IDs in estimated_solution from using . to - to match it to true_solution obtained from the synthetic_data 
  
  return(list(probability_matrix, solution_matrix))
}#solution_matrix#
solution_matrix(mydata, 0.2, 3)

It produces output like this:

[[1]]
  Spot 1 Spot 2
1  FALSE  FALSE
2   TRUE   TRUE
3  FALSE   TRUE
4   TRUE  FALSE
5   TRUE   TRUE
6   TRUE   TRUE
7   TRUE   TRUE
8   TRUE   TRUE

[[2]]
  Spot 1 Spot 2
1      1      1
2      3      2
3      4      4
4      5      5
5      6      6
6      7      7
7     NA     NA
8     NA     NA
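
For reference, the thresholding and labelling steps of the function above can also be written without explicit loops. This is only a sketch under the assumption that the input is the all-numeric data frame from this question; the names `present`, `labels` and `sorted_labels` are illustrative and do not appear in the original code.

```r
# Same data as in the question, repeated so the sketch runs on its own.
mydata <- structure(list(`Spot 1` = c(0.14, 0.22, 0.154, 0.3, 0.22, 0.4,
0.55, 0.6), `Spot 2` = c(0.12, 0.5, 0.4, 0.12, 0.6, 0.23, 0.4,
0.56)), class = "data.frame", row.names = c(NA, 8L))

threshold <- 0.2

# Element-wise comparison: TRUE where a cell type counts as present in a spot.
present <- as.matrix(mydata) >= threshold

# Replace TRUE by the 0-based cluster label (row index - 1) and FALSE by NA.
labels <- ifelse(present, row(present) - 1, NA)

# Sort each column so that the NA entries end up after the labels.
sorted_labels <- apply(labels, 2, sort, na.last = TRUE)
```

Here `sorted_labels` should correspond to the second list element shown above, but it still carries no information about which cell type has the higher probability.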

I want to order the cell types present in each spot by their probability values, so that the highest value gets rank 1, the next highest gets rank 2, and so on. However, I do not know how to rank the cell types while keeping track of which cell type is present in which spot. From this output I cannot tell which cell types have a stronger presence in a spot. The parameter n should control how many of the top-ranked cell types are kept. For example, this output lists 6 cell types in Spot 1; if I set n = 3, I want only the 3 cell types with the highest probabilities instead of all 6. Any ideas, please?

Data

mydata <- structure(list(`Spot 1` = c(0.14, 0.22, 0.154, 0.3, 0.22, 0.4, 
0.55, 0.6), `Spot 2` = c(0.12, 0.5, 0.4, 0.12, 0.6, 0.23, 0.4, 
0.56)), class = "data.frame", row.names = c(NA, 8L))
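
To make the desired result concrete, here is one possible sketch of the ranking step, not a definitive implementation. It assumes the input is a data frame with one numeric column per spot, and it reports cell types by row index (subtract 1 if the 0-based cluster labels are wanted). The helper name `top_n_cell_types` and the `rank1`, `rank2`, ... row labels are hypothetical.

```r
# Same data as above, repeated so the sketch runs on its own.
mydata <- structure(list(`Spot 1` = c(0.14, 0.22, 0.154, 0.3, 0.22, 0.4,
0.55, 0.6), `Spot 2` = c(0.12, 0.5, 0.4, 0.12, 0.6, 0.23, 0.4,
0.56)), class = "data.frame", row.names = c(NA, 8L))

# Hypothetical helper: for each spot (column), return the row indices of the
# cell types whose probability is >= threshold, ordered from highest to
# lowest probability and limited to the top n. Assumes a data frame input.
top_n_cell_types <- function(probability_matrix, threshold,
                             n = nrow(probability_matrix)) {
  ranked <- lapply(probability_matrix, function(p) {
    idx <- order(p, decreasing = TRUE)        # row indices, highest probability first
    idx <- idx[which(p[idx] >= threshold)]    # drop cell types below the threshold
    length(idx) <- n                          # truncate to n, or pad with NA
    idx
  })
  matrix(unlist(ranked), nrow = n,
         dimnames = list(paste0("rank", seq_len(n)), names(probability_matrix)))
}

result <- top_n_cell_types(mydata, threshold = 0.2, n = 3)
print(result)
```

With the data above, threshold 0.2 and n = 3, this should list cell types 8, 7, 6 for Spot 1 and 5, 8, 2 for Spot 2. Spots with fewer than n cell types above the threshold are padded with NA.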
