使用r在Excel中找到大胆的文本

发布于 2025-02-11 01:52:16 字数 621 浏览 1 评论 0原文

我正在尝试在Excel访问中制作几个电子表格。我需要根据细胞的特定分组来替换大胆的文本和一些单元格的内容。例如,如果我有此表:

”不访问表“

我想拥有等效的“可访问”表:

“可访问表”

我不担心在Excel文件中写作,我的目标是阅读从电子表格中的表格创建一个数据框架,该数据框架看起来像上表,在第一列中具有变量名。

我的想法是确定第一列中有大胆的文本的位置,以便我可以将文本预先为不大胆的名称,因为BOLD代表子组。

我了解这可能不是解决问题的最佳解决方案,希望有人可以找到适当的解决方案。

非常感谢大家。

I am trying to make a couple of spreadsheets in excel accessible. I need to replace bold text and some contents of the cells depending on their specific grouping. For example, if I have this table:

Not accessible table

I would like to have the equivalent "accessible" table:

Accessible table

I am not worried about writing in the excel file, my goal is to read the table from the spreadsheet and create a data frame that looks like the table above with the variable names in the first column.

My idea was to identify where there is a bold text in the first column so I could prepend that text to the names that are not in bold as bold represents the subgroups.

I understand this might not be the best solution to the problem, I hope someone can find a proper solution.

Thank you all very much.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

江湖彼岸 2025-02-18 01:52:16

- 已编辑 - 事实证明,您可以说出哪种样式和哪种样式的样式,具体取决于工作簿中使用的样式的一致性数量/如何确定您的需求。但是我将使用下面的总列方法留下其他答案。第一种方法依赖于始终使用的大胆文本。第二种方法依赖于总列子类别始终等于总类别。两者最终都使用类似的方法,只有识别类别文本的初始策略与每种方法都不同。

---我不认为OpenXlsx可以确定哪些单元格具有大胆的样式 - 只是在工作簿中存在大胆的样式。---我再也不会错了!

- - 折叠的文本搜索答案 -

library(openxlsx)
library(tidyverse)

wb <- createWorkbook()

wb <- loadWorkbook("Path\\Your_File_Name.xlsx")

#Examine structure of the workbook
str(wb)

#Tells number of styles in workbook
wb$styleObjects %>% length()

#In this example there is just 1 style. So index the 1st style below this text logical if fontDecoration is bolded (answer is TRUE)

wb$styleObjects[[1]]$style$fontDecoration == "BOLD"

由于这是所需的样式,请拔出具有此fontnecoration的行。 note 如果在不同的单元格中使用粗体和其他样式类型(例如,摩托车处于红色字体),那么在用粗体字体的标记/收集行时,这可能会变得更加复杂(因此,方法2可能是更安全的选项) 。

#This indicates that rows 1 and 5 have bolded text (i.e., Car and Motorcycle)

thesebold <- wb$styleObjects[[1]]$rows


df <- read.xlsx("Path\\filename", colNames = FALSE)

这确定了每个类别应具有的重复数量。因此,将行位置1和5之间的位置不同,然后剩余的数据集长度。如果多个类别(第二种方法),请参见第二种方法

thesereps <- c(diff(thesebold), dim(df)[1] - diff(thesebold))

#Named variables for ease
df %>% 
    set_names("Category", "Total") %>% 
    bind_cols(newcat = rep(df[thesebold,1], thesereps)) %>% 
    mutate(Category = case_when(Category == newcat ~ Category,
                                Category != newcat ~ paste0(newcat, ":", Category))) %>% 
    select(-newcat)

-
因此,此答案不是使用粗体文本方法,但是假设数据集的结构是您在示例中显示的方式,则下面应该有效。数据是在您拥有类别(汽车,摩托车)然后子类别(Tesla,Honda,Toyota等)的情况下结构的,每个类别的总列为总数,然后是子类别的子类别,这些子类别对总数有助于总计。使用此列,您可以定义类别边界(即,从自身中减去总数时,它在切换到下一个类别之前达到0)。对于演示,我添加了另外两个不同长度的类别,但相同的限制(子类别的总和必须总计类别总计)。我记下了一个可能需要调整目的的事情,因为我是从头开始创建数据框,而不是在使用OpenXLSX中阅读它

library(tidyverse)

#Make expanded data set for demo - adding extra categories
thesenames = c("Car", "Tesla", "Honda", "Toyota", "Motorcylce", "Honda", "Yamaha", "Suzuki", "Fruit", "Apple", "Orange", "Grape", "Strawberry", "Lemon", "Lime", "Shape", "Circle", "Square", "Octogon")
thesetotals = c(12,3,2,7,20,13,5,2, 32, 8, 4, 4, 8, 4, 4, 24, 2, 4, 18)
df <- bind_cols(thesenames, thesetotals)%>% 
        set_names("Type", "Total") 

#Empty tibble to save running total result to
y <- tibble(NULL)

#Initialize the current.total as 0

current.total = 0

for(i in thesetotals){
    
    if (current.total == 0){
        current.total = current.total + i
    } else{
        current.total = current.total - i
    }
tmp <- current.total  
y <- rbind(y, tmp) 
}


y <- as_tibble(y) %>% 
    set_names("RT")

#Gets the number of subcategories between each of main categories
thislong <- c(diff(which(y$RT ==0)))
thislong <- c((length(y$RT) - sum(thislong)),thislong)

#This part assumes the structure of the df I created above which may need modified in your dataset
#This pulls from first column, first row, which here is "Car"

firstrow <- df[1,1] %>% pull()

#Gets vector of each category; determines category by looking at the lag RT value
    
thesetypes <-  bind_cols(df,y) %>% 
               mutate(Category = case_when(firstrow ==   Type ~ Type,
                                                 RT > lag(RT) ~ Type,
                                                         TRUE ~ "0")) %>% 
               filter(Category != "0") %>% 
               pull(Category)

#Adds new category to existing df, repeating the specified number of times
df$Category <- rep(thesetypes,thislong)


#Modifies the subcategory text with prefixing the category membership then drops Category
df <- df %>% 
    mutate(Type = case_when(Type != Category ~ paste0(Category, ":", Type),
                            TRUE ~ Type)) %>% 
    select(-Category)

df

--EDITED -- it turns out you can tell the style of which cell has which style and depending on how many styles/how consistent the styles are used in the workbook would determine your needs. But I will leave the other answer using the Total column approach below. The first approach relies on the bold text being consistently used. And the second approach relies on the Total column subcategories always equaling the total categories. The both end up using similar approaches, just initial strategy of identifying the category text is different with each approach.

---I don't believe that openxlsx can determine which cells have a bolded style-- only that a bolded style exists in the workbook.--- I couldn't have been more wrong!

---Bold text search answer --

library(openxlsx)
library(tidyverse)

wb <- createWorkbook()

wb <- loadWorkbook("Path\\Your_File_Name.xlsx")

#Examine structure of the workbook
str(wb)

#Tells number of styles in workbook
wb$styleObjects %>% length()

#In this example there is just 1 style. So index the 1st style below this text logical if fontDecoration is bolded (answer is TRUE)

wb$styleObjects[[1]]$style$fontDecoration == "BOLD"

Since this is the style desired, pull out the rows that have this fontDecoration. Note if BOLD and other style type in different cell (e.g., Motorcycle was in red font) then this may get more complex in flagging/collecting the rows with bolded font (hence approach 2 may be safer option).

#This indicates that rows 1 and 5 have bolded text (i.e., Car and Motorcycle)

thesebold <- wb$styleObjects[[1]]$rows


df <- read.xlsx("Path\\filename", colNames = FALSE)

This identifies number of repetitions each category should have. So taking different between row position 1 and 5, then remaining length of the dataset. See second approach if more than two categories

thesereps <- c(diff(thesebold), dim(df)[1] - diff(thesebold))

#Named variables for ease
df %>% 
    set_names("Category", "Total") %>% 
    bind_cols(newcat = rep(df[thesebold,1], thesereps)) %>% 
    mutate(Category = case_when(Category == newcat ~ Category,
                                Category != newcat ~ paste0(newcat, ":", Category))) %>% 
    select(-newcat)

--Second approach --
So, this answer isn't using the bold text approach, but assuming the structure of the dataset is how you have it displayed in the example, then the below should work. The data are structured where you have categories (Car, Motorcycle) then subcategories (Tesla, Honda, Toyota, etc.) with the Total column being Total for each category, then subcategory subtotal that contributes to the Total. Using this column, you can define category boundaries (i.e., when subtracting Total from itself, it reaches 0 before switching to the next category). For demo, I added two more categories of varying lengths but same restrictions (sum of subcategories' totals must equal category total). I made a note where things may need adjusted for your purposes since I am creating the dataframe from scratch instead of reading it in using openxlsx

library(tidyverse)

#Make expanded data set for demo - adding extra categories
thesenames = c("Car", "Tesla", "Honda", "Toyota", "Motorcylce", "Honda", "Yamaha", "Suzuki", "Fruit", "Apple", "Orange", "Grape", "Strawberry", "Lemon", "Lime", "Shape", "Circle", "Square", "Octogon")
thesetotals = c(12,3,2,7,20,13,5,2, 32, 8, 4, 4, 8, 4, 4, 24, 2, 4, 18)
df <- bind_cols(thesenames, thesetotals)%>% 
        set_names("Type", "Total") 

#Empty tibble to save running total result to
y <- tibble(NULL)

#Initialize the current.total as 0

current.total = 0

for(i in thesetotals){
    
    if (current.total == 0){
        current.total = current.total + i
    } else{
        current.total = current.total - i
    }
tmp <- current.total  
y <- rbind(y, tmp) 
}


y <- as_tibble(y) %>% 
    set_names("RT")

#Gets the number of subcategories between each of main categories
thislong <- c(diff(which(y$RT ==0)))
thislong <- c((length(y$RT) - sum(thislong)),thislong)

#This part assumes the structure of the df I created above which may need modified in your dataset
#This pulls from first column, first row, which here is "Car"

firstrow <- df[1,1] %>% pull()

#Gets vector of each category; determines category by looking at the lag RT value
    
thesetypes <-  bind_cols(df,y) %>% 
               mutate(Category = case_when(firstrow ==   Type ~ Type,
                                                 RT > lag(RT) ~ Type,
                                                         TRUE ~ "0")) %>% 
               filter(Category != "0") %>% 
               pull(Category)

#Adds new category to existing df, repeating the specified number of times
df$Category <- rep(thesetypes,thislong)


#Modifies the subcategory text with prefixing the category membership then drops Category
df <- df %>% 
    mutate(Type = case_when(Type != Category ~ paste0(Category, ":", Type),
                            TRUE ~ Type)) %>% 
    select(-Category)

df
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文