使用r在Excel中找到大胆的文本
我正在尝试在Excel访问中制作几个电子表格。我需要根据细胞的特定分组来替换大胆的文本和一些单元格的内容。例如,如果我有此表:
我想拥有等效的“可访问”表:
我不担心在Excel文件中写作,我的目标是阅读从电子表格中的表格创建一个数据框架,该数据框架看起来像上表,在第一列中具有变量名。
我的想法是确定第一列中有大胆的文本的位置,以便我可以将文本预先为不大胆的名称,因为BOLD代表子组。
我了解这可能不是解决问题的最佳解决方案,希望有人可以找到适当的解决方案。
非常感谢大家。
I am trying to make a couple of spreadsheets in excel accessible. I need to replace bold text and some contents of the cells depending on their specific grouping. For example, if I have this table:
I would like to have the equivalent "accessible" table:
I am not worried about writing in the excel file, my goal is to read the table from the spreadsheet and create a data frame that looks like the table above with the variable names in the first column.
My idea was to identify where there is a bold text in the first column so I could prepend that text to the names that are not in bold as bold represents the subgroups.
I understand this might not be the best solution to the problem, I hope someone can find a proper solution.
Thank you all very much.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
- 已编辑 - 事实证明,您可以说出哪种样式和哪种样式的样式,具体取决于工作簿中使用的样式的一致性数量/如何确定您的需求。但是我将使用下面的总列方法留下其他答案。第一种方法依赖于始终使用的大胆文本。第二种方法依赖于总列子类别始终等于总类别。两者最终都使用类似的方法,只有识别类别文本的初始策略与每种方法都不同。
---我不认为OpenXlsx可以确定哪些单元格具有大胆的样式 - 只是在工作簿中存在大胆的样式。---我再也不会错了!
- - 折叠的文本搜索答案 -
由于这是所需的样式,请拔出具有此fontnecoration的行。 note 如果在不同的单元格中使用粗体和其他样式类型(例如,摩托车处于红色字体),那么在用粗体字体的标记/收集行时,这可能会变得更加复杂(因此,方法2可能是更安全的选项) 。
这确定了每个类别应具有的重复数量。因此,将行位置1和5之间的位置不同,然后剩余的数据集长度。如果多个类别(第二种方法),请参见第二种方法
-
因此,此答案不是使用粗体文本方法,但是假设数据集的结构是您在示例中显示的方式,则下面应该有效。数据是在您拥有类别(汽车,摩托车)然后子类别(Tesla,Honda,Toyota等)的情况下结构的,每个类别的总列为总数,然后是子类别的子类别,这些子类别对总数有助于总计。使用此列,您可以定义类别边界(即,从自身中减去总数时,它在切换到下一个类别之前达到0)。对于演示,我添加了另外两个不同长度的类别,但相同的限制(子类别的总和必须总计类别总计)。我记下了一个可能需要调整目的的事情,因为我是从头开始创建数据框,而不是在使用OpenXLSX中阅读它
--EDITED -- it turns out you can tell the style of which cell has which style and depending on how many styles/how consistent the styles are used in the workbook would determine your needs. But I will leave the other answer using the Total column approach below. The first approach relies on the bold text being consistently used. And the second approach relies on the Total column subcategories always equaling the total categories. The both end up using similar approaches, just initial strategy of identifying the category text is different with each approach.
---I don't believe that openxlsx can determine which cells have a bolded style-- only that a bolded style exists in the workbook.--- I couldn't have been more wrong!
---Bold text search answer --
Since this is the style desired, pull out the rows that have this fontDecoration. Note if BOLD and other style type in different cell (e.g., Motorcycle was in red font) then this may get more complex in flagging/collecting the rows with bolded font (hence approach 2 may be safer option).
This identifies number of repetitions each category should have. So taking different between row position 1 and 5, then remaining length of the dataset. See second approach if more than two categories
--Second approach --
So, this answer isn't using the bold text approach, but assuming the structure of the dataset is how you have it displayed in the example, then the below should work. The data are structured where you have categories (Car, Motorcycle) then subcategories (Tesla, Honda, Toyota, etc.) with the Total column being Total for each category, then subcategory subtotal that contributes to the Total. Using this column, you can define category boundaries (i.e., when subtracting Total from itself, it reaches 0 before switching to the next category). For demo, I added two more categories of varying lengths but same restrictions (sum of subcategories' totals must equal category total). I made a note where things may need adjusted for your purposes since I am creating the dataframe from scratch instead of reading it in using openxlsx