My words are mixed in one cell with the words I want to extract. How can I extract those words or remove the ones I don't need?
Hello, I'm looking for a solution in pandas or Excel. I have a spreadsheet with a column that contains words separated by semicolons:
apple - slice123; banana; apple - slice321; orange; citron; apple - slice345;
I want to extract "banana", "orange" and "citron" into a new column.
I looked into tokenization and pandas extract with a word list, but I did not find a solution.
My original CSV contains 1058 rows. A cell in the column in question may hold as little as 1 correct word (orange etc.) and 1 unwanted entry (apple - sliceXYZ), but it may also hold 5 correct words and up to 100 unwanted entries.
I hope someone has an idea how to solve this.
Edit for clarification.
I have 1027 rows in the table, but only the column with the "fruits" data is relevant. I know that I have 27 different fruits somewhere in that column.
Edit: I added an HTML table for clarification. The word list is used to identify the relevant "fruits" in the column data, and the Result column tells me which of those fruits appear in each cell.
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse; padding: 15px;
}
</style>
<table>
<tr>
<td><p><strong>Column 1</strong></p></td>
<td><p><strong>Column 2</strong></p></td>
<td><p><strong>Data</strong></p></td>
<td><p><strong>Result</strong></p></td>
</tr>
<tr>
<td><p>not relevant</p></td>
<td><p>not relevant</p></td>
<td><p>apple - slice123; banana; apple - slice321; orange; citron; apple - slice345</p></td>
<td><p>banana; orange; citron</p></td>
</tr>
<tr>
<td><p>not relevant</p></td>
<td><p>not relevant</p></td>
<td><p>apple - slice435; banana; apple - slice687; orange; citron; apple - slice334; mango; papaya</p></td>
<td><p>banana; orange; citron; mango; papaya</p></td>
</tr>
</table>
<p></p>
<table>
<tr>
<td><p><strong>word list</strong></p></td>
</tr>
<tr>
<td><p>banana</p></td>
</tr>
<tr>
<td><p>orange</p></td>
</tr>
<tr>
<td><p>citron</p></td>
</tr>
<tr>
<td><p>mango</p></td>
</tr>
<tr>
<td><p>papaya</p></td>
</tr>
</table>
Comments (2)
IIUC you can do it like this:
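A minimal sketch of one way this could look in pandas, not necessarily the exact code the answer had in mind. It assumes the column is named `Data` (as in the HTML table in the question), that the word list is available as a plain Python list, and that matching is exact and case-sensitive after trimming whitespace. The helper name `keep_listed` is hypothetical.

```python
import pandas as pd

# Sample rows copied from the table in the question (assumed column name: "Data").
df = pd.DataFrame({
    "Data": [
        "apple - slice123; banana; apple - slice321; orange; citron; apple - slice345",
        "apple - slice435; banana; apple - slice687; orange; citron; apple - slice334; mango; papaya",
    ]
})

# Word list from the question; a set gives fast membership checks.
word_list = ["banana", "orange", "citron", "mango", "papaya"]
wanted = set(word_list)

def keep_listed(cell):
    """Split a cell on ';', trim each item, and keep only items in the word list."""
    items = (item.strip() for item in str(cell).split(";"))
    return "; ".join(item for item in items if item in wanted)

# Build the new column; the original order of the kept words is preserved.
df["Result"] = df["Data"].apply(keep_listed)
print(df["Result"].tolist())
```

For a real file, the frame would come from something like `pd.read_csv(...)` instead of the inline sample. Filtering against the word list, rather than trying to describe the unwanted `apple - sliceXYZ` entries, means a cell with 100 unwanted entries is handled the same way as a cell with one.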
If you have Excel 2019 (or later) and the first cell with data is A1, you could use
If you have Excel 2013 to 2016, you could use just the FILTERXML() portion of the above, but it would have to be entered as an array formula: select cells B1:D1, enter the formula in the formula bar, and press Ctrl+Shift+Enter to confirm it
(you select 3 cells because you expect 3 results).