Split sentences by case where two words are "stuck" together
I am trying to clean up the following data, which has been extracted from HTML.
Some sentences have not been split correctly: the capitalised word at the start of one sentence is "stuck" to the last word of the preceding sentence.
The image below illustrates what I am trying to achieve.
In essence, if a single row contains a sentence like this:
The boy plays with the ballThe Girl plays with the Console
it should be split into:
The boy plays with the ball
The Girl plays with the Console
M code so far, using the actual data (it must be run in Power BI because it uses the Html.Table function, which is not available in Excel):
let
    // Read the dossier page line by line into a single-column table
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://echa.europa.eu/registration-dossier/-/registered-dossier/14184/7/1"))}),
    // Keep only the last line that mentions the section of interest
    #"Added Custom" = Table.AddColumn(Source, "Custom", each if Text.Contains([Column1], "General Population - Hazard via oral route") then [Column1] else null),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] <> null)),
    #"Kept Last Rows" = Table.LastN(#"Filtered Rows", 1),
    #"Removed Other Columns" = Table.SelectColumns(#"Kept Last Rows",{"Custom"}),
    // Split the raw HTML into one row per "</dd><dt>" boundary
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Removed Other Columns", {{"Custom", Splitter.SplitTextByDelimiter("</dd><dt>", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
    // Extract the visible text of each HTML fragment
    #"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "Text", each Html.Table([Custom], {{"Custom",":root"}})),
    #"Expanded Text" = Table.ExpandTableColumn(#"Added Custom1", "Text", {"Custom"}, {"Custom.1"})
in
    #"Expanded Text"
Comments (1)
The image still looks incorrect ("informationOverall" is not split), but if you want to split by character transition, you can do so from the ribbon (Split Column, By Lowercase to Uppercase).
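For reference, a split by character transition can also be written directly in M. The following is a minimal sketch, not the asker's actual query: it uses the sample sentence from the question, and SampleTable is a hypothetical stand-in for the #"Expanded Text" step. It assumes that a lowercase letter immediately followed by an uppercase letter always marks a sentence boundary.

```powerquery
let
    // Sample text from the question; the real query would use the "Custom.1" column.
    Sample = "The boy plays with the ballThe Girl plays with the Console",

    // Splitter that breaks text wherever a lowercase letter
    // is immediately followed by an uppercase letter.
    SplitOnCase = Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}),

    // Hypothetical one-row table standing in for the #"Expanded Text" step.
    SampleTable = Table.FromColumns({{Sample}}, {"Custom.1"}),

    // Turn each cell into a list of sentences, then expand to one row per sentence.
    AsLists = Table.TransformColumns(SampleTable, {{"Custom.1", SplitOnCase}}),
    AsRows = Table.ExpandListColumn(AsLists, "Custom.1")
in
    AsRows
```

On the sample this should give two rows, one per sentence. Note that the same rule also splits inside any word that contains a lowercase-to-uppercase transition (for example "pH"), so the real data may need a check for such cases before the split is applied.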