如何从Excel中的列中提取指定文本

发布于 2025-01-18 16:10:19 字数 718 浏览 4 评论 0 原文

我已经尝试了许多方法,但未能将正确的数据获取到新列。

列描述包含很多信息,我想提取到分开的列

我想将上述内容提取到以下列:

部门 日期 支持时间表 日期 日期时间 变化时间
老年护理HCP 29/06/2021 社会支持 工作日 上午7点至7pm 09:00 am -12:00 PM

任何人可以指导我或为我提供给我解决我如何清洁上述列的数据的解决方案?我已经为您的细读附加了数据。

我尝试使用

下面使用Excel Power查询是文档的链接

I have tried numerous ways but have failed to get the right data to the new columns.

The column description contains a lot of information in which I want to extract to separate columns. Examples

Aged Care HCP - 29/6/2021 - Social Support Weekday 7am – 7pm 9:00AM 12:00PM

I want to extract the above to the below Columns:

Sector Date Support Schedule Day Day time Shift Time
Aged Care HCP 29/06/2021 Social Support Weekday 7am - 7pm 09:00am - 12:00pm

Can anyone please guide me or provide me the solution on how I can clean the data to the above said columns? I have attached the data for your perusal.

I have tried using excel power query

Below is the link for the document
https://www.dropbox.com/scl/fi/4o09adq936cc7cls28sfz/Worksheet.xlsx?dl=0&rlkey=zalomc9pagexd435z014q6qjc

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

原野 2025-01-25 16:10:19

非结构化数据很困难,但这是电源查询的一个示例,说明了如何解析其中的一些示例。根据实际数据和品种,您必须对其进行调整。

let
    Source = Excel.Workbook(File.Contents("C:\Sample.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type any}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}, {"Column10", type text}, {"Column11", type text}}),
    #"Removed Top Rows" = Table.Skip(#"Changed Type",4),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type any}, {"Source", type text}, {"Description", type text}, {"Reference", type text}, {"Debit", type number}, {"Credit", type number}, {"Running Balance", type number}, {"Gross", type number}, {"GST", type number}, {"Account Code", Int64.Type}, {"Department", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type1", each ([Description] <> null)),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.StartsWith([Description], "Aged Care HCP")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Description", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), {"Sector", "Service Date", "Description"}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Split Column by Delimiter", {{"Service Date", type date}}, "en-GB"),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type with Locale","Weekday","|Weekday|",Replacer.ReplaceText,{"Description"}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value", "Description", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Support Schedule", "Day", "Description"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Split Column by Delimiter1"," – ","–",Replacer.ReplaceText,{"Description"}),
    #"Trimmed Text" = Table.TransformColumns(#"Replaced Value1",{{"Description", Text.Trim, type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Trimmed Text", "Description", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Day time", "Shift Time"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Sector", type text}, {"Support Schedule", type text}, {"Day", type text}, {"Day time", type text}, {"Shift Time", type text}})
in
    #"Changed Type2"

It's difficult with unstructured data, but here's an example in Power Query of how you can parse some of it out. Depending on the actual data and variety, you will have to tweak it.

let
    Source = Excel.Workbook(File.Contents("C:\Sample.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type any}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}, {"Column10", type text}, {"Column11", type text}}),
    #"Removed Top Rows" = Table.Skip(#"Changed Type",4),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type any}, {"Source", type text}, {"Description", type text}, {"Reference", type text}, {"Debit", type number}, {"Credit", type number}, {"Running Balance", type number}, {"Gross", type number}, {"GST", type number}, {"Account Code", Int64.Type}, {"Department", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type1", each ([Description] <> null)),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.StartsWith([Description], "Aged Care HCP")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Description", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), {"Sector", "Service Date", "Description"}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Split Column by Delimiter", {{"Service Date", type date}}, "en-GB"),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type with Locale","Weekday","|Weekday|",Replacer.ReplaceText,{"Description"}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value", "Description", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Support Schedule", "Day", "Description"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Split Column by Delimiter1"," – ","–",Replacer.ReplaceText,{"Description"}),
    #"Trimmed Text" = Table.TransformColumns(#"Replaced Value1",{{"Description", Text.Trim, type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Trimmed Text", "Description", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Day time", "Shift Time"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Sector", type text}, {"Support Schedule", type text}, {"Day", type text}, {"Day time", type text}, {"Shift Time", type text}})
in
    #"Changed Type2"

Result

莫言歌 2025-01-25 16:10:19

使用pandas

import pandas as pd 

df = pd.read_csv('filename.csv')

# will print the column
print(df.col_name)

use pandas https://pypi.org/project/pandas/

import pandas as pd 

df = pd.read_csv('filename.csv')

# will print the column
print(df.col_name)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文