不确定为什么条件列变换没有识别要更新的值(text.contains)

发布于 2025-02-11 22:41:30 字数 1650 浏览 1 评论 0原文

我有一个人群记录,其中包含各种人口统计信息(种族,种族,性别等)。

对于null [族裔]中的值(即西班牙裔Y/n),我想为字符串“非西班牙裔”搜索相应的[race]值,因为对于某些记录作为[race]下的合并价值(例如,“白色(非西班牙裔)”)。我想清洁/归一化两个字段([竞赛]在下游单独的步骤中清洁)。

但是,我不确定为什么我的代码没有成功地识别前两个条件的匹配,因为我知道有很多实例“白色(非西班牙裔)”至少:

cleanData = 
    Table.ReplaceValue(rawData, each [Ethnicity], each
        if [Ethnicity] = null and (
            Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or 
            Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
            then "Non-hispanic" else
        if [Ethnicity] = null and 
            Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) 
            then "Hispanic" else
        [Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
        ),

这两个字段是类型文本,我不会遇到错误 - 只是缺乏预期行为。 [族裔]中的null值不变。

样本输入:

种族种族
白色
亚洲
白色(非西班牙裔)拒绝做出反应
白色(非西班牙裔)null
white(西班牙null
) /em>

样本输出:

种族种族
白色
亚洲
白色(非西班牙裔)拒绝做出反应
(非西班牙裔)no
白色(西班牙裔)是是是
亚洲人是 > null

I have a table of people records with various demographic information (Race, Ethnicity, Gender etc.).

For null values in [Ethnicity] (i.e., Hispanic Y/N), I want to search the corresponding [Race] value for a string "non-Hispanic", since for some records these have been stored as a combined value under [Race] (e.g., "White (non-Hispanic)"). and I'd like to clean/normalize both fields ([Race] is cleaned in a separate downstream step).

However, I'm unsure why my code is not successfully identifying matches to the first two conditions, since I know there are many instances of "White (non-Hispanic)" at the very least:

cleanData = 
    Table.ReplaceValue(rawData, each [Ethnicity], each
        if [Ethnicity] = null and (
            Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or 
            Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
            then "Non-hispanic" else
        if [Ethnicity] = null and 
            Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) 
            then "Hispanic" else
        [Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
        ),

Both fields are type Text, and I'm not hitting an error - just a lack of expected behavior. The null values in [Ethnicity] are unchanged.

Sample input:

RaceEthnicity
WhiteYes
AsianNo
White (non-Hispanic)Decline to respond
White (non-Hispanic)null
White (Hispanic)null
Asiannull

Sample output:

RaceEthnicity
WhiteYes
AsianNo
White (non-Hispanic)Decline to Respond
White (non-Hispanic)No
White (Hispanic)Yes
Asiannull

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

沫雨熙 2025-02-18 22:41:30

它的替代品

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null then 
        if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
        then "Non-hispanic" else 
        if  Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
    else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null and
        (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
        then "Non-hispanic" else 
         if [Ethnicity] = null  and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

Its the Replacer.ReplaceText which should be Replacer.ReplaceValue

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null then 
        if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
        then "Non-hispanic" else 
        if  Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
    else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

or

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null and
        (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
        then "Non-hispanic" else 
         if [Ethnicity] = null  and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),
池予 2025-02-18 22:41:30

@horseyride您可以抽象它以使用任何大小的列表

    Text.ContainsAny = (source as text, searchStrings as list) as logical =>
        let
            matches = List.Transform( searchStrings, (string) =>
                Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
        in
            List.AnyTrue(matches),

,然后您可以写

if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...

“在此处输入映像”

我是使用table.addcolumn的另一种方法,然后重命名。这听起来比使用替换列(就位)更糟糕。测试后,我使用两种方法获得了相同的最终折叠SQL查询。

let
    Source = Table.FromRows(
        Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ+OWD+RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
    ),
    // eliminate whitespace, or control chars
    #"Changed Type"   = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
    #"Cleaned Text"   = Table.TransformColumns( Source, {{"Ethnicity", Text.Clean, type nullable text}}),
    #"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),

    merge_columns = Table.AddColumn(
        #"Changed Type",
        "Ethnicity2",
        (row) =>
            let
                isBlank = row[Ethnicity] = null or row[Ethnicity] = "",
                race = row[Race],
                replacement =
                    if (
                        Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
                        or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
                    ) then
                        "Non-Hispanic"
                    else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
                        "Hispanic"
                    else
                        race
            in
                if isBlank then
                    replacement
                else
                    race,
        type text
    ),
    #"ReplaceColumns" = Table.RenameColumns(
        Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
    )
in
    #"ReplaceColumns"

@horseyride you can abstract it to use a list of any size

    Text.ContainsAny = (source as text, searchStrings as list) as logical =>
        let
            matches = List.Transform( searchStrings, (string) =>
                Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
        in
            List.AnyTrue(matches),

then you can write

if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...

enter image description here

I here's another way using Table.AddColumn, then rename. That might sound worse than using replacing a column, in-place. After testing, I was getting the same final folded SQL query -- using either method.

let
    Source = Table.FromRows(
        Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ+OWD+RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
    ),
    // eliminate whitespace, or control chars
    #"Changed Type"   = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
    #"Cleaned Text"   = Table.TransformColumns( Source, {{"Ethnicity", Text.Clean, type nullable text}}),
    #"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),

    merge_columns = Table.AddColumn(
        #"Changed Type",
        "Ethnicity2",
        (row) =>
            let
                isBlank = row[Ethnicity] = null or row[Ethnicity] = "",
                race = row[Race],
                replacement =
                    if (
                        Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
                        or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
                    ) then
                        "Non-Hispanic"
                    else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
                        "Hispanic"
                    else
                        race
            in
                if isBlank then
                    replacement
                else
                    race,
        type text
    ),
    #"ReplaceColumns" = Table.RenameColumns(
        Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
    )
in
    #"ReplaceColumns"
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文