不确定为什么条件列变换没有识别要更新的值（text.contains）

发布于 2025-02-11 22:41:30 字数 1650 浏览 1 评论 0原文

我有一个人群记录，其中包含各种人口统计信息（种族，种族，性别等）。

对于null [族裔]中的值（即西班牙裔Y/n），我想为字符串“非西班牙裔”搜索相应的[race]值，因为对于某些记录作为[race]下的合并价值（例如，“白色（非西班牙裔）”）。我想清洁/归一化两个字段（[竞赛]在下游单独的步骤中清洁）。

但是，我不确定为什么我的代码没有成功地识别前两个条件的匹配，因为我知道有很多实例“白色（非西班牙裔）”至少：

cleanData = 
    Table.ReplaceValue(rawData, each [Ethnicity], each
        if [Ethnicity] = null and (
            Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or 
            Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
            then "Non-hispanic" else
        if [Ethnicity] = null and 
            Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) 
            then "Hispanic" else
        [Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
        ),

这两个字段是类型文本，我不会遇到错误 - 只是缺乏预期行为。 [族裔]中的null值不变。

样本输入：

种族	种族
白色	是
亚洲	否
白色（非西班牙裔）	拒绝做出反应
白色（非西班牙裔）	null
white（西班牙	null
裔	） /em>

样本输出：

种族	种族
白色	是
亚洲	否
白色（非西班牙裔）	拒绝做出反应
（非西班牙裔）	no
白色（西班牙裔）	是是是
亚洲人	是 > null

原文

I have a table of people records with various demographic information (Race, Ethnicity, Gender etc.).

For null values in [Ethnicity] (i.e., Hispanic Y/N), I want to search the corresponding [Race] value for a string "non-Hispanic", since for some records these have been stored as a combined value under [Race] (e.g., "White (non-Hispanic)"). and I'd like to clean/normalize both fields ([Race] is cleaned in a separate downstream step).

However, I'm unsure why my code is not successfully identifying matches to the first two conditions, since I know there are many instances of "White (non-Hispanic)" at the very least:

cleanData = 
    Table.ReplaceValue(rawData, each [Ethnicity], each
        if [Ethnicity] = null and (
            Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or 
            Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
            then "Non-hispanic" else
        if [Ethnicity] = null and 
            Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) 
            then "Hispanic" else
        [Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
        ),

Both fields are type Text, and I'm not hitting an error - just a lack of expected behavior. The null values in [Ethnicity] are unchanged.

Sample input:

Race	Ethnicity
White	Yes
Asian	No
White (non-Hispanic)	Decline to respond
White (non-Hispanic)	null
White (Hispanic)	null
Asian	null

Sample output:

Race	Ethnicity
White	Yes
Asian	No
White (non-Hispanic)	Decline to Respond
White (non-Hispanic)	No
White (Hispanic)	Yes
Asian	null

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

沫雨熙 2025-02-18 22:41:30

它的替代品

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null then 
        if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
        then "Non-hispanic" else 
        if  Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
    else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

。

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null and
        (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
        then "Non-hispanic" else 
         if [Ethnicity] = null  and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

Its the Replacer.ReplaceText which should be Replacer.ReplaceValue

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null then 
        if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase)) 
        then "Non-hispanic" else 
        if  Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
    else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

cleanData =  Table.ReplaceValue(rawData,  each [Ethnicity], each
    if [Ethnicity] = null and
        (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or  Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
        then "Non-hispanic" else 
         if [Ethnicity] = null  and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)  then "Hispanic"  else [Ethnicity]
 ,Replacer.ReplaceValue,{"Ethnicity"}),

回复收藏 0 原文

池予 2025-02-18 22:41:30

@horseyride您可以抽象它以使用任何大小的列表

    Text.ContainsAny = (source as text, searchStrings as list) as logical =>
        let
            matches = List.Transform( searchStrings, (string) =>
                Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
        in
            List.AnyTrue(matches),

，然后您可以写

if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...

我是使用table.addcolumn的另一种方法，然后重命名。这听起来比使用替换列（就位）更糟糕。测试后，我使用两种方法获得了相同的最终折叠SQL查询。

let
    Source = Table.FromRows(
        Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ+OWD+RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
    ),
    // eliminate whitespace, or control chars
    #"Changed Type"   = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
    #"Cleaned Text"   = Table.TransformColumns( Source, {{"Ethnicity", Text.Clean, type nullable text}}),
    #"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),

    merge_columns = Table.AddColumn(
        #"Changed Type",
        "Ethnicity2",
        (row) =>
            let
                isBlank = row[Ethnicity] = null or row[Ethnicity] = "",
                race = row[Race],
                replacement =
                    if (
                        Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
                        or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
                    ) then
                        "Non-Hispanic"
                    else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
                        "Hispanic"
                    else
                        race
            in
                if isBlank then
                    replacement
                else
                    race,
        type text
    ),
    #"ReplaceColumns" = Table.RenameColumns(
        Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
    )
in
    #"ReplaceColumns"

@horseyride you can abstract it to use a list of any size

    Text.ContainsAny = (source as text, searchStrings as list) as logical =>
        let
            matches = List.Transform( searchStrings, (string) =>
                Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
        in
            List.AnyTrue(matches),

then you can write

if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...

I here's another way using Table.AddColumn, then rename. That might sound worse than using replacing a column, in-place. After testing, I was getting the same final folded SQL query -- using either method.

let
    Source = Table.FromRows(
        Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ+OWD+RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
        let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
    ),
    // eliminate whitespace, or control chars
    #"Changed Type"   = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
    #"Cleaned Text"   = Table.TransformColumns( Source, {{"Ethnicity", Text.Clean, type nullable text}}),
    #"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),

    merge_columns = Table.AddColumn(
        #"Changed Type",
        "Ethnicity2",
        (row) =>
            let
                isBlank = row[Ethnicity] = null or row[Ethnicity] = "",
                race = row[Race],
                replacement =
                    if (
                        Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
                        or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
                    ) then
                        "Non-Hispanic"
                    else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
                        "Hispanic"
                    else
                        race
            in
                if isBlank then
                    replacement
                else
                    race,
        type text
    ),
    #"ReplaceColumns" = Table.RenameColumns(
        Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
    )
in
    #"ReplaceColumns"

回复收藏 0 原文

~没有更多了~