Comprehend: difference between detect_pii_entities and contains_pii_entities (boto3 comprehend)

Posted on 2025-01-11 18:48:58

I am trying to understand the difference between boto3 Comprehend's detect_pii_entities and contains_pii_entities functions. I tried the following snippet:

str_text = """
Hello Zhang Wei, I am John. Your AnyCompany Financial Services, LLC credit card account 1111-0000-1111-0008 has a minimum payment of $24.53 that is due by July 31st. Based on your autopay settings, we will withdraw your payment on the due date from your bank account number XXXXXX1111 with the routing number XXXXX0000. 

Your latest statement was mailed to 100 Main Street, Any City, WA 98121. 
After your payment is received, you will receive a confirmation text message at 206-555-0100. 
If you have questions about your bill, AnyCompany Customer Service is available by phone at 206-555-0199 or email at [email protected].
"""

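import boto3

# Assumes AWS credentials and a default region are already configured
client = boto3.client('comprehend')

# Entity-level detection: returns each PII span with its character offsets
detect_pii = client.detect_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("detect pii: ", detect_pii)

# Document-level check: returns only the PII labels found in the text
contains_pii = client.contains_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("contains pii: ", contains_pii)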

The output I get is:

detect_pii:  {'Entities': [{'Score': 0.9996908903121948, 'Type': 'NAME', 'BeginOffset': 52, 'EndOffset': 61}, {'Score': 0.9999550580978394, 'Type': 'NAME', 'BeginOffset': 68, 'EndOffset': 72}, {'Score': 0.9627901911735535, 'Type': 'CREDIT_DEBIT_NUMBER', 'BeginOffset': 134, 'EndOffset': 153}, {'Score': 0.9714980125427246, 'Type': 'DATE_TIME', 'BeginOffset': 201, 'EndOffset': 210}, {'Score': 0.9999960660934448, 'Type': 'BANK_ACCOUNT_NUMBER', 'BeginOffset': 320, 'EndOffset': 330}, {'Score': 0.999988317489624, 'Type': 'BANK_ROUTING', 'BeginOffset': 355, 'EndOffset': 364}, {'Score': 0.9999522566795349, 'Type': 'ADDRESS', 'BeginOffset': 406, 'EndOffset': 441}, {'Score': 0.9999591112136841, 'Type': 'PHONE', 'BeginOffset': 525, 'EndOffset': 537}, {'Score': 0.999980092048645, 'Type': 'PHONE', 'BeginOffset': 633, 'EndOffset': 645}, {'Score': 0.9995272159576416, 'Type': 'EMAIL', 'BeginOffset': 658, 'EndOffset': 680}], 'ResponseMetadata': {'RequestId': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'content-type': 'application/x-amz-json-1.1', 'content-length': '827', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}

contains_pii: {'Labels': [{'Name': 'DATE_TIME', 'Score': 0.9986850023269653}, {'Name': 'EMAIL', 'Score': 0.9985549449920654}, {'Name': 'BANK_ACCOUNT_NUMBER', 'Score': 0.8221991658210754}, {'Name': 'BANK_ROUTING', 'Score': 0.6654205918312073}, {'Name': 'CREDIT_DEBIT_NUMBER', 'Score': 1.0}, {'Name': 'PHONE', 'Score': 1.0}], 'ResponseMetadata': {'RequestId': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'content-type': 'application/x-amz-json-1.1', 'content-length': '285', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}

In the second case, NAME and ADDRESS are missing, and possibly other PII labels as well. How do I get them using contains_pii_entities? The documentation suggests that NAME and ADDRESS should be available, and the Comprehend API in the AWS console returns all of the PII labels.
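To make the gap concrete, here is a minimal sketch that compares the entity types from the detect_pii_entities response with the label names from the contains_pii_entities response. The helper name `pii_types_missing_from_labels` is hypothetical, and the two dictionaries are trimmed copies of the responses shown above (scores, offsets, and metadata dropped), so no AWS call is made.

```python
def pii_types_missing_from_labels(detect_response, contains_response):
    """Return entity types present in a detect_pii_entities response
    that have no matching label in a contains_pii_entities response."""
    detected = {entity["Type"] for entity in detect_response.get("Entities", [])}
    labelled = {label["Name"] for label in contains_response.get("Labels", [])}
    return sorted(detected - labelled)


# Trimmed copies of the two boto3 responses from the question
# (only the type/name fields are kept; this is not live API output).
detect_pii = {
    "Entities": [
        {"Type": "NAME"},
        {"Type": "NAME"},
        {"Type": "CREDIT_DEBIT_NUMBER"},
        {"Type": "DATE_TIME"},
        {"Type": "BANK_ACCOUNT_NUMBER"},
        {"Type": "BANK_ROUTING"},
        {"Type": "ADDRESS"},
        {"Type": "PHONE"},
        {"Type": "PHONE"},
        {"Type": "EMAIL"},
    ]
}
contains_pii = {
    "Labels": [
        {"Name": "DATE_TIME"},
        {"Name": "EMAIL"},
        {"Name": "BANK_ACCOUNT_NUMBER"},
        {"Name": "BANK_ROUTING"},
        {"Name": "CREDIT_DEBIT_NUMBER"},
        {"Name": "PHONE"},
    ]
}

missing = pii_types_missing_from_labels(detect_pii, contains_pii)
print("missing from contains_pii:", missing)
```

Run against the trimmed responses, this reports ADDRESS and NAME as the only types detected at the entity level but absent from the labels.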

Output on AWS console:

{
    "Labels": [
        {
            "Name": "EMAIL",
            "Score": 1
        },
        {
            "Name": "DATE_TIME",
            "Score": 1
        },
        {
            "Name": "NAME",
            "Score": 0.8311530351638794
        },
        {
            "Name": "BANK_ROUTING",
            "Score": 0.7879412174224854
        },
        {
            "Name": "ADDRESS",
            "Score": 0.6723417043685913
        },
        {
            "Name": "BANK_ACCOUNT_NUMBER",
            "Score": 0.6297846436500549
        },
        {
            "Name": "CREDIT_DEBIT_NUMBER",
            "Score": 1
        },
        {
            "Name": "PHONE",
            "Score": 1
        }
    ]
}

I'm not sure what I am missing when using the boto3 package. boto3 version used: 1.18.12
