boto3 Comprehend: difference between detect_pii_entities and contains_pii_entities
I am trying to understand the difference between boto3 Comprehend's detect_pii_entities and contains_pii_entities functions. I tried the following snippet:
import boto3

str_text = """
Hello Zhang Wei, I am John. Your AnyCompany Financial Services, LLC credit card account 1111-0000-1111-0008 has a minimum payment of $24.53 that is due by July 31st. Based on your autopay settings, we will withdraw your payment on the due date from your bank account number XXXXXX1111 with the routing number XXXXX0000.
Your latest statement was mailed to 100 Main Street, Any City, WA 98121.
After your payment is received, you will receive a confirmation text message at 206-555-0100.
If you have questions about your bill, AnyCompany Customer Service is available by phone at 206-555-0199 or email at [email protected].
"""
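client = boto3.client('comprehend')

# detect_pii_entities returns each PII entity with its type, score, and character offsets
detect_pii = client.detect_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("detect pii: ", detect_pii)

# contains_pii_entities returns only document-level labels for the PII types found
contains_pii = client.contains_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("contains pii: ", contains_pii)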
The output I get is:
detect_pii: {'Entities': [{'Score': 0.9996908903121948, 'Type': 'NAME', 'BeginOffset': 52, 'EndOffset': 61}, {'Score': 0.9999550580978394, 'Type': 'NAME', 'BeginOffset': 68, 'EndOffset': 72}, {'Score': 0.9627901911735535, 'Type': 'CREDIT_DEBIT_NUMBER', 'BeginOffset': 134, 'EndOffset': 153}, {'Score': 0.9714980125427246, 'Type': 'DATE_TIME', 'BeginOffset': 201, 'EndOffset': 210}, {'Score': 0.9999960660934448, 'Type': 'BANK_ACCOUNT_NUMBER', 'BeginOffset': 320, 'EndOffset': 330}, {'Score': 0.999988317489624, 'Type': 'BANK_ROUTING', 'BeginOffset': 355, 'EndOffset': 364}, {'Score': 0.9999522566795349, 'Type': 'ADDRESS', 'BeginOffset': 406, 'EndOffset': 441}, {'Score': 0.9999591112136841, 'Type': 'PHONE', 'BeginOffset': 525, 'EndOffset': 537}, {'Score': 0.999980092048645, 'Type': 'PHONE', 'BeginOffset': 633, 'EndOffset': 645}, {'Score': 0.9995272159576416, 'Type': 'EMAIL', 'BeginOffset': 658, 'EndOffset': 680}], 'ResponseMetadata': {'RequestId': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'content-type': 'application/x-amz-json-1.1', 'content-length': '827', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}
contains_pii: {'Labels': [{'Name': 'DATE_TIME', 'Score': 0.9986850023269653}, {'Name': 'EMAIL', 'Score': 0.9985549449920654}, {'Name': 'BANK_ACCOUNT_NUMBER', 'Score': 0.8221991658210754}, {'Name': 'BANK_ROUTING', 'Score': 0.6654205918312073}, {'Name': 'CREDIT_DEBIT_NUMBER', 'Score': 1.0}, {'Name': 'PHONE', 'Score': 1.0}], 'ResponseMetadata': {'RequestId': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'content-type': 'application/x-amz-json-1.1', 'content-length': '285', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}
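To make the gap between the two responses explicit, the entity types can be diffed against the label names. This is only a rough sketch: the dictionaries are trimmed copies of the two responses above (types and names only), and `entity_types` and `label_names` are hypothetical helper names, not part of boto3.

```python
# Trimmed copies of the two responses shown above (only the type/name fields).
detect_response = {
    "Entities": [
        {"Type": "NAME"}, {"Type": "NAME"}, {"Type": "CREDIT_DEBIT_NUMBER"},
        {"Type": "DATE_TIME"}, {"Type": "BANK_ACCOUNT_NUMBER"},
        {"Type": "BANK_ROUTING"}, {"Type": "ADDRESS"}, {"Type": "PHONE"},
        {"Type": "PHONE"}, {"Type": "EMAIL"},
    ]
}
contains_response = {
    "Labels": [
        {"Name": "DATE_TIME"}, {"Name": "EMAIL"}, {"Name": "BANK_ACCOUNT_NUMBER"},
        {"Name": "BANK_ROUTING"}, {"Name": "CREDIT_DEBIT_NUMBER"}, {"Name": "PHONE"},
    ]
}


def entity_types(response):
    """Distinct 'Type' values from a detect_pii_entities-style response."""
    return {entity["Type"] for entity in response.get("Entities", [])}


def label_names(response):
    """Distinct 'Name' values from a contains_pii_entities-style response."""
    return {label["Name"] for label in response.get("Labels", [])}


# Types reported as entities but absent from the document-level labels.
missing_from_labels = entity_types(detect_response) - label_names(contains_response)
print(sorted(missing_from_labels))  # ['ADDRESS', 'NAME']
```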
In the second response, NAME and ADDRESS are missing, and possibly some other PII labels as well. How do I get them using contains_pii_entities? The documentation suggests that NAME and ADDRESS should be available, and the Comprehend API on the AWS console gives me back all PII labels.
Output on AWS console:
{
  "Labels": [
    {
      "Name": "EMAIL",
      "Score": 1
    },
    {
      "Name": "DATE_TIME",
      "Score": 1
    },
    {
      "Name": "NAME",
      "Score": 0.8311530351638794
    },
    {
      "Name": "BANK_ROUTING",
      "Score": 0.7879412174224854
    },
    {
      "Name": "ADDRESS",
      "Score": 0.6723417043685913
    },
    {
      "Name": "BANK_ACCOUNT_NUMBER",
      "Score": 0.6297846436500549
    },
    {
      "Name": "CREDIT_DEBIT_NUMBER",
      "Score": 1
    },
    {
      "Name": "PHONE",
      "Score": 1
    }
  ]
}
Not sure what I am missing while using the boto3 package. boto3 version used: 1.18.12
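For reference, here is a small sketch that lines up the label scores from the boto3 response with the ones from the console. The scores are copied from the two outputs above; `compare_labels` and `fmt` are hypothetical helper names used only for this illustration, and the comparison says nothing about why the two results differ.

```python
# Label scores copied from the boto3 contains_pii_entities output above.
boto3_labels = {
    "DATE_TIME": 0.9986850023269653,
    "EMAIL": 0.9985549449920654,
    "BANK_ACCOUNT_NUMBER": 0.8221991658210754,
    "BANK_ROUTING": 0.6654205918312073,
    "CREDIT_DEBIT_NUMBER": 1.0,
    "PHONE": 1.0,
}

# Label scores copied from the AWS console output above.
console_labels = {
    "EMAIL": 1,
    "DATE_TIME": 1,
    "NAME": 0.8311530351638794,
    "BANK_ROUTING": 0.7879412174224854,
    "ADDRESS": 0.6723417043685913,
    "BANK_ACCOUNT_NUMBER": 0.6297846436500549,
    "CREDIT_DEBIT_NUMBER": 1,
    "PHONE": 1,
}


def compare_labels(sdk, console):
    """Return (name, sdk_score, console_score) rows; None marks a missing label."""
    return [
        (name, sdk.get(name), console.get(name))
        for name in sorted(set(sdk) | set(console))
    ]


def fmt(score):
    """Format a score for display, using '-' when the label is absent."""
    return "-" if score is None else f"{score:.4f}"


rows = compare_labels(boto3_labels, console_labels)
for name, sdk_score, console_score in rows:
    print(f"{name:<22} {fmt(sdk_score):>8} {fmt(console_score):>8}")

# Labels the console reports that the boto3 call did not return.
only_in_console = [name for name, sdk_score, _ in rows if sdk_score is None]
print(only_in_console)  # ['ADDRESS', 'NAME']
```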