The AnalyzeInvoices function in SynapseML

Posted on 2025-01-20 04:13:41

I am trying to understand the AnalyzeInvoices function in SynapseML, and I have a few questions.
What is the difference between setImageUrlCol("source") and setImageBytesCol("data"), and when should I use one over the other? What does "source" mean here?
I am trying to scan a set of invoice .jpeg files and want to flatten the data.
What should the output look like here?

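# The import path varies by SynapseML version (synapse.ml.cognitive in older releases).
from synapse.ml.cognitive import AnalyzeInvoices

analyzeInvoices = (AnalyzeInvoices()
        .setSubscriptionKey(cognitiveKey)   # key for the cognitive service resource
        .setLocation("eastus")              # Azure region of that resource
        .setImageUrlCol("source")           # input column holding image URLs
        .setOutputCol("invoices")           # column that receives the analysis result
        .setConcurrency(5))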

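from pyspark.sql.functions import col, explode

# Explode the per-document fields so each recognized document gets its own row.
(analyzeInvoices
        .transform(imageDf)
        .withColumn("documents", explode(col("invoices.analyzeResult.documentResults.fields")))
        .select("source", "documents")).show()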
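
To make the "flatten" goal concrete, here is a minimal plain-Python sketch of what one flattened row could look like. It assumes that each entry in "documents" behaves like a mapping from field name to a result carrying a "type" key, a matching "value<Type>" key (such as "valueString"), and a raw "text" key. The helper flatten_fields and the sample values are hypothetical and are not real service output; the actual schema should be checked with printSchema() on the transformed DataFrame.

```python
from typing import Any, Dict, Optional


def flatten_fields(fields: Dict[str, Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Collapse a field-name -> field-result mapping into one flat row.

    Assumption: each result has a "type" key and a matching "value<Type>"
    key (for example "valueString"); if that key is missing, fall back to
    the raw "text" value.
    """
    row: Dict[str, Any] = {}
    for name, result in fields.items():
        if result is None:
            # A field that was not detected stays empty in the flat row.
            row[name] = None
            continue
        field_type = result.get("type", "")
        value_key = "value" + field_type[:1].upper() + field_type[1:]
        row[name] = result.get(value_key, result.get("text"))
    return row


# Hypothetical sample shaped like one entry of the "documents" column.
sample_fields = {
    "VendorName": {"type": "string", "valueString": "Contoso", "text": "Contoso"},
    "InvoiceTotal": {"type": "number", "valueNumber": 110.0, "text": "$110.00"},
    "InvoiceDate": {"type": "date", "text": "1/15/2024"},
    "PurchaseOrder": None,
}

flat_row = flatten_fields(sample_fields)
print(flat_row)
```

Under these assumptions, each invoice becomes a single record with one key per field name, which is the shape a flattened table would have with one column per field.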
