AnalyzeInvoices function in SynapseML
I am trying to understand the AnalyzeInvoices function in SynapseML and I have a few questions.
What is the difference between setImageUrlCol("source") and setImageBytesCol("data"), and when should I use one over the other? What does "source" mean here?
I am trying to scan a set of invoice .jpeg files and want to flatten the data.
What should the output look like here?
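# Import path is an assumption; it varies across SynapseML versions.
from synapse.ml.cognitive import AnalyzeInvoices
from pyspark.sql.functions import col, explode

# Configure the transformer: read image URLs from "source", write results to "invoices".
analyzeInvoices = (AnalyzeInvoices()
    .setSubscriptionKey(cognitiveKey)
    .setLocation("eastus")
    .setImageUrlCol("source")
    .setOutputCol("invoices")
    .setConcurrency(5))

# One output row per recognized document, keeping the originating URL.
(analyzeInvoices
    .transform(imageDf)
    .withColumn("documents", explode(col("invoices.analyzeResult.documentResults.fields")))
    .select("source", "documents")).show()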
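For reference, the explode step above should yield one row per recognized document, each carrying a map of field names to field details. The following is only a pure-Python sketch of that flattening, run on a hypothetical, heavily trimmed payload. The field names (VendorName, InvoiceTotal), the value keys, and the helper functions are assumptions made for illustration, not output captured from the service.

```python
# Hypothetical, trimmed result for one invoice. The nesting mirrors the path
# used in the explode() call; field names and values are invented.
sample_invoice = {
    "analyzeResult": {
        "documentResults": [
            {
                "docType": "prebuilt:invoice",
                "fields": {
                    "VendorName": {"type": "string", "valueString": "Contoso", "confidence": 0.98},
                    "InvoiceTotal": {"type": "number", "valueNumber": 110.0, "confidence": 0.95},
                },
            }
        ]
    }
}


def field_value(field):
    """Return the typed value of a field, falling back to its raw text."""
    for key, value in field.items():
        if key.startswith("value"):
            return value
    return field.get("text")


def flatten_invoice(source, invoice):
    """Return one flat dict per recognized document in the result."""
    rows = []
    for doc in invoice["analyzeResult"]["documentResults"]:
        row = {"source": source}
        for name, field in doc["fields"].items():
            row[name] = field_value(field)
        rows.append(row)
    return rows


rows = flatten_invoice("https://example.com/invoice1.jpeg", sample_invoice)
for row in rows:
    print(row)
```

In Spark the same idea would be expressed by selecting individual keys out of the "documents" map column rather than looping in Python.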
Comments (1)
setImageBytesCol("data") - Sets the name of the input column that holds the raw bytes of each image (for example, the contents of a JPG file). The transformer sends those bytes to the service for you, so you should not need to convert them to base64 yourself.
setImageUrlCol("source") - Sets the name of the input column that holds the URL of each image to be analyzed.
"source" is not a keyword. It is simply the name of the column in imageDf that lists the image locations, one URL per row.
Use setImageUrlCol when the invoices are reachable by URL, and setImageBytesCol when you have already loaded the files into a binary column. Normally you set one or the other, not both.
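To make the two input shapes concrete, here is a minimal sketch of the rows each setter expects. The helper names (url_rows, bytes_rows, to_dataframes), the example URL, and the placeholder file are hypothetical. to_dataframes assumes an existing SparkSession and is not called here, so the demo part runs without Spark.

```python
import os
import tempfile


def url_rows(urls):
    """One row per invoice; the single column holds a URL string."""
    return [(u,) for u in urls]


def bytes_rows(paths):
    """One row per invoice; the single column holds the file's raw bytes."""
    rows = []
    for p in paths:
        with open(p, "rb") as f:
            rows.append((bytearray(f.read()),))
    return rows


def to_dataframes(spark, urls, paths):
    """Build both input shapes; needs an active SparkSession (not created here)."""
    url_df = spark.createDataFrame(url_rows(urls), ["source"])    # string column
    bytes_df = spark.createDataFrame(bytes_rows(paths), ["data"])  # binary column
    return url_df, bytes_df


# Demo without Spark: write a placeholder file and build both row shapes.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "invoice1.jpeg")
    with open(demo_path, "wb") as f:
        f.write(b"\xff\xd8\xff")  # JPEG magic bytes only, not a real image
    demo_url_rows = url_rows(["https://example.com/invoice1.jpeg"])
    demo_bytes_rows = bytes_rows([demo_path])

print(demo_url_rows)
print(demo_bytes_rows)
```

With DataFrames shaped like these, url_df would pair with setImageUrlCol("source") and bytes_df with setImageBytesCol("data").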
https://mmlspark.blob.core.windows.net/docs/1.0.0-rc1/pyspark/_modules/mmlspark/cognitive/AnalyzeImage.html
https://microsoft.github.io/SynapseML/docs/features/cognitive_services/CognitiveServices%20-%20Overview/