Can we read Parquet files in Hazelcast Jet?

Published 2025-01-26 17:49:40


I am trying to read a Parquet file via Hazelcast Jet. The code below works fine, but does Hazelcast provide any built-in source to read Parquet files?

import com.hazelcast.jet.pipeline.BatchSource;
import com.hazelcast.jet.pipeline.SourceBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

import java.util.HashMap;
import java.util.Map;

// `headers` (String[][] of column names) and `rowcount` are fields
// defined elsewhere in my class.
BatchSource<Object> parquetSource = SourceBuilder
        .batch("parquet-source", ctx -> AvroParquetReader
                .<GenericData.Record>builder(new Path("D:/test/1651070287920.parquet"))
                .build())
        .<Object>fillBufferFn((reader, buf) -> {
            GenericRecord record = reader.read();
            if (record == null) {
                buf.close();   // no more records: signal end of the batch
                return;
            }
            Map<String, String> map = new HashMap<>();
            for (int i = 0; i < headers[0].length; i++) {
                Object value = record.get(i);
                map.put(headers[0][i], value == null ? "" : value.toString());
            }
            rowcount++;
            buf.add(map);
        })
        .destroyFn(ParquetReader::close)   // release the file handle
        .build();

Please let me know if there is already any such source in Hazelcast Jet.


Comments (2)

且行且努力 2025-02-02 17:49:40

Parquet files using Avro for serialization can be read using the Unified File Connector. See also the code sample.

最近可好 2025-02-02 17:49:40

Parquet is supported using the Unified File Connector:

BatchSource<SpecificUser> source = FileSources.files("/data")
  .glob("users.parquet")
  .format(FileFormat.<SpecificUser>parquet())
  .useHadoopForLocalFiles(true)
  .build();

If you don't have a class corresponding to your schema, or your schema is more dynamic and you want the source to return org.apache.avro.generic.GenericRecord, which you can then map to Map<String, String>, you can use the following:

BatchSource<GenericRecord> source = FileSources.files(currentDir + "/target/parquet")
  .glob("file.parquet")
  .option("avro.serialization.data.model", GenericData.class.getName())                                                    
  .useHadoopForLocalFiles(true)
  .format(FileFormat.parquet())
  .build();
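The GenericRecord-to-Map<String, String> conversion mentioned above can be done by iterating the record's fields and converting null values to empty strings. The sketch below is a minimal, hypothetical illustration that uses a plain java.util.Map as a stand-in for org.apache.avro.generic.GenericRecord, so it runs without Avro on the classpath; with a real record you would iterate record.getSchema().getFields() and read each value via record.get(field.name()) instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordToMapDemo {

    // Converts a record's fields to Map<String, String>, turning nulls into "".
    // A plain Map stands in here for an Avro GenericRecord; with Avro, iterate
    // record.getSchema().getFields() and call record.get(field.name()).
    static Map<String, String> toStringMap(Map<String, Object> record) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            Object value = e.getValue();
            out.put(e.getKey(), value == null ? "" : value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("name", null);
        System.out.println(toStringMap(record));
    }
}
```

In a Jet pipeline this conversion would typically go into a map stage after the source, e.g. `pipeline.readFrom(source).map(...)`.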