Can we read Parquet files in Hazelcast Jet?

Published 2025-01-26 17:49:40


I am trying to read a Parquet file via Hazelcast Jet. The code below works fine, but does Hazelcast provide any built-in source to read Parquet files?

import com.hazelcast.jet.pipeline.BatchSource;
import com.hazelcast.jet.pipeline.SourceBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

import java.util.HashMap;
import java.util.Map;

// `headers` (String[][] of column names) and `rowcount` are fields
// defined elsewhere in my class.
BatchSource<Object> parquetSource = SourceBuilder
        .batch("parquet-source", ctx -> AvroParquetReader
                .<GenericData.Record>builder(new Path("D:/test/1651070287920.parquet"))
                .build())
        .<Object>fillBufferFn((reader, buf) -> {
            GenericRecord record = reader.read();
            if (record == null) {
                buf.close();   // no more records: signal end of the batch
                return;
            }
            Map<String, String> map = new HashMap<>();
            for (int i = 0; i < headers[0].length; i++) {
                Object value = record.get(i);
                map.put(headers[0][i], value == null ? "" : value.toString());
            }
            rowcount++;
            buf.add(map);
        })
        .destroyFn(ParquetReader::close)   // release the file handle
        .build();

Please let me know if there is already any such source in Hazelcast Jet.


Comments (2)

且行且努力 2025-02-02 17:49:40

Parquet files using Avro for serialization can be read using the Unified File Connector. See also the code sample.

最近可好 2025-02-02 17:49:40

Parquet is supported using the Unified File Connector:

BatchSource<SpecificUser> source = FileSources.files("/data")
  .glob("users.parquet")
  .format(FileFormat.<SpecificUser>parquet())
  .useHadoopForLocalFiles(true)
  .build();

If you don't have a class corresponding to your schema, or your schema is more dynamic and you want the source to return org.apache.avro.generic.GenericRecord, which you can then map to Map<String, String>, you can use the following:

BatchSource<GenericRecord> source = FileSources.files(currentDir + "/target/parquet")
  .glob("file.parquet")
  .option("avro.serialization.data.model", GenericData.class.getName())                                                    
  .useHadoopForLocalFiles(true)
  .format(FileFormat.parquet())
  .build();
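The GenericRecord-to-Map<String, String> conversion mentioned above can be done by iterating the record's fields and converting null values to empty strings. The sketch below is a minimal, hypothetical illustration that uses a plain java.util.Map as a stand-in for org.apache.avro.generic.GenericRecord, so it runs without Avro on the classpath; with a real record you would iterate record.getSchema().getFields() and read each value via record.get(field.name()) instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordToMapDemo {

    // Converts a record's fields to Map<String, String>, turning nulls into "".
    // A plain Map stands in here for an Avro GenericRecord; with Avro, iterate
    // record.getSchema().getFields() and call record.get(field.name()).
    static Map<String, String> toStringMap(Map<String, Object> record) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            Object value = e.getValue();
            out.put(e.getKey(), value == null ? "" : value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("name", null);
        System.out.println(toStringMap(record));
    }
}
```

In a Jet pipeline this conversion would typically go into a map stage after the source, e.g. `pipeline.readFrom(source).map(...)`.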