```
2025-06-11T09:14:50.485+0800 ERROR SplitRunner-20250611_011450_00344_fmh9x.1.1.0-11-2005 com.google.common.util.concurrent.AggregateFuture Got more than one input Future failure. Logging failures after the first
io.trino.spi.TrinoException: Error opening Iceberg split hdfs://bigdata/user/hive/warehouse/chuzuche.db/ods_tocc_passenger_driver/data/00002-0-4232e133-6e41-47dd-b29a-dc3fd7f11dec-00048.parquet (offset=0, length=569): Cannot invoke "org.apache.hadoop.hdfs.BlockReader.available()" because "this.blockReader" is null
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createParquetPageSource(IcebergPageSourceProvider.java:1052)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createDataPageSource(IcebergPageSourceProvider.java:553)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.openDeletes(IcebergPageSourceProvider.java:498)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.lambda$createPageSource$8(IcebergPageSourceProvider.java:395)
	at io.trino.plugin.iceberg.delete.EqualityDeleteFilter$EqualityDeleteFilterBuilder.readEqualityDeletesInternal(EqualityDeleteFilter.java:110)
	at io.trino.plugin.iceberg.delete.EqualityDeleteFilter$EqualityDeleteFilterBuilder.lambda$readEqualityDeletes$0(EqualityDeleteFilter.java:102)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at io.trino.plugin.iceberg.delete.EqualityDeleteFilter$EqualityDeleteFilterBuilder.readEqualityDeletes(EqualityDeleteFilter.java:103)
	at io.trino.plugin.iceberg.delete.DeleteManager.createEqualityDeleteFilter(DeleteManager.java:200)
	at io.trino.plugin.iceberg.delete.DeleteManager.getDeletePredicate(DeleteManager.java:98)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.lambda$createPageSource$9(IcebergPageSourceProvider.java:388)
	at com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:186)
	at io.trino.plugin.iceberg.IcebergPageSource.getNextPage(IcebergPageSource.java:132)
	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:268)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
	at io.trino.operator.Driver.tryWithLock(Driver.java:709)
	at io.trino.operator.Driver.process(Driver.java:298)
	at io.trino.operator.Driver.processForDuration(Driver.java:269)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
	at io.trino.$gen.Trino_451____20250610_092944_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:168)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:155)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.hdfs.BlockReader.available()" because "this.blockReader" is null
	at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1575)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:73)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:73)
	at io.trino.hdfs.FSDataInputStreamTail.readTail(FSDataInputStreamTail.java:67)
	at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:63)
	at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:43)
	at io.trino.filesystem.tracing.TracingInput.lambda$readTail$3(TracingInput.java:81)
	at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
	at io.trino.filesystem.tracing.TracingInput.readTail(TracingInput.java:81)
	at io.trino.plugin.hive.parquet.MemoryParquetDataSource.<init>(MemoryParquetDataSource.java:56)
	at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createDataSource(ParquetPageSourceFactory.java:321)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createParquetPageSource(IcebergPageSourceProvider.java:914)
	... 33 more
```

(The identical exception and stack trace were logged again, at the same timestamp, for `SplitRunner-20250611_011450_00344_fmh9x.1.1.0-7-2010` and `SplitRunner-20250611_011450_00344_fmh9x.1.1.0-10-1999`.)

What is causing this error?
Posted: 2025-08-03 22:13:23
Given the interrupted-HDFS-connection symptoms and how Iceberg reads data, a NullPointerException saying **"blockReader is null"** is usually caused by a missing HDFS client configuration or by a conflict with concurrent file rewrites. The key remedies follow:
---
### **Root causes**
1. **Missing HDFS HA configuration**
   - The Trino nodes did not load the HDFS high-availability configuration (`core-site.xml`/`hdfs-site.xml`), so the client cannot establish a working connection when opening files[^3].
2. **Conflicts with concurrent rewrites**
   - An Iceberg data file is being modified concurrently (e.g. while `rewrite_data_files` rewrites files) at the moment Trino tries to read it[^2].
3. **Interrupted HDFS connections**
   - A DataNode fault or a network interruption (the `ClosedByInterruptException` in [^1] already points to a dropped connection).
---
### **Remedies**
#### **1. Complete the HDFS configuration**
- **Step**: on **every Trino worker**, add to `etc/catalog/iceberg.properties`:
  ```properties
  hive.config.resources=/path/to/core-site.xml,/path/to/hdfs-site.xml
  ```
- **Verify**:
  confirm that `core-site.xml` contains the HA ZooKeeper settings (e.g. `ha.zookeeper.quorum`)[^3].
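A quick way to verify step 1 is to parse the `*-site.xml` files and check that the HA-related properties are actually present. The sketch below is illustrative: the required key names assume a standard HDFS HA setup, and the nameservice name `bigdata` is taken from the `hdfs://bigdata/...` URI in the error message — substitute your own.

```python
import xml.etree.ElementTree as ET

# Properties an HDFS HA client configuration is expected to define.
# "bigdata" is the nameservice from the failing hdfs:// URI (adjust to yours).
REQUIRED_KEYS = [
    "dfs.nameservices",
    "dfs.ha.namenodes.bigdata",
    "dfs.client.failover.proxy.provider.bigdata",
]

def missing_ha_keys(xml_text: str, required=REQUIRED_KEYS) -> list:
    """Return the required HA properties absent from a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    present = {p.findtext("name") for p in root.iter("property")}
    return [k for k in required if k not in present]

# Sample config that is missing the failover proxy provider:
sample = """<configuration>
  <property><name>dfs.nameservices</name><value>bigdata</value></property>
  <property><name>dfs.ha.namenodes.bigdata</name><value>nn1,nn2</value></property>
</configuration>"""

print(missing_ha_keys(sample))
```

In practice you would call `missing_ha_keys(open("/path/to/hdfs-site.xml").read())` on each worker and expect an empty list.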
#### **2. Avoid conflicts with file rewrites**
- **Pause background maintenance**:
  avoid running rewrite jobs such as the Spark `rewrite_data_files` procedure during Trino query peaks[^2]:
  ```sql
  CALL system.rewrite_data_files(table => 'db.table', ...)
  ```
- **Retry policy**:
  enable query-level retries via Trino's fault-tolerant execution (property names below are those used by recent Trino releases; a restart is required):
  ```properties
  # etc/config.properties
  retry-policy=QUERY
  retry-initial-delay=10s
  ```
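If cluster-level retries cannot be enabled, a transient error like this one can also be retried from the client side. The sketch below is a generic exponential-backoff wrapper, not Trino-specific; `flaky_query` is a hypothetical stand-in for a query submission that fails twice before succeeding.

```python
import time

def run_with_retries(fn, attempts=3, base_delay=1.0, retryable=(RuntimeError,)):
    """Run fn(), retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == attempts:
                raise  # exhausted: surface the last failure
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical stand-in for a query that hits the NPE twice, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("blockReader is null")
    return "ok"

print(run_with_retries(flaky_query, base_delay=0.01))  # succeeds on attempt 3
```

This is only a stop-gap: retries mask the symptom, so the configuration and HDFS-health checks above are still the real fix.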
#### **3. Check HDFS health**
- **DataNode logs**:
  inspect the HDFS DataNode logs for disk failures or network timeouts.
- **Force lease recovery** (emergency measure):
  if the file is confirmed to be unused, release its lease with:
  ```bash
  hdfs debug recoverLease -path <hdfs_file_path> -retries 3
  ```
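When lease recovery has to be run over many stuck files, it can help to script the command. This is a minimal sketch that only builds and prints the `hdfs debug recoverLease` invocation; actually running it requires a node with an HDFS client configured (e.g. via `subprocess.run(cmd, check=True)`), and the sample path is the file from the error log.

```python
def recover_lease_cmd(path: str, retries: int = 3) -> list:
    """Build the `hdfs debug recoverLease` command for a stuck file."""
    return ["hdfs", "debug", "recoverLease", "-path", path, "-retries", str(retries)]

cmd = recover_lease_cmd(
    "hdfs://bigdata/user/hive/warehouse/chuzuche.db/ods_tocc_passenger_driver"
    "/data/00002-0-4232e133-6e41-47dd-b29a-dc3fd7f11dec-00048.parquet"
)
# On a node with HDFS client access: subprocess.run(cmd, check=True)
print(" ".join(cmd))
```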
#### **4. Memory tuning**
- **Adjust the JVM configuration**[^5]:
  add direct (off-heap) memory headroom in `etc/jvm.config` so GC pressure does not interrupt HDFS connections:
  ```ini
  -XX:MaxDirectMemorySize=4G
  -XX:+UseG1GC
  ```
---
### **Diagnostic flow**
```mermaid
graph TD
    A[NullPointerException 'blockReader is null'] --> B{Check configuration}
    B -->|HDFS HA config missing| C[Add core-site.xml/hdfs-site.xml]
    B -->|Config correct| D{Check for concurrent rewrites}
    D -->|File being rewritten| E[Pause rewrite_data_files jobs]
    D -->|No rewrite| F[Inspect HDFS DataNode logs]
    F -->|Disk/network fault| G[Repair the HDFS cluster]
    F -->|HDFS healthy| H[Increase JVM direct memory]
```
---
### **Technical notes**
- Trino's Iceberg connector reads Parquet/ORC data files through the HDFS client[^4]; when the underlying connection is broken or misconfigured, `DFSInputStream.seek` runs before a `blockReader` has been initialized, which produces exactly this NullPointerException.
- HDFS connections are opened lazily on first read, which is why a missing HA configuration only surfaces once a query actually touches the table.
> **Follow-up**: consider upgrading to a recent Trino release; the Iceberg connector's handling of concurrent reads of data and delete files has continued to receive fixes.

Work through the steps above in order, starting with **HDFS configuration completeness**. If the problem persists, check HDFS permissions (e.g. Kerberos authentication for the Trino service user) and whether the Iceberg metadata snapshot still matches the files on HDFS[^1][^4].