Requirement: capture the time each record enters Kafka, write that time into the JSON payload, and have Flume store the data in HDFS bucketed by that time.
Approach:
1. Use the timestamp interceptor to capture the Kafka ingestion time: failed!
Pipeline: Kafka source, timestamp interceptor, file channel, HDFS sink.
Reason: the timestamp interceptor stamps each event with the current wall-clock time as the event is handed toward the channel, i.e. Flume's processing time; once the Kafka source has pulled the data out of Kafka, nothing in the event header records when that data actually entered Kafka. The attempted setup is sketched below.
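For reference, the attempted configuration looked roughly like this (a sketch only: agent and component names, the topic, and the namenode address are placeholders, and the broker addresses are masked as in the original):

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = ****:9092,****:9092,****:9092
a1.sources.r1.kafka.topics = topic
a1.sources.r1.kafka.consumer.group.id = flumesource
# the timestamp interceptor stamps the moment the event passes through Flume,
# not the moment the record entered Kafka
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.channels = c1

a1.channels.c1.type = file

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d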
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.HashMap;
import java.util.Properties;

public class FlumeSourceDemo extends AbstractSource implements Configurable, PollableSource {

    private static final Logger logger = LoggerFactory.getLogger(FlumeSourceDemo.class);

    private String GROUP_ID;
    private String KAFKA_SERVER;
    private String KAFKA_TOPIC;
    private String KEY_DES;
    private String VALUE_DES;
    private Properties props;
    private KafkaConsumer<String, String> consumer;

    // source data-processing logic
    @Override
    public Status process() throws EventDeliveryException {
        try {
            // poll Kafka with a 1000 ms (1 s) timeout
            // (kafka-clients 3.x removed poll(long); use poll(Duration.ofMillis(1000)) there)
            ConsumerRecords<String, String> records = consumer.poll(1000);
            SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            for (ConsumerRecord<String, String> record : records) {
                String value = record.value();
                // parse the record value into a JSONObject
                JSONObject jsonObject = JSON.parseObject(value);
                // format the record's Kafka timestamp (epoch millis)
                String timestamp = format.format(record.timestamp());
                // pull out the action field for the event header
                String action = jsonObject.getString("action");
                // put the formatted Kafka time into the JSON payload
                jsonObject.put("kafkaTime", timestamp);
                String body = jsonObject.toString();
                logger.debug("record timestamp={}, enriched body={}", record.timestamp(), body);
                // build one event per record (reusing a single event object across records is unsafe)
                SimpleEvent event = new SimpleEvent();
                event.setBody(body.getBytes(StandardCharsets.UTF_8));
                // the "timestamp" header (epoch millis as a string) drives the HDFS sink's time escapes
                HashMap<String, String> headerMap = new HashMap<>();
                headerMap.put("timestamp", Long.toString(record.timestamp()));
                headerMap.put("action", action);
                event.setHeaders(headerMap);
                // hand the event to the channel
                this.getChannelProcessor().processEvent(event);
            }
            return Status.READY;
        } catch (Exception e) {
            logger.error("failed to process Kafka records", e);
            return Status.BACKOFF;
        }
    }

    // source initialization: read connection settings from the agent configuration
    @Override
    public void configure(Context context) {
        // property names follow this source's own convention; defaults keep the original values
        KAFKA_SERVER = context.getString("kafka.bootstrap.servers", "****:9092,****:9092,****:9092");
        KAFKA_TOPIC = context.getString("kafka.topic", "topic");
        GROUP_ID = context.getString("kafka.group.id", "flumesource");
        KEY_DES = "org.apache.kafka.common.serialization.StringDeserializer";
        VALUE_DES = "org.apache.kafka.common.serialization.StringDeserializer";
    }

    @Override
    public synchronized void start() {
        super.start();
        props = new Properties();
        props.setProperty("bootstrap.servers", KAFKA_SERVER);
        props.setProperty("group.id", GROUP_ID);
        props.setProperty("key.deserializer", KEY_DES);
        props.setProperty("value.deserializer", VALUE_DES);
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singleton(KAFKA_TOPIC));
    }

    @Override
    public synchronized void stop() {
        // release the Kafka consumer before shutting the source down
        if (consumer != null) {
            consumer.close();
        }
        super.stop();
    }

    @Override
    public long getBackOffSleepIncrement() {
        // grow the sleep by 1 s per consecutive BACKOFF (returning 0 would busy-spin on errors)
        return 1000;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        // cap the backoff sleep at 5 s
        return 5000;
    }
}
Each record's Kafka-side timestamp travels with it in the ConsumerRecord; the whole job of the custom source is to read the timestamp field off each ConsumerRecord.
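One caveat about what record.timestamp() actually holds: it is the broker's arrival time only when the topic's timestamp type is LogAppendTime; Kafka's default is CreateTime, which is stamped by the producer. If "the time the data entered Kafka" must mean the broker clock, the topic can be switched over, for example (command syntax for a recent Kafka; older releases used --zookeeper instead of --bootstrap-server, and the broker address is masked like the others):

kafka-configs.sh --bootstrap-server ****:9092 --alter \
  --entity-type topics --entity-name topic \
  --add-config message.timestamp.type=LogAppendTime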
Note: the timestamp put into event.headers uses key "timestamp" and value Long.toString(record.timestamp()).
record.timestamp() returns a long; when writing the time into the header, convert the value with Long.toString, not String.valueOf.
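Putting it all together, a minimal agent configuration that wires this custom source to a file channel and an HDFS sink could look like the sketch below. Assumptions: the class is packaged as com.example.FlumeSourceDemo and its jar is on Flume's classpath; the kafka.* property names are the ones read in configure() above; paths and the namenode address are placeholders. Because the source sets the "timestamp" header from record.timestamp(), the %Y-%m-%d escapes in hdfs.path bucket files by Kafka ingestion time rather than by write time:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# custom source: fully-qualified class name (the com.example package is an assumption)
a1.sources.r1.type = com.example.FlumeSourceDemo
a1.sources.r1.kafka.bootstrap.servers = ****:9092,****:9092,****:9092
a1.sources.r1.kafka.topic = topic
a1.sources.r1.kafka.group.id = flumesource
a1.sources.r1.channels = c1

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
a1.channels.c1.dataDirs = /opt/flume/data

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# time escapes resolve from the event's "timestamp" header (epoch millis)
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = false
a1.sinks.k1.hdfs.fileType = DataStream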