hadoop - Hive: loading a CSV whose quoted fields contain commas
I am trying to load a CSV file into a Hive table like this:
CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;
The CSV is delimited by commas (,) and looks like this:
1, "some text, with comma in it", 123, "more text"
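Because the first string contains a ',', this returns corrupted data.
Is there a way to set a text delimiter (quote character), or to make Hive ignore the ',' inside strings?
I can't change the delimiter of the CSV, since it is pulled from an external source.
6 Answers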
33 votes
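The problem is that Hive's delimited row format does not handle quoted text. You either need to pre-process the data by changing the delimiter between the fields (e.g. with a Hadoop streaming job), or you can try a custom CSV SerDe that uses OpenCSV to parse the files.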
Lorand Bendig answered 2020-06-23T22:25:33Z
32 votes
If you can re-create or parse your input data, you can specify an escape character in the CREATE TABLE statement:
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';
This will accept the following line as 4 fields:
1,some text\, with comma in it,123,more text
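As a rough sketch, this is how the clause might look when applied to the table from the question. The table name mytable_escaped and the path /data_escaped.csv are hypothetical, and the sketch assumes the file has already been rewritten so that embedded commas are backslash-escaped and the surrounding quotes are removed:

```sql
-- Sketch only: mytable_escaped and /data_escaped.csv are illustrative names.
-- Assumes the input was rewritten so that commas inside a field appear as "\,"
-- and fields are no longer wrapped in double quotes.
CREATE TABLE mytable_escaped (
  num1  INT,
  text1 STRING,
  num2  INT,
  text2 STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/data_escaped.csv'
OVERWRITE INTO TABLE mytable_escaped;
```

This only helps if you control an intermediate copy of the data; the original quoted file would still be split at every comma.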
libjack answered 2020-06-23T22:25:57Z
22 votes
As of Hive 0.14, the CSV SerDe is a standard part of the Hive installation:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
(See: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde)
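A minimal sketch of a full table definition using this SerDe for the data in the question might look like the following. The names mytable_csv and mytable_typed are made up for illustration. The SerDe exposes every column as STRING, so the sketch adds a view that casts the numeric columns back to INT. The trim() calls are a precaution for the spaces after the commas in the sample row; I have not verified how the SerDe itself treats that whitespace.

```sql
-- Sketch only: mytable_csv and mytable_typed are illustrative names.
-- OpenCSVSerde reads all columns as STRING regardless of the declared type,
-- so the columns are declared as STRING here and cast in a view below.
CREATE TABLE mytable_csv (
  num1  STRING,
  text1 STRING,
  num2  STRING,
  text2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable_csv;

-- Typed view over the all-STRING table; trim() guards against stray spaces
-- around the values before casting.
CREATE VIEW mytable_typed AS
SELECT
  CAST(trim(num1) AS INT) AS num1,
  trim(text1)             AS text1,
  CAST(trim(num2) AS INT) AS num2,
  trim(text2)             AS text2
FROM mytable_csv;
```

The three SERDEPROPERTIES shown are the SerDe's documented defaults, so they could be omitted; they are spelled out here to make the quote handling explicit.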
wrschneider answered 2020-06-23T22:26:26Z
0 votes
Keeping the delimiter in single quotes will work.
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
This will work.
suyash answered 2020-06-23T22:26:50Z
0 votes
Add a backslash to the delimiter in FIELDS TERMINATED BY '\;'
For example:
CREATE TABLE demo_table_1_csv
COMMENT 'my_csv_table 1'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\;'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'your_hdfs_path'
AS
select a.tran_uuid,a.cust_id,a.risk_flag,a.lookback_start_date,a.lookback_end_date,b.scn_name,b.alerted_risk_category,
CASE WHEN (b.activity_id is not null ) THEN 1 ELSE 0 END as Alert_Flag
FROM scn1_rcc1_agg as a LEFT OUTER JOIN scenario_activity_alert as b ON a.tran_uuid = b.activity_id;
I have tested it, and it works well.
Mantej Singh answered 2020-06-23T22:27:19Z
0 votes
The org.apache.hadoop.hive.serde2.OpenCSVSerde SerDe worked for me. My delimiter was '|' and one of the columns was enclosed in double quotes.
Query:
CREATE EXTERNAL TABLE EMAIL(MESSAGE_ID STRING, TEXT STRING, TO_ADDRS STRING, FROM_ADDRS STRING, SUBJECT STRING, DATE STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "|",
"quoteChar" = "\"",
"escapeChar" = "\""
)
STORED AS TEXTFILE location '/user/abc/csv_folder';
vivesh saladi answered 2020-06-23T22:27:43Z