Loading JSON file with repeating elements into hive table

Multi tool use
Loading JSON file with repeating elements into hive table
Given this simple JSON file:
{
"EVT": {
"EVT_ID": "12345",
"LINES": {
"LINE": {
"LINE_NUM" : 1,
"AMT" : 100,
"EVT_DT" : "2018-01-01"
},
"LINE": {
"LINE_NUM" : 2,
"AMT" : 150,
"EVT_DT" : "2018-01-02"
}
}
}
}
We need to load that into a hive table. The ultimate goal is to flatten the json, something like this:
+--------+----------+-----+------------+
| EVT_ID | Line_Num | Amt | Evt_Dt |
+--------+----------+-----+------------+
| 12345 | 1 | 100 | 2018-01-01 |
| 12345 | 2 | 150 | 2018-01-02 |
+--------+----------+-----+------------+
Here's my current DDL for the table:
create table foo.bar (
`EVT` struct<
`EVT_ID`:string,
`LINES`:struct<
LINE: struct<`LINE_NUM`: int,`AMT`:int,`EVT_DT`:string>
>
>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
It seems like the second "line" is overwriting the first. A simple select * from the table returns;
{"evt_id":"12345","lines":{"line":{"line_num":2,"amt":150,"evt_dt":"2018-01-02"}}}
What am I doing wrong?
1 Answer
1
The JSON and table definition are wrong. "Repeating elements" is an Array. LINES should be array<struct>
, not struct<struct>
(note square brackets):
array<struct>
struct<struct>
{
"EVT": {
"EVT_ID": "12345",
"LINES": [
{
"LINE_NUM" : 1,
"AMT" : 100,
"EVT_DT" : "2018-01-01"
},
{
"LINE_NUM" : 2,
"AMT" : 150,
"EVT_DT" : "2018-01-02"
}
]
}
}
And you do not need this "LINE":
also, because it is just an array element
"LINE":
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Argh, I should have known that, thanks! Of course that's not working for me either, but that's a job for another question.
– Andrew
Jul 2 at 16:29