Skip to main content



Hive Parse JSON with Array Columns and Explode it in to Multiple rows.


 Say we have a JSON String like below - 

{
"billingCountry":"US"
"orderItems":[
      {
         "itemId":1,
         "product":"D1"
      },
  {
         "itemId":2,
         "product":"D2"
      }
   ]
}

And, our aim is to get output parsed like below - 

itemId

product

1

D1

2

D2

 

First, We can parse JSON as follows to get JSON String

  • get_json_object(value, '$.orderItems.itemId') as itemId
  • get_json_object(value, '$.orderItems.product') as product
Second, Above will result String value like "[1,2]". We want to convert it to Array as follows -
  • split(regexp_extract(get_json_object(value, '$.orderItems.itemId'),'^\\["(.*)\\"]$',1),'","') as itemId
  • split(regexp_extract(get_json_object(value, '$.orderItems.product'),'^\\["(.*)\\"]$',1),'","') as product
Third, We need to explode the multiple array columns to get final result. This can be done as follows-
  • lateral view posexplode(product) tf as pos,val
Fourth, Now we can select individual elements of array using index / position "pos"
  • product[pos] as product,
  • itemId[pos] as itemId

Summary SQL - 

select
product[pos] as product,
itemId[pos] as itemId
from(
   SELECT 
    split(regexp_extract(get_json_object(value, '$.orderItems.product'),'^\\["(.*)\\"]$',1),'","') as product, split(regexp_extract(get_json_object(value, '$.orderItems.itemId'),'^\\["(.*)\\"]$',1),'","') as itemId 
from test_json) T1
lateral view posexplode(product) tf as pos,val

Comments