Java – How do I generate JSON objects from unstructured data in Hadoop MR?


I have a dataset of parent/child pairs:
parent,child
------------
a,b
a,c
b,d
b,e
c,f
c,g
g,h
g,i
p,q
p,r
q,s
q,t
I want to convert this into a JSON object, but I don't know the right way to do it. I drew the dataset as a tree structure, which might help. Can you advise me on how to achieve this?

My problem is how to identify the root nodes when the data contains two separate trees, as it does here. Please suggest what I should do.

The output should be:

{
    a:{
            b: {d,e},
            c: { g: {h,i}, f }
        },
    p:{     q:{s,t}, r }
}

Solution

First, find all the roots. An existing library could probably do this (I don't know which one) by building the tree structure from the input pairs, but there is a simpler way. Create two sets, one with all the parents and one with all the children, then subtract the children from the parents: whatever remains is the set of roots.
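The set difference described above can be sketched in a few lines of Python. This is only an illustration: the name `find_roots` is made up for this example, and the `pairs` list is the dataset from the question as implied by the expected output.

```python
def find_roots(pairs):
    """Return the nodes that appear as a parent but never as a child."""
    parents = {parent for parent, _ in pairs}
    children = {child for _, child in pairs}
    # Roots are parents that are nobody's child.
    return sorted(parents - children)


# The (parent, child) pairs from the question.
pairs = [
    ("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
    ("c", "f"), ("c", "g"), ("g", "h"), ("g", "i"),
    ("p", "q"), ("p", "r"), ("q", "s"), ("q", "t"),
]

print(find_roots(pairs))  # ['a', 'p']
```

Sorting is only there to make the result deterministic; a plain set works just as well if order does not matter.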

Then, starting from each root, you can build a dictionary and convert it to JSON (sorry, I'm not good at Java, so I'll write this in pseudocode).

tree = {}
for r in roots:
    # each root maps to its subtree: a dictionary, or None if it has no children
    tree[r] = build_tree(r, pairs)

So build_tree() should be recursive: at each step it creates a new dictionary whose keys are the children of the input parent, and each value is either the result of build_tree() for that child or null if that child has no children of its own.
Finally, serialize the dictionary to JSON; there should certainly be a library for that.
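A minimal Python sketch of the recursive step and the final serialization might look like the following. It keeps the `build_tree(parent, pairs)` shape from the pseudocode; `pairs_to_json` is a hypothetical helper name. The sketch runs over an in-memory list of pairs and assumes the data has no cycles; how the pairs are gathered inside a MapReduce job is not covered here.

```python
import json


def build_tree(parent, pairs):
    """Return {child: subtree} for `parent`, or None if it has no children."""
    subtree = {
        child: build_tree(child, pairs)
        for p, child in pairs
        if p == parent
    }
    return subtree or None


def pairs_to_json(pairs):
    """Build one nested dictionary per root and serialize the result."""
    parents = {p for p, _ in pairs}
    children = {c for _, c in pairs}
    roots = sorted(parents - children)
    forest = {root: build_tree(root, pairs) for root in roots}
    return json.dumps(forest, indent=2)


pairs = [
    ("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
    ("c", "f"), ("c", "g"), ("g", "h"), ("g", "i"),
    ("p", "q"), ("p", "r"), ("q", "s"), ("q", "t"),
]
print(pairs_to_json(pairs))
```

Leaves come out as `null` in the JSON, since valid JSON objects need a value for every key. Rescanning `pairs` on every call is quadratic in the worst case; for large inputs, indexing the children by parent once up front would be the more sensible design.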

Note: the above assumes the data contains no cycles. If it does, you need a more complex algorithm and must decide how to handle them, for example by stopping the recursion and emitting some kind of reference back to the start of the cycle.
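One possible way to do that, sketched in Python under assumptions of my own: track the chain of ancestors during recursion, and when a child is already on that chain, stop and store a marker instead of recursing. The function name `build_tree_safe` and the `{"$ref": ...}` marker are invented for this example, not an established convention.

```python
def build_tree_safe(parent, pairs, path=()):
    """Like build_tree, but stops at back edges instead of recursing forever."""
    path = path + (parent,)  # ancestors of the node being expanded, inclusive
    subtree = {}
    for p, child in pairs:
        if p != parent:
            continue
        if child in path:
            # Back edge: the child is one of our own ancestors.
            # Record a reference to it rather than expanding it again.
            subtree[child] = {"$ref": child}
        else:
            subtree[child] = build_tree_safe(child, pairs, path)
    return subtree or None


cyclic_pairs = [("a", "b"), ("b", "c"), ("c", "a")]
print(build_tree_safe("a", cyclic_pairs))
# {'b': {'c': {'a': {'$ref': 'a'}}}}
```

A component that consists entirely of a cycle has no root under the parents-minus-children rule, so the starting node for such a component has to be chosen some other way.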
