Hadoop – Analyzing Log Files (Java)
The log file looks like this:
Time stamp,activity,-,User,-,id,-,data
2013-01-08T16:21:35.561+0100,reminder,-,User1234,-,131235467,-,-
2013-01-02T15:57:24.024+0100,order,-,User1234,-,-,-,{items:[{"prd":"131235467","count": 5, "amount": 11.6},{"prd": "13123545", "count": 1, "amount": 55.99}], oid: 5556}
2013-01-08T16:21:35.561+0100,login,-,User45687,-,143435467,-,-
2013-01-08T16:21:35.561+0100,reminder,-,User45687,-,143435467,-,-
2013-01-08T16:21:35.561+0100,order,-,User45687,-,-,-,{items:[{"prd":"1315467","count": 5, "amount": 11.6},{"prd": "133545", "count": 1, "amount": 55.99}], oid: 5556}
...
...
Edit: a concrete example from this log:
User1234 has a reminder with id = 131235467, after which he places an order with the following data:
{items:[{"prd":"131235467","count": 5, "amount": 11.6},{"prd": "13123545", "count": 1, "amount": 55.99}], oid: 5556}
In this case, the reminder's id and one prd in data are the same, so I want to sum count * amount for that item, here 5 * 11.6 = 58, and output it as:
User1234 Prdsum: 58
User45687 also placed an order, but he received no reminder for any of the ordered products, so there is nothing to sum for him. Output:
User45687 Prdsum: 0
Final output of this log:
User1234 Prdsum: 58
User45687 Prdsum: 0
My question is: how do I compare the id value with the prd values inside data?
The key is the user. Would it be useful to write a custom Writable with value = (id, data)? I need some ideas.
Solution
I recommend computing the raw per-user sums in a first Hadoop job, so that when that job finishes you have something like this:
User1234 Prdsum: 58
User45687 Prdsum: 0
A second Hadoop job (or a standalone program) can then compare the various values and generate another report.
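As a rough illustration of the per-record step, here is a minimal sketch in plain Java (no Hadoop dependency) of how one log line could be split and how a reminder id could be compared with each prd in data. The names LogLineParser, columns and matchingSum are hypothetical. The regular expression assumes every item lists prd, count and amount in that order, as in the sample lines. The data column as shown is not strict JSON (items and oid are unquoted), which is why this sketch uses a regex rather than a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // Assumed item layout, taken from the sample lines:
    // {"prd":"131235467","count": 5, "amount": 11.6}
    private static final Pattern ITEM = Pattern.compile(
        "\"prd\"\\s*:\\s*\"(\\d+)\"\\s*,\\s*\"count\"\\s*:\\s*(\\d+)"
        + "\\s*,\\s*\"amount\"\\s*:\\s*(\\d+(?:\\.\\d+)?)");

    /** Splits a line into its 8 columns; the limit keeps commas inside data intact. */
    static String[] columns(String line) {
        return line.split(",", 8);
    }

    /** Sums count * amount over the items in data whose prd equals id. */
    static double matchingSum(String data, String id) {
        double sum = 0.0;
        Matcher m = ITEM.matcher(data);
        while (m.find()) {
            if (m.group(1).equals(id)) {
                sum += Integer.parseInt(m.group(2)) * Double.parseDouble(m.group(3));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String order = "2013-01-02T15:57:24.024+0100,order,-,User1234,-,-,-,"
            + "{items:[{\"prd\":\"131235467\",\"count\": 5, \"amount\": 11.6},"
            + "{\"prd\": \"13123545\", \"count\": 1, \"amount\": 55.99}], oid: 5556}";
        String[] cols = columns(order);
        // cols[3] is the user, cols[7] is the data column.
        double sum = matchingSum(cols[7], "131235467");
        System.out.printf("%s Prdsum: %.2f%n", cols[3], sum);
    }
}
```

In a real job the same two helpers could be called from the mapper or reducer. The reminder id would come from the user's reminder records rather than being hard-coded as it is here.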
Do you need the “presence” check as part of your first Hadoop job? If so, you will need to keep a HashMap or Hashtable in the mapper or reducer that stores the values for every key (in this case, the user) for comparison, but IMHO that is not a good setup. It is better to aggregate in one Hadoop job and compare in another.
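To make the in-memory variant concrete, the following is a minimal sketch in plain Java of the map-based bookkeeping described above. It keeps each user's reminder ids and order totals and compares them only when the sum is requested, so the order of the records does not matter. PrdSumAggregator and its methods are hypothetical names, and the sketch assumes all records fit in memory, which is exactly the limitation that makes this setup questionable for large logs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PrdSumAggregator {
    // user -> ids seen in that user's reminder records
    private final Map<String, Set<String>> reminderIds = new HashMap<>();
    // user -> (prd -> accumulated count * amount from that user's orders)
    private final Map<String, Map<String, Double>> orderTotals = new HashMap<>();

    void addReminder(String user, String id) {
        reminderIds.computeIfAbsent(user, u -> new HashSet<>()).add(id);
    }

    void addOrderItem(String user, String prd, int count, double amount) {
        orderTotals.computeIfAbsent(user, u -> new HashMap<>())
                   .merge(prd, count * amount, Double::sum);
    }

    /** Sum of the order totals whose prd matches one of the user's reminder ids. */
    double prdSum(String user) {
        Set<String> ids = reminderIds.getOrDefault(user, Collections.emptySet());
        Map<String, Double> totals = orderTotals.getOrDefault(user, Collections.emptyMap());
        double sum = 0.0;
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            if (ids.contains(e.getKey())) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        PrdSumAggregator agg = new PrdSumAggregator();
        // Values copied from the sample log in the question.
        agg.addReminder("User1234", "131235467");
        agg.addOrderItem("User1234", "131235467", 5, 11.6);
        agg.addOrderItem("User1234", "13123545", 1, 55.99);
        agg.addReminder("User45687", "143435467");
        agg.addOrderItem("User45687", "1315467", 5, 11.6);
        agg.addOrderItem("User45687", "133545", 1, 55.99);
        for (String user : new String[] {"User1234", "User45687"}) {
            System.out.printf("%s Prdsum: %.2f%n", user, agg.prdSum(user));
        }
    }
}
```

With the user as the key, a reducer already receives one user's records together, so the same comparison could be done per reduce call with only that user's set and map instead of one global structure.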