Friday, July 3, 2015

Analyze Twitter Tweets Using Hadoop, Hive - Part 1

We will be looking at how to analyze your own tweets from Twitter and find interesting facts about them, in a step-by-step guide.

Technologies that we will be using

We will be using the following technologies/software:

  • Hadoop (HDFS - to store the data)
  • Hive (to perform queries on the data)
  • Java (to develop the software that will access the data using JDBC and plot charts)
  • Eclipse (IDE to make the development process easier)
  • Python (to properly parse the input)

What will our final output look like:


In the end we will have a Java program with a GUI that shows us various plots and pie charts after analyzing the tweets.

Prerequisites:

Well, I hope that Hadoop and Hive are already running on your system. If not, please set them up first, as the project won't work without them. If they are installed and you are able to use them, you can carry on reading this tutorial series. I could do a tutorial on how to install and configure Hadoop and Hive on a single-node cluster (on a single PC), but lots of tutorials are already available on the internet, so you can just Google it. (If I get enough requests I will write step-by-step installation instructions in the future; for now, Google it.)

To cut a long story short, you should have Hive and Hadoop running on your system.

Here is a list of tutorials you can follow to install and configure Hive and Hadoop on your system
So if you are all set and confident, let's do a quick check to confirm that we are on the right track before we proceed further.

Start the Hadoop server using the command start-all.sh, then open up a terminal and type
hive
You should get a console like
hive>
Type show tables;

You should see OK or the list of tables that are there.
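
If you would rather run this check from a script than type it by hand, here is a minimal Python sketch of the same idea. It assumes the hive command is on your PATH and uses the CLI's -e option to run a single query. The function names parse_table_names and list_hive_tables are just names I made up for this example, and the filtering of the OK and "Time taken" lines is a precaution in case they end up in the captured output.

```python
import shutil
import subprocess


def parse_table_names(output):
    """Return the table names found in the text printed by `show tables;`."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and Hive's status lines, in case they are captured.
        if not line or line == "OK" or line.startswith("Time taken"):
            continue
        names.append(line)
    return names


def list_hive_tables():
    """Run `show tables;` through the hive CLI; return None if it fails."""
    if shutil.which("hive") is None:
        return None  # the hive command is not on the PATH
    result = subprocess.run(
        ["hive", "-e", "show tables;"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_table_names(result.stdout)


if __name__ == "__main__":
    tables = list_hive_tables()
    if tables is None:
        print("Hive did not respond; check that Hadoop and Hive are running.")
    else:
        print("Hive is working. Tables:", tables)
```

An empty list is a perfectly good result here: it just means Hive answered and no tables have been created yet.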

If it's all set and done, we can proceed to the second part of the tutorial series.

P.S.: I did this as part of the Industrial Training at Sikkim Manipal Institute of Technology, as a member of a team consisting of Prince Kumar, Payash Pradhan and myself. I would like to thank Amrit Chhetri Sir for training us on Big Data and helping us with the project.
