Data Recipes: Apache Pig 0.8 with Cloudera cdh3

Wednesday, January 19, 2011

So it's January and Cloudera hasn't released Pig 0.8 as a Debian package yet. Too bad. It turns out that for the particular project I'm working on it's important to have a custom partitioner, which is only available in Pig 0.8. I'd also like to make use of the HBaseStorage load and store funcs, which are also only available in 0.8. Anyhow, here's how I got it working with my current install of Hadoop (cdh3):

Get Pig

Go to the Pig releases page and download the Apache release of pig-0.8.0.

Install Pig

Skip this part if you don't care (i.e. you're going to put it wherever you want and don't give a flip what my opinion is on where it should go). It's usually a good idea to put things you download and install yourself in /usr/local/share/ so they don't conflict with /usr/lib/ when you later apt-get install the packaged version. So go ahead and unpack the downloaded archive into that directory.

As an example (for those of us just getting familiar):


$: wget http://apache.mesi.com.ar//pig/pig-0.8.0/pig-0.8.0.tar.gz
$: tar -zxvf pig-0.8.0.tar.gz
$: sudo mv pig-0.8.0 /usr/local/share/
$: sudo ln -s /usr/local/share/pig-0.8.0 /usr/local/share/pig
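
If you end up doing this for more than one release, the unpack-and-symlink steps above can be wrapped in a small function. This is only a sketch: install_release and its arguments are names I made up for illustration, and it assumes the tarball unpacks into a single directory named after the version (as pig-0.8.0.tar.gz does).

```shell
# Hypothetical helper: unpack a release tarball into a prefix and point a
# version-independent symlink at the unpacked directory.
# Usage: install_release <tarball> <versioned-dir-name> <prefix> <link-name>
install_release() {
  local tarball="$1" versioned="$2" prefix="$3" link="$4"
  mkdir -p "$prefix" || return 1
  tar -xzf "$tarball" -C "$prefix" || return 1
  # -n replaces an existing symlink instead of creating a link inside its target
  ln -sfn "$prefix/$versioned" "$prefix/$link"
}
```

With the paths from this post that would be `install_release pig-0.8.0.tar.gz pig-0.8.0 /usr/local/share pig`, run from a root shell since /usr/local/share is usually owned by root.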

Perform Pig Surgery

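As it stands, your new Pig install will not work with Cloudera Hadoop. Let's fix that. Run everything below from the Pig install directory (/usr/local/share/pig-0.8.0 if you followed along above).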

1. Nuke the current pig jar and rebuild without hadoop


$: sudo rm pig-0.8.0-core.jar 
$: sudo ant jar-withouthadoop

2. Add these lines to bin/pig. Placement matters: they need to come before PIG_CLASSPATH is set, which is where I put mine:


# Add installed version of Hadoop to classpath
HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
. $HADOOP_HOME/bin/hadoop-config.sh

for jar in $HADOOP_HOME/hadoop-core-*.jar $HADOOP_HOME/lib/* ; do
   CLASSPATH=$CLASSPATH:$jar
done
if [ ! -z "$HADOOP_CLASSPATH" ] ; then
  CLASSPATH=$CLASSPATH:$HADOOP_CLASSPATH
fi
if [ ! -z "$HADOOP_CONF_DIR" ] ; then
  CLASSPATH=$CLASSPATH:$HADOOP_CONF_DIR
fi
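
If you want to see what the jar loop in step 2 produces before editing bin/pig, here is a standalone sketch of the same idea. build_hadoop_classpath is a hypothetical name, not something Pig or Hadoop ships. Unlike the snippet above, it skips glob patterns that match nothing and avoids a leading colon.

```shell
# Hypothetical helper: print a colon-separated classpath built from the
# hadoop-core jar and everything under lib/ in the given Hadoop directory.
build_hadoop_classpath() {
  local hadoop_home="$1" cp="" jar
  for jar in "$hadoop_home"/hadoop-core-*.jar "$hadoop_home"/lib/*; do
    # An unmatched glob stays literal, so skip anything that does not exist
    [ -e "$jar" ] || continue
    # Add a separating colon only when cp already has an entry
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}
```

Running `build_hadoop_classpath /usr/lib/hadoop` should list the jars Pig will pick up from your cdh3 install. If it prints an empty line, HADOOP_HOME is probably pointing at the wrong place.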

3. Rename pig-withouthadoop.jar and nuke the build dir


$: sudo mv pig-withouthadoop.jar pig-0.8.0-core.jar
$: sudo rm -r build
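
One way to sanity-check the renamed jar is to look for a Hadoop core class in its listing. This is a rough sketch and rests on an assumption: I'm using org.apache.hadoop.mapred.JobConf (a real Hadoop core class) as a stand-in for "Hadoop is bundled", and bundles_hadoop_core is a made-up name.

```shell
# Hypothetical check: reads a jar listing (one entry per line, as printed by
# `jar tf`) on stdin and succeeds if the probe class from Hadoop core is there.
bundles_hadoop_core() {
  grep -q '^org/apache/hadoop/mapred/JobConf\.class$'
}
```

For example, `jar tf pig-0.8.0-core.jar | bundles_hadoop_core && echo "Hadoop is still bundled"` should print nothing if the rebuild did what it was supposed to.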

4. Test it out


$: bin/pig
2011-01-19 13:49:07,766 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/local/share/pig-0.8.0/pig_1295466547762.log
2011-01-19 13:49:07,959 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2011-01-19 13:49:08,163 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt>

You can try typing things like 'ls' in the grunt shell to make sure it sees your HDFS. Hurray.
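
If you'd rather check which filesystem Pig connected to without eyeballing the log, the startup line shown above is easy to pick apart. A small sketch, assuming the log message has exactly the wording in the output above (connected_fs is a hypothetical name):

```shell
# Hypothetical helper: reads Pig startup output on stdin and prints the
# filesystem URI from the "Connecting to hadoop file system at:" line.
connected_fs() {
  sed -n 's/.*Connecting to hadoop file system at: //p'
}
```

Feed it saved startup output, e.g. `connected_fs < pig_startup.txt` where pig_startup.txt is whatever file you captured the output in. Seeing file:/// rather than an hdfs:// URI would suggest Pig didn't find your Hadoop config.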

Comments:

Anonymous, January 25, 2011 at 5:39 PM
FYI: it matters where you put the stuff in step 2. Before setting PIG_CLASSPATH is a good spot :-)

Unknown, February 23, 2011 at 7:28 PM
Nice article

Yakov, August 25, 2011 at 2:15 PM
Another idea is to use Cloudera's version of Pig, to be found at http://nightly.cloudera.com/cdh/3/

Pierre-Luc, September 7, 2011 at 9:52 AM
Thank you very much for this :). Saved me a bunch of time.