
Wednesday, January 19, 2011

Apache Pig 0.8 with Cloudera cdh3

So it's January and Cloudera hasn't released Pig 0.8 as a Debian package yet. Too bad. It turns out that for the particular project I'm working on it's important to have a custom partitioner, which is only available in Pig 0.8. I'd also like to make use of the HBaseStorage load and store functions, which are also only available in 0.8. Anyhow, here's how I got it working with my current install of Hadoop (cdh3):

Get Pig


Go to the Pig releases page here and download the Apache release for pig-0.8.

Install Pig


Skip this part if you don't care (i.e. you're going to put it wherever you want and don't give a flip what my opinion is on where it should go). It's usually a good idea to put things you download and install yourself in /usr/local/share/ so they don't conflict with what ends up in /usr/lib/ if you later apt-get install the packaged version. So go ahead and unpack the downloaded archive into that directory.

As an example (for those of us just getting familiar):

$: wget http://apache.mesi.com.ar//pig/pig-0.8.0/pig-0.8.0.tar.gz
$: tar -zxvf pig-0.8.0.tar.gz
$: sudo mv pig-0.8.0 /usr/local/share/
$: sudo ln -s /usr/local/share/pig-0.8.0 /usr/local/share/pig


Perform Pig Surgery


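As it stands, your new Pig install will not work with Cloudera Hadoop. Let's fix that. Run the commands below from the Pig install directory (/usr/local/share/pig-0.8.0 if you followed the example above).

1. Nuke the current Pig jar and rebuild without Hadoop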

$: sudo rm pig-0.8.0-core.jar
$: sudo ant jar-withouthadoop


2. Add these lines to bin/pig, before PIG_CLASSPATH is set (I didn't think placement mattered, but a commenter below reports that it does):

# Add installed version of Hadoop to classpath
HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
. "$HADOOP_HOME/bin/hadoop-config.sh"

for jar in "$HADOOP_HOME"/hadoop-core-*.jar "$HADOOP_HOME"/lib/* ; do
    CLASSPATH=$CLASSPATH:$jar
done
if [ ! -z "$HADOOP_CLASSPATH" ] ; then
    CLASSPATH=$CLASSPATH:$HADOOP_CLASSPATH
fi
if [ ! -z "$HADOOP_CONF_DIR" ] ; then
    CLASSPATH=$CLASSPATH:$HADOOP_CONF_DIR
fi
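
If you want to see what that loop will actually add before editing bin/pig, here's a rough standalone sketch of the same idea. The function name build_hadoop_classpath is made up for illustration and is not part of Pig or Hadoop; it just prints the jars found under a given Hadoop home, joined with colons, and skips glob patterns that matched nothing.

```shell
#!/bin/sh
# Sketch only: build_hadoop_classpath is a hypothetical helper, not part of bin/pig.
# It mirrors the for-loop above and prints the entries it would collect.
build_hadoop_classpath() {
    hadoop_home=$1
    cp=""
    for jar in "$hadoop_home"/hadoop-core-*.jar "$hadoop_home"/lib/*; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$jar" ] || continue
        # Join entries with ':' without a leading separator.
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}
```

Something like `build_hadoop_classpath /usr/lib/hadoop | tr ':' '\n'` lists one entry per line, which makes it easy to spot a missing hadoop-core jar.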


3. Rename pig-withouthadoop.jar and nuke the build dir

$: sudo mv pig-withouthadoop.jar pig-0.8.0-core.jar
$: sudo rm -r build
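
A quick sanity check at this point is to confirm the renamed jar no longer carries its own copy of the Hadoop classes. This is only a sketch under the assumption that bundled Hadoop classes would show up under org/apache/hadoop/ in the jar listing; lists_hadoop_classes is a hypothetical helper name, and it reads a listing on stdin so you can feed it from whatever tool you have (unzip -l, jar tf).

```shell
#!/bin/sh
# Sketch only: lists_hadoop_classes is a hypothetical helper.
# It reads a jar listing on stdin and succeeds if any entry sits under
# org/apache/hadoop/. Pig's own org/apache/pig/backend/hadoop/ classes
# do not match this pattern.
lists_hadoop_classes() {
    grep -q 'org/apache/hadoop/'
}
```

For example, `unzip -l pig-0.8.0-core.jar | lists_hadoop_classes && echo "still bundles hadoop"` should print nothing if the rebuild did what it was supposed to.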


4. Test it out

$: bin/pig
2011-01-19 13:49:07,766 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/share/pig-0.8.0/pig_1295466547762.log
2011-01-19 13:49:07,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2011-01-19 13:49:08,163 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt>

You can try typing things like 'ls' in the grunt shell to make sure it sees your HDFS. Hurray.

25 comments:

  1. FYI: it matters where you put the stuff in step 2. Before setting PIG_CLASSPATH is a good spot :-)

  2. Another idea is to use Cloudera's version of Pig, to be found at http://nightly.cloudera.com/cdh/3/

  3. Thank you very much for this :). Saved me a bunch of time.
