Get Pig
Go to the Pig releases page and download the Apache release for pig-0.8.
Install Pig
Skip this part if you don't care (i.e. you're going to put it wherever you want and don't give a flip what my opinion is on where it should go). It's usually a good idea to put things you download and install yourself in /usr/local/share/.
As an example (for those of us just getting familiar):
$: wget http://apache.mesi.com.ar//pig/pig-0.8.0/pig-0.8.0.tar.gz
$: tar -zxvf pig-0.8.0.tar.gz
$: sudo mv pig-0.8.0 /usr/local/share/
$: sudo ln -s /usr/local/share/pig-0.8.0 /usr/local/share/pig
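If the point of the symlink isn't obvious: the versioned directory stays put and the plain "pig" name just points at it, so a later upgrade only means re-pointing the link. Here is a rough sketch of that pattern, run in a throwaway temp directory rather than /usr/local/share. The link_current helper is made up for illustration and is not part of Pig.

```shell
#!/bin/sh
# Sketch: versioned install directory plus a stable symlink, tried in a temp dir.
# link_current is a hypothetical helper; the real steps above use /usr/local/share.
link_current() {
    prefix=$1    # directory holding the versioned installs
    version=$2   # versioned directory name, e.g. pig-0.8.0
    name=$3      # stable name, e.g. pig
    # -n replaces an existing symlink instead of descending into its target
    ln -sfn "$prefix/$version" "$prefix/$name"
}

demo_prefix=$(mktemp -d)
mkdir "$demo_prefix/pig-0.8.0"
link_current "$demo_prefix" pig-0.8.0 pig
# Show where the stable name points
readlink "$demo_prefix/pig"
rm -r "$demo_prefix"
```

Anything that refers to the stable path (your PATH, scripts, and so on) keeps working when the link moves.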
Perform Pig Surgery
As it stands, your new Pig install will not work with Cloudera Hadoop, because the stock Pig jar is built with its own Hadoop bundled in. Let's fix that.
1. Nuke the current Pig jar and rebuild without Hadoop (run everything from here on inside the Pig install directory, /usr/local/share/pig)
$: sudo rm pig-0.8.0-core.jar
$: sudo ant jar-withouthadoop
2. Add these lines to bin/pig (I don't think it matters where; I put mine before PIG_CLASSPATH is set):
# Add installed version of Hadoop to classpath
HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
. $HADOOP_HOME/bin/hadoop-config.sh
for jar in $HADOOP_HOME/hadoop-core-*.jar $HADOOP_HOME/lib/* ; do
    CLASSPATH=$CLASSPATH:$jar
done
if [ ! -z "$HADOOP_CLASSPATH" ] ; then
    CLASSPATH=$CLASSPATH:$HADOOP_CLASSPATH
fi
if [ ! -z "$HADOOP_CONF_DIR" ] ; then
    CLASSPATH=$CLASSPATH:$HADOOP_CONF_DIR
fi
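If you want to see what that loop actually produces before touching bin/pig, here is a small standalone sketch of the same idea. It is not the snippet above verbatim: build_hadoop_classpath is a made-up helper, the jar names are empty placeholders in a temp directory, and unlike the lines above it skips glob patterns that match nothing.

```shell
#!/bin/sh
# Sketch: build a colon-separated classpath from a Hadoop-style directory.
# build_hadoop_classpath is a hypothetical helper, not part of Pig or Hadoop.
build_hadoop_classpath() {
    hadoop_home=$1
    cp=""
    for jar in "$hadoop_home"/hadoop-core-*.jar "$hadoop_home"/lib/*; do
        # A pattern that matched nothing stays literal; skip it.
        [ -e "$jar" ] || continue
        # Append with a ':' separator, but no leading ':' on the first entry.
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}

# Demo against a throwaway directory with empty placeholder jars.
demo_home=$(mktemp -d)
mkdir "$demo_home/lib"
touch "$demo_home/hadoop-core-demo.jar" "$demo_home/lib/a.jar" "$demo_home/lib/b.jar"
build_hadoop_classpath "$demo_home"
rm -r "$demo_home"
```

On a real box you would pass your actual HADOOP_HOME (the lines above default it to /usr/lib/hadoop) and eyeball the output to confirm the Hadoop core jar and everything under lib/ show up.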
3. Nuke the build dir and rename pig-withouthadoop.jar
$: sudo mv pig-withouthadoop.jar pig-0.8.0-core.jar
$: sudo rm -r build
4. Test it out
$: bin/pig
2011-01-19 13:49:07,766 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/share/pig-0.8.0/pig_1295466547762.log
2011-01-19 13:49:07,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2011-01-19 13:49:08,163 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt>
You can try typing things like 'ls' in the grunt shell to make sure it sees your HDFS. Hurray.