
Monday, September 9, 2013

Get Pig LogicalPlan

Recently I've been wanting to get hold of the logical plan (a graph representation) for a Pig script without running it. The main reason is that the logical plan is a fairly language- and platform-agnostic representation of a dataflow. Once you have the logical plan, I can think of several fun things you could do with it:


  • Serialize it as JSON and send it to any number of arbitrary tools
  • Visualize it in a web browser
  • Edit it with a web app
  • Compile it into an execution (physical) plan for arbitrary (non-Hadoop MapReduce) backend frameworks that make sense (Storm, S4, Spark)
Ok, so maybe those are the fun things I actually plan on doing with it, but what's the difference?
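
As a rough sketch of the first idea, here is how a plan could be flattened to JSON in plain Ruby. The three-step dataflow, the aliases, and the nodes/edges layout are all made up for illustration. Only the operator names (LOLoad, LOFilter, LOStore) are borrowed from Pig's logical operator classes, and nothing here actually calls Pig.

```ruby
require 'json'

# Toy stand-in for a logical plan: operators keyed by alias, plus edges
# pointing from each operator to its successor. This layout is an
# assumption for illustration, not a format Pig itself produces.
def plan_to_json(operators, edges)
  nodes = operators.map { |name, type| { 'alias' => name, 'operator' => type } }
  links = edges.map { |from, to| { 'source' => from, 'target' => to } }
  JSON.generate({ 'nodes' => nodes, 'edges' => links })
end

# Hypothetical dataflow: load -> filter -> store.
operators = { 'raw' => 'LOLoad', 'filtered' => 'LOFilter', 'out' => 'LOStore' }
edges     = [%w[raw filtered], %w[filtered out]]

json = plan_to_json(operators, edges)
puts json
```

A flat nodes-plus-edges shape like this is about the simplest thing a browser-side graph library or another tool could consume, which is why I'd probably start there.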

Problem

Pig doesn't make it easy to get this. After spending several hours digging through the way Pig parses and runs a script, I've come away somewhat shaken up. The parsing logic is deeply coupled with the execution logic. Yes, yes, this is supposed to change as we go forward (e.g. PIG-3419), but what about in the meantime?

Hack/Solution

So, I've written this little JRuby script to return the LogicalPlan for a Pig script. Right now it does exactly the same thing as putting an 'EXPLAIN' operator in your script. However, since it exposes the LogicalPlan, you could easily extend it to do whatever you like with the plan.


8 comments:

  1. I was able to get the logical plan in textual format. Can anybody suggest a Java library to convert the generated logical plan to a directed acyclic graph in graphical form?
