## Monday, August 12, 2013

### Using Hadoop to Explore Chaos

Hadoop. Hadoop has managed to insinuate itself into practically every company with an engineering team and some data. If your company isn't using it, you know a company that is. Hell, it's why you're reading this to begin with. That being said, what you're probably doing with Hadoop is boring and uninspired. It's not your fault of course. Pretty much every example out there pigeonholes Hadoop into default business use cases like ETL and data cleaning, basic statistics, machine learning, and GIS.

You know what though? Sometimes it's good to explore things that don't have an obvious business use case. Things that are weird. Things that are pretty. Things that are ridiculous. Things like dynamical systems and chaos. And, if you happen to find there are applicable tidbits along the way (*hint, skip to the problem outline section*), great, otherwise just enjoy the diversion.

## motivation

So what is a dynamical system? Dryly, a dynamical system is a fixed rule to describe how a point moves through geometric space over time. Pretty much everything that is interesting can be modeled as a dynamical system. Population, traffic flows, fireflies, and neurons can all be described this way.

In most cases, you'll have a system of ordinary differential equations like this:

\begin{eqnarray*} \dot{x_{1}} & = & f_{1}(x_{1},\ldots,x_{n})\\ \vdots\\ \dot{x_{n}} & = & f_{n}(x_{1},\ldots,x_{n}) \end{eqnarray*}

For example, the FitzHugh-Nagumo model (which models a biological neuron being zapped by an external current):

\begin{eqnarray*} \dot{v} & = & v-\frac{v^{3}}{3}-w+I_{{\rm ext}}\\ \dot{w} & = & 0.08(v+0.7-0.8w) \end{eqnarray*}

In this case $$v$$ represents the potential difference between the inside of the neuron and the outside (membrane potential), and $$w$$ corresponds to how the neuron recovers after it fires. There's also an external current $$I_{{\rm ext}}$$ which can model other neurons zapping the one we're looking at but could just as easily be any other source of current like a car battery. The numerical constants in the system are experimentally derived from looking at how giant squid axons behave. Basically, these guys in the 60's were zapping giant squid brains for science. Understand a bit more why I think your business use case is boring?

One of the simple ways you can study a dynamical system is to see how it behaves for a wide variety of parameter values. In the FitzHugh-Nagumo case the only real parameter is the external current $$I_{{\rm ext}}$$. For example, for what values of $$I_{{\rm ext}}$$ does the system behave normally? For what values does it fire like crazy? Can I zap it so much that it stops firing altogether?

In order to do that you'd just decide on some reasonable range of currents, say $$(0,1)$$, break that range into some number of points, and simulate the system while changing the value of $$I_{{\rm ext}}$$ each time.
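Here's a minimal sketch of that kind of sweep in Python. None of it comes from a real analysis: the fixed-step RK4 integrator, the step size `dt`, the starting point, and the "how much does $$v$$ swing late in the run" summary are all illustrative assumptions, picked to keep the example short.

```python
def fitzhugh_nagumo(v, w, i_ext):
    """Right-hand side of the FitzHugh-Nagumo equations given above."""
    dv = v - v**3 / 3.0 - w + i_ext
    dw = 0.08 * (v + 0.7 - 0.8 * w)
    return dv, dw


def simulate(i_ext, v0=-1.0, w0=-0.5, dt=0.01, steps=20000):
    """Integrate with fixed-step RK4 and return the list of v values."""
    v, w = v0, w0
    vs = []
    for _ in range(steps):
        k1v, k1w = fitzhugh_nagumo(v, w, i_ext)
        k2v, k2w = fitzhugh_nagumo(v + 0.5 * dt * k1v, w + 0.5 * dt * k1w, i_ext)
        k3v, k3w = fitzhugh_nagumo(v + 0.5 * dt * k2v, w + 0.5 * dt * k2w, i_ext)
        k4v, k4w = fitzhugh_nagumo(v + dt * k3v, w + dt * k3w, i_ext)
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        w += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0
        vs.append(v)
    return vs


def sweep(lo=0.0, hi=1.0, n=11):
    """Run one simulation per I_ext value and summarize each by the
    late-time swing in v (max minus min after dropping the transient)."""
    results = []
    for k in range(n):
        i_ext = lo + k * (hi - lo) / (n - 1)
        tail = simulate(i_ext)[-10000:]
        results.append((i_ext, max(tail) - min(tail)))
    return results


if __name__ == "__main__":
    for i_ext, swing in sweep():
        print("I_ext=%.1f  swing in v=%.3f" % (i_ext, swing))
```

A swing near zero means the neuron has settled to rest; a large swing means it keeps firing. Each call to `simulate` is independent of the others, which is the property that matters later on.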

## chaos

There are a lot of great ways to summarize the behavior of a dynamical system if you can simulate its trajectories. Simulated trajectories are, after all, just data sets. The way I'm going to focus on is calculation of the largest Lyapunov exponent. Basically, all the Lyapunov exponent says is, if I take two identical systems and start them going at slightly different places, how similarly do they behave?

For example, if I hook a car battery to two identical squid neurons at the same time, but one has a little bit of extra charge on it, does their firing stay in sync forever or do they start to diverge in time? The Lyapunov exponent would measure the rate at which they diverge. If the two neurons fire close in time but don't totally sync up then the Lyapunov exponent would be zero. If they eventually start firing at the same time then the Lyapunov exponent is negative (they're not diverging, they're coming together). Finally, if they continually diverge from one another then the Lyapunov exponent is positive.

As it turns out, a positive Lyapunov exponent usually means the system is chaotic. No matter how close two points start out, they will diverge exponentially. What this means in practice is that, while I might have a predictive model (as a dynamical system) of something really cool like a hurricane, I simply can't measure it precisely enough to make a good prediction of where it's going to go. A really small measurement error, between where the hurricane actually is and where I measure it to be, will grow exponentially. So my model will predict the hurricane heading into Texas when it actually heads into Louisiana. Yep. Chaos indeed.
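For the record, here is the textbook way of writing that down (a standard definition, nothing specific to this post). If $$\delta(t)$$ is the separation at time $$t$$ between two trajectories that start a tiny distance $$\delta_{0}$$ apart, then

\begin{eqnarray*} |\delta(t)| & \approx & |\delta_{0}|e^{\lambda t}\\ \lambda & = & \lim_{t\to\infty}\lim_{|\delta_{0}|\to0}\frac{1}{t}\ln\frac{|\delta(t)|}{|\delta_{0}|} \end{eqnarray*}

where $$\lambda$$ is the largest Lyapunov exponent. A positive $$\lambda$$ means the gap grows exponentially, a negative $$\lambda$$ means it shrinks, and $$\lambda=0$$ means it neither grows nor shrinks on average.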

## problem outline

So I'm going to compute the Lyapunov exponent of a dynamical system for some range of parameter values. The system I'm going to use is the Henon map:
\begin{eqnarray*}x_{n+1} & = & y_{n}+1-ax_{n}^{2}\\y_{n+1} & = & bx_{n}\end{eqnarray*}
I chose the Henon map for a few reasons despite the fact that it isn't modeling a physical system. One, it's super simple and doesn't involve continuous time at all (it's just a rule you iterate). Two, it's two dimensional so it's easy to plot it and take a look at it. Finally, it's only got two parameters, meaning the range of parameter values will make up a plane (and not some n-dimensional hyperspace) so I can make a pretty picture.
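To make "super simple" concrete, here's a small Python sketch of iterating the map. The starting point $$(0,0)$$, the iteration count, and the `escape` threshold used to decide an orbit has blown up are assumptions for illustration only.

```python
def henon_step(x, y, a, b):
    """One application of the Henon map as written above."""
    return y + 1.0 - a * x * x, b * x


def henon_orbit(a, b, x0=0.0, y0=0.0, n=1000, escape=1e6):
    """Iterate up to n times from (x0, y0); stop early if the orbit blows up."""
    x, y = x0, y0
    points = []
    for _ in range(n):
        x, y = henon_step(x, y, a, b)
        if abs(x) > escape or abs(y) > escape:
            break  # treat this orbit as unbounded
        points.append((x, y))
    return points


if __name__ == "__main__":
    orbit = henon_orbit(1.4, 0.3)
    print(len(orbit), orbit[:3])
```

Plotting the points returned for $$a=1.4$$, $$b=0.3$$ gives the familiar boomerang-shaped attractor; an orbit that comes back shorter than `n` is one that escaped.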

What does Hadoop have to do with all this anyway? Well, I've got to break the parameter plane (ab-plane) into a set of coordinates and run one simulation per coordinate. Say I let $$a\in[a_{min},a_{max}]$$ and $$b\in[b_{min},b_{max}]$$ and I want to look at $$N$$ unique $$a$$ values and $$M$$ unique $$b$$ values. That means I have to run $$N \times M$$ individual simulations!

Clearly, the situation gets even worse if I have more parameters (a.k.a. a realistic system). However, since each simulation is independent of all the other simulations, I can benefit dramatically from simple parallelization. And that, my friends, is what Hadoop does best. It makes parallelization trivially simple. It handles all those nasty details (which distract from the actual problem at hand) like what machine gets what tasks, what to do about failed tasks, reporting, logging, and the whole bit.

So here's the rough idea:

1. Use Hadoop to split the n-dimensional (2D for this trivial example) space into several tiles that will be processed in parallel
2. Each split of the space is just a set of parameter values. Use these parameter values to run a simulation.
3. Calculate the Lyapunov exponent resulting from each.
4. Slice the results, visualize, and analyze further (perhaps at higher resolution on a smaller region of parameter space), to understand under what conditions the system is chaotic. In the simple Henon map case I'll make a 2D image to look at.

The important silly detail is this. The input data here is minuscule in comparison to most data sets handled with Hadoop. This is NOT big data. Instead, the input data is a small file with n lines and can be thought of as a "spatial specification". It is the input format that explodes the spatial specification into the many individual tiles needed. In other words, Hadoop is not just for big data, it can be used for massively parallel scientific computing.
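To see what "exploding" a spatial specification amounts to, here's a plain Python sketch with no Hadoop involved. It assumes one `min,max,count` line per dimension with both endpoints included, and it deals tiles out to splits in contiguous chunks. The function names are made up for this example and the chunking rule is a guess at "mostly evenly", not the actual input format logic.

```python
import itertools


def axis_values(lo, hi, n):
    """n evenly spaced values from lo to hi, inclusive of both ends."""
    if n == 1:
        return [lo]
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def parse_spec(text):
    """Each non-empty line is 'min,max,count' for one dimension."""
    axes = []
    for line in text.strip().splitlines():
        lo, hi, n = line.split(",")
        axes.append(axis_values(float(lo), float(hi), int(n)))
    return axes


def tiles(axes):
    """Cartesian product of the axes: one parameter tuple per simulation."""
    return itertools.product(*axes)


def splits(axes, num_splits):
    """Deal the tiles out into num_splits roughly equal contiguous chunks."""
    all_tiles = list(tiles(axes))
    size, extra = divmod(len(all_tiles), num_splits)
    out, start = [], 0
    for k in range(num_splits):
        end = start + size + (1 if k < extra else 0)
        out.append(all_tiles[start:end])
        start = end
    return out


if __name__ == "__main__":
    axes = parse_spec("0.0,1.0,3\n0.0,1.0,4")
    for chunk in splits(axes, 5):
        print(len(chunk), chunk)
```

A two-line spec turns into $$N \times M$$ tuples, and each chunk is what a single task would chew through on its own.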

## implementation

Hadoop has been around for a while now. So when I implement something with Hadoop you can be sure I'm not going to sit down and write a Java MapReduce program. Instead, I'll use Pig and custom functions for Pig to hijack the Hadoop input format functionality. Expanding the rough idea in the outline above:

1. Pig will load a spatial specification file that defines the extent of the space to explore and with what granularity to explore it.
2. A custom Pig LoadFunc will use the specification to create individual input splits for each tile of the space to explore. For less parallelism than one input split per tile it's possible to specify the number of total splits. In this case the tiles will be split mostly evenly among the input splits.
3. The LoadFunc overrides Hadoop classes. Specifically: InputFormat (which does the work of expanding the space), InputSplit (which represents the set of one or more spatial tiles), and RecordReader (for deserializing the splits into useful tiles).
4. A custom EvalFunc will take the tuple representing a tile from the LoadFunc and use its values as parameters in simulating the system and computing the Lyapunov exponent. The Lyapunov exponent is the result.

And here is the Pig script:

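    define LyapunovForHenon sounder.pig.chaos.LyapunovForHenon();

    -- points is the relation of (a, b) tiles produced by the custom LoadFunc; its load statement isn't shown here
    exponents = foreach points generate $0 as a, $1 as b, LyapunovForHenon($0,$1);

    store exponents into 'data/result';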


You can take a look at the detailed implementations of each component on GitHub. See: LyapunovForHenon, RectangularSpaceLoader
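If you'd rather not read Java, here's a rough Python sketch of the kind of calculation an EvalFunc like this has to do for one $$(a,b)$$ tile. To be clear, this is not the code from the repository. It uses the standard tangent-vector method (push a small vector through the Jacobian each step, renormalize, average the log of the stretch), and the iteration counts, starting point, and escape threshold are assumptions.

```python
import math


def lyapunov_henon(a, b, n=5000, burn_in=500, escape=1e6):
    """Estimate the largest Lyapunov exponent of the Henon map at (a, b).

    Returns float('nan') when the orbit escapes (unbounded solution).
    """
    x, y = 0.0, 0.0  # orbit, started at the origin
    u, v = 1.0, 0.0  # tangent vector
    total = 0.0
    for i in range(burn_in + n):
        # Push the tangent vector through the Jacobian [[-2ax, 1], [b, 0]] ...
        u, v = -2.0 * a * x * u + v, b * u
        # ... then advance the orbit itself.
        x, y = y + 1.0 - a * x * x, b * x
        if abs(x) > escape:
            return float("nan")
        norm = math.hypot(u, v)
        if norm == 0.0:
            # Tangent vector collapsed (degenerate case, e.g. b = 0 from this start).
            return float("-inf")
        u, v = u / norm, v / norm
        if i >= burn_in:
            total += math.log(norm)  # log of this step's stretch factor
    return total / n


if __name__ == "__main__":
    print(lyapunov_henon(1.4, 0.3))  # the classic chaotic case
    print(lyapunov_henon(0.2, 0.3))  # settles onto a stable fixed point
```

For the classic $$a=1.4$$, $$b=0.3$$ case the estimate comes out positive, and for parameters where the orbit settles onto a stable fixed point it comes out negative, which matches the story in the chaos section.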

## running

I want to explore the Henon map over a range where it's likely to be bounded (unbounded solutions aren't that interesting) and chaotic. Here's my input file:

    $: cat data/space_spec
    0.6,1.6,800
    -1.0,1.0,800

Remember the system?

\begin{eqnarray*}x_{n+1} & = & y_{n}+1-ax_{n}^{2}\\y_{n+1} & = & bx_{n}\end{eqnarray*}

Well, the spatial specification says (if I let the first line represent $$a$$ and the second be $$b$$) that I'm looking at an $$800 \times 800$$ grid (that's 640000 independent simulations) in the ab-plane where $$a\in[0.6,1.6]$$ and $$b\in[-1.0,1.0]$$.

Now, these bounds aren't arbitrary. The Henon attractor that most are familiar with (if you're familiar with chaos and strange attractors in the least bit) occurs when $$a=1.4$$ and $$b=0.3$$. I want to ensure I'm at least going over that case.

## result

With that, I just need to run it and visualize the results.

    $: cat data/result/part-m* | head
    0.6	-1.0	9.132244649409043E-5
    0.6	-0.9974968710888611	-0.0012539625419929572
    0.6	-0.9949937421777222	-0.0025074937591903013
    0.6	-0.9924906132665833	-0.0037665150764570965
    0.6	-0.9899874843554444	-0.005032402237514987
    0.6	-0.9874843554443055	-0.006299127065420516
    0.6	-0.9849812265331666	-0.007566751054452304
    0.6	-0.9824780976220276	-0.008838119048229768
    0.6	-0.9799749687108887	-0.010113503950504331
    0.6	-0.9774718397997498	-0.011392710785045064
    $: cat data/result/part-m* > data/henon-lyapunov-ab-plane.tsv


To visualize I used this simple Python script to get:

The big swaths of flat white are regions where the system becomes unbounded. It's interesting that the bottom right portion has some structure to it that's possibly fractal. The top right portion, between $$b=0.0$$ and $$b=0.5$$ and $$a=1.0$$ to $$a=1.6$$, is really the only region on this image that's chaotic (where the exponent is positive). There's a lot more structure here to look at but I'll leave that to you. As a followup it'd be cool to zoom in on the bottom right corner and run this again.
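If you want to turn a TSV like the one above into an image yourself, here's a standard-library-only stand-in. It is not the script used for the picture discussed here. It assumes rows of `a<TAB>b<TAB>exponent`, assumes unbounded tiles show up as `NaN` or `Infinity` (a guess about the output format), and writes a grayscale ASCII PGM with non-finite tiles drawn white.

```python
import math


def load_grid(lines):
    """Parse 'a<TAB>b<TAB>exponent' rows into {(a, b): exponent}."""
    grid = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        a, b, lam = line.split("\t")
        grid[(float(a), float(b))] = float(lam)
    return grid


def to_pgm(grid):
    """Render the grid as ASCII PGM text: a along x, b along y (largest b on top)."""
    a_vals = sorted({a for a, _ in grid})
    b_vals = sorted({b for _, b in grid}, reverse=True)
    finite = [v for v in grid.values() if math.isfinite(v)]
    lo, hi = (min(finite), max(finite)) if finite else (0.0, 0.0)
    span = (hi - lo) or 1.0
    rows = []
    for b in b_vals:
        row = []
        for a in a_vals:
            v = grid.get((a, b), float("nan"))
            # Unbounded or missing tiles are drawn white (255).
            shade = 255 if not math.isfinite(v) else int(254 * (v - lo) / span)
            row.append(str(shade))
        rows.append(" ".join(row))
    header = "P2\n{} {}\n255\n".format(len(a_vals), len(b_vals))
    return header + "\n".join(rows) + "\n"


if __name__ == "__main__":
    with open("data/henon-lyapunov-ab-plane.tsv") as src:
        image = to_pgm(load_grid(src))
    with open("henon-lyapunov-ab-plane.pgm", "w") as dst:
        dst.write(image)
```

Darker pixels are more negative exponents and lighter ones are more positive; swapping in a real plotting library with a diverging colormap would make the zero crossing easier to see.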

## conclusion

So yes, it's possible to use Hadoop to do massively parallel scientific computing and avoid the question of big data entirely. Best of all it's easy.

The notion of exploding a space and doing something with each tile in parallel is actually pretty general and, as I've shown, super easy to do with Hadoop. I'll leave it to you to come up with your own way of applying it.

1. Very cool! Is it possible to use Hadoop and Pig's functionalities via Python?

Thanks for this post.

1. Yes. See: http://pig.apache.org/docs/r0.11.1/cont.html
