Let's talk about the evolution of Hadoop. Doug Cutting is the creator of Hadoop; he built it while at Yahoo! and later became Chief Architect of Cloudera.
From 2002 to 2004, Doug Cutting was working at Apache on two projects, Apache Lucene and Nutch, which together formed a distributed search engine that was supposed to index 1 billion pages. Lucene is a search indexer and Nutch is a spider, or crawler.
What does that mean? What are the basic parts of a general search engine?
A search engine basically consists of three things:
- A spider or crawler: downloads web pages ahead of time by following links from page to page, so the data is already there when you search for something.
- Indexer: builds an index over the downloaded pages so that the pages matching a search can be found quickly. If people use a web site more often, the index can point to it first.
- Mapper: maps the actual content of the matching pages to the screen.
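To make the indexer and lookup steps concrete, here is a minimal sketch in Python. It is not how Lucene or Nutch are implemented; the `PAGES` data, the URLs, and the function names `build_index` and `search` are made up for illustration, and the crawler step is assumed to have already downloaded the pages.

```python
from collections import defaultdict

# Hypothetical pages that a crawler has already downloaded (URL -> text).
PAGES = {
    "http://example.com/a": "hadoop is a distributed computing platform",
    "http://example.com/b": "nutch is a web crawler",
    "http://example.com/c": "lucene is a search indexer used by nutch",
}

def build_index(pages):
    """Indexer step: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Lookup step: return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

index = build_index(PAGES)
print(search(index, "nutch"))
```

A real search engine would also rank the results and handle far more data than fits on one machine, which is exactly the scaling problem described next.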
In December 2004, Google published a paper on the MapReduce (also called MR) algorithm. Doug Cutting had found that the project he was working on was not scaling as expected, so he decided to apply the MR concept to Nutch, running on top of the Nutch Distributed File System.
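The MR idea can be sketched on a single machine with the classic word count example. This is only an illustration of the map, shuffle, and reduce steps, not Hadoop's or Google's actual API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. In a real cluster, the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map over every document, then shuffle, then reduce per key."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["nutch crawls pages", "lucene indexes pages"])
print(counts["pages"])  # 2
```

Because each document is mapped independently and each key is reduced independently, the work can be split across as many machines as are available.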
In 2006, Doug Cutting joined Yahoo!, and Yahoo! provided a dedicated team to work on a project called Hadoop.
During 2006-2008, Hadoop was born out of Nutch as a large-scale distributed computing platform that could scale up to a large number of machines.
By the end of 2008, Yahoo! had announced that it was running a 910-node cluster, which was able to sort one terabyte of data in about 3.5 minutes. Previously, that work was taking at least a day.
So we can say that Hadoop had become prominent by 2008!