We all know the History and Evolution of Hadoop. Now I will try to explain some key features that made Hadoop stand out from the crowd.
Let me talk about Hadoop's scalability first. Hadoop is linearly scalable. Let me give an example to explain what I mean by that.
Let's say I have two cars, one black and one red. Both cars give 15 kilometers of mileage for one liter of octane, but I want 30 kilometers from the same fuel. So in order to achieve that 30 KM, I upgrade both cars, exactly doubling their configuration. The red car now reaches 30 KM, but the black one does not, not even 20 KM 😦
From this example, the black car can be compared with an RDBMS and the red one with Hadoop: double Hadoop's resources and you get roughly double the throughput, which is what linear scalability means.
Hadoop uses a distributed file system, which spreads data and work across many machines. So rather than relying on a single file system, we use a distributed one and divide the work among its nodes. In other words, instead of making one resource bigger, I add more resources. A question might pop up here: isn't that a problem for the budget? Won't more resources mean more cost, which can be a burden to my client?
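To make this concrete, here is a minimal sketch of copying a local file into HDFS with Hadoop's Java FileSystem API. The paths and the NameNode address are hypothetical, just for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in a real cluster this
        // usually comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS. Behind the scenes, HDFS splits it
        // into blocks and spreads those blocks across the cluster's DataNodes.
        fs.copyFromLocalFile(new Path("/tmp/sales.csv"),
                             new Path("/data/sales.csv"));

        fs.close();
    }
}
```

Once the file is in HDFS, it is no longer one file on one disk: it is a set of blocks spread across the cluster, which is what lets Hadoop bring many machines to bear on the same data.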
The nodes in a Hadoop cluster are made up of commodity hardware.
What does Commodity Hardware mean?
Commodity hardware is a term for affordable devices that are generally compatible with other such devices. In a process called commodity computing or commodity cluster computing, these devices are often networked to provide more processing power when those who own them cannot afford to purchase more elaborate supercomputers, or want to maximize savings in IT design.
When I compare enterprise hardware with commodity hardware, the commodity option comes out around 90% cheaper 🙂
How can I trust Hadoop, which stores lots of confidential and critical data, on such cheap hardware?
The answer is YES!!! You can fully rely on Hadoop, because its architecture takes care of automatic fail-over of your nodes. Let's say I have split my work among 10 people and one of them falls sick. What should I do then? I will ask someone else to do that task.
Should I assign it to just anyone?
No, I need to identify who has the lightest workload to handle the additional task.
Hadoop's architecture does the same thing: when a node fails, its work is rescheduled onto healthy nodes that hold replicas of the same data.
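That hand-off works because HDFS keeps multiple replicas of every block (three by default), so a dead node never means lost data. Here is a minimal sketch, reusing the hypothetical file from above, of inspecting and raising a file's replication factor through the same Java API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/sales.csv"); // hypothetical path

        // Each block of this file is stored on several DataNodes; if one
        // node dies, the NameNode re-replicates its blocks elsewhere.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Raise the replication factor for extra safety on critical data.
        fs.setReplication(file, (short) 5);

        fs.close();
    }
}
```

Raising the factor trades disk space for extra fault tolerance on the files you care most about.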
Flexible:
There are many reasons why Hadoop is flexible. Let me give two examples:
Firstly, as I said in the definition, Hadoop is a framework written in Java. And as we all know, Java is powerful and portable across virtually any operating system, so Hadoop is also portable across any operating system.
The next thing is, Hadoop is written in Java, but that doesn't mean all of its programming models have to be written in Java. Through Hadoop Streaming you can write your mappers and reducers in Python, C, C++, or whatever programming language you like. So Hadoop gives you a lot of flexibility in how you express your programming models.
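For example, here is roughly what running a Python mapper and reducer through Hadoop Streaming looks like. The mapper.py and reducer.py scripts are hypothetical placeholders, and the exact jar path varies by Hadoop version and installation:

```
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input  /data/sales.csv \
  -output /data/wordcount-out \
  -mapper  mapper.py \
  -reducer reducer.py \
  -file mapper.py \
  -file reducer.py
```

Hadoop pipes each input split through the mapper script's stdin and reads its stdout, so any executable that speaks plain text can take part in a job.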
Distributed And Fast:
And finally, let's talk about the distributed behavior of Hadoop. Distributing work among different systems is the main feature of Hadoop and the one that made it prominent. If you're dealing with large volumes of unstructured data, Hadoop can efficiently process terabytes of data in minutes, and petabytes in hours.
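To show what that distribution looks like in code, here is a minimal sketch of the classic WordCount job, essentially the canonical MapReduce example, with hypothetical input and output paths. Hadoop splits the input across the cluster, runs a mapper on every split in parallel, and the reducers aggregate the partial results:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Each mapper instance runs in parallel on one split of the input,
    // emitting a (word, 1) pair for every word it sees.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Each reducer receives all the counts for a given word and sums them.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/books"));       // hypothetical input
        FileOutputFormat.setOutputPath(job, new Path("/data/books-out")); // hypothetical output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The key design point is that you only describe the per-record map logic and the per-key reduce logic; Hadoop itself takes care of splitting the data, scheduling the parallel tasks, and re-running any that fail.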
So now we know all the significant features of Hadoop.
Happy learning 🙂