Improving MapReduce performance through data placement
In the distributed processing model, MapReduce has become increasingly important for large-scale data applications such as data mining and web indexing. Hadoop is an open-source framework that implements MapReduce and aims for low response time. The current Hadoop framework assumes that the cluster nodes are homogeneous.

Under this assumption, data locality is not considered when speculative map tasks are launched, because most map tasks are assumed to be data-local. In virtualized data centers, however, neither the homogeneity assumption nor the data locality assumption holds.

This paper shows that ignoring data locality in heterogeneous environments can reduce MapReduce performance. It also addresses the problem of data locality between nodes, so that each node in the cluster carries a balanced data processing load.

For data-intensive applications running on a Hadoop MapReduce cluster, the proposed data placement model balances the amount of data stored on each node in order to improve data processing performance. An analysis of two real data-intensive applications shows that rebalancing data across the nodes of the cluster improves MapReduce performance.
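
The balancing idea can be illustrated with a minimal sketch. The abstract does not state how the balanced share for each node is computed, so the sketch assumes that each node receives a number of data blocks proportional to a relative processing speed. The function name place_blocks, the node names, and the speed values are hypothetical and are not taken from Hadoop or from the proposed model.

```python
from typing import Dict


def place_blocks(total_blocks: int, speeds: Dict[str, float]) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to an assumed relative speed.

    Illustrative only: proportional-to-speed placement is an assumption,
    not the placement model described in the paper.
    """
    total_speed = sum(speeds.values())
    # Ideal (fractional) share of blocks for every node.
    shares = {node: total_blocks * s / total_speed for node, s in speeds.items()}
    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in shares.items()}
    # Hand out the remaining blocks to the nodes with the largest fractional parts.
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: one node assumed twice as fast as the other.
    print(place_blocks(10, {"fast": 2.0, "slow": 1.0}))
```

With these hypothetical speeds the faster node is assigned more blocks than the slower one, so both would be expected to finish their local data at roughly the same time and fewer blocks would need to move between nodes during processing.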

