Friday, November 4, 2016

UBER mode in Hadoop2 and its configuration

ResourceManager will create separate container for mapper and reducer by default. In Uber mode will allows to run mapper and reducer in the same process as the ApplicationMaster.

Jobs running in uber mode are Uber Jobs. Uber jobs are executed within the ApplicationMaster. Rather then communicate with ResourceManager to create the mapper and reducer containers. The ApplicationMaster runs the map and reduce tasks within its own process and avoided the overhead of launching and communicate with remote containers.

Why we go for UBER Mode?
If you have a small dataset or you want to run MapReduce on small amount of data, Uber configuration will help you out, by reducing additional time that MapReduce normally spends mapper and reducers phase.

Uber mode supports only for map-only jobs and jobs with one reducer.

Configurations to enable jobs to run in UBER Mode
There are four core settings around the configuration of UBER Jobs in the mapred-site.xml. 

Configuration options for Uber Jobs:

mapreduce.job.ubertask.enable (Default = false)
Whether to enable the small-jobs "ubertask" optimization, which runs "sufficiently small" jobs sequentially within a single JVM. 

mapreduce.job.ubertask.maxmaps (Default = 9)
Threshold value for the number of maps beyond which a job is considered too large for the ubertasking optimization. Users can override this value, but only downward.

mapreduce.job.ubertask.maxreduces (Default = 1)
Threshold value for the number of reduces beyond which a job is considered too large for the ubertasking optimization. 
Note: Currently the code can't support more than one Reducer and will ignore larger values.

mapreduce.job.ubertask.maxbytes (Default = HDFS Block Size)
Threshold value for the number of input bytes beyond which a job is considered too large for the ubertasking optimization.
If no value is specified, dfs.block.size is used as the default. Be sure to specify a default value in mapred-site.xml if the underlying file system is not HDFS.