The release of Apache Hadoop 2, as announced today by the Apache Software
Foundation, is an exciting one for the entire Hadoop ecosystem.
Cloudera engineers have been working hard for many months with the rest
of the vast Hadoop community to ensure that Hadoop 2 is the best it can
possibly be, both for users of Cloudera’s platform and for Hadoop users
generally. Hadoop 2 contains many major advances, including (but not limited
to):
· High availability for the HDFS NameNode, which eliminates the previous
SPOF in HDFS.
· Support for filesystem snapshots in HDFS, which brings native backup and
disaster recovery processes to Hadoop.
· Support for federated NameNodes, which allows for horizontal scaling of
the filesystem namespace.
· Support for NFS access to HDFS, which allows HDFS to be mounted as a
standard filesystem.
· Native network encryption, which secures data while in transit.
· The YARN resource management system, which provides infrastructure for
the creation of new Hadoop computing paradigms beyond MapReduce. This new
flexibility will serve to expand the use cases for Hadoop, as well as improve
the efficiency of certain types of processing over data already stored there.
· Several performance-related enhancements, including more efficient
(and secure) short-circuit local reads in HDFS.
Furthermore, a great deal of work has gone into stabilizing and
maturing Hadoop’s APIs in preparation for this release,
which should give all users and projects building on top of Hadoop confidence
that what they’re creating today will work for years to come.
CDH, Cloudera’s distribution including Hadoop and related projects, has
already delivered several stable, high-value parts of Hadoop 2 in its current
release (such as HDFS 2.0, network encryption, and performance improvements),
and the next release (CDH 5) will be based entirely on Hadoop 2, using YARN
for resource coordination between MapReduce and other components.