DB-Tools.com - System Comparison
28th September 2020, Monday
 
Editorial information provided by DB-Tools
Comparison | Hadoop Cloudera | Amazon EMR
Version | 5.9.x | 4.2.8
Name | Cloudera | AWS EMR
Drawbacks | -- | NA
Advantages | -- | Manage a Hadoop cluster as plug and play
Languages Supported | Java, Python | Java, Python, Scala
Website | www.cloudera.com | aws.amazon.com/rds/aurora/
XML Support | No | No
JSON Support | Yes | Yes
Brief description | Cloudera is a hybrid open-source packaged distribution, primarily of Apache Hadoop, Spark, and Kafka. The distribution is called CDH (Cloudera Distribution Including Apache Hadoop). It is targeted at enterprise-class deployments of the Hadoop platform. | You can launch an Amazon EMR cluster in minutes. There is no need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis.
Database Model | Hadoop File System (HDFS) | Relational Database
Technical Documentation | https://www.cloudera.com/documentation.html | https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html
License | Commercial | Commercial
Cloud-based / SaaS | Altus is the cloud offering of Cloudera | SaaS service from AWS
Implementation Language | NA | NA
Operating System Supported | Linux, Windows | Not applicable, as it is managed by AWS
Options for Integration / Access API | RESTful HTTP | RESTful HTTP
Consistency | NA | NA
Foreign Keys | No, but you can join two files using Hive and Impala | NA
Streaming Support | Yes | Yes
Analytics Support | Using MLlib in Apache Spark | NA
Data Storage Schema | Hadoop File System (HDFS) | NA
Notable Users | Dun & Bradstreet, AOL | NA
Key Differentiator | Cloudera Navigator provides lineage of the various jobs and data points. | With EMR we can spin up a Hadoop cluster in minutes.
Concurrency | Yes | Yes
Partitioning | Yes | No
Replication | Yes | Yes
Secondary Indexes | Yes, in HBase. A data warehouse on Cloudera is generally built using HBase, and HBase has secondary indexes. | Yes, based on secondary indexes using Solr
SchemaLess | Yes | Yes
SQL Query | No. HiveQL, which is similar to SQL, can be used with Hive. | NA
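
The Foreign Keys entry for Cloudera says there are no foreign keys, but that two files can still be joined using Hive and Impala. As a rough illustration of what such a join computes, the sketch below joins two small in-memory "files" on a shared column. It is plain Python, not Hive or Impala, and the sample data, column names, and helper functions (read_rows, inner_join) are made up for the example.

```python
import csv
import io

# Hypothetical sample data standing in for two delimited files in HDFS.
CUSTOMERS_CSV = "customer_id,name\n1,Alice\n2,Bob\n3,Carol\n"
ORDERS_CSV = (
    "order_id,customer_id,amount\n"
    "10,1,25.00\n"
    "11,1,5.50\n"
    "12,3,12.00\n"
    "13,9,7.00\n"  # customer 9 does not exist; nothing rejects this row
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Hash join: index the left rows by key, then probe with the right rows."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


joined_rows = inner_join(
    read_rows(CUSTOMERS_CSV), read_rows(ORDERS_CSV), "customer_id"
)
for row in joined_rows:
    print(row["name"], row["order_id"], row["amount"])
```

Because no foreign key constraint exists, the order that points at the missing customer 9 is accepted as data and simply drops out of the inner join. The relationship between the two files is applied only at query time.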