DB-Tools.com - System Comparison
19th September 2020, Saturday
 


Editorial information provided by DB-Tools
Comparison | Amazon EMR Hadoop | Cloudera
Version | 4.2.8 | 5.9.x
Name | AWS EMR | Cloudera
Drawbacks | NA | --
Advantages | Manage a Hadoop cluster as plug and play | --
Languages Supported | Java, Python, Scala | Java, Python
Website | aws.amazon.com/emr/ | www.cloudera.com
XML Support | No | No
JSON Support | Yes | Yes
Brief Description | You can launch an Amazon EMR cluster in minutes. There is no need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis. | Cloudera is a hybrid open-source packaged distribution, primarily of Apache Hadoop, Spark, and Kafka. The distribution is called CDH (Cloudera Distribution Including Apache Hadoop). It is targeted at enterprise-class deployments of the Hadoop platform.
Database Model | Hadoop File System (HDFS), or Amazon S3 via EMRFS | Hadoop File System (HDFS)
Technical Documentation | https://docs.aws.amazon.com/emr/ | https://www.cloudera.com/documentation.html
License | Commercial | Commercial
Cloud-based / SaaS | SaaS service from AWS | Altus is the cloud offering of Cloudera
Implementation Language | NA | NA
Operating System Supported | Not applicable, as it is managed by AWS | Linux, Windows
Options for Integration / Access API | RESTful HTTP | RESTful HTTP
Consistency | NA | NA
Foreign Keys | NA | No, but you can join two files using Hive and Impala
Streaming Support | Yes | Yes
Analytics Support | NA | Using MLlib in Apache Spark
Data Storage Schema | NA | Hadoop File System (HDFS)
Notable Users | NA | Dun & Bradstreet, AOL
Key Differentiator | With EMR we can spin up a Hadoop cluster in minutes | Cloudera Navigator provides lineage of the various jobs and data points
Concurrency | Yes | Yes
Partitioning | No | Yes
Replication | Yes | Yes
Secondary Indexes | Yes, using Solr | Yes, in HBase. A data warehouse on Cloudera is generally built using HBase, which has secondary indexes
SchemaLess | Yes | Yes
SQL Query | NA | No. HiveQL, which is similar to SQL, can be used with Hive
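
The Foreign Keys and SQL Query entries for Cloudera say that there are no foreign keys, but two files can still be joined with a SQL-like query. The sketch below illustrates that idea only. It is not Hive code: it uses Python's built-in sqlite3 module as a stand-in, on the assumption that no Hive or Impala cluster is at hand, and the table names, column names, and sample rows are all hypothetical. The join statement itself is plain inner-join syntax of the kind HiveQL also accepts.

```python
import sqlite3

# Hypothetical sample rows standing in for two files stored in HDFS.
CUSTOMERS = [(1, "Acme"), (2, "Globex")]
ORDERS = [(100, 1, 250.0), (101, 2, 75.5), (102, 1, 30.0)]

# Plain inner-join syntax. Assumption: the same statement could be submitted
# to Hive or Impala against tables defined over the two files.
JOIN_QUERY = """
SELECT o.order_id, c.name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_id
"""


def join_orders_with_customers():
    """Load the sample rows into in-memory tables and run the join."""
    conn = sqlite3.connect(":memory:")
    try:
        # No FOREIGN KEY constraint is declared anywhere: the relationship
        # between the two tables exists only in the join condition.
        conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
        return conn.execute(JOIN_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in join_orders_with_customers():
        print(row)
```

Because nothing enforces the relationship, an order that points at a missing customer would simply drop out of the inner join result instead of being rejected when it is written.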