Big Data, Hadoop, Hive, Spark & Data Science Training

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
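To make the failure-handling idea concrete, here is a minimal sketch, assuming data is stored as blocks that are each copied to several distinct machines. It uses a simplified round-robin placement (not HDFS's actual rack-aware policy; `place_replicas` and `available_blocks` are hypothetical helpers) and shows that losing one node leaves every block readable, while losing too many can make some blocks unavailable.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes using simple round-robin.

    Simplified illustration only; real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def available_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {b for b, replicas in placement.items() if any(n not in failed for n in replicas)}


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(num_blocks=6, nodes=nodes, replication=3)
print(len(available_blocks(placement, failed={"node2"})))  # 6: every block survives
print(len(available_blocks(placement, failed={"node1", "node2", "node3"})))  # 4: blocks 0 and 4 lost all replicas
```

Because every block lives on three different machines, the application layer can simply read another replica when one machine fails, which is why commodity hardware is acceptable.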

Big Data & Data Science Courses

Certified Hadoop Architect Engineer

Certified Bigdata Engineer

Certified Data Scientist

Understanding Big Data and Hadoop

Limitations of Existing Data Analytics Architectures and Their Solutions
Hadoop Features
Hadoop Ecosystem
Hadoop 2.x core components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Different Hadoop Distributions

Duration: Customized
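The MapReduce processing model listed above splits work into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that aggregates each group. The following is a minimal pure-Python simulation of a word count under that model; it runs in a single process and is not the Hadoop Java API.

```python
from collections import defaultdict


def map_phase(line):
    # Emit (word, 1) for every whitespace-separated token, lowercased.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate all counts for one word.
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    # Sorting keys mirrors the sorted order in which reducers receive keys.
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


print(word_count(["the quick fox", "The lazy dog", "the end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster, the map calls run in parallel on the nodes that hold each HDFS block, and the shuffle moves data over the network to the reducers.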

Hadoop Architecture and HDFS

Hadoop 2.x Cluster Architecture
Federation and High Availability
A Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-Node and Multi-Node Cluster Setup, Hadoop Administration

Duration: Customized
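A common sizing exercise when working with the Hadoop 2.x configuration files is estimating how a file maps onto HDFS blocks. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); `hdfs_footprint` is a hypothetical helper, and a cluster may override both values in `hdfs-site.xml`.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster)."""
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    # The last block occupies only as much disk as the data it holds, so raw
    # usage is the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes


blocks, raw = hdfs_footprint(1000 * MB)
print(blocks, raw // MB)  # 8 3000
```

So a 1000 MB file becomes 8 blocks (7 full, 1 partial) and consumes about 3000 MB of raw disk across the DataNodes.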

Hadoop MapReduce Framework

MapReduce Use Cases
Hadoop 2.x MapReduce Architecture
YARN MapReduce Application Execution Flow
Anatomy of MapReduce Program
Input Splits
Relation between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
Counters, Distributed Cache
MRUnit, Reduce Join
Custom Input Format
Sequence File Input Format
XML File Parsing using MapReduce

Duration: Customized
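The combiner and partitioner topics can be illustrated with a small simulation. A combiner pre-aggregates each mapper's output locally, which reduces the records shuffled over the network. A partitioner decides which reducer receives each key. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. This sketch substitutes `zlib.crc32` for Java's `hashCode()` because Python's string hash is randomized per process; all function names are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def mapper(line):
    return [(w, 1) for w in line.split()]


def combine(pairs):
    # Local, per-mapper aggregation; valid here because addition is
    # associative and commutative.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def partition(key, num_reducers):
    # Stand-in for HashPartitioner: mask the sign bit, then take the modulus.
    return (zlib.crc32(key.encode()) & 0x7FFFFFFF) % num_reducers


def run(splits, num_reducers=2):
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    shuffled = 0  # number of records sent from mappers to reducers
    for split in splits:  # one mapper per input split
        pairs = [p for line in split for p in mapper(line)]
        for word, n in combine(pairs):
            reducer_inputs[partition(word, num_reducers)][word].append(n)
            shuffled += 1
    results = {}
    for inputs in reducer_inputs:  # each reducer sums its own keys
        for word, values in inputs.items():
            results[word] = sum(values)
    return results, shuffled


results, shuffled = run([["a b a", "a"], ["b b c"]])
print(sorted(results.items()), shuffled)  # [('a', 3), ('b', 3), ('c', 1)] 4
```

Without the combiner, the same input would shuffle 7 records instead of 4. Because the partitioner is deterministic, every occurrence of a key reaches the same reducer.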

Hive

Hive Background
Hive vs Pig
Hive Architecture and Components
Metastore in Hive, Limitations of Hive
Comparison with Traditional Database
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables (Managed Tables and External Tables)
Importing Data, Querying Data, Managing Outputs
Hive Script, Hive UDF
Retail Use Case in Hive, Hive Demo on Healthcare Dataset
Hive QL: Joining Tables, Dynamic Partitioning
Custom Map/Reduce Scripts
Hive Indexes and Views, Hive Query Optimizers
Hive: Thrift Server, User-Defined Functions
HBase: Introduction to NoSQL Databases and HBase, HBase vs RDBMS, HBase Components, HBase Architecture, Run Modes & Configuration, HBase Cluster Deployment

Duration: Customized
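Partitions and buckets determine where a Hive row is physically stored. Each partition value becomes a `column=value` subdirectory under the table directory, so a query that filters on a partition column reads only the matching directories (partition pruning). Bucketing spreads rows across a fixed number of files by hashing the bucketing column modulo the bucket count. The sketch below uses `zlib.crc32` as a stand-in for Hive's own hash function, and the warehouse path and helper names are illustrative assumptions.

```python
import zlib


def hive_location(table_dir, partition_spec, bucket_value, num_buckets):
    """Return (partition directory, bucket number) for one row."""
    # Partition columns become nested key=value directories, in declaration order.
    parts = "/".join(f"{k}={v}" for k, v in partition_spec.items())
    # Bucket = hash(bucketing column) mod number of buckets (crc32 as a stand-in).
    bucket = (zlib.crc32(str(bucket_value).encode()) & 0x7FFFFFFF) % num_buckets
    return f"{table_dir}/{parts}", bucket


def prune(partition_dirs, column, value):
    # Partition pruning: only directories whose key=value matches the predicate are read.
    return [d for d in partition_dirs if f"{column}={value}" in d.split("/")]


path, bucket = hive_location(
    "/user/hive/warehouse/sales", {"country": "IN", "dt": "2020-01-01"}, 42, 4
)
print(path)  # /user/hive/warehouse/sales/country=IN/dt=2020-01-01

dirs = [
    "/user/hive/warehouse/sales/country=IN/dt=2020-01-01",
    "/user/hive/warehouse/sales/country=IN/dt=2020-01-02",
    "/user/hive/warehouse/sales/country=US/dt=2020-01-01",
]
print(prune(dirs, "dt", "2020-01-01"))  # first and third directories only
```

Partitioning suits low-cardinality filter columns such as dates, while bucketing suits high-cardinality columns such as IDs, where one directory per value would be impractical.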

HBase

HBase Data Model
HBase Shell
HBase Client API
Data Loading Techniques
ZooKeeper Data Model
ZooKeeper Service
ZooKeeper, Demos on Bulk Loading
Getting and Inserting Data, Filters in HBase

Duration: Customized
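The HBase data model can be pictured as a sorted map: row key, then `family:qualifier` column, then timestamped versions of a cell. Reads return the newest version by default, and rows are kept sorted by row key, so a scan over a key range with an exclusive stop row is cheap. The toy class below is a hypothetical in-memory illustration of that model and is not the HBase client API.

```python
import bisect


class ToyHTable:
    """Hypothetical model: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.rows = {}
        self.sorted_keys = []  # row keys kept in lexicographic order
        self.max_versions = max_versions  # assumed >= 1

    def put(self, row, column, value, ts):
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, {})
        versions[ts] = value
        # Keep only the newest max_versions cells, like a column family's VERSIONS setting.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column):
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # newest timestamp wins

    def scan(self, start, stop):
        # Sorted row keys make a range scan a slice; the stop row is exclusive.
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]


t = ToyHTable(max_versions=2)
t.put("user#001", "info:name", "Asha", ts=1)
t.put("user#001", "info:name", "Asha K", ts=2)
t.put("user#001", "info:name", "Asha Kumar", ts=3)
t.put("user#002", "info:name", "Ravi", ts=1)
t.put("user#010", "info:name", "Meena", ts=1)
print(t.get("user#001", "info:name"))  # Asha Kumar
print(t.scan("user#001", "user#005"))  # ['user#001', 'user#002']
```

The lexicographic ordering is why row-key design matters in HBase: keys that share a prefix are stored together and can be scanned as one range.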

Apache Spark & Scala

What is Apache Spark
Spark Ecosystem
Spark Components
Spark as a Polyglot Framework
Why Scala
SparkContext
RDD

Duration: Customized
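A central RDD idea is lazy evaluation. Transformations such as `map` and `filter` only record a lineage of operations, and nothing runs until an action such as `collect` or `count` is called. In Spark's Scala API, this looks like `sc.parallelize(1 to 5).map(x => x * x).filter(_ % 2 == 1).collect()`. The class below is a hypothetical single-process stand-in (`ToyRDD`, not PySpark) that shows the same behavior.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: transformations are recorded, actions evaluate."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)  # the recorded lineage

    def map(self, f):
        # Returns a new object; the original is never mutated (RDDs are immutable).
        return ToyRDD(self._source, self._ops + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: replay the lineage over the source data.
        out = []
        for x in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out

    def count(self):
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # record each invocation to observe laziness
    return x * x


rdd = ToyRDD(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
print(len(calls))     # 0: nothing has run yet
print(rdd.collect())  # [1, 9, 25]
print(len(calls))     # 5
```

Keeping the lineage instead of materialized data is also what lets Spark recompute a lost partition after a failure.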

Apache Pig

About Pig
MapReduce vs Pig
Programming Structure in Pig
Pig Running Modes
Pig components, Pig Execution
Pig Latin Program, Data Models in Pig
Pig Data Types, Shell and Utility Commands
Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Specialized Joins in Pig
Built-In Functions (Eval, Load and Store, Math, String, and Date Functions)
Pig UDF, Piggybank
Parameter Substitution (Pig Macros and Pig Parameter Substitution)
Pig Streaming, Testing Pig Scripts with PigUnit
Aviation Use Case in Pig, Pig Demo on Healthcare Dataset

Duration: Customized
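The relationship between COGROUP and joins in Pig can be shown with a small simulation. COGROUP produces one tuple per key, holding one bag of matching tuples from each input relation. An inner JOIN is equivalent to a COGROUP followed by flattening both bags, which drops keys whose bag is empty on either side. The sketch below is plain Python, not Pig Latin. The sample airline data and the helper names are hypothetical.

```python
from collections import defaultdict


def cogroup(left, right, key_index=0):
    """Mimic COGROUP: one (key, left_bag, right_bag) tuple per distinct key."""
    groups = defaultdict(lambda: ([], []))
    for t in left:
        groups[t[key_index]][0].append(t)
    for t in right:
        groups[t[key_index]][1].append(t)
    # Sorted only to make the output deterministic; Pig does not guarantee order.
    return [(k, lbag, rbag) for k, (lbag, rbag) in sorted(groups.items())]


def inner_join(left, right, key_index=0):
    # Flattening both bags yields their cross product; an empty bag yields nothing.
    return [l + r for _, lbag, rbag in cogroup(left, right, key_index) for l in lbag for r in rbag]


flights = [("AI", 101), ("6E", 202), ("AI", 303)]
carriers = [("AI", "Air India"), ("UK", "Vistara")]
for row in cogroup(flights, carriers):
    print(row)
# ('6E', [('6E', 202)], [])
# ('AI', [('AI', 101), ('AI', 303)], [('AI', 'Air India')])
# ('UK', [], [('UK', 'Vistara')])
print(inner_join(flights, carriers))
# [('AI', 101, 'AI', 'Air India'), ('AI', 303, 'AI', 'Air India')]
```

Keeping the empty bags is what makes COGROUP useful for outer-join style analyses, such as finding carriers with no flights.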

Oozie, Sqoop and Flume

Flume and Sqoop
Oozie Components, Oozie Workflow
Scheduling with Oozie
Oozie Co-ordinator
Oozie Commands, Oozie Web Console
Oozie for MapReduce, Pig, Hive, and Sqoop
Combined Flow of MapReduce, Pig, and Hive in Oozie, Hadoop Project Demo, Hadoop Integration with Talend

Duration: Customized
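An Oozie workflow (defined in `workflow.xml`) is a graph of action nodes. Each action declares an `<ok to="..."/>` transition and an `<error to="..."/>` transition, and the flow ends at an end node or a kill node. The sketch below walks such a graph in plain Python to show how a Sqoop, then Pig, then Hive pipeline proceeds or aborts. It is a simplified model that leaves out fork/join nodes, retries, and coordinators, and the node names are hypothetical.

```python
def run_workflow(nodes, start, actions):
    """Walk an Oozie-style workflow and return the list of visited nodes.

    nodes: {name: {"ok": next_node, "error": next_node}}
    Terminal nodes are "end" (success) and "fail" (a kill node).
    actions: {name: callable returning True on success}
    """
    path, current = [], start
    while current not in ("end", "fail"):
        path.append(current)
        succeeded = actions[current]()
        # Follow the ok or error transition declared for this action.
        current = nodes[current]["ok" if succeeded else "error"]
    path.append(current)
    return path


workflow = {
    "sqoop-import": {"ok": "pig-clean", "error": "fail"},
    "pig-clean": {"ok": "hive-report", "error": "fail"},
    "hive-report": {"ok": "end", "error": "fail"},
}


def ok():
    return True


def broken():
    return False


print(run_workflow(workflow, "sqoop-import", {"sqoop-import": ok, "pig-clean": ok, "hive-report": ok}))
# ['sqoop-import', 'pig-clean', 'hive-report', 'end']
print(run_workflow(workflow, "sqoop-import", {"sqoop-import": ok, "pig-clean": broken, "hive-report": ok}))
# ['sqoop-import', 'pig-clean', 'fail']
```

An Oozie coordinator then adds time- or data-based triggers that start such a workflow repeatedly.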