Hadoop & Spark

Big Data – Hadoop & Spark Training

What is Hadoop?

Hadoop is a platform written in Java for processing large amounts of data. The Hadoop ecosystem includes many tools that make big data processing easier.
Let’s learn how to do that end to end!


Over the past years, Hadoop & Spark have seen enormous industry adoption, and the market faces a shortage of people skilled in them. To help bridge that gap, we have designed this course around industry expectations, using real-time examples. The course will help you understand the variety of big data application development options, develop your own applications, and performance tune them.

This course is for

  • Professionals who want to learn & develop Hadoop & Spark applications.
  • Anyone interested in learning the latest technology to advance their career.

Course Structure:

  • This course is designed as 40% theory and 60% hands-on.
  • You will be given a real-time POC to solve and learn from.

Course Duration:

  • 40+ hrs of live sessions and 20+ hrs of pre-recorded videos

Course Modules:

Live session topics (40+ hrs):

  • Big Data
  • Hadoop & Spark intro
  • Hadoop – HDFS
  • Apache Hive
  • Apache YARN
  • Spark Core
  • Spark SQL
  • Spark Streaming
  • Structured Streaming
  • Spark execution (IntelliJ, spark-submit)
  • Kafka
  • Cassandra
  • NiFi – Basics
  • Airflow
  • POC / Project class – Hadoop & Spark

Pre-recorded video topics:

  • MapReduce
  • Sqoop
  • Flume
  • Apache Pig
  • Apache HBase
  • ML intro: Python basics
  • Cloud Basics

Pre-recorded video topics on the basics:

  • Linux
  • Shell Script
  • Java/Scala Basics
  • SQL Basics

Why Choose this course?

  • To fulfill your ambition and build your future career
  • Industry-experienced trainers
  • Course content keeps evolving with current industry needs
  • Sessions are held online and can be attended from anywhere
  • Courses available in Tamil and English

After this class you will be able to:

  • Have good knowledge of Hadoop and Spark.
  • Have hands-on experience with Hadoop and Spark.
  • Build data pipelines using Hadoop & Spark.
  • Complete a project on Hadoop and Spark independently.
  • Performance tune a Spark application.
  • Know how to switch your career to big data from any other technology.
  • Understand the different big data components.
  • Create streaming jobs and run them on a YARN cluster.

Hadoop Module Details

Introduction to Hadoop World:

  • Dataaaaaaa…….Bigdata..!
  • What is big data? The 3 + 1 V’s.
  • What is Hadoop, why Hadoop, and its history.
  • The Hadoop ecosystem: an overview.
  • Current Requirements and Future possibilities in
  • RDBMS vs Hadoop
  • Wait, finally: what is Hadoop not?
  • Do we need Java to learn Hadoop?
  • Hadoop installation.

Hadoop Architecture: An In-depth Tour:

  • HDFS – An introduction.
  • How is data stored in HDFS? (Travel of a byte).
  • Hadoop daemons:
    o NameNode.
    o DataNode.
    o JobTracker.
    o TaskTracker.
  • Fault tolerance in Hadoop.
  • HA mode in HDFS.
  • How files are handled in projects (sample project scenario execution).

MapReduce 1.0 & YARN:

  • MapReduce history.
  • How MapReduce is used in projects.
  • MapReduce architecture, key-value pairs.
  • YARN (Hadoop 2.0) architecture.
  • Java implementation of MapReduce (sample POC).
  • Mapper, Reducer, Combiner: different combinations.
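
To make the Mapper, Reducer and Combiner roles above concrete, here is a minimal word-count sketch in plain Python. It is only an illustration of the key-value flow and does not use the Hadoop Java API covered in the module; the function names (`mapper`, `combiner`, `shuffle`, `reducer`, `word_count`) and the sample input are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Emit a (word, 1) key-value pair for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)

def combiner(pairs):
    """Pre-aggregate the pairs of ONE map task before they are shuffled."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def shuffle(all_pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in all_pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Sum all values that arrived for one key."""
    return (key, sum(values))

def word_count(splits, use_combiner=True):
    """Run map -> (combine) -> shuffle -> reduce over the input splits.

    Returns the final counts and the number of pairs sent to the shuffle.
    """
    shuffle_input = []
    for split in splits:  # each split stands in for one map task
        pairs = [pair for line in split for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        shuffle_input.extend(pairs)
    grouped = shuffle(shuffle_input)
    counts = dict(reducer(key, values) for key, values in grouped.items())
    return counts, len(shuffle_input)

# Hypothetical input: two splits, one line each.
splits = [["big data big cluster"], ["data node data"]]
counts, shuffled = word_count(splits, use_combiner=True)
counts_plain, shuffled_plain = word_count(splits, use_combiner=False)
```

Both runs produce the same counts; the combiner only reduces how many pairs cross the shuffle. That is why a combiner is safe for operations such as sums, where partial aggregation does not change the final result.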

Pig & Hive:

  • Hive introduction.
  • Hive data model.
  • Hive implementation of sample project.
  • Pig Introduction.
  • Pig Data structure.
  • Pig implementation of sample project.
  • How are Pig & Hive used in real-time projects?
  • Module 4 assignment.

HBase, Oozie & ZooKeeper:

  • Oozie introduction.
  • Oozie overview and configuration.
  • ZooKeeper overview.
  • HBase introduction.
  • HBase overview.
  • Spark overview.

Spark Module Details

Welcome to Spark:

  • Welcome to the world of Spark.
  • Bye Bye Hadoop? (Hadoop Vs Spark).
  • Spark Components:
    o Spark Core
    o Spark SQL
    o Spark Streaming
    o GraphX
    o MLlib
  • Spark Use cases in real time.
    Hands on:
  • Running a sample program in spark.
  • Executing a spark use case.

Programming with RDD:

  • What is RDD?
  • Why RDD?
  • How RDD gets executed in a spark application.
  • Transformations in RDD.
  • Actions in RDD.
  • RDD programming APIs.
    Hands On:
  • Creating RDD from a Data file.
  • Applying transformations & actions in RDD.
  • Interactive queries using RDD.
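
The difference between transformations and actions listed above can be sketched without a cluster. The snippet below is a plain-Python stand-in, not the Spark API: the `LazyDataset` class and the `parse` helper are hypothetical, and only the method names (`map`, `filter`, `collect`, `count`) mirror their RDD counterparts. It assumes the source is a re-iterable collection such as a list.

```python
class LazyDataset:
    """Toy model of an RDD: transformations are recorded, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source        # re-iterable input, e.g. a list of lines
        self._steps = tuple(steps)   # recorded lineage of transformations

    # Transformations: return a new dataset, do no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        """Replay the recorded steps over every source element."""
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):   # a filter step rejected the element
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger execution and return a result to the caller.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

calls = []  # records every time parse() actually runs

def parse(line):
    calls.append(line)
    return int(line)

lines = ["1", "2", "3", "4", "5", "6"]
evens_squared = (
    LazyDataset(lines)
    .map(parse)
    .filter(lambda n: n % 2 == 0)
    .map(lambda n: n * n)
)
calls_before_action = len(calls)   # still 0: nothing has executed
result = evens_squared.collect()   # first action runs the whole lineage
calls_after_collect = len(calls)
total = evens_squared.count()      # second action replays the lineage
calls_after_count = len(calls)
```

Note that the second action re-runs `parse` on every line. An uncached RDD behaves similarly in Spark, which is the motivation for caching a dataset that several actions reuse.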

Spark SQL/DataFrames:

  • Spark SQL/DataFrame uses.
  • DataFrame / SQL APIs.
  • Spark & Hive Integration.
  • Catalyst query optimization.
    Hands on:
  • Create a DataFrame from a file.
  • Create a DataFrame from a table.
  • Caching and reusing DataFrames.
  • Query with the DataFrame API and SQL.

Spark Execution & Optimization:

  • Jobs, stages & tasks.
  • Partitions and shuffles.
  • Data locality.
  • Spark memory management.
  • Job Performance (tuning).
    Hands on:
  • Visualizing DAG execution.
  • Measuring memory usage.
  • Understanding performance.

Spark Streaming:

  • Introduction to Spark Streaming.
  • DStream APIs and Stateful
  • Reliability and fault recovery.
    Hands on:
  • Creating DStream from source.
  • Integration of Kafka and Spark Streaming.
  • Developing a Kafka-Spark application.
  • Viewing streaming jobs in the web UI.
  • Measuring memory usage.
  • Understanding performance.

Introduction to Kafka:

  • Introduction to Kafka.
  • Kafka architecture.
  • Producers, Consumers in Kafka.
  • Working with Kafka.
    Hands on:
  • Installing & configuring Kafka.
  • Producing and consuming messages.

Our Trainers

Our trainers give students complete freedom to explore the subject and learn from real-time examples. They help candidates complete their projects and prepare them for interview questions. Candidates are free to ask questions at any time.

  • 7+ years of experience.
  • Trained 2000+ students in a year.
  • Strong theoretical & practical knowledge.
  • Certified professionals with high grades.
  • Well connected with hiring HRs in multinational companies.
  • Expert-level subject knowledge, fully up to date on real-world industry applications.
  • Trainers have experience on multiple real-time projects in their industries.
  • Our trainers work in multinational companies such as CTS, TCS, HCL Technologies, ZOHO, Birlasoft, IBM, Microsoft, HP, Scope, Philips Technologies, etc.

Course Instructor

Tamil Bhoomi

Member Since November 2020

Related Courses

Bigdata - Apache Spark Real time Project Oriented

“Big data” analysis is a hot and highly valuable skill
