vlr training
Azure Data Engineer

 

 

Azure Data Factory is a cloud-based data integration service for creating data-driven workflows that orchestrate and automate data movement and data transformation.

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.

Azure Data Engineer (Azure Data Factory and Spark) Online Training Course Details

Course Duration: 45 days, Mon-Fri, 8 am to 9 am (IST)

Mode of Training: Online

Azure Data Engineer Online Training Course Curriculum

Basics

  • Introduction to Data Engineering
  • Python, SQL, and Azure Portal access


Introduction to ADF V2

  • Why ADF?
  • Terminology
  • How to create a data factory?

Practice Basics of ADF

  • Practice creating linked services, datasets, global parameters, and a simple pipeline
  • Understand navigation of the pages in the ADF portal

Activities Sessions

  • Move and Transform
  • General
  • Iteration & Conditionals

Scenario-Based Pipeline Building

  • Building pipelines using the above-mentioned set of activities

Debugging

  • Debugging pipeline failures

Triggers in Pipeline

  • Theory and practice

Real-Time Project Showcase

  • Walkthrough of the pipelines from a real-time project

Basics of Big Data

  • What is Big Data?
  • How to process Big Data?
  • What is MapReduce?
  • What is Apache Spark?
  • Differences between MapReduce and Spark
  • How to practice Spark?

Spark Basics

  • What is Apache Spark?
  • Architecture of Apache Spark
  • Spark's language APIs
  • Spark's APIs
  • SparkContext and SQLContext
  • RDD definition
  • First hands-on program in Spark (see the sketch below)
  • SparkSession
  • DataFrames
  • Partitions
  • Lazy evaluation
  • Transformations
  • Actions
  • Spark UI
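
A first PySpark program of the kind covered in this module might look like the minimal sketch below (assuming a local installation, e.g. pip install pyspark; the data and names are illustrative):

    from pyspark.sql import SparkSession

    # SparkSession is the entry point to the DataFrame API
    spark = SparkSession.builder.appName("FirstSparkProgram").getOrCreate()

    # Build a small DataFrame from in-memory data
    df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])

    # Transformations (filter) are lazy; show() is an action that
    # actually triggers execution -- watch the job appear in the Spark UI
    df.filter(df.age > 40).show()

    # Inspect how the data is split across partitions
    print(df.rdd.getNumPartitions())

    spark.stop()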

Spark Structured API

  • DataFrames, Spark SQL, and Datasets
  • PySpark introduction
  • Overview of Structured API execution
  • Understand different file formats for data processing
  • How to practice Spark in Google Colab
  • Practice creating DataFrames for various file formats
  • Selecting columns of a DataFrame in various ways, including selectExpr
  • Applying custom datatypes (a schema) to the data in the DataFrame
  • Filter data in the DataFrame based on a condition
  • Group by and the collect_list() function
  • Creating a new column from an existing column
  • Renaming existing columns
  • Removing columns from a DataFrame
  • Casting datatypes on the data in the DataFrame
  • Extract distinct values from a DataFrame (several of these operations are sketched below)
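
To give a feel for these operations, here is a rough PySpark sketch (runnable in Google Colab after pip install pyspark; employees.csv and its columns are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.getOrCreate()

    # Read a CSV with an explicit (custom) schema instead of inferring it
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("dept", StringType(), True),
        StructField("salary", IntegerType(), True),
    ])
    df = spark.read.csv("employees.csv", header=True, schema=schema)

    # Selecting columns in different ways, including selectExpr
    df.select("name", "salary").show()
    df.selectExpr("name", "salary * 1.1 AS revised_salary").show()

    # Filter rows on a condition
    df.filter(F.col("salary") > 50000).show()

    # Group by with collect_list()
    df.groupBy("dept").agg(F.collect_list("name").alias("members")).show()

    # New column from an existing one, then rename, cast, and drop
    df = (df.withColumn("bonus", F.col("salary") * 0.1)
            .withColumnRenamed("dept", "department")
            .withColumn("salary", F.col("salary").cast("double"))
            .drop("bonus"))

    # Distinct values of a column
    df.select("department").distinct().show()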

Spark For ML

  • Random sampling and random splits (optional; see the sketch below)
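
A quick sketch of both, assuming an existing SparkSession and a toy DataFrame:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1000)  # toy DataFrame with a single `id` column

    # Random sampling: draw roughly 10% of the rows
    sample_df = df.sample(fraction=0.1, seed=42)

    # Random split: the usual way to produce train/test sets for ML
    train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)
    print(train_df.count(), test_df.count())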

Spark Structured API (continued)

  • Concatenating data (rows) in DataFrames
  • Join multiple DataFrames
  • Sort data in DataFrames
  • The limit function in Spark
  • Repartition and coalesce in Spark
  • Collecting rows to the driver node using collect()
  • Cache and persist DataFrames
  • Usage of the lit() function
  • Working with dates and timestamps
  • Handling null values in data
  • Convert a Spark DataFrame into a pandas DataFrame
  • Creating user-defined functions (UDFs) in Spark
  • Write the data of a DataFrame to external storage
  • Create partitions while writing the data to external storage
  • Create buckets on the data written to the data lake
  • Pull data from a database directly using Spark (see the combined sketch below)
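
The sketch below strings several of these operations together; table names, output paths, and JDBC connection details are placeholders, not a definitive setup (the pandas conversion assumes pandas is installed, and the JDBC read assumes the matching driver is on the classpath):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    emp = spark.createDataFrame(
        [(1, "Alice", None), (2, "Bob", 45)], ["dept_id", "name", "age"])
    dept = spark.createDataFrame(
        [(1, "Sales"), (2, "HR")], ["dept_id", "dept_name"])

    # Concatenate rows (union), join, sort, and limit
    doubled = emp.union(emp)
    joined = emp.join(dept, on="dept_id", how="inner")
    joined.orderBy(F.col("name").desc()).limit(10).show()

    # repartition() does a full shuffle; coalesce() only reduces partitions
    emp_small = emp.repartition(8).coalesce(2)

    # collect() pulls all rows to the driver -- only safe on small results
    rows = joined.collect()

    # lit() wraps a constant value as a column
    emp = emp.withColumn("country", F.lit("India"))

    # Dates/timestamps and null handling
    emp = emp.withColumn("load_date", F.current_date()).fillna({"age": 0})

    # Cache the DataFrame, convert to pandas, and apply a Python UDF
    emp.cache()
    pdf = emp.toPandas()
    upper_udf = F.udf(lambda s: s.upper() if s else None)
    emp.withColumn("name_upper", upper_udf("name")).show()

    # Write with partitioning; bucketing requires saveAsTable()
    emp.write.mode("overwrite").partitionBy("country").parquet("/tmp/emp")
    emp.write.mode("overwrite").bucketBy(4, "dept_id").saveAsTable("emp_bkt")

    # Read directly from a database over JDBC
    jdbc_df = (spark.read.format("jdbc")
               .option("url", "jdbc:postgresql://host:5432/db")
               .option("dbtable", "public.employees")
               .option("user", "user").option("password", "secret")
               .load())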

Spark SQL

  • Introduction to Spark SQL
  • Different types of views in Spark SQL
  • Creating Spark tables/views
  • Practice all the above concepts using Spark SQL (see the sketch below)
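
A brief sketch of temp views, global temp views, and tables (names are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("Alice", "Sales"), ("Bob", "HR")], ["name", "dept"])

    # Temp view: visible only to this SparkSession
    df.createOrReplaceTempView("employees")
    spark.sql("SELECT dept, count(*) AS n FROM employees GROUP BY dept").show()

    # Global temp view: shared across sessions via the global_temp database
    df.createOrReplaceGlobalTempView("employees_g")
    spark.sql("SELECT * FROM global_temp.employees_g").show()

    # Persist as a managed Spark table
    df.write.mode("overwrite").saveAsTable("employees_tbl")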

Databricks

  • Introduction
  • How to create clusters, and the cluster types
  • What is a data lake?
  • How to mount a data lake to Databricks
  • Creating notebooks and running code on Databricks
  • Databricks utilities and magic commands
  • Widgets in Databricks
  • Lakehouse architecture
  • The Delta file format
  • Creating Delta tables
  • Inserting data into Delta tables
  • Time travel with Delta tables (see the sketch below)
  • Uses of the Delta format
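
As a taste of this module, a sketch that would run inside a Databricks notebook (dbutils and the VERSION AS OF syntax are Databricks/Delta-specific; paths and table names are illustrative):

    # Notebook utilities and widgets
    dbutils.fs.ls("/mnt/datalake")          # assumes a mounted data lake
    dbutils.widgets.text("env", "dev")      # widget with a default value
    env = dbutils.widgets.get("env")

    # Create a Delta table and insert data
    df = spark.range(100).withColumnRenamed("id", "order_id")
    df.write.format("delta").mode("overwrite").saveAsTable("orders_delta")
    spark.sql("INSERT INTO orders_delta VALUES (101)")

    # Time travel: query an earlier version of the table
    spark.sql("SELECT * FROM orders_delta VERSION AS OF 0").show()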

Optimisation techniques

  • Broadcast joins
  • Spark shuffle partitions
  • The VACUUM command
  • The OPTIMIZE command
  • Repartition and coalesce on DataFrames
  • Using cache and persist when required (see the sketch below)
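
A compact sketch of these techniques; the OPTIMIZE and VACUUM statements assume a Delta table such as the orders_delta table from the previous module:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()
    large = spark.range(1_000_000).withColumnRenamed("id", "key")
    small = spark.range(100).withColumnRenamed("id", "key")

    # Broadcast join: ship the small table to every executor, avoiding a shuffle
    joined = large.join(broadcast(small), on="key")

    # Tune the number of shuffle partitions (the default is 200)
    spark.conf.set("spark.sql.shuffle.partitions", "64")

    # Cache/persist only DataFrames that are reused several times
    joined.cache()

    # Delta maintenance: OPTIMIZE compacts small files,
    # VACUUM removes old files no longer referenced by the table
    spark.sql("OPTIMIZE orders_delta")
    spark.sql("VACUUM orders_delta RETAIN 168 HOURS")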

Improving performance of Spark Jobs

  • Using the multiprocessing package (see the sketch below)
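
One common pattern uses multiprocessing.pool.ThreadPool (threads rather than processes, because a SparkSession cannot be shared across processes) to trigger several independent Spark jobs concurrently on the same cluster; the table names below are hypothetical:

    from multiprocessing.pool import ThreadPool
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    tables = ["orders", "customers", "products"]

    def count_rows(table):
        # Each call launches an independent Spark job on the shared session
        return spark.table(table).count()

    with ThreadPool(3) as pool:
        counts = pool.map(count_rows, tables)
    print(dict(zip(tables, counts)))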

ADF – Databricks Linkage

  • Call Databricks notebooks from Data Factory
  • Pass parameters to Databricks notebooks (notebook side sketched below)
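
On the notebook side, parameters sent from the ADF Databricks Notebook activity (its "base parameters") arrive as widgets; a minimal sketch with illustrative names and paths:

    # Declare the widget with a default, then read the value passed by ADF
    dbutils.widgets.text("load_date", "")
    load_date = dbutils.widgets.get("load_date")

    # Use the parameter in the load (path and table are hypothetical)
    df = spark.read.parquet(f"/mnt/datalake/raw/{load_date}")
    df.write.format("delta").mode("append").saveAsTable("staging_orders")

    # Optionally return a value that ADF can read from the activity output
    dbutils.notebook.exit(str(df.count()))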

End-to-End project

  • A real-time project built using ADF, Databricks, PySpark, and Spark SQL

A quick review

  • Doubts & questions

Azure Data Engineer Online Training demo videos by Srilkanth

Register Now for Azure Data Engineer Online Training