Data Engineering

DURATION
6 Months

MODE OF TRAINING
Online/Offline

LEVEL
Advanced

What is AWS Data Engineering?

AWS Data Engineering is a specialized domain focused on designing, building, and managing scalable data pipelines, storage solutions, and data transfer processes on the AWS platform. This course equips learners with the skills to efficiently process, store, and analyze large-scale datasets using AWS tools and services.


Why Choose AWS for Data Engineering?

AWS offers several advantages that make it a strong foundation for data engineering:

  • Scalability:

    Scale up or down as per your demand with ease.

  • Cost-Effectiveness:

    Benefit from AWS's pay-as-you-go pricing model, paying only for what you use.

  • Security & Compliance:

    AWS ensures advanced security features and meets regulatory compliance standards.

  • Real-Time & Batch Processing:

    Supports both real-time and batch data processing to meet diverse needs.

  • Seamless Integration:

AWS services easily integrate with third-party tools to enhance functionality.

Data Engineering Course Curriculum

Introduction to Data Engineering
  • Overview of Data Engineering: Role, Importance, and Implementation
  • Understanding Data Engineering, Data Analysis, and Data Science
  • Data Engineering for (i) Data Analytics and (ii) Machine Learning
  • Different Types of Data: the 5 Vs
  • Data Lifecycle in Detail: Ingestion, Storage, Processing, Analysis, and Visualization
  • Landscape of Tools and Technologies
  • Roles and Job Areas in Data Engineering
  • Modular Case Study: 1
  • Formative Assessment: 1
Business Data Preparation
  • Understanding Business Data and Its Requirements
  • KPIs and Metrics
  • Analysis of Data from Domains such as Healthcare, Education, Human Resources, Retail Chains, FMCG, Media, Hospitality, and More
  • Modular Case Study: 2
  • Formative Assessment: 2
Implementation Fundamentals: Tools and Technologies
  • Python Programming Essentials: Data Types, Control Structures, Functions, Modules, and OO Concepts
  • Data Operations Using Pandas: Data Cleaning, Munging/Wrangling, Manipulation, and EDA
  • Working with Different Data Sources and Structures
  • Programming with Scala: Fundamentals of Scala 2 and Concepts of Parallel Programming
  • Modular Case Study: 3
  • Formative Assessment: 3
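The Pandas topics above (cleaning, wrangling, EDA) can be previewed with a short sketch. This assumes the pandas library is installed; the data and column names are invented for illustration.

```python
# A minimal sketch of the data-cleaning steps covered in this module:
# removing duplicates, imputing missing values, and a quick EDA summary.
import pandas as pd

# Toy customer data with a duplicate name and a missing spend value.
df = pd.DataFrame({
    "customer": ["Asha", "Ben", "Ben", "Cara"],
    "spend": [120.0, None, 85.0, 40.0],
})

df = df.drop_duplicates(subset=["customer"], keep="first")  # drop repeat customers
df["spend"] = df["spend"].fillna(df["spend"].median())      # impute missing spend
summary = df["spend"].describe()                            # quick EDA summary
```

Note that `keep="first"` retains the first occurrence of each duplicate; the median imputation here is one common strategy among several taught in practice.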
Data Storage Technologies
  • Relational and Object-Relational Databases and Their Implementation
  • NoSQL Databases: Features and Characteristics
  • NoSQL Data Models: Key-Value, Column-Oriented, Document, and Graph
  • Implementation of NoSQL Databases
  • Data Warehouses: Concepts, Star and Snowflake Schemas, OLAP, and Data Marts; Design and Implementation
  • Distributed File Systems: Hadoop HDFS, Google Cloud Storage, and Amazon S3; Implementation of a Hadoop Cluster and AWS S3
  • Data Processing Using Hive; Implementation of Hive on a Hadoop Cluster
  • Working with an Open Data Lakehouse Using Presto and Apache Iceberg
  • Scalable Query Handling Using Presto
  • Modular Case Study: 4
  • Formative Assessment: 4
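The star schema named above can be illustrated with Python's built-in sqlite3: one fact table joined to two dimension tables, then an OLAP-style rollup. Table names, columns, and figures are invented for the sketch.

```python
# A minimal star-schema sketch: fact_sales references two dimension
# tables, and a GROUP BY query rolls revenue up per product per month.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_date    VALUES (10, 'Jan'), (11, 'Feb');
INSERT INTO fact_sales  VALUES (1, 10, 900.0), (2, 10, 400.0), (1, 11, 950.0);
""")

# OLAP-style rollup: revenue per product per month.
rows = cur.execute("""
    SELECT p.name, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p USING (product_id)
    JOIN dim_date d    USING (date_id)
    GROUP BY p.name, d.month
    ORDER BY p.name, d.month
""").fetchall()
```

A snowflake schema differs only in that the dimension tables are themselves normalized into further tables.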
Data Ingestion Technologies
  • Different Types of Data Sources and Their Destinations
  • Architecture of the Data Ingestion Mechanism
  • Types of Ingestion: (i) Batch and (ii) Streaming
  • Data Ingestion Pipelines
  • Data Ingestion Technologies such as Apache Kafka, Apache Flume, Amazon Kinesis, and More
  • Dynamic Pipeline Generation and ETL: Developing, Scheduling, and Monitoring Batch-Oriented Workflows and DAGs with Apache Airflow
  • Working with No-Code ETL/ELT Tools
  • Introduction to Data Lakehouse Concepts and Their Implementation
  • Modular Case Study: 5
  • Formative Assessment: 5
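The batch-versus-streaming distinction above can be contrasted in plain Python, without Kafka or Kinesis: batch ingestion collects all records and processes them together, while streaming handles each record as it arrives. The event source and threshold below are hypothetical.

```python
# Toy contrast between the two ingestion modes: same filter logic,
# applied wholesale (batch) vs. record-by-record (streaming).
from typing import Iterable, Iterator, List

def source() -> Iterator[dict]:
    """A stand-in event source (hypothetical sensor readings)."""
    for i, temp in enumerate([21.5, 22.0, 23.7]):
        yield {"id": i, "temp": temp}

def ingest_batch(events: Iterable[dict]) -> List[dict]:
    batch = list(events)              # wait for the whole batch to arrive
    return [e for e in batch if e["temp"] > 22.0]

def ingest_stream(events: Iterable[dict]) -> Iterator[dict]:
    for e in events:                  # process one record at a time
        if e["temp"] > 22.0:
            yield e

assert ingest_batch(source()) == list(ingest_stream(source()))
```

Real ingestion tools add what this sketch omits: durable buffering, partitioning, delivery guarantees, and backpressure.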
DataOps
  • Understanding DataOps vs. DevOps
  • Review of Operating System Concepts
  • Basic Principles of Containerization
  • Understanding the Concepts of DataOps
  • Applying CI/CD Concepts to Data Pipelines
  • Introduction to Docker: Dockerfiles, Images, and Containers
  • Docker Networking and Docker Compose
  • Orchestration: Introduction to Kubernetes
  • Running Data Orchestration Tools on Kubernetes
  • Modular Case Study: 6
  • Formative Assessment: 6
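A Dockerfile of the kind covered in this module might look like the following minimal sketch for containerizing a pipeline script; the file names `pipeline.py` and `requirements.txt` are hypothetical.

```dockerfile
# Assumes pipeline.py and requirements.txt sit in the build context.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY pipeline.py .
CMD ["python", "pipeline.py"]
```

Built with `docker build -t my-pipeline .` and run with `docker run my-pipeline`, this produces a self-contained image of the pipeline and its dependencies.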
Big Data Processing
  • Essentials of Apache Spark: Spark Architecture
  • Spark Resilient Distributed Datasets (RDDs) and Their Operations
  • DataFrame Basics, Transformation and Execution, and Joins; Implementation Using PySpark/Scala Spark
  • Ingesting Data into Spark: Spark SQL, Data and Stream Processing; Implementation Using PySpark/Scala Spark
  • Working with Databricks
  • Working with Ray: Ray with Databricks, Spark, and More
  • No-Code/Low-Code Big Data Processing Platforms
  • Introduction to Google Cloud BigQuery; Working with Amazon EMR
  • Modular Case Study: 7
  • Formative Assessment: 7
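Spark itself needs a cluster runtime, but the semantics of the RDD operations named above (flatMap, map, reduceByKey, reduce) can be previewed with Python builtins. This is only an analogy sketch; the distribution, lazy evaluation, and fault tolerance that define Spark are absent.

```python
# Pure-Python preview of RDD-style transformations via the classic
# word-count example, with each step labelled by its Spark counterpart.
from functools import reduce

lines = ["spark makes big data simple", "big data needs spark"]

words = [w for line in lines for w in line.split()]   # flatMap
pairs = [(w, 1) for w in words]                       # map
counts: dict = {}
for w, n in pairs:                                    # reduceByKey
    counts[w] = counts.get(w, 0) + n

total = reduce(lambda a, b: a + b, counts.values())   # reduce
assert total == len(words)
```

In PySpark the same pipeline would chain these calls on an RDD rather than materializing intermediate lists.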
Data Analytics and Visualization
  • Introduction to Reports for Data Analysis
  • Descriptive Analysis and Its Reports
  • Key Performance Indicator (KPI) Dashboards and Periodic Reports
  • Diagnostic Analysis and Detailed Drill-Down Reports
  • Predictive Analysis and Reports Based on Predictive Models
  • Prescriptive Analysis and Reports Based on AI/ML Models
  • Implementation of Reports and Dashboards Using Apache Superset, Microsoft Power BI, and Salesforce Tableau
  • Modular Case Study: 8
  • Formative Assessment: 8
Cloud Platforms and Services
  • Introduction to Different Cloud Platforms: AWS, Azure, and Google Cloud Architecture
  • Working with Amazon Redshift and BigQuery
  • Managed Data Services: AWS Glue, Google Dataflow, and Azure Data Factory
  • Modular Case Study: 9
  • Formative Assessment: 9
Basic ML Techniques
  • Predictive Modeling: Regression and Classification Algorithms
  • Supervised and Unsupervised Algorithms; Performance Measures and Metrics
  • Introduction to Deep Learning, NLP, and RL
  • Introduction to Convolutional Neural Networks (CNNs), RNNs, and LSTMs
  • Introduction to Natural Language Processing (NLP) and Toolkit Applications in NLP
  • Introduction to LLMs: Understanding Large Language Model Performance; Tools to Implement LLMs and GenAI
  • Implementation of ML/DL Algorithms Using Python-Centric Libraries and Spark ML
  • Modular Case Study: 10
  • Formative Assessment: 10
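The regression side of predictive modeling can be illustrated from scratch with ordinary least squares for a single feature, using only the standard library; the course itself uses Python ML libraries and Spark ML for this. The data below is a toy set lying exactly on y = 2x + 1.

```python
# Simple linear regression via the closed-form least-squares solution.
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
            / sum((x - x_bar) ** 2 for x in xs)
    return slope, y_bar - slope * x_bar

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]   # toy data on y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

Library implementations add what this omits: multiple features, regularization, and the performance metrics (e.g. R², RMSE) this module covers.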
Introduction to ML Automation Process and Pipelines
  • Introduction to MLOps
  • Working with MLOps Platforms: AWS SageMaker and MLflow
  • Pipelines for Model Building; Distributed Model Training; LLM Training Pipelines
  • Pipelines for Real-Time ML Inference
  • Pipelines for RAG; RLHF Training Pipelines for LLMs Using Python-Centric Libraries
  • Modular Case Study: 11
  • Formative Assessment: 11
