The Data Signal
For years, data teams have been stuck in reactive mode—cleaning up messy reports, fixing broken dashboards, and chasing bugs long after they’ve caused damage. But as AI and real-time decisioning take center stage, that model just doesn't cut it anymore.

In this episode of the Shift Left, Think Forward series, we explore how data teams are transforming from behind-the-scenes janitors into strategic architects of modern data systems.

🚨 We cover:

Why the old “clean it later” mindset is broken

How data teams are embedding directly into product, marketing, and ops

The rise of “Data as a Product” thinking (inspired by Data Mesh)

What skills and roles are now essential—from analytics engineers to data product managers

Why clean, reliable, early-stage data is critical for trustworthy AI

🔧 We also touch on tools like dbt, Monte Carlo, DataHub, and more—and preview how they support this shift (full breakdown in Part 3).

If your data team is still stuck in ticket mode, or your AI isn’t delivering what it should, this episode is for you.

👉 Subscribe for more episodes in this 5-part series on modern data strategy and the shift-left revolution.

#DataStrategy #AnalyticsEngineering #DataAsAProduct #ShiftLeft #ModernDataStack #AIandData
From Janitor to Architect: How Data Teams Are Being Rebuilt for the AI Era | Shift Left Series
Every data team has had that moment—your dashboard breaks, panic sets in, and everyone scrambles to figure out what went wrong… after the fact.

In this video, we explore why that’s no longer good enough—especially in an AI-powered world. “Shifting left” is a mindset that’s transforming how modern teams approach data quality, reliability, and trust. Instead of cleaning up messes at the end of the pipeline, forward-thinking teams are catching issues at the source—before they impact dashboards, models, or customer decisions.

🧠 We cover:

The origin of “shift-left” from the world of DevOps

The Rule of Ten: why early fixes save time, money, and trust

How AI raises the stakes for bad data

The 3-layer Trust Stack: a simple mental model for reliable AI

Why data quality is no longer optional—it’s strategic
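The Rule of Ten mentioned above can be shown with a few lines of arithmetic. This is a rough illustration, not a formula from the video: the stage names and the base cost are made-up assumptions, and the 10x multiplier is the rule's usual back-of-the-envelope estimate.

```python
# Illustrative only: the Rule of Ten says a defect costs roughly 10x more
# to fix at each later stage of the pipeline. Base cost and stage names
# are hypothetical.
BASE_COST = 100  # assumed cost (in dollars) of fixing a defect at the source

stages = ["source", "pipeline", "warehouse", "dashboard", "AI model"]
for n, stage in enumerate(stages):
    print(f"Fix at {stage:>10}: ~${BASE_COST * 10**n:,}")
```

A $100 fix at the source becomes a ~$1,000,000 problem by the time it has fed an AI model's decisions, which is the whole argument for shifting left.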

Whether you’re a data engineer, analyst, PM, or business leader, this series will help you rethink how you build and trust your data.

📌 This is Part 1 of a 5-part series on modern data thinking.
👉 Subscribe to follow the full journey.

Chapters:
0:00 - Introduction
0:40 - From Panic Mode to Prevention 
3:50 - Welcome to Shift Left
4:42 - Origin Story: From DevOps to Data 
7:54 - The Rule of Ten 
10:30 - Why AI Matters in Shift Left 
12:40 - The Trust Stack 
14:30 - Conclusion

#DataEngineering #AI #DataQuality #ShiftLeft #Analytics #DevOps #ModernDataStack
Why ‘Shifting Left’ Is the Wake-Up Call Data Teams Needed
Databricks. Snowflake. dbt.

Everyone’s talking about them. Every modern data team is using at least one of them. But what do these tools actually do? And more importantly—who are they designed for?

In this video, I break down the origin, evolution, and core philosophies behind each platform. We’ll look at how Databricks, Snowflake, and dbt started out solving very different problems—and why today, they often feel like they’re doing the same things.

I’ll walk you through:

What makes each tool unique (and where they overlap)

Which roles they serve best (engineers, analysts, data scientists)

How they work together in a modern data stack

And how to figure out which one to start with, based on what you actually need

This isn’t just a feature comparison—it’s a real-world guide to understanding how these tools fit into real teams, real workflows, and real careers.

🔗 Get started:

Databricks: https://www.databricks.com/resources/learn/training/databricks-fundamentals

Snowflake: https://signup.snowflake.com/

dbt: https://learn.getdbt.com/catalog

🎯 Whether you're new to data or trying to make sense of your team's stack—this video will give you the clarity you need.

Chapters:

0:00 - Introduction
1:00 - The Origin Story 
3:37 - How each tool has evolved for data teams 
6:00 - How they work together in a modern data stack
8:40 - Philosophies behind each tool 
14:32 - Deciding which tool to use
Databricks vs Snowflake vs dbt: Built for Which Data Teams? Finally Understand the Difference
Big Data isn’t just a tech term from the early 2010s—it’s the invisible force behind nearly every decision modern technology makes. From the apps on your phone to the routes your GPS suggests, Big Data is quietly working in the background, shaping your experience in real-time.

In this video, we break down what Big Data really means in a way that’s finally easy to understand. You’ll learn:
🔹 Why Big Data never disappeared—it just powered up AI
🔹 How everyday actions like shopping, streaming, and walking through a smart city generate data
🔹 What the “3 Vs” of Big Data are (and why they matter)
🔹 How companies collect, store, and analyze data at massive scale
🔹 The risks—like privacy, bias, and overload—you need to be aware of

We’re cutting through the hype to show you exactly how Big Data affects you, even if you've never worked in tech.
You’re Surrounded by Big Data—Here’s What That Actually Means
🚀 Canva Just Got Analytical! In this video, I walk you through Canva’s newest feature — Canva Sheets — and how it’s changing the game for data-driven storytelling and visual reporting.

From exploring the latest chart options to building a fully visual social media dashboard, I’ll show you how to use real data (CSV or manually entered) to create beautiful, presentation-ready dashboards — all inside Canva.

🔍 In this video, you’ll learn:

How to use Canva Sheets to manage and connect your data

An overview of Canva’s new chart and graph capabilities

Step-by-step dashboard creation using real social media metrics

The pros and limitations of Canva as a light analytics tool

Whether you’re a data analyst, marketer, designer, content creator, or just Canva-curious, this video will help you unlock a new side of the tool — one that blends data and design effortlessly.
Canva Is Coming for BI Tools – I Built This Dashboard to Test It
In this video, I walk you through a full-scale data pipeline for processing and analyzing news articles using the modern medallion architecture (Bronze → Silver → Gold). The pipeline is built on Databricks and utilizes PySpark, Delta Lake, and Hive Metastore, with integrated sentiment analysis using TextBlob and robust data quality validation mechanisms.

🔧 Technologies Used:

Apache Spark (PySpark)

Delta Lake (ACID Transactions)

Azure Data Lake Gen2 (Storage)

Hive Metastore / Unity Catalog (Metadata Management)

TextBlob (NLP Sentiment Analysis)

Databricks (ETL Orchestration)

📌 What You'll Learn:

How to ingest data from APIs and store it in Delta format

Dynamic data quality checks and quarantining bad records

Enriching data with NLP sentiment scores

Building star-schema data models with fact/dim tables

Writing clean data to Hive and exposing it for BI

📁 GitHub Repo:
👉 https://github.com/david-ikenna-ezekiel/news-data-pipeline


📌 Referenced Videos

- How to Provision the Medallion Architecture on Azure ADLS using Terraform: https://youtu.be/3ClIjdFHp1k

- Medallion Architecture Explained: From Raw Data to Business Insights: https://youtu.be/UJ5g9BUaoeE

- How to Create Azure Key Vault and Connect with Databricks: https://youtu.be/QBr2WUgxnEc

- How to Design a Data Model Using Python and SQLite: https://youtu.be/JT-9cALngYE

- How to Connect to Databricks from PowerBI: https://youtu.be/7iVYVwhvwtc


-----
🔥 Don't forget to Like, Comment, and Subscribe for more data engineering content!

#DataEngineering #Databricks #DeltaLake #ApacheSpark #PySpark #NLP #SentimentAnalysis #BigData #ETL #Hive #Lakehouse #MedallionArchitecture
📰 End-to-End News Data Pipeline | Databricks, PySpark, Delta Lake, Hive, and Sentiment Analysis
In 3 mins, I'll show you how to build your own Python Data Quality Checker using Pandas. You'll learn how to quickly assess any dataset for common data quality issues like missing values, duplicates, invalid data, and more!

This is a must-watch for anyone looking to level-up their data cleaning skills in minutes!
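A checker like the one built in the video can fit in a single function. This is a minimal sketch, not the exact code from the video: the function name, the report fields, and the sample data are illustrative assumptions.

```python
import pandas as pd

# Quick data quality report: row count, missing values, duplicates,
# and per-column missing percentages. All names here are illustrative.
def quality_report(df: pd.DataFrame) -> dict:
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_pct_by_column": (df.isna().mean() * 100).round(1).to_dict(),
    }

# Tiny sample dataset with one duplicate row and two missing emails.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "email": ["a@x.com", None, None, "d@x.com"],
})
print(quality_report(df))
```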
Build a Python Data Quality Checker in 3 Minutes! #coding #dataanalysis #dataquality #viralvideo
Join me for a quick rundown on the five core steps of Generative AI, as shared by Databricks. From foundation models to evaluation, learn how each stage transforms raw data into powerful, business-ready AI. #GenerativeAI #Databricks #AIExplained #ML #DataScience #dataanalytics #dataanalysis #data
Generative AI Demystified: The 5 Essential Stages #ai #chatgpt #foundationmodels #llm
In this practical, step-by-step tutorial, we'll build a comprehensive Data Quality Checker in Python using Pandas. We'll cover the 6 essential dimensions of data quality—Accuracy, Completeness, Uniqueness, Consistency, Timeliness, and Validity—and demonstrate how you can easily apply them to any dataset. By the end of this tutorial, you'll have a powerful and reusable Python class ready to improve your data quality immediately.

✅ Download the notebook and resources:
GitHub Repository (https://github.com/david-ikenna-ezekiel/data-quality-toolkit)

📚 Resources Referenced:

DAMA(UK) Data Management Body of Knowledge (DMBOK):
https://www.dama-uk.org

Watch this video if you want to:
✅ Understand key data quality dimensions practically.
✅ Learn to detect and fix common data issues automatically.
✅ Build a reusable Python class for data quality checks.
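A skeleton of such a reusable class might look like the sketch below. It covers only three of the six dimensions (completeness, uniqueness, validity) to stay short; the class name, method names, rules, and sample data are illustrative assumptions, not the code from the linked repo.

```python
import pandas as pd

class DataQualityChecker:
    """Toy sketch of a reusable quality checker; each method scores
    one dimension as a fraction between 0 and 1."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def completeness(self) -> float:
        # Share of non-null cells across the whole frame.
        return float(self.df.notna().mean().mean())

    def uniqueness(self, column: str) -> float:
        # Share of distinct values in a column expected to be unique.
        return self.df[column].nunique() / len(self.df)

    def validity(self, column: str, pattern: str) -> float:
        # Share of values fully matching a regex (e.g. an email shape).
        return float(self.df[column].astype(str).str.fullmatch(pattern).mean())

df = pd.DataFrame({
    "id": [1, 2, 2],
    "email": ["a@x.com", "b@x.com", "not-an-email"],
})
dq = DataQualityChecker(df)
print(dq.completeness(), dq.uniqueness("id"),
      dq.validity("email", r"[^@]+@[^@]+\.[^@]+"))
```

Accuracy, consistency, and timeliness follow the same pattern: one method per dimension, each returning a score you can threshold or log.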

🔔 If you found this video helpful, please like, subscribe, and share!
💬 Feel free to ask questions or suggest topics in the comments.
Let's Build a Data Quality Checker in Python (Step-by-Step Tutorial)
About The Data Signal

The Data Signal is a data analytics site built to teach early-career data analysts the fundamentals of the field.

Site Navigation

Home | Articles & Blog | Videos | Trainings | Consulting | Contact us

Copyright © 2025 - The Data Signal. All rights reserved.
