The Data Signal
Most people hear “Databricks” and think it’s just another data tool. It’s not.

At its core, Databricks is a system that sits on top of your cloud storage and uses powerful distributed compute to process massive amounts of data in parallel. Instead of moving data across different tools, everything works together in one place.

That’s why companies use it to run large-scale pipelines, analytics, and even machine learning—without constantly stitching systems together.

The real shift isn’t just speed. It’s turning a messy data setup into something that actually works as one system. #short
What Databricks Really Does (And Why Everyone Uses It)
The Apache Software Foundation stewards some of the biggest technologies in modern data systems, including Spark, Hadoop, Kafka, and hundreds more.

But what exactly is it?

In this short video, I break down:

* why the Apache Foundation exists
* how it protects open-source projects
* what “The Apache Way” means
* and why companies trust Apache-backed technologies
Who Actually Owns Apache Spark? #dataanalytics #coding #programming
Sounds like data engineering is over… right?

Not exactly.

In this video, I break down what’s really happening behind this headline — and why most people are misunderstanding it. The truth is, this shift isn’t about AI replacing data engineers. It’s about something much bigger: the explosion of databases, environments, and data systems that now need to be managed.

We’ll talk about:

- Why the 80% stat is misleading

- The difference between dev/test vs production systems

- How AI agents are changing the way data infrastructure is built

- What this means for data engineering jobs and careers

- The real skills that will matter going forward

If you’re a data engineer, analyst, or thinking about getting into data, this is something you need to understand.

Because the question isn’t “Is data engineering dead?”

It’s:

👉 “What does a data engineer actually do now?”
AI Builds 80% of Databases…What do Data Engineers do Now?
I built a Tetris-style game called TetriPulse using Codex.

But this video isn’t about “look what AI can do.”

It’s about what it actually takes to turn AI output into working software.

In this breakdown, I walk through:

• How I structured the project
• The architecture decisions behind the game loop
• How I wrote and refined prompts
• Why context management matters more than people realise
• The iteration cycles that shaped the final product

AI can generate code.

But building software still requires:

– Design thinking
– Systems awareness
– Trade-off decisions
– Continuous refinement

This is a transparent look at what the real workflow looks like when you combine AI with engineering discipline.

If you’re building with AI, or thinking about it — this is the process behind the scenes.

🔗 Repo

TetriPulse on GitHub:
https://github.com/david-ikenna-ezekiel/tetripulse
Inside My AI Game Build: Architecture, Prompts & Iterations
Ever opened your Downloads folder and realised it’s complete chaos?

In this video, I show how I used Codex CLI to automatically organise my messy downloads folder into structured categories — separating files to keep, files to remove, and grouping them by file type.

This wasn’t a polished production script… just a fun real-world experiment showing how AI can automate everyday digital clutter.

By the end of this video, you’ll see how developer AI tools like Codex can go beyond coding and help automate real-life workflows.

🚀 What You’ll Learn

- How Codex CLI works in practice
- Using AI to analyse and categorise files
- Automating folder organisation
- Structuring files into meaningful categories
- Practical AI automation ideas you can replicate
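The kind of organiser script shown in the video can be sketched in a few lines of Python. To be clear, this is a hypothetical reconstruction, not the code Codex actually generated: the category names and extension map below are my own assumptions.

```python
from pathlib import Path
import shutil

# Assumed category map — adjust to taste; not the exact mapping from the video.
CATEGORIES = {
    "Images": {".png", ".jpg", ".jpeg", ".gif"},
    "Documents": {".pdf", ".docx", ".txt", ".csv"},
    "Archives": {".zip", ".tar", ".gz"},
}

def categorise(path: Path) -> str:
    """Return the category folder name for a file, defaulting to 'Other'."""
    for category, extensions in CATEGORIES.items():
        if path.suffix.lower() in extensions:
            return category
    return "Other"

def organise(folder: Path) -> None:
    """Move every file in `folder` into a subfolder named after its category."""
    for item in folder.iterdir():
        if item.is_file():
            target = folder / categorise(item)
            target.mkdir(exist_ok=True)
            shutil.move(str(item), str(target / item.name))
```

The "keep vs remove" decision from the video would layer on top of this — for example, flagging installers or duplicates for deletion before moving the rest.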
I Let Codex CLI Clean My Messy Downloads Folder…Here’s What Happened
Want to become a data analyst in 2026 but don’t know where to start?

This video breaks down the only roadmap you need — step by step — starting from absolute zero. No degree required. No advanced math. Just the right order of skills that actually get people hired.

What’s inside:

- Where beginners should actually start

- Which skills matter most in 2026

- Why learning tools in the wrong order slows you down

- How to build job-ready data analyst skills from scratch

If you’re serious about breaking into data analytics, save this video and follow for the next steps.
The Only Roadmap You Need to Become a Data Analyst in 2026!! #dataanalytics #datascience #AI
ETL vs ELT is one of those data concepts that sounds simple — until you actually try to explain it.

Most explanations focus on the acronyms:
Extract, Transform, Load vs Extract, Load, Transform.

But the real difference isn’t the order of the letters.

In this video, we break down ETL vs ELT by looking at what actually matters in modern data pipelines:

- Where transformations really happen

- Why ETL dominated for decades

- What changed with cloud data warehouses

- Why ELT fits modern analytics workflows

- And when ETL still makes more sense today

Instead of definitions and buzzwords, this explanation focuses on mindset, infrastructure, and real-world use cases — so the difference finally clicks.
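To make the distinction concrete, here is a minimal sketch using Python’s standard-library sqlite3 as a stand-in for a warehouse (an assumption for illustration — the video doesn’t use this code). ETL cleans rows in application code before loading; ELT loads the raw rows first and transforms them later with SQL inside the warehouse.

```python
import sqlite3

raw_orders = [("2024-01-01", "  WIDGET ", 3), ("2024-01-02", "gadget", 5)]

def etl(conn):
    # ETL: transform in application code, then load only the cleaned rows.
    cleaned = [(d, name.strip().lower(), qty) for d, name, qty in raw_orders]
    conn.execute("CREATE TABLE orders_etl (order_date TEXT, product TEXT, qty INT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)

def elt(conn):
    # ELT: load the raw rows as-is, then transform with SQL in the warehouse.
    conn.execute("CREATE TABLE orders_raw (order_date TEXT, product TEXT, qty INT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_orders)
    conn.execute("""
        CREATE TABLE orders_clean AS
        SELECT order_date, lower(trim(product)) AS product, qty
        FROM orders_raw
    """)

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
```

Both paths end with the same clean table; what changes is *where* the cleanup runs — your pipeline code (ETL) or the warehouse’s compute (ELT), which is exactly why cheap, scalable cloud warehouses tipped the balance toward ELT.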

If you’ve ever heard ETL and ELT discussed in meetings and felt like you kind of understood it, but not fully, this video is for you.
ETL vs ELT: The Real Difference in Modern Data Pipelines
In this video, I walk through how to deploy a simple website on Google Cloud Platform using a Linux virtual machine and Apache HTTP Server.

We start by creating a Linux VM in Compute Engine, logging into it via SSH, installing Apache, and finally serving a custom HTML page to the public internet. This is a hands-on, beginner-friendly walkthrough designed to help you understand how cloud virtual machines and web servers actually work together.

What you’ll learn:

- How to create a Linux VM in Google Compute Engine
- How to SSH into your VM
- How to install and run Apache HTTP Server
- How to create and serve a custom HTML website
- How to access your site via a public IP

Perfect if you’re new to GCP, cloud infrastructure, or want to understand the fundamentals behind hosting websites on virtual machines.
Deploy Your First Website on Google Cloud in 5 Minutes (Linux VM + Apache)
Want to use Google Drive like a fully functional cloud storage system for your Databricks projects?
In this step-by-step tutorial, I’ll show you the exact trick to connect Databricks to Google Drive using OAuth + PyDrive2, download any file type, and even auto-convert Google Sheets → CSV inside your notebook.

This method turns Google Drive into a lightweight, flexible data lake — perfect for analytics, machine learning, and data engineering workflows.

What You’ll Learn

✔️ How to enable Google Drive API
✔️ How to create OAuth credentials in Google Cloud
✔️ How to authenticate Databricks using PyDrive2
✔️ How to download files directly from Drive (CSV, XLSX, PDFs, ZIPs, etc.)
✔️ How to export Google Sheets automatically to CSV
✔️ How to load Drive data into Spark
✔️ How to build a reusable Drive download function
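A reusable download function along these lines might look roughly like the sketch below. This is a hedged approximation, not the notebook from the repo: the helper names are mine, and the OAuth flow shown (PyDrive2’s command-line flow, since a Databricks driver has no local browser) is an assumption about the setup.

```python
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"

def export_mimetype(source_mime: str):
    """Google-native files can't be downloaded as-is; Sheets get exported as CSV."""
    return "text/csv" if source_mime == GOOGLE_SHEET else None

def download_from_drive(file_id: str, local_path: str) -> None:
    """Authenticate via OAuth and download one Drive file to the local disk."""
    # Lazy imports: pydrive2 is only needed when this is actually called.
    from pydrive2.auth import GoogleAuth
    from pydrive2.drive import GoogleDrive

    gauth = GoogleAuth()
    gauth.CommandLineAuth()  # assumed flow for a browserless notebook environment
    drive = GoogleDrive(gauth)

    f = drive.CreateFile({"id": file_id})
    f.FetchMetadata(fields="mimeType")
    # mimetype=None means a plain download; "text/csv" triggers a Sheets export.
    f.GetContentFile(local_path, mimetype=export_mimetype(f["mimeType"]))
```

From there the downloaded CSV can be read into Spark with `spark.read.csv(...)` as usual.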

📁 Notebook & Code

Download the full Databricks notebook here 👇
🔗 GitHub Repository: https://github.com/david-ikenna-ezekiel/thedatasignal/tree/main/google-drive-databricks


🛠️ Technologies Used

- Databricks
- PyDrive2
- Google Drive API
- Google Cloud Platform
- Spark
- Python

👍 Like, Share & Subscribe

If this tutorial helped you, consider subscribing for more Databricks, Big Data, and automation content.
This Trick Turns Google Drive Into a Data Lake for Databricks
Struggling to wrap your head around SQL Window Functions? You’re not alone; they’re one of those things that sound complex until someone explains them the right way.

In this video, I’ll break down SQL Window Functions from scratch, in a way that finally makes sense.

We’ll cover:
✅ What window functions are and why they exist
✅ How OVER() and PARTITION BY actually work
✅ The difference between aggregates and window functions
✅ Real examples using Google BigQuery
✅ And how to calculate running totals, rankings, and moving averages — step by step.

By the end, you’ll not only understand window functions, you’ll actually use them confidently in your own SQL queries.

💡 What You’ll Learn

- How to calculate running totals and rolling averages with ROWS BETWEEN
- Ranking and ranking differences: ROW_NUMBER(), RANK(), and DENSE_RANK()
- Comparing each row to group averages using PARTITION BY
- Why window functions beat subqueries for analytics
- How BigQuery handles partitions and frames behind the scenes
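The running-total pattern works well beyond BigQuery. Here’s a minimal sketch you can run locally with Python’s built-in sqlite3 (SQLite 3.25+ supports window functions); the `sales` table and its values are invented for illustration, not the dataset from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "east", 100),
    ("2024-01-02", "east", 50),
    ("2024-01-01", "west", 200),
])

# PARTITION BY restarts the sum for each region; ORDER BY plus the ROWS
# frame makes it a running total from the start of the partition.
rows = conn.execute("""
    SELECT region, day, amount,
           SUM(amount) OVER (
               PARTITION BY region ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY region, day
""").fetchall()

for row in rows:
    print(row)
```

Note how every input row survives in the output with the aggregate attached — that’s the key difference from `GROUP BY`, which would collapse each region to a single row.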

🧠 Who This Video Is For

- Data analysts & data scientists learning advanced SQL
- Developers who use BigQuery, PostgreSQL, or Snowflake
- Anyone who’s tired of copy-pasting SQL code without really understanding it

⏰ Timestamps

00:00 – Intro
01:03 – Why Window Functions Exist
02:37 – Why Window Functions Are Important
03:35 – The Magic of OVER()
04:40 – Demo Practice

🙌 Support the Channel
If this video helped you finally “get it,” please like, subscribe, and share it with a friend who’s learning SQL too!

Let’s make data make sense — one query at a time 💙
Stop Struggling with SQL Window Functions — Watch This First!
Ever wondered what data engineers actually do? 🤔 It’s not just writing SQL or building pipelines all day. In this short video, I break down the 5 real things data engineers do, in plain English, with simple, real-world examples.

Whether you’re just starting out in tech, curious about data careers, or trying to understand what your company’s data team really does—this is for you.

You’ll learn how data engineers:
1️⃣ Build the data plumbing that keeps information flowing
2️⃣ Clean up messy, chaotic data so others can trust it
3️⃣ Organize everything in data warehouses like Snowflake or BigQuery
4️⃣ Make systems fast, reliable, and scalable
5️⃣ Help analysts, data scientists, and teams actually use the data
What do data engineers do all day? #ai #dataengineering #job
5 rookie mistakes data engineers still make—fix these and earn trust fast. #DataEngineer #Airflow #dbt #Prefect #Dagster #GreatExpectations
5 Habits Every Data Engineer Must Drop -- they're career killers #dataengineering #programming #ai
About The Data Signal

The Data Signal is a data analytics site built to teach aspiring data analysts the fundamentals of the field.

Copyright © 2026 - The Data Signal. All rights reserved.
