About Me

Testimonials

Andrej B., Head of Data Circus at Slido

⭐⭐⭐⭐⭐
“Jose executed the data migration from Gainsight CRM to the data warehouse with precision and zero downtime—a remarkable achievement in a complex, dependency-filled system. He coordinated seamlessly across multiple teams and time zones, proving himself to be organized, reliable, and fully committed. You can trust him to drive a project from start to finish.”

Julio H., Google Summer of Code Mentor, DBpedia

⭐⭐⭐⭐⭐
“During Google Summer of Code 2021, Jose was proactive, responsible, and solution-oriented while leading the development of the DBpedia Spotlight dashboard. He tackled technical challenges with resilience and delivered the project with outstanding results.”

Credentials & Certificates

Data Engineering Bootcamp at Le Wagon
Google Summer of Code
Learning beyond the classroom and applying hands-on coding in real data projects.
Amazon Letter
AWS Certified Cloud Practitioner
My role and experience at Amazon Luxembourg.
Bachelor Diploma
Master Diploma
My official university education.

Tech Stack

Core Skills & Tooling

Python, SQL, Bash, Git, GitHub, Poetry, Pylint, Pandas, NumPy

Ingestion, Orchestration & Processing

Apache Airflow, Cloud Composer (GCP), MWAA (AWS), dbt, Fivetran, Airbyte, Prefect, Apache Spark, PySpark, Apache Beam, Dataflow (GCP), Dataproc (GCP), Spark Structured Streaming, Apache Kafka, Google Pub/Sub, Apache NiFi, web scraping

Data Platforms & Storage

Amazon S3, Google Cloud Storage, Parquet, BigQuery, Snowflake, Amazon Redshift, Amazon Athena, PostgreSQL, MongoDB, Cassandra, ClickHouse

ML, NLP & Knowledge Graphs

Generative AI, Large Language Models, OpenAI API, LangChain (RAG), Hugging Face Transformers, NLTK, spaCy, scikit-learn, PyTorch, TensorFlow, SPARQL, AWS SageMaker

Analytics & Visualization

Matplotlib, Seaborn, Plotly, Amazon QuickSight, Apache Superset

Cloud & DevOps

Amazon EC2, Google Compute Engine, Terraform (IaC), Docker, Docker Compose, GitHub Actions (CI/CD), IAM / RBAC

Personal Projects

All code is available on GitHub.

Hybrid Fitness Data Pipeline

Fitness

Real-time and batch pipeline for workout tracking, enrichment, and analytics using Kafka, MongoDB, Redshift, Prefect, and QuickSight.
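A minimal sketch of the hybrid idea, assuming each workout event is written to an operational store for real-time reads and also buffered into micro-batches for the warehouse. Plain Python objects stand in for the Kafka consumer, MongoDB, and the S3-to-Redshift load; `enrich_workout`, `HybridSink`, and the event fields are hypothetical, not taken from the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


def enrich_workout(event: dict) -> dict:
    """Hypothetical enrichment: calories = kcal/h/kg rate x body weight x hours."""
    hours = event["duration_min"] / 60
    kcal = event["kcal_per_hour_per_kg"] * event["weight_kg"] * hours
    return {**event, "calories_burned": round(kcal, 1)}


@dataclass
class HybridSink:
    """Serves each event immediately and buffers it for batch loads."""

    batch_size: int
    load_batch: Callable[[list[dict]], None]  # stand-in for an S3 -> Redshift load
    hot_store: dict = field(default_factory=dict)  # stand-in for MongoDB
    buffer: list = field(default_factory=list, init=False)

    def write(self, event: dict) -> None:
        self.hot_store[event["workout_id"]] = event  # upsert for real-time reads
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.load_batch(list(self.buffer))  # hand over a copy, then reset
            self.buffer.clear()


def run(stream: Iterable[dict], sink: HybridSink) -> None:
    for raw in stream:  # stand-in for a Kafka consumer poll loop
        sink.write(enrich_workout(raw))
    sink.flush()  # load the final partial batch
```

Flushing on a size threshold keeps warehouse loads efficient; a production version would typically also flush on a timer and commit consumer offsets only after a load succeeds.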

Dashboard Structure:

1. Filters Panel

Interactive filters for Body Part, Equipment, Exercise Type, and Exercise Level allow analysts to slice the dataset for deeper exploration.

2. Key Performance Indicators (KPIs)

KPI tiles show Total Exercises (866), Average Calories Burned per Hour per kg (6.04 kcal/h/kg), and Average Exercise Rating (7.87) for a quick overview of scale, intensity, and satisfaction.

3. Calories Burned by Category

Bar charts display average kcal/hour/kg by Exercise Type, Body Part, and Equipment to compare efficiency across categories.

4. Exercise Ratings by Category

Ratings are broken down by Difficulty Level (Beginner, Intermediate, Expert) and Body Part to reveal user preferences across workout types.
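The filters, KPI tiles, and category charts reduce to simple filtering and aggregation. A sketch assuming one record per exercise with hypothetical field names (`body_part`, `equipment`, `level`, `kcal_per_hour_per_kg`, `rating`); it illustrates the logic, not the QuickSight configuration.

```python
from collections import defaultdict
from statistics import mean


def apply_filters(rows, **criteria):
    """Keep rows matching every selected filter, e.g. body_part="Legs"."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def kpis(rows):
    """Values for the three KPI tiles; missing ratings are skipped."""
    kcal = [r["kcal_per_hour_per_kg"] for r in rows]
    ratings = [r["rating"] for r in rows if r.get("rating") is not None]
    return {
        "total_exercises": len(rows),
        "avg_kcal_per_hour_per_kg": round(mean(kcal), 2) if kcal else None,
        "avg_rating": round(mean(ratings), 2) if ratings else None,
    }


def avg_by(rows, key, metric):
    """Average `metric` per category of `key`, as in the bar charts."""
    groups = defaultdict(list)
    for r in rows:
        if r.get(metric) is not None:
            groups[r[key]].append(r[metric])
    return {k: round(mean(v), 2) for k, v in sorted(groups.items())}
```

Passing the output of `apply_filters` into `kpis` or `avg_by` mirrors how a filter selection re-slices every tile and chart at once.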

Python, Poetry, Apache Kafka, Prefect, MongoDB, Amazon Redshift, Amazon S3, Amazon EC2, Docker Compose, GitHub Actions, Amazon QuickSight

Hybrid Nutrition Data Pipeline

Nutrition

Real-time and batch pipeline for food data enrichment and analysis using Kafka, OpenAI API, Cassandra, ClickHouse, Dagster, and Superset.

Dashboard Structure:

1. Tab Navigation (Quantitative / Textual)

Tabs switch between Quantitative Data (charts and numeric comparisons) and OpenAI Text Insights (food descriptions, preparation tips, and pairings).

2. Macronutrient & Sodium Analysis

Interactive bar charts compare calories, protein, carbs, and fat per 100 g, plus sodium content, across food items for a side-by-side nutritional profile.

3. AI Descriptions & Preparation Tips

OpenAI-generated descriptions provide ingredient breakdowns, nutritional highlights, and cooking advice for actionable guidance.

4. Best Pairings Word Cloud

A word cloud of common ingredient pairings (e.g. Salad, Pasta, Polenta, Couscous, Eggs) suggests complementary foods.
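Two calculations behind these views can be sketched as follows, assuming each food item stores its nutrients for a known serving size and the OpenAI step returns a list of suggested pairings per food. Field names and both helpers are hypothetical.

```python
from collections import Counter

NUTRIENTS = ("calories", "protein_g", "carbs_g", "fat_g")


def per_100g(item):
    """Scale per-serving nutrient values to a 100 g basis."""
    factor = 100 / item["serving_g"]
    return {n: round(item[n] * factor, 1) for n in NUTRIENTS}


def pairing_frequencies(pairings_by_food):
    """Count suggested pairings across foods; counts drive word size in the cloud."""
    counts = Counter()
    for pairings in pairings_by_food.values():
        # Normalize whitespace and casing so "salad " and "Salad" count together
        counts.update(p.strip().title() for p in pairings)
    return counts
```

Normalizing to 100 g makes items with different serving sizes comparable. Sodium is left out of the scaling here, an assumption based on the chart description listing it separately.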

Python, Poetry, Apache Kafka, Dagster, Cassandra, ClickHouse, Docker Compose, Google Cloud Compute Engine, GitHub Actions, Apache Superset, OpenAI API

Client Projects

Hybrid Batch and Streaming Pipeline for IoT, Legacy, and PostgreSQL Data Integration

HealthTech

The client struggled to manage real-time IoT data alongside legacy systems and PostgreSQL, which hindered integration and analytics. Automated pipelines built with Apache NiFi, Kafka, Spark Streaming, Airflow, and dbt now centralize this data in Snowflake. The result is better data accessibility, less manual effort, and a foundation for advanced analytics, AI models, visualization, and better-informed decisions.
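One step a hybrid batch and streaming design typically needs is reconciling batch snapshots with streamed updates. A minimal plain-Python sketch of a latest-record-wins merge, assuming each record carries a key and an event timestamp (`device_id`, `event_time`, and `bpm` are hypothetical column names); in a pipeline like this one, the logic would more likely live in Spark or a dbt incremental model.

```python
from datetime import datetime
from typing import Iterable


def merge_latest(
    batch_rows: Iterable[dict],
    stream_events: Iterable[dict],
    key: str = "device_id",
    ts: str = "event_time",
) -> dict:
    """Combine a batch snapshot with streamed updates, keeping the newest record per key.

    Records are compared on event time, not arrival order, so a late event
    cannot overwrite a newer one. On a tie the first record seen wins.
    """
    latest: dict = {}
    for source in (batch_rows, stream_events):
        for row in source:
            current = latest.get(row[key])
            if current is None or row[ts] > current[ts]:
                latest[row[key]] = row
    return latest


if __name__ == "__main__":
    snapshot = [{"device_id": "hr-01", "event_time": datetime(2024, 1, 1, 8, 0), "bpm": 70}]
    updates = [
        {"device_id": "hr-01", "event_time": datetime(2024, 1, 1, 8, 5), "bpm": 75},
        {"device_id": "hr-01", "event_time": datetime(2024, 1, 1, 8, 2), "bpm": 72},  # late
    ]
    print(merge_latest(snapshot, updates)["hr-01"]["bpm"])  # 75
```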

Python, SQL, Apache NiFi, Apache Kafka, Spark Streaming, Apache Airflow, dbt, Amazon S3, Snowflake, PostgreSQL, Docker Compose, Terraform

Batch and Streaming Pipelines for LMS, SIS, SaaS, and Log Data Integration

EdTech

The client's data was fragmented across its LMS (learning management system), SIS (student information system), SaaS platforms, and real-time logs, which complicated integration and analysis. Automated pipelines built with Fivetran, Spark (Dataproc), Beam (Dataflow), and Cloud Composer (Airflow) centralize this data in BigQuery. This streamlined ingestion and access, reduced manual effort, and enabled advanced analytics and better-informed decisions.
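Managed connectors such as Fivetran handle incremental syncs themselves, but custom batch jobs (for example, over exported log files) often use a high-water-mark pattern. A sketch, assuming every row has an `updated_at` timestamp; the function name and fields are hypothetical.

```python
from datetime import datetime


def incremental_extract(rows, watermark: datetime, ts_field: str = "updated_at"):
    """Return rows changed after `watermark` plus the watermark for the next run.

    Uses a strict ">" comparison, so a row that shares the old watermark's exact
    timestamp but arrives later would be missed; real sources often add a
    tie-breaker column or re-read a small overlap window.
    """
    new_rows = [r for r in rows if r[ts_field] > watermark]
    # Keep the old watermark when nothing changed, so the next run starts from the same point
    new_watermark = max((r[ts_field] for r in new_rows), default=watermark)
    return new_rows, new_watermark
```

Storing the returned watermark between runs (for example, as an Airflow Variable) lets each scheduled run pick up where the previous one stopped.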

Python, Apache Spark, Apache Beam, Apache Airflow, Google Cloud Storage, BigQuery, Docker Compose

Work With Me

🎯 My mission

📊 What I offer

Not the right fit if…

Perfect fit if you need…

👉 If your project is about reliable, scalable data systems, I can help.

Let’s connect

📧 jm.diaz.urraco@gmail.com

🔗 My LinkedIn