⭐⭐⭐⭐⭐
“Jose executed the data migration from Gainsight CRM to the data warehouse with precision and zero downtime—a remarkable achievement in a complex, dependency-filled system. He coordinated seamlessly across multiple teams and time zones, proving himself to be organized, reliable, and fully committed. You can trust him to drive a project from start to finish.”
⭐⭐⭐⭐⭐
“During Google Summer of Code 2021, Jose was proactive, responsible, and solution-oriented while leading the development of the DBpedia Spotlight dashboard. He tackled technical challenges with resilience and delivered the project with outstanding results.”
Data Engineering Bootcamp at Le Wagon
Google Summer of Code
Amazon Letter
AWS Certified Cloud Practitioner
Bachelor Diploma
Master Diploma
Core Skills & Tooling
Ingestion, Orchestration & Processing
Data Platforms & Storage
ML, NLP & Knowledge Graphs
Analytics & Visualization
Cloud & DevOps
All code available on GitHub.
Real-time and batch pipeline for workout tracking, enrichment, and analytics using Kafka, MongoDB, Redshift, Prefect, and QuickSight.
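The following is a minimal Python sketch of how one workout event might be enriched before it reaches the warehouse. The catalog, the field names (`kcal_per_hour_per_kg`, `duration_min`, `weight_kg`), and the per-exercise rates are hypothetical and only for illustration. The calorie estimate assumes calories = rate (kcal/hour/kg) × body weight × hours, which matches the kcal/hour/kg unit the dashboard reports. In the real pipeline this logic would sit inside a Kafka consumer or a Prefect task, and that wiring is left out here.

```python
from dataclasses import dataclass

# Hypothetical exercise reference data with illustrative rates. In the real
# pipeline a lookup like this might be stored in MongoDB; the schema is assumed.
EXERCISE_CATALOG = {
    "jump_rope": {"body_part": "Full Body", "kcal_per_hour_per_kg": 11.8},
    "squat": {"body_part": "Legs", "kcal_per_hour_per_kg": 5.0},
}


@dataclass
class WorkoutEvent:
    user_id: str
    exercise_id: str
    duration_min: float
    weight_kg: float


def enrich(event: WorkoutEvent) -> dict:
    """Join a raw event with catalog metadata and estimate calories burned."""
    meta = EXERCISE_CATALOG.get(event.exercise_id)
    if meta is None:
        # Unknown exercises are flagged instead of dropped so they can be reviewed.
        return {**vars(event), "enriched": False}
    # Assumed formula: kcal/hour/kg x body weight (kg) x duration (hours).
    kcal = meta["kcal_per_hour_per_kg"] * event.weight_kg * (event.duration_min / 60)
    return {
        **vars(event),
        "body_part": meta["body_part"],
        "calories_kcal": round(kcal, 1),
        "enriched": True,
    }
```

Flagging unmatched events, rather than discarding them, keeps the stream lossless. A later batch job can then backfill those events once the catalog is updated.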
Dashboard Structure:
1. Filters Panel
Interactive filters for Body Part, Equipment, Exercise Type, and Exercise Level allow analysts to slice the dataset for deeper exploration.
2. Key Performance Indicators (KPIs)
KPI tiles show Total Exercises (866), Average Calories Burned per Hour per kg (6.04 kcal/hour/kg), and Average Exercise Rating (7.87) for a quick overview of scale, intensity, and satisfaction.
3. Calories Burned by Category
Bar charts display average kcal/hour/kg by Exercise Type, Body Part, and Equipment to compare efficiency across categories.
4. Exercise Ratings by Category
Ratings are broken down by Difficulty Level (Beginner, Intermediate, Expert) and Body Part to identify user preferences across workout types.
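The aggregations behind the filters, KPI tiles, and category bar charts come down to filtering, counting, and grouped means. This Python sketch runs on a few hypothetical rows. The column names (`type`, `body_part`, `level`, `kcal_h_kg`, `rating`) are assumptions, and the values do not come from the 866-exercise dataset. In production, QuickSight or a Redshift query would compute these figures.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like the exercise dataset behind the dashboard.
rows = [
    {"type": "Cardio", "body_part": "Legs", "level": "Beginner", "kcal_h_kg": 8.0, "rating": 8.5},
    {"type": "Strength", "body_part": "Chest", "level": "Expert", "kcal_h_kg": 5.0, "rating": 7.0},
    {"type": "Cardio", "body_part": "Full Body", "level": "Intermediate", "kcal_h_kg": 10.0, "rating": 9.0},
    {"type": "Stretching", "body_part": "Back", "level": "Beginner", "kcal_h_kg": 3.0, "rating": 6.5},
]


def apply_filters(data, **filters):
    """Keep rows that match every supplied filter, like the dashboard's filter panel."""
    return [r for r in data if all(r[k] == v for k, v in filters.items())]


def kpis(data):
    """Compute the three headline KPI tiles."""
    return {
        "total_exercises": len(data),
        "avg_kcal_h_kg": round(mean(r["kcal_h_kg"] for r in data), 2),
        "avg_rating": round(mean(r["rating"] for r in data), 2),
    }


def avg_by(data, key, metric):
    """Average a metric per category, which is what each bar chart plots."""
    groups = defaultdict(list)
    for r in data:
        groups[r[key]].append(r[metric])
    return {k: round(mean(v), 2) for k, v in groups.items()}
```

To mirror the dashboard's interactivity, apply the filters first and then compute the KPIs on what remains. For example, `kpis(apply_filters(rows, type="Cardio"))` recomputes the tiles for cardio exercises only.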
Real-time and batch pipeline for food data enrichment and analysis using Kafka, OpenAI API, Cassandra, ClickHouse, Dagster, and Superset.
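The LLM enrichment step builds a prompt from a food's nutrient record, asks for structured output, and validates the reply before storing it. The Python sketch below follows that pattern under a few assumptions. The `complete` callable is hypothetical and stands in for whatever function wraps the OpenAI API call, so the sketch runs offline with a stub. The JSON keys and record fields are also illustrative, not the project's actual schema.

```python
import json
from typing import Callable

# Fields shown on the text-insights tab; this response schema is an assumption.
REQUIRED_KEYS = ("description", "preparation_tips", "best_pairings")


def build_prompt(food: dict) -> str:
    """Turn a nutrient record into an instruction that asks for JSON output."""
    return (
        f"Food: {food['name']}. Per 100 g: {food['calories']} kcal, "
        f"{food['protein_g']} g protein, {food['carbs_g']} g carbs, {food['fat_g']} g fat. "
        "Reply with a JSON object with keys description, preparation_tips "
        "and best_pairings (a list of foods)."
    )


def enrich_food(food: dict, complete: Callable[[str], str]) -> dict:
    """Call the model through the hypothetical `complete` and merge validated fields."""
    raw = complete(build_prompt(food))
    try:
        insights = json.loads(raw)
    except json.JSONDecodeError:
        # Keep the record but mark it so a retry job can pick it up later.
        return {**food, "insights_ok": False}
    if not isinstance(insights, dict) or not all(k in insights for k in REQUIRED_KEYS):
        return {**food, "insights_ok": False}
    return {**food, **{k: insights[k] for k in REQUIRED_KEYS}, "insights_ok": True}
```

Injecting the model call as a parameter keeps the step unit-testable and decoupled from any particular client library. Validating the reply matters because model output is not guaranteed to be well-formed JSON.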
Dashboard Structure:
1. Tab Navigation (Quantitative / Textual)
Tabs switch between Quantitative Data (charts and numeric comparisons) and OpenAI Text Insights (food descriptions, preparation tips, and pairings).
2. Macronutrient & Sodium Analysis
Interactive bar charts compare calories, protein, carbs, and fat per 100 g, plus sodium content, for each food item, enabling comprehensive nutritional profiling.
3. AI Descriptions & Preparation Tips
OpenAI-generated descriptions provide ingredient breakdowns, nutritional highlights, and cooking advice for actionable guidance.
4. Best Pairings Word Cloud
A word cloud of common ingredient pairings (e.g. Salad, Pasta, Polenta, Couscous, Eggs) suggests complementary foods.
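Two small transformations could feed the charts and the word cloud. If the source reports nutrients per serving, each value needs rescaling to a 100 g basis. The suggested pairings also need counting, because word-cloud sizes are driven by term frequency. Both the per-serving input and the `best_pairings` field name are assumptions in this Python sketch.

```python
from collections import Counter


def per_100g(value: float, serving_g: float) -> float:
    """Rescale a nutrient amount measured on one serving to a 100 g basis."""
    if serving_g <= 0:
        raise ValueError("serving size must be positive")
    return round(value * 100 / serving_g, 2)


def pairing_frequencies(records: list[dict]) -> Counter:
    """Count how often each pairing is suggested across foods (word-cloud weights)."""
    counts = Counter()
    for rec in records:
        # Normalize case and whitespace so "pasta" and "Pasta " count as one term.
        counts.update(p.strip().title() for p in rec.get("best_pairings", []))
    return counts
```

Normalizing terms before counting stops small spelling variants in model output from splitting one ingredient into several smaller words in the cloud.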
The client was struggling to manage real-time IoT data alongside legacy systems and PostgreSQL, which hindered integration and analytics. To address this, automated pipelines were built using Apache NiFi, Kafka, Spark Streaming, Airflow, and dbt, centralizing data in Snowflake. This improved data accessibility and laid the groundwork for advanced analytics, AI models, and visualization, reducing manual effort and enhancing decision-making.
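A typical streaming step in this kind of IoT pipeline is a tumbling-window aggregation: raw sensor readings are reduced to one summary row per device per fixed interval, and those rows are then landed in the warehouse. Spark would do this with a windowed group-by. The plain-Python sketch below shows the same logic under assumed conditions. The `(device_id, epoch_seconds, value)` tuple layout and the 60-second window are hypothetical, not the client's actual payload or configuration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def tumbling_window_avg(readings, window_s=60):
    """Average sensor values per (device, window) bucket.

    `readings` is an iterable of (device_id, epoch_seconds, value) tuples,
    a hypothetical stand-in for the real IoT message format.
    """
    buckets = defaultdict(list)
    for device_id, ts, value in readings:
        # Align each timestamp to the start of its fixed, non-overlapping window.
        window_start = int(ts // window_s) * window_s
        buckets[(device_id, window_start)].append(value)
    return [
        {
            "device_id": device,
            "window_start": datetime.fromtimestamp(start, tz=timezone.utc).isoformat(),
            "avg_value": sum(vals) / len(vals),
            "count": len(vals),
        }
        for (device, start), vals in sorted(buckets.items())
    ]
```

This sketch holds every bucket in memory. A real streaming engine instead emits each window incrementally and uses watermarks to decide when a window is complete despite late data.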
The client faced data fragmentation across LMS, SIS, SaaS platforms, and real-time logs, complicating integration and analysis. Automated pipelines were developed using Fivetran, Spark (Dataproc), Beam (Dataflow), and Cloud Composer (Airflow), centralizing data in BigQuery. This streamlined data ingestion and access, enabling advanced analytics while reducing manual effort and improving decision-making.
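An orchestrator such as Cloud Composer (Airflow) mainly resolves task dependencies: each source is ingested, processed, loaded into BigQuery, and only then do downstream jobs run. This Python sketch uses the standard library's `graphlib` to show that ordering and which tasks could run in parallel. The task names are hypothetical. In Airflow itself, the same edges would be declared between operators with `>>`.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its set.
PIPELINE = {
    "sync_saas_fivetran": set(),
    "extract_lms": set(),
    "extract_sis": set(),
    "process_logs_dataflow": set(),
    "transform_dataproc": {"extract_lms", "extract_sis"},
    "load_bigquery": {"sync_saas_fivetran", "transform_dataproc", "process_logs_dataflow"},
    "refresh_reports": {"load_bigquery"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

In this sketch the four ingestion tasks form the first wave, followed by the Dataproc transform, the BigQuery load, and finally the reporting refresh.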
👉 If your project is about reliable, scalable data systems, I can help.