José Manuel — Freelance Data Engineer

Hybrid Fitness Data Pipeline

Fitness

Real-time and batch pipeline for workout tracking, enrichment, and analytics using Kafka, MongoDB, Redshift, Prefect, and QuickSight.
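The "hybrid" design combines two steps: enriching each streamed workout event as it arrives, and grouping enriched events into batches for a bulk warehouse load. The sketch below shows that pattern using only the standard library. It assumes a hypothetical event shape (`exercise`, `duration_min`, `weight_kg`) and a hypothetical in-memory `EXERCISE_CATALOG` that stands in for the metadata the project keeps in MongoDB. The Kafka consumer, the S3/Redshift load, and the Prefect orchestration are deliberately left out, so this is not the project's actual code.

```python
from typing import Iterable, Iterator, List

# Hypothetical reference data standing in for exercise metadata stored in MongoDB.
EXERCISE_CATALOG = {
    "squat": {"body_part": "Legs", "equipment": "Barbell", "kcal_per_hour_per_kg": 6.0},
    "push_up": {"body_part": "Chest", "equipment": "Body Only", "kcal_per_hour_per_kg": 3.8},
}


def enrich(event: dict, catalog: dict) -> dict:
    """Merge a raw workout event with catalogue metadata and estimate calories burned."""
    meta = catalog.get(event["exercise"], {})
    enriched = {**event, **meta}
    rate = meta.get("kcal_per_hour_per_kg")
    if rate is not None:
        # kcal = (kcal per hour per kg) * body weight in kg * hours
        hours = event["duration_min"] / 60
        enriched["kcal_burned"] = round(rate * event["weight_kg"] * hours, 2)
    return enriched


def micro_batches(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a stream of events into fixed-size batches, flushing any remainder."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    stream = [
        {"exercise": "squat", "duration_min": 30, "weight_kg": 80},
        {"exercise": "push_up", "duration_min": 10, "weight_kg": 80},
        {"exercise": "plank", "duration_min": 5, "weight_kg": 80},  # not in the catalogue
    ]
    for batch in micro_batches((enrich(e, EXERCISE_CATALOG) for e in stream), size=2):
        print(len(batch), [e.get("kcal_burned") for e in batch])
```

Keeping enrichment as a pure function over one event lets the same logic run inside a streaming consumer and inside a batch backfill.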

Dashboard Structure:

1. Filters Panel

Interactive filters for Body Part, Equipment, Exercise Type, and Exercise Level allow analysts to slice the dataset for deeper exploration.

2. Key Performance Indicators (KPIs)

KPI tiles show Total Exercises (866), Average Calories Burned per Hour per kg (6.04 kcal/hour/kg), and Average Exercise Rating (7.87), giving a quick overview of scale, intensity, and satisfaction.

3. Calories Burned by Category

Bar charts display average kcal/hour/kg by Exercise Type, Body Part, and Equipment to compare efficiency across categories.

4. Exercise Ratings by Category

Ratings are broken down by Difficulty Level (Beginner, Intermediate, Expert) and Body Part to identify user preferences across workout types.
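The filters, KPI tiles, and category bar charts above all come down to a filter step followed by counts and averages. The following is a minimal standard-library sketch of that logic. It assumes hypothetical column names (`type`, `body_part`, `equipment`, `level`, `kcal_h_kg`, `rating`) and uses three made-up rows, so it does not reproduce the dashboard's real figures. In the actual project these aggregations would run in Redshift or QuickSight rather than in Python.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical rows shaped like the exercise dataset behind the dashboard.
ROWS: List[dict] = [
    {"name": "Squat", "type": "Strength", "body_part": "Legs", "equipment": "Barbell",
     "level": "Intermediate", "kcal_h_kg": 6.0, "rating": 8.5},
    {"name": "Burpee", "type": "Cardio", "body_part": "Full Body", "equipment": "Body Only",
     "level": "Intermediate", "kcal_h_kg": 10.0, "rating": 7.0},
    {"name": "Push-up", "type": "Strength", "body_part": "Chest", "equipment": "Body Only",
     "level": "Beginner", "kcal_h_kg": 4.0, "rating": 8.0},
]


def apply_filters(rows: List[dict], **filters: Optional[str]) -> List[dict]:
    """Keep rows matching every filter; a value of None means 'all' for that field."""
    return [r for r in rows if all(v is None or r[k] == v for k, v in filters.items())]


def kpis(rows: List[dict]) -> Dict[str, float]:
    """Compute the three KPI tiles for the currently filtered rows."""
    if not rows:
        return {"total_exercises": 0, "avg_kcal_h_kg": 0.0, "avg_rating": 0.0}
    return {
        "total_exercises": len(rows),
        "avg_kcal_h_kg": round(mean(r["kcal_h_kg"] for r in rows), 2),
        "avg_rating": round(mean(r["rating"] for r in rows), 2),
    }


def average_by(rows: List[dict], key: str, metric: str) -> Dict[str, float]:
    """Average a metric per category, as a grouped bar chart would display it."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[metric])
    return {k: round(mean(v), 2) for k, v in groups.items()}


if __name__ == "__main__":
    selected = apply_filters(ROWS, equipment="Body Only", level=None)
    print(kpis(selected))
    print(average_by(ROWS, "type", "kcal_h_kg"))
```

The KPIs are computed on the filtered rows, so every tile and chart responds to the same filter selection.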

Python, Poetry, Apache Kafka, Prefect, MongoDB, Amazon Redshift, Amazon S3, Amazon EC2, Docker Compose, GitHub Actions, Amazon QuickSight


Hybrid Nutrition Data Pipeline

Nutrition

Real-time and batch pipeline for food data enrichment and analysis using Kafka, OpenAI API, Cassandra, ClickHouse, Dagster, and Superset.
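For the OpenAI enrichment step, a common safeguard is to request structured output and validate it before anything is written downstream. The sketch below builds a prompt for one food item and checks the model's reply. The prompt wording, the `description` / `preparation_tips` / `best_pairings` keys, and both helper functions are hypothetical. The OpenAI API call, the Kafka consumer, and the Cassandra write are omitted, so no client library signatures are assumed.

```python
import json
from typing import Optional

# Hypothetical prompt and response schema; the project's real prompt is not shown here.
PROMPT_TEMPLATE = (
    "Return only a JSON object with the keys 'description', 'preparation_tips' "
    "and 'best_pairings' (a list of foods) for this food item: {food}"
)
REQUIRED_KEYS = ("description", "preparation_tips", "best_pairings")


def build_prompt(food_name: str) -> str:
    """Fill the prompt template for one food item consumed from the stream."""
    return PROMPT_TEMPLATE.format(food=food_name)


def parse_enrichment(raw_reply: str) -> Optional[dict]:
    """Validate a model reply; None signals the caller to retry or dead-letter it."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(k not in data for k in REQUIRED_KEYS):
        return None
    if not isinstance(data["best_pairings"], list):
        return None
    # Keep only the expected fields so extra keys never reach storage.
    return {k: data[k] for k in REQUIRED_KEYS}
```

Returning `None` rather than raising keeps the streaming loop alive. The caller can then decide whether to retry the request or route the record aside.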

Dashboard Structure:

1. Tab Navigation (Quantitative / Textual)

Tabs switch between Quantitative Data (charts and numeric comparisons) and OpenAI Text Insights (food descriptions, preparation tips, and pairings).

2. Macronutrient & Sodium Analysis

Interactive bar charts compare calories, protein, carbohydrates, and fat per 100 g, along with sodium content, for each food item, enabling comprehensive nutritional profiling.

3. AI Descriptions & Preparation Tips

OpenAI-generated descriptions provide ingredient breakdowns, nutritional highlights, and cooking advice for actionable guidance.

4. Best Pairings Word Cloud

A word cloud of common ingredient pairings (e.g. Salad, Pasta, Polenta, Couscous, Eggs) suggests complementary foods.
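The per-100 g comparison and the pairings word cloud both rely on small normalization steps. The following is a minimal sketch under two assumptions that the source does not confirm: source records report nutrients per serving along with a serving weight in grams, and the word cloud sizes each term by how often it appears across foods. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def per_100g(amount: float, serving_g: float) -> float:
    """Scale a nutrient amount measured on one serving to a 100 g basis."""
    if serving_g <= 0:
        raise ValueError("serving_g must be positive")
    return round(amount / serving_g * 100, 2)


def pairing_frequencies(pairings_per_food: List[List[str]]) -> Counter:
    """Count normalised pairing terms across foods; counts can size word-cloud terms."""
    return Counter(
        p.strip().title()  # merge "salad", "Salad " and "SALAD" into one term
        for pairs in pairings_per_food
        for p in pairs
        if p.strip()  # skip blank entries
    )
```

Normalizing case and whitespace before counting prevents one pairing from being split into several smaller words in the cloud.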

Python, Poetry, Apache Kafka, Dagster, Cassandra, ClickHouse, Docker Compose, Google Cloud Compute Engine, GitHub Actions, Apache Superset, OpenAI API

