Available for opportunities

Dharmesh
Kashyap

Data Engineer · ETL Pipelines · Analytics Engineering · Data Automation

Data Engineer specialising in ETL pipelines, large-scale data processing, and automation systems. I turn raw, messy, multi-source data into structured assets that drive real decisions.

I work across the full data lifecycle: extraction, transformation, validation, and delivery. My work has spanned geospatial datasets at national scale, financial and regulatory pipelines, and executive dashboards that go straight into boardrooms. I've designed databases handling millions of records, built ETL systems from scratch, and managed complete project lifecycles including architecture, deployment, and stakeholder coordination.

My approach is automation-first. If a process can be systematised, it will be: web scraping pipelines that survive anti-bot rewrites, PDF parsers for documents that were never meant to be parsed, API integrations that run without intervention. I'm strong in Python, SQL, and building scalable systems where reliability is non-negotiable.
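As an illustration of that automation-first mindset, here is a minimal retry-with-backoff decorator of the sort that keeps unattended API integrations running through transient failures. It is a generic sketch, not code from any of the projects below; the names and defaults are illustrative.

```python
import time
from functools import wraps

def retry(attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry a flaky call with exponential backoff before giving up."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of retries: surface the real error
                    sleep(delay)  # wait, then try again with a longer delay
                    delay *= backoff
        return wrapper
    return decorator
```

The `sleep` parameter is injected so tests can skip real waiting; in production it defaults to `time.sleep`.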

I focus on one outcome: raw data in, decision-ready assets out. Everything in between (schema design, validation layers, data quality checks) is engineered to make that outcome repeatable.

Precision through automation. Reliability by design.

Numbers that shipped.

670,000+ village boundary records delivered as GeoJSON
GOV Org
Government scheme PDFs parsed into structured data, fully automated
GOV Org
Mutual fund instruments on automated pipelines
GOV Org
18L+ record database designed from the schema up
Financial Services Institution
8-tab executive dashboard, presented to senior stakeholders
Financial Services Institution

Selected Work

🗺
GOV Org Pan-India Spatial Pipeline
End-to-end pipeline delivering 670,000+ village boundary records as structured GeoJSON. Handled inconsistent source data, boundary mismatches, and validation across multiple government data formats.
Python · GeoJSON Validation · ETL
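A sketch of the per-feature validation a boundary pipeline like this needs: geometry type checks, linear-ring closure, and required properties. The property names (`village_name`, `state`) are hypothetical placeholders, not the pipeline's actual schema.

```python
def validate_feature(feature):
    """Return a list of problems found in one GeoJSON Feature."""
    problems = []
    if feature.get("type") != "Feature":
        problems.append("not a Feature")
    geom = feature.get("geometry") or {}
    if geom.get("type") not in {"Polygon", "MultiPolygon"}:
        problems.append("geometry is not a (Multi)Polygon")
    elif geom.get("type") == "Polygon":
        for ring in geom.get("coordinates", []):
            # a valid linear ring has at least 4 points and closes on itself
            if len(ring) < 4 or ring[0] != ring[-1]:
                problems.append("unclosed or degenerate ring")
    props = feature.get("properties") or {}
    for key in ("village_name", "state"):  # hypothetical required fields
        if not props.get(key):
            problems.append(f"missing property: {key}")
    return problems
```

An empty return list means the feature is safe to write to the output GeoJSON.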
Data Processor & ETL Pipeline Tool
Flask-based web tool for structured data ingestion. Runs automated validation checks — column schema comparison, null detection, type integrity — before transforming raw uploads into a clean, standardised CSV format and pushing directly into a PostgreSQL database. Built to replace manual data prep entirely.
Python · Flask · PostgreSQL · ETL · Data Validation · CSV Processing
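A minimal sketch of the pre-load checks described above (column schema comparison, null detection, type integrity), assuming uploaded rows arrive as lists of dicts; the actual tool's internals may differ.

```python
def run_checks(rows, expected_schema):
    """Validate rows against an expected column -> type map.

    Returns a dict of check name -> list of issues; all-empty lists
    mean the upload is safe to transform and load.
    """
    issues = {"schema": [], "nulls": [], "types": []}
    if rows:
        got, want = set(rows[0]), set(expected_schema)
        # column schema comparison
        issues["schema"] += [f"missing column: {c}" for c in sorted(want - got)]
        issues["schema"] += [f"unexpected column: {c}" for c in sorted(got - want)]
    for i, row in enumerate(rows):
        for col, typ in expected_schema.items():
            value = row.get(col)
            if value in (None, ""):            # null detection
                issues["nulls"].append(f"row {i}: null in {col}")
            else:
                try:                           # type integrity
                    typ(value)
                except (TypeError, ValueError):
                    issues["types"].append(f"row {i}: {col!r} is not {typ.__name__}")
    return issues
```

In a Flask upload handler, a non-empty issues dict would be returned to the user instead of writing anything to PostgreSQL.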
📈
Executive Analytics Dashboard — Financial Services
8-tab Tableau dashboard built on 18L+ records modelled into PostgreSQL. Covers KPI reporting, trend analysis, and deep drill-downs across expense categories for a large financial services institution. Designed for senior stakeholders — from high-level summary views down to transaction-level detail.
Tableau · PostgreSQL · SQL · KPI Design · EDA · Data Modelling
Lead Generation Automation Tool
AI-powered scraping and enrichment pipeline that eliminates manual prospecting. Entity resolution via Groq API, operator-facing Streamlit dashboard, one-click Excel export. Zero manual steps in the workflow.
Python · Groq API · Streamlit
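The real entity-resolution step goes through the Groq API; as a self-contained stand-in, here is a simple key-normalisation deduper that shows the shape of that step. The suffix list and field name are illustrative, not taken from the tool.

```python
import re

def normalize(name):
    """Collapse a company name to a comparison key: lowercase,
    strip punctuation and common legal suffixes."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\b(inc|ltd|llc|pvt|corp|co)\b", "", key)
    return " ".join(key.split())

def dedupe(leads):
    """Keep the first lead seen for each normalized company key."""
    seen = {}
    for lead in leads:
        seen.setdefault(normalize(lead["company"]), lead)
    return list(seen.values())
```

An LLM-backed resolver replaces `normalize` with a model call when string rules are not enough, but the keep-first-per-key merge stays the same.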
📊
COVID-19 Global Mortality Analysis
Processed 6.48M+ death records across a multi-year global dataset. Temporal trend analysis, regional breakdowns, and anomaly detection visualized through a structured Tableau dashboard.
Tableau · Python · Pandas · SQL · Data Cleaning
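One way to sketch the anomaly-detection step: a trailing-window z-score over a daily count series, flagging days that deviate sharply from recent history. Window and threshold values are illustrative, not the ones used in the analysis.

```python
from statistics import mean, stdev

def flag_anomalies(series, window=7, threshold=3.0):
    """Flag indices where a value deviates from the trailing-window
    mean by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged
```

The flagged indices become the candidate reporting anomalies to inspect in the dashboard.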
🏠
Airbnb Market Pricing Intelligence
Listing-level pricing pattern analysis across neighborhoods and property types. KPI overlays for occupancy signals and pricing efficiency. Built for actionable market intelligence, not just visualization.
Tableau · Python · EDA · Data Visualization · KPI Design

Technical Skills

Programming Languages
Python · SQL · Java · C++
Data Engineering
ETL Pipelines · Data Validation · Web Scraping · API Integration · Data Processing Automation
Databases
PostgreSQL · MySQL · Schema Design · Query Optimization
Analytics & Tooling
Tableau · Streamlit · Jupyter · Bright Data · KPI Reporting · EDA · Dashboard Design