Available for opportunities

Dharmesh
Kashyap

Data Engineer · ETL Pipelines · Analytics Engineering · Data Automation

Data Engineer specialising in ETL pipelines, large-scale data processing, and automation systems. I turn raw, messy, multi-source data into structured assets that drive real decisions.

I work across the full data lifecycle: extraction, transformation, validation, and delivery. My work has spanned geospatial datasets at national scale, financial and regulatory pipelines, and executive dashboards that go straight into boardrooms. I've designed databases handling millions of records, built ETL systems from scratch, and managed complete project lifecycles including architecture, deployment, and stakeholder coordination.

My approach is automation-first. If a process can be systematised, it will be: web scraping pipelines that survive anti-bot rewrites, PDF parsers for documents that were never meant to be parsed, API integrations that run without intervention. I'm strong in Python, SQL, and building scalable systems where reliability is non-negotiable.
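As an illustration of that automation-first mindset, here is a minimal retry-with-backoff decorator of the sort that keeps unattended API integrations running through transient failures. It is a generic sketch, not code from any of the projects below; the names and defaults are illustrative.

```python
import time
from functools import wraps

def retry(attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry a flaky call with exponential backoff before giving up."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of retries: surface the real error
                    sleep(delay)  # wait, then try again with a longer delay
                    delay *= backoff
        return wrapper
    return decorator
```

The `sleep` parameter is injected so tests can skip real waiting; in production it defaults to `time.sleep`.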

I focus on one outcome: raw data in, decision-ready assets out. Everything in between (schema design, validation layers, data quality checks) is engineered to make that outcome repeatable.

Precision through automation. Reliability by design.

Numbers that shipped.

670,000+ village boundary records delivered as GeoJSON
GOV Org
Government scheme PDFs parsed into structured data, fully automated
GOV Org
Mutual fund instruments on automated pipelines
GOV Org
18L+ record database designed from the schema up
Financial Services Institution
8-tab executive dashboard, presented to senior stakeholders
Financial Services Institution

Selected Work

🗺
GOV Org Pan-India Spatial Pipeline
End-to-end pipeline delivering 670,000+ village boundary records as structured GeoJSON. Handled inconsistent source data, boundary mismatches, and validation across multiple government data formats.
Python · GeoJSON Validation · ETL
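A sketch of the per-feature validation a boundary pipeline like this needs: geometry type checks, linear-ring closure, and required properties. The property names (`village_name`, `state`) are hypothetical placeholders, not the pipeline's actual schema.

```python
def validate_feature(feature):
    """Return a list of problems found in one GeoJSON Feature."""
    problems = []
    if feature.get("type") != "Feature":
        problems.append("not a Feature")
    geom = feature.get("geometry") or {}
    if geom.get("type") not in {"Polygon", "MultiPolygon"}:
        problems.append("geometry is not a (Multi)Polygon")
    elif geom.get("type") == "Polygon":
        for ring in geom.get("coordinates", []):
            # a valid linear ring has at least 4 points and closes on itself
            if len(ring) < 4 or ring[0] != ring[-1]:
                problems.append("unclosed or degenerate ring")
    props = feature.get("properties") or {}
    for key in ("village_name", "state"):  # hypothetical required fields
        if not props.get(key):
            problems.append(f"missing property: {key}")
    return problems
```

An empty return list means the feature is safe to write to the output GeoJSON.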
Data Processor & ETL Pipeline Tool
Flask-based web tool for structured data ingestion. Runs automated validation checks — column schema comparison, null detection, type integrity — before transforming raw uploads into a clean, standardised CSV format and pushing directly into a PostgreSQL database. Built to replace manual data prep entirely.
Python · Flask · PostgreSQL · ETL · Data Validation · CSV Processing
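A minimal sketch of the pre-load checks described above (column schema comparison, null detection, type integrity), assuming uploaded rows arrive as lists of dicts; the actual tool's internals may differ.

```python
def run_checks(rows, expected_schema):
    """Validate rows against an expected column -> type map.

    Returns a dict of check name -> list of issues; all-empty lists
    mean the upload is safe to transform and load.
    """
    issues = {"schema": [], "nulls": [], "types": []}
    if rows:
        got, want = set(rows[0]), set(expected_schema)
        # column schema comparison
        issues["schema"] += [f"missing column: {c}" for c in sorted(want - got)]
        issues["schema"] += [f"unexpected column: {c}" for c in sorted(got - want)]
    for i, row in enumerate(rows):
        for col, typ in expected_schema.items():
            value = row.get(col)
            if value in (None, ""):            # null detection
                issues["nulls"].append(f"row {i}: null in {col}")
            else:
                try:                           # type integrity
                    typ(value)
                except (TypeError, ValueError):
                    issues["types"].append(f"row {i}: {col!r} is not {typ.__name__}")
    return issues
```

In a Flask upload handler, a non-empty issues dict would be returned to the user instead of writing anything to PostgreSQL.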
📈
Executive Analytics Dashboard — Financial Services
8-tab Tableau dashboard built on 18L+ records modelled into PostgreSQL. Covers KPI reporting, trend analysis, and deep drill-downs across expense categories for a large financial services institution. Designed for senior stakeholders — from high-level summary views down to transaction-level detail.
Tableau · PostgreSQL · SQL · KPI Design · EDA · Data Modelling
Lead Generation Automation Tool
AI-powered scraping and enrichment pipeline that eliminates manual prospecting. Entity resolution via Groq API, operator-facing Streamlit dashboard, one-click Excel export. Zero manual steps in the workflow.
Python · Groq API · Streamlit
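The real entity-resolution step goes through the Groq API; as a self-contained stand-in, here is a simple key-normalisation deduper that shows the shape of that step. The suffix list and field name are illustrative, not taken from the tool.

```python
import re

def normalize(name):
    """Collapse a company name to a comparison key: lowercase,
    strip punctuation and common legal suffixes."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\b(inc|ltd|llc|pvt|corp|co)\b", "", key)
    return " ".join(key.split())

def dedupe(leads):
    """Keep the first lead seen for each normalized company key."""
    seen = {}
    for lead in leads:
        seen.setdefault(normalize(lead["company"]), lead)
    return list(seen.values())
```

An LLM-backed resolver replaces `normalize` with a model call when string rules are not enough, but the keep-first-per-key merge stays the same.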
📊
COVID-19 Global Mortality Analysis
Processed 6.48M+ death records across a multi-year global dataset. Temporal trend analysis, regional breakdowns, and anomaly detection visualized through a structured Tableau dashboard.
Tableau · Python · Pandas · SQL · Data Cleaning
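One way to sketch the anomaly-detection step: a trailing-window z-score over a daily count series, flagging days that deviate sharply from recent history. Window and threshold values are illustrative, not the ones used in the analysis.

```python
from statistics import mean, stdev

def flag_anomalies(series, window=7, threshold=3.0):
    """Flag indices where a value deviates from the trailing-window
    mean by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged
```

The flagged indices become the candidate reporting anomalies to inspect in the dashboard.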
🏠
Airbnb Market Pricing Intelligence
Listing-level pricing pattern analysis across neighborhoods and property types. KPI overlays for occupancy signals and pricing efficiency. Built for actionable market intelligence, not just visualization.
Tableau · Python · EDA · Data Visualization · KPI Design

Technical Skills

Programming Languages
Python · SQL · Java · C++
Data Engineering
ETL Pipelines · Data Validation · Web Scraping · API Integration · Data Processing Automation
Databases
PostgreSQL · MySQL · Schema Design · Query Optimization
Analytics & Tooling
Tableau · Streamlit · Jupyter · Bright Data · KPI Reporting · EDA · Dashboard Design