[00:00.001] booting portfolio.kernel [ ok ]
[00:00.014] loading identity matrix... [ ok ]
[00:00.027] mounting /skills, /projects, /career [ ok ]
[00:00.041] negotiating handshake with recruiter [ pending ]

whoami --verbose --format=human

<shashank.sangepu/>

role = data_analyst · based = Detroit, MI · status = open to work

Bridging corporate finance and enterprise-grade DataOps: I architect end-to-end ETL pipelines, train ML models, and ship LLM-powered apps that turn raw data into decisions executives actually act on.

./view_projects.sh → · cat career.log · ssh shashank@inbox

cat --syntax=markdown about.md

profile [ /about ]

I'm a Data Professional with a Master of Science in Data Analytics, combining a foundation in corporate finance with enterprise-grade DataOps and cloud ecosystems.

My expertise spans the entire data lifecycle — architecting robust ETL pipelines, optimising complex SQL architectures, deploying CI/CD workflows, and integrating LLM applications into production environments.

I thrive at the intersection of technical data engineering and strategic business decision-making, with hands-on experience across real estate analytics, financial planning, and AI development in the USA, India, and Australia. I've presented findings to non-technical stakeholders and translated ambiguous business questions into data problems worth solving.

4.0 M.S. GPA · data analytics

3 countries · us · in · au

40% faster queries · pg · indexing

tree -L 2 ./stack

technical stack [ /skills ]

stack
├── databases_cloud
│   ├── postgresql.engine                  # primary warehouse, advanced indexing
│   ├── mysql · sqlserver · azure_sql.engine
│   ├── microsoft_azure.cloud              # data factory, blob, synapse
│   └── aws.cloud                          # s3, ec2, lambda
├── programming
│   ├── python.lang                        # pandas, numpy, scikit-learn
│   ├── sql.lang                           # ctes, window fns, query tuning
│   ├── r.lang                             # stats, monte carlo
│   └── javascript · html · css.web
├── ai_machine_learning
│   ├── predictive_modeling.ml             # regression, classification, clustering
│   ├── time_series.ml                     # arima, prophet
│   ├── llm_integration.ai                 # gemini api, structured outputs
│   └── prompt_engineering.ai
├── bi_visualisation
│   ├── power_bi.bi                        # dax, paginated reports
│   ├── tableau.bi
│   ├── streamlit.app                      # internal data apps
│   └── advanced_excel.bi
├── data_engineering
│   ├── etl_pipelines.eng                  # python orchestration, scheduling
│   ├── data_modeling.eng                  # normalised + star schemas
│   ├── schema_normalisation.eng
│   └── data_validation.eng
└── methodologies
    ├── git · github.vcs
    ├── ci_cd_pipelines · docker.devops
    ├── agile · scrum.process
    └── ab_testing · statistics.research

30 files · 6 directories · last sync just now

git log --oneline --decorate projects/

selected work [ /projects ]

a3f1c92 main Live

▸Stock Sentiment & Analytics Dashboard

A full-stack financial analytics platform tracking 26 S&P 500 tickers. Ingests RSS news feeds daily, scores articles via the Gemini API, and surfaces insights across 9 analysis tabs: OHLCV overlay, auto-ARIMA 7-day forecast, Random Forest Watch/Hold/Avoid signal with P(up) probability, sentiment vs. price chart, sentiment-return regression, volatility, STL decomposition, rolling correlation, and a 5-year income statement. A per-ticker Gemini AI analyst briefing with streaming chat Q&A rounds out the platform — all backed by PostgreSQL, deployed on a cloud VPS with full CI/CD.

python · fastapi · postgresql · react · scikit-learn · arima · docker · gemini-ai · github-actions · nginx

./run_demo → git remote -v

7d2e91a main Capstone

▸U.S. Airline Delay Analysis (2019–2024)

Which airline should a frequent flyer trust, and does it matter when they travel? Using 10.7 million flights across 8 airlines (2019–2024), this analysis gives a data-backed answer. Post-pandemic delays have not recovered to pre-COVID baselines. Delta is the most reliable carrier at 1.77 min average delay; JetBlue the worst at 13.30 min. Summer travel carries 1.84× the odds of a 15+ min delay. Built with rigorous statistical testing: Welch's t-test, two-way ANOVA, Tukey HSD, logistic regression, and Random Forest.

python · pandas · scipy · statsmodels · scikit-learn · seaborn

git remote -v →
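
Two of the statistics named above, Welch's t-test and an odds ratio, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the capstone's code: the helper names `welch_t` and `odds_ratio` and every number below are hypothetical, and the odds ratio is taken from a plain 2×2 count table as a simpler stand-in for the logistic-regression estimate.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, this does not assume equal variances.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def odds_ratio(exposed_yes, exposed_no, base_yes, base_no):
    """Odds of the event in the exposed group divided by odds in the baseline."""
    return (exposed_yes / exposed_no) / (base_yes / base_no)


# Hypothetical delay minutes for two carriers (not the real flight data).
carrier_a = [12, 15, 9, 20, 14]
carrier_b = [3, 5, 2, 4, 6]
t_stat, dof = welch_t(carrier_a, carrier_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")

# Hypothetical counts: delayed vs on-time flights, summer vs rest of year.
print(f"odds ratio = {odds_ratio(30, 70, 20, 80):.3f}")
```

An odds ratio above 1 means the event is more likely in the exposed group; it compares odds, not probabilities, so 1.84× the odds is not the same as 1.84× the chance of a delay.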

3b8f044 main Research

▸U.S. Energy Emissions & Renewables (2013–2023)

Has the U.S. energy grid actually gotten cleaner — and which states are driving it? No clean dataset existed, so one was built from scratch via the EIA API and web-scraped archive files across 50 states and 11 years. CO₂ emissions dropped 144.78 kg/MWh on average (p=1e-12), but the gains are uneven: OLS regression and ANOVA reveal that renewable energy share is the strongest predictor of emissions intensity, with sharp regional disparities that national averages obscure.

python · pandas · scipy · statsmodels · requests · beautifulsoup · seaborn

git remote -v →

c91d7f2 main Research

▸Soybean Yield Prediction

Can physical seed traits predict how much a soybean cultivar will yield before harvest? Across 40 cultivars, Random Forest and XGBoost models (both regression and classification, tuned via GridSearchCV) say yes. Thousand Seed Weight and grain count per plant are the dominant predictors — a finding that holds up in the crop science literature and gives breeders a measurable selection signal earlier in the growth cycle.

python · pandas · scikit-learn · xgboost · seaborn

git remote -v →

e4a5c81 main Database

▸Dungeon Master's Vault

A live tabletop RPG campaign generates more state than a spreadsheet can handle — characters, inventory, quests, gear, and loot all have complex relationships. This MySQL database models that domain properly: 3NF schema across 5 tables, a many-to-many inventory bridge, and a deliberate indexing strategy (FK joins, text search, boolean flags, composite indexes for equipped gear). Gameplay queries cover character sheets, encumbrance, quest tracking, and loot extraction.

mysql · sql · schema design · indexing · lucidchart

git remote -v →
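
The many-to-many inventory bridge and the composite index for equipped gear can be sketched as follows. This is a hedged illustration, not the project's schema: it runs on in-memory SQLite through Python's `sqlite3` instead of MySQL, shows only three hypothetical tables rather than the full five, and every table, column, and row below is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characters (
    character_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    weight  REAL NOT NULL
);
-- Bridge table: one row per (character, item) pair resolves the many-to-many.
CREATE TABLE inventory (
    character_id INTEGER NOT NULL REFERENCES characters(character_id),
    item_id      INTEGER NOT NULL REFERENCES items(item_id),
    quantity     INTEGER NOT NULL DEFAULT 1,
    is_equipped  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (character_id, item_id)
);
-- Composite index for "what does this character have equipped?" lookups.
CREATE INDEX idx_inventory_equipped ON inventory (character_id, is_equipped);
""")

conn.executemany("INSERT INTO characters VALUES (?, ?)",
                 [(1, "Mira"), (2, "Tobin")])
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [(1, "Longsword", 3.0), (2, "Rope", 10.0), (3, "Shield", 6.0)])
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 1), (1, 2, 2, 0), (2, 2, 1, 0), (2, 3, 1, 1)])

# Encumbrance: total carried weight per character, via the bridge table.
encumbrance = conn.execute("""
    SELECT c.name, SUM(i.weight * inv.quantity) AS carried_weight
    FROM characters AS c
    JOIN inventory AS inv ON inv.character_id = c.character_id
    JOIN items AS i ON i.item_id = inv.item_id
    GROUP BY c.character_id
    ORDER BY c.character_id
""").fetchall()
print(encumbrance)  # [('Mira', 23.0), ('Tobin', 16.0)]
```

Keeping quantity and the equipped flag on the bridge row, rather than on the item, is what lets two characters hold the same item in different states.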

tail -f career.log | grep "[shashank]"

career log [ /experience ]

Apr 2026 – present

USA · remote

Data Analyst

AI Labs Web LLC

Building Python ETL pipelines (Pandas, NumPy) to orchestrate continuous data ingestion for AI development initiatives.
Integrating the Gemini API to parse and structure raw unstructured text, supporting feature engineering for NLP workflows.
Developing Power BI dashboards to surface pipeline health metrics and flag data anomalies before they reach model training.

Jan 2026 – present

Troy, MI

Peer Tutor — Data Analytics

Walsh College

Support graduate students in SQL, Python, Power BI, relational database design, and LLM integration with the Gemini API.
Guide capstone projects involving statistical testing, power analysis, and longitudinal datasets exceeding 10M records.

Jun 2023 – Aug 2025

Hyderabad, India

Business Intelligence Analyst

Near Estate

Architected end-to-end ETL pipelines into a centralised PostgreSQL data warehouse, improving query performance by 40% through advanced indexing and schema normalisation.
Developed ML models (regression, clustering) to forecast property valuations, contributing to a 5% increase in profit margins.
Built executive Power BI dashboards tracking development KPIs and monitoring a Rs. 20 Cr operational budget.
Engineered a Streamlit + LLM internal tool to auto-summarise property inspection reports, significantly reducing manual review time.

Jul 2022 – Jun 2023

Hyderabad, India

Assistant Finance Manager

Near Estate

Used Python time-series forecasting (ARIMA, Prophet) to model revenue trajectories, helping secure Rs. 10 Cr in additional funding.
Automated FP&A workflows by migrating manual Excel processes to dynamic data pipelines, reducing month-end reporting turnaround by 15+ hours.
Transitioned static monthly summaries to interactive self-service Tableau dashboards for department heads.

Jan 2022 – May 2022

Gold Coast, AU

Financial Planning Associate

Apex Financial Planning

Analysed financial datasets using R to identify high-yield investment opportunities and assess historical portfolio volatility.
Assisted in developing Monte Carlo simulations to stress-test client portfolios against economic downturn scenarios.
Automated daily market data extraction from third-party financial APIs to feed internal reporting environments.
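
A Monte Carlo stress test of the kind mentioned above can be sketched like this. It is a simplified illustration under loud assumptions, not the firm's method: the original work used R, while this uses Python's standard library; annual returns are drawn from a normal distribution; and the function names, return and volatility parameters, and starting balance are all hypothetical.

```python
import random


def simulate_terminal_values(start_value, mean_return, volatility,
                             years, n_paths, seed=0):
    """Simulate n_paths portfolio values after `years` of random annual returns."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    finals = []
    for _ in range(n_paths):
        value = start_value
        for _ in range(years):
            # Assumed normal annual return, floored at -100% (total loss).
            annual_return = max(-1.0, rng.gauss(mean_return, volatility))
            value *= 1.0 + annual_return
        finals.append(value)
    return sorted(finals)


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    idx = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


# Hypothetical scenarios: a baseline and a low-return, high-volatility downturn.
baseline = simulate_terminal_values(100_000, 0.06, 0.12, years=10, n_paths=2_000)
downturn = simulate_terminal_values(100_000, 0.01, 0.20, years=10, n_paths=2_000)

for label, paths in (("baseline", baseline), ("downturn", downturn)):
    print(label, [round(percentile(paths, q)) for q in (5, 50, 95)])
```

The 5th percentile is the figure a stress test usually cares about: it approximates a bad-but-plausible outcome rather than the average one.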

psql -c "SELECT * FROM education ORDER BY year DESC;"

education [ /education ]

SELECT degree, institution, gpa, country FROM education ORDER BY year DESC;

Degree	Institution	GPA	Country
M.S., Data Analytics	Walsh College	4.0 / 4.0	USA
PG Program, Data Science & Business Analytics	University of Texas	4.0 / 4.0	USA
B.Bus., Finance & International Business	Griffith University	4.28 / 7.0	Australia

(3 rows) — query executed in 0.014s

ssh shashank@inbox -p 22

get in touch [ /contact ]

→ Hello, recruiter / hiring manager.

Connection established. I'm currently open to Data Analyst, Data Scientist, and Data Engineering roles — full-time, remote-friendly, US-based. Reach out via any channel below; I usually reply within 24 hours.

LinkedIn · GitHub · Email