Prathyush KR Lebaku

Data Scientist — Machine Learning | Analytics | Optimization | Forecasting · Texas, USA

prathyushreddy55@gmail.com

Data Scientist with 4 years of experience building search, ranking, personalization, and NLP/LLM-driven systems. Delivered measurable gains through hybrid retrieval, fine-tuned Transformers, relevance evaluation, and A/B-driven product improvements. Strong in metrics design, experimentation, and turning behavioral data into ML-powered features.

Work Experience

The Home Depot

Data Scientist

Aug 2024 - Present

Translated business problems into ML solutions by defining objectives, success metrics, and data requirements with cross-functional teams.
Built a hybrid retrieval and personalized ranking system using semantic embeddings, FAISS, BM25, fine-tuned Transformer models (Hugging Face, PyTorch), and an XGBoost ranker to improve search relevance, driving a 15% lift in CTR across the e-commerce platform.
Analyzed large-scale search logs, customer behavior data, and product metadata (SQL, Python, PySpark) to identify ranking gaps, relevance issues, and opportunities for personalization.
Enhanced query understanding with fine-tuned Transformer models (BERT/Sentence-BERT) for semantic rewriting, intent classification, and synonym expansion, improving long-tail search recall and user relevance.
Developed offline relevance evaluation frameworks (NDCG, Recall@K, MRR) and ran controlled A/B tests to measure ranking improvements, ensuring statistically sound validation of search and recommendations.
Developed an LLM-powered RAG pipeline by implementing document chunking, vector indexing, and retrieval workflows (Python, Vertex AI, Hugging Face, and BigQuery), enabling automated customer support responses and reducing ticket resolution time by 10%.
Built customer segmentation and demand forecasting models by developing XGBoost/LSTM time-series predictors and RFM/K-means customer clusters to identify high-value cohorts, enabling targeted campaigns that improved engagement and reduced marketing waste.
Implemented and maintained scalable ML pipelines using PySpark, Airflow, BigQuery, and Vertex AI, enabling automated model training, large-scale feature processing, batch and real-time inference, and production monitoring.
Partnered with the data engineering team to define data schemas, feature requirements, and quality checks for POS, CRM, and web analytics pipelines on AWS (Glue, Lambda, S3), ensuring reliable, model-ready datasets and timely refresh cycles.
Developed Tableau dashboards for search analytics, customer segmentation, and inventory forecasting using Snowflake/BigQuery data pipelines, improving self-service analytics for marketing and operations teams.

HCLTech

India

Product Data Scientist

Apr 2021 - Jul 2023

Defined north-star metrics and feature-level KPIs for interview analytics, user engagement, and payout workflows, enabling consistent measurement and faster decision-making across product and engineering teams.
Designed and executed A/B tests and quasi-experiments to evaluate scoring logic, funnel optimizations, and payout adjustments—turning statistical results into actionable product decisions within the same sprint.
Built dashboards and lightweight analytical data models using SQL and Tableau/Looker to support self-serve insights on user behavior, funnel conversion, and operational performance.
Defined tracking requirements and validated event instrumentation to improve data quality, coverage, and reliability for downstream analytics.
Prototyped lightweight ML models (logistic regression, decision trees, XGBoost) to improve matching, scoring, and operational workflows, providing quick baselines for product experimentation.
Evaluated NLP/chatbot-powered features by designing scoring rubrics, human review workflows, and quality assessments to measure accuracy, intent classification performance, and robustness.

Education

University of Houston

Texas, USA

Master of Science in Engineering Data Science

Worked as a Research Assistant under Prof. Lu Gao (Aug 2023 – Aug 2024)

Vellore Institute of Technology

Vellore, Tamil Nadu, India

Bachelor of Science in CSE with Specialization in Data Science

Skills

Programming Languages

Python
SQL

Machine Learning

Scikit-Learn
PyTorch
TensorFlow
Clustering (K-Means, DBSCAN)
Time Series
Forecasting

Natural Language Processing & LLM

Hugging Face Transformers (BERT, RoBERTa, SBERT, T5)
Text Classification
NER
Semantic Search
RAG
LLM Evaluation (HITL, rubrics)
LangChain
MCP

Experimentation & Analytics

A/B Testing
Quasi-Experiments
Causal Inference (basic)
Funnel Analytics
KPI/North-Star Metric Design
Behavioral Analytics

Big Data & Distributed Computing

PySpark
Spark SQL
Databricks
Hadoop Ecosystem

Data Visualization & BI Tools

Tableau
Power BI

Cloud Platforms

AWS (S3, Redshift, EC2, SageMaker)
GCP (BigQuery, Vertex AI)
Azure (basic)

Certifications

Microsoft Power BI Associate (PL-300)

Microsoft

Microsoft Azure Data Scientist Associate (DP-100)

Microsoft

AWS Cloud Practitioner

Amazon

Publications

Google Scholar (4 research papers as first author)


Contacts

• prathyushreddy55@gmail.com