Prathyush KR Lebaku
prathyushreddy55@gmail.com
Summary
Data Scientist with 4 years of experience building search, ranking, personalization, and NLP/LLM-driven systems. Delivered measurable gains through hybrid retrieval, fine-tuned Transformers, relevance evaluation, and A/B-driven product improvements. Strong in metrics design, experimentation, and turning behavioral data into ML-powered features.
Experience
The Home Depot
TX
Data Scientist
Aug. 2024 - Present
- Translated business problems into ML solutions by defining objectives, success metrics, and data requirements with cross-functional teams.
- Built a hybrid retrieval and personalized ranking system combining BM25 with semantic embeddings indexed in FAISS, fine-tuned Transformer models (Hugging Face, PyTorch), and an XGBoost ranker, improving search relevance and driving a 15% lift in CTR across the e-commerce platform.
- Analyzed large-scale search logs, customer behavior data, and product metadata (SQL, Python, PySpark) to identify ranking gaps, relevance issues, and opportunities for personalization.
- Enhanced query understanding with fine-tuned Transformer models (BERT/Sentence-BERT) for semantic query rewriting, intent classification, and synonym expansion, improving long-tail search recall and result relevance.
- Developed offline relevance evaluation frameworks (NDCG, Recall@K, MRR) and ran controlled A/B tests to measure ranking improvements, ensuring statistically sound validation of search and recommendation changes.
- Developed an LLM-powered RAG pipeline by implementing document chunking, vector indexing, and retrieval workflows (Python, Vertex AI, Hugging Face, BigQuery), enabling automated customer support responses and reducing ticket resolution time by 10%.
- Built demand forecasting models (XGBoost/LSTM time-series predictors) and customer segmentation models (RFM features with K-means clustering) to identify high-value cohorts, enabling targeted campaigns that improved engagement and reduced marketing waste.
- Implemented and maintained scalable ML pipelines using PySpark, Airflow, BigQuery, and Vertex AI, enabling automated model training, large-scale feature processing, batch & real-time inference, and production monitoring.
- Partnered with the data engineering team to define data schemas, feature requirements, and quality checks for POS, CRM, and web analytics pipelines on AWS (Glue, Lambda, S3), ensuring reliable, model-ready datasets and timely refresh cycles.
- Developed Tableau dashboards for search analytics, customer segmentation, and inventory forecasting using Snowflake/BigQuery data pipelines, improving self-service analytics for marketing and operations teams.
HCLTech
India
Product Data Scientist
Apr. 2021 - Jul. 2023
- Defined north-star metrics and feature-level KPIs for interview analytics, user engagement, and payout workflows, enabling consistent measurement and faster decision-making across product and engineering teams.
- Designed and executed A/B tests and quasi-experiments to evaluate scoring logic, funnel optimizations, and payout adjustments, turning statistical results into actionable product decisions within the same sprint.
- Built dashboards and lightweight analytical data models using SQL and Tableau/Looker to support self-serve insights on user behavior, funnel conversion, and operational performance.
- Defined tracking requirements and validated event instrumentation to improve data quality, coverage, and reliability for downstream analytics.
- Prototyped lightweight ML models (logistic regression, decision trees, XGBoost) to improve matching, scoring, and operational workflows, providing quick baselines for product experimentation.
- Evaluated NLP/chatbot-powered features by designing scoring rubrics, human review workflows, and quality assessments to measure accuracy, intent classification performance, and robustness.
Education
University of Houston
Texas, USA
Master of Science in Engineering Data Science
- Worked as a Research Assistant under Prof. Lu Gao (Aug. 2023 - Aug. 2024)
Vellore Institute of Technology
Vellore, Tamil Nadu, India
Bachelor of Science in Computer Science and Engineering (CSE), Specialization in Data Science
Skills
Programming Languages
- Python
- SQL
Machine Learning
- Scikit-Learn
- PyTorch
- TensorFlow
- Clustering (K-Means, DBSCAN)
- Time Series
- Forecasting
Natural Language Processing & LLM
- Hugging Face Transformers (BERT, RoBERTa, SBERT, T5)
- Text Classification
- NER
- Semantic Search
- RAG
- LLM Evaluation (HITL, rubrics)
- LangChain
- MCP (Model Context Protocol)
Experimentation & Analytics
- A/B Testing
- Quasi-Experiments
- Causal Inference (basic)
- Funnel Analytics
- KPI/North-Star Metric Design
- Behavioral Analytics
Big Data & Distributed Computing
- PySpark
- Spark SQL
- Databricks
- Hadoop Ecosystem
Data Visualization & BI Tools
- Tableau
- Power BI
Cloud Platforms
- AWS (S3, Redshift, EC2, SageMaker)
- GCP (BigQuery, Vertex AI)
- Azure (basic)
Certifications
Microsoft Certified: Power BI Data Analyst Associate (PL-300)
Microsoft
Microsoft Azure Data Scientist Associate (DP-100)
Microsoft
AWS Certified Cloud Practitioner
Amazon
Deep Learning by Andrew Ng