Internship Experience
Deep dive into the systems and architectures I built during my internships
Amazon
Software Development Engineer Intern
End-to-End Search Query Classification System
Implemented an end-to-end search query classification system: a SQL step fetches and normalizes 50,000+ search queries from the database, a Python script applies LLM prompt templates to assign each query to a category, and the results are stored in Amazon S3.
Key Highlights
- ▸Fetched and normalized 50,000+ search queries from the database using SQL
- ▸Applied LLM prompt templates for intelligent query categorization
- ▸Stored classifications in Amazon S3 for scalable access
- ▸Implemented an in-memory cache of classifications using a HashMap, so that query execution uses each query's assigned category
- ▸Returned category-specific query results using the Java Spring Framework
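The normalize-then-classify step described above can be sketched in Python as follows. This is a minimal illustration, not the original script: the category names, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns the model's text) are assumptions, since the write-up does not list them.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical category set and prompt wording; the original does not list them.
CATEGORIES = ["product", "navigational", "informational"]
PROMPT_TEMPLATE = (
    "Classify the search query into exactly one of these categories: "
    "{categories}.\nQuery: {query}\nCategory:"
)


def normalize_query(raw: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def classify(query: str, llm: Callable[[str], str]) -> str:
    """Fill the template, call the model, and validate its answer."""
    prompt = PROMPT_TEMPLATE.format(categories=", ".join(CATEGORIES), query=query)
    answer = llm(prompt).strip().lower()
    # Fall back to a sentinel when the model replies with an unknown label.
    return answer if answer in CATEGORIES else "unknown"


def classify_batch(raw_queries: Iterable[str], llm: Callable[[str], str]) -> Dict[str, str]:
    """Normalize, de-duplicate, and classify queries; returns {query: category}."""
    results: Dict[str, str] = {}
    for raw in raw_queries:
        query = normalize_query(raw)
        if query and query not in results:
            results[query] = classify(query, llm)
    return results
```

The resulting query-to-category mapping is the kind of payload that could be serialized and written to S3. Validating the model's answer against a fixed label set keeps malformed replies out of the stored classifications.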
System Components
Query Fetcher
Fetches and normalizes search queries from database
LLM Classifier
Applies LLM to categorize queries
S3 Storage
Stores query classifications
Cache Layer
In-memory cache using HashMap data structure
Query Executor
Executes category-specific queries
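As a rough sketch of how the Cache Layer and Query Executor could fit together: the original used a Java HashMap, and this illustration swaps in a Python dict to stay in one language. The class name, handler functions, and the `"unknown"` default are all hypothetical.

```python
from typing import Callable, Dict


class ClassificationCache:
    """In-memory query -> category map, loaded once from stored classifications."""

    def __init__(self, mappings: Dict[str, str]):
        # In the described system these mappings would be loaded from S3.
        self._by_query = dict(mappings)

    def category_for(self, query: str, default: str = "unknown") -> str:
        """Return the cached category, or a default for unseen queries."""
        return self._by_query.get(query, default)


def execute(
    query: str,
    cache: ClassificationCache,
    handlers: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Route a query to the handler registered for its cached category."""
    handler = handlers.get(cache.category_for(query), fallback)
    return handler(query)
```

For example, with a cache holding `{"running shoes": "product"}` and a handler registered under `"product"`, `execute("running shoes", ...)` calls that handler, while an unseen query falls through to `fallback`.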
Data Flow
Query Fetcher → LLM Classifier
Sends normalized queries (50K search queries)
LLM Classifier → S3 Storage
Stores categorized queries (query + category mappings)
S3 Storage → Cache Layer
Loads classifications (category data)
Cache Layer → Query Executor
Provides cached classifications (classification results)
Architecture Flow
Query Fetcher → LLM Classifier → S3 Storage → Cache Layer → Query Executor
Innomatics Research Labs
Data Science Intern
Product Review Sentiment Analysis System
Engineered a system to analyze and classify over 8,500 product reviews, using Prefect for ETL pipeline automation and scheduling. Trained multiple sentiment analysis models, the best of which (BERT) achieved an F1-score of 0.92.
Key Highlights
- ▸Analyzed and classified 8,500+ product reviews
- ▸Used Prefect for ETL pipeline automation and scheduling
- ▸Trained models using BoW, TF-IDF, Word2Vec, and BERT
- ▸Achieved F1-Score of 0.92 with BERT model
- ▸Utilized MLflow for model management and experiment tracking
- ▸Deployed sentiment analysis web application on AWS
- ▸Enabled real-time customer feedback insights
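For reference, the F1-score reported above is the harmonic mean of precision and recall. The sketch below computes the binary form in plain Python. Whether the 0.92 figure was a binary, macro, or weighted average is not stated in the write-up, so the binary form and the `positive` label are assumptions.

```python
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is a common choice for sentiment data because it stays informative when one class dominates, where plain accuracy can look high while the minority class is mostly misclassified.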
System Components
Data Ingestion
Fetches product reviews from multiple sources
ETL Pipeline
Automated data transformation and loading
Feature Engineering
Creates features using BoW, TF-IDF, Word2Vec
Model Training
Trains multiple sentiment models
MLflow Tracker
Tracks experiments and model versions
Inference API
Real-time sentiment prediction
AWS Deployment
Cloud hosting and scaling on an Amazon ECS cluster
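To make the Feature Engineering component concrete, here is a small pure-Python sketch of bag-of-words counts and TF-IDF weighting. It is illustrative only: the whitespace tokenizer and the unsmoothed `idf = log(N / df)` are assumptions, and library implementations typically apply smoothing and normalization that this version omits.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (sorted vocabulary, per-document term-count vectors)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Weight raw counts by idf = log(N / df), with no smoothing."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term (always >= 1).
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]
```

With the two reviews `"good product"` and `"bad product"`, the term `product` appears in every document and receives weight zero, while `good` and `bad` keep a positive weight. That down-weighting of uninformative terms is what separates TF-IDF from raw counts.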
Data Flow
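Data Ingestion → ETL Pipeline
Raw review data (8,500+ reviews)
ETL Pipeline → Feature Engineering
Cleaned and preprocessed text (processed reviews)
Feature Engineering → Model Training
Feature vectors (BoW, TF-IDF, Word2Vec, BERT embeddings)
Model Training → MLflow Tracker
Model metrics and artifacts (F1-Score: 0.92)
MLflow Tracker → Inference API
Trained model (best BERT model)
Inference API → AWS Deployment
Deploy model to ECS cluster (production model)
Architecture Flow
Data Ingestion → ETL Pipeline → Feature Engineering → Model Training → MLflow Tracker → Inference API → AWS Deployment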