Innomatics Research Labs
Data Science Intern
Product Review Sentiment Analysis System
📋 Project Overview
Engineered an end-to-end system that analyzes and classifies more than 8,500 product reviews, with Prefect automating and scheduling the ETL pipeline. Trained multiple sentiment analysis models; the best, a BERT model, achieved an F1-score of 0.92.
✨ Key Highlights
Analyzed and classified 8,500+ product reviews
Implemented Prefect for ETL pipeline automation and scheduling
Trained models using BoW, TF-IDF, Word2Vec, and BERT
Achieved an F1-score of 0.92 with the BERT model
Utilized MLflow for model management and experiment tracking
Deployed sentiment analysis web application on AWS
Enabled real-time customer feedback insights
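For readers unfamiliar with the headline metric, here is a minimal sketch of how an F1-score is computed from predictions. The labels are hypothetical toy values, not project data, and the helper name `f1_score` is chosen for illustration only; it is not the project's evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]
precision, recall, f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 balances precision and recall, it is a common choice when the positive and negative review classes are not evenly represented.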
🛠️ Technology Stack
🏗️ System Architecture
System Components
Data Ingestion
Fetches product reviews from multiple sources
ETL Pipeline
Automated data transformation and loading
Feature Engineering
Creates features using BoW, TF-IDF, Word2Vec
Model Training
Trains multiple sentiment models
MLflow Tracker
Tracks experiments and model versions
Inference API
Real-time sentiment prediction
AWS Deployment
Cloud hosting and scaling using ECS cluster
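The Feature Engineering component can be illustrated with a small, dependency-free sketch of bag-of-words counts and TF-IDF weights. This is an assumption-laden toy: the tokenizer, the unsmoothed IDF formula, and the sample reviews are all illustrative, and the project itself may well have used a library vectorizer with different defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs):
    """One term-count mapping per document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    """Weight = (term count / document length) * log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(counts)
    # Iterating a Counter yields each distinct term once per document.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (n / total) * math.log(n_docs / doc_freq[t]) for t, n in c.items()}
        )
    return vectors


# Hypothetical sample reviews, not project data.
reviews = ["Great phone, great battery", "Terrible battery life", "Great value"]
bow = bag_of_words(reviews)
weights = tf_idf(reviews)
print(bow[0])
print(weights[0])
```

In the first review, "phone" receives a higher weight than "battery" because it appears in only one document, which is the intuition behind preferring TF-IDF over raw counts.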
Data Flow
Raw review data
8,500+ reviews
Cleaned and preprocessed text
Processed reviews
Feature vectors
BoW, TF-IDF, Word2Vec, BERT embeddings
Model metrics and artifacts
F1-Score: 0.92
Trained model
Best BERT model
Deploy model to ECS cluster
Production model
Architecture Flow
Data Ingestion
ETL Pipeline
Feature Engineering
Model Training
MLflow Tracker
Inference API
AWS Deployment
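The stage ordering above can be sketched as plain Python functions chained together. Everything here is hypothetical: the function names, the sample reviews, and especially the keyword rule in `predict`, which only stands in for the trained model. In the actual project Prefect orchestrates the steps; with Prefect, functions like these would typically be wrapped with its `task` and `flow` decorators, which this sketch omits to stay dependency-free.

```python
import re


def ingest():
    """Stand-in for Data Ingestion: returns raw review strings."""
    return ["  LOVED it!!  <br>", "Broke after a week :(", ""]


def clean(raw_reviews):
    """Stand-in for the ETL step: strip markup, normalize whitespace, drop empties."""
    cleaned = []
    for text in raw_reviews:
        text = re.sub(r"<[^>]+>", " ", text)
        text = re.sub(r"\s+", " ", text).strip().lower()
        if text:
            cleaned.append(text)
    return cleaned


def predict(reviews):
    """Placeholder keyword rule standing in for the trained sentiment model."""
    positive_words = {"loved", "great", "excellent"}
    labels = []
    for review in reviews:
        words = set(re.findall(r"[a-z]+", review))
        labels.append("positive" if words & positive_words else "negative")
    return labels


def run_pipeline():
    """Run the stages in order: ingest, clean, predict."""
    raw = ingest()
    processed = clean(raw)
    labels = predict(processed)
    return list(zip(processed, labels))


results = run_pipeline()
for review, label in results:
    print(f"{label}: {review}")
```

Keeping each stage as a separate function with explicit inputs and outputs is what makes it straightforward to hand scheduling and retries to an orchestrator later.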
🎯 Impact & Results
Analyzed and classified 8,500+ product reviews
Implemented Prefect for ETL pipeline automation and scheduling
Trained models using BoW, TF-IDF, Word2Vec, and BERT