Innomatics Research Labs

Data Science Intern

January 2024 - April 2024
Hyderabad, India

Product Review Sentiment Analysis System

Built a system to analyze and classify more than 8,500 product reviews, using Prefect to automate and schedule the ETL pipeline. Trained several sentiment analysis models; the best, a BERT model, reached an F1-score of 0.92.

Python, Prefect, BERT, TF-IDF, Word2Vec, MLflow, AWS

✨ Key Highlights

Analyzed and classified 8,500+ product reviews

Implemented Prefect for ETL pipeline automation and scheduling

Trained models using BoW, TF-IDF, Word2Vec, and BERT

Achieved an F1-score of 0.92 with the BERT model

Utilized MLflow for model management and experiment tracking

Deployed a sentiment analysis web application on AWS

Enabled real-time customer feedback insights
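
The F1-score cited in the highlights is the harmonic mean of precision and recall. The sketch below shows how that metric is computed for a single positive class. The labels are toy values invented for illustration, and the page does not say whether the reported 0.92 is a binary, macro, or weighted F1, so the binary form here is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision, recall, f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.80 recall=0.80 f1=0.80
```

Because F1 penalizes an imbalance between precision and recall, it is usually a more informative summary than accuracy when sentiment classes are unevenly distributed.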

🏗️ System Architecture

System Components

Data Ingestion

Fetches product reviews from multiple sources

Python, Prefect, APIs

ETL Pipeline

Automated data transformation and loading

Prefect, Python, Pandas
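
As a rough illustration of what an extract, transform, load pass over review text can look like, here is a minimal sketch using only the standard library. The cleaning rules, the record fields (`id`, `text`), and the function names are assumptions made for this example, not the project's actual steps. Prefect orchestration is left out; in a Prefect setup, functions like these would typically be wrapped with its `@task` and `@flow` decorators.

```python
import html
import re


def extract(raw_records):
    """Keep only records that carry non-empty review text."""
    return [r for r in raw_records if (r.get("text") or "").strip()]


def clean_text(text):
    """Normalize one review: unescape, strip markup and URLs, lowercase."""
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s']", " ", text)   # drop punctuation
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def transform(records):
    """Clean each review and drop exact duplicates after cleaning."""
    seen, cleaned = set(), []
    for r in records:
        text = clean_text(r["text"])
        if text and text not in seen:
            seen.add(text)
            cleaned.append({"id": r["id"], "text": text})
    return cleaned


def load(records, store):
    """Append records to a store (a plain list here) and report the count."""
    store.extend(records)
    return len(records)


def run_pipeline(raw_records, store):
    return load(transform(extract(raw_records)), store)


# Hypothetical sample input.
raw = [
    {"id": 1, "text": "Great phone!<br>Battery lasts 2 days &amp; more."},
    {"id": 2, "text": "   "},
    {"id": 3, "text": "GREAT phone! Battery lasts 2 days & more."},
    {"id": 4, "text": "Stopped working, see http://example.com/pic"},
]
store = []
loaded = run_pipeline(raw, store)
print(loaded, [r["text"] for r in store])
```

Keeping each stage a small pure function makes the stages easy to test in isolation and easy to hand to a scheduler later.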

Feature Engineering

Creates features using BoW, TF-IDF, Word2Vec

Scikit-learn, NLTK
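
To make the bag-of-words and TF-IDF features concrete, the following sketch computes both by hand on three invented sentences. It uses the textbook weighting, term frequency times `log(N / df)`, which is an assumption for illustration: scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers would differ. Word2Vec is not covered here.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; a real pipeline would likely do more."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for vec in counts:
        total = sum(vec)
        weights.append(
            [(c / total) * idf[j] if total else 0.0 for j, c in enumerate(vec)]
        )
    return vocab, weights


# Invented example documents.
docs = ["good battery good screen", "bad battery", "good value"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)

print(vocab)   # ['bad', 'battery', 'good', 'screen', 'value']
print(bow[0])  # [0, 1, 2, 1, 0]
```

In the first document, "screen" ends up weighted higher than "battery" even though both occur once, because "screen" appears in fewer documents and so has the larger IDF.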

Model Training

Trains multiple sentiment models

BERT, Python

MLflow Tracker

Tracks experiments and model versions

MLflow, Python
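
The idea behind experiment tracking is to record each training run's parameters and metrics so that runs can be compared and the best one selected. The sketch below is a dependency-free, in-memory stand-in for that idea; it is not MLflow. The `RunTracker` class, the run names, and the metric values are all hypothetical placeholders, not results from this project. With MLflow, the equivalent recording is done with calls such as `mlflow.log_param` and `mlflow.log_metric` inside a `mlflow.start_run()` block.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


class RunTracker:
    """In-memory stand-in for an experiment tracker (hypothetical, not MLflow)."""

    def __init__(self):
        self.runs = []

    def log_run(self, name, params, metrics):
        """Record one training run with its parameters and metrics."""
        run = Run(name=name, params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric):
        """Return the run with the highest value for the given metric."""
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run logged metric {metric!r}")
        return max(candidates, key=lambda r: r.metrics[metric])


tracker = RunTracker()
# Placeholder values for illustration only.
tracker.log_run("model_a", {"learning_rate": 0.1}, {"f1": 0.70})
tracker.log_run("model_b", {"learning_rate": 0.01}, {"f1": 0.75})

best = tracker.best_run("f1")
print(best.name, best.metrics["f1"])  # prints: model_b 0.75
```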

Inference API

Real-time sentiment prediction

Flask, REST API
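
A prediction endpoint mostly comes down to validating a JSON request, calling the model, and returning a JSON response. The sketch below shows only that framework-independent logic; the Flask route wiring is omitted. The field names (`text`, `sentiment`), the `handle_predict` function, and the word-list `stub_model` are assumptions for illustration. In particular, `stub_model` is a trivial placeholder standing in for the trained model, not the model itself.

```python
import json

LABELS = ("negative", "positive")


def stub_model(text):
    """Placeholder scorer standing in for the trained model (hypothetical)."""
    positive_words = {"good", "great", "love", "excellent"}
    negative_words = {"bad", "poor", "hate", "terrible"}
    tokens = text.lower().split()
    score = sum(t in positive_words for t in tokens) - sum(
        t in negative_words for t in tokens
    )
    return LABELS[1] if score >= 0 else LABELS[0]


def handle_predict(body, model=stub_model):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "field 'text' must be a non-empty string"}
    return 200, {"text": text, "sentiment": model(text)}


status, response = handle_predict('{"text": "great screen, love it"}')
print(status, response["sentiment"])  # prints: 200 positive
```

Passing the model in as a parameter keeps the handler testable without loading real weights; a web framework route would call `handle_predict` with the request body and serialize the returned dictionary.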

AWS Deployment

Cloud hosting and scaling on an AWS ECS cluster

AWS ECS

Data Flow

Data Ingestion → ETL Pipeline: raw review data (8,500+ reviews)

ETL Pipeline → Feature Engineering: cleaned and preprocessed text (processed reviews)

Feature Engineering → Model Training: feature vectors (BoW, TF-IDF, Word2Vec, BERT embeddings)

Model Training → MLflow Tracker: model metrics and artifacts (F1-score: 0.92)

Model Training → Inference API: trained model (best BERT model)

Inference API → AWS Deployment: model deployed to the ECS cluster (production model)

Architecture Flow

Data Ingestion, ETL Pipeline, Feature Engineering, Model Training, MLflow Tracker, Inference API, AWS Deployment

🎯 Impact & Results

1. Analyzed and classified 8,500+ product reviews

2. Implemented Prefect for ETL pipeline automation and scheduling

3. Trained models using BoW, TF-IDF, Word2Vec, and BERT

Kushal Adhyaru - AI/ML Engineer & Full-Stack Builder