Pattern Recognition Project Report
Team: Deepthika Sivaram & Derek Lu
Group: Group 6 — University at Buffalo
Phase 1: Data Collection
- Classes: 4 fruits x 3 variants
- Fruits: Apple, Grapes, Peach, Raspberry
- Variants:
- Apple: Whole, Sliced-Cored, In-Context
- Grapes: In a Bag, Loose Grapes, On the Vine
- Peach: Whole, Halved or Pitted, Sliced
- Raspberry: Small Group, In a Container, Slightly Crushed
- Total Images: ~12,000 (≈3,000 per fruit)
- File Format: JPEG and PNG
- Dimensions: 1200x1600 to 3024x4032
- Method: Hand-shot smartphone images under varied lighting and conditions
Phase 2: Computer Vision
Objective: To quantitatively assess the quality, balance, and separability of our collected fruit images by training two transfer-learning based CNN classifiers.
Produce Classifier
- Fine-tune a pre-trained ResNet-50 on four fruit classes: Apple, Grapes, Peach, Raspberry.
- Evaluate using overall accuracy, per-class precision/recall/F₁, and confusion matrix.
Variation Classifier
- Fine-tune a pre-trained EfficientNet-B0 per fruit on its three photographic variants (e.g., Whole, Sliced-Cored, and In-Context for Apple).
- Evaluate using the same metrics to identify under-performing variants.
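These metrics can be computed directly from a model's validation predictions. The snippet below is a minimal sketch using scikit-learn, with placeholder label arrays standing in for our actual predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

class_names = ["Apple", "Grapes", "Peach", "Raspberry"]   # produce classes from Phase 1

# Placeholder label indices; in practice these come from the validation split.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 0, 2, 3, 0, 1, 2, 3]

print("Overall accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=class_names, digits=2))
print(confusion_matrix(y_true, y_pred))   # rows = true class, columns = predicted class
```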
Methodology
- Transfer Learning: ResNet-50 for produce, EfficientNet-B0 for variation.
- Configuration: Set NUM_CLASSES, BATCH_SIZE, FRUIT, and MODE.
- Dataloaders: ImageFolder with an 80/20 random_split.
- Baseline Training: 10 epochs @ LR=1e-4.
- Fine-Tuning: 5 epochs @ LR=1e-5 with augmentations and early stopping at epoch 3.
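The sketch below illustrates this transfer-learning setup: ImageNet-pretrained weights, a replaced classification head sized by NUM_CLASSES, and the baseline learning rate. It mirrors the configuration named above, but the exact layer choices and loop structure are assumptions rather than our verbatim code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Configuration constants named in the Methodology (values here are illustrative).
NUM_CLASSES = 4          # Apple, Grapes, Peach, Raspberry
BATCH_SIZE = 32
FRUIT = "Apple"          # used only when training a variation classifier
MODE = "produce"         # "produce" -> ResNet-50, "variation" -> EfficientNet-B0

device = "cuda" if torch.cuda.is_available() else "cpu"

# Transfer learning: start from ImageNet weights and replace the classification head.
if MODE == "produce":
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
else:
    model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
    model.classifier[1] = nn.Linear(model.classifier[1].in_features, 3)  # 3 variants per fruit
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # baseline: 10 epochs @ LR=1e-4

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```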
Dataset Directory Structure
Batch subfolders were removed; the dataset is now organized flat: 4 produce folders, each containing 3 variant subfolders (≈12,000 images total).
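Under this layout, ImageFolder can serve both classifiers: pointed at the dataset root it yields the 4 produce classes (images in variant subfolders are still collected under their fruit), and pointed at a single fruit folder it yields that fruit's 3 variants. A minimal sketch, with illustrative folder names and transform parameters:

```python
# Expected flat layout (illustrative folder names):
# dataset/
#   Apple/      {Whole, Sliced-Cored, In-Context}/
#   Grapes/     {In a Bag, Loose Grapes, On the Vine}/
#   Peach/      {Whole, Halved-Pitted, Sliced}/
#   Raspberry/  {Small Group, In a Container, Slightly Crushed}/
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Produce classifier: the 4 fruit folders at the root become the classes.
# Variation classifier: point ImageFolder at one fruit folder (e.g. dataset/Apple) instead.
full_dataset = datasets.ImageFolder("dataset", transform=preprocess)

n_train = int(0.8 * len(full_dataset))
train_set, val_set = torch.utils.data.random_split(
    full_dataset, [n_train, len(full_dataset) - n_train]
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
```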
Pre-training Analysis
ResNet-50 (Produce)
- Apple: Precision=0.94, Recall=0.95, F₁=0.94
- Grapes: 0.96 / 0.93 / 0.94
- Peach: 0.99 / 0.99 / 0.99
- Raspberry: 0.98 / 1.00 / 0.99
- Strengths: Peach & Raspberry near-perfect; overall accuracy 97%.
- Weakness: Apple ↔ Grapes confusion (25 A→G, 36 G→A) and minor cross-class errors.
EfficientNet-B0 (Variation)
Apple Variants
- Whole: 0.98 / 0.98 / 0.98
- Sliced-Cored: 1.00 / 0.99 / 0.99
- In-Context: 0.97 / 0.98 / 0.98
- Errors: 1 In-Context→Sliced, 5 In-Context→Whole, 3 Sliced→(In-Context/Whole), 11 Whole→In-Context.
Grapes Variants
- In a Bag: 0.97 / 0.98 / 0.98
- Loose Grapes: 1.00 / 0.99 / 0.99
- On the Vine: 0.98 / 0.98 / 0.98
- Errors: mix-ups between Bag, Loose, and Vine (total ≈8 misclassifications).
Peach Variants
- Halved/Pitted: 1.00 / 1.00 / 1.00
- Sliced: 0.99 / 1.00 / 1.00
- Whole: 0.99 / 0.98 / 0.99
- Errors: 2 Sliced→Halved; others perfect.
Raspberry Variants
- In a Container: 1.00 / 1.00 / 1.00
- Slightly Crushed: 1.00 / 1.00 / 1.00
- Small Group: 1.00 / 1.00 / 1.00
- No errors—perfect separation.
Fine-Tuning 1
- Augmentations: RandomResizedCrop, ColorJitter, RandomErasing.
- Run: 5 epochs @ LR=1e-5, early stopping @ epoch 3.
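A sketch of this augmentation pipeline and the early-stopping logic is shown below; the transform parameters, patience value, and helper callables are assumptions, not our exact settings.

```python
import torch
from torchvision import transforms

# Fine-tuning augmentations named above; parameter values are illustrative.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),   # RandomErasing operates on tensors, so it follows ToTensor
])

# Early stopping: keep the checkpoint with the lowest validation loss and stop
# once it fails to improve (this triggered at epoch 3 of 5 in our run).
def early_stopping_fine_tune(model, train_step, validate, epochs=5, patience=1):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # fine-tuning LR
    best, bad = float("inf"), 0
    for epoch in range(epochs):
        train_step(model, optimizer)      # assumed callable running one training epoch
        val_loss = validate(model)        # assumed callable returning mean validation loss
        if val_loss < best:
            best, bad = val_loss, 0
            torch.save(model.state_dict(), "best_finetuned.pt")
        else:
            bad += 1
            if bad > patience:
                break
```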
Fine-Tuning 1 Analysis
ResNet-50 (Produce)
- Apple: 0.96 / 0.95 / 0.95
- Grapes: 0.94 / 0.96 / 0.95
- Peach & Raspberry: 1.00 / 1.00 / 1.00
- Confusion: 14 A→G, 21 G→A; no off-diagonal for Peach/Raspberry.
- Weakness: Apple/Grapes still share color/shape cues in “Whole” form.
EfficientNet-B0 (Variation)
Apple
- Whole: 0.98 / 0.98 / 0.98
- In-Context: 0.97 / 0.98 / 0.98
- Sliced-Cored: 1.00 / 0.99 / 0.99
- Errors: 3 In-Context→Whole, 4 Whole→In-Context, 1 Sliced→In-Context, 2 Sliced→Whole.
Grapes
- On the Vine: 0.98 / 0.98 / 0.98
- In a Bag: 0.97 / 0.98 / 0.98
- Loose Grapes: 1.00 / 0.99 / 0.99
- Errors: 1 Vine→Bag, 2 Vine→Loose.
Peach & Raspberry
- Peach Whole: 0.99 / 0.98 / 0.99; 2 Whole→Sliced errors.
- Raspberry: perfect 1.00/1.00/1.00 separation.
Phase 3: Semantic Recipe Retrieval (NLP)
Objective: The goal of the NLP phase was to design a BERT-based semantic search engine capable of recommending recipes based on ingredient and descriptor tags. The system needed to handle nuanced, multi-tag user queries and return ranked recipe suggestions from the RAW_recipes.csv and RAW_interactions.csv datasets.
Methodology
- Data Preprocessing
- Datasets Used:
- RAW_recipes.csv: Provided recipe information including name, ingredients, steps, and tags.
- RAW_interactions.csv: Contained user ratings which helped gauge recipe popularity or quality.
- Cleaning Steps:
- Removed duplicates and recipes with missing essential fields (like ingredients or name).
- Combined the ingredients, tags, and name fields into a unified text input for embedding.
- Lowercased, tokenized, and filtered special characters and redundant whitespace.
- Model Architecture
- Base Model: Fine-tuned sentence-transformers/all-MiniLM-L6-v2, a lightweight yet powerful model optimized for semantic search tasks.
- Training Objective: Constructed training pairs from user-tagged recipe preferences and designed the model to minimize the distance between relevant recipe vectors and query vectors in embedding space.
- Embedding Strategy: Used the transformer to embed both recipes and user queries (composed of 5–20 tags). Applied cosine similarity to compute closeness between queries and recipe embeddings.
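A condensed sketch of this pipeline, from field combination through cosine-similarity ranking, is shown below. The column handling follows the public RAW_recipes.csv schema, the list-parsing details, query format, and top-k value are illustrative assumptions, and the fine-tuning step described above is omitted for brevity.

```python
import ast
import pandas as pd
from sentence_transformers import SentenceTransformer, util

# Load and clean recipes; drop duplicates and rows missing essential fields.
recipes = pd.read_csv("RAW_recipes.csv").drop_duplicates(subset="name")
recipes = recipes.dropna(subset=["name", "ingredients"])

def to_text(row):
    # Combine name, ingredients, and tags into one lowercased text input for embedding.
    ingredients = " ".join(ast.literal_eval(row["ingredients"]))
    tags = " ".join(ast.literal_eval(row["tags"]))
    return f"{row['name']} {ingredients} {tags}".lower()

recipes["text"] = recipes.apply(to_text, axis=1)

# Embed recipes once, then embed each tag query and rank by cosine similarity.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
recipe_emb = model.encode(recipes["text"].tolist(), convert_to_tensor=True)

def recommend(query_tags, k=5):
    query_emb = model.encode(" ".join(query_tags), convert_to_tensor=True)
    scores = util.cos_sim(query_emb, recipe_emb)[0]
    top = scores.topk(k)
    return recipes.iloc[top.indices.cpu().numpy()][["name"]].assign(
        score=top.values.cpu().numpy()
    )

# Example: a multi-tag query like the ones used in the evaluation below.
print(recommend(["chicken", "rice"]))
```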
Evaluation
Test Query Set Design
To assess the model's robustness, we constructed a test set containing:
- Ingredient-based queries (e.g., "chicken, rice")
- Conceptual and subjective terms (e.g., "healthy dinner", "quick snacks", "seasonal recipes")
- Combined queries (e.g., "vegan soup quick", "low-carb dinner healthy")
These were selected to probe both surface-level retrieval and semantic understanding.
Recommendation Output Review
1. Simple Ingredient Queries
- Query: "chicken, rice"
Recommendation: Chicken and Rice Casserole, Avg. Rating: 4.8
Relevance: High
Insightfulness: Includes variations and prep guidance
- Query: "tomato, pasta"
Recommendation: Tomato Basil Pasta
Relevance: Strong match
Creativity: Basic recommendation
2. Abstract/Conceptual Queries
- Query: "healthy dinner"
Recommendation: Grilled Chicken with Steamed Vegetables
Relevance: Matches low-fat pattern
Transparency: No nutrition data cited
- Query: "quick lunch"
Recommendation: Tuna Salad Sandwich
Time: Under 15 minutes
Appropriateness: Good match
- Query: "seasonal winter soup"
Recommendation: Butternut Squash Soup
Semantic Match: Correct seasonal dish
Context Sensitivity: Season logic not applied
Success and Failure Patterns
| Query Type | Success Case | Failure Case | Notes |
| --- | --- | --- | --- |
| Ingredient Matching | "chicken rice" → Chicken & Rice Casserole | "peach thyme" → Dessert without thyme | Strong for simple matches |
| Healthy Concepts | "low calorie lunch" → Grilled Fish Salad | "heart healthy" → Pasta Alfredo | Subjective term inference limited |
| Time Constraints | "quick snacks" → Peanut Butter Banana Bites | "10 minute meal" → Took 30+ mins | No prep time filter |
| Seasonal Reasoning | "fall desserts" → Pumpkin Pie | "spring stew" → Winter recipes | Lacks season metadata |
Limitations
- No use of structured metadata (e.g., cook time, nutrition)
- Abstract concepts are inferred, not validated
- Failures due to ambiguous/compound terms
Conclusion
The model performs well for ingredient queries and reasonably well for conceptual inputs. Future improvements should include structured metadata and post-filtering based on user goals (sketched below) to boost relevance and trust.
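To make the post-filtering suggestion concrete, here is a hypothetical sketch (not part of the current system) that enforces a cook-time constraint using the minutes column from RAW_recipes.csv:

```python
# Illustration of the proposed post-filtering step (not implemented):
# apply a time constraint to the semantically ranked candidates before display.
def post_filter_by_time(ranked, recipes_df, max_minutes):
    """ranked: list of (recipe_name, score); recipes_df: DataFrame with name/minutes columns."""
    time_lookup = dict(zip(recipes_df["name"], recipes_df["minutes"]))
    return [(name, score) for name, score in ranked
            if time_lookup.get(name, float("inf")) <= max_minutes]

# e.g. post_filter_by_time(recommendations, recipes, max_minutes=10) for a "10 minute meal" query
```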
Phase 4: Web Application Deployment
Objective: The goal of this phase was to create a polished, static web application that integrates both our Computer Vision (CV) and Natural Language Processing (NLP) models into a seamless, interactive platform. This web app serves as the unified demonstration and delivery mechanism for our project, hosted publicly on Render.com.
Live Deployment: https://patternrec-project-group6.onrender.com/
GitHub Repository: PatternRec_Project_Group6
Team Responsibilities
| Role | Member | Responsibilities |
| --- | --- | --- |
| CV Engineer | Derek Lu | Developed the front-end interface for image upload and fruit/variant classification |
| NLP Engineer | Deepthika Sivaram | Built the interface for tag-based recipe recommendations using a BERT model |
| Joint Tasks | Both | Co-designed layout, styling, and integrated the written scientific report |
System Overview
- Frontend: Hybrid HTML interface combining static JS/CV output and Flask-rendered NLP forms
- CV Models: Exported to ONNX and executed in-browser using ONNX.js and JavaScript
- NLP Models: Deployed on Flask server and served via HTML templating
- Workflow:
- User uploads a photo → ONNX.js classifies fruit type and variant in browser
- User enters tags → Flask application processes with BERT and returns top-ranked recipes
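Exporting the trained classifier to ONNX is the step that enables in-browser inference; a minimal sketch is shown below (the checkpoint and file names, input size, and opset are illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load the fine-tuned produce classifier (checkpoint name is illustrative).
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 4)
model.load_state_dict(torch.load("best_finetuned.pt", map_location="cpu"))
model.eval()

# Export with a fixed 224x224 input; the browser runtime loads the resulting .onnx file.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "produce_resnet50.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=12,
)
```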
Architecture Diagram
[User] → [Upload Image / Input Tags]
↘ ↙
[CV Frontend] [NLP Frontend]
↓ ↓
[CV ONNX Model] [NLP BERT Model]
↓ ↓
[Fruit/Variant] [Top-K Recipes]
↘ ↙
[Unified Display & Report Integration]
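On the NLP side of this architecture, a minimal Flask sketch of the tag-to-recipes route served via HTML templating is shown below; the endpoint, template name, and precomputed-embedding files are assumptions consistent with the pre-exported embeddings noted under Limitations.

```python
import torch
from flask import Flask, render_template, request
from sentence_transformers import SentenceTransformer, util

app = Flask(__name__)
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Pre-exported recipe embeddings and their names (file names are illustrative).
recipe_emb = torch.load("recipe_embeddings.pt")        # tensor of shape (n_recipes, 384)
recipe_names = open("recipe_names.txt").read().splitlines()

@app.route("/recipes", methods=["GET", "POST"])
def recipes():
    results = []
    if request.method == "POST":
        tags = request.form.get("tags", "")            # e.g. "vegan soup quick"
        query_emb = encoder.encode(tags, convert_to_tensor=True)
        scores = util.cos_sim(query_emb, recipe_emb)[0]
        top = scores.topk(5)
        results = [(recipe_names[i], float(s)) for s, i in zip(top.values, top.indices)]
    return render_template("recipes.html", results=results)

if __name__ == "__main__":
    app.run(debug=True)
```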
Highlights
- Seamless integration of CV and NLP in a single-page React app
- Real-time classification and recipe recommendation experience
- Static deployment with ONNX.js for efficient client-side inference
- Scientific reporting embedded alongside interactive demos
Limitations & Future Work
- Client-side ONNX inference can be slow for large models; future work may include backend inference support
- Currently uses pre-exported embeddings and sample inference—future versions may support dynamic user input processing
Conclusion
This deployment phase successfully unified the two core machine learning components of our project into a single, user-accessible interface. The web application not only demonstrates technical proficiency in React and ONNX deployment but also enhances transparency and reproducibility through embedded scientific reporting. The Render-hosted solution provides an easily shareable and scalable showcase of our work.
Conclusion
This project successfully integrated Computer Vision and NLP systems into a unified, end-user application. High model performance and seamless web deployment reflect the robustness of both dataset design and implementation strategy.