Pattern Recognition Project Report
Team: Deepthika Sivaram & Derek Lu
Group: Group 6 — University at Buffalo
Phase 1: Data Collection
- Classes: 4 fruits x 3 variants
- Fruits: Apple, Grapes, Peach, Raspberry
- Variants:
- Apple: Whole, Sliced-Cored, In-Context
- Grapes: In a Bag, Loose Grapes, On the Vine
- Peach: Whole, Halved or Pitted, Sliced
- Raspberry: Small Group, In a Container, Slightly Crushed
- Total Images: ~12,000 (≈3,000 per fruit)
- File Format: JPEG and PNG
- Dimensions: 1200x1600 to 3024x4032
- Method: Hand-shot smartphone images under varied lighting and conditions
Phase 2: Computer Vision
Objective: To quantitatively assess the quality, balance, and separability of our collected fruit images by training two transfer-learning based CNN classifiers.
Produce Classifier
- Fine-tune a pre-trained ResNet-50 on four fruit classes: Apple, Grapes, Peach, Raspberry.
- Evaluate using overall accuracy, per-class precision/recall/F₁, and confusion matrix.
Variation Classifier
- Fine-tune a pre-trained EfficientNet-B0 per fruit on its three photographic variants (e.g., Whole, Sliced-Cored, and In-Context for Apple).
- Evaluate using the same metrics to identify under-performing variants.
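These metrics can be computed directly from a model's validation predictions. The snippet below is a minimal sketch using scikit-learn, with placeholder label arrays standing in for our actual predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

class_names = ["Apple", "Grapes", "Peach", "Raspberry"]   # produce classes from Phase 1

# Placeholder label indices; in practice these come from the validation split.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 0, 2, 3, 0, 1, 2, 3]

print("Overall accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=class_names, digits=2))
print(confusion_matrix(y_true, y_pred))   # rows = true class, columns = predicted class
```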
Methodology
- Transfer Learning: ResNet-50 for produce, EfficientNet-B0 for variation.
- Configuration: Set NUM_CLASSES, BATCH_SIZE, FRUIT, and MODE.
- Dataloaders: ImageFolder with an 80/20 random_split.
- Baseline Training: 10 epochs @ LR=1e-4.
- Fine-Tuning: 5 epochs @ LR=1e-5 with augmentations and early stopping at epoch 3.
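The sketch below illustrates this transfer-learning setup: ImageNet-pretrained weights, a replaced classification head sized by NUM_CLASSES, and the baseline learning rate. It mirrors the configuration named above, but the exact layer choices and loop structure are assumptions rather than our verbatim code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Configuration constants named in the Methodology (values here are illustrative).
NUM_CLASSES = 4          # Apple, Grapes, Peach, Raspberry
BATCH_SIZE = 32
FRUIT = "Apple"          # used only when training a variation classifier
MODE = "produce"         # "produce" -> ResNet-50, "variation" -> EfficientNet-B0

device = "cuda" if torch.cuda.is_available() else "cpu"

# Transfer learning: start from ImageNet weights and replace the classification head.
if MODE == "produce":
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
else:
    model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
    model.classifier[1] = nn.Linear(model.classifier[1].in_features, 3)  # 3 variants per fruit
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # baseline: 10 epochs @ LR=1e-4

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```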
Dataset Directory Structure
Batch subfolders were removed; the dataset is now organized flat: 4 produce folders, each containing 3 variant subfolders (≈12,000 images total).
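Under this layout, ImageFolder can serve both classifiers: pointed at the dataset root it yields the 4 produce classes (images in variant subfolders are still collected under their fruit), and pointed at a single fruit folder it yields that fruit's 3 variants. A minimal sketch, with illustrative folder names and transform parameters:

```python
# Expected flat layout (illustrative folder names):
# dataset/
#   Apple/      {Whole, Sliced-Cored, In-Context}/
#   Grapes/     {In a Bag, Loose Grapes, On the Vine}/
#   Peach/      {Whole, Halved-Pitted, Sliced}/
#   Raspberry/  {Small Group, In a Container, Slightly Crushed}/
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Produce classifier: the 4 fruit folders at the root become the classes.
# Variation classifier: point ImageFolder at one fruit folder (e.g. dataset/Apple) instead.
full_dataset = datasets.ImageFolder("dataset", transform=preprocess)

n_train = int(0.8 * len(full_dataset))
train_set, val_set = torch.utils.data.random_split(
    full_dataset, [n_train, len(full_dataset) - n_train]
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
```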
Pre-training Analysis
ResNet-50 (Produce)
- Apple: Precision=0.94, Recall=0.95, F₁=0.94
- Grapes: 0.96 / 0.93 / 0.94
- Peach: 0.99 / 0.99 / 0.99
- Raspberry: 0.98 / 1.00 / 0.99
- Strengths: Peach & Raspberry near-perfect; overall accuracy 97%.
- Weakness: Apple ↔ Grapes confusion (25 A→G, 36 G→A) and minor cross-class errors.
EfficientNet-B0 (Variation)
Apple Variants
- Whole: 0.98 / 0.98 / 0.98
- Sliced-Cored: 1.00 / 0.99 / 0.99
- In-Context: 0.97 / 0.98 / 0.98
- Errors: 1 In-Context→Sliced, 5 In-Context→Whole, 3 Sliced→(In-Context/Whole), 11 Whole→In-Context.
Grapes Variants
- In a Bag: 0.97 / 0.98 / 0.98
- Loose Grapes: 1.00 / 0.99 / 0.99
- On the Vine: 0.98 / 0.98 / 0.98
- Errors: mix-ups between Bag, Loose, and Vine (total ≈8 misclassifications).
Peach Variants
- Halved/Pitted: 1.00 / 1.00 / 1.00
- Sliced: 0.99 / 1.00 / 1.00
- Whole: 0.99 / 0.98 / 0.99
- Errors: 2 Sliced→Halved; others perfect.
Raspberry Variants
- In a Container: 1.00 / 1.00 / 1.00
- Slightly Crushed: 1.00 / 1.00 / 1.00
- Small Group: 1.00 / 1.00 / 1.00
- No errors—perfect separation.
Fine-Tuning 1
- Augmentations: RandomResizedCrop, ColorJitter, RandomErasing.
- Run: 5 epochs @ LR=1e-5, early stopping @ epoch 3.
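A sketch of this augmentation pipeline and the early-stopping logic is shown below; the transform parameters, patience value, and helper callables are assumptions, not our exact settings.

```python
import torch
from torchvision import transforms

# Fine-tuning augmentations named above; parameter values are illustrative.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),   # RandomErasing operates on tensors, so it follows ToTensor
])

# Early stopping: keep the checkpoint with the lowest validation loss and stop
# once it fails to improve (this triggered at epoch 3 of 5 in our run).
def early_stopping_fine_tune(model, train_step, validate, epochs=5, patience=1):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # fine-tuning LR
    best, bad = float("inf"), 0
    for epoch in range(epochs):
        train_step(model, optimizer)      # assumed callable running one training epoch
        val_loss = validate(model)        # assumed callable returning mean validation loss
        if val_loss < best:
            best, bad = val_loss, 0
            torch.save(model.state_dict(), "best_finetuned.pt")
        else:
            bad += 1
            if bad > patience:
                break
```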
Fine-Tuning 1 Analysis
ResNet-50 (Produce)
- Apple: 0.96 / 0.95 / 0.95
- Grapes: 0.94 / 0.96 / 0.95
- Peach & Raspberry: 1.00 / 1.00 / 1.00
- Confusion: 14 A→G, 21 G→A; no off-diagonal for Peach/Raspberry.
- Weakness: Apple/Grapes still share color/shape cues in “Whole” form.
EfficientNet-B0 (Variation)
Apple
- Whole: 0.98 / 0.98 / 0.98
- In-Context: 0.97 / 0.98 / 0.98
- Sliced-Cored: 1.00 / 0.99 / 0.99
- Errors: 3 In-Context→Whole, 4 Whole→In-Context, 1 Sliced→In-Context, 2 Sliced→Whole.
Grapes
- On the Vine: 0.98 / 0.98 / 0.98
- In a Bag: 0.97 / 0.98 / 0.98
- Loose Grapes: 1.00 / 0.99 / 0.99
- Errors: 1 Vine→Bag, 2 Vine→Loose.
Peach & Raspberry
- Peach Whole: 0.99 / 0.98 / 0.99; 2 Whole→Sliced errors.
- Raspberry: perfect 1.00/1.00/1.00 separation.
Phase 3: Semantic Recipe Retrieval (NLP)
Objective: The goal of the NLP phase was to design a BERT-based semantic search engine capable of recommending recipes based on ingredient and descriptor tags. The system needed to handle nuanced, multi-tag user queries and return ranked recipe suggestions from the RAW_recipes.csv and RAW_interactions.csv datasets.
Methodology
- Data Preprocessing
- Datasets Used:
- RAW_recipes.csv: Provided recipe information including name, ingredients, steps, and tags.
- RAW_interactions.csv: Contained user ratings which helped gauge recipe popularity or quality.
- Cleaning Steps:
- Removed duplicates and recipes with missing essential fields (like ingredients or name).
- Combined the ingredients, tags, and name fields into a unified text input for embedding.
- Lowercased, tokenized, and filtered special characters and redundant whitespace.
- Model Architecture
- Base Model: Fine-tuned sentence-transformers/all-MiniLM-L6-v2, a lightweight yet powerful model optimized for semantic search tasks.
- Training Objective: Constructed training pairs from user-tagged recipe preferences and designed the model to minimize the distance between relevant recipe vectors and query vectors in embedding space.
- Embedding Strategy: Used the transformer to embed both recipes and user queries (composed of 5–20 tags). Applied cosine similarity to compute closeness between queries and recipe embeddings.
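A condensed sketch of this pipeline, from field combination through cosine-similarity ranking, is shown below. The column handling follows the public RAW_recipes.csv schema, the list-parsing details, query format, and top-k value are illustrative assumptions, and the fine-tuning step described above is omitted for brevity.

```python
import ast
import pandas as pd
from sentence_transformers import SentenceTransformer, util

# Load and clean recipes; drop duplicates and rows missing essential fields.
recipes = pd.read_csv("RAW_recipes.csv").drop_duplicates(subset="name")
recipes = recipes.dropna(subset=["name", "ingredients"])

def to_text(row):
    # Combine name, ingredients, and tags into one lowercased text input for embedding.
    ingredients = " ".join(ast.literal_eval(row["ingredients"]))
    tags = " ".join(ast.literal_eval(row["tags"]))
    return f"{row['name']} {ingredients} {tags}".lower()

recipes["text"] = recipes.apply(to_text, axis=1)

# Embed recipes once, then embed each tag query and rank by cosine similarity.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
recipe_emb = model.encode(recipes["text"].tolist(), convert_to_tensor=True)

def recommend(query_tags, k=5):
    query_emb = model.encode(" ".join(query_tags), convert_to_tensor=True)
    scores = util.cos_sim(query_emb, recipe_emb)[0]
    top = scores.topk(k)
    return recipes.iloc[top.indices.cpu().numpy()][["name"]].assign(
        score=top.values.cpu().numpy()
    )

# Example: a multi-tag query like the ones used in the evaluation below.
print(recommend(["chicken", "rice"]))
```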
Evaluation
Test Query Set Design
To assess the model's robustness, we constructed a test set containing:
- Ingredient-based queries (e.g., "chicken, rice")
- Conceptual and subjective terms (e.g., "healthy dinner", "quick snacks", "seasonal recipes")
- Combined queries (e.g., "vegan soup quick", "low-carb dinner healthy")
These were selected to probe both surface-level retrieval and semantic understanding.
Recommendation Output Review
1. Simple Ingredient Queries
- Query: "chicken, rice"
Recommendation: Chicken and Rice Casserole, Avg. Rating: 4.8
Relevance: High
Insightfulness: Includes variations and prep guidance
- Query: "tomato, pasta"
Recommendation: Tomato Basil Pasta
Relevance: Strong match
Creativity: Basic recommendation
2. Abstract/Conceptual Queries
- Query: "healthy dinner"
Recommendation: Grilled Chicken with Steamed Vegetables
Relevance: Matches low-fat pattern
Transparency: No nutrition data cited
- Query: "quick lunch"
Recommendation: Tuna Salad Sandwich
Time: Under 15 minutes
Appropriateness: Good match
- Query: "seasonal winter soup"
Recommendation: Butternut Squash Soup
Semantic Match: Correct seasonal dish
Context Sensitivity: Season logic not applied
Success and Failure Patterns
| Query Type | Success Case | Failure Case | Notes |
| --- | --- | --- | --- |
| Ingredient Matching | "chicken rice" → Chicken & Rice Casserole | "peach thyme" → Dessert without thyme | Strong for simple matches |
| Healthy Concepts | "low calorie lunch" → Grilled Fish Salad | "heart healthy" → Pasta Alfredo | Subjective term inference limited |
| Time Constraints | "quick snacks" → Peanut Butter Banana Bites | "10 minute meal" → Took 30+ mins | No prep time filter |
| Seasonal Reasoning | "fall desserts" → Pumpkin Pie | "spring stew" → Winter recipes | Lacks season metadata |
Limitations
- No use of structured metadata (e.g., cook time, nutrition)
- Abstract concepts are inferred, not validated
- Failures due to ambiguous/compound terms
Conclusion
The model performs well for ingredient queries and reasonably well for conceptual inputs. Future improvements should include structured metadata and post-filtering based on user goals (sketched below) to boost relevance and trust.
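To make the post-filtering suggestion concrete, here is a hypothetical sketch (not part of the current system) that enforces a cook-time constraint using the minutes column from RAW_recipes.csv:

```python
# Illustration of the proposed post-filtering step (not implemented):
# apply a time constraint to the semantically ranked candidates before display.
def post_filter_by_time(ranked, recipes_df, max_minutes):
    """ranked: list of (recipe_name, score); recipes_df: DataFrame with name/minutes columns."""
    time_lookup = dict(zip(recipes_df["name"], recipes_df["minutes"]))
    return [(name, score) for name, score in ranked
            if time_lookup.get(name, float("inf")) <= max_minutes]

# e.g. post_filter_by_time(recommendations, recipes, max_minutes=10) for a "10 minute meal" query
```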
Phase 4: Web Application Deployment
Objective: The goal of this phase was to create a polished, static web application that integrates both our Computer Vision (CV) and Natural Language Processing (NLP) models into a seamless, interactive platform. This web app serves as the unified demonstration and delivery mechanism for our project, hosted publicly on Render.com.
Live Deployment: https://patternrec-project-group6.onrender.com/
GitHub Repository: PatternRec_Project_Group6
Team Responsibilities
| Role | Member | Responsibilities |
| --- | --- | --- |
| CV Engineer | Derek Lu | Developed the front-end interface for image upload and fruit/variant classification |
| NLP Engineer | Deepthika Sivaram | Built the interface for tag-based recipe recommendations using a BERT model |
| Joint Tasks | Both | Co-designed layout, styling, and integrated the written scientific report |
System Overview
- Frontend: Hybrid HTML interface combining static JS/CV output and Flask-rendered NLP forms
- CV Models: Exported to ONNX and executed in-browser using ONNX.js and JavaScript
- NLP Models: Deployed on Flask server and served via HTML templating
- Workflow:
- User uploads a photo → ONNX.js classifies fruit type and variant in browser
- User enters tags → Flask application processes with BERT and returns top-ranked recipes
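Exporting the trained classifier to ONNX is the step that enables in-browser inference; a minimal sketch is shown below (the checkpoint and file names, input size, and opset are illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load the fine-tuned produce classifier (checkpoint name is illustrative).
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 4)
model.load_state_dict(torch.load("best_finetuned.pt", map_location="cpu"))
model.eval()

# Export with a fixed 224x224 input; the browser runtime loads the resulting .onnx file.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "produce_resnet50.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=12,
)
```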
Architecture Diagram
[User] → [Upload Image / Input Tags]
↘ ↙
[CV Frontend] [NLP Frontend]
↓ ↓
[CV ONNX Model] [NLP BERT Model]
↓ ↓
[Fruit/Variant] [Top-K Recipes]
↘ ↙
[Unified Display & Report Integration]
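On the NLP side of this architecture, a minimal Flask sketch of the tag-to-recipes route served via HTML templating is shown below; the endpoint, template name, and precomputed-embedding files are assumptions consistent with the pre-exported embeddings noted under Limitations.

```python
import torch
from flask import Flask, render_template, request
from sentence_transformers import SentenceTransformer, util

app = Flask(__name__)
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Pre-exported recipe embeddings and their names (file names are illustrative).
recipe_emb = torch.load("recipe_embeddings.pt")        # tensor of shape (n_recipes, 384)
recipe_names = open("recipe_names.txt").read().splitlines()

@app.route("/recipes", methods=["GET", "POST"])
def recipes():
    results = []
    if request.method == "POST":
        tags = request.form.get("tags", "")            # e.g. "vegan soup quick"
        query_emb = encoder.encode(tags, convert_to_tensor=True)
        scores = util.cos_sim(query_emb, recipe_emb)[0]
        top = scores.topk(5)
        results = [(recipe_names[i], float(s)) for s, i in zip(top.values, top.indices)]
    return render_template("recipes.html", results=results)

if __name__ == "__main__":
    app.run(debug=True)
```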
Highlights
- Seamless integration of CV and NLP in a single-page React app
- Real-time classification and recipe recommendation experience
- Static deployment with ONNX.js for efficient client-side inference
- Scientific reporting embedded alongside interactive demos
Limitations & Future Work
- Client-side ONNX inference can be slow for large models; future work may include backend inference support
- Currently uses pre-exported embeddings and sample inference—future versions may support dynamic user input processing
Conclusion
This deployment phase successfully unified the two core machine learning components of our project into a single, user-accessible interface. The web application not only demonstrates technical proficiency in React and ONNX deployment but also enhances transparency and reproducibility through embedded scientific reporting. The Render-hosted solution provides an easily shareable and scalable showcase of our work.
Conclusion
This project successfully integrated Computer Vision and NLP systems into a unified, end-user application. High model performance and seamless web deployment reflect the robustness of both dataset design and implementation strategy.