Installation Guide

Prerequisites

Before installing the Food103Seg Calories project, ensure you have the following prerequisites:

  • Python 3.8+ (Python 3.9 or 3.10 recommended)
  • CUDA-compatible GPU (recommended for training, optional for inference)
  • Git for cloning the repository
  • pip package manager

System Requirements

Component     Minimum   Recommended
Python        3.8+      3.9 or 3.10
RAM           8GB       16GB+
GPU Memory    4GB       8GB+
Storage       10GB      20GB+
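
A quick way to confirm these prerequisites from a terminal (nvidia-smi applies only if you have an NVIDIA GPU):

python --version   # should report 3.8 or newer
git --version
pip --version
nvidia-smi         # optional: lists available NVIDIA GPUs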

Installation Steps

Setup Instructions Using uv

1. Clone the Repository

git clone https://github.com/kkkamur07/food103seg-calories
cd food103seg-calories

2. Create Virtual Environment and Install Dependencies

Using uv (recommended for fastest setup):

# Create virtual environment
uv venv

# Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install production dependencies
uv pip install -r requirements.txt

# Install development dependencies (optional)
uv pip install -r requirements_dev.txt

# Install project in development mode
uv pip install -e .

Alternative one-liner approach:

# Create environment and install dependencies in one step
uv venv && source .venv/bin/activate && uv pip install -r requirements.txt

For CUDA support, install PyTorch with CUDA using uv:

# For CUDA 11.8
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify GPU installation:

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

About the Requirements Files

This project provides two dependency files for different use cases:

  • requirements.txt - Production dependencies needed to run the application
  • requirements_dev.txt - Additional development dependencies for testing, linting, and other development tooling

The uv package manager installs packages significantly faster than traditional pip, which makes it a good fit for projects with many dependencies. You can install from either file with uv pip install -r <filename>.
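
uv's pip interface mirrors pip's command line, so if you want both dependency sets in one step, passing multiple -r flags should work:

uv pip install -r requirements.txt -r requirements_dev.txt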

Data Setup

1. Download Dataset

Data Storage and Versioning with DVC

This project uses DVC (Data Version Control) with Google Cloud Storage for data versioning and management. The data and models are stored in two separate GCS buckets:

  • Data storage: gs://dvc-storage-sensor/
  • Model storage: gs://food-segmentation-models/

Setting Up DVC with Google Cloud Storage

# Create data directory
mkdir -p data

# Install required tools
pip install dvc-gs

# List available GCS buckets
gsutil ls

# Add the data bucket as the default remote (use the bucket path reported by gsutil ls)
dvc remote add -d remote_storage gs://dvc-storage-sensor/

# Configure version-aware storage
dvc remote modify remote_storage version_aware true

# List configured remotes
dvc remote list

# Pull data from remote storage
dvc pull

DVC Management Commands

# Remove a remote by name if it is no longer needed (here, an old remote called gcp_storage)
dvc remote remove gcp_storage

# Set default remote
dvc remote default remote_storage

# Push data (the --no-cache flag proved unreliable; see Known Issues below)
dvc push --no-cache

Known Issues with DVC Setup

During development, several challenges were encountered with the DVC workflow, particularly with the dvc push --no-cache command. While DVC provides excellent data versioning capabilities, the setup proved complex for this project's requirements.
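
After the setup commands above, the resulting .dvc/config should contain roughly the following (a sketch; the exact contents depend on your configuration):

cat .dvc/config
# ['remote "remote_storage"']
#     url = gs://dvc-storage-sensor/
#     version_aware = true
# [core]
#     remote = remote_storage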

Alternative: Direct Dataset Download

If you prefer to bypass the DVC setup or encounter issues, you can download the Food103 segmentation dataset directly from:

Dataset source: https://paperswithcode.com/dataset/foodseg103

# Create data directory
mkdir -p data

# Download dataset manually from Papers with Code
# Extract and place in data/ directory

For this project, you can choose either approach:

  1. DVC approach - Use the GCS buckets with DVC for version control
  2. Direct download - Download the Food103 dataset directly from Papers with Code

The DVC setup provides better data versioning and collaboration features, while the direct download approach is simpler for getting started quickly.

2. Expected Data Directory Structure

Ensure your data follows this structure:

data/
├── Images/
│   ├── img_dir/
│   │   ├── train/
│   │   │   ├── image1.jpg
│   │   │   ├── image2.jpg
│   │   │   └── ...
│   │   └── test/
│   │       ├── image1.jpg
│   │       ├── image2.jpg
│   │       └── ...
│   └── ann_dir/
│       ├── train/
│       │   ├── image1.png
│       │   ├── image2.png
│       │   └── ...
│       └── test/
│           ├── image1.png
│           ├── image2.png
│           └── ...

Configuration Setup

Copy the Template

# Install cookiecutter
pip install cookiecutter

# Generate project using our template
cookiecutter https://github.com/kkkamur07/cookie-cutter --directory=mlops

Find the complete template and installation guide at:

https://github.com/kkkamur07/cookie-cutter

Verification

1. Test Installation

Run the following commands to verify your installation:

# Test imports
python -c "import torch; import torchvision; print('PyTorch installed successfully')"

# Test project modules (this also exercises the data-loading imports)
python -c "from src.segmentation.data import data_loaders; print('Project modules working')"

2. Quick Training Test

Run a quick training test with minimal epochs:

python src/segmentation/main.py model.hyperparameters.epochs=1

Running the Application

1. Streamlit Web App

streamlit run src/app/frontend.py

2. API Server (FastAPI with Uvicorn)

uvicorn src.app.api:app --host 0.0.0.0 --port 8000 --reload

3. Training Pipeline

python src/segmentation/main.py

4. Custom Training

python src/segmentation/main.py model.hyperparameters.epochs=50 model.hyperparameters.lr=0.001
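
The dotted override syntax above looks like Hydra. If the project does use Hydra, a hypothetical learning-rate sweep would look like this (unverified against this repository):

python src/segmentation/main.py --multirun model.hyperparameters.lr=0.01,0.001,0.0001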

Access Points

  • Web App: http://localhost:8501
  • API: http://localhost:8000
  • API Docs: http://localhost:8000/docs
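
Once the API server is running, a quick reachability check against the standard FastAPI docs route:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/docs   # expect 200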

Troubleshooting

Common Issues

CUDA Out of Memory

  • Reduce batch size in config: model.hyperparameters.batch_size=16
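
For example, combined with the training entry point shown earlier:

python src/segmentation/main.py model.hyperparameters.batch_size=16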

Missing Dependencies

pip install --upgrade pip
pip install -r requirements.txt --force-reinstall

Data Loading Errors

  • Verify directory structure matches expected format
  • Check file permissions: chmod -R 755 data/
  • Ensure image and annotation files have correct extensions
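
To spot misnamed files, a minimal sketch using find (adjust paths if your layout differs):

# Flag any non-.jpg files among images, or non-.png among annotations
find data/Images/img_dir -type f ! -name '*.jpg'
find data/Images/ann_dir -type f ! -name '*.png'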

Import Errors

# Reinstall in development mode
pip install -e . --force-reinstall

Getting Help

If you encounter issues:

  1. Check the logs in saved/logs/
  2. Verify GPU setup with nvidia-smi
  3. Review configuration in configs/config.yaml
  4. Check Python version compatibility

Optional Components

Docker Setup

If you prefer Docker:

# Build backend
docker build -f Dockerfile.backend -t food-seg-backend .

# Build frontend
docker build -f Dockerfile.frontend -t food-seg-frontend .

# Run with docker-compose
docker-compose up

Development Tools

Install additional development tools:

# Pre-commit hooks
pre-commit install

# Jupyter for notebooks
pip install jupyter
jupyter notebook notebooks/

Next Steps

After successful installation:

  1. Review the configuration in configs/config.yaml
  2. Run the training pipeline with your data
  3. Explore the Streamlit app at http://localhost:8501
  4. Check the documentation for advanced usage
  5. Set up monitoring with Weights & Biases
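
For step 5, a minimal Weights & Biases setup (this assumes the training code is already instrumented with wandb; check configs/config.yaml for a wandb section):

pip install wandb
wandb login   # paste your API key from https://wandb.ai/authorize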

You're now ready to start training your food segmentation model and estimating calories! 🍕📊