Model Architecture
Implementation of the MiniUNet segmentation model.
src.segmentation.model
MiniUNet()
Bases: Module
Lightweight U-Net architecture for semantic segmentation.
A simplified U-Net designed for food segmentation with 104 output classes. It uses an encoder-decoder structure with skip connections and He weight initialization.
Architecture
- Encoder: 3 conv blocks with max pooling (3→64→128→256 channels)
- Bottleneck: 1 conv block (256→512 channels)
- Decoder: 3 conv blocks with transpose convolutions (512→256→128→64 channels)
- Output: 1x1 conv to 104 classes
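The channel progression above can be traced together with the spatial size at each stage. The following is a minimal sketch, not the project's code: `trace_shapes` is a hypothetical helper, and it assumes that each max pooling step halves the height and width and that each transpose convolution doubles them.

```python
def trace_shapes(height, width, in_channels=3, num_classes=104):
    """Return (stage, channels, height, width) tuples for a MiniUNet-style layout."""
    h, w = height, width
    stages = [("input", in_channels, h, w)]
    # Encoder: each block sets the channel count; pooling then halves H and W (assumed).
    for i, ch in enumerate((64, 128, 256), start=1):
        stages.append((f"enc{i}", ch, h, w))
        h, w = h // 2, w // 2
    stages.append(("bottleneck", 512, h, w))
    # Decoder: each transpose convolution doubles H and W (assumed) before the conv block.
    for i, ch in enumerate((256, 128, 64), start=1):
        h, w = h * 2, w * 2
        stages.append((f"dec{i}", ch, h, w))
    # The 1x1 output convolution changes only the channel count.
    stages.append(("final", num_classes, h, w))
    return stages


for name, ch, h, w in trace_shapes(224, 224):
    print(f"{name:<10} {ch:>3} channels, {h}x{w}")
```

Under these assumptions, a 224x224 input reaches the bottleneck at 28x28 and returns to 224x224 at the output, so input sizes divisible by 8 keep the encoder and decoder feature maps aligned.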
Attributes:
Name | Type | Description |
---|---|---|
encoder1, encoder2, encoder3 | | Encoder convolutional blocks |
bottleneck | | Bottleneck convolutional block |
decoder1, decoder2, decoder3 | | Decoder convolutional blocks |
pool | | Max pooling layer for downsampling |
upconv1, upconv2, upconv3 | | Transpose convolutions for upsampling |
final | | Final 1x1 convolution to output classes |
Example
    import torch
    from src.segmentation.model import MiniUNet

    model = MiniUNet()
    x = torch.rand(1, 3, 224, 224)  # Batch of 1, RGB image
    output = model(x)
    print(output.shape)  # torch.Size([1, 104, 224, 224])
Initialize the MiniUNet model.
Sets up the encoder-decoder architecture with skip connections, pooling/upsampling layers, and applies He weight initialization.
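He initialization can be applied to every convolution with `Module.apply`. This is a minimal sketch rather than the code in `model.py`: the helper name `init_he` is hypothetical, and the choice of the normal variant (`kaiming_normal_`) with zeroed biases is an assumption, since the page does not say which variant the model uses.

```python
import torch.nn as nn


def init_he(module):
    """Apply He (Kaiming) initialization to convolutional layers."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        # Scale the weights for ReLU activations (normal variant is an assumption).
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Stand-in layers; in the real model, apply() would visit every submodule.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)
block.apply(init_he)
```

`apply` walks all submodules recursively, so one call on the top-level model would cover the encoder, bottleneck, decoder, and output layers.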
Source code in src/segmentation/model.py
conv_block(in_channels, out_channels)
Create a convolutional block with two conv layers and ReLU activations.
Each block consists of two 3x3 convolutions with padding, each followed by a ReLU activation. Stacking the two convolutions enlarges the receptive field, while the padding preserves the spatial dimensions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels | required |
out_channels | int | Number of output channels | required |
Returns:
Type | Description |
---|---|
nn.Sequential | Sequential container with conv layers and ReLU activations |
Note
Two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but use fewer weights (18 versus 25 per input-output channel pair) and add an extra non-linearity.
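A block of this kind might look like the sketch below. It is an illustration under stated assumptions, not the source in `model.py`: the name `conv_block_sketch` is hypothetical, and `padding=1` and the conv-ReLU-conv-ReLU ordering are inferred from the description.

```python
import torch
import torch.nn as nn


def conv_block_sketch(in_channels, out_channels):
    """Two 3x3 convolutions, each followed by ReLU (assumed layout)."""
    return nn.Sequential(
        # padding=1 keeps H and W unchanged for a 3x3 kernel with stride 1.
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


block = conv_block_sketch(3, 64)
x = torch.rand(1, 3, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([1, 64, 32, 32])

# Weights plus biases of both convolutions.
n_params = sum(p.numel() for p in block.parameters())
print(n_params)  # 38720
```

Only the channel count changes between input and output, which is what lets the decoder concatenate encoder features of matching size later on.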
Source code in src/segmentation/model.py
forward(x)
Forward pass through the MiniUNet model.
Implements the U-Net architecture with encoder-decoder structure and skip connections. The encoder progressively reduces spatial dimensions while increasing channel depth. The decoder upsamples and combines features using skip connections.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input tensor with shape (B, 3, H, W), where B is the batch size, 3 is the number of RGB channels, and H, W are the height and width | required |
Returns:
Type | Description |
---|---|
torch.Tensor | Segmentation logits with shape (B, 104, H, W), where B is the batch size, 104 is the number of food classes, and H, W match the input dimensions |
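Because the model returns raw logits, a per-pixel class map is obtained by taking the argmax over the class dimension. The sketch below uses a random tensor as a stand-in for `model(x)`, so the values are meaningless and only the shapes are illustrative.

```python
import torch

# Stand-in for model(x): logits with shape (B, 104, H, W).
logits = torch.randn(2, 104, 64, 64)

# Softmax over the class dimension gives per-pixel class probabilities.
probs = logits.softmax(dim=1)

# Argmax over the class dimension gives one class index per pixel: (B, H, W).
pred = logits.argmax(dim=1)
print(pred.shape)  # torch.Size([2, 64, 64])
```

The argmax of the logits and of the softmax probabilities agree, so the softmax step is needed only when calibrated scores are wanted.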
Architecture Flow
- Encoder: x → enc1 → enc2 → enc3
- Bottleneck: enc3 → bottleneck
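- Decoder: (bottleneck + enc3) → dec1, (dec1 + enc2) → dec2, (dec2 + enc1) → dec3, where each + combines the upsampled features with the matching encoder output through a skip connection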
- Output: dec3 → final (104 classes)
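The flow above can be sketched end to end as follows. This is an illustrative reconstruction, not the code in `model.py`: the class name `MiniUNetSketch` and helper `_block` are hypothetical, and it assumes 2x2 max pooling, 2x2 stride-2 transpose convolutions, and skip connections implemented by channel concatenation (the "+" in the flow could also denote another merge operation).

```python
import torch
import torch.nn as nn


def _block(in_ch, out_ch):
    """Two padded 3x3 convolutions, each followed by ReLU (assumed layout)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )


class MiniUNetSketch(nn.Module):
    def __init__(self, num_classes=104):
        super().__init__()
        self.encoder1 = _block(3, 64)
        self.encoder2 = _block(64, 128)
        self.encoder3 = _block(128, 256)
        self.bottleneck = _block(256, 512)
        self.pool = nn.MaxPool2d(2)
        self.upconv1 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.upconv2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.upconv3 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        # Decoder inputs are doubled by concatenation with the skip features.
        self.decoder1 = _block(512, 256)
        self.decoder2 = _block(256, 128)
        self.decoder3 = _block(128, 64)
        self.final = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        # Encoder: keep each output for the skip connections.
        enc1 = self.encoder1(x)
        enc2 = self.encoder2(self.pool(enc1))
        enc3 = self.encoder3(self.pool(enc2))
        bott = self.bottleneck(self.pool(enc3))
        # Decoder: upsample, concatenate with the matching encoder output, convolve.
        dec1 = self.decoder1(torch.cat([self.upconv1(bott), enc3], dim=1))
        dec2 = self.decoder2(torch.cat([self.upconv2(dec1), enc2], dim=1))
        dec3 = self.decoder3(torch.cat([self.upconv3(dec2), enc1], dim=1))
        return self.final(dec3)


with torch.no_grad():
    out = MiniUNetSketch()(torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 104, 64, 64])
```

With concatenation, each decoder block receives twice the channels produced by its transpose convolution, which is why the sketch's decoder blocks take 512, 256, and 128 input channels.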
Source code in src/segmentation/model.py