Model Architecture

Implementation of the MiniUNet segmentation model.

src.segmentation.model

MiniUNet()

Bases: Module

Lightweight U-Net architecture for semantic segmentation.

A simplified version of the U-Net architecture designed for food segmentation with 104 output classes. Features an encoder-decoder structure with skip connections and He weight initialization.

Architecture
  • Encoder: 3 conv blocks with max pooling (3→64→128→256 channels)
  • Bottleneck: 1 conv block (256→512 channels)
  • Decoder: 3 conv blocks with transpose convolutions (512→256→128→64 channels)
  • Output: 1x1 conv to 104 classes
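
For a 224x224 input, the tensor shapes flow through the network as follows (derived from the source code below):

x          (B,   3, 224, 224)
enc1       (B,  64, 224, 224) → pool → (B,  64, 112, 112)
enc2       (B, 128, 112, 112) → pool → (B, 128,  56,  56)
enc3       (B, 256,  56,  56) → pool → (B, 256,  28,  28)
bottleneck (B, 512,  28,  28)
dec1       (B, 256,  56,  56)   upconv1 + skip from enc3
dec2       (B, 128, 112, 112)   upconv2 + skip from enc2
dec3       (B,  64, 224, 224)   upconv3 + skip from enc1
output     (B, 104, 224, 224)   final 1x1 conv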

Attributes:

Name                          Type                Description
encoder1, encoder2, encoder3  nn.Sequential       Encoder convolutional blocks
bottleneck                    nn.Sequential       Bottleneck convolutional block
decoder1, decoder2, decoder3  nn.Sequential       Decoder convolutional blocks
pool                          nn.MaxPool2d        Max pooling layer for downsampling
upconv1, upconv2, upconv3     nn.ConvTranspose2d  Transpose convolutions for upsampling
final                         nn.Sequential       Final 1x1 convolution to output classes

Example

model = MiniUNet()
x = torch.rand(1, 3, 224, 224)  # Batch of 1, RGB image
output = model(x)
print(output.shape)  # torch.Size([1, 104, 224, 224])

Initialize the MiniUNet model.

Sets up the encoder-decoder architecture with skip connections, pooling/upsampling layers, and applies He weight initialization.

Source code in src/segmentation/model.py
def __init__(self):
    """
    Initialize the MiniUNet model.

    Sets up the encoder-decoder architecture with skip connections,
    pooling/upsampling layers, and applies He weight initialization.
    """
    super(MiniUNet, self).__init__()

    # Encoder
    self.encoder1 = self.conv_block(3, 64)
    self.encoder2 = self.conv_block(64, 128)
    self.encoder3 = self.conv_block(128, 256)

    # Bottleneck
    self.bottleneck = self.conv_block(256, 512)

    # Decoder
    self.decoder1 = self.conv_block(512, 256)
    self.decoder2 = self.conv_block(256, 128)
    self.decoder3 = self.conv_block(128, 64)

    # Pooling
    self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    # Upsampling
    self.upconv1 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
    self.upconv2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
    self.upconv3 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)

    # Final layer
    self.final = nn.Sequential(
        nn.Conv2d(64, 104, kernel_size=1),
    )

    # Initialize weights
    self._initialize_weights()
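
The _initialize_weights helper is called above but its body is outside the listed line range. As a rough sketch, He (Kaiming) initialization for the conv layers, consistent with the docstring, could look like the following; the actual implementation in src/segmentation/model.py may differ:

def _initialize_weights(self):
    """Apply He (Kaiming) initialization to all conv layers (sketch)."""
    for m in self.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)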

conv_block(in_channels, out_channels)

Create a convolutional block with two conv layers and ReLU activations.

Each block consists of two 3x3 convolutions with padding, followed by ReLU activations. This design increases the receptive field while maintaining spatial dimensions.

Parameters:

Name          Type  Description                Default
in_channels   int   Number of input channels   required
out_channels  int   Number of output channels  required

Returns:

Type           Description
nn.Sequential  Sequential container with two conv layers and ReLU activations

Note

Using two conv layers increases the receptive field and adds non-linearity without increasing parameters significantly.

Source code in src/segmentation/model.py
def conv_block(self, in_channels, out_channels):
    """
    Create a convolutional block with two conv layers and ReLU activations.

    Each block consists of two 3x3 convolutions with padding, followed by
    ReLU activations. This design increases the receptive field while
    maintaining spatial dimensions.

    Args:
        in_channels (int): Number of input channels
        out_channels (int): Number of output channels

    Returns:
        nn.Sequential: Sequential container with conv layers and ReLU activations

    Note:
        Using two conv layers increases the receptive field and adds
        non-linearity without increasing parameters significantly.
    """
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )
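
As a quick sanity check, a conv_block changes the channel count while preserving spatial dimensions (a small sketch, assuming the import path shown at the top of this page):

import torch
from src.segmentation.model import MiniUNet

block = MiniUNet().conv_block(3, 64)
out = block(torch.rand(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 64, 224, 224]), H and W unchanged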

forward(x)

Forward pass through the MiniUNet model.

Implements the U-Net architecture with encoder-decoder structure and skip connections. The encoder progressively reduces spatial dimensions while increasing channel depth. The decoder upsamples and combines features using skip connections.

Parameters:

Name  Type          Description                                            Default
x     torch.Tensor  Input tensor of shape (B, 3, H, W): batch size, RGB    required
                    channels, height, and width

Returns:

Type          Description
torch.Tensor  Segmentation logits of shape (B, 104, H, W): batch size, 104 food
              classes, and the same H and W as the input

Architecture Flow
  1. Encoder: x → enc1 → enc2 → enc3
  2. Bottleneck: enc3 → bottleneck
  3. Decoder: bottleneck + enc3 → dec1 + enc2 → dec2 + enc1 → dec3
  4. Output: dec3 → final (104 classes)
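
Because the encoder pools three times, the input height and width should be divisible by 8; otherwise the floor division in pooling makes the upsampled decoder features smaller than the corresponding skip tensors, and torch.cat raises a size-mismatch error. A small sketch of the constraint:

import torch
from src.segmentation.model import MiniUNet

model = MiniUNet()
model(torch.rand(1, 3, 224, 224))     # OK: 224 is divisible by 8
# model(torch.rand(1, 3, 225, 225))   # fails: 225 pools down to 112, which
#                                     # upsamples back to 224 and no longer
#                                     # matches enc1 (225) in torch.cat
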
Source code in src/segmentation/model.py
def forward(self, x):
    """
    Forward pass through the MiniUNet model.

    Implements the U-Net architecture with encoder-decoder structure
    and skip connections. The encoder progressively reduces spatial
    dimensions while increasing channel depth. The decoder upsamples
    and combines features using skip connections.

    Args:
        x (torch.Tensor): Input tensor with shape (B, 3, H, W) where:
            - B: batch size
            - 3: RGB channels
            - H, W: height and width

    Returns:
        torch.Tensor: Segmentation logits with shape (B, 104, H, W) where:
            - B: batch size
            - 104: number of food classes
            - H, W: same as input dimensions

    Architecture Flow:
        1. Encoder: x → enc1 → enc2 → enc3
        2. Bottleneck: enc3 → bottleneck
        3. Decoder: bottleneck + enc3 → dec1 + enc2 → dec2 + enc1 → dec3
        4. Output: dec3 → final (104 classes)
    """
    # Encoder
    enc1 = self.encoder1(x)
    enc2 = self.encoder2(self.pool(enc1))
    enc3 = self.encoder3(self.pool(enc2))

    # Bottleneck
    bottleneck = self.bottleneck(self.pool(enc3))

    # Decoder with skip connections
    dec1 = self.decoder1(torch.cat([self.upconv1(bottleneck), enc3], dim=1))
    dec2 = self.decoder2(torch.cat([self.upconv2(dec1), enc2], dim=1))
    dec3 = self.decoder3(torch.cat([self.upconv3(dec2), enc1], dim=1))

    return self.final(dec3)
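
Since the model returns raw logits, they can be passed directly to nn.CrossEntropyLoss with integer class masks. A minimal training-step sketch (the data shapes here are illustrative assumptions, not taken from this page):

import torch
import torch.nn as nn
from src.segmentation.model import MiniUNet

model = MiniUNet()
criterion = nn.CrossEntropyLoss()

images = torch.rand(4, 3, 224, 224)           # batch of RGB images
masks = torch.randint(0, 104, (4, 224, 224))  # per-pixel class indices

logits = model(images)           # (4, 104, 224, 224)
loss = criterion(logits, masks)  # cross-entropy over 104 classes per pixel
loss.backward()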