
Pooling, Batch Norm, and Layer Norm

Pooling plus batch and layer normalisation fundamentals.


Overview


Pooling

Core idea

Common types

How it works (2D)

H_{out} = \left\lfloor \frac{H + 2P - K}{S} \right\rfloor + 1,\quad W_{out} = \left\lfloor \frac{W + 2P - K}{S} \right\rfloor + 1
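The output-size formula can be checked by hand with a small helper. This is a minimal sketch in plain Python. The function name `pool_output_size` is illustrative and not part of PyTorch, and the sketch assumes no dilation and floor rounding (not ceil mode).

```python
def pool_output_size(size, kernel, stride, padding=0):
    """Output length along one spatial axis: floor((size + 2P - K) / S) + 1."""
    # Integer floor division implements the floor in the formula.
    return (size + 2 * padding - kernel) // stride + 1


# A 32x32 input with a 2x2 window and stride 2 halves each side.
h_out = pool_output_size(32, kernel=2, stride=2)
w_out = pool_output_size(32, kernel=2, stride=2)

# A 7-wide input with K=3, S=2, P=1.
odd_out = pool_output_size(7, kernel=3, stride=2, padding=1)
```

Applying the same helper to height and width separately covers non-square inputs, since the formula treats each axis independently.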

Gradient flow

Design knobs

Practical Notes

Use pooling cautiously for localization-heavy tasks

Consider strided convolutions as learnable alternatives

Prefer global average pooling before classifiers

Avoid aggressive early downsampling for small objects

PyTorch examples

import torch.nn as nn

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
global_avg = nn.AdaptiveAvgPool2d((1, 1))
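To make the arithmetic behind these modules concrete, here is a plain-Python sketch of max, average, and global average pooling on a single 4x4 feature map. The helpers `pool2d`, `mean`, and `global_avg_pool` are illustrative names, not PyTorch functions, and the sketch assumes no padding and a single channel.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def pool2d(grid, kernel=2, stride=2, reduce=max):
    """Slide a kernel x kernel window over grid and reduce each window."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(0, h - kernel + 1, stride):
        row = []
        for j in range(0, w - kernel + 1, stride):
            window = [grid[i + di][j + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(reduce(window))
        out.append(row)
    return out


def global_avg_pool(grid):
    """Collapse the whole map to one value, like a (1, 1) adaptive average pool."""
    return mean([v for row in grid for v in row])


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

max_out = pool2d(feature_map, reduce=max)    # keeps the largest value per window
avg_out = pool2d(feature_map, reduce=mean)   # averages each window
global_out = global_avg_pool(feature_map)    # one value for the whole map
```

Max pooling keeps only the strongest activation in each window, while average pooling blends all four values. Global average pooling reduces the map to a single number per channel regardless of input size.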

Batch Normalisation

Core idea

How it works

Practical Notes

Handle train/eval mode correctly

Mitigate small-batch instability

Use standard layer ordering

Watch BN momentum and inference behavior

PyTorch examples

import torch.nn as nn

bn1 = nn.BatchNorm1d(num_features=128)
bn2 = nn.BatchNorm2d(num_features=64)
bn3 = nn.BatchNorm3d(num_features=32)

# Typical conv block
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True)
)
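The train/eval distinction noted in the practical notes is easier to see in a from-scratch version. The following is a minimal plain-Python sketch of batch normalisation over a batch of feature vectors. The class name `BatchNormSketch` is hypothetical. The defaults `eps=1e-5` and `momentum=0.1`, the biased variance used for normalising, and the unbiased variance stored in the running estimate are chosen to follow the convention of PyTorch's BatchNorm layers. It is an illustration of the computation, not a replacement for `nn.BatchNorm1d`.

```python
import math


class BatchNormSketch:
    """Per-feature normalisation across the batch dimension (illustrative only)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.gamma = [1.0] * num_features        # learnable scale
        self.beta = [0.0] * num_features         # learnable shift
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features
        self.training = True

    def __call__(self, batch):
        n = len(batch)
        out = [[0.0] * len(self.gamma) for _ in batch]
        for c in range(len(self.gamma)):
            column = [row[c] for row in batch]
            if self.training:
                # Training: use statistics of the current batch.
                mean = sum(column) / n
                var = sum((v - mean) ** 2 for v in column) / n  # biased
                unbiased = var * n / (n - 1) if n > 1 else var
                m = self.momentum
                self.running_mean[c] = (1 - m) * self.running_mean[c] + m * mean
                self.running_var[c] = (1 - m) * self.running_var[c] + m * unbiased
            else:
                # Evaluation: use the stored running estimates instead.
                mean, var = self.running_mean[c], self.running_var[c]
            inv_std = 1.0 / math.sqrt(var + self.eps)
            for i, v in enumerate(column):
                out[i][c] = self.gamma[c] * (v - mean) * inv_std + self.beta[c]
        return out


bn = BatchNormSketch(num_features=2)
train_out = bn([[1.0, 10.0], [3.0, 30.0]])   # normalised with batch statistics

bn.training = False
eval_out = bn([[1.0, 10.0]])                 # normalised with running statistics
```

In evaluation mode the output for a sample no longer depends on which other samples share its batch, which is why forgetting to switch modes tends to produce inconsistent inference results. With a very small batch the training-mode statistics become noisy, which is the small-batch instability mentioned above.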

Layer Normalisation

Core idea

How it works

Practical Notes

Prefer for small-batch or sequence-heavy workloads

Leverage consistent train/eval behavior

Use proven placement patterns

PyTorch examples

import torch.nn as nn

ln = nn.LayerNorm(normalized_shape=512)

block = nn.Sequential(
    nn.Linear(512, 512, bias=False),
    nn.LayerNorm(512),
    nn.ReLU(inplace=True)
)
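For comparison, layer normalisation computes its statistics across the features of each sample rather than across the batch. Here is a minimal plain-Python sketch for a single feature vector. The function name `layer_norm` and its arguments are illustrative, and `eps=1e-5` is assumed to match the usual PyTorch default.

```python
import math


def layer_norm(features, gamma=None, beta=None, eps=1e-5):
    """Normalise one sample across its own features (illustrative only)."""
    d = len(features)
    gamma = gamma if gamma is not None else [1.0] * d   # learnable scale
    beta = beta if beta is not None else [0.0] * d      # learnable shift
    mean = sum(features) / d
    var = sum((v - mean) ** 2 for v in features) / d    # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b
            for v, g, b in zip(features, gamma, beta)]


sample = [1.0, 2.0, 3.0, 4.0]
normed = layer_norm(sample)

# The result for a sample is the same whether it is processed alone or in a batch.
batch = [sample, [10.0, 0.0, -10.0, 5.0]]
normed_in_batch = [layer_norm(row) for row in batch][0]
```

Because no batch statistics or running averages are involved, the computation is identical in training and evaluation and works with a batch size of one.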

Layer Norm vs Batch Norm

Key differences

Rule of thumb
