Proceedings of the International Conference on Digital Manufacturing – Volume 2
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}    (6)
In both W-MSA and SW-MSA, the attention mechanism is
defined as in equation 7:
Q = z^l W_Q,  K = z^l W_K,  V = z^l W_V    (7)
Attention(z^l) = SoftMax(QK^T / √d + B) V
Here, W_Q, W_K, and W_V ∈ R^{D×d} are the learnable projection matrices for the query, key, and value, and B ∈ R^{L×L} is the relative positional bias that helps encode spatial structure.
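As a concrete illustration, the attention in Equation (7) can be sketched for a single window in NumPy. The function and variable names here are illustrative, not the model's actual parameters; a real Swin block would run this per window in PyTorch with learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(z, Wq, Wk, Wv, B):
    # z: (L, D) tokens of one window; Wq/Wk/Wv: (D, d) projections;
    # B: (L, L) relative positional bias, as in Equation (7).
    Q, K, V = z @ Wq, z @ Wk, z @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B  # scaled dot-product plus bias
    return softmax(scores, axis=-1) @ V  # (L, d)
```

In W-MSA the windows are a fixed partition of the feature map; SW-MSA applies the same computation after shifting the window grid, so tokens near window borders can attend across the previous partition.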
Figure 23: (a) Schematic of a Swin transformer block; (b) architecture of a standard transformer block
Hyperparameters
The Mask2Former-based semantic segmentation model was
implemented in PyTorch and trained on a Linux system
equipped with a 12th Gen Intel® Core™ i7-12700 CPU and an
NVIDIA GeForce RTX 4080 GPU (16 GB GDDR6X). A batch
size of five was used for training. The model was trained for 20
epochs using the AdamW optimiser with a learning rate of 1e-4.
To handle the class imbalance present in the cervical cell dataset,
a class-weighted cross-entropy loss function was employed. The
background class was assigned a weight of 0, while the least
represented class (Superficial-Intermediate) was assigned a lower
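The class-weighted loss described above can be sketched in NumPy. This is a minimal illustration, not the training code: the actual model would use PyTorch's `nn.CrossEntropyLoss(weight=...)`, and the weight values below are hypothetical.

```python
import numpy as np

def weighted_cross_entropy(logits, targets, class_weights):
    # logits: (N, C) raw scores; targets: (N,) integer labels;
    # class_weights: (C,) per-class weights. A weight of 0 (e.g. for the
    # background class) removes that class's pixels from the loss entirely.
    shifted = logits - logits.max(axis=1, keepdims=True)       # stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]         # per-sample NLL
    w = class_weights[targets]
    # Weighted mean, matching PyTorch's reduction (divide by sum of weights).
    return (w * nll).sum() / max(w.sum(), 1e-12)
```

With a background weight of 0, background pixels contribute nothing to the gradient, so the optimiser focuses capacity on the cell classes.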

