
Proceedings of the International Conference on Digital Manufacturing – Volume 2

z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}        (6)
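The residual structure in equation 6 can be sketched as a minimal PyTorch module. This is an illustrative sketch, not the paper's implementation: `nn.MultiheadAttention` stands in for the windowed (S)W-MSA, and the class and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class SwinBlockSketch(nn.Module):
    """Residual layout of one (shifted-)window transformer block.

    `attn` is a stand-in for W-MSA / SW-MSA; in the real Swin block the
    attention is computed inside local windows with a relative bias.
    """
    def __init__(self, dim: int, num_heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # ẑ^{l+1} = (S)W-MSA(LN(z^l)) + z^l
        h = self.norm1(z)
        z_hat = self.attn(h, h, h, need_weights=False)[0] + z
        # z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}   — equation 6
        return self.mlp(self.norm2(z_hat)) + z_hat
```

Note that LayerNorm is applied *before* each sub-layer (pre-norm), matching the LN placement in equation 6.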


In both W-MSA and SW-MSA, the attention mechanism is defined as in equation 7:

Q = z^l W_Q,   K = z^l W_K,   V = z^l W_V        (7)

Attention(z^l) = SoftMax(QK^T / √d + B) V

Here, W_Q, W_K, and W_V ∈ R^{D×d} are the learnable projection matrices for query, key, and value, and B ∈ R^{L×L} is the relative positional bias that helps encode spatial structure.
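Equation 7 for a single attention head within one window can be written directly in NumPy. This is a minimal sketch under stated assumptions: single head, one window of L tokens, and arbitrary illustrative matrices for the projections and bias.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(z, W_q, W_k, W_v, B):
    """Single-head attention of equation 7: SoftMax(QK^T / sqrt(d) + B) V.

    z          : (L, D) token embeddings inside one window
    W_q/W_k/W_v: (D, d) learnable projection matrices
    B          : (L, L) relative positional bias
    """
    Q, K, V = z @ W_q, z @ W_k, z @ W_v
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B   # (L, L) attention logits
    return softmax(scores) @ V          # (L, d) attended values
```

Because B is added to the logits before the softmax, the bias reshapes the attention distribution according to the relative positions of tokens in the window rather than being applied to the output.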

Figure 23: (a) Schematic of a Swin transformer block; (b) architecture of a standard transformer block

               Hyperparameters

               The Mask2Former-based semantic segmentation  model was
               implemented using PyTorch and trained on a Linux-based system
               equipped with a 12th Gen  Intel® Core™  i7-12700 CPU and
               NVIDIA GeForce RTX 4080 GPU (16 GB GDDR6X). A batch
               size of five was used for training. The model was trained for 20
epochs using the AdamW optimiser with a learning rate of 1e-4. To
handle the class imbalance present in the cervical cell dataset, a
               class-weighted cross-entropy  loss function  was employed. The
               background class was assigned a weight of 0, while the least
               represented class (Superficial-Intermediate) was assigned a lower
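The reported training configuration can be sketched as follows. This is an illustrative setup only: the stand-in model, the number of classes, and the non-background class weights are placeholder assumptions, with only the batch size of 5, the AdamW optimiser, the 1e-4 learning rate, and the zero background weight taken from the text.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 5                          # assumption: background + 4 cell classes
class_weights = torch.ones(NUM_CLASSES)  # placeholder per-class weights
class_weights[0] = 0.0                   # background weighted 0, as described

model = nn.Conv2d(3, NUM_CLASSES, kernel_size=1)  # stand-in for Mask2Former
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step on a dummy batch of size 5.
images = torch.randn(5, 3, 64, 64)
targets = torch.randint(0, NUM_CLASSES, (5, 64, 64))
loss = criterion(model(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Setting a class weight to 0 removes those pixels from the loss entirely, so the background dominates neither the gradient nor the class balance.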


