Bottleneck ViT
- class cv.backbones.BoT_ViT.model.BoT_ViT(mhsa_num_heads, attention_dropout=0.0, num_classes=1000, in_channels=3)[source]
Bases: Module
BoT-ViT model architecture from the paper Bottleneck Transformers for Visual Recognition.
This class implements the BoT-ViT model, combining convolutional and bottleneck transformer blocks for feature extraction and classification. The model processes input images through multiple residual groups and a final classification layer.
- Parameters:
mhsa_num_heads (int) – The number of heads for the multi-head self-attention layer.
attention_dropout (float, optional) – Dropout rate for attention layers (default: 0.0).
num_classes (int, optional) – Number of output classes for classification (default: 1000).
in_channels (int, optional) – Number of input channels for the images, typically 3 for RGB (default: 3).
Example
>>> model = BoT_ViT(mhsa_num_heads=8, attention_dropout=0.1, num_classes=1000)
- initializeConv()[source]
Initialize convolutional layers with specific weight and bias initialization.
This method initializes the weights and biases of convolutional layers using the following strategy:
- Weights: initialized from a normal distribution with mean 0.0 and a standard deviation computed from the kernel size and the number of output channels.
- Biases: initialized to 0.0.
The standard deviation for weight initialization is calculated using the formula:
\[\text{std} = \sqrt{\frac{2}{n_{\text{in}}}}\]
where
\[n_{\text{in}} = \text{kernel\_size}[0]^2 \times \text{out\_channels}\]
The weights are then drawn as
\[\text{weight} \sim \mathcal{N}(\text{mean} = 0.0, \text{std})\]
and biases are initialized with
\[\text{bias} = 0.0\]
This initialization helps stabilize the learning process and improve the convergence rate.
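The source for initializeConv() is not reproduced here, but the strategy described above can be sketched as a standalone helper. The function name initialize_conv and the iteration over submodules are assumptions for illustration, not the library's actual implementation:

```python
import math

import torch
import torch.nn as nn


def initialize_conv(module: nn.Module) -> None:
    """Sketch of the initialization strategy described above (hypothetical helper)."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            # n_in = kernel_size[0]^2 * out_channels, per the formula above
            n_in = m.kernel_size[0] ** 2 * m.out_channels
            std = math.sqrt(2.0 / n_in)
            # weight ~ N(mean=0.0, std)
            nn.init.normal_(m.weight, mean=0.0, std=std)
            # bias = 0.0
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```

Applying this to a module initializes every Conv2d it contains in place, so it can be called once after model construction.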
- forward(x)[source]
Defines the forward pass through the BoT-ViT model.
The input tensor passes through the initial convolutional layers, followed by multiple residual groups, and is then processed through the classifier for final classification.
- Parameters:
x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, height, width).
- Returns:
Output tensor after passing through the model, with shape (batch_size, num_classes).
- Return type:
torch.Tensor
Example
>>> output = model(torch.randn(1, 3, 224, 224)) # Example input tensor of shape (batch_size, channels, height, width)
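Since forward() returns raw logits of shape (batch_size, num_classes), a typical follow-up is to convert them to class probabilities and a predicted label. The sketch below uses a random tensor as a stand-in for the model output:

```python
import torch

# Stand-in for the model output: logits of shape (batch_size, num_classes)
logits = torch.randn(1, 1000)

# Softmax over the class dimension yields per-class probabilities
probs = torch.softmax(logits, dim=1)

# The predicted class is the index of the highest probability
pred = probs.argmax(dim=1)
```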