ConvNeXt

class cv.backbones.ConvNeXt.model.ConvNeXt(**kwargs)

Bases: Module

ConvNeXt model architecture, adapted from the paper "A ConvNet for the 2020s" (Liu et al., 2022).

This class implements the ConvNeXt architecture, which is a modernized version of the traditional convolutional network (ConvNet). The model consists of four main stages (or groups) of ConvNeXt blocks, with a stem at the beginning for initial feature extraction and a classifier at the end for final prediction.

Each stage consists of multiple ConvNeXt blocks, with optional downsampling at the start of some stages to progressively reduce the spatial dimensions and increase the number of feature channels. The model is configured through the ConvNeXtParams class, which stores hyperparameters such as the number of blocks per stage, the expansion rate, and the stochastic depth probability.
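To make the stage layout concrete, the following sketch traces the feature-map shape through the stem and the four stages. It assumes an unpadded stem convolution and, following the ConvNeXt paper, a kernel-2, stride-2 downsampling layer that doubles the channel count at the start of stages 2 to 4. The helper names conv_out_size and trace_shapes are hypothetical, and the downsampling details are assumptions rather than confirmed behaviour of this class.

```python
def conv_out_size(size, kernel_size, stride, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def trace_shapes(height, width, stem_out_channels=96, stem_kernel_size=4,
                 stem_kernel_stride=4, num_stages=4):
    """Return (channels, height, width) after each stage.

    Assumption (not confirmed by this documentation): stages 2..N each begin
    with a kernel-2, stride-2 downsampling layer that doubles the channels.
    """
    channels = stem_out_channels
    h = conv_out_size(height, stem_kernel_size, stem_kernel_stride)
    w = conv_out_size(width, stem_kernel_size, stem_kernel_stride)
    shapes = [(channels, h, w)]  # stage 1 keeps the stem's resolution
    for _ in range(num_stages - 1):
        channels *= 2
        h = conv_out_size(h, 2, 2)
        w = conv_out_size(w, 2, 2)
        shapes.append((channels, h, w))
    return shapes


for stage, shape in enumerate(trace_shapes(224, 224), start=1):
    print(f"stage {stage}: {shape}")
```

Under these assumptions a 224 x 224 input leaves the stem at 56 x 56 and reaches the classifier at 7 x 7, which is why the default stem uses a kernel size and stride of 4.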

The model also applies LayerScale, stochastic depth, and ConvLayerNorm to improve training stability and generalization.
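As a rough illustration of how LayerScale and stochastic depth combine in a residual block, the sketch below applies both to a single sample represented as a plain list of floats. It is not this library's code: the function residual_update is hypothetical, gamma is a single scalar here (implementations typically learn one value per channel), and the rescaling by 1 / (1 - drop_prob) during training is an assumption based on common stochastic depth implementations.

```python
import random


def residual_update(x, branch_out, gamma=1e-6, drop_prob=0.0,
                    training=False, rng=random):
    """Combine a block's input with its branch output for one sample.

    x and branch_out are equal-length lists of floats. Assumptions: the branch
    is scaled by gamma (LayerScale), and during training it is dropped with
    probability drop_prob and otherwise rescaled by 1 / (1 - drop_prob).
    """
    keep_prob = 1.0 - drop_prob
    if training and drop_prob > 0.0:
        if rng.random() >= keep_prob:
            return list(x)  # branch dropped: the block acts as the identity
        scale = gamma / keep_prob
    else:
        scale = gamma  # evaluation mode: branch is always kept, no rescaling
    return [xi + scale * bi for xi, bi in zip(x, branch_out)]
```

With a small initial gamma such as 1e-6, each block starts out close to the identity, which is the usual motivation for LayerScale in deep networks.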

Parameters:

num_classes (int) – The number of output classes for classification (default: 1000).
in_channels (int) – The number of input channels in the input image, typically 3 for RGB (default: 3).
stem_out_channels (int) – The number of output channels from the initial stem convolution (default: 96).
stem_kernel_size (int) – The kernel size for the stem convolution (default: 4).
stem_kernel_stride (int) – The stride for the stem convolution (default: 4).
num_blocks (list[int]) – The number of blocks in each ConvNeXt stage (default: [3, 3, 9, 3]).
expansion_rate (int) – The expansion rate for the number of channels in the block (default: 4).
depthwise_conv_kernel_size (int) – The kernel size for the depthwise convolution (default: 7).
layer_scale (float) – The initial value for LayerScale (default: 1e-6).
stochastic_depth_mp (float) – The maximum probability for stochastic depth dropout (default: 0.1).
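The interaction between num_blocks and stochastic_depth_mp can be sketched as follows. The example assumes the drop probability rises linearly from 0 at the first block to the maximum at the last block, counted across all stages, as in the reference ConvNeXt implementation; whether this class uses the same schedule is not stated here, and the function stochastic_depth_schedule is a hypothetical helper.

```python
def stochastic_depth_schedule(num_blocks, max_prob=0.1):
    """Per-block drop probabilities, grouped by stage.

    Assumption: probabilities rise linearly from 0 for the first block to
    max_prob for the last block, counted across all stages.
    """
    total = sum(num_blocks)
    step = max_prob / (total - 1) if total > 1 else 0.0
    schedule, index = [], 0
    for depth in num_blocks:
        schedule.append([step * (index + i) for i in range(depth)])
        index += depth
    return schedule


for stage, probs in enumerate(stochastic_depth_schedule([3, 3, 9, 3]), start=1):
    print(stage, [round(p, 3) for p in probs])
```

Under this schedule, early blocks are almost never dropped and only the deepest blocks approach the maximum probability.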

Example

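>>> from cv.backbones.ConvNeXt.model import ConvNeXt
>>> model = ConvNeXt(num_classes=1000)  # any parameter listed above can be passed as a keyword argument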

forward(x)

Defines the forward pass of the ConvNeXt model.

Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, height, width).
Returns: Output logits of shape (batch_size, num_classes).
Return type: torch.Tensor

Example

>>> import torch
>>> output = model(torch.randn(1, 3, 224, 224))  # input of shape (batch_size, in_channels, height, width)