Zubnet AIAprenderWiki › CNN
Models

CNN

Convolutional Neural Network, ConvNet
Uma arquitetura de rede neural projetada para processar dados em formato de grade (imagens, espectrogramas de áudio) deslizando pequenos filtros (kernels) através da entrada para detectar padrões locais como bordas, texturas e formas. CNNs dominaram a visão computacional de 2012 (AlexNet) até os Vision Transformers emergirem por volta de 2020. Ainda são amplamente usados em produção, especialmente em dispositivos edge.

Por que importa

CNNs iniciaram a revolução do deep learning. A vitória do AlexNet no ImageNet 2012 provou que redes neurais profundas podiam superar dramaticamente features feitas à mão, disparando o boom atual de IA. Entender CNNs te ajuda a entender por que Transformers funcionam (muitas das mesmas ideias — features hierárquicos, compartilhamento de parâmetros — se aplicam), e CNNs continuam sendo a melhor escolha para muitas tarefas de visão em dispositivos com recursos limitados.

Deep Dive

A CNN's core operation is convolution: a small filter (say 3×3 pixels) slides across the image, computing a dot product at each position to detect a specific pattern. Early layers learn simple patterns (edges, color gradients). Deeper layers combine these into increasingly complex features (eyes, wheels, faces). Pooling layers downsample between convolution layers, reducing spatial dimensions while preserving important features.

Why CNNs Work

Two key properties make CNNs efficient: translation equivariance (a cat is a cat regardless of where it appears in the image — the same filter detects it everywhere) and locality (nearby pixels are more related than distant ones). These properties drastically reduce the number of parameters compared to fully connected networks, making CNNs tractable for high-resolution images.

CNNs Beyond Images

CNNs aren't limited to images. 1D convolutions process sequences (audio waveforms, time series). WaveNet (for speech synthesis) and some text classification models use 1D CNNs. In audio, spectrograms are treated as 2D images and processed with standard 2D CNNs. Even in the Transformer era, some hybrid architectures use convolutional layers for local feature extraction before feeding into attention layers.

Conceitos relacionados

← Todos os termos
← Clustering Cohere →