Fundamentals

Convolution

Conv, Convolutional Layer, Kernel, Filter
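A mathematical operation that detects local patterns by sliding a small filter (kernel) across the input. In an image, a 3×3 kernel slides over every position and computes a dot product with the underlying pixels to produce a feature map. Different kernels detect different patterns: horizontal edges, vertical edges, textures, and, in deeper layers, eventually complex features such as eyes or wheels.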

Why It Matters

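Convolution is the operation that makes computer vision work. It encodes two powerful assumptions: locality (nearby pixels are related) and translation equivariance (a pattern is detected the same way wherever it appears). Compared with fully connected layers, these assumptions drastically reduce the number of parameters, which makes processing high-resolution images feasible. Even in the Transformer era, convolutions are still used in many hybrid architectures.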
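To make the parameter savings concrete, here is a rough back-of-the-envelope sketch. The input size (224×224 RGB) and the choice of 64 output units or filters are hypothetical numbers picked for illustration, and the two layers do not produce the same kind of output, so treat this as an order-of-magnitude comparison only.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened image."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases of a conv layer; does not depend on image size."""
    return kernel_h * kernel_w * in_c * out_c + out_c


# Hypothetical example: 224x224 RGB input, 64 output units vs. 64 3x3 filters.
dense = dense_params(224, 224, 3, 64)
conv = conv_params(3, 3, 3, 64)
print(dense)  # 9633856
print(conv)   # 1792
```

The convolutional layer reuses the same 3×3 weights at every position, which is why its parameter count stays fixed as the image grows.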

Deep Dive

Consider a convolution with a 3×3 kernel on a single-channel input: at each position, multiply the 9 kernel values with the 9 underlying input values and sum them. This produces one output value. Slide the kernel to the next position and repeat. A single kernel produces one feature map (detecting one pattern). Multiple kernels produce multiple feature maps. Stride (how far the kernel moves each step) and padding (how to handle edges) are additional parameters that, together with the kernel size, determine the output size.
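The sliding-window procedure can be sketched in plain Python. This is a minimal, unoptimized illustration for a single-channel input, assuming zero padding; the function name conv2d and the toy image are made up for this example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation. The output size along each dimension is floor((n + 2p - k) / s) + 1 for input size n, padding p, kernel size k, and stride s.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` (lists of lists) and return the feature map."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])

    # Zero-pad the borders so the kernel can be centred near the edges.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the window under it.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += kernel[a][b] * padded[i * stride + a][j * stride + b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 0, 1, 1] for _ in range(4)]
# A simple vertical-edge kernel: responds where brightness rises left to right.
vertical_edge = [[-1, 0, 1] for _ in range(3)]

print(conv2d(image, vertical_edge))  # [[0, 3, 3], [0, 3, 3]]
```

The flat dark region gives 0, while windows that straddle the edge give a strong response. Calling conv2d(image, vertical_edge, stride=2, padding=1) on the same input yields a 2×3 feature map, matching the output size formula.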

Depth and Hierarchy

In a CNN, early layers use small kernels to detect simple patterns. Each subsequent layer convolves over the previous layer's feature maps, detecting progressively more complex patterns. A simplified picture of the progression: Layer 1 detects edges; Layer 2 detects corners and textures (combinations of edges); Layer 3 detects object parts (combinations of textures); Layer 4 detects objects (combinations of parts). Real networks do not divide the work this cleanly, but this hierarchical feature learning is the fundamental mechanism behind CNNs' success in vision.
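One way to see why deeper layers can respond to larger, more complex patterns is to track how much of the original input each output value depends on, usually called the receptive field (a term not used above, introduced here only for illustration). The sketch below assumes a plain stack of convolutions described by hypothetical (kernel_size, stride) pairs, with no dilation or pooling.

```python
def receptive_field(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    size = 1   # one input pixel sees only itself
    jump = 1   # distance in input pixels between adjacent outputs
    sizes = []
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes


# Four stacked 3x3 convolutions with stride 1.
print(receptive_field([(3, 1)] * 4))  # [3, 5, 7, 9]
# The same stack with stride 2 grows much faster.
print(receptive_field([(3, 2)] * 4))  # [3, 7, 15, 31]
```

Each layer still uses a small kernel, yet values in the fourth layer summarize a region far larger than 3×3 in the input.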

1D and 3D Convolutions

Convolutions aren't limited to 2D images. 1D convolutions process sequences (audio waveforms, time series, text), sliding a kernel along one dimension. 3D convolutions process volumes (video, medical scans), sliding along three dimensions. The principle is identical: local pattern detection with parameter sharing. Convolutions also remain in use in modern architectures: Hyena uses long 1D convolutions as an efficient alternative to attention for sequence modeling, while ConvNeXt is a modernized 2D convolutional network for images.
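The 1D case is the same sliding dot product along a single axis. A minimal sketch without padding follows; the function name conv1d and the toy signal are illustrative only, and as is common in deep learning the kernel is not flipped.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` along `signal` and return the list of dot products."""
    k = len(kernel)
    return [
        sum(kernel[a] * signal[i + a] for a in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


# A step signal: flat at 0, then jumps to 5.
signal = [0, 0, 0, 5, 5, 5]
# A difference kernel responds only where the signal changes.
print(conv1d(signal, [-1, 1]))  # [0, 0, 5, 0, 0]
```

This is the 1D analogue of an edge detector: the single non-zero output marks the position of the jump.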
