Zubnet AI Learning Wiki › Tensor
Basics

Tensor

Multidimensional Array
A multidimensional array of numbers, and the fundamental data structure of deep learning. A scalar is a 0-dimensional tensor (a single number). A vector is a 1-dimensional tensor. A matrix is a 2-dimensional tensor. An image is a 3-dimensional tensor (height × width × channels). A batch of images is a 4-dimensional tensor. Model weights, activations, gradients: everything inside a neural network is a tensor.
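The dimensional ladder above can be sketched concretely. This uses NumPy for illustration; the equivalent constructors exist in PyTorch (`torch.zeros`, `torch.tensor`) with the same shape semantics:

```python
import numpy as np

scalar = np.array(3.14)               # 0-D tensor: a single number
vector = np.zeros(5)                  # 1-D tensor: 5 numbers
matrix = np.zeros((3, 4))             # 2-D tensor: 3 rows x 4 columns
image  = np.zeros((224, 224, 3))      # 3-D tensor: height x width x channels
batch  = np.zeros((32, 224, 224, 3))  # 4-D tensor: a batch of 32 images

print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim, batch.ndim)
# → 0 1 2 3 4
```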

Why It Matters

Tensors are the language of deep learning. PyTorch, TensorFlow, and JAX are, at their core, tensor-operation libraries. Understanding tensor shapes and operations is essential for reading model code, debugging shape mismatches (the most common error in ML code), and understanding what is happening inside a neural network. If you can follow the tensor shapes, you can follow the architecture.

Deep Dive

Common tensor shapes in NLP: input tokens are (batch_size, sequence_length) integers. Embeddings are (batch_size, seq_len, model_dim) floats. Attention weights are (batch_size, num_heads, seq_len, seq_len). The output logits are (batch_size, seq_len, vocab_size). Understanding these shapes tells you exactly what's happening: the attention tensor has a seq_len × seq_len slice per head because each token attends to every other token.
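These shapes can be written out with small illustrative dimensions (the specific values here are arbitrary, chosen only to make the shapes visible; NumPy stands in for any tensor library):

```python
import numpy as np

# Toy dimensions, chosen only for illustration
batch_size, seq_len, model_dim, num_heads, vocab_size = 2, 8, 16, 4, 100

tokens     = np.zeros((batch_size, seq_len), dtype=np.int64)        # input token ids
embeddings = np.zeros((batch_size, seq_len, model_dim))             # per-token vectors
attn       = np.zeros((batch_size, num_heads, seq_len, seq_len))    # token-to-token weights
logits     = np.zeros((batch_size, seq_len, vocab_size))            # next-token scores

# The attention tensor holds one seq_len x seq_len grid per head:
# attn[b, h, i, j] is how much token i attends to token j.
```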

Operations

Key tensor operations: matmul (matrix multiplication — the core computation in neural networks), reshape (changing dimensions without changing data), transpose (swapping dimensions), concat (joining tensors along a dimension), slice (extracting subtensors), and broadcast (making differently-shaped tensors compatible for element-wise operations). Deep learning is really just a sequence of these operations applied to tensors.
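Each of the operations listed above can be shown in a few lines. NumPy is used here for a runnable sketch; PyTorch exposes the same operations under very similar names (`torch.matmul`, `.reshape`, `.T`, `torch.cat`):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)           # reshape: same 6 values, new shape (2, 3)
b = np.arange(12).reshape(3, 4)

c = a @ b                                # matmul: (2, 3) @ (3, 4) -> (2, 4)
t = c.T                                  # transpose: swap dims, (2, 4) -> (4, 2)
joined = np.concatenate([a, a], axis=0)  # concat along dim 0: (2, 3) -> (4, 3)
sliced = joined[:2]                      # slice: extract the first 2 rows, (2, 3)

row = np.array([1.0, 2.0, 3.0])
broadcasted = a + row                    # broadcast: (3,) stretched across (2, 3)
```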

GPU Acceleration

Tensors are computed on GPUs because tensor operations are massively parallel: multiplying two matrices involves millions of independent multiply-add operations that can run simultaneously. This is why GPU VRAM matters — all tensors involved in computation must reside in GPU memory. When you run out of VRAM, it's because the sum of all tensor sizes (model weights + activations + gradients + optimizer states) exceeds capacity. Techniques like gradient checkpointing, mixed precision, and model sharding are all about managing tensor memory.
