Zubnet AI Learning Wiki › Tensor
Basics

Tensor

Multidimensional Array
A multidimensional array of numbers: the fundamental data structure in deep learning. A scalar is a 0-dimensional tensor (a single number). A vector is a 1-dimensional tensor. A matrix is a 2-dimensional tensor. An image is a 3-dimensional tensor (height × width × channels). A batch of images is a 4-dimensional tensor. Model weights, activations, gradients: everything inside a neural network is a tensor.
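The dimension counts above can be checked directly. This is a minimal sketch using NumPy as a stand-in for any tensor library; the image and batch sizes are illustrative:

```python
import numpy as np

scalar = np.array(3.14)              # 0-D tensor: a single number
vector = np.array([1.0, 2.0, 3.0])   # 1-D tensor
matrix = np.zeros((3, 4))            # 2-D tensor
image = np.zeros((224, 224, 3))      # 3-D tensor: height × width × channels
batch = np.zeros((32, 224, 224, 3))  # 4-D tensor: a batch of 32 images

for t in (scalar, vector, matrix, image, batch):
    print(t.ndim, t.shape)
```

The same `.ndim` / `.shape` attributes exist under the same names in PyTorch and JAX, which is why shape-checking habits transfer across libraries.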

Why It Matters

Tensors are the language of deep learning. PyTorch, TensorFlow, and JAX are all, at their core, tensor computation libraries. Understanding tensor shapes and operations is essential for reading model code, debugging shape mismatches (the most common bug in ML code), and understanding what happens inside a neural network. If you can follow the tensor shapes, you can follow the architecture.

Deep Dive

Common tensor shapes in NLP: input tokens are (batch_size, sequence_length) integers. Embeddings are (batch_size, seq_len, model_dim) floats. Attention weights are (batch_size, num_heads, seq_len, seq_len). The output logits are (batch_size, seq_len, vocab_size). Understanding these shapes tells you exactly what's happening: the attention tensor's last two dimensions are seq_len × seq_len because each token attends to every other token.
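A toy walk-through of those shapes in NumPy. The sizes are made up for illustration, and for brevity the sketch builds only the query projection and reuses the embedding table as the output projection; a real transformer has separate learned matrices for each:

```python
import numpy as np

batch_size, seq_len, model_dim, num_heads, vocab_size = 2, 8, 16, 4, 100

# Input tokens: (batch_size, seq_len) integers
tokens = np.random.randint(0, vocab_size, size=(batch_size, seq_len))

# Embedding lookup: (batch_size, seq_len, model_dim) floats
embedding_table = np.random.randn(vocab_size, model_dim)
embeddings = embedding_table[tokens]

# Split model_dim across heads, then move heads before seq_len:
# (batch, seq, heads, head_dim) -> (batch, heads, seq, head_dim)
head_dim = model_dim // num_heads
q = embeddings.reshape(batch_size, seq_len, num_heads, head_dim).transpose(0, 2, 1, 3)

# Attention scores: (batch, heads, seq_len, seq_len) - each token vs. every token
attn = q @ q.transpose(0, 1, 3, 2)

# Output logits: (batch, seq_len, vocab_size)
logits = embeddings @ embedding_table.T

print(embeddings.shape, attn.shape, logits.shape)
```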

Operations

Key tensor operations: matmul (matrix multiplication — the core computation in neural networks), reshape (changing dimensions without changing data), transpose (swapping dimensions), concat (joining tensors along a dimension), slice (extracting subtensors), and broadcast (making differently-shaped tensors compatible for element-wise operations). Deep learning is really just a sequence of these operations applied to tensors.
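Each of the six operations can be shown in a few lines. A minimal NumPy sketch with illustrative shapes:

```python
import numpy as np

a = np.arange(6).reshape(2, 3).astype(float)  # reshape: 6 values -> (2, 3)
b = np.ones((3, 2))

c = a @ b                           # matmul: (2,3) @ (3,2) -> (2,2)
t = a.T                             # transpose: (2,3) -> (3,2)
j = np.concatenate([a, a], axis=0)  # concat along dim 0: -> (4,3)
s = a[:, 1:]                        # slice: drop first column -> (2,2)

row = np.array([10.0, 20.0, 30.0])
bcast = a + row                     # broadcast: (2,3) + (3,) -> (2,3)
```

Note that reshape, transpose, and slice change how the same underlying data is viewed, while matmul, concat, and broadcast-plus-add produce new values.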

GPU Acceleration

Tensors are computed on GPUs because tensor operations are massively parallel: multiplying two matrices involves millions of independent multiply-add operations that can run simultaneously. This is why GPU VRAM matters — all tensors involved in computation must reside in GPU memory. When you run out of VRAM, it's because the sum of all tensor sizes (model weights + activations + gradients + optimizer states) exceeds capacity. Techniques like gradient checkpointing, mixed precision, and model sharding are all about managing tensor memory.
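The memory sum can be estimated with simple arithmetic. A back-of-envelope sketch for a hypothetical 7B-parameter model trained in fp16 with Adam; real frameworks differ in exactly what they keep in fp32, so treat the numbers as rough:

```python
# Rough VRAM estimate: weights + gradients + optimizer states.
params = 7e9
bytes_fp16, bytes_fp32 = 2, 4

weights = params * bytes_fp16        # 14 GB
gradients = params * bytes_fp16      # 14 GB
optimizer = params * bytes_fp32 * 2  # Adam's m and v in fp32: 56 GB
# (many setups also keep a master fp32 weight copy, adding ~28 GB more)

total_gb = (weights + gradients + optimizer) / 1e9
print(f"{total_gb:.0f} GB before activations")  # prints "84 GB before activations"
```

Activations come on top of this and scale with batch size and sequence length, which is exactly what gradient checkpointing trades compute to reduce.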
