Fundamentals

Tensor

Multidimensional Array
A multidimensional array of numbers: the fundamental data structure in deep learning. A scalar is a 0D tensor (a single number). A vector is a 1D tensor. A matrix is a 2D tensor. An image is a 3D tensor (height × width × channels). A batch of images is a 4D tensor. Model weights, activations, gradients: everything in a neural network is a tensor.
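
As a minimal illustration in PyTorch (one of the libraries named below; the sizes here are arbitrary), one tensor of each rank:

    import torch

    scalar = torch.tensor(3.14)             # 0D: a single number, shape ()
    vector = torch.tensor([1.0, 2.0, 3.0])  # 1D: shape (3,)
    matrix = torch.zeros(2, 3)              # 2D: shape (2, 3)
    image  = torch.zeros(224, 224, 3)       # 3D: height × width × channels
    batch  = torch.zeros(32, 224, 224, 3)   # 4D: a batch of 32 images

    for t in (scalar, vector, matrix, image, batch):
        print(t.ndim, tuple(t.shape))
    # 0 ()   1 (3,)   2 (2, 3)   3 (224, 224, 3)   4 (32, 224, 224, 3)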

Why It Matters

Tensors are the language of deep learning. PyTorch, TensorFlow, and JAX are fundamentally tensor-computation libraries. Understanding tensor shapes and operations is essential for reading model code, debugging shape mismatches (the most common error in ML code), and understanding what happens inside neural networks. If you can follow the tensor shapes, you can follow the architecture.
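
As a concrete (hypothetical) example of such a mismatch, the matmul below fails because the inner dimensions disagree; the shapes in the error message point straight at the fix:

    import torch

    x = torch.zeros(32, 128)    # activations: (batch, features_in)
    w = torch.zeros(64, 10)     # weight with the wrong input size

    # x @ w raises an error like:
    #   mat1 and mat2 shapes cannot be multiplied (32x128 and 64x10)
    # The shapes say the weight's first dim must match x's last dim:
    w = torch.zeros(128, 10)
    y = x @ w                   # (32, 128) @ (128, 10) -> (32, 10)
    print(tuple(y.shape))       # (32, 10)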

Deep Dive

Common tensor shapes in NLP: input tokens are (batch_size, sequence_length) integers. Embeddings are (batch_size, seq_len, model_dim) floats. Attention weights are (batch_size, num_heads, seq_len, seq_len). The output logits are (batch_size, seq_len, vocab_size). Understanding these shapes tells you exactly what's happening: the attention tensor is seq_len × seq_len per head because each token attends to every token in the sequence.
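
A sketch of those shapes using random tensors (hypothetical sizes: batch of 2, sequence of 16, 8 heads, model dim 512, 50k vocab), standing in for a real model's outputs:

    import torch

    batch_size, seq_len, num_heads, model_dim, vocab_size = 2, 16, 8, 512, 50_000

    tokens = torch.randint(0, vocab_size, (batch_size, seq_len))  # integer token ids
    embeds = torch.randn(batch_size, seq_len, model_dim)          # float embeddings
    attn   = torch.softmax(torch.randn(batch_size, num_heads, seq_len, seq_len), dim=-1)
    logits = torch.randn(batch_size, seq_len, vocab_size)         # one score per vocab entry

    print(tuple(tokens.shape))  # (2, 16)
    print(tuple(embeds.shape))  # (2, 16, 512)
    print(tuple(attn.shape))    # (2, 8, 16, 16): seq_len × seq_len per head
    print(tuple(logits.shape))  # (2, 16, 50000)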

Operations

Key tensor operations: matmul (matrix multiplication — the core computation in neural networks), reshape (changing dimensions without changing data), transpose (swapping dimensions), concat (joining tensors along a dimension), slice (extracting subtensors), and broadcast (making differently-shaped tensors compatible for element-wise operations). Deep learning is really just a sequence of these operations applied to tensors.
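
A minimal sketch of each operation in PyTorch, with arbitrary shapes chosen only to show how dimensions change:

    import torch

    a = torch.randn(2, 3)
    b = torch.randn(3, 4)

    c   = a @ b                      # matmul: (2, 3) @ (3, 4) -> (2, 4)
    r   = c.reshape(4, 2)            # reshape: same 8 values, new dims
    t   = c.transpose(0, 1)          # transpose: swap dims -> (4, 2)
    j   = torch.cat([a, a], dim=0)   # concat along dim 0 -> (4, 3)
    s   = c[:, :2]                   # slice: first two columns -> (2, 2)
    out = c + torch.randn(4)         # broadcast: (4,) stretches across (2, 4)

    for name, x in [("matmul", c), ("reshape", r), ("transpose", t),
                    ("concat", j), ("slice", s), ("broadcast", out)]:
        print(name, tuple(x.shape))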

GPU Acceleration

Tensors are computed on GPUs because tensor operations are massively parallel: multiplying two matrices involves millions of independent multiply-add operations that can run simultaneously. This is why GPU VRAM matters — all tensors involved in computation must reside in GPU memory. When you run out of VRAM, it's because the sum of all tensor sizes (model weights + activations + gradients + optimizer states) exceeds capacity. Techniques like gradient checkpointing, mixed precision, and model sharding are all about managing tensor memory.
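
A back-of-the-envelope sketch of that accounting (hypothetical numbers: a 7B-parameter model trained in fp32 with Adam, which keeps two extra states per parameter; activations are deliberately left out because they depend on batch size and architecture):

    # Rough training-memory estimate, activations excluded.
    params = 7e9                 # assumed parameter count
    bytes_each = 4               # fp32

    weights   = params * bytes_each      # model weights
    grads     = params * bytes_each      # one gradient per weight
    optimizer = params * bytes_each * 2  # Adam: momentum + variance states

    total_gb = (weights + grads + optimizer) / 1e9
    print(f"~{total_gb:.0f} GB before activations")  # ~112 GB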
