Fundamentals

Tensor

Multidimensional Array
A multidimensional array of numbers: the fundamental data structure in deep learning. A scalar is a 0D tensor (a single number). A vector is a 1D tensor. A matrix is a 2D tensor. An image is a 3D tensor (height × width × channels). A batch of images is a 4D tensor. Model weights, activations, gradients: everything in a neural network is a tensor.
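The dimensionality ladder above can be sketched directly; a minimal example using NumPy (the shapes are illustrative, e.g. a 224×224 RGB image and a batch of 32):

```python
import numpy as np

scalar = np.array(3.14)              # 0D tensor: a single number
vector = np.array([1.0, 2.0, 3.0])   # 1D tensor
matrix = np.zeros((3, 4))            # 2D tensor
image = np.zeros((224, 224, 3))      # 3D tensor: height × width × channels
batch = np.zeros((32, 224, 224, 3))  # 4D tensor: a batch of 32 images

# .ndim counts the dimensions, which is exactly the "0D/1D/2D/..." label
print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim, batch.ndim)  # 0 1 2 3 4
```

The same code works nearly unchanged in PyTorch (`torch.tensor`, `torch.zeros`), since the shape semantics are shared across tensor libraries.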

Why it matters

Tensors are the language of deep learning. PyTorch, TensorFlow, and JAX are fundamentally tensor computation libraries. Understanding tensor shapes and operations is essential for reading model code, debugging shape mismatches (the most common error in ML code), and understanding what happens inside a neural network. If you can follow the tensor shapes, you can follow the architecture.

Deep Dive

Common tensor shapes in NLP: input tokens are (batch_size, sequence_length) integers. Embeddings are (batch_size, seq_len, model_dim) floats. Attention weights are (batch_size, num_heads, seq_len, seq_len). The output logits are (batch_size, seq_len, vocab_size). Understanding these shapes tells you exactly what's happening: the attention tensor is N×N because each token attends to every other token.
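These NLP shapes can be traced with dummy arrays. A sketch with made-up sizes (batch of 2, sequence length 8, model dimension 16, 4 attention heads, vocabulary of 100 — all hypothetical, chosen small for readability):

```python
import numpy as np

batch_size, seq_len, model_dim = 2, 8, 16
num_heads, vocab_size = 4, 100

tokens = np.random.randint(0, vocab_size, size=(batch_size, seq_len))  # integer token IDs
embeddings = np.random.randn(batch_size, seq_len, model_dim)           # float vectors per token
attn = np.random.randn(batch_size, num_heads, seq_len, seq_len)        # one N×N map per head
logits = np.random.randn(batch_size, seq_len, vocab_size)              # scores over the vocab

print(tokens.shape, embeddings.shape, attn.shape, logits.shape)
```

Note how the attention tensor's last two dimensions are both `seq_len`: row i holds how much token i attends to every token j.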

Operations

Key tensor operations: matmul (matrix multiplication — the core computation in neural networks), reshape (changing dimensions without changing data), transpose (swapping dimensions), concat (joining tensors along a dimension), slice (extracting subtensors), and broadcast (making differently-shaped tensors compatible for element-wise operations). Deep learning is really just a sequence of these operations applied to tensors.
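Each of the operations listed above has a direct NumPy equivalent; a minimal sketch showing how the shapes transform (the arrays themselves are arbitrary):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)        # reshape: (6,) -> (2, 3), same data, new view
b = np.arange(12).reshape(3, 4)
c = a @ b                             # matmul: (2, 3) @ (3, 4) -> (2, 4)
t = c.T                               # transpose: swap dims, (2, 4) -> (4, 2)
cat = np.concatenate([a, a], axis=0)  # concat along rows: (2, 3) + (2, 3) -> (4, 3)
sl = b[:, 1:3]                        # slice: extract columns 1-2, -> (3, 2)
row = np.array([10, 20, 30])
bc = a + row                          # broadcast: (2, 3) + (3,) -> (2, 3), row added to each row

print(c.shape, t.shape, cat.shape, sl.shape, bc.shape)
```

The inner dimensions must agree for matmul ((2, **3**) @ (**3**, 4)), which is why most shape-mismatch errors point at a matrix multiplication.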

GPU Acceleration

Tensors are computed on GPUs because tensor operations are massively parallel: multiplying two matrices involves millions of independent multiply-add operations that can run simultaneously. This is why GPU VRAM matters — all tensors involved in computation must reside in GPU memory. When you run out of VRAM, it's because the sum of all tensor sizes (model weights + activations + gradients + optimizer states) exceeds capacity. Techniques like gradient checkpointing, mixed precision, and model sharding are all about managing tensor memory.
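The memory arithmetic above can be made concrete. A rough back-of-the-envelope sketch, assuming fp32 (4 bytes per value) and an Adam-style optimizer with two moment tensors per weight; activation memory is deliberately excluded since it depends on batch size and sequence length:

```python
def training_memory_gb(num_params, bytes_per_param=4):
    """Rough lower bound on training memory: weights + gradients + optimizer states."""
    weights = num_params * bytes_per_param
    gradients = num_params * bytes_per_param      # one gradient per weight
    optimizer = num_params * bytes_per_param * 2  # Adam: two moment tensors per weight
    return (weights + gradients + optimizer) / 1e9

# A hypothetical 7-billion-parameter model in fp32 needs ~112 GB
# before activations, which is why mixed precision and sharding matter.
print(training_memory_gb(7e9))  # 112.0
```

Mixed precision halves `bytes_per_param` for weights and gradients, and sharding splits each of these tensors across devices; both attack the same sum.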
