The original neural style transfer (Gatys et al., 2015) works by optimizing an image to simultaneously match the content features of one image and the style features (texture, color patterns) of another. Content is captured by deep layer activations (which represent objects and structure). Style is captured by Gram matrices of early/mid layer activations (which represent textures and patterns independent of spatial arrangement).
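The Gram-matrix idea can be shown in a few lines. Below is a minimal NumPy sketch (the function name `gram_matrix` and the feature shapes are our own, not from the original paper): because the Gram matrix sums channel products over all spatial positions, spatially rearranging a feature map leaves it unchanged, which is exactly why it captures texture but not layout.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a CNN feature map (sketch).

    features: array of shape (C, H, W), the activations of one layer.
    G[i, j] is the inner product between channels i and j, summed
    over all spatial positions, so spatial arrangement is discarded.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # flatten spatial dimensions
    return f @ f.T / (h * w)         # (C, C), normalized by area

# A feature map and a spatially shifted copy have identical Gram
# matrices: the shift only permutes positions, and the sum over
# positions is permutation-invariant.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
shifted = np.roll(feat, shift=3, axis=2)  # translate horizontally
print(np.allclose(gram_matrix(feat), gram_matrix(shifted)))  # True
```

In the full method, the style loss compares Gram matrices of the generated image and the style image across several layers, while the content loss compares raw deep-layer activations directly.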
The original method is slow (minutes per image, since it optimizes the pixels iteratively). Fast style transfer instead trains a feedforward network to apply a specific style in a single forward pass (milliseconds). The trade-off: each network handles only one style. AdaIN (Adaptive Instance Normalization) removed this limitation by replacing the channel-wise mean and variance of the content features with those of any reference style's features, enabling arbitrary style transfer in real time.
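The AdaIN operation itself is a one-liner over per-channel statistics. Here is a minimal NumPy sketch (the function name `adain` and the feature shapes are our own; the real method applies this to VGG features inside a trained encoder-decoder):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization (sketch).

    content, style: feature maps of shape (C, H, W).
    Each content channel is normalized to zero mean / unit std,
    then rescaled and shifted to the corresponding style channel's
    statistics -- the core of arbitrary style transfer.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

rng = np.random.default_rng(1)
content = rng.standard_normal((4, 8, 8)) * 3.0 + 2.0
style = rng.standard_normal((4, 8, 8)) * 0.5 - 1.0
out = adain(content, style)
# After AdaIN, the output's per-channel statistics match the style's.
print(np.allclose(out.mean(axis=(1, 2)), style.mean(axis=(1, 2)), atol=1e-4))  # True
```

Because no per-style weights are learned, any style image's statistics can be plugged in at inference time, which is what makes the transfer "arbitrary".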
Today, style transfer is largely subsumed by image generation models. ControlNet with style references, IP-Adapter for style conditioning, and direct prompting ("in the style of watercolor painting") achieve more flexible and higher-quality style transfer than dedicated style transfer networks. But the core insight — that neural networks separate content from style at different layers — remains foundational to understanding visual representations.