The Initial Spark
The frequency domain has always been fascinating - there's something elegant about how the DCT (Discrete Cosine Transform) can compress images so effectively, concentrating most of the energy in just a few coefficients. That efficiency, familiar from lossy compression, DSP, and even market prediction, raises interesting questions.
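Here's a minimal sketch of that energy compaction, using SciPy's `dctn` on a small synthetic image (so the exact numbers are only illustrative - real photos show the same effect):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy "image": a smooth ramp plus a gentle ripple - content where DCT energy
# compaction is strongest (natural photos behave similarly, less extremely).
x = np.linspace(0, 1, 64)
img = np.outer(x, x) + 0.1 * np.sin(4 * np.pi * np.outer(x, np.ones_like(x)))

coeffs = dctn(img, norm="ortho")

# Keep only the 5% largest-magnitude coefficients and zero out the rest.
threshold = np.quantile(np.abs(coeffs), 0.95)
sparse_coeffs = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

recon = idctn(sparse_coeffs, norm="ortho")
rel_error = np.linalg.norm(img - recon) / np.linalg.norm(img)
print(f"relative error keeping 5% of DCT coefficients: {rel_error:.4f}")
```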
The observation: modern image generation models can produce near real-time global illumination. What's baked into those weights? Maybe something in the frequency domain?
The wild idea: what if neural networks could operate fundamentally in the frequency domain?
🎮 Interactive Exploration
We built three interactive demos to explore the key challenges and insights from this research.
The Core Questions
- Could DCT/FFT operations replace spatial convolutions?
- Would sparsity in frequency domain (throwing away coefficients) give real speedups?
- Could we find activation functions that work natively in frequency space?
- What about wavelets, which are localized in both space AND frequency?
The Technical Reality
When we analyzed these ideas more deeply, several fundamental challenges emerged:
The Activation Function Problem
ReLU and other nonlinearities are cheap pointwise operations in the spatial domain. In the frequency domain, they become expensive global operations that mix all frequencies, so you'd need to transform back and forth every layer, destroying any advantage.
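A small NumPy sketch of the problem: the linear part of a layer really is a cheap pointwise product in frequency space, but applying ReLU forces a round-trip back to pixels, and naively "ReLU-ing" the coefficients computes something else entirely:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # a 1-D feature map
w = rng.standard_normal(256)   # a learned filter (applied as circular convolution)

# The linear part is cheap in frequency space: convolution -> pointwise product.
X, W = np.fft.fft(x), np.fft.fft(w)
linear_out = X * W

# ReLU is defined pointwise in *space*, so we have to transform back...
activated = np.maximum(np.fft.ifft(linear_out).real, 0.0)
next_input = np.fft.fft(activated)   # ...and forward again for the next layer.

# Naively applying ReLU to the coefficients themselves is a different operation.
naive = np.maximum(linear_out.real, 0.0) + 1j * np.maximum(linear_out.imag, 0.0)
print(np.allclose(next_input, naive))   # False
```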
Hardware Reality
GPUs typically need 90-95% sparsity before sparse kernels beat dense ones, and unstructured sparsity has terrible memory access patterns. The theoretical speedups from frequency-domain sparsity often don't materialize on real hardware.
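Here's a rough, CPU-only illustration using SciPy's sparse matrices (GPU sparse kernels behave differently, so treat the numbers as purely indicative): the unstructured-sparse path only starts to pay off at very high sparsity, if at all:

```python
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n = 2048
dense = rng.standard_normal((n, n)).astype(np.float32)
rhs = rng.standard_normal((n, n)).astype(np.float32)

for sparsity in (0.50, 0.90, 0.99):
    # Unstructured sparsity: zero out a random fraction of the weights.
    mask = rng.random((n, n)) >= sparsity
    sparse = sp.csr_matrix(dense * mask)

    t0 = time.perf_counter(); _ = dense @ rhs; t_dense = time.perf_counter() - t0
    t0 = time.perf_counter(); _ = sparse @ rhs; t_sparse = time.perf_counter() - t0
    print(f"{sparsity:.0%} zeros: dense {t_dense*1e3:7.1f} ms | sparse {t_sparse*1e3:7.1f} ms")
```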
The Deeper Issue
Fourier transforms diagonalize linear, shift-invariant operators. Neural networks are powered by nonlinearities. Once you add nonlinearities, the frequency domain advantages evaporate.
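Two quick numerical checks of that claim: circular convolution really is just a pointwise product in the Fourier basis, while a pointwise nonlinearity smears a single input frequency across many output bins:

```python
import numpy as np

rng = np.random.default_rng(1)
x, k = rng.standard_normal(128), rng.standard_normal(128)

# Convolution theorem: a linear, shift-invariant operator (circular convolution)
# is diagonal in the Fourier basis - it turns into a pointwise product.
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))
direct = np.array([sum(x[m] * k[(n - m) % 128] for m in range(128))
                   for n in range(128)])
print(np.allclose(via_fft, direct))   # True

# A pointwise nonlinearity has no such diagonal form: push a single frequency
# through ReLU and the energy spreads into DC and a whole series of harmonics.
tone = np.cos(2 * np.pi * 5 * np.arange(128) / 128)   # energy only in bin 5
spectrum = np.abs(np.fft.rfft(np.maximum(tone, 0.0)))
print(np.flatnonzero(spectrum > 1e-6))   # many occupied bins, not just bin 5
```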
The Bigger Picture
Despite these technical challenges, some observations suggest there's still something worth exploring:
- Diffusion models achieve remarkable global coherence (lighting, shadows) efficiently
- The brain uses frequency decomposition (cochlear processing, spatial frequencies in V1)
- Compression wouldn't work if the frequency domain weren't fundamentally useful
What Actually Works
From the technical exploration, some real insights emerged:
Operations that suit frequency domain:
- Gaussian blur, smoothing (low-pass filtering) - see the sketch after this list
- Compression, denoising
- Global pattern analysis
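For example, a Gaussian blur is just a pointwise multiplication of the spectrum by a Gaussian transfer function - a minimal NumPy sketch:

```python
import numpy as np

def gaussian_blur_fft(img, sigma):
    """Blur by multiplying the spectrum with a Gaussian transfer function."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies
    # The Fourier transform of a Gaussian is a Gaussian: a smooth low-pass filter.
    transfer = np.exp(-2.0 * np.pi**2 * sigma**2 * (fy**2 + fx**2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

rng = np.random.default_rng(0)
noisy = rng.standard_normal((128, 128))
blurred = gaussian_blur_fft(noisy, sigma=3.0)
print(noisy.std(), blurred.std())   # high-frequency noise is strongly attenuated
```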
Operations that need spatial domain:
- Edge detection (spatial pattern matching, not just high-pass filtering)
- Local feature detection
- Most modern CNN operations
Existing successes:
- JPEG-domain processing for the first CNN layers (skipping decompression)
- Fourier Neural Operators for PDEs (but using spatial nonlinearities!)
- Spectral pooling as natural downsampling
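Spectral pooling fits in a few lines of NumPy - this sketch simply crops the centered spectrum (real implementations handle real-valued FFTs and learnable variants):

```python
import numpy as np

def spectral_pool(x, out_h, out_w):
    """Downsample by keeping only the central (low-frequency) part of the spectrum."""
    h, w = x.shape
    X = np.fft.fftshift(np.fft.fft2(x))               # move DC to the centre
    top, left = (h - out_h) // 2, (w - out_w) // 2
    cropped = X[top:top + out_h, left:left + out_w]    # discard high frequencies
    # Rescale so the mean intensity survives the change in array size.
    return np.real(np.fft.ifft2(np.fft.ifftshift(cropped))) * (out_h * out_w) / (h * w)

ramp = np.add.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
pooled = spectral_pool(ramp, 32, 32)
print(ramp.mean(), pooled.mean())   # means agree: an anti-aliased 2x downsample
```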
The Deeper Question
Instead of "How do we port CNNs to frequency domain?" perhaps we should ask:
"What completely different architecture would naturally emerge if we started from frequency-domain principles?"
Not forcing existing architectures into a new domain, but imagining what intelligence would look like if it evolved in frequency space.
The Diffusion Model Mystery
One thread remains tantalizingly unexplored: modern diffusion models achieve near-instant global illumination and coherent lighting. They've learned something about global frequency relationships. What is it? How do they do it so efficiently?
One of the interactive demos explores how phase information controls structural coherence - a key insight for understanding how diffusion models achieve global consistency.
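The underlying effect is easy to reproduce with a toy experiment: swap the magnitude and phase spectra of two signals and see which one the result resembles. In this synthetic sketch (a Gaussian blob versus pure noise), the blob's location survives whenever its *phase* is kept:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
yy, xx = np.mgrid[0:n, 0:n]

# "Structured" image: a single bright blob at a known location (row 90, col 30).
structured = np.exp(-((yy - 90) ** 2 + (xx - 30) ** 2) / (2 * 2.0 ** 2))
noise = rng.standard_normal((n, n))

Fs, Fn = np.fft.fft2(structured), np.fft.fft2(noise)

# Keep the *phase* of the structure but steal the magnitude from the noise...
phase_kept = np.real(np.fft.ifft2(np.abs(Fn) * np.exp(1j * np.angle(Fs))))
# ...versus keeping the *magnitude* of the structure with randomised phase.
magnitude_kept = np.real(np.fft.ifft2(np.abs(Fs) * np.exp(1j * np.angle(Fn))))

# The blob's location survives in the phase-kept image; the magnitude-kept one
# puts its peak somewhere essentially unrelated.
print("phase kept     -> peak at", np.unravel_index(np.argmax(phase_kept), (n, n)))
print("magnitude kept -> peak at", np.unravel_index(np.argmax(magnitude_kept), (n, n)))
```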
Perhaps the answer isn't in forcing neural networks into the frequency domain, but in understanding what frequency-like representations emerge naturally in spatial networks when they're trained on the right objectives.
Key Takeaways
- Match computation to operation type - some operations are naturally frequency-friendly, others aren't
- Hardware reality shapes algorithm design - modern GPUs are so optimized for dense operations that theoretical advantages often don't translate
- The activation function problem is fundamental - no clean solution for frequency-native nonlinearities that preserve both efficiency and expressivity
- But something interesting is happening - especially in diffusion models' ability to handle global coherence
Eppur Si Muove
"And yet it moves" - despite all the technical objections, there's something here. Maybe not frequency-domain neural networks as originally conceived, but something about how intelligence works with frequency representations, how global coherence emerges efficiently, how compression and perception might be more deeply connected than we realize.
The exploration revealed fundamental challenges, but also hints at unexplored territories - particularly in how modern generative models achieve their remarkable global coherence. That's worth pursuing.