Lunaris Codex - Dense
A modern, Llama-style Transformer architecture built from scratch. Incorporates QK-Norm, Grouped-Query Attention (GQA), and NTK-aware RoPE scaling for improved training stability and long-context performance.
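For orientation, here is a minimal sketch of what such an attention block looks like, assuming PyTorch. The class and parameter names (`GQAttention`, `n_kv_heads`, `ntk_factor`) and all hyperparameters are illustrative assumptions, not the repository's actual code.

```python
# Illustrative sketch of GQA + QK-Norm + NTK-aware RoPE (names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def ntk_rope_freqs(head_dim, max_seq_len, base=10000.0, ntk_factor=1.0):
    # NTK-aware scaling: stretch the RoPE base to extend the usable context window.
    base = base * ntk_factor ** (head_dim / (head_dim - 2))
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    freqs = torch.outer(torch.arange(max_seq_len).float(), inv_freq)   # (seq, head_dim/2)
    return torch.cos(freqs), torch.sin(freqs)

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq, head_dim); rotate channel pairs by position-dependent angles.
    x1, x2 = x[..., ::2], x[..., 1::2]
    cos, sin = cos[None, None, :x.size(2)], sin[None, None, :x.size(2)]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

class GQAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8, n_kv_heads=2, max_seq_len=2048):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        # QK-Norm: normalize queries and keys per head before the dot product.
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)
        cos, sin = ntk_rope_freqs(self.head_dim, max_seq_len)
        self.register_buffer("cos", cos)
        self.register_buffer("sin", sin)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)
        q, k = apply_rope(q, self.cos, self.sin), apply_rope(k, self.cos, self.sin)
        # GQA: each group of query heads shares one key/value head.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))
```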
Lunaris Codex - MoE
Industrial-grade Mixture-of-Experts based on the Switch Transformer. Features capacity-aware routing, a router z-loss for training stability, and high-performance contiguous token dispatch.
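The routing ideas can be sketched roughly as follows, again assuming PyTorch. `SwitchMoE`, `capacity_factor`, and `z_loss_coef` are illustrative names, the constants are assumptions, and the Switch Transformer's auxiliary load-balancing loss is omitted for brevity.

```python
# Sketch of Switch-style top-1 routing with an expert capacity cap and a router z-loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchMoE(nn.Module):
    def __init__(self, dim=512, n_experts=8, capacity_factor=1.25, z_loss_coef=1e-3):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.capacity_factor = capacity_factor
        self.z_loss_coef = z_loss_coef

    def forward(self, x):                               # x: (batch, seq, dim)
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        logits = self.router(tokens)                    # (tokens, experts)
        # Router z-loss: penalize large logits so the routing softmax stays well-behaved.
        z_loss = self.z_loss_coef * torch.logsumexp(logits, dim=-1).pow(2).mean()
        probs = F.softmax(logits, dim=-1)
        gate, expert_idx = probs.max(dim=-1)            # top-1 routing
        capacity = int(self.capacity_factor * tokens.size(0) / len(self.experts))
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            idx = (expert_idx == e).nonzero(as_tuple=True)[0]
            idx = idx[:capacity]                        # capacity-aware: drop overflow tokens
            if idx.numel():
                # Contiguous dispatch: gather this expert's tokens into one dense batch.
                out[idx] = gate[idx, None] * expert(tokens[idx])
        return out.reshape(b, s, d), z_loss
```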
MoC (Mixture-of-Collaborative-Experts)
A novel architecture in which experts exchange information through a '2-Pass' communication mechanism before their outputs are fused, with the goal of enabling emergent collaborative reasoning.
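The description above only names the mechanism, so the sketch below is one speculative reading rather than the repository's MoC implementation: in pass 1 every expert processes the input independently, in pass 2 the experts' outputs attend to one another, and a learned projection fuses the result. All class and attribute names are hypothetical.

```python
# Speculative sketch of a '2-Pass' collaborative expert block (one possible reading).
import torch
import torch.nn as nn

class TwoPassCollaborativeExperts(nn.Module):
    def __init__(self, dim=512, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        # Pass 2: experts exchange information via attention over each other's outputs.
        self.cross_expert_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(n_experts * dim, dim)

    def forward(self, x):                               # x: (batch, seq, dim)
        b, s, d = x.shape
        # Pass 1: each expert processes the input independently.
        first = torch.stack([e(x) for e in self.experts], dim=2)    # (b, s, E, d)
        msgs = first.reshape(b * s, len(self.experts), d)
        # Pass 2: each expert's output attends to the other experts' outputs.
        refined, _ = self.cross_expert_attn(msgs, msgs, msgs)
        refined = refined.reshape(b, s, len(self.experts), d)
        # Fusion: concatenate the refined expert outputs and project back to dim.
        return self.fuse(refined.flatten(-2))
```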
NSA-MoE Hybrid
A fusion of Native Sparse Attention (NSA) and MoE that simultaneously tackles the O(n²) cost of attention and the compute cost of growing parameter counts.
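One way to picture the hybrid is a transformer block with a sparse-attention sublayer followed by an MoE feed-forward sublayer. The sketch below substitutes a simple sliding-window mask for the full NSA mechanism (which combines compressed, selected, and sliding-window branches) and uses top-1 routing; all names, sizes, and the simplifications are assumptions.

```python
# Simplified sketch of the hybrid layer idea: sparse attention + MoE feed-forward.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sliding_window_mask(seq_len, window):
    # Each token attends only to itself and the previous `window - 1` tokens.
    idx = torch.arange(seq_len)
    dist = idx[:, None] - idx[None, :]
    return (dist >= 0) & (dist < window)                # True = attention allowed

class NSAMoEBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8, window=128, n_experts=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.window = window

    def forward(self, x):                               # x: (batch, seq, dim)
        s = x.size(1)
        blocked = ~sliding_window_mask(s, self.window).to(x.device)   # True = masked out
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=blocked)
        x = x + attn_out
        # Top-1 MoE feed-forward on the attention output.
        h = self.norm2(x)
        probs = F.softmax(self.router(h), dim=-1)
        gate, idx = probs.max(dim=-1)                   # (batch, seq)
        ffn = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            sel = idx == e
            if sel.any():
                ffn[sel] = gate[sel, None] * expert(h[sel])
        return x + ffn
```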