Lunaris Codex - Dense
A modern, Llama-style Transformer architecture built from scratch. Incorporates QK-Norm, Grouped-Query Attention (GQA), and NTK-aware RoPE scaling for improved training stability and long-context performance.
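For orientation, here is a minimal sketch of what such an attention block looks like, assuming PyTorch. The class and parameter names (`GQAttention`, `n_kv_heads`, `ntk_factor`) and all hyperparameters are illustrative assumptions, not the repository's actual code.

```python
# Illustrative sketch of GQA + QK-Norm + NTK-aware RoPE (names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def ntk_rope_freqs(head_dim, max_seq_len, base=10000.0, ntk_factor=1.0):
    # NTK-aware scaling: stretch the RoPE base to extend the usable context window.
    base = base * ntk_factor ** (head_dim / (head_dim - 2))
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    freqs = torch.outer(torch.arange(max_seq_len).float(), inv_freq)   # (seq, head_dim/2)
    return torch.cos(freqs), torch.sin(freqs)

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq, head_dim); rotate channel pairs by position-dependent angles.
    x1, x2 = x[..., ::2], x[..., 1::2]
    cos, sin = cos[None, None, :x.size(2)], sin[None, None, :x.size(2)]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

class GQAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8, n_kv_heads=2, max_seq_len=2048):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        # QK-Norm: normalize queries and keys per head before the dot product.
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)
        cos, sin = ntk_rope_freqs(self.head_dim, max_seq_len)
        self.register_buffer("cos", cos)
        self.register_buffer("sin", sin)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)
        q, k = apply_rope(q, self.cos, self.sin), apply_rope(k, self.cos, self.sin)
        # GQA: each group of query heads shares one key/value head.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))
```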
Lunaris Codex - MoE
Industrial-grade Mixture-of-Experts based on the Switch Transformer. Features capacity-aware routing, a router z-loss for training stability, and high-performance contiguous token dispatch.
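The routing ideas can be sketched roughly as follows, again assuming PyTorch. `SwitchMoE`, `capacity_factor`, and `z_loss_coef` are illustrative names, the constants are assumptions, and the Switch Transformer's auxiliary load-balancing loss is omitted for brevity.

```python
# Sketch of Switch-style top-1 routing with an expert capacity cap and a router z-loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchMoE(nn.Module):
    def __init__(self, dim=512, n_experts=8, capacity_factor=1.25, z_loss_coef=1e-3):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.capacity_factor = capacity_factor
        self.z_loss_coef = z_loss_coef

    def forward(self, x):                               # x: (batch, seq, dim)
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        logits = self.router(tokens)                    # (tokens, experts)
        # Router z-loss: penalize large logits so the routing softmax stays well-behaved.
        z_loss = self.z_loss_coef * torch.logsumexp(logits, dim=-1).pow(2).mean()
        probs = F.softmax(logits, dim=-1)
        gate, expert_idx = probs.max(dim=-1)            # top-1 routing
        capacity = int(self.capacity_factor * tokens.size(0) / len(self.experts))
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            idx = (expert_idx == e).nonzero(as_tuple=True)[0]
            idx = idx[:capacity]                        # capacity-aware: drop overflow tokens
            if idx.numel():
                # Contiguous dispatch: gather this expert's tokens into one dense batch.
                out[idx] = gate[idx, None] * expert(tokens[idx])
        return out.reshape(b, s, d), z_loss
```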
MoC (Mixture-of-Collaborative-Experts)
A novel architecture in which experts exchange information through a '2-Pass' communication mechanism before their outputs are fused, with the goal of enabling emergent collaborative reasoning.
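The description above only names the mechanism, so the sketch below is one speculative reading rather than the repository's MoC implementation: in pass 1 every expert processes the input independently, in pass 2 the experts' outputs attend to one another, and a learned projection fuses the result. All class and attribute names are hypothetical.

```python
# Speculative sketch of a '2-Pass' collaborative expert block (one possible reading).
import torch
import torch.nn as nn

class TwoPassCollaborativeExperts(nn.Module):
    def __init__(self, dim=512, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        # Pass 2: experts exchange information via attention over each other's outputs.
        self.cross_expert_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(n_experts * dim, dim)

    def forward(self, x):                               # x: (batch, seq, dim)
        b, s, d = x.shape
        # Pass 1: each expert processes the input independently.
        first = torch.stack([e(x) for e in self.experts], dim=2)    # (b, s, E, d)
        msgs = first.reshape(b * s, len(self.experts), d)
        # Pass 2: each expert's output attends to the other experts' outputs.
        refined, _ = self.cross_expert_attn(msgs, msgs, msgs)
        refined = refined.reshape(b, s, len(self.experts), d)
        # Fusion: concatenate the refined expert outputs and project back to dim.
        return self.fuse(refined.flatten(-2))
```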
NSA-MoE Hybrid
A fusion of Native Sparse Attention (NSA) and MoE that simultaneously tackles the O(n²) cost of attention and the compute cost of growing parameter counts.
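One way to picture the hybrid is a transformer block with a sparse-attention sublayer followed by an MoE feed-forward sublayer. The sketch below substitutes a simple sliding-window mask for the full NSA mechanism (which combines compressed, selected, and sliding-window branches) and uses top-1 routing; all names, sizes, and the simplifications are assumptions.

```python
# Simplified sketch of the hybrid layer idea: sparse attention + MoE feed-forward.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sliding_window_mask(seq_len, window):
    # Each token attends only to itself and the previous `window - 1` tokens.
    idx = torch.arange(seq_len)
    dist = idx[:, None] - idx[None, :]
    return (dist >= 0) & (dist < window)                # True = attention allowed

class NSAMoEBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8, window=128, n_experts=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.window = window

    def forward(self, x):                               # x: (batch, seq, dim)
        s = x.size(1)
        blocked = ~sliding_window_mask(s, self.window).to(x.device)   # True = masked out
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=blocked)
        x = x + attn_out
        # Top-1 MoE feed-forward on the attention output.
        h = self.norm2(x)
        probs = F.softmax(self.router(h), dim=-1)
        gate, idx = probs.max(dim=-1)                   # (batch, seq)
        ffn = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            sel = idx == e
            if sel.any():
                ffn[sel] = gate[sel, None] * expert(h[sel])
        return x + ffn
```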