The package was addressed to the company's lead programmer, John. Curiosity piqued, he opened the box to find a single, sleek CD-ROM with a label that read: "Falcon 4.0 Source Code - Confidential".
TII is reportedly preparing a "Source Available Plus" license for Falcon 180 that releases the custom Flash kernels to the public, keeping only the orchestration layer proprietary.
class FalconDecoderLayer(nn.Module):
    def __init__(self, config):
        # Input Layer Norm (Falcon uses Pre-Normalization)
        self.input_layernorm = LayerNorm(...)
        # The Attention Mechanism (Multi-Query Attention)
        self.self_attn = FalconAttention(config)
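To make the pre-normalization comment in the excerpt concrete, here is a minimal sketch in plain Python of the ordering it implies: normalize the input first, run the sublayer on the normalized values, then add the result back to the untouched input. The names `layer_norm` and `pre_norm_block` are hypothetical helpers written for this illustration. They are not taken from the Falcon source, and the sketch omits the learned scale and bias that a real layer norm carries.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-normalization residual block: x + sublayer(layer_norm(x)).

    `sublayer` is any callable mapping a vector to a vector of the same
    length (attention or an MLP in a real model). The residual path carries
    the raw input x, not the normalized one.
    """
    normed = layer_norm(x)
    out = sublayer(normed)
    return [a + b for a, b in zip(x, out)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # Hypothetical stand-in for attention: halve every value.
    print(pre_norm_block(x, lambda v: [0.5 * t for t in v]))
```

In a post-normalization design the norm would instead be applied after the residual addition. Keeping the residual path free of normalization is the design choice the excerpt's comment points at.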
Because of MQA, the KV cache is tiny, but Falcon 40B still has to hold 40B weights in memory. The source includes a custom CacheManager class that implements a cache-eviction policy: when the sequence exceeds the cache limit, the code drops intermediate tokens but keeps the first token (the system prompt) and the last 512 tokens.
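The eviction rule described above can be sketched in a few lines of Python. This is an illustration of the stated policy only (keep the first entry and the most recent 512, drop everything in between once the limit is exceeded). The function name `evict_kv_cache` and its parameters are assumptions made for this example, not the CacheManager API, and the cache is modeled as a plain list with one entry per token position.

```python
def evict_kv_cache(cache, max_len, keep_first=1, keep_last=512):
    """Return the cache entries that survive eviction.

    cache      -- sequence with one entry per token position (hypothetical model)
    max_len    -- cache limit; nothing is dropped at or below this length
    keep_first -- number of leading entries always retained
    keep_last  -- number of most recent entries always retained
    """
    n = len(cache)
    # Under the limit, or nothing in the middle to drop: keep everything.
    if n <= max_len or keep_first + keep_last >= n:
        return list(cache)
    head = list(cache[:keep_first])
    # Slice from an explicit index so keep_last == 0 yields an empty tail.
    tail = list(cache[n - keep_last:])
    return head + tail


if __name__ == "__main__":
    positions = list(range(2000))  # token positions 0..1999
    kept = evict_kv_cache(positions, max_len=1024)
    print(len(kept), kept[0], kept[1], kept[-1])
```

With 2000 cached positions and a limit of 1024, the result holds position 0 followed by positions 1488 through 1999. A real implementation would apply the same index selection to the key and value tensors of every layer.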
The inference code (serve/falcon_server.py) shows built-in support for:
Training was performed using TII’s custom distributed training codebase.
While many users have interacted with Falcon 40 via Hugging Face or API endpoints, the proprietary inner workings, the custom CUDA kernels, and the specific training dynamics have remained shrouded in mystery. Until now. We have obtained exclusive access to the unredacted source code repository, and here is everything you need to know.