Capability 04

Model swap

Code implementation in Mandelia is model-agnostic. Any model that fulfills the interface can be used: a frontier API, open weights, a customer fine-tune, or a task-specific specialist.
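As a rough illustration of what "fulfilling the interface" could mean, the sketch below defines a minimal model interface and a node that depends only on it. The names CodeModel, EchoModel, ImplementerNode, and the complete method are hypothetical and chosen for this example; they are not Mandelia's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


class CodeModel(Protocol):
    """Hypothetical interface: anything with a matching `complete` method qualifies."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in for a frontier API, open-weights, or fine-tuned backend."""

    name: str

    def complete(self, prompt: str) -> str:
        # A real backend would call an API or a local runtime here.
        return f"[{self.name}] {prompt}"


@dataclass
class ImplementerNode:
    """A pipeline node that depends only on the interface, not on a vendor."""

    model: CodeModel

    def run(self, task: str) -> str:
        return self.model.complete(task)


node = ImplementerNode(model=EchoModel("frontier-api"))
first = node.run("rename variable x to count")

# Swapping the model is a one-line change; the node code is untouched.
node.model = EchoModel("customer-fine-tune")
second = node.run("rename variable x to count")
```

The design point is that the node never names a vendor, so replacing the backend does not require changes elsewhere in the pipeline.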

Today, Mandelia uses a frontier LLM as its baseline, and we are interested in exploring the benefits of swapping that baseline for customer-specified models.

We're looking for partners to apply this model-agnostic design in the following areas:

Debloating & refactoring specialists

Models tuned on transformation traces, AST-aware edits, or specific reduction techniques.

Modern applications pull in hundreds of libraries, leaving a vast surface of unused code that creates security risk and runtime bloat. Debloating means automatically removing unused code while preserving the behavior that is actually used; it is an active ONR research area under TPCP (Total Platform Cyber Protection). Specialized models tuned on AST (Abstract Syntax Tree) transformations and reduction traces significantly outperform general-purpose LLMs at this task; Mandelia plugs them in as implementer nodes.

Cybersecurity authorship

Models tuned for threat modeling, vulnerability remediation, security-control writing, or red-team narrative generation.

These are models tuned specifically for security-relevant code generation: vulnerability remediation, security-control insertion (input validation, access checks, crypto routines), threat modeling, and red-team narrative generation. General-purpose LLMs are notoriously inconsistent in security-sensitive code, where one missed check can create a real exploit; domain-tuned models perform substantially better. Mandelia routes security-sensitive nodes to these specialists.

Language-tuned models

Swift, Kotlin, Rust, Ada, COBOL, MISRA-C, embedded C, CUDA, HDL, and other niches where frontier providers underperform.

Frontier LLMs are trained primarily on common languages (Python, JavaScript, TypeScript). For Swift, Kotlin, Rust, Ada, COBOL, MISRA-C, embedded C, CUDA, and HDL (Hardware Description Languages), specialized models significantly outperform frontier models, and defense systems disproportionately use exactly these niche languages. Mandelia lets customers plug in language-specialist models per node so each code transformation runs on the model best suited to its target language.
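Per-node language routing can be pictured as a lookup with a general-purpose fallback. The following is a minimal sketch under stated assumptions: the registry contents, the model identifiers, and the pick_model helper are all hypothetical and only illustrate the idea.

```python
# Hypothetical registry: target language -> model identifier.
SPECIALISTS = {
    "rust": "rust-specialist",
    "cobol": "cobol-specialist",
    "cuda": "cuda-specialist",
}
DEFAULT_MODEL = "frontier-general"


def pick_model(target_language: str) -> str:
    """Return the specialist for a language, or the general model if none is registered."""
    return SPECIALISTS.get(target_language.strip().lower(), DEFAULT_MODEL)


# Each node declares the language of the code it transforms.
nodes = [
    {"id": "n1", "language": "Rust"},
    {"id": "n2", "language": "Python"},
    {"id": "n3", "language": "COBOL"},
]
assignments = {n["id"]: pick_model(n["language"]) for n in nodes}
```

Here n1 and n3 resolve to their language specialists, while n2 falls back to the general model because no Python specialist is registered.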

Device-tuned models

Mobile (iOS / Android), embedded RTOS, FPGA, GPU kernels, or specific microcontroller families.

Mobile (iOS Swift / Android Kotlin), embedded RTOS (Real-Time Operating Systems for industrial and aerospace platforms), FPGA HDL, GPU kernels (CUDA / HIP / Metal), and specific microcontroller families each have idiosyncratic patterns and constraints. Generic models produce code that "looks right" syntactically but doesn't run well — or at all — on the actual hardware. Mandelia routes device-targeted nodes to models tuned for the specific target platform.

Customer-provided policies or fine-tunes

Models trained on the customer's own codebase, doctrine, internal style guides, or domain corpora — slotted into implementer or verifier nodes to operate directly against the existing system.

Large customers (banks, defense primes, hospitals) have internal codebases with proprietary patterns, style guides, doctrine, and APIs they can't share with public model providers. Fine-tuning a base model on their internal corpus produces dramatically better outputs for their specific environment — and the model stays inside their boundary. Mandelia treats their fine-tune as just another implementer node — no architecture change, no data leakage.

When a custom fine-tune isn't feasible, the lighter alternative is to provide practices and implementation policies as reference documents: Mandelia builds in accordance with them, and verifier nodes validate the output against the same policies.

Domain-expert models

Legal, medical, scientific, financial, or defense-doctrine models for nodes where general-purpose LLMs underperform.

Legal (contract automation, regulatory drafting), medical (FDA-regulated software, clinical decision support), scientific (numerical methods, simulation code), financial (compliance code, transaction processing), and defense doctrine (doctrinally correct mission logic) each have models that substantially outperform general-purpose LLMs in their domain. Mandelia routes domain-sensitive nodes to domain-expert models per task, so code that touches HIPAA-regulated workflows runs through a medical-tuned model and code that touches financial transactions runs through a finance-tuned one.

Open-weights & self-hosted models

For on-premise, air-gapped, or classified-network deployment when SaaS APIs aren't acceptable.

For air-gapped, classified, or sovereign environments where SaaS APIs aren't acceptable, open-weights models (Llama, Qwen, DeepSeek, Mistral) running entirely on customer infrastructure are the only option. The performance gap with frontier API models is narrowing rapidly — open weights now match closed-API performance in many domains. Mandelia treats them identically to API-based models — same interface, same verification primitives, no architectural change.

Accreditation-bound hosting

FedRAMP, ITAR-aware, IL4 / IL5 / IL6, or sovereign-cloud models when data residency or accreditation matters.

Defense and federal customers can only use models hosted under specific accreditation regimes: FedRAMP (Federal Risk and Authorization Management Program, the federal cloud accreditation baseline), IL4 / IL5 / IL6 (DoW Cloud Impact Levels for sensitivity tiers; IL5 covers CUI, Controlled Unclassified Information, and IL6 covers Secret), ITAR (International Traffic in Arms Regulations), and sovereign cloud regions (Azure Government, AWS GovCloud, Google Distributed Cloud Hosted). Mandelia binds model selection to accreditation requirements per node so the right model runs in the right enclave.
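One way to picture binding model selection to accreditation is a filter that keeps only models whose hosting environment holds every label a node requires. This is a simplified sketch: the HostedModel and Node types, the catalog entries, and the treatment of accreditations as opaque string labels are assumptions for illustration, and real accreditation rules are more nuanced than a subset check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedModel:
    name: str
    accreditations: frozenset  # labels the hosting environment holds


@dataclass(frozen=True)
class Node:
    id: str
    required: frozenset  # labels the node's data demands


def eligible_models(node, catalog):
    """Keep only models whose hosting holds every accreditation the node requires."""
    return [m.name for m in catalog if node.required <= m.accreditations]


# Hypothetical catalog; the names and label sets are invented for the example.
catalog = [
    HostedModel("saas-frontier", frozenset()),
    HostedModel("govcloud-model", frozenset({"FedRAMP High", "IL5"})),
    HostedModel("enclave-model", frozenset({"FedRAMP High", "IL5", "IL6"})),
]

cui_node = Node("cui-transform", frozenset({"IL5"}))
secret_node = Node("secret-transform", frozenset({"IL6"}))
public_node = Node("docs", frozenset())
```

With this catalog, the IL6 node can only be served by enclave-model, while a node with no requirements may use any of the three.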

Cost & latency optimization

Small fast models for routine implementer work, larger reasoning models for planning, selected per node or by policy.

Frontier reasoning models are expensive ($15+ per million tokens) and slow (5–30 seconds per call). Small fast models are 50–100x cheaper and faster but less capable. Mandelia routes per node, sending planner nodes that need deep deliberation to large reasoning models and routine implementer work to small fast models, with the choice made automatically or by policy. Total cost can drop 10–20x without quality loss, and the savings add up across the thousands of LLM calls a real build requires.
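The cost effect of per-node routing can be checked with simple arithmetic. The sketch below uses the $15 per million tokens figure quoted above; the small-model price (100x cheaper), the call count, the tokens per call, and the 5% planner share are assumptions picked for illustration, and the model covers price only, not output quality or latency.

```python
LARGE_PRICE = 15.00  # USD per million tokens (figure quoted above)
SMALL_PRICE = 0.15   # assumed: 100x cheaper than the large model


def build_cost(calls, tokens_per_call, planner_share, routed=True):
    """Cost in USD of a build; when routed, only planner calls use the large model."""
    total_m_tokens = calls * tokens_per_call / 1_000_000
    if not routed:
        return total_m_tokens * LARGE_PRICE
    large = total_m_tokens * planner_share * LARGE_PRICE
    small = total_m_tokens * (1 - planner_share) * SMALL_PRICE
    return large + small


# Assumed workload: 1,000 calls of 2,000 tokens each, 5% of them planner calls.
baseline = build_cost(1000, 2000, planner_share=0.05, routed=False)
routed = build_cost(1000, 2000, planner_share=0.05, routed=True)
savings_factor = baseline / routed
```

Under these assumptions the all-large build costs about $30.00 and the routed build about $1.79, a reduction of roughly 17x. The result is sensitive to the planner share: the larger the fraction of calls that need the large model, the smaller the savings.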

Reach out to see the demo or discuss how we can work together.
