// services/maas
dedicated gpu · single tenant · nz sovereign

Models as a Service. Your weights. Our GPU.

Rent a dedicated NVIDIA Blackwell GPU in our Auckland facility. Ship us any model weights you have rights to — open-weight, fine-tuned, or proprietary — and we hand back a private, OpenAI-compatible endpoint. No shared tenancy. No tokens leaving the country.
// dedicated-gpu.aifoundry.co.nz

Your model. Our hardware. Private API.

Rent a dedicated GPU in our Auckland facility. Bring any model weights you have rights to — we load them and give you a private, OpenAI-compatible endpoint. Nothing shared, nothing logged, nothing offshore.

Dedicated 24GB

Entry tier for small-to-mid models and prototyping.

Pricing

Contact for pricing

Bespoke per workload · monthly or term commitments
GPU

NVIDIA RTX PRO 4000 Blackwell

VRAM

24 GB GDDR7

Tenancy

Dedicated · single tenant

Dedicated GPU
Private endpoint
NZ sovereign
Bring your own weights

Up to 14B dense models

Quantised 30B-class (Q4/Q5)

Fine-tune serving & LoRA adapters

Embedding and reranker workloads

Internal prototypes and pilots

See the spec

Dedicated 96GB

Flagship

Flagship for 70B-class models and long-context workloads.

Pricing

Contact for pricing

Bespoke per workload · monthly or term commitments
GPU

NVIDIA RTX PRO 6000 Blackwell

VRAM

96 GB GDDR7

Tenancy

Dedicated · single tenant

Dedicated GPU
Private endpoint
NZ sovereign
Bring your own weights

70B-class dense models

100B+ MoE (active params permitting)

Long-context coding and agent stacks

Vision-language and multimodal models

Production workloads at single-tenant latency

See the spec
Pricing is bespoke per workload — model size, expected concurrency, and retention all factor in.
// how_it_works

From weights to API in days, not quarters.

Four steps. No procurement cycle for the GPU, no data leaving the country.

01

Pick a model

Tell us what you want to run. Open-weight model from HuggingFace, your own fine-tune, a LoRA adapter, or a stack of smaller models in memory. If it fits in 24GB or 96GB of VRAM, we can serve it.

02

Pick a GPU

RTX PRO 4000 Blackwell (24GB) for smaller models and pilots, or RTX PRO 6000 Blackwell (96GB) for 70B-class and long-context workloads. Both are single-tenant — your GPU is yours.

03

We deploy

We load the weights onto your GPU in our Auckland facility, wire up an OpenAI-compatible endpoint, and hand you an API key. Typical lead time is a few business days from signed agreement.

04

You ship

Point your existing OpenAI SDK at your private endpoint. Streaming, tool use, system prompts — all standard. We monitor the hardware; you own the model and the data.

// maas_faq

The honest details.

Anything that fits the VRAM. The 24GB tier comfortably runs models up to about 14B at full precision, or 30B-class with quantisation (Q4/Q5). The 96GB tier handles 70B-class dense models at full precision, larger MoE models within active-parameter limits, and most vision-language models. Common picks: Llama 3.x, Qwen 3, DeepSeek, GPT-OSS, Mistral, Gemma — plus your fine-tunes and LoRAs.

You do. We need a model file (or a HuggingFace repo we can pull from) and confirmation that you have the licence to run it. We do not redistribute your weights, we do not log your prompts or responses, and we do not train on anything you send through the endpoint.

In our Auckland facility, hosted in Datacom Datacentres on NZ-controlled networking. Your traffic never leaves the country — no offshore failover, no cloud passthrough, no cross-border monitoring.

Yes. Swapping the loaded model on your dedicated GPU is part of the service — typical turnaround is a business day. If you want to hot-swap or run multiple models concurrently within the VRAM budget, we can configure multi-model serving (vLLM or TGI) on request.

OpenAI-compatible /v1/chat/completions with streaming, tool use, and system prompts. Your private endpoint is a unique subdomain on api.aifoundry.co.nz protected by an API key issued just for your deployment. Drop-in with the OpenAI SDK, LangChain, LiteLLM, or anything that speaks the OpenAI shape.

Month-to-month with a notice period, or a discounted term (6–24 months) for production workloads. Hardware setup and model loading are billed once at deployment. Pricing depends on the GPU tier, expected concurrency, and any custom serving requirements — talk to us for a quote.

// ready_when_you_are
live now · plans from $7 USD/mo

Start building today.

Sign up, generate an API key, and send your first request in minutes. Cancel from the Stripe portal any time.