// services/maas

dedicated gpu · single tenant · anz compute

Models as a Service. Your weights. Our GPU.

Rent a dedicated NVIDIA Blackwell GPU in our Auckland facility. Ship us any model weights you have rights to — open-weight, fine-tuned, or proprietary — and we hand back a private, OpenAI-compatible endpoint. No shared tenancy. No tokens leaving the country.

// dedicated-gpu.aifoundry.co.nz

Your model. Our hardware. Private API.

Rent a dedicated GPU in our Auckland facility. Bring any model weights you have rights to — we load them and give you a private, OpenAI-compatible endpoint. Nothing shared, nothing logged, nothing offshore.

Dedicated 24GB

Entry tier for small-to-mid models and prototyping.

Pricing

Contact for pricing

Bespoke per workload · monthly or term commitments

GPU

NVIDIA RTX PRO 4000 Blackwell

VRAM

24 GB GDDR7

Tenancy

Dedicated · single tenant

Dedicated GPU

Private endpoint

ANZ compute

Bring your own weights

Up to 14B dense models

Quantised 30B-class (Q4/Q5)

Fine-tune serving & LoRA adapters

Embedding and reranker workloads

Internal prototypes and pilots

See the spec

Dedicated 96GB

Flagship

Flagship for 70B-class models and long-context workloads.

Pricing

Contact for pricing

Bespoke per workload · monthly or term commitments

GPU

NVIDIA RTX PRO 6000 Blackwell

VRAM

96 GB GDDR7

Tenancy

Dedicated · single tenant

Dedicated GPU

Private endpoint

ANZ compute

Bring your own weights

70B-class dense models

100B+ MoE (active params permitting)

Long-context coding and agent stacks

Vision-language and multimodal models

Production workloads at single-tenant latency

See the spec

Pricing is bespoke per workload — model size, expected concurrency, and retention all factor in.

// how_it_works

From weights to API in days, not quarters.

Four steps. No procurement cycle for the GPU, no data leaving the country.

Pick a model

Tell us what you want to run. Open-weight model from HuggingFace, your own fine-tune, a LoRA adapter, or a stack of smaller models in memory. If it fits in 24GB or 96GB of VRAM, we can serve it.

Pick a GPU

RTX PRO 4000 Blackwell (24GB) for smaller models and pilots, or RTX PRO 6000 Blackwell (96GB) for 70B-class and long-context workloads. Both are single-tenant — your GPU is yours.

We deploy

We load the weights onto your GPU in our Auckland facility, wire up an OpenAI-compatible endpoint, and hand you an API key. Typical lead time is a few business days from signed agreement.

You ship

Point your existing OpenAI SDK at your private endpoint. Streaming, tool use, system prompts — all standard. We monitor the hardware; you own the model and the data.

// maas_faq

The honest details.

Anything that fits the VRAM. The 24GB tier comfortably runs models up to about 14B at full precision, or 30B-class with quantisation (Q4/Q5). The 96GB tier handles 70B-class dense models at full precision, larger MoE models within active-parameter limits, and most vision-language models. Common picks: Llama 3.x, Qwen 3, DeepSeek, GPT-OSS, Mistral, Gemma — plus your fine-tunes and LoRAs.

You do. We need a model file (or a HuggingFace repo we can pull from) and confirmation that you have the licence to run it. We do not redistribute your weights, we do not log your prompts or responses, and we do not train on anything you send through the endpoint.

In our Auckland facility, hosted in Datacom Datacentres on NZ-controlled networking — serving all of ANZ. Your traffic stays in the ANZ region — no offshore failover, no cloud passthrough, no third-country monitoring.

Yes. Swapping the loaded model on your dedicated GPU is part of the service — typical turnaround is a business day. If you want to hot-swap or run multiple models concurrently within the VRAM budget, we can configure multi-model serving (vLLM or TGI) on request.

OpenAI-compatible /v1/chat/completions with streaming, tool use, and system prompts. Your private endpoint is a unique subdomain on api.aifoundry.co.nz protected by an API key issued just for your deployment. Drop-in with the OpenAI SDK, LangChain, LiteLLM, or anything that speaks the OpenAI shape.

Month-to-month with a notice period, or a discounted term (6–24 months) for production workloads. Hardware setup and model loading are billed once at deployment. Pricing depends on the GPU tier, expected concurrency, and any custom serving requirements — talk to us for a quote.

// ready_when_you_are

auckland · anz · sovereign compute

Let's build in-region.

Tell us what you're running and where it needs to live. We'll size the hardware, handle the deployment, and keep it in the ANZ region.

Email us

AI Foundry

Models as a Service. Your weights. Our GPU.

Rent a dedicated NVIDIA Blackwell GPU in our Auckland facility. Ship us any model weights you have rights to — open-weight, fine-tuned, or proprietary — and we hand back a private, OpenAI-compatible endpoint. No shared tenancy. No tokens leaving the country.

Your model. Our hardware. Private API.

Dedicated 24GB

Dedicated 96GB

From weights to API in days, not quarters.

Pick a model

Pick a GPU

We deploy

You ship

The honest details.

What models can I actually run?

Who owns the weights I deploy?

Where does the GPU physically live?

Can I swap the model later?

What does the API look like?

What is the commitment?

Let's build in-region.

Models as a Service. Your weights. Our GPU.

Rent a dedicated NVIDIA Blackwell GPU in our Auckland facility. Ship us any model weights you have rights to — open-weight, fine-tuned, or proprietary — and we hand back a private, OpenAI-compatible endpoint. No shared tenancy. No tokens leaving the country.

Your model. Our hardware. Private API.

Dedicated 24GB

Dedicated 96GB

From weights to API in days, not quarters.

Pick a model

Pick a GPU

We deploy

You ship

The honest details.

01What models can I actually run?

What models can I actually run?

02Who owns the weights I deploy?

Who owns the weights I deploy?

03Where does the GPU physically live?

Where does the GPU physically live?

04Can I swap the model later?

Can I swap the model later?

05What does the API look like?

What does the API look like?

06What is the commitment?

What is the commitment?

Let's build in-region.