Question 1

What is model drift and how do you detect it?

Accepted Answer

Model drift refers to the gradual degradation of a model's performance in production, caused by changes in input data or user behaviour. It shows up as a drop in the quality score, more refusals or off-topic answers. We put automatic probes in place: input/output sampling, periodic evaluation on a reference set, and configurable alerts. On average, RAG models drifting on evolving corpora need quarterly re-evaluation.

Question 2

What GPU infrastructure do you need to host your own models?

Accepted Answer

It depends on the model. A 7-billion-parameter model quantised to 4-bit fits on an A10G card (24 GB VRAM) with acceptable latency. A 70-billion model needs several A100 or H100 GPUs in parallel. We size the infrastructure based on your target request load (tokens/second), the chosen model and your budget. For companies without GPU servers, Swiss cloud options exist with several local providers we have referenced.

Question 3

Can you take over a model already deployed by another team?

Accepted Answer

Yes, that's a frequent situation. We start with a technical audit: existing infrastructure, pipelines in place, available documentation, log quality. On that basis we identify the fragility points and propose a stabilisation plan before scaling further. Taking over an existing deployment usually takes two to four weeks, depending on the state of the documentation and the complexity of the infrastructure.

AI model deployment and MLOps

Four pillars of MLOps.
An operational discipline.

Reproducible MLOps pipelines

Quality monitoring

Hosting & GPU inference

Existing deployment takeover

Audit, pipeline, observability.
From notebook to production.

Audit & instrumentation

Reproducible pipeline

Continuous monitoring

Our reference MLOps stack.

Frequently asked questions.

Related services.

Got a use case in mind?
Let's talk.

Four pillars of MLOps.An operational discipline.