LLMs deployed in the cloud using NVIDIA GPUs

DeepSeek / Gemma / Llama / Mistral / Qwen / GPT-OSS etc

Cloud Services for LLMs and Generative AI in Ukraine

Deploy enterprise LLM assistants, RAG systems, chatbots, and AI applications on De Novo’s infrastructure, powered by NVIDIA H200 NVL, H100, and other GPUs.

A solution for rapidly building applications with generative AI. It allows you to quickly deploy a model, connect corporate data, test scenarios, run inference, and securely integrate Gen AI into business processes.

The LLM Cloud provides a ready-made infrastructure foundation for working with models, data, and AI applications without the need to deploy the entire stack from scratch yourself.

Dozens of pre-installed models, including Gemma, GPT, DeepSeek, Granite, Llama, Mistral, and others, are available on a wide range of NVIDIA GPUs: H200 NVL, H100, A100 NVL, L40S, and L4.

What is De Novo’s LLM Cloud?

The LLM Cloud is a comprehensive environment for working with large language models, combining GPU infrastructure, ML/LLMOps tools, Kubernetes, data storage, secure network infrastructure, and a platform for building generative applications.

Tensor Cloud is the compute layer: GPUs, Kubernetes, scaling, and infrastructure for inference, fine-tuning, and ML/AI workloads.

AI Studio is the application layer: a low-code/no-code environment for rapidly creating LLM assistants, chatbots, RAG solutions, and document analysis tools, and for integrating them with business processes.

What tasks does the cloud solve for LLMs?

LLM assistants for employees

Internal assistants for searching for information in regulations, instructions, knowledge bases, and technical documentation

RAG on corporate data

Responses based on company documents without transferring data to external public AI services

Chatbots for customers and contact centers

Automation of common inquiries, preparation of responses, summarization of dialogues, and support for agents

Document analysis

Processing of contracts, policies, tender documentation, technical specifications, reports, and regulatory materials

AI product prototyping

Rapid hypothesis testing before investing in full-scale development

High-performance LLM inference

Deploying models via API for integration with CRM, ERP, DMS, portals, and internal applications

A Quick Start for LLM Projects

A solution for teams that want to get started with their pipeline right away without having to manually deploy the entire stack. It saves MLOps and DevOps engineers dozens of hours.

AI Studio provides companies with a ready-to-use environment for building generative AI solutions. Here, you can select models, test scenarios, connect data, analyze application performance, manage resources, and gradually transition solutions from prototype to production. The platform is suitable for both development teams and business professionals who need a quick start without delving deeply into infrastructure configurations.

How to get started with an LLM?

Step 1. Define your use case
AI assistant, RAG, document analysis, chatbot, model inference, fine-tuning, or an AI product prototype.

Step 2. Choose a path
AI Studio for quick deployment, ML Cloud for model engineering, or Tensor Cloud for infrastructure scenarios.

Step 3. Deploy the environment
AI Studio can be automatically deployed from templates in Tensor Cloud; the user receives a ready-to-use environment with integrated services, authentication, monitoring, and backup.

Step 4. Connect data and test quality
Documents, knowledge bases, corporate systems, APIs, response scenarios, restrictions, and user roles.

Step 5. Deploy to production
Configure monitoring, access, backup, integration with business processes, and support.

Products for AI/ML

AI Studio

A ready-made environment where you can create your own applications with generative AI, test them, and safely connect them to business processes. The platform supports a low-code/no-code approach.

Cloud Services for LLMs and GenAI

It allows you to quickly deploy a model, connect corporate data, test scenarios, run inference, and securely integrate Gen AI into business processes.

Tensor Cloud with NVIDIA GPUs

Cloud with Kubernetes and NVIDIA H200 NVL, H100, A100 NVL, L40S, and L4 GPUs with Tensor Cores for running artificial intelligence and machine learning (AI/ML) workloads.

Hosted Tensor Infrastructure

AI/ML-accelerated Kubernetes with NVIDIA H200 NVL, H100, A100 NVL, L40S, and L4 GPUs with Tensor Cores on Hosted Private Infrastructure (HPI).

Cloud for LLMs or On-Premise?

The first stage of any AI project is choosing the neural network itself. Once that choice has been made, the next question immediately arises: where should this LLM be deployed? There are two main options: on the company’s own resources, or on an operator’s cloud platform.

If you are certain that your model will keep the hardware, primarily the GPU accelerators, at or close to 80% utilization or higher around the clock, and that the data processed by the neural network must under no circumstances leave your company’s protected perimeter, then your own LLM infrastructure is likely to be the most suitable option. In all other cases, the optimal choice is an operator cloud for large language models.

It is worth bearing in mind that an in-house site is not just about GPUs, which are expensive in their own right. It also requires all the necessary data centre engineering infrastructure, dedicated interconnects, software configuration, and costly specialists, including an MLOps team capable of implementing and maintaining the whole environment. By contrast, running an LLM in the cloud allows companies to avoid all capital expenditure (CapEx) and long-term investment, moving instead to a fully operational expenditure (OpEx) model. In other words, resources are paid for only as they are used. This is how cloud LLM hosting works.
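The utilization argument above can be sketched as a simple break-even calculation. This is a minimal illustration only: the function names are hypothetical, and every price in the example is a placeholder rather than a real tariff or hardware quote. It ignores factors such as staffing, data centre engineering, and hardware resale value, which would have to be folded into the on-premise figures.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cloud_cost(hourly_rate: float, utilization: float) -> float:
    """Pay-as-you-go cost: only the hours actually used are billed."""
    return hourly_rate * HOURS_PER_MONTH * utilization


def monthly_on_prem_cost(capex: float, amortization_months: int,
                         monthly_opex: float) -> float:
    """Owned hardware costs the same whether it is busy or idle."""
    return capex / amortization_months + monthly_opex


def break_even_utilization(hourly_rate: float, capex: float,
                           amortization_months: int,
                           monthly_opex: float) -> float:
    """Utilization (0..1) above which owning becomes cheaper than renting."""
    on_prem = monthly_on_prem_cost(capex, amortization_months, monthly_opex)
    return on_prem / (hourly_rate * HOURS_PER_MONTH)


if __name__ == "__main__":
    # All figures below are placeholders, not real prices.
    u = break_even_utilization(hourly_rate=4.0, capex=60_000,
                               amortization_months=36, monthly_opex=600)
    print(f"Break-even utilization: {u:.0%}")
```

If the expected utilization is well below the break-even value, the pay-as-you-go model is cheaper under these assumptions; if it is consistently above it, owning the hardware starts to pay off.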

What should you consider when choosing a GPU server for LLMs?

When choosing a GPU server for large language models, it is important to assess not only the number and nominal performance of the accelerators, but also the characteristics of the models being used, along with the related technical details. This is especially important when designing a comprehensive LLM platform. In this case, the critical parameters are the amount of available high-speed memory and the bandwidth of the interconnect, that is, the data exchange bus between accelerators.

For example, the weights of a large LLM can take up hundreds of gigabytes even before user context is taken into account. If they do not fit into the memory of a single accelerator, the system will distribute computation across several GPUs. That is why a server for LLMs must support high-speed interconnects such as NVLink.
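A rough way to check whether a model fits on one accelerator is to multiply the parameter count by the bytes stored per parameter. The sketch below does this under stated assumptions: the helper names are hypothetical, the 20% reserve for context (KV cache) and activations is an arbitrary placeholder, and the GPU memory sizes passed in are example capacities that should be checked against the actual spec sheet.

```python
import math

# Bytes stored per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str = "fp16") -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def gpus_needed(params_billion: float, gpu_memory_gb: float,
                precision: str = "fp16", overhead: float = 0.2) -> int:
    """Minimum GPU count so the weights fit, keeping a share of each
    GPU's memory (overhead) free for context and activations."""
    usable = gpu_memory_gb * (1.0 - overhead)
    return math.ceil(weights_gb(params_billion, precision) / usable)


if __name__ == "__main__":
    # A 70-billion-parameter model in 16-bit precision.
    print(weights_gb(70), "GB of weights")
    print(gpus_needed(70, gpu_memory_gb=80), "GPUs with 80 GB each")
    print(gpus_needed(70, gpu_memory_gb=141), "GPUs with 141 GB each")
```

Whenever the result is greater than one, the model has to be split across accelerators, and the interconnect bandwidth between them becomes a limiting factor.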

At the same time, LLM deployments are rarely limited to a single server. Sooner or later, expansion will be required, and this should be planned for in advance. For example, distributed high-speed storage may be needed for rapid weight loading, along with monitoring systems and tools for inference orchestration. Building, developing and maintaining such a system in-house is difficult and very expensive. Yet all the required resources are now available in the cloud. Any major LLM provider offers a complete set of capabilities for deploying and developing large language models.

How can you build a corporate ChatGPT based on cloud LLM hosting?

Training an LLM from scratch is time-consuming and expensive. Today, businesses almost never set themselves this kind of task. Instead, when they need to create a corporate equivalent of a neural network such as ChatGPT, they take a ready-made foundation model from the many dozens available on the market and give it access to a body of closed corporate knowledge, without changing the architecture of the network itself. This knowledge usually consists of documents, databases, internal policies and similar sources.

In such projects, cloud hosting for large language models must connect the LLM to these internal sources securely and reliably, without the risk of sensitive data leakage. Accordingly, hosting for LLMs must integrate seamlessly with the organisation’s IT security system. The organisation’s own knowledge is integrated using the RAG approach, or Retrieval-Augmented Generation, which dynamically inserts the relevant documents into the user’s query at the moment of request.
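The RAG approach described above can be illustrated with a minimal sketch. It assumes a toy bag-of-words similarity in place of the embedding model and vector database a real deployment would use, and the function names and sample documents are hypothetical. The sketch covers only the retrieval and prompt-assembly steps; the resulting prompt would then be sent to whichever model is deployed.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    """Insert the retrieved documents into the prompt at request time."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")


# Hypothetical internal policy snippets.
DOCS = [
    "Vacation requests must be submitted 14 days in advance.",
    "The VPN client is mandatory for remote access to internal systems.",
    "Expense reports are reimbursed within 30 days.",
]

if __name__ == "__main__":
    print(build_prompt("How many days in advance do I submit a vacation request?", DOCS, top_k=1))
```

Because the documents are retrieved at request time and never used to retrain the model, the foundation model itself stays unchanged, and access to the corporate sources can be controlled by the organisation’s existing security policies.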

For companies in Ukraine, an important factor remains the deployment of AI platforms within Ukrainian jurisdiction, so cloud hosting for Large Language Models must take this into account. However, data localisation is no longer a problem, as all major Ukrainian operators offer LLM hosting in the cloud within the country. This ensures not only that data is hosted without being moved abroad, but also that network latency is kept to a minimum.
