Published: Jun 29, 2021
This major release introduces many exciting new features, including GPU acceleration, custom runtime definitions, assigning Pods to nodes based on node labels for more precise workload control, support for binding an HTTP port, and more.
GPU acceleration
On the Create or Update Deployment page, in addition to pre-allocating CPU cores and memory, you can now specify the number of GPUs required to deploy a service.
GPUs cannot be shared among multiple Pods. A Pod can request one or more GPUs, but only as a whole number; fractional GPUs are not allowed, unlike CPU cores. For details on using GPUs in k8s, refer to the official document Schedule GPUs.
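As a minimal sketch of what such a request looks like (the Pod and container names here are hypothetical, for illustration only; DaaS generates the actual manifests), a GPU is requested through the standard nvidia.com/gpu resource:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo              # hypothetical name, for illustration only
spec:
  containers:
  - name: scorer
    image: daas-ai:2.0.0-py37
    resources:
      limits:
        nvidia.com/gpu: 1     # must be a whole number; 0.5 would be rejected
EOF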
Currently, NVIDIA GPUs are supported. The most convenient way to enable GPUs in a k8s environment is to install the NVIDIA GPU Operator, which automatically installs the NVIDIA graphics driver, the GPU device plugin, and the container runtime, and automatically labels the nodes. In Microk8s, the GPU Operator can be installed with a single command: microk8s.enable gpu.
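Once the operator is enabled, one way to confirm the GPU is visible to the scheduler (a sketch assuming a standard Microk8s setup):

microk8s.enable gpu
# after the operator pods are ready, nvidia.com/gpu should appear
# among the node's allocatable resources:
microk8s.kubectl describe node | grep nvidia.com/gpu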
During model testing, our prediction script logs the device type in use, such as CPU or GPU. You can see the message “ONNX Runtime device: GPU” in the standard output on the right side of the page, which indicates that ONNX Runtime is using the GPU. By running nvidia-smi in a shell, you can see which processes are using the GPU:
nvidia-smi
Sun Jul 11 19:13:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80 Driver Version: 460.80 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 Off | 00000000:02:00.0 On | N/A |
| 40% 43C P3 25W / 90W | 858MiB / 3902MiB | 34% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G /usr/lib/xorg/Xorg 35MiB |
| 0 N/A N/A 2821 G /usr/lib/xorg/Xorg 241MiB |
| 0 N/A N/A 2932 G /usr/bin/gnome-shell 140MiB |
| 0 N/A N/A 4203 G ...AAAAAAAAA= --shared-files 47MiB |
| 0 N/A N/A 9977 G /usr/lib/firefox/firefox 155MiB |
| 0 N/A N/A 11589 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 145526 C python 223MiB |
+-----------------------------------------------------------------------------+
We can see that the final python process is the one used by the current model test. Model testing is a non-exclusive operation: after the above call completes, running nvidia-smi again shows that the python process has disappeared. A formal deployment, by contrast, holds its GPUs exclusively and keeps running. If there are no idle GPUs left in k8s and a subsequent deployment requests one, its Pods will be unable to start due to insufficient resources.
Runtime Definitions
The default Python 2.7 based runtimes have been removed; only the Python 3.7 based runtimes are retained: AI environment with Python 3.7 and AI worker with Python 3.7. Both are built on the same base image daas-ai:2.0.0-py37, which packages almost all popular Python machine learning and deep learning libraries for user testing. To keep the image size manageable, the bundled deep learning libraries are CPU builds. GPU builds can be provided as independent runtimes; refer to the Dockerfiles in daas-runtimes that integrate popular deep learning libraries such as ONNX Runtime, TensorFlow, and PyTorch.
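As a rough sketch of such an independent GPU runtime (the base image tag and package choices below are illustrative assumptions, not the actual daas-runtimes content):

docker build -t daas-ai-gpu:onnx - <<'EOF'
# illustrative GPU runtime image; see daas-runtimes for the real Dockerfiles
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
# GPU build of ONNX Runtime instead of the CPU build bundled in daas-ai
RUN pip3 install onnxruntime-gpu
EOF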
For formal deployments, we recommend creating a runtime that matches the library versions used to train the model, especially when the model is stored in a language-specific binary serialization format such as pickle/joblib. Loading a model with different library versions may fail or cause other compatibility issues.
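For example, before deploying a scikit-learn model saved with joblib (an illustrative case), you can compare the versions inside the runtime against the training environment:

# run in both the training environment and the runtime; the versions should match
python3 -c "import sklearn, joblib; print(sklearn.__version__, joblib.__version__)"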
Assign Pods based on Node Labels
To better manage and allocate the resources of each node in k8s, nodes are divided into three categories using node labels (see the sketch after this list):
- Control nodes run the Pods of DaaS itself, including the Nginx proxy server, web application, and other microservices.
- Compute nodes run model validation, model testing, and task deployments.
- Deploy nodes mainly run service deployments.
A physical node can carry one or more of these labels; for example, in a single-node Microk8s environment, the current node must carry all three.
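A sketch with hypothetical label keys (the actual keys are defined by DaaS); on single-node Microk8s, all three labels go on the same node:

# replace <node-name> with the output of: microk8s.kubectl get nodes
microk8s.kubectl label node <node-name> daas/control=true daas/compute=true daas/deploy=true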
Support binding to HTTP ports
Previously, only an HTTPS port was bound by default, and DaaS ships with a self-signed SSL certificate that some browsers block. With our new script bind-external-ip.sh, an HTTP port can now be bound alongside the HTTPS port, which is more convenient on internal networks.
bind-external-ip.sh xxx.xxx.xxx.xxx [HTTPS_PORT] [HTTP_PORT]
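For example, with illustrative values, binding HTTPS to port 30443 and HTTP to port 30080 on an internal address:

bind-external-ip.sh 192.168.1.100 30443 30080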
Get a trial version of AI-DaaS microk8s
AI-DaaS microk8s is a free edition of DaaS installed on Microk8s, the single-node distribution of k8s, providing complete model deployment functionality with offline installation. If you want to deploy it locally, please send an email to info(at)autodeployai.com with your organization and details about your model deployment requirements.