Troubleshooting Deployment Errors
This page discusses scenarios you may encounter during model deployment that will require troubleshooting, including:
Image building errors
Deployment errors
Troubleshooting views
Merlin provides the following views on the UI to troubleshoot a deployment:
Logs - the live console output when the image is building or the deployment is running
History - the status and message recorded for each past deployment
You can navigate to these views from the Model Version page by clicking on the Logs tab or History tab.
Known Errors
OOMKilled
The "OOMKilled" error occurs when a container is terminated because it ran out of memory. This typically happens when a container exceeds its allocated memory limit and the system cannot provide additional memory. When this occurs, the container is killed with exit code 137 to free up resources.
This error can affect both image building and deployment steps. To resolve the OOMKilled error, follow these steps:
Check which components got OOMKilled
Check the memory limits of the affected components
Monitor memory usage
Optimize model memory usage
Adjust memory limits
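The first step above can be sketched in Python. This is a minimal sketch, assuming the deployment runs on Kubernetes and that you can obtain the pod description as JSON (for example from `kubectl get pod <name> -o json`). The function name `find_oom_killed` and the container names in the sample are hypothetical and are not part of Merlin.

```python
def find_oom_killed(pod: dict) -> list[str]:
    """Return the names of containers whose current or last state is OOMKilled."""
    status = pod.get("status", {})
    killed = []
    # Kubernetes reports init containers and regular containers in separate lists.
    for key in ("initContainerStatuses", "containerStatuses"):
        for container in status.get(key, []):
            # A container that was restarted keeps its previous termination in "lastState".
            for state_key in ("state", "lastState"):
                terminated = container.get(state_key, {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    killed.append(container["name"])
                    break
    return killed


# Hypothetical pod status, trimmed to the fields the function reads.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "model",
                "state": {"running": {}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "state": {"running": {}}, "lastState": {}},
        ]
    }
}

print(find_oom_killed(sample_pod))
```

Once the affected container is known, its memory limit and observed usage are the next things to compare.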
Liveness or readiness probe failures
Liveness and readiness probes are essential for ensuring the health and availability of model services. The liveness probe determines whether a model is initialized and running properly, while the readiness probe indicates whether a model is ready to serve traffic. When these probes fail, service can be disrupted and the overall stability of the application can suffer.
Troubleshooting steps:
For the standard model type, check the pre-trained model size
For the pyfunc model type, check how the model is initialized
Inspect model logs
Monitor resource utilization
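One way to reason about the first two steps is to compare how long the model takes to initialize against the time the probes allow. The sketch below assumes the probes behave like Kubernetes probes (an initial delay, a period, and a failure threshold). The probe values and the `load_model` function are hypothetical placeholders, not Merlin's actual settings, and the estimate ignores probe timeouts and scheduling jitter.

```python
import time


def probe_budget_seconds(initial_delay: float, period: float, failure_threshold: int) -> float:
    """Rough time until the last allowed probe failure.

    The first probe runs after `initial_delay`; each later probe runs one
    `period` after the previous one, so the final allowed failure happens
    (failure_threshold - 1) periods after the first probe.
    """
    return initial_delay + period * (failure_threshold - 1)


def timed_init(init_fn) -> float:
    """Run an initialization function and return how long it took, in seconds."""
    start = time.perf_counter()
    init_fn()
    return time.perf_counter() - start


def load_model() -> None:
    # Hypothetical stand-in for loading a large model artifact.
    time.sleep(0.05)


# Hypothetical probe settings, for illustration only.
budget = probe_budget_seconds(initial_delay=30, period=10, failure_threshold=3)
elapsed = timed_init(load_model)

if elapsed > budget:
    print(f"Initialization took {elapsed:.2f}s, longer than the {budget:.0f}s probe budget")
else:
    print(f"Initialization took {elapsed:.2f}s, within the {budget:.0f}s probe budget")
```

If initialization regularly approaches the budget, a large pre-trained model or expensive work during pyfunc initialization is a likely cause.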
Image not found
The "Image Not Found" error occurs when Merlin is unable to locate the specified container image. This can happen for various reasons, such as the image not being available in the specified registry, incorrect image name, or network issues preventing the image pull.
To troubleshoot and resolve the "Image Not Found" error, follow these steps:
Verify image name and tag
Check image registry
Test image pull manually
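To verify the image name and tag, it can help to split the image reference into its parts and check each one against the registry. The parser below is a simplified sketch and does not implement the full image reference grammar. The function name `parse_image_ref` and the registry `registry.example.com` are hypothetical.

```python
def parse_image_ref(ref: str) -> dict:
    """Split an image reference into registry, repository, tag, and digest."""
    digest = None
    if "@" in ref:
        # A digest, when present, follows '@' (for example '@sha256:...').
        ref, digest = ref.split("@", 1)

    tag = None
    # Only a ':' after the last '/' marks a tag; an earlier ':' is a registry port.
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):
        ref, tag = ref[:colon], ref[colon + 1:]

    registry = None
    repository = ref
    first, sep, rest = ref.partition("/")
    # By convention the first component is a registry host if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest

    return {"registry": registry, "repository": repository, "tag": tag, "digest": digest}


parts = parse_image_ref("registry.example.com:5000/team/model:v1")
print(parts)

# An image reference without a tag is pulled as 'latest' by Docker, which is a
# common source of pulls that fail or fetch an unexpected image.
if parse_image_ref("team/model")["tag"] is None:
    print("No tag given; the pull would default to 'latest'")
```

Comparing the parsed registry, repository, and tag against what actually exists in the registry narrows down whether the name, the tag, or network access is at fault.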