CaraML Docs
CaraML Homepage
  • Introduction
    • What is CaraML?
    • Architecture
      • Feature Store Architecture
      • Models Architecture
      • Routers Architecture
      • Experiments Architecture
      • Pipelines Architecture
    • Core Concepts
      • Models Concepts
      • Router Concepts
      • Experiment Concepts
  • User guides
    • Projects
      • Create a project
      • Managing secrets
    • Feature Store
    • Models
      • Create a Model
        • Custom Model
      • Deploy a Model
        • Deploying a Model Version
        • Severing a Model Version
        • Configuring Transformer
          • Standard Transformer
            • Standard Transformer Expressions
            • Standard Transformer UPI
          • Custom Transformer
        • Redeploying a Model Version
      • Deleting a Model
      • Configuring Alerts
      • Batch Prediction
      • Model Schema
      • Model Observability
    • Routers
      • Creating a Router
        • Configure general settings
        • Configure routes
        • Configure traffic rules
        • Configure autoscaling
        • Configure experiment engine
        • Configure enricher
        • Configure ensembler
        • Configure logging
      • Viewing Routers
        • Configuration
        • History
        • Logs
        • More actions
      • Edit Routers
      • Monitoring router
        • Monitor Router Performance
        • Configure Alerts
      • Undeploying Router
      • Redeploying Router
        • Redeploy undeployed router
        • Redeploy version from history
        • Redeploy version from version details page
      • Deleting Router
        • Deleting router versions
        • Deleting router versions from details page
        • Deleting routers
      • Deleting Emsemblers
        • Delete an Ensembler without related entity
        • Delete an Ensembler with active entities
        • Delete an Ensembler with inactive entities
    • Experiments
      • View Experiment Settings
      • Modify Experiment Settings
      • Creating Experiments
      • Viewing Experiments
      • Modifying Experiments
      • Running Experiments
      • Monitoring Experiments
      • Creating Treatments
      • Viewing Treatments
      • Modifying Treatments
      • Creating Segments
      • Viewing Segments
      • Modifying Segments
      • Creating Custom Segmenters
      • Viewing Custom Segmenters
      • Modifying Custom Segmenters
    • Pipelines
  • Tutorial and Examples
    • Model Sample Notebooks
      • Deploy Standard Models
      • Deploy PyFunc Model
      • Using Transformers
      • Run Batch Prediction Job
      • Others examples on Models
    • Router Examples
    • Feature Store Examples
    • Pipeline Examples
    • Performing load test in CaraML
    • Best practice for CaraML
  • CaraML SDK
    • Feature Store SDK
    • Models SDK
    • Routers SDK
    • Pipeline SDK
  • Troubleshooting and FAQs
    • CaraML System FAQ
    • Models FAQ
      • System Limitations
      • Troubleshooting Deployment Errors
      • E2E Test
    • Routers FAQ
    • Experiments FAQ
    • Feature Store FAQ
    • Pipelines FAQ
    • CaraML Error Messages
  • Deployment Guide
    • Deploying CaraML
      • Local Development
    • Monitoring and alerting
      • Configure a monitoring backend
      • Configure an alerting backend
    • Prerequisites and Dependencies
    • System Benchmark results
    • Experiment Treatment Service
  • Release Notes
    • CaraML Release Notes
Powered by GitBook
On this page
  • Troubleshooting views
  • Known Errors
  • OOMKilled
  • Liveness or readiness probe failures
  • Image not found
  1. Troubleshooting and FAQs
  2. Models FAQ

Troubleshooting Deployment Errors

PreviousSystem LimitationsNextE2E Test

Last updated 1 year ago

This page discusses scenarios you may encounter during model deployment that will require troubleshooting, including:

  • Image building errors

  • Deployment errors

Troubleshooting views

Merlin provides the following views on the UI to troubleshoot a deployment:

  • Logs - the live console output when the image is building or the deployment is running

  • History - the list of deployment history status and message

You can navigate to these views from the Model Version page by clicking on the Logs tab or History tab.

Known Errors

OOMKilled

The "OOMKilled" error occurs when a container is terminated due to out-of-memory conditions. This typically happens when a container exceeds its allocated memory limit and the system is unable to provide additional memory. When this occurs, the container will be killed with exit code 137 to free up resources.

This error can affect both image building and deployment steps. To resolve the OOMKilled error, follow these steps:

  1. Check which components that got OOMKilled

  2. Check affected component memory limits

  3. Monitor memory usage

  4. Optimize model memory usage

  5. Adjust memory limits

Liveness or readiness probe failures

Liveness and readiness probes are essential for ensuring the health and availability of Model services. The liveness probe is used to determine if a model is initialized and running properly, while the readiness probe indicates if a model is ready to serve traffic. When these probes fail, it can lead to service disruptions and impact the overall stability of the application.

Troubleshooting steps:

  1. For standard model type, check pre-trained model size

  2. For pyfunc model type, check how model was initialized

  3. Inspect model logs

  4. Monitor resource utilization

Image not found

The "Image Not Found" error occurs when Merlin is unable to locate the specified container image. This can happen for various reasons, such as the image not being available in the specified registry, incorrect image name, or network issues preventing the image pull.

To troubleshoot and resolve the "Image Not Found" error, follow these steps:

  1. Verify image name and tag

  2. Check image registry

  3. Test image pull manually

Model Version's Logs & History tabs
Failed image building due to OOMKilled
Model service in crash loop backoff state
Image not found