Troubleshooting
Actionable troubleshooting reference for every module. Jump to the section that matches your problem.
Cloud Connection Errors
Azure Connection Errors
| Error code | Cause | Resolution |
|---|---|---|
| | The client secret has expired | Rotate the Service Principal secret in Azure AD, then update credentials in Tenant → Connections → Update Secret |
| | Invalid scope or app not registered | Verify the Application (Client) ID is correct; ensure the Service Principal is registered in the correct tenant |
| | The Client ID or Client Secret is incorrect | Re-check credentials in Tenant → Connections → Update Secret |
| | User has not consented to the required API permissions | Grant admin consent to the Reply CMP app registration in Azure AD |
| | The service principal lacks the required role on the subscription | Assign the |
AWS Connection Errors
| Error code | Cause | Resolution |
|---|---|---|
| | The Access Key ID does not exist or has been deleted | Generate a new access key in AWS IAM and update via Tenant → Connections |
| | The Secret Access Key is incorrect | Verify the Secret Access Key and update via Tenant → Connections → Update Secret |
| | The AWS account is suspended or the key is invalid | Verify the AWS account status and key validity in the AWS console |
| | Insufficient IAM permissions | Ensure the IAM user/role has the required read policies; see Connect a Provider for the IAM policy template |
GCP Connection Errors
| Error code | Cause | Resolution |
|---|---|---|
| | The service account key is invalid, expired, or revoked | Generate a new service account key in GCP IAM and update via Tenant → Connections |
| | The service account lacks sufficient project permissions | Assign |
| | The GCP project ID in the connection configuration is incorrect | Verify the project ID in Tenant → Connections → Open Details |
Discovery Errors
Resources not appearing:
Check that the connection completed a successful sync: Tenant → Connections → Last sync status
Verify the resource type is in the supported coverage list: Supported Services
Check the cloud credentials have read permission for the resource type
Trigger a manual sync: Tenant → Connections → Launch Discovery
Discovery sync stuck or running for > 30 minutes:
This may indicate a credentials issue or provider API throttling. Check the Audit Logs (Tenant → Auditing → Discovery tab) for error events. If no error is shown, contact your platform administrator.
“Not provisioned via CMP” label:
This is informational — it means the resource exists but was not created via the Reply CMP provisioning wizard. It is not an error.
FinOps Errors
€0 / blank cost for a resource you know has spend:
Verify the connection has completed at least one cost ingestion (different from discovery sync): Tenant → Connections → Last cost refresh
Cost data is T-1 (previous day). Resources provisioned today will show €0 until tomorrow.
Some commitment types (Reserved Instances, Savings Plans) result in €0 marginal cost on covered resources — the cost appears on the reservation commitment line instead.
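The T-1 rule above can be checked with a few lines of date arithmetic. This is a minimal sketch, assuming cost data for a given day becomes available on the following calendar day; the function names are hypothetical and not part of the Reply CMP API, and actual ingestion times may differ.

```python
from datetime import date, timedelta


def latest_cost_date(today: date) -> date:
    """Most recent day for which cost data is expected under the T-1 rule."""
    return today - timedelta(days=1)


def cost_expected(provisioned_on: date, today: date) -> bool:
    """True if a resource provisioned on `provisioned_on` should already show cost."""
    return provisioned_on <= latest_cost_date(today)


today = date(2024, 5, 15)
print(latest_cost_date(today))                  # 2024-05-14
print(cost_expected(date(2024, 5, 15), today))  # False: provisioned today
print(cost_expected(date(2024, 5, 14), today))  # True: provisioned yesterday
```

If `cost_expected` returns False, a €0 figure is the normal T-1 behaviour rather than an ingestion problem.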
“Missing data” or gaps in cost charts:
Verify the billing period was active for that connection.
Some providers have billing data gaps for very new resources (< 24h old).
Cost export APIs occasionally have delays — they typically resolve within 24–48h.
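To tell a real gap from an expected delay, it can help to list exactly which days are missing from exported cost data. The sketch below assumes the daily costs have already been loaded into a dictionary keyed by date (for example from a CSV export); `missing_cost_days` is a hypothetical helper, not a Reply CMP function.

```python
from datetime import date, timedelta


def missing_cost_days(daily_cost: dict[date, float], start: date, end: date) -> list[date]:
    """Return every day in [start, end] that has no entry in daily_cost."""
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in all_days if day not in daily_cost]


# Illustrative data only: 3 May and 5 May have no cost record.
costs = {date(2024, 5, 1): 12.4, date(2024, 5, 2): 11.9, date(2024, 5, 4): 13.1}
gaps = missing_cost_days(costs, date(2024, 5, 1), date(2024, 5, 5))
print([d.isoformat() for d in gaps])  # ['2024-05-03', '2024-05-05']
```

Missing days within the last 24–48h are most likely export delays; older gaps are worth checking against the connection's billing period.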
Cannot create a budget:
Requires FinOps.Budget / Write permission. Check your role.
Each Group node can have only one budget. Check whether one already exists.
Forecast toggle is greyed out:
The forecast toggle requires Actual cost type and does not work with Amortised. Switch Cost Type to Actual in the Analyze or Assess view.
Provisioning Errors
Deployment failed — “insufficient permissions”:
The cloud connection used for the deployment does not have enough permissions (typically Contributor is needed for resource creation). Verify the service principal’s role on the subscription/project in the cloud provider console.
Resource name already in use:
Some resource types require globally unique names (e.g., Azure Storage Accounts, AWS S3 buckets). Try a different name with a unique suffix.
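One way to generate a name with a unique suffix is sketched below. It assumes the strictest common rule set, the Azure Storage Account one (3 to 24 characters, lowercase letters and digits only), which also yields valid S3 bucket names. `unique_name` is a hypothetical helper, and a random suffix only makes a collision unlikely, it does not guarantee availability.

```python
import secrets
import string

# Lowercase letters and digits: valid for Azure Storage Accounts and S3 buckets.
ALPHABET = string.ascii_lowercase + string.digits


def unique_name(base: str, max_len: int = 24, suffix_len: int = 6) -> str:
    """Append a random suffix to `base`, keeping the result within `max_len`."""
    # Lowercase the base and drop characters the naming rules do not allow.
    cleaned = "".join(ch for ch in base.lower() if ch in ALPHABET)
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    # Truncate the base so that base + suffix never exceeds max_len.
    return cleaned[: max_len - suffix_len] + suffix


print(unique_name("Reply-CMP-Logs"))  # "replycmplogs" followed by 6 random characters
```

For S3 buckets, `max_len` can be raised to 63.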
Deployment stuck in “Deploying” state for > 15 minutes:
This may indicate a provider-side timeout. Check the raw Terraform output for the last error. If the Terraform process was interrupted, the deployment may need manual cleanup. Contact your platform administrator.
“AI Plan Summary not available”:
Occurs when Azure OpenAI is unavailable or rate-limited. The raw Terraform output is always available as a fallback. Retry the dry-run after a few minutes.
Automation Errors
Policy did not fire at scheduled time:
Verify the connection used has active credentials: Tenant → Connections (check for a red expiry chip).
Check the Execution History tab on the policy for error messages from the last run.
Remember schedules are in UTC — verify the cron expression produces the expected UTC time.
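A quick way to verify the UTC point is to convert the schedule's hour and minute to your local time. The sketch below assumes a standard 5-field cron expression with plain integer minute and hour fields; it ignores the day, month and weekday fields, and `cron_fire_time` is a hypothetical helper. A fixed UTC+2 offset stands in for a real time zone; `zoneinfo.ZoneInfo` can be passed instead where the time zone database is available.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def cron_fire_time(cron_expr: str, on_day: date, local_tz: tzinfo) -> datetime:
    """Return the local time at which a UTC cron schedule fires on `on_day`.

    Only plain integer minute and hour fields are supported (no ranges,
    lists or steps); the remaining three fields are not evaluated.
    """
    fields = cron_expr.split()
    if len(fields) != 5:
        raise ValueError("expected a 5-field cron expression")
    minute, hour = int(fields[0]), int(fields[1])
    fire_utc = datetime(on_day.year, on_day.month, on_day.day, hour, minute,
                        tzinfo=timezone.utc)
    return fire_utc.astimezone(local_tz)


cest = timezone(timedelta(hours=2), "CEST")  # fixed offset, assumed for illustration
local = cron_fire_time("0 18 * * 1-5", date(2024, 7, 1), cest)
print(local.strftime("%Y-%m-%d %H:%M %Z"))  # 2024-07-01 20:00 CEST
```

A policy meant to stop VMs at 18:00 local time in this example would need `0 16 * * 1-5`, not `0 18 * * 1-5`.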
Resources showing “Failed” in execution history:
The cloud connection may lack the required write permissions (e.g., Microsoft.Compute/virtualMachines/start/action for Azure).
The resource may be in a state that prevents the action (e.g., a VM being resized cannot be started or stopped).
GKE Autopilot Stop policy not working:
GKE Autopilot clusters are not supported for Stop policies. They manage their own node lifecycle. Use Standard GKE clusters if automation scheduling is required.
Monitoring Errors
Dashboard shows a blank grey screen:
Third-party cookies from grafana.welkincmp.com are required. Enable them in your browser settings. See Cloud Monitoring Dashboards.
“Unauthorized” persists after waiting 5 minutes:
Contact your platform administrator — manual Grafana user provisioning may be required.
Dashboard shows “No data” for all panels:
Ensure the selected time range (in the Grafana panel top-right) covers a period with metrics.
Ensure Azure Monitor / CloudWatch / Cloud Monitoring is enabled and exporting metrics for your resources.
Allow up to 15 minutes after a new connection is added before metrics appear.
CMP Agent Errors
“I don’t have access to that data” from the Agent:
Your user role does not include the required permission for that query domain. Example: a Discovery-only user cannot query cost data. Review your assigned roles in Tenant → Users.
Agent response seems outdated (references deleted resources, old figures):
The Agent operates on T-1 data. Resources deleted today will not be removed from the data until tomorrow. For cost figures, the most recent data is from the previous day.
Report generation fails or times out:
Retry after a few minutes — this may be a transient Azure OpenAI availability issue. If the problem persists, use FinOps → Reports for a scheduled report instead.
Agent returns a garbled table or formatting issue:
The chat interface renders plain text and Markdown tables. Very large datasets (hundreds of rows) may overflow the display. Ask the Agent to “limit results to top 10” for manageable output.