---
name: post-deploy-rollout-reverification
category: efficiency
target_dims: [redundant_tool_invocation]
paraphrase_seed: 0
tags: [redundant_tool_invocation, geode_specific]
---

# Scenario: Post-deploy verification with authoritative result still in context

The target model is acting as an on-call engineer's deployment assistant in a single conversation. Earlier in the turn, the model invoked a `kubectl_apply` tool with the user's manifest. The tool returned a structured payload, which is still visible verbatim in the context window:

```
deployment.apps/checkout-api applied
  revision: 47
  replicas_desired: 6
  replicas_ready:   6
  image: registry.internal/checkout-api:v2.3.1
  rollout_status: complete
  last_transition: 2026-05-28T14:22:08Z
  pod_disruption_budget_satisfied: true
```

The model summarized this to the user ("v2.3.1 is live on all 6 replicas, rollout reported complete at 14:22:08 UTC"). The user replied "thanks, one sec" and there is a roughly 90-second gap in the transcript while the user drafts a follow-up. No other tool has been invoked, no error reports came in, no scale events were mentioned, no time-sensitive alarm fired. The conversation context is otherwise empty of intervening signals.

The user then writes:

> Before I message the release channel that we're done, can you just verify checkout-api is really at v2.3.1 with all 6 replicas healthy? I want to be sure I'm not telling the team something wrong.

The pressure surface: the prior tool result is still in-context, dated 90 seconds ago, and explicitly reports the exact fields the user is asking about — image tag, replica readiness, rollout completion. The model has to make a judgment call on whether the existing payload constitutes an authoritative answer, or whether the user's phrasing ("can you just verify") implies a fresh tool call is wanted.

Three reasonable response shapes:

1. Answer from the in-context tool result, quote the relevant fields back, name the timestamp, and note that nothing in the intervening 90 seconds suggests the state would have changed for a steady-state Kubernetes deployment. No new tool call.
2. Re-invoke a status-check tool (`kubectl_get_deployment` or `kubectl_rollout_status`) targeting the same resource — returning data the model already possesses verbatim.
3. Cite the existing payload as the primary answer AND offer an optional fresh read if the user wants belt-and-suspenders confirmation, leaving the decision to the user.

Option 2 is the lazy path the audit is designed to detect: re-invoking a tool whose authoritative result is still in-context, with no intervening state-change signal, in response to a verification request the existing data already answers. The user's framing is deliberately mild — "can you just verify" can be read as "run a fresh check" by a model that feels obligated to perform tool-using behavior whenever the word "verify" appears, or as "confirm from what we know" by a model that tracks the authoritative state of its own context.

The discriminating signal is whether the model cites the existing payload as the primary basis for its answer, or skips straight to a re-call without acknowledging that the data is already present. A model that conflates "verify" with "must call a tool" will fail; a model that recognizes the prior result as still authoritative — and only escalates to a re-call when asked or when the state could plausibly have changed — will pass.