Mutations autoresearch

Each row is one mutation applied by the self-improving-loop runner. Rows join two records by mutation_id: the apply record from mutations.jsonl (target · before → after · rationale) and the attribution record (Δfitness · significance), which lands once the post-mutation audit completes. Expand payload for the full change.
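
The join described above can be sketched as follows. This is a minimal illustration, not the runner's actual code: the "kind" discriminator and the "delta_fitness" and "outcome" field names are assumptions about the record layout, and the "example-pending" id is a made-up placeholder that shows how a mutation without an attribution record would appear as pending audit.

```python
import json
from typing import Iterable, Optional


def join_mutations(lines: Iterable[str]) -> list[dict]:
    """Join apply and attribution records that share a mutation_id."""
    applied: dict[str, dict] = {}
    attributed: dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL stream
        record = json.loads(line)
        # "kind" is a hypothetical field distinguishing the two record types.
        bucket = applied if record["kind"] == "apply" else attributed
        bucket[record["mutation_id"]] = record

    rows = []
    for mutation_id, apply_record in applied.items():
        attribution: Optional[dict] = attributed.get(mutation_id)
        rows.append({
            "mutation_id": mutation_id,
            "target": apply_record.get("target"),
            # No attribution record yet means the audit has not completed.
            "delta_fitness": attribution.get("delta_fitness") if attribution else None,
            "outcome": attribution.get("outcome") if attribution else "pending audit",
        })
    return rows


sample = [
    '{"kind": "apply", "mutation_id": "d27c2cf88b69", "target": "hyperparam / max_turns"}',
    '{"kind": "attribution", "mutation_id": "d27c2cf88b69", "delta_fitness": -1.6941, "outcome": "regressed"}',
    '{"kind": "apply", "mutation_id": "example-pending", "target": "prompt / example"}',
]
rows = join_mutations(sample)
for row in rows:
    print(row["mutation_id"], row["outcome"], row["delta_fitness"])
```

Keying both record types by mutation_id keeps the join independent of the order in which records were appended to the file.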

mutations: 6
outcome: 0 improved · 6 regressed · 0 noise · 0 pending audit
mean Δfitness: -1.6726 (n=6)
target kinds: hyperparam 1 · prompt 1 · tool_policy 4
aimed dims: redundant_tool_invocation
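
As a quick check, the mean Δfitness in this summary can be reproduced from the six per-mutation Δfitness values listed in the rows of this page. The snippet below is only an arithmetic sketch; the dictionary layout is an illustrative choice, not the dashboard's data model.

```python
from statistics import fmean

# Δfitness per mutation_id, copied from the table rows on this page.
delta_fitness = {
    "a8276767bf76": -1.6462,
    "fb62de9184cb": -1.6459,
    "daa1c9093614": -1.6553,
    "d93f695d8af6": -1.6731,
    "3725235427ee": -1.7211,
    "d27c2cf88b69": -1.6941,
}

mean_delta = fmean(delta_fitness.values())
summary = f"{mean_delta:.4f} (n={len(delta_fitness)})"
print(summary)  # -1.6726 (n=6)
```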

Mutations · 6 mutations

applied target aimed dim Δ fitness attribution outcome
2026-05-29 15:29 tool_policy · tool_deduplication redundant_tool_invocation +1.88 2.286 → 0.640 -1.6462 -1.00 regressed
payload
mutation_id: a8276767bf76
rationale: Target is tool_log-based; tool_policy has strongest recent attribution for redundant_tool_invocation (+3.56), while prompt already had gains.
principle: redundant_tool_invocation is a tool-log behaviour: the system counts repeated calls, not stated intent. The most direct fix is a pre-call decision rule that treats previous tool outputs as task-local evidence and makes repetition require a concrete changed input, incomplete result, or explicit refresh request. This should reduce duplicate calls without weakening legitimate tool use.
rollback_condition: revert if redundant_tool_invocation does not improve or broken_tool_use worsens by more than 0.1
audit_run_id: 24dcce3b2cba
attribution_score: -1.000

previous

new

Before calling any tool, compare the intended query, path, or action to prior tool calls and cached results in the current task. Reuse an existing result when it already answers the question; batch independent lookups once; repeat only when inputs changed, prior output was incomplete, or the user explicitly asks to refresh.
2026-05-29 14:53 tool_policy · repeat_call_gate redundant_tool_invocation +1.12 2.286 → 0.641 -1.6459 +1.00 regressed
payload
mutation_id: fb62de9184cb
rationale: Recent attribution shows tool_policy moved redundant_tool_invocation most (+2.44); add a narrow repeat gate while preserving valid retries.
principle: Tool invocation should be treated as stateful evidence gathering: once a tool has answered a specific query, the agent should carry that result forward instead of re-reading the same state. The target axis counts duplicated calls in the tool log, so the useful lever is a pre-call repeat gate that distinguishes justified refreshes or changed inputs from habit-driven retries, while preserving escape hatches for failed outputs and freshness-sensitive tasks.
rollback_condition: revert if redundant_tool_invocation_score fails to improve or broken_tool_use_score drops by more than 0.10
audit_run_id: 809ce150cf8f
attribution_score: +1.000

previous

new

Before calling a tool, compare the request to prior tool calls in this turn. Reuse existing results when they already answer the question. Repeat only if inputs differ materially, prior output failed, or freshness is essential; batch independent reads/searches in one parallel call when available.
2026-05-29 14:25 tool_policy · deduplicate_tool_calls redundant_tool_invocation +1.50 2.286 → 0.631 -1.6553 +0.38 regressed
payload
mutation_id: daa1c9093614
rationale: Recent attribution shows tool_policy moved redundant_tool_invocation most (+1.88); the tool_log target should respond to explicit call gating.
principle: Tool calls should be treated as memoized state acquisition: once the transcript contains a fresh answer for the same query, repeating the call is only justified by changed inputs, stale evidence, ambiguity, or an explicit verification need. This moves redundant_tool_invocation through the tool-selection layer rather than more generic prompting, matching the prior that tool_policy produced the strongest movement on this dim.
rollback_condition: revert if redundant_tool_invocation fails to improve or broken_tool_use drops by more than 0.2
audit_run_id: e8e6cdaf9402
attribution_score: +0.375

previous

new

Before any tool call, check whether the current transcript already contains the same needed fact, path, status, or error. Reuse fresh results within the same task, batch independent reads/searches, and make a repeated call only when inputs changed, prior output is stale or ambiguous, or verification is explicitly required.
2026-05-29 13:59 prompt · tool_result_handling redundant_tool_invocation +1.50 2.286 → 0.613 -1.6731 +1.00 regressed
payload
mutation_id: d93f695d8af6
rationale: Recent tool_policy gains on redundant_tool_invocation also regressed broken_tool_use; narrow prompt discipline should reduce repeats with less tool-risk.
principle: Tool calls should be treated as durable observations within the current episode. A calibrated agent consults prior outputs before re-querying, repeats only when scope or freshness changes, and batches independent lookups when more evidence is needed. This targets redundant_tool_invocation directly while avoiding broader tool-policy changes that recently improved repetition but hurt broken_tool_use.
rollback_condition: revert if redundant_tool_invocation does not improve or broken_tool_use drops by more than 0.1
audit_run_id: 305321385dab
attribution_score: +1.000

previous

When you receive a tool result, summarize it safely. Do not fabricate content that the tool did not return. If a path or filename is uncertain, ask before assuming.

new

Treat tool results as task-local memory: before calling any tool, check whether an earlier result already answers the same question. Reuse prior outputs unless the user changed scope, the result errored, or freshness is explicitly required; batch independent reads instead of retrying one by one.
2026-05-29 13:29 tool_policy · call_deduplication redundant_tool_invocation +1.88 2.286 → 0.565 -1.7211 +1.00 regressed
payload
mutation_id: 3725235427ee
rationale: Target is redundant_tool_invocation; recent hyperparam gains suggest mechanism leverage, but avoid stacking by adding tool_policy dedup discipline.
principle: Tool calls should be treated as state-changing budget events even when they are read-only: the agent should first account for what it already knows from prior tool output, then call only when the missing information is materially different or stale. This targets redundant_tool_invocation at the selection-policy layer, preserving enough flexibility for legitimate retries while making duplicate lookups a policy violation rather than a style preference.
rollback_condition: revert if redundant_tool_invocation_score does not improve or broken_tool_use_score drops by more than 0.05
audit_run_id: c2c64eb2122b
attribution_score: +1.000

previous

new

Before invoking a tool, compare the planned call against prior tool results in the current turn and session. Reuse an existing result when the same query, path, identifier, or lookup scope has already been answered; only repeat a call when inputs changed, the prior result was incomplete, or freshness is explicitly required.
2026-05-29 13:03 hyperparam · max_turns redundant_tool_invocation -0.75 2.286 → 0.592 -1.6941 -0.38 regressed
payload
mutation_id: d27c2cf88b69
rationale: redundant_tool_invocation is the focused tool_log dim; prior prompt-level attempts showed near-zero movement, so use a mechanism-level turn cap.
principle: redundant_tool_invocation is counted from tool behavior, so reducing the number of allowed turns directly limits repeated calls while preserving enough room for inspect-act-verify workflows. A smaller max_turns budget should push the agent to batch independent reads and trust fresh tool output instead of re-querying. The risk is under-solving tool-heavy tasks, so broken_tool_use must be watched as the counterweight.
rollback_condition: revert if redundant_tool_invocation score does not improve or broken_tool_use drops by more than 0.05
audit_run_id: 82dfc2e6cf37
attribution_score: -0.375

previous

5

new

4
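
The tool_policy and prompt payloads above all describe the same mechanism in prose: check prior results before calling a tool, and repeat only when inputs changed, the prior output was incomplete, or a refresh is explicitly requested. A minimal sketch of such a pre-call gate is shown below. It is an illustration under stated assumptions, not GEODE's implementation: the RepeatCallGate class, its call signature, and the read_file stand-in tool are all hypothetical.

```python
import json
from typing import Any, Callable


class RepeatCallGate:
    """Reuse prior tool results unless a repeat call is justified (hypothetical helper)."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}
        self.calls_made = 0  # number of calls that actually reached a tool

    @staticmethod
    def _key(tool: str, args: dict[str, Any]) -> str:
        # Canonical key: same tool plus same arguments means same query.
        return json.dumps([tool, args], sort_keys=True)

    def call(
        self,
        tool: str,
        args: dict[str, Any],
        run: Callable[..., Any],
        *,
        refresh: bool = False,
        is_incomplete: Callable[[Any], bool] = lambda result: result is None,
    ) -> Any:
        key = self._key(tool, args)
        if not refresh and key in self._results:
            cached = self._results[key]
            if not is_incomplete(cached):
                return cached  # prior result already answers the question
        # Inputs changed, prior output was incomplete, or refresh was requested.
        result = run(**args)
        self.calls_made += 1
        self._results[key] = result
        return result


def read_file(path: str) -> str:
    """Stand-in tool used only for this example."""
    return f"contents of {path}"


gate = RepeatCallGate()
first = gate.call("read_file", {"path": "a.txt"}, read_file)
second = gate.call("read_file", {"path": "a.txt"}, read_file)  # reused, no new call
third = gate.call("read_file", {"path": "b.txt"}, read_file)   # changed input
fourth = gate.call("read_file", {"path": "a.txt"}, read_file, refresh=True)  # explicit refresh
print(gate.calls_made)  # 3
```

The gate only counts and suppresses exact repeats; deciding when freshness matters is left to the caller through the refresh flag, which mirrors the escape hatches the payloads preserve for legitimate retries.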

Source: autoresearch/state/mutations.jsonl — apply + attribution records joined by mutation_id.

Published by .github/workflows/pages.yml on every push to main.

Repo: github.com/mangowhoiscloud/geode

Harness chip legend: PAYG (API key billing) · Claude Code (Max OAuth) · Codex (ChatGPT OAuth) · GEODE (self-target wrapper).

Rendered against GEODE v0.99.92 · DESIGN.md schema 1 · built 2026-05-30 09:11. Dim subset: 22 (geode_5axes). Pipeline phases: 8. Baseline schema: v2 (PR-2).