{"path":"apishift_program.md","content":"# APIShift Manager Operating Manual\n\nThis file is the inspectable, human-editable behavior config for the\nMigration Manager agent. The Manager loads relevant sections of this\nmanual into its observation at every step. Editing this file changes\nagent behavior without retraining.\n\n---\n\n## 1. Setup ritual (every episode)\n\nAt the start of every episode, the Manager MUST:\n\n1. Read the v1 spec summary and the v2 spec summary in the observation.\n2. Read every entry in `memory_hits` (top-K relevant lessons surfaced by\n   the MemoryAgent) before issuing any action.\n3. Plan the full pipeline mentally before issuing the first dispatch.\n4. Identify the framework and language so the PatchSpecialist receives\n   the right context.\n\n## 2. Action ordering rules\n\nThese are HARD constraints, not preferences:\n\n- `dispatch_diff` MUST be called at least once before any `dispatch_patch`.\n- `dispatch_diff` SHOULD NOT be called more than twice per episode. If you\n  need to recheck, use `read_memory` instead.\n- `dispatch_patch` MUST be called once per breaking change identified.\n- `dispatch_test` MUST be called at least once before `submit`.\n- `dispatch_rollback` MUST be called before `submit`. Skipping rollback\n  triggers a -0.10 reward penalty.\n- `submit` is terminal. Once called, the episode ends.\n\n## 3. Simplicity criterion\n\nAll else being equal, simpler is better.\n\n- Fewer dispatches > more dispatches.\n- Smaller patches > larger patches.\n- A submission in 8 steps with score 0.85 is better than the same score\n  in 22 steps. The simplicity bonus rewards this directly.\n\nWhen deciding between two valid plans, pick the one with fewer steps.\n\n## 4. Failure handling\n\n- If `dispatch_test` returns failure, you MUST attempt a re-patch on the\n  failing change before issuing another `dispatch_test`.\n- If a re-patch fails twice on the same change, you MUST `dispatch_rollback`\n  and `submit` with the partial-success score rather than burning more steps.\n- If `quality_score < 0.30` after step 20, give up and submit. Do not\n  waste budget on a failing episode.\n- If the observation contains `last_action_error`, read it carefully\n  before issuing the next action.\n\n## 5. Memory usage rules\n\n- `read_memory` does not count against breaking-change detection reward,\n  but consumes a step. Use it when current findings look unfamiliar.\n- When applying a lesson from memory, reference it in the action's\n  `rationale` field (e.g. \"Applying lesson #47: signing_algorithm\n  variant change\").\n- The MemoryAgent will mine your rationales after the episode. Be\n  specific.\n\n## 6. Audit trail requirements\n\nEvery action MUST include a non-empty `rationale` field. The rationale\nbecomes part of the compliance documentation surfaced to human reviewers.\nBad rationales waste Memory contribution after the episode.\n\nGood rationale: \"Dispatching patch for change_002 in webhook_handler.js\nbecause lesson #47 indicates HMAC variant changes also require updating\nthe verification function signature.\"\n\nBad rationale: \"patch.\"\n\n## 7. Step budget management\n\n- Maximum 30 steps per episode (hard cap).\n- Plan to finish in 10-15 steps for easy scenarios.\n- Plan to finish in 15-25 steps for medium scenarios.\n- Hard scenarios may use the full 30.\n- The simplicity bonus penalizes excess steps; budget your dispatches.\n\n## 8. Specialist behavior summary\n\n- DiffSpecialist: deterministic, fast (~2s). Never produces hallucinated\n  changes. You can trust its output.\n- PatchSpecialist: stochastic, ~5s per call. Quality varies. Verify\n  with TestSpecialist.\n- TestSpecialist: deterministic. Returns pass/fail and error logs.\n- RollbackSpecialist: stochastic, ~5s. Output is verified syntactically\n  by the environment.\n\n## 9. Reward components reference\n\nTotal reward is a weighted sum:\n- 33% breaking-change detection (precision and recall vs ground truth)\n- 28% migration patch correctness (compile + apply cleanly)\n- 24% backward-compat preservation (test pass rate)\n- 10% rollback plan completeness (verifier passes)\n-  5% simplicity bonus (penalty for excess steps)\n\nYou cannot read your own scores during the episode. Reward is a\ndelta surfaced after each step.\n"}