Scripted-bot softlock root cause: client-stocked SELECT_SKILL_URI /
SLIDE_OBJECT_URI hand emits (e.g. target selection on unit play / leader
attack) arrive as SIO BinaryEvent("hand", ...) with an ack-id. Our
DispatchSocketIo only had cases for "msg" and "alive" — "hand" fell to
the default Debug-drop with no SIO ack going back. Client's
stockEmitMessageMgr (RealTimeNetworkAgent.cs:1463) blocks subsequent
emits until the previous one is acked, so all follow-up PlayActions /
TurnEndActions / TurnEnd frames were stocked but never transmitted. The
loader hooks at EmitMsg (intent) not the socket layer, which is why
battle-traffic.ndjson shows the frames as sent while the server never
received them. ~10s later the client gives up and aborts the WS.
Wire-shape proof from data_dumps/captures/logs/websocket_output.txt:
line 619: [sio-in] uri=TurnStart pubSeq=17 ackId=16 ... (T3 start)
line 689: [ws-rx-text] preview=451-26["hand", {...}] ← unhandled
line 691: [ws-rx-bin] binLen=58 pendingFrame=hand
(no further [sio-in] entries — server received nothing else)
line 709: [ws-recv-exit] reason=OperationCanceled wsState=Aborted
New HandleHandEventAsync (RealParticipant.cs):
- Fire-and-forget hand frames (no ack-id; TOUCH_URI / SELECT_OBJECT_URI /
TURN_END_READY_URI) are silently swallowed — no queue-blocking risk
- Stocked hand frames decode the binary attachment via the same
msgpack-string + NodeCrypto.Decrypt pipeline as HandleMsgEventAsync,
parse the JSON, extract top-level "pubSeq", and SendSioAckAsync with
that pubSeq as the ack arg (matches what stockEmitMessageMgr.GetSelectData
expects to look up)
- Body shape is {"StockHandData":[uri_int, viewerId, udid, ...params,
pubSeq], "try":0, "pubSeq":N} — NOT a MsgEnvelope (no top-level "uri"),
so we can't reuse HandleMsgEventAsync as-is
- Missing-pubSeq fallback acks with arg=0 (rare path, logged at Warning)
so we never softlock from a malformed body
WireConstants gets the HandEvent = "hand" constant for the dispatch case.
In scripted/Bot mode the ack-only handler is correct (no opponent to
forward touches to). PvP-side forwarding semantics are unverified — see
docs/audits/battle-node-sio-events-2026-06-02.md (outer repo) for the
full event inventory and remaining gaps.
Tests:
- RealParticipantHandEventTests covers the three paths: stocked-with-ack,
fire-and-forget (no ack expected), missing-pubSeq fallback (arg=0). Each
drives a real hand frame through RunAsync via TestWebSocket and asserts
the SIO ack frame shape (43<ackId>[<arg>]) in outbound sends.
- 175 battle-node tests passing (was 172; +3 new). Full suite green.
Diagnostic logs ([sio-in] / [sio-out] / [ws-rx-text] / [ws-rx-bin] /
[ws-recv-exit] / [ws-loop-exit]) are left in place for one verification
cycle. After a live re-run confirms the fix, they should be stripped per
the audit doc's recommended-order step 2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PvP waiting-room timeout path in BattleNodeWebSocketHandler used to
return immediately after RemovePending, leaving the parked first arriver
to learn about the disconnect via TCP teardown after Kestrel finished
draining the request. BestHTTP / socket.io-client log that as an abrupt
drop rather than a controlled disconnect.
New TryPoliteCloseAsync helper emits an EIO "1" (Close) text frame, then
runs the WebSocket close handshake with NormalClosure. Wrapped in
try/catch + Debug log — teardown races between the server-side close and
client disconnect are routine and not actionable. Uses a fresh 5s CTS so
ctx.RequestAborted being canceled doesn't skip the close.
Wired into both bail-out paths post-AcceptWebSocketAsync that previously
just returned:
- PvP waiting-room timeout / Park-Park race (the main case, per PLAN.md
L104 (c))
- Unknown BattleType default case (same shape, log message already said
"closing WS" but didn't actually close — opportunistic fix)
PvpWaitingRoomTimeout integration test tightened: now asserts the polite
"1" text frame arrives before the close handshake, not just that the WS
eventually closes by any means.
172 battle-node tests passing (was 172 before the assertion tightening;
the existing timeout test stayed in.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes audit Md11. BattleSession.RunAsync now clears each
RealParticipant.Outbound archive immediately before the TerminateAsync
cascade, releasing the heavy dict the moment the battle ends instead of
waiting for the participant to be GC'd. Bots (NoOp / Scripted) don't
expose an OutboundSequencer, so the 'p is RealParticipant rp' conditional
cast is the natural filter.
Tests: 1 new BattleSessionTerminateCascadeTests — pre-load the archive,
drive RunAsync to completion via TestWebSocket.CompleteIncoming, assert
the archive is empty. Suite: 939 → 948.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit Md11 (part 2 of 2). Adds an explicit Clear() so BattleSession can
release the archive at battle-end instead of waiting for the participant to
be GC'd. _next is intentionally NOT reset — a post-Clear emit is a bug per
the design, but the seq stream must stay monotonic if it does happen.
Tests cover empty archive after Clear, _next preservation across Clear,
and Clear-on-empty no-op. The BattleSession integration that calls Clear
lands in the next commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit Md11 (part 1 of 2). Replace unbounded HashSet<long> _seen with a
WindowSize=256 ring (HashSet + Queue, LRU eviction). The stale-below-window
guard (pubSeq <= HighWaterMark - WindowSize) prevents window eviction from
re-admitting old seqs as novel — the load-bearing invariant.
pubSeq is client-monotonic and SIO retransmit horizons are seconds-scale, so
256 covers realistic retries by a wide margin. HighWaterMark semantics
preserved (Gungnir still reports it).
Tests: 5 new InboundTrackerTests covering below-window guard, evicted-seq
rejection, within-window dedup after eviction, memory bound, and watermark
monotonicity.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Live-smoke bug 2026-06-02: queued Bloodcraft (deck #5), wire showed
classId=2 (Swordcraft) for self_info on the /ai_unlimited_rank_battle/start
response — client rendered the wrong leader.
Two layers of the same bug:
1. MatchContextBuilder.BuildForRankBattleAsync hardcoded deckNo=1 instead
of taking it from the do_matching request — verified against
data_dumps/captures/traffic.ndjson L17 where deck_no=5 was on the wire.
Signature changes to (viewerId, format, deckNo); DoMatchingInternal
passes req.DeckNo.
2. AiStartInternal rebuilt MatchContext from scratch — but the /ai_*/start
request body is BaseRequest only, no deck_no on the wire. The fix uses
the MatchContext the bridge already stored at do_matching resolution time
(in the Bot PendingBattle), so deck/cosmetic data is consistent end-to-end.
New IBattleSessionStore.TryFindPendingForViewer(viewerId) finds the
viewer's pending battle for lookup. The store entry persists across
ai_start (idempotent reads are fine — the WS handler removes on connect).
No-pending sentinel: ai_id=-1 surfaces the "no AI assigned" error in the
client.
Tests: 936 → 939 passing.
- MatchContextBuilderTests.BuildForRankBattle_uses_the_caller_supplied_deck_number
seeds deck #1 (class 1) and deck #5 (class 6) and asserts the deckNo
argument picks the right one.
- RankBattleControllerTests.AiStart_self_info_class_matches_queued_deck_number
is the end-to-end regression: register Bot battle with deck #5, hit
/ai_unlimited_rank_battle/start, assert self_info.classId == 6.
- RankBattleControllerTests.AiStart_without_pending_battle_returns_neg1_sentinel
locks the defensive ai_id=-1 path.
- Existing AiStart_* tests bypass do_matching, so adapted to call a new
RegisterBotBattleAsync helper that mirrors what InProcessPairUp does on
AI-fallback resolution.
SeedDeckAsync gains an optional classId so test cases can differentiate
decks by class (was always picking Classes.First()).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous commit 51e9dd2 changed Bot's InitBattle to push Matched and
Loaded to push BattleStart+Deal, on the theory that the architecture
spec's "no Matched in Bot mode" claim was wrong. That theory was based
on misreading Matching.cs:400 (the Matched handler) as a required
state-machine trigger.
End-to-end trace of the AI client flow shows:
1. _initNetworkSuccess (set when the client receives uri=InitNetwork,
i.e., our ack) is the actual trigger — MatchingNetworkConnectChecker
phase 3 sees it and calls MatchingInitBattle.
2. MatchingInitBattle (Matching.cs:298) for IsAINetwork IMMEDIATELY
calls StartBattleLoad + GotoBattle right after emitting InitBattle.
It does NOT wait for any wire envelope.
3. The Matched handler at Matching.cs:400 is gated on
status == Connect and is already past Prepared by the time the
wire round-trip completes — sending Matched is harmless but
unnecessary.
4. The BattleStart handler at Matching.cs:417 runs UNCONDITIONALLY and
SetNetworkInfo at RealTimeNetworkAgent.cs:1562 overwrites
OppoBattleStartInfo with the wire envelope's oppoInfo. Our oppoInfo
comes from NoOpBotParticipant.Context placeholders (classId/emblemId
etc. = 0), corrupting the good values the client set from the HTTP
AIBattleStart response.
The "Waiting for opponent" hang was caused by SBattleLoad.LoadOpponentAssets
trying to fetch emblemId=0, degreeId=0, etc. after BattleStart corrupted
OppoBattleStartInfo. The asset group load silently hangs on missing
assets, no error logged.
Restored the spec's original Bot arms: InitBattle ack-only, Loaded silent,
TurnEnd Judge-to-sender. ai-passive.md updated with the corrected reasoning
and a discovery-history note.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 3 shipped a Bot dispatch table that ack'd InitBattle without
pushing Matched and stayed silent on Loaded, per the architecture spec's
inference that "the client uses AIBattleStart HTTP data instead of
Matched in Bot mode." That inference was wrong.
The client's matching state machine (Matching.ReactionReceiveUri,
Matching.cs:400) gates StartBattleLoad() on the Matched envelope, and
BattleStart at Matching.cs:417 triggers GotoBattle. Without those
envelopes the client never transitions out of MatchingStatus.Connect —
which renders as the "Waiting for opponent" hang on the loading screen.
AIBattleStart HTTP only provides opponent cosmetics, not state-machine
triggers.
Fix: drop the Bot-specific InitBattle ack-only and Loaded silent arms;
let Bot fall through to the existing handshake arms that push Matched
and BattleStart + Deal. Only TurnEnd stays Bot-specific (Judge to
sender, not broadcast — there's no real other side to broadcast to).
Tests updated to match the corrected contract. ai-passive.md doc
amended with a correction note.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Single-client end-to-end Bot lifecycle: InitNetwork → ack, InitBattle →
ack (no Matched), Loaded → silent, Swap → SwapResponse + Ready,
two TurnEnd cycles each producing a single Judge frame back to sender,
Retire → BattleFinish. Pending battle evicted at session start.
Closes Phase 3 — battle-node v2's three-phase migration (Scripted → PvP →
Bot) is now complete. Test budget: 884 → 931 (+47 across Phase 3).
Next: matching-queue API rewrite + real rank progression, as separate
specs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new arms gated on Type == BattleType.Bot, placed before the
existing PvP / Scripted arms:
- InitBattle → ack to sender (no Matched push — client uses AIBattleStart HTTP data)
- Loaded → silent (no BattleStart, no Deal — client short-circuits to GotoBattle)
- TurnEnd / TurnEndFinal → Judge to sender only (not broadcast)
Other URIs in Bot mode fall through existing arms: Swap is Type-agnostic
(per-sender SwapResponse + Ready), Retire/Kill hits the existing
Scripted no-contest BattleFinish(Win), gameplay forwarders are gated on
Pvp so Bot's PlayActions/Echo/etc. fall through default (drop). 8 new
dispatch tests cover the wire contract.
Reference: docs/api-spec/in-battle/ai-passive.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four end-to-end tests against two parallel RawSocketIoTestClients:
handshake to AfterReady on both sides with per-perspective Matched;
TurnEnd broadcast to both sides + Judge; A's PlayActions forwarded to
B; Retire flipped to Lose-for-sender, Win-for-other; A's abrupt WS
close cascades to BattleFinish(Win) for B with PendingBattle eviction;
waiting-room timeout closes the first arriver's WS (fallback long-wait
path — the 60s default is left in place; TestServer-side WS close is
observed via ReceiveAsync returning Close or throwing).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RealParticipant gains _sessionFinished TCS + MarkSessionFinished /
AwaitSessionFinishedAsync. PvP first-arriver's handler awaits the
signal instead of calling self.RunAsync (which the session does
internally on the same instance — double-call would race the WS read).
BattleSession.RunAsync branches on Type: Pvp uses WhenAny + synthesize
BattleFinish(Win) to survivor + WhenAll(drain); Scripted/Bot keep
Phase 1's WhenAll-everything semantics. Disconnect cascade now drives
end-of-battle when a WS drops without a graceful Retire.
Per-BattleId slot keyed dict. Pair returns the first arriver to the
second; ParkAsync awaits a TCS and returns the second arriver. Timeout
defaults to BattleNodeOptions.WaitingRoomTimeout (60s); evict on timeout
keeps the dict clean. Singleton in DI; consumed by the handler in the
next task.
Pvp TurnEnd/TurnEndFinal broadcasts TurnEnd+Judge to BOTH so each
client's JudgeOperation advances. Pvp Retire/Kill pushes BattleFinish
with flipped result (sender=Lose, other=Win). Scripted Retire keeps
Phase 1 behaviour (sender-only Win via BuildBattleFinishNoContest).
TurnStart / PlayActions / Echo / TurnEndActions / JudgeResult from a
real participant in Pvp mode forward to the other participant once
BothAfterReady. Scripted's bot-burst case arms (gated on
FakeOpponentViewerId) precede the PvP forwarder so they're unaffected.
The bot-emission TurnStart/TurnEnd/Judge guard was tightened from
`ReferenceEquals(from, A or B)` (always true) to call
IsRealForwardableFromScripted directly in the `when` clause. The prior
shape used `goto default` to drop non-bot senders, which would have
short-circuited the new PvP forwarder for TurnStart in PvP mode.
Tests assert that for Type=Pvp, A's InitBattle gets Matched with A's
ctx as selfInfo and B's ctx as oppoInfo, and symmetrically for B. Same
for Loaded/BattleStart. Swap stays per-sender (each runs their own
mulligan).
ComputeFrames now reads (from as IHasHandshakePhase)?.Phase for the
four handshake arms (InitNetwork, InitBattle, Loaded, Swap) and the
TurnEnd gate, transitioning the participant's Phase instead of the
session's. RealParticipant implements IHasHandshakePhase via the new
Phase property; the session-level BattleSession.Phase stays for the
Terminal short-circuit.
Scripted dispatch + wire shape unchanged (single-Real-participant case
collapses to Phase 1 semantics). Test fixture migrates FakeParticipant
to FakeRealParticipant for the side that drives handshake states. The
bot's TurnEnd previously rode the session-level AfterReady arm; with
that arm now gated on the sender's per-participant Phase (which the
bot lacks), TurnEnd joins TurnStart/Judge in the scripted-bot
forwarder arm so the v1.2 burst still reaches the real participant.
Internal setter; defaults to AwaitingInitNetwork. PvP needs A and B to
progress through the handshake states independently, which the
session-level BattleSession.Phase can't model. Session migration to read
realFrom.Phase is the next task.
Both helpers now take the opponent's MatchContext + an explicit seed
instead of pulling ScriptedProfiles.OpponentMatchedProfile / OpponentBattleStartProfile
internally. ScriptedBotParticipant.Context fixture absorbs the cosmetic
fields previously hardcoded in ScriptedProfiles so Scripted's wire bytes
stay identical - verified by integration tests still green.
Phase 2 prep: PvP arms will call the same helpers with the real opponent
participant's Context.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Old single-WS BattleSession + its dispatch/pump/ClipAckArg tests are
obsolete after the Task 9 handler cutover. ClipAckArg overflow + boundary
coverage moved into RealParticipantTests. BattleSessionV2 renamed back
to BattleSession; the V2 suffix was a placeholder during the parallel
-build refactor.
Mirrors v1.2's BattleSessionDispatchTests but asserts on (target, frame,
noStock) routing tuples returned by ComputeFrames. Covers InitNetwork
ack, InitBattle/Loaded/Swap server-synthesized broadcasts (to the real
participant only in Scripted mode), TurnEnd forwarding to the scripted
bot, scripted-bot-emitted frames routing back to the real participant,
Retire/Kill BattleFinish path, and out-of-order frame drops.
Lifts the WS read loop, SIO encode/decode, per-WS OutboundSequencer +
InboundTracker, and SIO ack out of BattleSession into a participant.
PushAsync(noStock=false) assigns playSeq via the sequencer; noStock=true
bypasses it. FrameEmitted fires on each deduplicated inbound envelope.
The existing BattleSession keeps its own copy of the WS code for now;
Task 9 cuts the handler over to use BattleSessionV2 + RealParticipant
and Task 10 deletes the old BattleSession + duplicate code.
PushAsync(TurnEnd|TurnEndFinal) fires FrameEmitted three times:
OpponentTurnStart + OpponentTurnEnd + OpponentJudge. Behaviour-identical
to the v1.2 case arm in BattleSession.ComputeResponses; just repackaged
as a participant. Other URIs are swallowed. Used by Phase 1 to preserve
v1.2 behaviour under the new abstraction; replaces the case-arm logic
in BattleSession in Task 7.
Silent participant for the Phase 3 Bot type. PushAsync swallows;
FrameEmitted never fires; RunAsync completes immediately. ViewerId is
the existing FakeOpponentViewerId const for consistency with scripted
lifecycle builders. Three tests lock the no-op contract.
Both single-cycle and consecutive-cycles tests now assert the v1.2
three-frame burst (TurnStart + TurnEnd + Judge). Currently failing —
ComputeResponses still pushes only two frames. Implementation follows.
Adds the third frame of the burst. Wire shape from prod (spin + resultCode).
OpponentJudgeSpin const next to OpponentTurnStartSpin for consistency.
Single test locks uri, ViewerId, Cat, and body shape.
Final-review follow-ups:
- BuildOpponentTurnStart's doc comment claimed the v1 client sits
indefinitely — true before the loop closure, false after. Updated
to describe the pair with BuildOpponentTurnEnd.
- TypedBodyWireShapeTests had no coverage for BuildOpponentTurnEnd;
added the literal-JSON test so a future JsonPropertyName rename
on TurnEndBody is caught.
End-to-end through the real WS pump: after Ready, the test sends two
consecutive TurnEnd msgs and asserts the server pushes
TurnStart+TurnEnd for each. Exercises OutboundSequencer's playSeq
assignment across multiple cycles.
Locks the loop invariant: after the first cycle the phase resets to
AfterReady, so the next player TurnEnd matches the same case arm and
produces the same two-frame burst.
Replaces the v1.0 single-envelope/OpponentTurn-phase invariant with
the v1.1 two-envelope/AfterReady invariant. Currently failing —
ComputeResponses still does the v1.0 thing. Implementation follows.
Seeds a viewer + completed TK2 run, drives the WS handshake to Matched, and
asserts every cardId in selfDeck matches the run's SelectedCardIdsJson. Read
from RawBody (codec's wire-form deserialization) — not from MatchedBody —
since the test client gets the JSON-roundtripped envelope.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ClassId/CharaId/CardMasterName/BattleType flow from ctx. PlayerBattleStart
Profile removed; Rank/BattlePoint remain as standalone consts pending real
per-viewer rank tracker. One test updated, one new test added.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
selfInfo cosmetics + 30-card selfDeck now read from MatchContext. Opponent
half stays in ScriptedProfiles. DummyCardId / BuildDummyDeck / PlayerMatched
Profile removed. Two new tests lock the deck-idx pairing and cosmetic
flow-through; TypedBodyWireShapeTests + lifecycle tests thread a fixture ctx.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
IMatchingBridge.RegisterPendingBattle now takes a MatchContext; PendingBattle
carries it; BattleSession stores it. ArenaTwoPickBattleController builds ctx
from IMatchContextBuilder. ScriptedLifecycle still uses ScriptedProfiles for
the player half — Tasks 5/6 migrate the lifecycle.
Existing tests updated: MatchingBridgeTests, BattleNodeFlowTests,
InMemoryBattleSessionStoreTests, BattleSessionDispatchTests, BattleSession
PumpTests, ArenaTwoPickBattleControllerTests (which now seeds a TK2 run +
adds a no-active-run 400 case).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Client RealTimeNetworkAgent.SetNetworkInfo iterates the synchronize-data
dict in insertion order. The "uri" key, when recognized as Matched, calls
GameMgr.InitializeSelfInfo which sets _selfDeck = null. Any "selfDeck"
processed before "uri" gets wiped; Matching.StartBattleLoad then crashes
on null.Select(...). Pre-refactor ToJson built a Dictionary envelope-first
then appended body keys, so the bug never surfaced. The typed-body rewrite
inverted the order — restoring envelope-first matches the prod wire.
Regression test BuildMatched_KeyOrder_PutsUriBeforeSelfDeckAndSelfInfo
locks the contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>