Don't exit on losing all lemmings (feature development)

Started by Simon, January 10, 2024, 11:42:06 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

WillLem

Fixed another bug today - we need the Mass Replay Check to finish checking the current level when the unplayable state is reached. Wouldn't have spotted this if I wasn't releasing SLX 2.7 later today! ;P

NL 12.13 Commit 8b2e5a66c applies this fix

Simon

It's now nearer the time. Tonight, Saturday, 19:00 UTC is good.

Quote from: WillLem on April 03, 2024, 01:25:39 PM
Fixed another bug today - we need the Mass Replay Check to finish checking the current level when the unplayable state is reached.

Please build NL with this for me. I'll run 2 or more of Icho's packs through both the stable 12.12.5 and this release, and look for different behavior.

-- Simon

WillLem

Quote from: Simon on April 06, 2024, 09:50:48 AM
Please build NL with this for me

Here you go.

EDIT: Attachment removed due to release of NL 12.13-RC

Simon

Thanks. I'll call that executable nl-2024-04-08.

Icho has sent me his replay coverage for Lemmings United. I've looked at his 208 replays for what is not in the bonus rank of United. These 208 replays are mostly 1 per level, sometimes 2 per level.

I've ran those 208 Lemmings United replays separately through each of the following:
  • NL 12.12.5 stable,
  • nl-2024-04-08 with the option: Always Exit to Postview,
  • nl-2024-04-08 with the option: Exit if Save Requirement Met,
  • nl-2024-04-08 with the option: Never Exit to Postview.
Find attached NL's text output. First findings:

All 208 replays pass (solve their level) in all of the 4 runs.

Your nl-2024-04-08 produces output independent of the 3-way option. In more detail: When I run mass replay verification in the nl-2024-04-08 with the option to Exit if Save Requirement Met, I get the same output (i.e., identical text file) as when I run mass replay verification in the nl-2024-04-08 with the option Never Exit to Postview, or with the option Always Exit to Postview. This is good.

Your nl-2024-04-08 produces different output than NL 12.12.5. In NL 12.12.5, most (all?) replays run for exactly 1 physics update longer than in nl-2024-04-08. It's drudgework to check this claim for all 208 lines, I haven't written a script to verify that claim for me.

For example, NL 12.12.5 produces:
Pacifism 8:  10000 B.C..nxrp   (1882 frames) LvV 0000000000000000 / RpV: 0000000000000000

nl-2024-04-08 produces:
Pacifism 8:  10000 B.C..nxrp   (1881 frames) LvV 0000000000000000 / RpV: 0000000000000000

The only difference here is that NL 12.12.5 runs for 1882 physics updates ("frames") and nl-2024-04-08 runs for 1881 physics updates before both conclude that the replay passes its level.

We can explain this with the reworked end-of-level behavior: NL 12.12.5 must start a new physics update before it can conclude that the map is over. nl-2024-04-08 can test for completion in between two physics updates. Therefore, nl-2024-04-08 needs one physics update fewer. Thus, I believe: There is nothing to worry here. Do you agree?

The high-level result (pass or fail) is identical across NL 12.12.5 and nl-2024-04-08. All levels pass.

This doesn't conclude the testing of mass replay verification yet. Reason: I've only tested solving replays. I haven't tested a big bucket of failing replays, of indeterminate replays (that run for too long after final skill), of weird nukes in the replays, ...

And I have replays for 4 more Icho packs. That's for next week.

-- Simon

WillLem

Quote from: Simon on April 18, 2024, 04:13:39 AM
Therefore, nl-2024-04-08 needs one physics update fewer. Thus, I believe: There is nothing to worry here. Do you agree?

Agreed, this looks so far like everything is behaving as expected; I appreciate how thorough you've been :thumbsup:

I first spotted the MRC bug when implementing this feature in SLX 2.7 - since I'd committed to actually releasing this, I did an MRC check as standard. I checked using a batch of replays some of which aren't level-solvers (I have these ready-to-go for doing quick MRC checks) and MRC is behaving as expected in SLX.

Note, though, that the SLX version of this feature doesn't include the 3-way option. "Exit if level passed" is the general standard behaviour, with Classic Mode providing "always exit to postview" behaviour as its own standard. Not that this should matter for MRC, but it might be worth noting anyway.

Note also, there may well be things that you would know to look out for that I don't. As far as I'm concerned, if the MRC checks every replay successfully and spits out a result, then it's working. I don't tend to check any further than that simply because I wouldn't know what else to look for.

We should catch up again properly about this soon, let me know when you're available.

Simon

If you have failing replays, they too should run until they reach unplayable state, then exit. Look for identical high-level results: You want to see the replay pass in your work-in-progress NL if and only if it passes in NL 12.12.5.

It's theoretically possible for the same replay to be indeterminate in one NL and to clearly fail in the other NL. I don't expect this to happen often, maybe 1 case in 1,000. Even then, both NLs would agree that it's not a winning replay, and that's good. If it happens, I'd examine it case-by-case.

On most (not all) replays (passing or failing), I expect the number of physics updates to be 1 smaller in your WIP NL than in NL 12.12.5. But it's not a necessity; I realize now: When you nuke zombies, we watch the nuke animation in WIP NL, and then the WIP NL won't take 1 fewer, but instead many more physics updates. Thus: If you see a wild difference in physics updates, look in the level for zombies, and look in the replay for nukes.

That's practically what I will do with the remaining packs next week: Compare the high-level results, and investigate by hand the cases where the high-level result differs or where the number of physics updates has not shrunk by exactly 1. Maybe I'll write a script for that.

Quote from: WillLem on April 18, 2024, 11:47:38 AM
We should catch up again properly about this soon, let me know when you're available.

Yes, for code review and to plan the pull request.

I'll be busy this weekend. Which of the following suits you best?

Monday, April 22, 15:00 UTC
Monday, April 22, 20:00 UTC
Wednesday, April 24, 15:00 UTC
Wednesday, April 24, 20:00 UTC

-- Simon

Simon

Quote from: Simon on April 18, 2024, 08:43:36 PM
Quote from: WillLem on April 18, 2024, 11:47:38 AM
We should catch up again properly about this soon, let me know when you're available.
I'll be busy this weekend. Which of the following suits you best?
[...]
Wednesday, April 24, 20:00 UTC

I can't do much more than offer these times. I expected you to agree on one or propose something else.

Friday, 26th, 17:00 UTC I can make. Otherwise it will have to be next week.

-- Simon

WillLem

Quote from: Simon on April 24, 2024, 07:38:16 PM
I can't do much more than offer these times. I expected you to agree on one or propose something else.

Friday, 26th, 17:00 UTC I can make. Otherwise it will have to be next week.

Apologies, I missed your reply! :forehead:

Should be able to make Friday, I'll let you know first thing tomorrow (Thursday) if I can't.

WillLem

OK, I've gone ahead and enabled notifications for this topic so I won't miss another post. Only just realised this is possible!

WillLem

Hi Simon, confirming 1700 UTC tomorrow (Friday). Again,  many apologies for missing your earlier message. See you tomorrow, W

Simon


Simon

I call a replay oneframediff if and only if both of the following hold:
  • During mass-replay verification in NL 12.12.5, it shows the same high-level result (pass, fail, undetermined, ...) as during mass-replay verification in nl-2024-04-08.
  • Mass-replay verification in NL 12.12.5 takes exactly one physics update longer to run this replay than the mass-replay verification in nl-2024-04-08.
See my earlier long post for why I expect oneframediff replays.

Now:
  • All 208 replays in Lemmings United (excluding Bonus rank) are oneframediff.
  • All 120 replays in Lemmings Introduction pack are oneframediff.
  • I have 3 more Icho replay collections that I'll test for oneframediff-ness. Expect more results in the next 0-3 days.
Thus: Excellent so far!

Here is a Bash script that takes two NL output files (pass them as argument 1 and argument 2 to this script) and tells me if the replays are oneframediff:

#!bin/bash
paste "$1" "$2" \
    | sed 's%[^(]* (%,%' \
    | sed 's% [^(]*(%,%' \
    | sed 's% .*$%%' \
    | grep ^, \
    | grep -v "none" \
    | grep -v "TALISMAN" \
    | awk 'BEGIN { FS="," } { print $2 - $3; }'


The script prints the difference (see item 2 in my first enumerated list in this post) in run frames of the replay between the two NL versions. For a oneframediff replay, the script will print 1. If everything is oneframediff, you get a long stream of 1s.

The NL replay results format isn't the nicest to parse. But this script isn't particularly smart either; I should have relied on the consistently-appearing "... frames" in the NL replay format. Instead, the script cuts at open parentheses (, which will occiasionally appear in the level names. The script won't halt on such extra parentheses, but will produce garbage results over 1,000 or under −1,000 that we must examine by hand. All were fine.

-- Simon

WillLem

Quote from: Simon on April 30, 2024, 11:14:42 PM
All were fine.

I've also tested a handful of replays in 12.12.5 and the 12.13 Exp (8/4/24) - confirmed, they all run for 1 frame longer in 12.12.5 and produce the same results.

However, I made a level specifically for the purpose of testing replays, and 5 replays each of which should produce a different result (Pass, Talisman, Fail, Undetermined, Level Not Found - not sure how to test for "Error" tbh). The Replay marked "Should Fail" does indeed Fail in 12.12.5, but returns Undetermined in the 12.13 Exp (8/4/24).

I'm not sure why this is happening, and can no longer compile NL in my copy of RAD without accepting the GR32 PR.

Typical :eyeroll:

I guess I'll get my NL repo updated and just hope that it doesn't touch any of the code in the PR, then I can try to find a fix.

WillLem

OK, successfully updated my working copy NL to GR32 v3. It didn't take long thankfully, and I can now compile.

From a quick glance, I'd say the crux of the matter is here, in GameReplayCheckScreen's very long RunTests procedure:


            while Game.MessageQueue.HasMessages do
              if Game.MessageQueue.NextMessage.MessageType = GAMEMSG_FINISH then
              begin
                if Game.GameResultRec.gSuccess then
                begin
                  if Game.GameResultRec.gGotTalisman then
                    fReplays[i].ReplayResult := CR_PASS_TALISMAN
                  else
                    fReplays[i].ReplayResult := CR_PASS;
                end else
                  fReplays[i].ReplayResult := CR_FAIL;
              end;
            if fReplays[i].ReplayResult <> CR_UNDETERMINED then Break;


Best guess is that because we're now stopping gameplay 1 frame earlier (due to the unplayable state having been reached), it never gets a chance to pass the GAMEMSG_FINISH message.

So, perhaps this might fix it:


            while Game.MessageQueue.HasMessages do
              if (Game.MessageQueue.NextMessage.MessageType = GAMEMSG_FINISH)
                or Game.StateIsUnplayable then <----------------------------------------------------- added this condition
                begin


This should allow the check for a pass/fail to run, and return FAILED correctly. I'll check this now and report back with an updated .exe if it works.

WillLem

#104
No, unfortunately that didn't work.

The code snippet does deal with PASS(TALISMAN)/FAIL/UNDETERMINED, so that's almost certainly what we need to be looking at.

I thought that maybe StateIsUnplayable needs to explicitly pass a FINISH message. But, if we zoom out and examine more of the RunTests method, we can see that this should indeed be happening:


            if (Game.CurrentIteration > CutoffFrame) or
               Game.IsOutOfTime or Game.StateIsUnplayable then <------------------- N.B. this line checks for StateIsUnplayable
            begin
              Game.Finish(GM_FIN_TERMINATE); <------------------------------------- So, the message is passed
              if Game.GameResultRec.gSuccess then
                fReplays[i].ReplayResult := CR_PASS;
              if Game.GameResultRec.gGotTalisman then
                fReplays[i].ReplayResult := CR_PASS_TALISMAN;
              Break;
            end;

            while Game.MessageQueue.HasMessages do
              if (Game.MessageQueue.NextMessage.MessageType = GAMEMSG_FINISH) then
              begin
                if Game.GameResultRec.gSuccess then
                begin
                  if Game.GameResultRec.gGotTalisman then
                    fReplays[i].ReplayResult := CR_PASS_TALISMAN
                  else
                    fReplays[i].ReplayResult := CR_PASS;
                end else
                  fReplays[i].ReplayResult := CR_FAIL;
              end;
            if fReplays[i].ReplayResult <> CR_UNDETERMINED then Break;


I'm going to try adding the fail result to the first of these two code blocks and see what happens.

Suggestions welcome in the meantime.