Lemmings Scoring System

Started by WillLem, February 23, 2024, 05:26:09 PM

WillLem

What would the best scoring system for a Lemmings game be?

We have 3 main factors over which the player has control:

1) How many lemmings are saved
2) How much (in-game) time is used
3) How many skills are used

But, how to score this?

It seems that percentages need to become an immediate filter: if we score 10 points per lemming, then an easy level with 100 lemmings offers 1000 points, whereas a difficult level with 5 lemmings only offers 50. So, we'd need this to become "how many lemmings are saved vs. how many are available." 100/100 should score the same as 5/5, perhaps.

Unfortunately, it's not as simple to apply the same logic to time and skills. An infinite time level doesn't provide a suitable basis for comparison; we'd need to know how long the solution is supposed to take, which would mean collecting more information from the level designer.

Meanwhile, using 1 of 200 available skills in a Tame level is not as much of an achievement as using, say, 5 of 8 skills in a Havoc level. So, percentages also fail here (assuming that using fewer skills is necessarily better).

There are obviously other factors to consider. But could an algorithm be built which sufficiently factors in lemmings saved, skills used, time used and level difficulty? Maybe the higher a level's rank, the more weight certain features carry? Is the logical conclusion of this that each level needs its own individual scoring system?

Proxima

Quote from: WillLem on February 23, 2024, 05:26:09 PM
Is the logical conclusion of this that each level needs its own individual scoring system?

Needless to say, this has been much discussed in the past, especially considering that one official game (SNES Lemmings) has a scoring system, and NeoLemmix used to.

It's impossible to have a watertight scoring system for Lemmings, because so much depends on factors that are not quantifiable. Saving one skill on "No added colours or lemmings", which was thought for years to have no skill-saving solutions, is much more impressive than finding the solution that saves a builder on "With a twist of lemming, please". But how much more impressive? Does the question even make sense?

Instead, the question I would ask is: does a scoring system need to be watertight? Touhou scoring, for example, is highly arbitrary, so that scoring becomes like a separate game within the game; it rewards skilful play (e.g. most Touhou games incorporate "graze", passing close to bullets, into the scoring system in some way) but it usually doesn't reward what one would normally think of as the main objective -- completing the game using as few lives and bombs as you can manage.

That's an extreme example, of course. But my point is, if you want to foster competition for scores, all you need is for a scoring system to exist and for it to have leaderboards. It doesn't need to be an accurate measurement of anything.

For Lemmings games, I still like a system like this because of its simplicity:
* 100 times percentage of lemmings saved
* 10 times skills remaining
* 1 times seconds remaining

Of course, some adjustment has to be made for infinite time. Perhaps "1 times seconds taken under 600, with no time score if time taken > 600"?
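As a quick sketch, that system could look like this in Python, taking one reading of the infinite-time adjustment (points for each second of time taken under 600). The function name and parameters are mine, not from any engine:

```python
def proxima_score(percent_saved, skills_remaining, seconds_taken,
                  seconds_remaining=None):
    """Sketch of the simple system above (names are hypothetical).

    For timed levels, pass seconds_remaining; for infinite-time
    levels, pass seconds_remaining=None and the 600-second cutoff
    applies to seconds_taken instead.
    """
    score = 100 * percent_saved + 10 * skills_remaining
    if seconds_remaining is not None:      # timed level: reward time left
        score += seconds_remaining
    elif seconds_taken < 600:              # infinite time: reward speed
        score += 600 - seconds_taken
    return score
```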

Simon

What is the use case? Comparing solutions across different levels? Comparing solutions for a single given level? Tallying level scores for an entire pack, then comparing different players' performance on a given pack?

Does the score have to be an integer? If not:

Consider the triple (Lemmings lost, skills used, time until final exiting) and order these triples lexicographically, i.e., losing fewer lemmings beats losing more lemmings regardless of how well you did in the other two components (skills, time). This is still a linear order.

This takes the constants (x100, x10, 600 seconds) out of the scoring, but even this tastes arbitrary. On some levels, we're interested in fastest time over fewest skills. Also, you can't naturally add these scores across levels, unless you're fine adding them component-wise.
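Incidentally, Python's built-in tuple comparison is already lexicographic, so the ordering above needs no scoring constants at all (the example values below are made up):

```python
# Triples of (lemmings_lost, skills_used, seconds_to_final_exit).
# Python compares tuples component by component, left to right,
# which is exactly the lexicographic order described above.
a = (0, 12, 310)   # no lemmings lost, 12 skills, 310 seconds
b = (1, 3, 95)     # one lemming lost: worse, despite fewer skills and less time
c = (0, 12, 280)   # ties a on lemmings and skills, wins on time

best = min([a, b, c])   # lexicographic minimum is the "best" solution
```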

-- Simon

∫tan x dx

See here for an example of the score system in SNES: https://youtu.be/byC680Jde1g

The score seems to be based on the following:
(Saved% * 1000) + (number of skills remaining)

If we wish to expand on this, then I agree that a percentage based system should be in place for the lemmings themselves.
Regarding the conundrum of a level with minimal skills, versus a level with 50 of many skills (or even infinitely many skills available), surely the only metric worth measuring is "how many skills did the player use?"
It does not matter if the level provides infinitely many bashers. High scoring replays should be the ones where minimal skill usage occurs.
Similarly for time limits; it doesn't matter if a level has one minute or infinite time. A solution that takes 30 seconds is a better solution than one that takes 50 seconds.

Something that needs to be taken into account is the relative weighting of all of these factors.

Suppose we have two replays, both save 100% of lemmings.
- Case A: This replay uses only 3 skill assignments, but takes 5 minutes to complete.
- Case B: This replay uses 25 skill assignments, but is completed in 45 seconds.

Which of these should be awarded more points? Which is more impressive?

Consider the following formula, with arbitrary constants inserted for example:

Score = (%Lemmings saved * 1000) - (Number of skills used * 10) - (Number of frames elapsed in replay)

Number of frames taken covers both time limit and infinite time levels. This also works in the case where a level is extremely minimal - taking less than one second. Negative scores are clamped at a score of zero.

Thus, players are incentivised to save as many lemmings as possible, whilst minimising both skill usage and time taken.
We can generally assume the frame count to be somewhat large in the case of an average level, though this particular metric is heavily biased in favour of shorter levels over longer ones.
Perhaps we could take the square root of the number of frames? Or apply some other kind of function that gives diminishing returns?
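A sketch of the formula above, with the clamp at zero and an optional square-root variant of the time penalty as the diminishing-returns tweak. The constants are the example ones from this post; nothing here is from an actual engine:

```python
import math

def replay_score(percent_saved, skills_used, frames, use_sqrt_time=False):
    """Sketch of: (%saved * 1000) - (skills * 10) - (frames elapsed).

    With use_sqrt_time=True, the time penalty grows as sqrt(frames),
    so each extra frame costs less on longer replays -- reducing the
    bias against longer levels.
    """
    time_penalty = math.sqrt(frames) if use_sqrt_time else frames
    raw = percent_saved * 1000 - skills_used * 10 - time_penalty
    return max(0, raw)   # negative scores are clamped at zero
```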

Alternatively, consider the following:
Upon spawning into a level, a lemming has a hidden "score count" variable that starts at 1000, and decreases by 1 each second that passes.
This score count decreases by some fixed amount each time a skill is assigned to that lemming. It could be something like -10 points per skill, -20 if a permanent skill is assigned.
When a lemming is saved, its score value is added to the total.

Thus each lemming contributes a different amount to the score, based on its age. Players are therefore incentivised to save lemmings as soon as possible - hastening their solutions.
This also favours solutions with fewer skill assignments. Since these scores and their penalties are additive, it does not matter which lemmings receive which skills.
This means that the classic "worker lemming" style levels are not penalised for having one lemming do all of the work.
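A minimal sketch of that hidden counter, using the example constants above (the class and method names are hypothetical):

```python
class Lemming:
    """Hidden per-lemming score counter: starts at 1000, ages by
    -1 per second, -10 per skill, -20 per permanent skill."""
    def __init__(self):
        self.score = 1000

    def tick_second(self):
        self.score -= 1

    def assign_skill(self, permanent=False):
        self.score -= 20 if permanent else 10

# When a lemming exits, its remaining score joins the level total:
lem = Lemming()
for _ in range(30):     # 30 seconds elapse before it exits
    lem.tick_second()
lem.assign_skill()      # one ordinary skill assignment: -10
total = lem.score       # 1000 - 30 - 10 = 960
```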

There are many possible avenues for exploration here.

WillLem

Quote from: Proxima on February 23, 2024, 07:05:33 PM
NeoLemmix used to

Really? What was the system?

Quote from: Proxima on February 23, 2024, 07:05:33 PM
Instead, the question I would ask is: does a scoring system need to be watertight? ... my point is, if you want to foster competition for scores, all you need is for a scoring system to exist and for it to have leaderboards

It doesn't need to be watertight, agreed. But it does need to be meaningful, otherwise players won't be interested in trying to improve it. Leaderboards can be a good way to encourage a bit of light competition, and they give a player an idea of their skill level as well (which can be equally compelling and demotivating). Which brings me to Simon's question:

Quote from: Simon on February 23, 2024, 10:00:32 PM
What is the use case?

Probably, a player wanting to get an idea of how well they've performed on a particular level; not necessarily as compared to other people, but according to the level's expectations/standards.

I think what I'm realising is that these expectations and standards evolve over time, and so are not possible to quantify at the outset. Even if we could assign a "maximum possible score" to a level, this would present the same problem as a leaderboard: if a player completes the level but falls way short of the maximum possible score, it could be demotivating.

Ideally, a score system should feel rewarding, motivating, and compel the player to come back and try again for better. I suppose that depends massively on the individual player, though.

Quote from: Simon on February 23, 2024, 10:00:32 PM
Does the score have to be an integer?

Not necessarily. S, A, B, C etc work nicely in many games to give players a bit of performance feedback.

Quote from: ∫tan x dx on February 23, 2024, 10:43:47 PM
surely the only metric worth measuring is "how many skills did the player use?"

Agreed; NL reports fewest skills, and SLX now displays it on the postview screen. It's an interesting thing to try and improve upon.

I suppose the downside of a scoring system is that the player doesn't necessarily know what contributed to the score. The individual stats definitely work better in this regard, and would still very much need to be prominently tracked and displayed.

Quote from: Proxima on February 23, 2024, 07:05:33 PM
Saving one skill on "No added colours or lemmings" ... is much more impressive than finding the solution that saves a builder on "With a twist of lemming, please". But how much more impressive? Does the question even make sense?

Quote from: ∫tan x dx on February 23, 2024, 10:43:47 PM
Suppose we have two replays, both save 100% of lemmings.
- Case A: This replay uses only 3 skill assignments, but takes 5 minutes to complete.
- Case B: This replay uses 25 skill assignments, but is completed in 45 seconds.

Which of these should be awarded more points? Which is more impressive?

I suppose that's the unanswerable question of Lemmings, and why this topic is intriguing.

We need a way of measuring "The L Factor" - how interesting, unique, clever, unexpected, innovative, and perhaps above all how elegant (with a capital "L" ;P) a solution is. Basically, how cool does it look in a replay? 8-)

Quote from: ∫tan x dx on February 23, 2024, 10:43:47 PM
Score = (%Lemmings saved * 1000) - (Number of skills used * 10) - (Number of frames elapsed in replay)
...
Thus, players are incentivised to save as many lemmings as possible, whilst minimising both skill usage and time taken.

Agreed.

Perhaps it would be most appropriate to gather this score from different playthroughs of the same level; then, we can reward both the effort that saves all lemmings but uses a million skills, and the one that saves only 50% but uses just 1 skill. As long as the same player did both, their overall score would reflect this.

In NL/SLX, we already have ways to track a player's best performance fairly comprehensively in multiple game elements (maximum saved, quickest time, fewest skills, fewest skill types, fewest of each individual skill, etc.). It makes sense to make use of this data to calculate a player's score.

Such a system incentivises repeat play, and rewards finding multiple ways to solve the same level. In turn, this hopefully incentivises designers to provide multiple possible solutions as well (something I'm a huge advocate of, so I would hope any scoring system reflects this).
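One hypothetical way to turn that "sum of best" data into a single number. The par_skills and par_seconds baselines here are invented for illustration; they are not anything NL/SLX actually stores:

```python
from dataclasses import dataclass

@dataclass
class Playthrough:
    saved: int      # lemmings saved in this run
    skills: int     # total skill assignments
    seconds: int    # in-game time taken

def sum_of_best(runs, total_lemmings, par_skills, par_seconds):
    """Combine a player's best result per metric across all runs.

    par_skills / par_seconds are hypothetical per-level baselines
    against which savings are rewarded.
    """
    best_saved = max(r.saved for r in runs)     # best save count overall
    best_skills = min(r.skills for r in runs)   # fewest skills overall
    best_time = min(r.seconds for r in runs)    # fastest time overall
    return (1000 * best_saved // total_lemmings
            + 10 * max(0, par_skills - best_skills)
            + max(0, par_seconds - best_time))

# Two very different runs both feed the same score:
runs = [Playthrough(saved=100, skills=30, seconds=60),   # save-everything run
        Playthrough(saved=50, skills=1, seconds=300)]    # one-skill run
score = sum_of_best(runs, total_lemmings=100, par_skills=40, par_seconds=600)
```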

Quote from: ∫tan x dx on February 23, 2024, 10:43:47 PM
Perhaps we could take the square root of the number of frames? Or apply some other kind of function that gives diminishing returns?

Please elaborate!

Quote from: ∫tan x dx on February 23, 2024, 10:43:47 PM
Upon spawning into a level, a lemming has a hidden "score count" variable ...
When a lemming is saved, its score value is added to the total.

This would work excellently for calculating the score of a single playthrough, for sure. It seems important for a scoring system to take individual playthroughs into account, but this shouldn't necessarily affect a player's overall score for that level - for that, we're always interested in tracking what is essentially the "sum of best".

∫tan x dx

There's also the issue of a positive reward, versus a negative penalty.

It seems logical that each lemming saved contributes positively to the player's score. But what if each lemming lost instead contributed negatively?
For example, each lemming lost is 100 subtracted from the score. Perhaps if the lemming is explicitly killed by the player (like a bomber) the penalty is only -50?

Likewise, should the player be rewarded with points for skills not used, or penalised for skills used? The same goes for time taken.

Regarding the time factor, longer levels are always going to be lower scoring than shorter levels. There's also the issue of proportion:
For a longer level, suppose solution A takes 5 minutes, whereas solution B takes 4 minutes, 55 seconds.
For a shorter level, solution C takes 30 seconds, whereas solution D takes 25 seconds.

In each case, the difference in time taken is only 5 seconds, but that is a much larger proportion for the second pair than the first.
Should scores reflect this?


Quote from: WillLem on February 25, 2024, 03:25:39 AM
Please elaborate!

https://www.desmos.com/calculator/69uv3sgsfg

The graph in the above link is an example of a "diminishing returns" function.
If the input (x) is taken to be the number of frames of a solution, then the output (y) is the score calculated.
There are a few extra variables at play here: (a), (b) and (c).
The value of (a) comes from taking an arbitrary baseline of 2 minutes for an average solution time: (2 minutes) * (60 seconds/minute) * (17 frames/second) = 2040. This variable denotes the "max value" of the function.
Variable (b) is a vertical scale factor. Set (b) = 1 to see the function's behaviour more clearly.
Variable (c) is a kind of "drop-off" factor: how quickly the function diminishes.
Note that this function yields a higher score for shorter replays, while tapering off for longer ones.
Regarding the note on proportionality above, this kind of function gives a greater score difference between shorter replays than between longer ones.

5 minutes -> 6039 score
4 min 55 -> 6075 score
(36 point difference)

Whereas:

30 sec -> 11643
25 sec -> 12096
(453 point difference)

Of course, this function is merely an example, and I'm not suggesting this particular function be used at all. My point is that there are many "clever" functions out there that take in the information and do wonderful things with it.
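For anyone who can't open the link: the snippet below is NOT the Desmos function, just a generic exponential-decay stand-in with the same three knobs (a), (b) and (c), showing the diminishing-returns shape and the larger score gaps between short replays:

```python
import math

def time_score(frames, a=2040, b=6.0, c=1500):
    """Illustrative diminishing-returns time score (not the linked one).

    a = baseline "max value", b = vertical scale, c = drop-off rate.
    Shorter replays score higher; the curve flattens for longer ones.
    """
    return b * a * math.exp(-frames / c)

# Shorter replays gain far more per second saved than longer ones
# (17 frames per second, as in the baseline above):
short_gap = time_score(25 * 17) - time_score(30 * 17)    # 25 s vs 30 s
long_gap = time_score(295 * 17) - time_score(300 * 17)   # 4:55 vs 5:00
```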

That said, however: do we even need to be "clever"?

What if each level starts out with some base score of say 50,000 and actions taken in a solution either add to it or subtract from it? This base score could differ from level to level, allowing users to set their own value for the base score.

namida

Quote
Really? What was the system?

Based on a skim of the code in an old commit that still had it (77ae002 is the one that removed it), you'd begin the level with 50,000 points, and your score would decrease based on certain things happening (but could never drop lower than 100 no matter what). Where one of the penalties implies another (e.g. "level failed" also implies "didn't save 100%"), the lesser penalty still gets applied as well.

Level outcome
- Failure to meet the save requirement: -10000
- Failure to save every lemming (including Cloners): -1500, and a further -100 per lemming not saved

Time penalties
- Prior to the first lemming being saved: -2 per frame
- On or after the frame where the first lemming gets counted as saved: -1 per frame

Skill usage - Each time a skill is used, the corresponding deduction is made.
- Walker: -10
- Climber: -25
- Swimmer: -35
- Floater: -20
- Glider: -55
- Mechanic: -40 (now known as "Disarmer")
- Bomber: -20 (plus the penalty for losing the lemming)
- Stoner: -15 (plus the penalty for losing the lemming)
- Blocker: -25
- Platformer: -90
- Builder: -120
- Stacker: -45
- Basher: -50
- Miner: -50
- Digger: -50
- Cloner: -200
- (The Fencer, let alone the Shimmier, Jumper, Slider and Laserer, didn't exist yet at the time scoring was removed)

Other
- Triggering a trap: -150 each time
- Increasing the release rate: -100 the first time, no penalty after that
- Decreasing the release rate: -200 the first time, no penalty after that
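For concreteness, here is that system reconstructed as a Python sketch from this post alone (not from the actual pre-77ae002 source; the function and parameter names are mine):

```python
# Per-use skill penalties, as listed above.
SKILL_PENALTY = {
    "Walker": 10, "Climber": 25, "Swimmer": 35, "Floater": 20, "Glider": 55,
    "Mechanic": 40, "Bomber": 20, "Stoner": 15, "Blocker": 25,
    "Platformer": 90, "Builder": 120, "Stacker": 45, "Basher": 50,
    "Miner": 50, "Digger": 50, "Cloner": 200,
}

def neolemmix_score(met_save_requirement, lemmings_total, lemmings_saved,
                    frames_before_first_save, frames_after_first_save,
                    skills_used, traps_triggered=0,
                    raised_rr=False, lowered_rr=False):
    score = 50_000
    if not met_save_requirement:
        score -= 10_000
    if lemmings_saved < lemmings_total:      # lesser penalty applies as well
        score -= 1_500 + 100 * (lemmings_total - lemmings_saved)
    # Time: -2/frame before the first save, -1/frame from then on.
    score -= 2 * frames_before_first_save + frames_after_first_save
    score -= sum(SKILL_PENALTY[s] for s in skills_used)
    score -= 150 * traps_triggered
    score -= 100 if raised_rr else 0         # first RR increase only
    score -= 200 if lowered_rr else 0        # first RR decrease only
    return max(score, 100)                   # floor of 100, no matter what
```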


The score was displayed on the postview screen, and high scores were saved. No one ever really paid much attention to it, so it got culled (technically as part of the major cull / tidyup between 1.43 and 1.47, although it got merged into stable versions a tad earlier, with the score first removed in 1.43-C).

∫tan x dx

I've given this a bit more thought, and I have to ask a question:

What use is there in counting skill assignments?

Think of an arbitrary Lemmings level; any level can be put into one of two distinct categories:
- Type A: There are exactly enough skills given to solve a level
- Type B: There are more than enough skills given to solve a level

Type A levels are simple enough, and they are plentiful in number; this is the default approach to creating a puzzle that is intended to be difficult.
Type B levels encompass levels such as the classic "20 of everything" style level. Sometimes a level might contain a minimal skillset while still giving some extra superfluous skills; these levels are also type B.

My point is this:
For type A levels, all solutions will contain the exact same number of skill assignments, since that is the way the level is intended to be solved (backroutes notwithstanding). Thus in regards to score, the skill assignments will not play a significant role at all in determining which replay is "better".
For type B levels, it is highly likely that a solution will not use all of the skills provided. Therefore, in my opinion, the lemming saved count and the time taken would be better metrics for determining score.

Certainly, there is something to be said for a solution that uses minimal skills, but would this not be better suited for a talisman challenge instead of a score metric?

I find replays that save the most whilst also being very fast are more fun to watch. If a replay manages to save a builder, or basher, or whatever, this seems less impressive to me, unless doing so is very challenging. Again, I would consider such a challenge to be better suited to a talisman.

So, of the three obvious metrics - lemmings saved, time taken, skills assigned - should we discard skills and only focus on lemmings and time? If so, this certainly simplifies matters in regards to a scoring algorithm, and for many type A levels the skill assignments would make no difference to the score anyway.

WillLem

Quote from: ∫tan x dx on February 29, 2024, 08:05:13 PM
For type A levels, all solutions will contain the exact same number of skill assignments, since that is the way the level is intended to be solved (backroutes notwithstanding). Thus in regards to score, the skill assignments will not play a significant role at all in determining which replay is "better".

Agreed. Assuming that only the intended solution is possible, all players would end up with an identical score as far as skills are concerned.

Quote from: ∫tan x dx on February 29, 2024, 08:05:13 PM
Certainly, there is something to be said for a solution that uses minimal skills, but would this not be better suited for a talisman challenge instead of a score metric?

Of course. But then, all of the proposed score metrics are talismans already. Perhaps talismans make the whole idea of a score system obsolete?

It's one of the reasons I've proposed a collectible item - I'd ideally like to introduce this alongside a meaningful score system.

Quote from: ∫tan x dx on February 29, 2024, 08:05:13 PM
should we discard skills and only focus on lemmings and time? If so, this certainly simplifies matters in regards to a scoring algorithm, and for many type A levels the skill assignments would make no difference to the score anyway.

I agree that it makes no difference with type A levels, but it may make a significant difference with type B levels, particularly those that offer multiple possible solutions and replay value.

Also, the more metrics that contribute to the score, the better a reflection the score will be of a particular playthrough, perhaps.

Is there a reason you'd prefer to have fewer metrics?