Lemmings 2 File Formats

GuyPerfect · March 30, 2013, 09:06:52 PM

Like we did for Revolution, I'm starting a thread to document the Lemmings 2 data files and their contents. I am aware that there is existing documentation out there (http://www.camanis.net.ipv4.sixxs.org/lemmings/tools.php), but I don't feel that it's quite verbose enough to make full sense of what the files truly represent, so I'm starting from scratch. However, I would like to thank geoo, Mindless and any other contributors ahead of time, as I am using those documents for reference.
__________

Compression Format

Code: [Select]

Integer data types are little-endian.

File
===============================================================================
Identifier         String*4       ASCII "GSCM"
UnpackedSize       UInt32         Total size of decompressed data
Chunks             Chunk[]        Compressed data chunks
===============================================================================

Chunk
===============================================================================
LastChunkFlag      Byte           Last chunk = 0xFF; 0x00 otherwise
SymbolCount        UInt16         Number of symbol definitions
SymbolList         Byte[Count]    Destination symbol names
SymbolValuesA      Byte[Count]    First byte of symbol
SymbolValuesB      Byte[Count]    Second byte of symbol
DataSize           UInt16         Number of bytes in encoded data
Data               Byte[Size]     Encoded data bytes
===============================================================================

At the beginning of each chunk, a 256-symbol dictionary is initialized.
Dictionary entries are indexed 0x00 through 0xFF, and each symbol is
initialized to the single byte value that matches its index. The symbol
definitions are then modified by concatenating symbols as specified by the
lists in the chunk header.

Each entry in SymbolList gives the dictionary index of a symbol to redefine.
That symbol is redefined as the concatenation of the two symbols whose indexes
are given by the matching entries in SymbolValuesA and SymbolValuesB. The
three lists are parallel to one another: they are all the same length, and the
item at position X in one list is to be used with the item at position X in the
other two lists.

The following specifies the algorithm for each item X in the lists, from 0 to
SymbolCount - 1. Symbols must be redefined in that order. Let "+" represent a
byte-wise concatenation operator:

  Symbol[List[X]] = Symbol[ValuesA[X]] + Symbol[ValuesB[X]]

When bytes are processed from Data, they represent the indexes of symbols in
the dictionary. The symbols are copied wholesale, in the order they are
specified in Data, to the output. This takes place after the symbols are
redefined using the symbol lists.

Chunks will continue to be processed until a chunk with a non-zero
LastChunkFlag has finished processing.

In the event data is not compressed, it will not begin with the "GSCM"
identifier. Use the data as-is in this case. Both compressed and uncompressed
data can be used in most contexts, so they are processed according to whether
that "GSCM" identifier exists or not.
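
For implementers who prefer code, here is a minimal Python sketch of the steps described above. The function name `decompress_gscm` and the final check against UnpackedSize are assumptions of this sketch; they are not taken from the game code or from LEMZIP.

```python
import struct


def decompress_gscm(data: bytes) -> bytes:
    """Decompress a GSCM buffer; return the input unchanged if it is not GSCM."""
    if data[:4] != b"GSCM":
        return data  # uncompressed data is used as-is

    (unpacked_size,) = struct.unpack_from("<I", data, 4)
    pos = 8
    out = bytearray()

    while True:
        last_chunk_flag = data[pos]
        (symbol_count,) = struct.unpack_from("<H", data, pos + 1)
        pos += 3

        # Three parallel lists, each symbol_count bytes long.
        symbol_list = data[pos:pos + symbol_count]
        pos += symbol_count
        values_a = data[pos:pos + symbol_count]
        pos += symbol_count
        values_b = data[pos:pos + symbol_count]
        pos += symbol_count

        (data_size,) = struct.unpack_from("<H", data, pos)
        pos += 2

        # Fresh dictionary per chunk: symbol N starts as the single byte N.
        symbols = [bytes([i]) for i in range(256)]

        # Redefine symbols in list order; later entries may use earlier ones.
        for x in range(symbol_count):
            symbols[symbol_list[x]] = symbols[values_a[x]] + symbols[values_b[x]]

        # Each data byte is a dictionary index; copy that symbol to the output.
        for index in data[pos:pos + data_size]:
            out += symbols[index]
        pos += data_size

        if last_chunk_flag != 0:
            break

    # Sanity check (an assumption of this sketch, not part of the format text).
    if len(out) != unpacked_size:
        raise ValueError("decompressed size does not match UnpackedSize")
    return bytes(out)
```

In this sketch a symbol is stored as a Python `bytes` object, so a redefined symbol can expand to any number of output bytes, which is where the compression comes from.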

Hermann Schinagl made a LEMZIP utility for converting files to and from this compression format. It can be found at the following page:

http://www.camanis.net.ipv4.sixxs.org/lemmings/tools.php
__________

Data File

Many of the data files in the game are packed into a structured archive format, where the file is broken up into sections. This includes the .dat files for graphics and levels.

Code: [Select]

Integer data types are big-endian.

File
===============================================================================
Identifier         String*4       ASCII "FORM"
DataSize           UInt32         Size of the remainder of the file
DataType           String*4       ASCII identifier for the data file type
Section            Section[]      File sections containing additional data
===============================================================================

Section
===============================================================================
Identifier         String*4       ASCII identifier for the current section
DataSize           UInt32         Size of the remainder of the section
Data               Byte[Size]     Data specific to the type of section
===============================================================================

The number of sections in the file depends on the respective sizes of existing
sections as well as the total data size reported in the file header.
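
A rough Python sketch of walking the sections is shown here. It assumes the header and section sizes are big-endian, as the later style-section posts in this thread note for FORM data. The function name `parse_form` is made up for illustration, and no padding between sections is handled because the description above does not mention any.

```python
import struct


def parse_form(data: bytes):
    """Return (data_type, [(section_id, payload), ...]) for a FORM file."""
    identifier, data_size, data_type = struct.unpack_from(">4sI4s", data, 0)
    if identifier != b"FORM":
        raise ValueError("not a FORM data file")

    # DataSize counts everything after the DataSize field itself.
    end = 8 + data_size
    pos = 12
    sections = []

    # Keep reading sections until the reported data size is used up.
    while pos + 8 <= end:
        section_id, section_size = struct.unpack_from(">4sI", data, pos)
        pos += 8
        sections.append((section_id, data[pos:pos + section_size]))
        pos += section_size

    return data_type, sections
```

The section count is never stored anywhere, which is why the loop is bounded by the file-level DataSize rather than by a counter.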

__________

Graphics Representation

For 256-color graphics, palettes are specified and pixels are plotted on the screen. Pixel information is specified in the "styles" data files as well as the full-screen background images. These images specify a particular pixel order, which is demonstrated with the following animation:

http://i7.photobucket.com/albums/y255/bgng/Pixels_zps26980696.gif

Pixels are drawn every fourth pixel, starting with the top-left pixel of the image. Pixels proceed left-to-right, skipping three pixels each time. All rows of pixels are drawn in this manner, from top to bottom. Once the end of the last row is reached, drawing moves back to the top row, but starts from the second pixel from the left. As before, every fourth pixel is drawn for every row of pixels. After four passes, every pixel in the image has been drawn.

Let's say you have a byte buffer containing pixel data for an image. Each byte is one pixel. The pixel Byte within that buffer, designated as Data[Byte], can be translated to X and Y coordinates with the following general formula:

Code: [Select]

Operators:
= Assignment
/ Integer division
% Remainder division
* Multiplication
+ Addition

Stride = Width / 4
Pass   = Byte / (Stride * Height)
X      = Byte % Stride * 4 + Pass
Y      = Byte / Stride % Height

Here, Byte is the current byte in the buffer, starting at index 0. Stride represents the number of pixels drawn per scanline in a single pass. Pass indicates the current pass of drawing pixels for each scanline. Width and Height are the final dimensions of the image, and X and Y are the pixel coordinates within that image, relative to the top-left corner, of the pixel specified by the current byte.
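
The general formula translates directly into Python. This is only a sketch: the function name `byte_to_xy` is invented here, and it assumes Width is a multiple of 4.

```python
def byte_to_xy(byte: int, width: int, height: int) -> tuple[int, int]:
    """Map an index in the interleaved buffer to (X, Y) image coordinates."""
    stride = width // 4                    # pixels drawn per scanline per pass
    pass_num = byte // (stride * height)   # which of the four passes (0..3)
    x = byte % stride * 4 + pass_num
    y = byte // stride % height
    return x, y
```

For a 16x8 image, `byte_to_xy(1, 16, 8)` gives `(4, 0)` (the second pixel drawn is four columns over), and `byte_to_xy(32, 16, 8)` gives `(1, 0)` (the second pass starts one column to the right of the first).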

Tiles in level data are 16x8 pixels in size. Plugging in those numbers produces the following formula:

Code: [Select]

Stride = 16 / 4
(Stride = 4)
Pass   = Byte / (4 * 8)
X      = Byte % 4 * 4 + Pass
Y      = Byte / 4 % 8

...

X = Byte % 4 * 4 + Byte / 32
Y = Byte / 4 % 8

This is a somewhat complex formula for the X and Y coordinates, and it cannot be simplified any further. However, processing pixel data can be simplified by taking a different approach.

Let's now say that you read pixel data from the original byte buffer, called Data, and you want to re-order them and store them in a new byte buffer, called Output. To do this, we have the input byte index specified as Data[Byte], and the output byte index specified as Output[Pos]. From here, we get the simple end goal formula:

Code: [Select]

Output[Pos] = Data[Byte]

The calculation of Pos, therefore, is necessary. Using the earlier X and Y pixel coordinates, Pos can be calculated accordingly:

Code: [Select]

Pos = Y * Width + X

Substituting for X, Y and Width yields a pretty ugly expression, but it can be simplified. Take a look:

Code: [Select]

Operators:
<< Bitwise left shift
>> Bitwise right shift
|  Bitwise OR
&  Bitwise AND

Pos = (Byte / 4 % 8) * 16 +
      (Byte % 4 * 4 + Byte / 32)

...

Pos = (((Byte >> 2) & 7) << 4) +
      (( Byte       & 3) << 2) + (Byte >> 5)

...

Pos = ((Byte & 28) << 2) +
      ((Byte &  3) << 2) + (Byte >> 5)

...

Pos = ((Byte & 31) << 2) + (Byte >> 5)

...

Pos = Byte % 32 * 4 + Byte / 32

How 'bout them apples? Turns out both of those 32s in the expression come from the following expression:

Code: [Select]

Width * Height / 4

Meaning our final, unadulterated formula looks like this:

Code: [Select]

Quarter = Width * Height / 4
Output[Byte % Quarter * 4 + Byte / Quarter] = Data[Byte]

This works not only for the level tiles, but any of the graphics stored in the same general format:

http://i7.photobucket.com/albums/y255/bgng/CosyRoom_zpsed829079.png
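
The final Quarter-based formula can be sketched in Python as a whole-buffer re-ordering. The function name `deinterleave` is an assumption for illustration, and Width * Height is assumed to be a multiple of 4.

```python
def deinterleave(data: bytes, width: int, height: int) -> bytes:
    """Re-order interleaved pixel data into left-to-right, top-to-bottom order."""
    quarter = width * height // 4
    out = bytearray(width * height)
    for byte in range(width * height):
        # Output[Byte % Quarter * 4 + Byte / Quarter] = Data[Byte]
        out[byte % quarter * 4 + byte // quarter] = data[byte]
    return bytes(out)
```

For a 16x8 tile, the first four output pixels come from input bytes 0, 32, 64 and 96, one from each of the four passes.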

Mindless · March 31, 2013, 03:43:01 AM

Quote from: GuyPerfect on 2013-03-30 15:06:52

Hermann Schinagl made a LEMZIP utility for converting files to and from this compression format.

Just a note from when I contacted Hermann about LEMZIP:

Quote from: Hermann Schinagl

Decompression works well, but I didn't get the compression
to work perfectly. Maybe you can do this part.

To be honest, I 'found' the decompression in Lemmings II code...
That's why it is assembly language. So be careful with using it
officially.

So you might not want to use LEMZIP for compression -- there is however lem2zip (https://code.google.com/p/lemmings-tools/downloads/detail?name=lem2zip-0.2.0.exe) which, as far as I know, compresses correctly.

GuyPerfect · April 03, 2013, 08:56:23 PM

I'm not fully done with the level styles yet, but I figured I'd post my progress before my eyes fall out.
__________

Style Palette - L2CL

This data section is found within a FORM data file and has the section name "L2CL"

The palette is expressed as 128 RGB triplets. They represent colors 0-127 in the global palette. The other 128 entries are used by the GUI.

Code: [Select]

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2CL
===============================================================================
(Unknown)          UInt16         Unknown. Seems to always be 0x0001.
Palette            RGB[128]       RGB color data
===============================================================================

RGB
===============================================================================
Red                Byte           Red channel (lower 6 bits only)
Green              Byte           Green channel (lower 6 bits only)
Blue               Byte           Blue channel (lower 6 bits only)
===============================================================================

The palette from MEDIEVAL.DAT:

http://i7.photobucket.com/albums/y255/bgng/L2CL_zpse9433c79.png
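
As a sketch, the L2CL section can be read in Python like this. The function name `parse_l2cl` is invented for illustration, and the expansion from 6-bit to 8-bit channels is a common convention assumed here rather than something taken from the game.

```python
import struct


def parse_l2cl(section: bytes):
    """Return (unknown_word, palette) where palette is 128 (r, g, b) tuples."""
    (unknown,) = struct.unpack_from("<H", section, 0)
    palette = []
    for i in range(128):
        triplet = section[2 + 3 * i:5 + 3 * i]
        # Only the lower 6 bits of each channel are used.
        r, g, b = (c & 0x3F for c in triplet)
        # Assumed scaling: spread the 6-bit value 0..63 across 0..255.
        palette.append(tuple((c << 2) | (c >> 4) for c in (r, g, b)))
    return unknown, palette
```

The input here is the payload of the "L2CL" section only, without the 8-byte section header from the FORM container.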
__________

Style Tiles - L2BL

This data section is found within a FORM data file and has the section name "L2BL"

Most map features, both animated and not, are built up from a number of 16x8-pixel "blocks" of tile data.

Pixels with color value 0 are considered to be "air", while all other pixels are "solid".

Code: [Select]

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2BL
===============================================================================
TileCount          UInt16         The total number of tiles in the style.
Tiles              Tile[]         Tile pixel data
===============================================================================

Tile
===============================================================================
Pixels             Byte[128]      1 byte per pixel, refers to palette indexes.
===============================================================================

To re-order the pixel buffer so that pixels are stored linearly left-to-right
and top-to-bottom, process every byte B in the Pixels array accordingly:

    LinearPixels[(B % 32) * 4 + B / 32] = Pixels[B]

In the above expression, % represents remainder division.

The tiles from MEDIEVAL.DAT:

http://i7.photobucket.com/albums/y255/bgng/L2BL_zps35e22749.png
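
A short Python sketch of reading the L2BL section and linearizing each tile follows. The names `parse_l2bl` and `solid_mask` are hypothetical, and the input is assumed to be the section payload only.

```python
import struct

TILE_BYTES = 128  # 16x8 pixels, 1 byte per pixel


def parse_l2bl(section: bytes) -> list[bytes]:
    """Return every tile as 128 bytes in left-to-right, top-to-bottom order."""
    (tile_count,) = struct.unpack_from("<H", section, 0)
    tiles = []
    for t in range(tile_count):
        start = 2 + t * TILE_BYTES
        pixels = section[start:start + TILE_BYTES]
        linear = bytearray(TILE_BYTES)
        for b in range(TILE_BYTES):
            linear[(b % 32) * 4 + b // 32] = pixels[b]
        tiles.append(bytes(linear))
    return tiles


def solid_mask(tile: bytes) -> list[bool]:
    """True where a pixel is solid; color value 0 is air."""
    return [p != 0 for p in tile]
```

Pixel (x, y) of a linearized tile is then simply `tile[y * 16 + x]`.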
__________

Style Presets - L2BE

This data section is found within a FORM data file and has the section name "L2BE"

Styles are packaged with "presets", rectangular groups of tiles that are useful for creating level terrain without specifying tiles individually. The game engine does not use these presets, however, but rather levels are stored as big groups of tiles. These presets are only useful to level editors.

Code: [Select]

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2BE
===============================================================================
PresetCount        UInt16         The total number of presets in the style.
Presets            Preset[]       Preset definitions.
===============================================================================

Preset
===============================================================================
(Unknown1)         Byte           Unknown. Possibly used by editor.
(Unknown2)         Byte           Unknown. Possibly used by editor.
Width              Byte           Number of tiles wide.
Height             Byte           Number of tiles tall.
DataSize           UInt16         Total size of this Preset, including header.
Tiles              UInt16[]       Tile indexes (from L2BL).
===============================================================================

Tiles are arranged within presets in the order of left-to-right then
top-to-bottom.

Here are some sample presets from MEDIEVAL.DAT:

http://i7.photobucket.com/albums/y255/bgng/L2BE_zps9953c0b6.png
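
Reading the presets could look roughly like this in Python. The function name `parse_l2be` is made up, and the sketch assumes each preset holds exactly Width * Height tile indexes and that DataSize is the distance to the next preset; neither is stated outright above.

```python
import struct


def parse_l2be(section: bytes) -> list[dict]:
    """Return the presets in an L2BE section payload as a list of dicts."""
    (preset_count,) = struct.unpack_from("<H", section, 0)
    pos = 2
    presets = []
    for _ in range(preset_count):
        unknown1, unknown2, width, height, data_size = struct.unpack_from(
            "<BBBBH", section, pos)
        # Assumed: Width * Height tile indexes follow the 6-byte header.
        tiles = struct.unpack_from(f"<{width * height}H", section, pos + 6)
        presets.append({
            "unknown1": unknown1,
            "unknown2": unknown2,
            "width": width,
            "height": height,
            "tiles": list(tiles),  # left-to-right, then top-to-bottom
        })
        pos += data_size  # DataSize includes the header
    return presets
```

The tile at column `c`, row `r` of a preset is `preset["tiles"][r * preset["width"] + c]`.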
__________

Style Sprites - L2SS

This data section is found within a FORM data file and has the section name "L2SS"

There is a handful of different objects in the game that can interact with Lemmings, yet themselves are not represented by tile graphics. These include the cannons and the Medieval dragon and catapult.

Code: [Select]

Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2SS
===============================================================================
SpriteCount        UInt16         The total number of sprites in the style.
Sprites            Sprite[]       Sprite definitions.
===============================================================================

Sprite
===============================================================================
DataSize           UInt16         Size in bytes of the remainder of the Sprite.
Width              UInt16         Width of the Sprite, in pixels.
Height             UInt16         Height of the Sprite, in pixels.
ImagePointers      UInt16[4]      Pointers to the data for the 4 pixel layers.
ImageData          Byte[]         Encoded data representing image content.
===============================================================================

The values of the ImagePointers are relative to the first byte of the Sprites
array. However, they do not account for the DataSize bytes, meaning these
offsets need to be increased by 2 for each element in the Sprites array that
came before it. The offset relative to the first byte in the Sprites array,
therefore, within sprite S in the array (beginning at 0), can be calculated
with the following formula:

    RealImagePointer = ImagePointer + 2 * (S + 1)

As with the other graphics in the game, these sprite graphics are expressed as
4 layers, where each layer represents vertical "stripes" of pixels every 4
columns. The data pointed to by the first ImagePointer represents pixel columns
0, 4, 8, 12, etc. The second ImagePointer represents columns 1, 5, 9, 13, etc.
This continues for the remaining pointers.

The ImageData bytes encode pixel values in such a way that pixels with palette
entry 0 (transparent pixels) are not directly expressed. This results in a
moderate level of compression in most cases, but in some instances will cause
the image data to bloat.

When first processing the data for an image layer, as pointed to by the
ImagePointer, the current X and Y drawing positions within the sprite are
initialized to 0. This corresponds with the top-left pixel of the sprite. When
bytes are read from the data to be used as pixel values, the X position will be
incremented once per byte. A special "newline" command will reset the X
position to 0 and increment Y by 1.

Bytes are processed 4 bits at a time, starting with the high 4 bits of each
byte. The resulting nibbles, here called the "high nibble" and "low nibble",
are bit-packed values containing the following fields:

    Field  mccc
    Bit    3  0

    m = Mode
        0 - Copy
        1 - Skip
    c = Count

Drawing stops when two specific conditions are met: the X position is 0, and
the high nibble is 0xF. Once this occurs, no further drawing is processed for
the current layer.

The actual X position in the final image is equal to the X position within the
layer, multiplied by 4, then with the layer number added (where the first layer
is layer 0). In other words:

    FinalX = LayerX * 4 + LayerNum

Pixels are processed first by the high nibble, then by the low nibble. If Mode
is Copy, then Count bytes are read from ImageData and stored in the pixel
buffer. If Mode is Skip, then the X position is increased by Count, but no
bytes are read from ImageData.

Should the value of the low nibble be 0, then, after processing both nibbles, a
"newline" operation occurs: X is reset to 0, and Y is incremented by 1.

The special sprites from MEDIEVAL.DAT:

http://i7.photobucket.com/albums/y255/bgng/L2SS_zpsf0741fc7.png
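
The layer decoding rules can be sketched in Python as follows. This is an interpretation, not a verified implementation: the names `real_image_pointer` and `decode_layer` are invented, and the sketch assumes the pixel bytes for a Copy nibble immediately follow the command byte, with the high nibble's bytes before the low nibble's.

```python
def real_image_pointer(image_pointer: int, sprite_index: int) -> int:
    """Offset from the start of the Sprites array, corrected for DataSize words."""
    return image_pointer + 2 * (sprite_index + 1)


def decode_layer(data: bytes, offset: int, layer: int) -> dict[tuple[int, int], int]:
    """Decode one pixel layer into a {(final_x, y): palette_index} mapping."""
    pixels = {}
    x = y = 0
    pos = offset
    while True:
        command = data[pos]
        pos += 1
        high, low = command >> 4, command & 0x0F

        # End of layer: X is 0 and the high nibble is 0xF.
        if x == 0 and high == 0xF:
            break

        for nibble in (high, low):      # high nibble first, then low
            count = nibble & 0x7
            if nibble & 0x8:            # Mode 1: Skip
                x += count
            else:                       # Mode 0: Copy
                for _ in range(count):
                    # FinalX = LayerX * 4 + LayerNum
                    pixels[(x * 4 + layer, y)] = data[pos]
                    pos += 1
                    x += 1

        if low == 0:                    # newline after both nibbles
            x, y = 0, y + 1
    return pixels
```

Pixels missing from the returned mapping are the transparent ones. Running all four layers of a sprite through `decode_layer` with `layer` set to 0 through 3 and merging the results gives the full image.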

GuyPerfect · April 03, 2013, 09:02:55 PM

Just a note that the L2SS pixel representation is rather different than in geoo's document. The format as I've documented it here is 100% correct (to the best of my research), and I have conducted extensive testing with the game engine to figure out how data gets processed.

geoo · April 04, 2013, 03:30:28 AM

I tested my L2SS docu by exporting all sections of the style graphics, VLEMMS and the .IFF files, and they display correctly (see here): http://www.lemmingsforums.com/index.php?topic=329.msg8284#msg8284
I didn't test for any compression codes not used in the original files. Are you sure you read my documentation correctly (the descriptions are not very detailed)?

The Lemmings 2 demo uses a somewhat different format, btw.

GuyPerfect · April 04, 2013, 05:54:00 PM

Quote from: geoo on 2013-04-03 21:30:28

Are you sure you read my documentation correctly (the descriptions are not very detailed).

Yes I am. Did you read mine? (-:

Your work with the format was very helpful; I don't want to belittle your efforts at all. When I looked at the data, I couldn't make heads or tails of what it was representing, which meant I couldn't make modifications and go in the game to see what changed. Your document gave me a place to start, let me know what I was looking at, and enabled me to press on to find all of the details of how pixels are processed for these sprites.

Having said that, a few things do stand out in your document:

- Nibble values are not 4-bit numeric values, but rather a 1-bit flag and a 3-bit numeric value. The stuff with "< 8" and ">= 8" doesn't accurately apply to how data is being processed.
- Byte value 0xFF does not necessarily indicate end of stream. In fact, it's a perfectly valid encoding that effectively represents "skip 14".
  - Additionally, byte values other than 0xFF can be used to indicate end of stream.
- A high nibble value of 0xE is not a special-case value. It's just the code for "skip 6".

geoo · April 04, 2013, 08:41:32 PM

Nope, I actually haven't yet, takes a bit of time to read and make sense of it which I wasn't willing to spend this week. Perhaps next week I can give a somewhat more qualified statement. I just thought you meant something is wrong with its content when you said 'different', not just different in writing style, sorry about that. (You don't need to remind me that it's horribly written, I'm aware of that :XD:, good thing you're giving it a proper writeup now.) It might be missing some cases that are not used in the game, and in hindsight might be very weirdly written (especially the special case E, which for some reason back then I thought didn't fit the pattern), but from the plain viewpoint of content, it should be a subset of yours as it works correctly on the original data. I'll read yours next week, as it's probably a big improvement over this old horrible description of mine, writing-wise.

EricLang · January 06, 2014, 09:28:25 AM

Ok I don't understand the decompression....

At the beginning of each chunk, a 256-symbol dictionary is initialized.
Dictionary entries are indexed 0x00 through 0xFF, and each symbol is
initialized to the single byte value that matches its index

Code: [Select]

var dictionary: array[0..255] of byte;
var i: integer;
for i := 0 to 255 do
  dictionary[i] := i;

So we get a dictionary 0,1,2,3,4,5.....

What to do next?
if someone is trying to explain it please refer clearly to what is what.

we have
the chunk
- symbols[0.. symbolcount - 1]
- symbolvaluesa[0.. symbolcount - 1]
- symbolvaluesb[0..symbolcount - 1]
- compresseddata[0..datasize - 1]
and
- some dictionary[0..255]

namida · January 06, 2014, 09:38:44 AM

dictionary[0] = 0
dictionary[1] = 1
dictionary[2] = 2
etc

Initially, that is. I didn't read the thing in full, but I think it can change later?

EricLang · January 06, 2014, 02:17:41 PM

I had a look in the C-code that I found, so decompression is working :)

ccexplore · January 06, 2014, 08:19:20 PM

Glad to hear you got it worked out. I suppose the documentation could maybe use some pseudo-code and examples there to aid implementers?

namida · January 06, 2014, 10:23:05 PM

I found the examples quite helpful in your DOS formats documentation, definitely. (Though one thing I found quite confusing - maybe it's just me - is how, when you get a 9-bit value, you were writing it as say, 0x000, 0x008, 0x010, 0x018, etc... rather than just 0x00, 0x01, etc. I don't know, to some people that way probably makes more sense, but for me... :/ xD)

ccexplore · January 06, 2014, 10:38:21 PM

@namida: I think you are actually referring to snippets of the DOS Lem1 LVL (ie. uncompressed level) file format documentation, which was actually written by rt (of Clones fame) and not me. I noticed the same thing there and agree with you on this.

EricLang · January 06, 2014, 11:00:59 PM

I just was wondering if the compression in dos lemmings (original, ohno etc.) is basically the same as in lemmings tribes. I never really studied the algorithm, but just 'blindly' translated the C or VB code that I encountered.

namida · January 06, 2014, 11:14:22 PM

Quote from: ccexplore on 2014-01-06 16:38:21

@namida: I think you are actually referring to snippets of the DOS Lem1 LVL (ie. uncompressed level) file format documentation, which was actually written by rt (of Clones fame) and not me. I noticed the same thing there and agree with you on this.

To be honest, I'm mostly just going off the L1 documentations as a whole and I remember having problems with this, I don't remember specifically where it was implemented. And if it's the LVL format, that may very well be my fault, as the "TalveSturges" mentioned in the credits is yet another of my millions of past aliases.