Lemmings DAT Anticompressor

Started by Mindless, August 27, 2005, 07:23:00 AM

Mindless

download link - major bugs, expect a new version within the next week...

This program lets you "compress" files into .DAT files; in reality, however, the "compressed" files will be bigger than the originals.  Anyway, ccexplore could release a program that actually does some compression, because I'm not planning to: I wasted way too much time debugging this anticompressor.
I can add an append feature if anyone wants it, since at the moment it will only overwrite DAT files.

This program is bound to have bugs since I can't possibly compress every possible file combination... ever.

ccexplore

I must say, "Anticompressor" is such an apt name for this.  ;)

Anyway, one way to gauge the reliability of your compression algorithm is as follows:  after you compress something, decompress it back and check that it matches the original decompressed data byte by byte.  I have that verification built into every program I've released that uses compression (well, only myvgaspec so far).

This way, if there's a bug, at least you catch it before it ruins the user's files.

Then test your compression algorithm against the set of DAT files that come with the game, making sure they all pass.  This is the extent of the testing I've done on mine.  I won't say it achieves good coverage by any means, but it does at least give me a reasonable degree of confidence.
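A minimal Python sketch of the check described above: compress, decompress the result again, compare byte by byte (reporting the first differing offset), then repeat that for every DAT file in a game directory.  `compress` and `decompress` are hypothetical placeholders for your own whole-file routines, not the API of ldecomp, LemEdit or any other real tool, and the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical signature for your own routines: whole file in, whole file out.
Codec = Callable[[bytes], bytes]


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one is a prefix of the other
    return None


def verify_round_trip(original: bytes, compress: Codec, decompress: Codec) -> bytes:
    """Compress, decompress again, and refuse to return output that doesn't round-trip."""
    packed = compress(original)
    offset = first_mismatch(original, decompress(packed))
    if offset is not None:
        raise ValueError(f"round trip differs at byte {offset}")
    return packed


def check_game_files(game_dir: Path, compress: Codec, decompress: Codec) -> List[str]:
    """Decompress each shipped .DAT file, recompress it, and list the ones that fail."""
    failures = []
    for path in sorted(game_dir.iterdir()):
        if path.suffix.upper() != ".DAT":
            continue  # DOS file names may be upper or lower case
        original = decompress(path.read_bytes())
        try:
            verify_round_trip(original, compress, decompress)
        except ValueError as err:
            failures.append(f"{path.name}: {err}")
    return failures
```

Because `verify_round_trip` only hands back data that survived the round trip, a buggy encoder raises an error instead of quietly writing a broken DAT file.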

Mindless

Ok... anticompression apparently doesn't work... :'(
So... I rewrote my compression code so that it actually does some amount of compression (2x the size of the Psygnosis files), and neither LemEdit nor CustLemm will load the recompressed data...  =8O  I'd guess that, being DOS apps, they require the files to be compressed somewhat well...

I've tested my compressor, and it seems to be 100% accurate (though I should add the error checking mentioned above), but the files still won't load, so I shall try to optimise my code...

ccexplore

Quote from: Mindless
Ok... anticompression apparently doesn't work... :'(
So... I rewrote my compression code so that it actually does some amount of compression (2x size of Psygnosis files) and neither LemEdit nor CustLemm will load the recompressed data...  =8O  I'd guess they, being DOS apps require the files to be a compressed somewhat well...
That's a bit surprising.  LemEdit doesn't produce particularly well-compressed files either.  And a level is only about 2k in size, which is peanuts even by DOS standards.

Send me what you have (uncompressed + what your program produces).  Not that I'd necessarily be able to fix anything, but I do want to see it for myself.

Incidentally, here are some possibly stupid but common mistakes (I made mistake #2 on my first attempt):

1) you did remember to include the header, right?
2) the compressed_size field in the header includes the 10 bytes in the header itself (that is, if the pure data compresses down to 1000 bytes, the header should read 1010 for the compressed size, not 1000)
3) remember that the header's compressed and decompressed sizes are stored big-endian.  So a size of, say, 0x0400 (1024 in decimal) should be stored as the bytes 0x04, 0x00, not 0x00, 0x04.
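Mistakes #2 and #3 can be illustrated with Python's `struct` module.  Only the 10-byte header size and the two rules above come from this thread; the field order used here (a bit count for the first data byte, a checksum, then the decompressed and compressed sizes as 4-byte big-endian values) is an assumption about the header layout that should be checked against a real DAT file.  The function name and parameters are hypothetical.

```python
import struct

HEADER_SIZE = 10  # every section header is 10 bytes


def build_section_header(payload: bytes, decompressed_size: int,
                         bits_in_first_byte: int, checksum: int) -> bytes:
    """Pack a section header; the field order is an assumption, not confirmed."""
    # Mistake #2: the compressed size counts the header's own 10 bytes.
    compressed_size = len(payload) + HEADER_SIZE
    # Mistake #3: ">" makes struct write both sizes big-endian.
    header = struct.pack(">BBII", bits_in_first_byte, checksum,
                         decompressed_size, compressed_size)
    assert len(header) == HEADER_SIZE
    return header
```

For the 0x0400 example, `struct.pack(">H", 1024)` gives `b'\x04\x00'`, while the little-endian `"<H"` that a PC would naturally write gives `b'\x00\x04'`, which is exactly mistake #3.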

Mindless

Fixed the problem; it was in my lazily coded raw chunk encoder... see what too many one-byte raw chunks get you... :P  Not sure why LemEdit and CustLemm can't handle repeated one-byte raw chunks as long as the data follows the file format... Oh, well...
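One generic way to avoid emitting long strings of one-byte raw chunks is to buffer unmatched bytes and flush them as a single raw chunk when a back-reference comes along or the chunk is full.  This is a sketch, not the actual fix described above: the token representation is hypothetical, and `max_raw` stands for whatever the longest raw chunk the DAT format allows, which isn't stated here.

```python
from typing import List, Tuple, Union

Ref = Tuple[int, int]      # hypothetical (offset, length) back-reference
Token = Union[int, Ref]    # an int is one literal byte
Chunk = Union[bytes, Ref]  # bytes is one raw chunk


def coalesce_literals(tokens: List[Token], max_raw: int) -> List[Chunk]:
    """Merge runs of literal bytes into raw chunks of up to max_raw bytes."""
    chunks: List[Chunk] = []
    pending = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            pending.append(tok)
            if len(pending) == max_raw:  # chunk is full, emit it
                chunks.append(bytes(pending))
                pending.clear()
        else:
            if pending:  # flush buffered literals before a reference
                chunks.append(bytes(pending))
                pending.clear()
            chunks.append(tok)
    if pending:
        chunks.append(bytes(pending))
    return chunks
```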
Hopefully I can work the bugs out of the far reference chunks routine...

ccexplore

Quote from: Mindless
Not sure why LemEdit and CustLemm can't handle repetitive raw one byte chunks as long as it follows the file format...
It should.  At least CustLemm should.  If it were me I would be more inclined to look over the code you wrote more carefully.  (But if you truly believe that LemEdit/CustLemm is the problem, do send me the problematic DAT file your program produced, if possible, so I can take a more detailed look at how exactly CustLemm is choking on it.)

Incidentally, remember that ldecomp, being closely modeled after the decompression routine in the game itself, can serve as yet another test.  If ldecomp won't accept it either, then you really should review your code.  Good luck!  ;)

ccexplore

Quote from: ccexplore (not logged in)
Incidentally, remember that, although closely modeled after the decompression routine in the game itself, ldecomp can serve as yet another test.
Then again, I don't think I ever updated ldecomp to perform detailed error checking.  So to really confirm that it works, make sure not only that ldecomp doesn't crash, but also that the result it produces matches the original uncompressed data.

Mindless

Here's by far the best example, since it doesn't crash CustLemm: a recompressed VGAGR0.DAT using lots of one-byte raw chunks.  Instead of crashing, it shows graphical errors.
http://it.travisbsd.org/lemmings/_misc/VGAGR0.recompressed.zip
And when decompressed by ldecomp, the data is identical to the decompressed data of the original.

Mindless

Well, it's kinda working, and when it does work, it beats your method's compression ratio.
Here's the example included with myvgaspec and a recompressed version.  My recompressed version is half the size of yours.  Now if only I could get it to give me consistent results.
http://it.travisbsd.org/lemmings/_misc/examplevgaspec.recompressed.zip

ccexplore

Quote from: Mindless
Here's by far the best example, since it doesn't crash CustLemm: a recompressed VGAGR0.DAT using lots of one byte raw chunks.  Instead of crashing, there are graphical errors.
Hmm, that is a bit odd, I must say.  I could check it again later against a version of ldecomp (not finished yet) that has more stringent checks, but the fact that it produces apparently correct data seems to indicate it is doing the right thing.

Does this happen only with VGAGR# files, or have you gotten similar problems with levelpak files or VGASPEC# files?

I can imagine problems with VGAGR# and VGASPEC# files since they are somewhat large, and the typical limits a DOS program can run into are either 32k or 64k.  But what your program produces is only around 20-25k.  Of course, it's perfectly possible that the game sets aside smaller amounts of memory for holding compressed data.  (It doesn't help that the amount could differ between kinds of files.)

I found that ONML's vgagr2.dat does have a data section whose compressed size exceeds 20k, namely the first section (22379 bytes).  This is actually larger than your example file's first section, which was only 20611 bytes.  On the other hand, in all the official vgagr# files from Lemmings and ONML, the second data section is always under 20k.  Hmm... do you remember whether the second data section is for terrain or object graphics?
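To see which sections of a DAT file go over a given size, you can walk the sections: since each section's compressed size includes its own 10-byte header, it points straight at the start of the next section.  A Python sketch, assuming (this thread doesn't say so) that sections are stored back to back and that the compressed size is the big-endian value in header bytes 6-9; reading "20k" as 20 * 1024 bytes is also an assumption, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 10


def section_sizes(dat: bytes) -> List[Tuple[int, int]]:
    """Return (offset, compressed_size) for each section, header included."""
    sections = []
    offset = 0
    while offset + HEADER_SIZE <= len(dat):
        # Assumed layout: compressed size is big-endian in header bytes 6-9.
        (size,) = struct.unpack_from(">I", dat, offset + 6)
        if size < HEADER_SIZE or offset + size > len(dat):
            raise ValueError(f"bad section size {size} at offset {offset}")
        sections.append((offset, size))
        offset += size  # the next section starts right after this one
    return sections


def oversized(dat: bytes, limit: int = 20 * 1024) -> List[int]:
    """Indices of sections whose compressed size (header included) exceeds limit."""
    return [i for i, (_, size) in enumerate(section_sizes(dat)) if size > limit]
```

Calling `oversized()` on the raw bytes of each official and recompressed VGAGR# file would make it easy to compare which sections cross the limit.  Note that the sizes reported here include the header, so they may differ by 10 bytes from figures quoted without it.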

==================

This is not very good news for me, since it means I should probably look into improving the compression ratio of my own algorithm as well, if Lemmings is indeed that unforgiving.  Good thing that, at least so far, no one has run into similar problems with myvgaspec.

Thanks for alerting me to this.

Mindless

You should only run into that problem if your compressor uses MANY one-byte chunks, so basically your compressor should never cause a problem.

---

Terrain is first in the vgagr# files, then objects.

---

On another note, I've decompressed and recompressed some of the Lemmings DATs, including "main.dat" and "vgagr0.dat", and my compression method produces files that are often a few kB smaller than the Psygnosis compression! :D

MAIN.DAT - 56,472 bytes
recompressed - 53,445 bytes

LEVEL000.DAT - 3,722 bytes
recompressed - 3,717 bytes

VGAGR0.DAT - 24,464 bytes
recompressed - 22,928 bytes

VGASPEC0.DAT - 28,655 bytes
recompressed - 28,146 bytes

ADLIB.DAT - 12,988 bytes
recompressed - 12,790 bytes
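The savings in these figures can be worked out with a few lines of Python, using only the sizes listed above.

```python
from typing import Tuple

# (original size, recompressed size) in bytes, from the list above
RESULTS = {
    "MAIN.DAT": (56_472, 53_445),
    "LEVEL000.DAT": (3_722, 3_717),
    "VGAGR0.DAT": (24_464, 22_928),
    "VGASPEC0.DAT": (28_655, 28_146),
    "ADLIB.DAT": (12_988, 12_790),
}


def savings(original: int, recompressed: int) -> Tuple[int, float]:
    """Bytes saved and the saving as a percentage of the original size."""
    saved = original - recompressed
    return saved, 100.0 * saved / original


for name, (orig, new) in RESULTS.items():
    saved, pct = savings(orig, new)
    print(f"{name:13} saved {saved:5} bytes ({pct:.2f}%)")
```

MAIN.DAT saves 3027 bytes (5.36%) and VGAGR0.DAT 1536 bytes (6.28%); the other three save between 5 and 509 bytes (0.13% to 1.78%).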

ccexplore

Quote from: Mindless
You should only run into that problem if your compressor compresses in MANY one byte chunks, so basicly your compressor should never cause a problem.
I'm not 100% sure.  Keep in mind that all along, I've simply tested my compression algorithm by comparing the original decompressed data with the re-decompressed (is that a word?  ;P) data.  I haven't really tested it in LemEdit/CustLemm like you did, except for the example I made for myvgaspec.

From what I know about how the game decompresses, there should be no reason why it would choke on too many one-byte chunks.  So whether the problem is because the data is too large or because there are too many one-byte chunks, there's definitely something that is worth investigating.

Quote
On another note, I've decompressed and recompressed some of the Lemmings DATs, namely "main.dat" and "vgagr0.dat" and my compression method produces files that are a few kB smaller than the Psynosis compression! :D
Good job!  B)

Of course, computers back then were a lot slower and had less RAM, so it's quite possible that Psygnosis had to stick with a less efficient algorithm in order to ship the game on time.  ;P

Mindless

Alright, I think I've worked out all the bugs... it probably needs more error checking, but I'm gonna release it anyway...
A working version of my ex-anticompressor: http://it.travisbsd.org/lemmings/files/lemmingsdatcompressor/lemmingsdatcompressor_1_1_1.zip

Mindless

I've been rummaging thru main.dat and here's what's what.
main.dat section 0:
  lemming animations

main.dat section 1:
  impending doom (nuke) counter
  terrain destruction masks (nuke, mine, bash, maybe others)

main.dat section 2:
  skill numbers

main.dat section 3:
  brown background
  lemmings holding signs
  music/fx sign

main.dat section 4:
  purple text
  blinking lemming eyes
  scroller (lemmings and reel)
  difficulty selector sign

main.dat section 5:
  unknown

main.dat section 6:
  skill panel graphics
  green numbers and letters

And here's an empty main.dat (all 0x00's).  Invisible Lemmings!
http://it.travisbsd.org/lemmings/_misc/empty_main.zip

ccexplore

Cool!  B)  The last missing piece of information finally falls into place.

Did you figure this out by examining the data in the actual file, or did you ask Mike for help?

If possible, I'd love for you to give us the offsets of the various bitmaps you found in main.dat.  I presume the format is some kind of planar bitmap, most likely a 3-bit one, possibly with an additional mask.
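For reference, decoding a planar bitmap means taking one bit from each plane and combining those bits into a color index per pixel.  The Python sketch below assumes a plane-after-plane layout, 8 pixels per byte with the leftmost pixel in the most significant bit, and plane 0 as the lowest bit of the index; it ignores any mask plane.  None of that has been confirmed for main.dat, and the function name is hypothetical.

```python
from typing import List


def planar_to_indices(data: bytes, width: int, height: int, planes: int = 3) -> List[int]:
    """Combine bit planes into one color index per pixel (layout is assumed)."""
    if width % 8:
        raise ValueError("width must be a multiple of 8 in this sketch")
    plane_size = width * height // 8  # bytes per plane
    if len(data) < plane_size * planes:
        raise ValueError("not enough data for the requested planes")
    pixels = []
    for p in range(width * height):
        byte_index, bit = divmod(p, 8)
        index = 0
        for plane in range(planes):
            b = data[plane * plane_size + byte_index]
            if b & (0x80 >> bit):  # leftmost pixel is the high bit
                index |= 1 << plane  # plane 0 supplies the lowest bit
        pixels.append(index)
    return pixels
```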

Is section #5 (decompressed) large?  If so, it's most likely the intro screen (main menu) as a single bitmap.

Thanks!  B)

P.S.  Oh, and don't forget to do it for Xmas Lemmings, which has different graphics, in particular the lemming animations.