Parsing custom-made archive

Question

Levi Price

Hey Any Forums, I'm trying to mess around with a game I played as a kid (Pac-Man: Adventures in Time). The data for the game (maps/textures/models/etc.) is stored in one big .pac archive. It's a custom-made archive format (no compression, I'm pretty sure): just filenames, each with a value pointing to where that file begins in the archive.

I'm very fluent in Python but have never done anything like this. Is it feasible to write a script that can unpack and repack .pac archives in pure Python, just calling open() and handling the bytes myself? Or are there libraries suited to this sort of thing? Literally any info at all helps, because at the moment I'm not even sure what to google.

Attached: unknown.png (146x370, 34.05K)

June 5, 2022 - 18:10

Aaron Kelly

Sorry, forgot to say that pic related is just a hex editor view of the top of the archive I want to parse.

June 5, 2022 - 18:11

Gavin James

Reverse engineer the executable to find out how, and from where, the data is being read.

June 5, 2022 - 18:14

Connor Taylor

Yes, you can open it with Python streams, read the header into a struct, locate the extents of the data the header points to, and read them out and write them back in.
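
A minimal sketch of that idea using only the standard library. The header layout here (a 4-byte magic, then a uint32 offset and a uint32 size, all little-endian) is invented purely for illustration and is not the real .pac layout; `read_extent` is a hypothetical helper name.

```python
import io
import struct

# Hypothetical header: 4-byte magic, uint32 offset, uint32 size (little-endian).
HEADER = struct.Struct("<4sII")

def read_extent(stream):
    """Read the header, then return (magic, data) for the extent it points to."""
    magic, offset, size = HEADER.unpack(stream.read(HEADER.size))
    stream.seek(offset)
    return magic, stream.read(size)

# Build a tiny fake archive in memory so the example runs without a real file.
payload = b"hello"
fake = HEADER.pack(b"DEMO", HEADER.size, len(payload)) + payload
magic, data = read_extent(io.BytesIO(fake))
```

The same function works on a real file object opened with open(path, "rb"), since both support read() and seek().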

June 5, 2022 - 18:16

Carter Brown

read file path
create a file in the aforementioned path
read the file contents at the address that comes after the filename
write them to the file you created before
do the same with the next file until you run out of files
???
profit

i imagine with python it would be very gruesome, id do this with c/c++ myself

C++ pseudocode:
// open in binary mode ("rb") so no bytes get translated on Windows
FILE* f = fopen("file.pac", "rb");
size_t fileSize = ...;

// load the file into memory, into buf specifically
uint8_t* buf = (uint8_t*)malloc(fileSize);
fread(buf, 1, fileSize, f);
fclose(f);

// get the first file
const char* filePath = (const char*)buf + ...;
// address of the contents of the file
uint8_t* contents;
FILE* output;

// repeat this until we have gone through every single file
while (...)
{
    contents = ...; // probably filePath + strlen(filePath) + 1 or something
    // create folders and open a file to write to
    output = CreateFoldersAndFile(filePath);
    fwrite(contents, 1, ..., output);
    fclose(output);

    // advance onto the next file
    filePath = ...;
}
free(buf);

OP, if you want you can send me the file and i can spend 10m or so on it

June 5, 2022 - 18:42

Lincoln Watson

There used to be a hex editor where you could visually create structs within it and see if they "fit" the data. It would then output C code itself. I can't for the life of me remember what it was called.

June 5, 2022 - 18:57

Elijah Gray

good to know, thank you - was my first instinct and i'll try this

absolutely, im uploading it to google drive rn - appreciate this a lot

June 5, 2022 - 19:12

Aiden Jackson

here is the .pac
drive.google.com/file/d/1QgEkHVFlMjsoiHBmY_sQV1MXQq9MzGYr/view?usp=sharing

June 5, 2022 - 19:15

Michael Sanders

i forgot about this, ill look into this in a moment, but i might go to sleep

June 5, 2022 - 21:17

Leo James

no worries if u sleep, the reply you already gave has been incredibly helpful

June 5, 2022 - 21:54

Robert Torres

I've written a parser or two for some binary formats before.
Reverse engineering the code in the game that parses this might help. Depends on how complex it all is and how much is custom. Like, if you find BMPs in there, then you can carve those out pretty easily, and have more hints at how it all works.
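
A rough sketch of what carving BMPs could look like. It relies only on the standard BMP file header (the two bytes "BM" followed by a little-endian uint32 total file size); whether this archive contains any BMPs at all is not known, and `carve_bmps` is a hypothetical helper name.

```python
import struct

def carve_bmps(data):
    """Yield (offset, blob) for each plausible BMP found in data."""
    pos = data.find(b"BM")
    while pos != -1:
        if pos + 6 <= len(data):
            size = struct.unpack_from("<I", data, pos + 2)[0]
            # The BMP file header alone is 14 bytes, and the image must fit
            # in what is left of the buffer; anything else is a false hit.
            if 14 <= size <= len(data) - pos:
                yield pos, data[pos:pos + size]
        pos = data.find(b"BM", pos + 1)

# Synthetic example: a 20-byte fake "BMP" surrounded by unrelated bytes.
fake_bmp = b"BM" + struct.pack("<I", 20) + bytes(14)
hits = list(carve_bmps(b"junk" + fake_bmp + b"tail"))
```

Random "BM" byte pairs will still produce false positives now and then, so treat the hits as candidates to inspect, not as confirmed images.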
Just think logically about how the data could be structured. Like a header with some magic bytes indicating the file type and maybe some metadata. Maybe not.
That picture seems to indicate it has a table near the beginning with some binary data and an ASCII path. The binary data could be offsets (relative to the start of the file or relative to where the offset field itself sits, who knows). Maybe the length of the section is next to the offset as well, and there could be some more metadata. Maybe it's terminated by some sentinel value.
Just keep a calculator nearby and jump to any locations you think could be useful offsets, tweak your assumptions about the format until the part you're looking at lines up with what's in the file.
Remember to try both big- and little-endian values. You might even have to use both in the same file.
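
A small sketch of trying both byte orders on a suspected offset field, using the standard struct module. The helper names `read_u32_both` and `plausible_offsets` are made up for this example, and "points inside the file" is only a rough plausibility test, not proof.

```python
import struct

def read_u32_both(data, pos):
    """Return the 4 bytes at pos decoded as (little-endian, big-endian) uint32."""
    little = struct.unpack_from("<I", data, pos)[0]
    big = struct.unpack_from(">I", data, pos)[0]
    return little, big

def plausible_offsets(data, pos):
    """Keep only the interpretations that point inside the file."""
    return [v for v in read_u32_both(data, pos) if v < len(data)]

# 32-byte sample whose first four bytes are 10 00 00 00.
sample = bytes([0x10, 0x00, 0x00, 0x00]) + bytes(28)
both = read_u32_both(sample, 0)
candidates = plausible_offsets(sample, 0)
```

Here only the little-endian reading (16) lands inside the 32-byte sample, which is the kind of hint that helps settle the byte order.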
Could be a bit complex once you have the archive extracted and have a bunch of complex files you don't know anything about, but you can figure it out given enough time looking at the disassembled/decompiled version of the program that is supposed to parse it (Pacman).

June 5, 2022 - 21:58

Joshua Ramirez

BTW I think the hex editor for creating structures might be this:

docs.hhdsoftware.com/hex/definitive-guide/structure-viewer/structure-editor/overview.html

but it's been many years so I'm not sure

June 5, 2022 - 21:58

Isaiah Murphy

I've never played the game but if it has a .dll it might also be possible just to find entry points for data loading functions and call them directly.

June 5, 2022 - 22:00

Adam Morris

The format is awkward, but I'm starting to make heads or tails of it.
I'm not sure about the first 16 bytes; there is a magic word plus some data.
Then there is a count of folders. Each folder path is max 36 (32?) chars, and it seems the unused space is padded with zeroes. The last 4 bytes are the file count inside the folder. It looks like there are no nested folders.
Once you're past the first 4 folders, the file entries start. They are always 48-byte blocks, with strings null-terminated and padded as well. I'm not sure how much extra data there is or what it means, but most likely it's a size in bytes so you know how much binary data to read per file. So each of the 4 folders has some number of files.
Haven't looked further, but I'm gonna bet the data itself comes after that, one blob per file.
It is parseable, I suppose.

June 5, 2022 - 22:56

Alexander Watson

Never mind, figured it all out.
The extra data appended to the filename blocks is the offset in the file where the data starts, plus an 8-byte integer for size (???). The latter might be wrong, as it makes no sense to have 4 bytes for the offset and 8 for the size; the last 4 bytes are probably used for something else. But I have confirmed the offsets and they match.

Also figured out what the extra data for folders means. As I said, each folder entry is 40 bytes: a null-terminated string padded with zeroes and then 2 integers, the index of the first file that belongs to the folder and how many files this specific folder has.

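Okay so the structure is:
char unk[16];          // magic word + unknown data
uint fldrCount;
uint fileCount;

struct FLD {           // 40 bytes per folder
    char path[32];     // null-terminated, zero-padded
    uint startIndex;   // index of the folder's first file
    uint endIndex;     // described above as the folder's file count
} Folders[fldrCount] ;

struct FLS {           // 48 bytes per file
    char name[36];     // null-terminated, zero-padded
    uint absOffset;    // where the file data starts in the archive
    uint Size;
    uint unk;
} Files[fileCount] ;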

The rest of the binary data you will parse/write yourself.
Godspeed pilgrim!
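
For OP, who wants pure Python: a sketch of how the layout above might map onto the struct module. Assumptions not confirmed anywhere in the thread: all integers are little-endian, the second folder integer is a file count (the struct calls it endIndex, the description says count), and names are ASCII. The names `cstr`, `parse_pac` and `extract` are made up for this example.

```python
import struct
from pathlib import Path

# Assumed little-endian; field order follows the structure posted above.
HEADER = struct.Struct("<16sII")   # unk[16], fldrCount, fileCount
FOLDER = struct.Struct("<32sII")   # path[32], startIndex, count (assumed)
FILE = struct.Struct("<36sIII")    # name[36], absOffset, Size, unk

def cstr(raw):
    """Decode a zero-padded, null-terminated fixed-width string."""
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")

def parse_pac(data):
    """Return a list of (relative_path, offset, size) for every file entry."""
    _unk, folder_count, file_count = HEADER.unpack_from(data, 0)
    pos = HEADER.size
    folders = []
    for _ in range(folder_count):
        path, start, count = FOLDER.unpack_from(data, pos)
        folders.append((cstr(path), start, count))
        pos += FOLDER.size
    files = []
    for _ in range(file_count):
        name, offset, size, _unk2 = FILE.unpack_from(data, pos)
        files.append((cstr(name), offset, size))
        pos += FILE.size
    entries = []
    for folder, start, count in folders:
        for name, offset, size in files[start:start + count]:
            rel = "/".join(part for part in (folder, name) if part)
            # Normalise separators and avoid accidentally absolute paths.
            entries.append((rel.replace("\\", "/").lstrip("/"), offset, size))
    return entries

def extract(data, out_dir):
    """Write every entry under out_dir, creating folders as needed."""
    for rel, offset, size in parse_pac(data):
        target = Path(out_dir) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data[offset:offset + size])
```

Usage would be something like extract(Path("game.pac").read_bytes(), "out"), where both names are placeholders. If the extracted files look like garbage, the first things to re-check are the byte order and the meaning of the second folder integer.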

June 5, 2022 - 23:15

Brayden Foster

Looks like you got it, not bad. Like you said, the other files in there will have to be figured out. Some, like the .tga files, already have readers available, so that part's easy.

Don't have anything to add other than to say good job.

June 6, 2022 - 01:51

Jack Kelly

An example of a ripped file

Attached: file.png (256x256, 7.36K)

June 6, 2022 - 02:00

Jackson Sanders

What's the name for this in the file OP gave? I assume it's "font.tga" or something like that.

June 6, 2022 - 02:02

Nolan King

overlayfont1.tga

June 6, 2022 - 02:02

Blake Wood

Write a quickbms script instead of messing around trying to write your own program. quickbms can handle unpacking/repacking for you automatically once you've made a script for it; it's made for game archives like that.

June 6, 2022 - 02:05

Landon Bennett

There's a tool called Kaitai Struct that will help you out. You define the structure in YAML and can compile that to a Python module to do the actual dumping.

June 6, 2022 - 02:09
