Daikatana PAK format
A complete reference for the Daikatana (Ion Storm, 2000) PAK archive format as implemented by pakka.
Daikatana paks share Quake’s PACK magic and 12-byte header but widen
the directory record from 64 to 72 bytes, adding a per-entry
compressed_length and is_compressed flag. Compressed entries use a
custom byte-codec (LZSS-style) documented in §3 below.
1. Sources
1.1 Primary references
- Daniel Gibson, “Daikatana .pak format” (gist) — https://gist.github.com/DanielGibson/8bde6241c93e5efe8b75e5e00d0b9858. This is the canonical narrative reference and the basis for §2 and the decoder opcode table in §3.
- yquake2/pakextract — reader implementation at
pakextract.c(BSD-2-Clause). Reference C decoder; pakka’spakka_dk_inflateinsrc/dk_codec.cmatches its behaviour exactly. The-dkflag selects the DK row. dktools_readme.htm(John Romero, Ion Storm 2000) — public documentation shipped with the Daikatana editing-tools distribution (dktools_11h). Section 4 coversdkpak.exe, the packer Ion Storm used to build the game’s shipped paks. Default mode auto-allocates files by type into three output paks (base/models/maps);-userpakmode collapses that into a single pak and preserves whatever staging directory tree the author provides. Mirror hosting the tools and readme: https://dk.toastednet.org/files.html.- pakka’s own implementation:
src/dk_codec.c(decoder + encoder),src/pakfile.c(directory I/O, add-path policy, on-disk-extent helper), andsrc/common.c(PAK-class geometry table).
1.2 Additional references
- Internet Archive Daikatana source collection — https://archive.org/details/daikatana-src-collection. Mirror of talonbrave’s archive of the Daikatana 1998 → 1.3 source tree (community-released, not first-party id-style GPL); useful for historical context on the engine fork lineage.
- atsb/daikatana-restoration-project — https://github.com/atsb/daikatana-restoration-project. Modern open-source restoration of the 1.2 source aimed at running on contemporary systems; alternative engine PAK reader.
- MacSourcePorts/daikatana — https://github.com/MacSourcePorts/daikatana. macOS-targeted port of the Daikatana 1.3 codebase (note: the upstream is not formally open source; this repo packages community changes).
- DaiPAK on ModDB —
https://www.moddb.com/games/daikatana/downloads/daipak. Archived
third-party PAK creation tool (separate from Ion Storm’s
dkpak.exe); useful for differential testing against pakka’s encoder output. - DeathEngine2 GitHub — https://github.com/DeathEngine2. Hosts community-maintained source snapshots of John Romero’s Daikatana code.
- Ion Storm tools mirror —
https://dk.toastednet.org/IonRadiant/installing.shtml. Workflow
documentation for the original editing tool chain (IonRadiant +
dkpak.exe); the in-archive layoutdkpakproduces is the upstream test of any third-party encoder’s compatibility.
The encoder strategy in §4 is pakka’s own design, justified by per-token
byte-cost analysis. The encoder produces streams that the decoder in
§3 accepts; correctness is checked at every layer by round-tripping
encoder output through the decoder in test/dk_codec_test.c, and
cross-validated against dkpak-produced output in
test/fixtures/dk/.
1.3 Preservation
Every live external reference in §1.1 and §1.2 was submitted to the
Wayback Machine on 2026-05-24. To fetch
a snapshot of any link above, prepend https://web.archive.org/web/
to its URL. Sources that carry their own archival guarantee (GitHub
source files, Internet Archive item pages) are excluded from the
submission set.
2. File layout
2.1 Header — 12 bytes
| Offset | Size | Field | Notes |
|---|---|---|---|
| 0 | 4 | signature |
"PACK" — bit-identical to Quake / Q2 PAK. |
| 4 | 4 | diroffset |
u32 LE — byte offset of the directory block. |
| 8 | 4 | dirlength |
u32 LE — directory size in bytes (72 × N). |
Because the magic collides with Quake PAK, an open-time layout probe
disambiguates: each candidate geometry (64-byte Quake row vs 72-byte
Daikatana row) is validated against every directory entry’s
offset + on-disk extent vs the file size. If exactly one parses
cleanly, it wins; if both parse, callers must pass --format
daikatana (or PAKKA_FORMAT_DAIKATANA to pakka_open_ex) to
disambiguate. Empty archives (dirlength == 0) bias to Quake PAK
unless the DK hint is explicit. See probe_pak_layout in
src/pakfile.c.
2.2 Directory entry — 72 bytes
| Offset | Size | Field | Notes |
|---|---|---|---|
| 0 | 56 | filename |
NUL-padded; forward-slash path separators. |
| 56 | 4 | file_pos |
u32 LE — offset of payload bytes in the archive. |
| 60 | 4 | file_length |
u32 LE — uncompressed size. |
| 64 | 4 | compressed_length |
u32 LE — on-disk byte extent for compressed entries; 0 for STORED. |
| 68 | 4 | is_compressed |
u32 LE — 0 = STORED (raw bytes), non-zero = encoded per §3. |
Mixed STORED + compressed in the same archive is legal. Compression is a per-entry decision; the same archive can carry both. pakka’s encoder uses this to apply compression selectively (§5).
file_length is always the uncompressed size. For STORED entries
this is also the on-disk byte extent; for compressed entries the
on-disk extent is compressed_length. Every site in pakka that walks
the byte layout — copy_between_paks, compute_payload_end,
write_pak_directory — routes through the
pak_entry_on_disk_extent(e) helper to pick the right value.
2.3 Endianness and limits
Every on-disk u32 is little-endian regardless of host byte order. In
pakka the directory I/O sites use pakka_read_u32_le /
pakka_write_u32_le from src/common.c; the CI matrix includes a
big-endian s390x job to keep this honest. Offsets and lengths are
capped at UINT32_MAX (4 GiB) at every write site.
2.4 Quake-vs-Daikatana disambiguation — decision table
Both Quake PAK and Daikatana use the "PACK" magic, so the open-time
layout probe in src/pakfile.c::probe_pak_layout has to disambiguate
geometry. The probe runs two checks per candidate row size (64-byte
Quake vs 72-byte Daikatana): first that dirlength divides evenly
into the row, then that every directory entry’s file_pos +
on-disk extent lands inside the captured file size and that
file_pos >= PAKFILE_HEADER_SIZE (12). The decision table:
| Quake probe (64-byte rows) | DK probe (72-byte rows) | Result |
|---|---|---|
| OK | divisibility-fail (% 72 ≠ 0) |
Quake (PAKKA_FORMAT_PAK). |
divisibility-fail (% 64 ≠ 0) |
OK | Daikatana (PAKKA_FORMAT_DAIKATANA). |
| OK | extent-fail (% 72 == 0 but entries don’t fit) |
Quake. |
extent-fail (% 64 == 0 but entries don’t fit) |
OK | Daikatana — the common shipping-DK case, since compressed_length only makes sense in 72-byte rows. |
| OK | OK | Ambiguous → PAKKA_ERR_FORMAT (“Ambiguous PACK archive (parses as both Quake and Daikatana); pass –format pak or –format daikatana”). |
| Fail (any reason) | Fail (any reason) | PAKKA_ERR_FORMAT (“PACK archive does not parse as Quake or Daikatana (entry offset/length out of range)”). |
dirlength == 0 (empty) |
dirlength == 0 (empty) |
Short-circuited to Quake (PAKKA_FORMAT_PAK) without running either probe — empties would trivially pass both. Pass --format daikatana to force DK identity. |
An explicit format_hint from pakka_open_ex or --format short-
circuits the probe entirely — the hint is asserted against the
on-disk magic and the row size is taken from the hint. This is the
safe path for callers who know what they have.
In practice every shipping Daikatana pak has at least one
compressed entry, so the 72-byte row’s extra compressed_length
field validates the on-disk extent and the 64-byte row’s extent
check fails. Synthetic fixtures live at test/dk_test.c:
test_ambiguous_576_rejected exercises the “both probes fail”
path (a 576-byte all-zero directory that divides into both row
sizes but whose zeroed file_pos fails the file_pos >= 12
check under both layouts); test_create_empty_then_open_with_hint
exercises the empty-archive short-circuit. There is currently no
fixture for the genuine both-probes-pass ambiguity — that would
need a hand-crafted directory whose row layout is consistent under
both 64- and 72-byte interpretations.
3. Wire format: byte-codec opcode table
Compressed entries (is_compressed != 0) carry a stream of variable-
length opcodes. The decoder reads one control byte c at a time and
dispatches:
| Control byte | Class | Length expression | Payload | Output |
|---|---|---|---|---|
0x00..0x3F |
literal run | c + 1 (1..64) |
c+1 raw bytes from input |
Copy them straight to output. |
0x40..0x7F |
zero run | c - 0x3E (2..65) |
(none) | Write c - 0x3E zero bytes. |
0x80..0xBF |
byte run | c - 0x7E (2..65) |
1 byte | Write the payload byte c - 0x7E times. |
0xC0..0xFD |
back-ref | c - 0xBE (2..63) |
1 byte (off) |
Copy c - 0xBE bytes from produced - (off + 2). |
0xFE |
invalid | — | — | Format error. |
0xFF |
terminator | — | (none) | End of stream. |
Back-reference distance is off + 2, range 2..257. The copy is into
already-decoded output, so LZ-style overlap is legal: when the copy
length exceeds the distance, the freshly-written tail bytes feed back
into the same copy operation, repeating a window of distance bytes.
The decoder must do this byte-by-byte (not memcpy / memmove) for
correctness.
Termination. A well-formed stream ends with 0xFF, but the decoder
also accepts clean input exhaustion provided produced == out_len. Any
under-fill (produced < out_len at end-of-input) is a format error.
Strict bounds. Every input read is checked against in_len, every
output write against out_len, every back-reference distance against
produced. Any violation returns PAKKA_ERR_FORMAT; the output tail
beyond the failure point is left uninitialised and callers must treat
partial output as garbage.
3.1 Implementation in pakka
pakka_dk_inflate(const unsigned char *in, size_t in_len,
unsigned char *out, size_t out_len, pakka_error_t *err)
in src/dk_codec.c. Whole-buffer decode — pakka allocates a buffer
of size entry->file_length and runs the codec to completion. The
per-archive max_decompressed cap (default 64 MiB, settable via
pakka_set_max_decompressed_size) bounds peak RSS against a malicious
or accidentally-oversized declared file_length.
4. Encoder strategy
pakka’s pakka_dk_deflate is a single-pass greedy matcher over the
back-reference distance range (2..257). At each input position it
gathers four candidate encodings, picks the best by token cost, and
emits.
4.1 Candidate tokens
For each input position i:
z— zero-run length. Largestnsuch thatin[i .. i+n-1]are all0x00, capped atDK_MAX_ZERO_RUN = 64.r— byte-RLE length. Largestnsuch thatin[i .. i+n-1]are allin[i], capped atDK_MAX_BYTE_RUN = 64.m/dist— best back-reference. Searches distancesdin[2, 257]for the longest match starting ati, with match length capped atmin(DK_MAX_BACK_REF, d, remaining)whereDK_MAX_BACK_REF = 62. The fast-path filterin[i - d] != in[i]skips most candidates without the inner loop.- literal byte — always available as a fallback (cost 1 byte amortised inside a 64-byte literal run).
4.2 Conservative caps
The encoder uses caps one step inside the decoder envelope:
| Token | Decoder accepts | Encoder emits |
|---|---|---|
| Zero run | 2..65 | 2..64 |
| Byte-RLE | 2..65 | 2..64 |
| Back-ref | 2..63 | 3..62 |
| Literal | 1..64 | 1..64 |
A length-2 byte-RLE or back-ref is a 2-byte opcode replacing 2 literal bytes — break-even on bytes, but it splits any in-progress literal run and forces a re-open downstream. Skipping these tokens avoids encodings that almost always cost more than they save.
4.3 No-overlap rule
The encoder caps each back-reference at length ≤ distance. The
decoder accepts longer back-refs via LZ-style overlap (§3), but the
encoder skips that case. Cost in compression ratio is small (overlap
matches only win for highly repetitive patterns like "ababab...") and
the benefit is a single-pass matcher with no overlap special case.
4.4 Decision tree
Once z, r, m are gathered, sub-threshold candidates are zeroed
out (the minimum profitable lengths are zero-run 2, byte-RLE and
back-ref 3, capturing the per-token byte cost). The remaining choice:
if (z == 0 && r == 0 && m == 0) {
emit literal byte (deferred — accumulated in a 64-byte run)
} else if (z * 2 > r && z * 2 > m) {
emit zero-run of length z
} else if (m > r) {
emit back-ref of length m at distance dist
} else {
emit byte-RLE of length r
}
Zero-run preference (the 2 × tiebreak). A zero-run is a 1-byte
opcode (no payload). Byte-RLE and back-ref are 2-byte opcodes (op +
payload). At equal length the zero-run saves one byte vs the
alternatives, so it deserves preference; doubling the score reflects
that asymmetry across a range of competing lengths.
Greedy. No deferred matching or lazy evaluation. Whichever encoding
wins at position i is committed immediately, and i advances by the
match length. Greedy is enough for typical Daikatana content
(palettized images, BSP files); the round-trip test suite catches any
encoder bug that would produce decoder-invalid output.
4.5 Worst-case output bound
Worst case is literal-only: every input byte costs 1 byte plus
ceil(in_len / 64) opcode bytes plus 1 terminator. In u64:
worst = in_len + (in_len + 63) / 64 + 1
For in_len = 4 GiB - 1 this slightly exceeds UINT32_MAX. Callers
(see pakka_add_file and pakka_add_memory in src/pakfile.c)
compute this bound in uint64_t, compare against UINT32_MAX, and
skip the encoder entirely if it would overflow — falling back to
STORED for the source. The on-disk compressed_length field is u32,
so anything that wouldn’t fit must be STORED.
4.6 STORED auto-fallback
After encoding, the caller compares encoded_size against
source_size. If the encoded form is not strictly smaller than the
source, the entry is rewritten as STORED instead — same policy as
info-zip and as pakka’s PK3 / PK4 DEFLATE add path. The directory’s
is_compressed flag is cleared and compressed_length left at 0.
5. Add-path extension policy
pakka applies the encoder automatically based on the entry name’s
extension. The five extensions Daniel Gibson’s gist documents as
“compressed when packing” are the policy in src/pakfile.c’s
dk_compressible_extension:
| Extension | Typical Daikatana content |
|---|---|
.tga |
Targa textures. |
.bmp |
Windows bitmap textures. |
.wal |
Quake 2-style wall textures (palettized). |
.pcx |
PCX images. |
.bsp |
Compiled map files. |
Match is case-insensitive, ASCII-only (no tolower() — locale-
dependent). Everything else is STORED. There is no user knob;
--compress stays a PK3 / PK4 affordance. DK callers select between
STORED and compressed by naming entries appropriately.
6. I/O integration
6.1 Reading
pakka_open_ex(path, PAKKA_OPEN_READ, PAKKA_FORMAT_DAIKATANA, ...) skips
the PACK ambiguity probe and asserts the on-disk magic is PACK. The
directory loader parses 72-byte rows including the two extra u32s; on
mixed-archive content the per-entry is_compressed flag drives whether
pakka_open_entry / pakka_reader_read route through the codec or read
raw bytes.
6.2 Writing
pakka_create(path, PAKKA_FORMAT_DAIKATANA, ...) opens a temp file and
stamps a 12-byte PACK header with dirlength = 0. Subsequent
pakka_add_file / pakka_add_memory calls run the encoder per the
extension policy (§5) and update the in-memory entry list with
dk_is_compressed / dk_compressed_size. Commit modes:
- Add-only.
pakka_commit(also called implicitly bypakka_close) writes the directory at byte-offsettail->offset + pak_entry_on_disk_extent(tail)and rewrites the header. - Rebuild. Any
pakka_deleteforces a rebuild via a scratch temp: surviving entries get copied (their on-disk extents, opaque to the rebuild — compressed entries pass through unchanged), a fresh directory is written, and the scratch is renamed over the original.
The 4 GiB append-limit check uses the physical write size (encoded
or STORED, whichever was committed), not entry->length. Set in
pakka_add_file / pakka_add_memory via physical_size.
6.3 Verify
pakka_verify walks every entry, runs the name-safety check, streams
the payload to confirm offset and on-disk extent point at readable
bytes, and flags portable-union collisions on extraction. With
PAKKA_VERIFY_DEEP, compressed entries are inflated through
pakka_dk_inflate to confirm the byte-codec stream is well-formed and
produces exactly file_length output bytes (the same exact-length
contract the on-disk format requires).
7. Cross-format notes
DK shares the PACK magic with Quake / Q2 PAK and the 56-byte
filename field. The differences are entirely in the directory layout
(72 vs 64 bytes) and the compression flag. pakka’s PAK-class geometry
table in src/common.c enumerates the three rows side by side:
signature name_field_len dir_entry_size has_compression
Quake / Q2 PAK "PACK" 56 64 no
SiN "SPAK" 120 128 no
Daikatana "PACK" 56 72 yes
Every PAK-class read/write site dispatches off the row, so adding a new variant means appending a row, not forking every code path.
8. Test coverage
test/dk_codec_test.c— direct codec exerciser. Decoder covers every opcode class, bounds violations, termination contracts. Encoder covers empty input, single literal, zero run, byte-RLE, alternating pattern (back-ref), high-entropy fallback, undersized output buffer, and a 4 KiB mixed-payload round-trip.test/dk_test.c— end-to-end CLI tests. STORED open / extract, compressed open / extract, create empty, add STORED (non-listed ext), add compressed (listed ext + redundant payload), add STORED fallback (listed ext + incompressible payload), mixed-extension layout check, delete-with-compressed-survivors round-trip, deep verify on both synthetic and pakka-built archives, ambiguity detection.test/c_api_test.c— public API round-trip:pakka_createfor DK +pakka_add_memoryof compressible bytes +pakka_close+pakka_open_exwith DK hint +pakka_read_entry_alloc+ memcmp.
Cross-validation against an external encoder is wired up at
test/fixtures/dk/: synthetic input bytes (inputs/*, generated by
inputs/generate.py) are packed into user.pak by dkpak;
test/dk_test.c consumes the result and byte-compares every
extracted payload against the committed source. The case hard-fails
when user.pak is missing so accidental deletion can’t silently
drop the coverage.