diff --git a/Computer Science MOC.md b/Computer Science MOC.md
index 6e0baae..4d7f56a 100644
--- a/Computer Science MOC.md	
+++ b/Computer Science MOC.md	
@@ -70,6 +70,15 @@ Fundamental CS concepts, data structures, algorithms, and system design.
 
 - [[Testing Strategies]] — Unit, integration, E2E, BDD, property-based
 
+### File Formats & Media
+
+- [[File Formats]] — Binary structure, magic bytes, headers, and trailers
+- [[Image Formats]] — JPEG, PNG, GIF, WebP, AVIF internals
+- [[File Metadata]] — EXIF, GPS, XMP, ID3 metadata systems
+- [[Audio and Video Formats]] — Codecs, containers, streaming
+- [[Archive and Compression Formats]] — ZIP, tar, gzip, zstd, brotli
+- [[Document Formats]] — PDF, DOCX, EPUB internals
+
 ### Reference
 
 - [[Technical Measurements]]
diff --git a/Computer Science/Archive and Compression Formats.md b/Computer Science/Archive and Compression Formats.md
new file mode 100644
index 0000000..48add0b
--- /dev/null
+++ b/Computer Science/Archive and Compression Formats.md	
@@ -0,0 +1,304 @@
+---
+title: Archive and Compression Formats
+aliases:
+  - Compression Formats
+  - Archive Formats
+  - ZIP Format
+  - Compression Algorithms
+tags:
+  - cs
+  - fundamentals
+  - file-formats
+type: concept
+status: complete
+difficulty: fundamentals
+created: "2026-02-19"
+---
+
+# Archive and Compression Formats
+
+How files are bundled together (archiving) and made smaller (compression) — from ZIP internals to modern algorithms like Zstandard and Brotli.
+
+## Archive vs Compression
+
+These are separate concepts, often combined:
+
+| Concept | What It Does | Examples |
+|---------|-------------|----------|
+| **Archive** | Bundles multiple files into one | TAR, CPIO, AR |
+| **Compression** | Reduces file size | gzip, bzip2, zstd, brotli, LZMA |
+| **Both** | Bundles + compresses | ZIP, 7z, RAR |
+
+TAR was designed for tape archives and handles **only** archiving. Compression is applied separately:
+
+```bash
+# Archive only (no compression)
+tar cf archive.tar files/
+
+# Archive + gzip
+tar czf archive.tar.gz files/
+
+# Archive + zstandard
+tar --zstd -cf archive.tar.zst files/
+
+# Archive + bzip2
+tar cjf archive.tar.bz2 files/
+
+# Archive + xz (LZMA2)
+tar cJf archive.tar.xz files/
+```
+
+---
+
+## Compression Algorithms
+
+### Comparison
+
+| Algorithm | Ratio | Compress Speed | Decompress Speed | Used In |
+|-----------|-------|----------------|------------------|---------|
+| DEFLATE | Good | Moderate | Fast | ZIP, gzip, PNG, HTTP |
+| LZ4 | Low | Very fast | Very fast | Filesystem compression, real-time |
+| Zstandard (zstd) | Excellent | Fast | Very fast | Kernel, packaging, databases |
+| Brotli | Excellent | Slow | Fast | HTTP (WOFF2, web assets) |
+| LZMA/LZMA2 | Best | Very slow | Moderate | 7z, xz |
+| bzip2 | Good | Slow | Slow | Legacy, some distro packages |
+| LZW | Moderate | Fast | Fast | GIF (legacy) |
+| Snappy | Low | Very fast | Very fast | Google internal, Hadoop |
+
+### How LZ77/DEFLATE Works
+
+Most general-purpose compression builds on LZ77, which replaces repeated sequences with back-references:
+
+```
+Input:  "ABCABCABCXYZ"
+                     ↓
+Step 1: Output literal "ABC"
+Step 2: See "ABC" repeats → output (distance=3, length=6)
+Step 3: Output literal "XYZ"
+
+Compressed: ABC <3,6> XYZ
+
+The decoder reads forward:
+  "ABC" → emit as-is
+  <3,6> → go back 3 chars, copy 6 chars (the copy overlaps its own output) → "ABCABC"
+  "XYZ" → emit as-is
+  Result: "ABCABCABCXYZ"
+```
+
+DEFLATE combines LZ77 with Huffman coding — after finding repeated patterns, it Huffman-encodes the literals and back-references for additional compression.
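+
+The decoding steps above can be sketched in a few lines of Python. This is an illustrative toy, not DEFLATE's real bitstream format: the token list and the `lz77_decode` helper are assumptions made for this example, with literals as strings and back-references as `(distance, length)` tuples.
+
+```python
+def lz77_decode(tokens):
+    """Expand literals (str) and (distance, length) back-references."""
+    out = []
+    for token in tokens:
+        if isinstance(token, str):
+            out.extend(token)  # literal run: emit as-is
+        else:
+            distance, length = token
+            start = len(out) - distance
+            # Copy one symbol at a time so the source range may overlap
+            # the output being written (length > distance is legal).
+            for i in range(length):
+                out.append(out[start + i])
+    return "".join(out)
+
+
+print(lz77_decode(["ABC", (3, 6), "XYZ"]))  # → ABCABCABCXYZ
+```
+
+Copying symbol by symbol is what lets `length` exceed `distance`: the reference `<3,6>` reads three symbols that the same copy has just written.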
+
+### Zstandard (zstd)
+
+Facebook/Meta's modern replacement for gzip. Pairs LZ77-style matching with two entropy coders, Huffman for literals and Finite State Entropy (FSE, a table-based ANS variant) for match sequences, and has a dictionary mode for compressing many small items.
+
+```bash
+# Compress (default level 3)
+zstd file.dat
+
+# Compress with level 19 (max practical)
+zstd -19 file.dat
+
+# Train a dictionary on similar files (e.g., JSON logs)
+zstd --train samples/* -o dictionary
+
+# Compress using dictionary
+zstd -D dictionary file.json
+```
+
+Key advantage: zstd decompression speed is nearly constant regardless of compression level. You can spend more time compressing (once) and decompress quickly (many times).
+
+### Brotli
+
+Google's algorithm optimized for web content. Includes a built-in dictionary of common web strings (HTML tags, CSS properties, JavaScript keywords).
+
+```bash
+# Compress for web serving (level 11 = max)
+brotli -q 11 styles.css
+
+# The server then labels the response with this HTTP header:
+#   Content-Encoding: br
+```
+
+Typical web asset savings over gzip: 15-25% smaller.
+
+---
+
+## ZIP
+
+The most widely used archive+compression format. Also the foundation for DOCX, XLSX, JAR, APK, EPUB, and many other formats.
+
+### ZIP File Structure
+
+ZIP is unusual — the authoritative file index is at the **end** of the file, not the beginning:
+
+```
+┌────────────────────────────────────────────┐
+│ Local File Header 1                         │
+│   50 4B 03 04 (PK..)  ← signature          │
+│   version, flags, compression method        │
+│   CRC-32, sizes, filename                   │
+│ [File Data 1 - compressed]                  │
+├────────────────────────────────────────────┤
+│ Local File Header 2                         │
+│ [File Data 2 - compressed]                  │
+├────────────────────────────────────────────┤
+│ ...more files...                            │
+├────────────────────────────────────────────┤
+│ Central Directory                           │  ← The actual index
+│   50 4B 01 02 (PK..)  ← entry signature    │
+│   Entry for File 1 (offset, size, name)     │
+│   Entry for File 2 (offset, size, name)     │
+│   ...                                       │
+├────────────────────────────────────────────┤
+│ End of Central Directory Record             │
+│   50 4B 05 06 (PK..)  ← EOCD signature     │
+│   Number of entries                         │
+│   Central directory offset                  │
+│   Comment                                   │
+└────────────────────────────────────────────┘
+```
+
+**Why the index is at the end:** ZIP was designed for appending files. You can add files to a ZIP without rewriting the entire archive — just append new local entries and write a new central directory.
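+
+A reader therefore starts at the tail of the file. The sketch below is a minimal illustration of that lookup, assuming a single-disk, non-ZIP64 archive; `read_eocd` is a hypothetical helper, and searching backwards with `rfind` is a simplification (a robust parser would also validate the comment length against the file size).
+
+```python
+import io
+import struct
+import zipfile
+
+
+def read_eocd(data: bytes) -> dict:
+    """Locate the End of Central Directory record and unpack its fixed fields."""
+    pos = data.rfind(b"PK\x05\x06")  # search from the end; a comment may follow
+    if pos < 0:
+        raise ValueError("no EOCD record found")
+    (_sig, _disk, _cd_disk, _entries_here, entries_total,
+     cd_size, cd_offset, comment_len) = struct.unpack_from("<4s4H2IH", data, pos)
+    return {"entries": entries_total, "cd_size": cd_size,
+            "cd_offset": cd_offset, "comment_len": comment_len}
+
+
+# Build a small archive in memory so the example is self-contained.
+buf = io.BytesIO()
+with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
+    zf.writestr("hello.txt", "hello " * 20)
+    zf.writestr("notes.txt", "second file")
+data = buf.getvalue()
+
+eocd = read_eocd(data)
+cd_start = eocd["cd_offset"]
+print(eocd["entries"])               # → 2
+print(data[cd_start:cd_start + 4])   # → b'PK\x01\x02'
+```
+
+The offset stored in the EOCD lands exactly on the first central directory entry signature, which is why readers can list an archive without scanning the local headers.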
+
+### ZIP Hex Walkthrough
+
+```
+Offset    Hex                                       Meaning
+00000000  50 4B 03 04                               Local file header signature
+00000004  14 00                                     Version needed: 2.0
+00000006  00 00                                     Flags: none
+00000008  08 00                                     Compression: DEFLATE
+0000000A  4A 7D                                     Mod time (MS-DOS format)
+0000000C  54 59                                     Mod date (MS-DOS format)
+0000000E  XX XX XX XX                               CRC-32
+00000012  XX XX XX XX                               Compressed size
+00000016  XX XX XX XX                               Uncompressed size
+0000001A  09 00                                     Filename length: 9
+0000001C  00 00                                     Extra field length: 0
+0000001E  68 65 6C 6C 6F 2E 74 78 74                "hello.txt" (not null-terminated)
+00000027  [compressed data...]                      DEFLATE'd file content
+```
+
+### ZIP Compression Methods
+
+| Value | Method | Notes |
+|-------|--------|-------|
+| 0 | Stored | No compression (files already compressed) |
+| 8 | DEFLATE | Standard, universal support |
+| 9 | DEFLATE64 | Larger window, rare |
+| 12 | bzip2 | Better ratio, less common |
+| 14 | LZMA | Algorithm from 7-Zip, uncommon in ZIP |
+| 93 | Zstandard | Modern, gaining support |
+| 95 | XZ (LZMA2) | Very high ratio |
+
+### ZIP-Based Formats
+
+| Format | Extension | Contents |
+|--------|-----------|----------|
+| Office Open XML | `.docx`, `.xlsx`, `.pptx` | XML + media files |
+| Java Archive | `.jar` | `.class` files + manifest |
+| Android Package | `.apk` | DEX + resources + manifest |
+| EPUB | `.epub` | XHTML + CSS + images |
+| OpenDocument | `.odt`, `.ods`, `.odp` | XML + media |
+| XPI (Firefox ext) | `.xpi` | Web extension files |
+| IPSW (iOS firmware) | `.ipsw` | Firmware images |
+
+---
+
+## TAR (Tape Archive)
+
+Unix archiving format from 1979. No compression — purely bundles files with metadata.
+
+### TAR Header Structure
+
+Each file is preceded by a 512-byte header block:
+
+```
+Offset  Size  Field
+0       100   Filename (null-terminated)
+100     8     File mode (octal ASCII)
+108     8     Owner UID (octal ASCII)
+116     8     Group GID (octal ASCII)
+124     12    File size (octal ASCII)
+136     12    Modification time (Unix epoch, octal)
+148     8     Header checksum
+156     1     Type flag ('0'=file, '5'=directory, '2'=symlink)
+157     100   Link target name
+257     6     "ustar" magic
+263     2     Version "00"
+265     32    Owner username
+297     32    Group name
+329     8     Device major
+337     8     Device minor
+345     155   Filename prefix (for paths > 100 chars)
+500     12    Padding to 512 bytes
+```
+
+File data follows immediately, padded to a 512-byte boundary. The archive ends with two consecutive 512-byte blocks of zeros.
+
+**Note:** Numeric fields in TAR headers are stored as ASCII octal strings and names as plain text, which makes the headers largely human-readable in a hex editor.
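+
+The field table above maps directly onto byte slices. The following sketch decodes a few fields and checks the header checksum; `parse_tar_header` is a hypothetical helper written for this note, and it assumes a plain ustar header (no GNU or PAX extensions).
+
+```python
+import io
+import tarfile
+
+
+def parse_tar_header(block: bytes) -> dict:
+    """Decode selected fields of one 512-byte ustar header block."""
+    def text(offset, size):
+        return block[offset:offset + size].split(b"\0", 1)[0].decode("ascii")
+
+    def octal(offset, size):
+        raw = block[offset:offset + size].strip(b" \0")
+        return int(raw, 8) if raw else 0
+
+    # The checksum is the byte sum of the header, with the checksum
+    # field itself (offset 148, 8 bytes) counted as ASCII spaces.
+    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:512])
+    return {
+        "name": text(0, 100),
+        "mode": octal(100, 8),
+        "size": octal(124, 12),
+        "typeflag": block[156:157].decode("ascii"),
+        "magic": text(257, 6),
+        "checksum_ok": computed == octal(148, 8),
+    }
+
+
+# Build a one-file archive in memory with the standard library.
+payload = b"hello, tar\n"
+buf = io.BytesIO()
+with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
+    info = tarfile.TarInfo("hello.txt")
+    info.size = len(payload)
+    tf.addfile(info, io.BytesIO(payload))
+archive = buf.getvalue()
+
+header = parse_tar_header(archive[:512])
+print(header)
+```
+
+The file data starts at byte 512 and is padded to the next 512-byte boundary, so the second header (or the zero-filled end marker) begins at byte 1024 in this archive.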
+
+---
+
+## Gzip
+
+The standard compression wrapper on Unix. Compresses a single stream using DEFLATE.
+
+### Gzip Header
+
+```
+1F 8B                  ← Magic number
+08                     ← Compression method: DEFLATE
+XX                     ← Flags (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT)
+XX XX XX XX            ← Modification time (Unix epoch, LE)
+XX                     ← Extra flags (compression level hint)
+XX                     ← OS (0=FAT, 3=Unix, 7=Macintosh, 11=NTFS)
+[optional: original filename, null-terminated]
+[optional: comment, null-terminated]
+[DEFLATE compressed data]
+XX XX XX XX            ← CRC-32 of original data
+XX XX XX XX            ← Original size mod 2^32
+```
+
+The trailing CRC-32 and size allow integrity verification after decompression.
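+
+As a rough illustration of that check, the sketch below compresses a buffer with Python's `gzip` module and compares the last eight bytes against values computed from the original data. `check_gzip_trailer` is a hypothetical helper, and the example assumes a single-member gzip stream.
+
+```python
+import gzip
+import struct
+import zlib
+
+
+def check_gzip_trailer(blob: bytes, original: bytes) -> bool:
+    """Compare the 8-byte gzip trailer with values derived from the data."""
+    crc, isize = struct.unpack("<II", blob[-8:])  # both little-endian
+    return crc == zlib.crc32(original) and isize == len(original) % 2**32
+
+
+original = b"The quick brown fox jumps over the lazy dog\n" * 100
+blob = gzip.compress(original)
+
+print(blob[:3].hex(" "))                   # → 1f 8b 08
+print(check_gzip_trailer(blob, original))  # → True
+```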
+
+---
+
+## 7z
+
+7-Zip's native format. Supports multiple compression methods and solid compression (compressing multiple files as a single stream for better ratio).
+
+### 7z Signature
+
+```
+37 7A BC AF 27 1C      ← Magic bytes: "7z" + 4 signature bytes
+00 04                  ← Format version
+[header with offsets to compressed streams and metadata]
+```
+
+7z achieves the best compression ratios among common formats by using LZMA2 with large dictionaries and solid compression, at the cost of slower compression speed and higher memory usage.
+
+---
+
+## Choosing a Format
+
+| Scenario | Recommended | Why |
+|----------|-------------|-----|
+| General file sharing | ZIP | Universal support, every OS handles it natively |
+| Unix/Linux packages | `.tar.gz` or `.tar.zst` | Standard convention, preserves permissions/ownership |
+| Maximum compression | `.tar.xz` or `.7z` | Best ratios, worth the slower compression |
+| Fast compression | `.tar.zst` or `.tar.lz4` | Near-instant compress/decompress |
+| Web assets | Brotli (`.br`) | Best ratio for HTTP, built-in web dictionary |
+| Incremental backups | TAR (append mode) | Add files without rewriting |
+| Cross-platform distribution | ZIP | Zero dependency on any platform |
+| Container/Docker layers | gzip or zstd | OCI standard, broad registry support |
+
+---
+
+## Related
+
+- [[File Formats]] — Parent overview of file format concepts
+- [[File Metadata]] — Metadata preserved in archives (timestamps, permissions)
+- [[Document Formats]] — PDF, DOCX, EPUB (ZIP-based formats)
+- [[Build Systems]] — Build tools that produce archives
+- [[Deployment]] — Container images and artifact packaging
diff --git a/Computer Science/Audio and Video Formats.md b/Computer Science/Audio and Video Formats.md
new file mode 100644
index 0000000..545e5b4
--- /dev/null
+++ b/Computer Science/Audio and Video Formats.md	
@@ -0,0 +1,294 @@
+---
+title: Audio and Video Formats
+aliases:
+  - Video Formats
+  - Audio Formats
+  - Media Formats
+  - Codecs
+  - Video Codecs
+  - Audio Codecs
+tags:
+  - cs
+  - fundamentals
+  - file-formats
+  - media
+type: concept
+status: complete
+difficulty: fundamentals
+created: "2026-02-19"
+---
+
+# Audio and Video Formats
+
+How audio and video are encoded, compressed, and packaged — the distinction between codecs (compression) and containers (packaging).
+
+## Codec vs Container
+
+The most important concept in media formats: **codecs** and **containers** are separate things.
+
+| Concept | What It Does | Examples |
+|---------|-------------|----------|
+| **Codec** | Compresses/decompresses audio or video data | H.264, H.265, AV1, VP9, AAC, Opus |
+| **Container** | Packages codec streams + metadata into a file | MP4, MKV, WebM, AVI, MOV, OGG |
+
+A container holds one or more streams (video, audio, subtitles, metadata) that can each use a different codec:
+
+```
+┌─ MP4 Container ──────────────────────────────┐
+│                                               │
+│  Stream 0: Video  → H.264 codec, 1080p 24fps │
+│  Stream 1: Audio  → AAC codec, 48kHz stereo   │
+│  Stream 2: Audio  → AAC codec, 48kHz 5.1      │
+│  Stream 3: Subtitle → tx3g timed text         │
+│  Metadata: title, duration, GPS, chapters     │
+│                                               │
+└───────────────────────────────────────────────┘
+```
+
+---
+
+## Video Codecs
+
+### Codec Comparison
+
+| Codec | Standard | Compression | Licensing | Browser Support | Typical Use |
+|-------|----------|-------------|-----------|-----------------|-------------|
+| H.264 (AVC) | MPEG-4 Part 10 | Good | Patented (MPEG LA) | Universal | Web, streaming, Blu-ray |
+| H.265 (HEVC) | MPEG-H Part 2 | ~50% better than H.264 | Patented (expensive) | Safari, some others | 4K broadcast, Apple ecosystem |
+| AV1 | Alliance for Open Media (AOMedia) | ~30% better than H.265 | Royalty-free | ~95% browsers | YouTube, Netflix, next-gen streaming |
+| VP9 | Google | ~similar to H.265 | Royalty-free | ~97% browsers | YouTube (legacy) |
+| VP8 | Google | ~similar to H.264 | Royalty-free | Wide | WebRTC (legacy) |
+| ProRes | Apple | Visually lossless | Proprietary | N/A | Professional editing |
+
+### How Video Compression Works
+
+Video codecs exploit three types of redundancy:
+
+**Spatial** — Within a single frame, nearby pixels are similar (same as image compression).
+
+**Temporal** — Consecutive frames are mostly identical. Instead of storing every pixel for every frame, store the differences (motion vectors + residuals).
+
+**Perceptual** — Human vision is less sensitive to certain details. Quantize aggressively in areas the eye won't notice.
+
+### Frame Types
+
+| Type | Name | Description | Size |
+|------|------|-------------|------|
+| I-frame | Intra | Complete image (like a JPEG). Seek point. | Largest |
+| P-frame | Predicted | References previous frames. Stores only differences. | Medium |
+| B-frame | Bidirectional | References both past and future frames. | Smallest |
+
+```
+I ← P ← P ← B ← B ← P ← P ← I ← P ← P ...
+▲                              ▲
+Keyframe (seekable)            Keyframe (seekable)
+└──────── GOP (Group of Pictures) ────────┘
+```
+
+**GOP (Group of Pictures)** — The sequence between keyframes. Longer GOPs = better compression but slower seeking. Streaming services typically use 2-4 second GOPs.
+
+---
+
+## Audio Codecs
+
+### Codec Comparison
+
+| Codec | Type | Bitrate Range | Quality | Typical Use |
+|-------|------|---------------|---------|-------------|
+| MP3 (MPEG-1 Layer 3) | Lossy | 128-320 kbps | Good | Legacy music distribution |
+| AAC (Advanced Audio) | Lossy | 96-256 kbps | Better than MP3 | Streaming, Apple, YouTube |
+| Opus | Lossy | 6-510 kbps | Best lossy codec | VoIP, WebRTC, streaming |
+| Vorbis | Lossy | 64-500 kbps | Good | OGG containers, games |
+| FLAC | Lossless | 800-1400 kbps | Perfect | Archival, audiophile |
+| ALAC | Lossless | 800-1400 kbps | Perfect | Apple ecosystem |
+| WAV/PCM | Uncompressed | 1411 kbps (CD) | Perfect | Recording, editing |
+| AC-3 (Dolby Digital) | Lossy | 192-640 kbps | Good | DVD, Blu-ray, streaming surround |
+
+### How Audio Compression Works (MP3)
+
+```mermaid
+graph LR
+    A[PCM Audio] --> B[Subband Filter]
+    B --> M[MDCT] --> C[Psychoacoustic Model]
+    C --> D[Quantization]
+    D --> E[Huffman Encoding]
+    E --> F[MP3 Frames]
+
+    style A fill:#E8E8E8
+    style F fill:#90EE90
+```
+
+1. **Subband filtering** — Split audio into 32 frequency subbands using a polyphase filter bank.
+2. **MDCT** — Modified Discrete Cosine Transform on each subband for finer frequency resolution.
+3. **Psychoacoustic model** — Determine which frequencies are inaudible due to masking:
+   - **Frequency masking:** A loud tone makes nearby quieter tones inaudible
+   - **Temporal masking:** A loud sound masks softer sounds just before and after it
+4. **Quantization** — Allocate bits based on the psychoacoustic model. Inaudible frequencies get fewer (or zero) bits.
+5. **Huffman encoding** — Entropy-code the quantized values.
+
+### MP3 Frame Structure
+
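+MP3 is a **frame-based** format. Each frame begins with its own sync word and header, so a decoder can resynchronize at any frame boundary (enabling seeking and streaming). Frames are not fully independent, though, because Layer III's bit reservoir lets a frame's main data begin in earlier frames. The header layout: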
+
+```
+FF FB                    ← Sync word (11 bits all 1s) + header bits
+  Bits 12-13: MPEG version (11 = MPEG1)
+  Bits 14-15: Layer (01 = Layer III)
+  Bit  16:    CRC protection
+  Bits 17-20: Bitrate index
+  Bits 21-22: Sample rate (00 = 44100 Hz)
+  Bit  23:    Padding
+  Bit  24:    Private
+  Bits 25-26: Channel mode (00 = stereo)
+[Side information]       ← Huffman table selections, scalefactors
+[Main data]              ← Huffman-coded frequency data
+```
+
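+Every MPEG-1 Layer III frame contains 1152 audio samples per channel, which is ~26 ms of audio at 44.1 kHz. At 128 kbps that works out to 417 or 418 bytes per frame, depending on the padding bit.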
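+
+The header bit layout and the frame arithmetic can be sketched as follows. This is a minimal example that assumes MPEG-1 Layer III only; `parse_frame_header` and the table names are made up for this note, and free-format streams are not handled. The frame length uses the usual Layer III formula `144 * bitrate / sample_rate + padding`.
+
+```python
+# MPEG-1 Layer III bitrates in kbps; index 0 ("free") and 15 are rejected below.
+BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
+SAMPLE_RATES_HZ = [44100, 48000, 32000]  # index 3 is reserved
+SAMPLES_PER_FRAME = 1152
+
+
+def parse_frame_header(header: bytes) -> dict:
+    """Decode the 32-bit frame header of an MPEG-1 Layer III frame."""
+    word = int.from_bytes(header[:4], "big")
+    if word >> 21 != 0x7FF:
+        raise ValueError("sync word not found")
+    if (word >> 19) & 0b11 != 0b11 or (word >> 17) & 0b11 != 0b01:
+        raise ValueError("not an MPEG-1 Layer III frame")
+    bitrate_index = (word >> 12) & 0xF
+    rate_index = (word >> 10) & 0b11
+    if bitrate_index in (0, 15) or rate_index == 3:
+        raise ValueError("unsupported bitrate or sample rate index")
+    bitrate = BITRATES_KBPS[bitrate_index] * 1000
+    sample_rate = SAMPLE_RATES_HZ[rate_index]
+    padding = (word >> 9) & 1
+    return {
+        "bitrate_kbps": bitrate // 1000,
+        "sample_rate": sample_rate,
+        "frame_bytes": 144 * bitrate // sample_rate + padding,
+        "duration_ms": 1000 * SAMPLES_PER_FRAME / sample_rate,
+    }
+
+
+frame = parse_frame_header(bytes.fromhex("fffb9000"))
+print(frame["bitrate_kbps"], frame["sample_rate"], frame["frame_bytes"])  # → 128 44100 417
+print(round(frame["duration_ms"], 2))                                     # → 26.12
+```
+
+Because 144 × 128000 / 44100 is not an integer, encoders set the padding bit on some frames (418 bytes instead of 417) to keep the average bitrate exact.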
+
+---
+
+## Container Formats
+
+### Container Comparison
+
+| Container | Extension | Video Codecs | Audio Codecs | Features | Common Use |
+|-----------|-----------|-------------|-------------|----------|------------|
+| MP4 | `.mp4`, `.m4a`, `.m4v` | H.264, H.265, AV1 | AAC, AC-3, Opus | Chapters, subtitles, metadata | Web, streaming |
+| MKV | `.mkv`, `.mka` | Anything | Anything | Most flexible, multiple tracks | Desktop media |
+| WebM | `.webm` | VP8, VP9, AV1 | Vorbis, Opus | Web-optimized subset of MKV | Web video |
+| AVI | `.avi` | Legacy codecs | PCM, MP3 | Simple but limited | Legacy |
+| MOV | `.mov` | H.264, H.265, ProRes | AAC, ALAC | QuickTime format that MP4 is based on | Apple ecosystem, editing |
+| OGG | `.ogg`, `.ogv` | Theora | Vorbis, Opus | Open standard | Open source |
+| FLAC | `.flac` | N/A | FLAC only | Lossless audio | Audiophile, archival |
+| WAV | `.wav` | N/A | PCM (usually) | RIFF-based, uncompressed | Recording, editing |
+
+### MP4 Box Structure
+
+MP4 files are organized as nested "boxes" (atoms), each with a type and size:
+
+```
+Offset    Hex                          Meaning
+00000000  00 00 00 20                  Box size: 32 bytes
+00000004  66 74 79 70                  Box type: "ftyp" (file type)
+00000008  69 73 6F 6D                  Major brand: "isom"
+0000000C  00 00 02 00                  Minor version: 512
+00000010  69 73 6F 6D 69 73 6F 32     Compatible: "isomiso2"
+00000018  61 76 63 31 6D 70 34 31     Compatible: "avc1mp41"
+```
+
+Key boxes:
+
+```
+ftyp        ← File type / brand declaration
+moov        ← Movie metadata (MUST exist)
+├── mvhd    ← Movie header (duration, timescale)
+├── trak    ← Track (one per stream)
+│   ├── tkhd  ← Track header (dimensions, duration)
+│   └── mdia  ← Media data
+│       ├── mdhd  ← Media header (timescale, language)
+│       ├── hdlr  ← Handler (video/audio/subtitle)
+│       └── minf  ← Media information
+│           └── stbl  ← Sample table (codec config, offsets, sizes)
+└── udta    ← User data / metadata
+mdat        ← Actual compressed media data (bulk of the file)
+```
+
+**Fast-start (web streaming):** The `moov` box must appear **before** `mdat` for progressive download to work. Videos encoded without this require the entire file to download before playback starts. Fix with:
+
+```bash
+ffmpeg -i input.mp4 -c copy -movflags +faststart output.mp4
+```
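+
+The box layout and the fast-start rule can both be checked by walking the top-level boxes. The sketch below is an illustration under simplifying assumptions: `top_level_boxes`, `is_fast_start`, and `box` are hypothetical helpers, and the test files are synthetic byte strings with empty `moov`/`mdat` payloads rather than playable media.
+
+```python
+import struct
+
+
+def top_level_boxes(data: bytes):
+    """Yield (type, offset, size) for each top-level box."""
+    pos = 0
+    while pos + 8 <= len(data):
+        size, box_type = struct.unpack_from(">I4s", data, pos)
+        header_len = 8
+        if size == 1:    # a 64-bit size follows the type field
+            size = struct.unpack_from(">Q", data, pos + 8)[0]
+            header_len = 16
+        elif size == 0:  # box extends to the end of the file
+            size = len(data) - pos
+        if size < header_len:
+            raise ValueError("corrupt box size")
+        yield box_type.decode("latin-1"), pos, size
+        pos += size
+
+
+def is_fast_start(data: bytes) -> bool:
+    """True when the moov box precedes the mdat box."""
+    order = [box_type for box_type, _, _ in top_level_boxes(data)]
+    return ("moov" in order and "mdat" in order
+            and order.index("moov") < order.index("mdat"))
+
+
+def box(box_type: bytes, payload: bytes = b"") -> bytes:
+    """Build a box with a 32-bit size header (test helper)."""
+    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
+
+
+# Same ftyp contents as the hex walkthrough: 32 bytes in total.
+ftyp = box(b"ftyp", b"isom" + struct.pack(">I", 512) + b"isomiso2avc1mp41")
+web_ready = ftyp + box(b"moov") + box(b"mdat", b"\x00" * 16)
+needs_fix = ftyp + box(b"mdat", b"\x00" * 16) + box(b"moov")
+
+print(list(top_level_boxes(web_ready)))
+print(is_fast_start(web_ready), is_fast_start(needs_fix))  # → True False
+```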
+
+### RIFF / WAV Structure
+
+WAV files use the RIFF container — a simple chunk-based format:
+
+```
+52 49 46 46  [file size-8]   ← "RIFF" + remaining size
+57 41 56 45                  ← "WAVE" format identifier
+
+66 6D 74 20  [chunk size]    ← "fmt " chunk (audio format)
+  01 00                      ← Format: 1 (PCM)
+  02 00                      ← Channels: 2 (stereo)
+  44 AC 00 00                ← Sample rate: 44100
+  10 B1 02 00                ← Byte rate: 176400
+  04 00                      ← Block align: 4
+  10 00                      ← Bits per sample: 16
+
+64 61 74 61  [chunk size]    ← "data" chunk
+  [PCM audio samples...]     ← Raw audio data
+```
+
+At 16-bit stereo 44.1 kHz (CD quality), uncompressed audio takes 176,400 bytes per second, or about 10.6 MB per minute.
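+
+The header fields and the size arithmetic can be reproduced with Python's standard `wave` module. This is a small sketch that writes one second of silence into memory and unpacks the canonical 44-byte header; it assumes the simple layout shown above, with `fmt ` immediately followed by `data`, which not every WAV file uses.
+
+```python
+import io
+import struct
+import wave
+
+# Write one second of 16-bit stereo silence at 44.1 kHz.
+buf = io.BytesIO()
+with wave.open(buf, "wb") as w:
+    w.setnchannels(2)
+    w.setsampwidth(2)       # bytes per sample
+    w.setframerate(44100)
+    w.writeframes(bytes(2 * 2 * 44100))
+data = buf.getvalue()
+
+riff_id, riff_size, wave_id = struct.unpack_from("<4sI4s", data, 0)
+fmt_id, fmt_size = struct.unpack_from("<4sI", data, 12)
+(audio_format, channels, sample_rate,
+ byte_rate, block_align, bits) = struct.unpack_from("<HHIIHH", data, 20)
+
+print(riff_id, wave_id, fmt_id)         # → b'RIFF' b'WAVE' b'fmt '
+print(sample_rate, byte_rate, block_align, bits)  # → 44100 176400 4 16
+print(byte_rate * 60)                   # → 10584000
+```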
+
+---
+
+## Streaming Formats
+
+Streaming video uses segmented delivery rather than single-file download:
+
+| Protocol | Format | Segments | Use Case |
+|----------|--------|----------|----------|
+| HLS | `.m3u8` playlist + `.ts` or `.mp4` segments | 2-10 second chunks | Apple, Safari, most CDNs |
+| DASH | `.mpd` manifest + `.mp4` segments | 2-10 second chunks | Cross-platform, YouTube |
+| WebRTC | Real-time packets | Per-frame | Video calls, live P2P |
+
+### Adaptive Bitrate
+
+Both HLS and DASH support **adaptive bitrate streaming** — multiple quality levels encoded, client switches based on bandwidth:
+
+```
+Master Playlist (HLS):
+  → 1080p @ 5 Mbps  (strong connection)
+  → 720p  @ 2.5 Mbps
+  → 480p  @ 1 Mbps
+  → 360p  @ 500 kbps (weak connection)
+
+Client monitors download speed and switches quality
+between segments to minimize buffering.
+```
+
+---
+
+## Practical Reference
+
+### Common Web Recommendations
+
+| Content | Format | Codec | Why |
+|---------|--------|-------|-----|
+| Video (broad support) | MP4 | H.264 + AAC | Universal browser support |
+| Video (modern) | MP4 or WebM | AV1 + Opus | Best compression, royalty-free |
+| Audio (music) | MP4 | AAC | Small, good quality |
+| Audio (speech) | WebM or OGG | Opus | Best at low bitrates |
+| Audio (lossless) | FLAC | FLAC | Open, widely supported |
+| Live/real-time | WebRTC | VP8/VP9/AV1 + Opus | Low latency |
+
+### Useful Commands
+
+```bash
+# Inspect media file
+ffprobe -v quiet -show_format -show_streams input.mp4
+
+# Convert video codec
+ffmpeg -i input.mov -c:v libx264 -c:a aac output.mp4
+
+# Extract audio from video
+ffmpeg -i video.mp4 -vn -c:a copy audio.m4a
+
+# Re-mux without re-encoding (change container)
+ffmpeg -i input.mkv -c copy output.mp4
+
+# Add fast-start for web streaming
+ffmpeg -i input.mp4 -c copy -movflags +faststart output.mp4
+```
+
+---
+
+## Related
+
+- [[File Formats]] — Parent overview of file format concepts
+- [[File Metadata]] — EXIF, ID3, XMP metadata systems
+- [[Image Formats]] — Still image format internals
+- [[Archive and Compression Formats]] — Compression algorithms shared with media
diff --git a/Computer Science/Document Formats.md b/Computer Science/Document Formats.md
new file mode 100644
index 0000000..ee0f237
--- /dev/null
+++ b/Computer Science/Document Formats.md	
@@ -0,0 +1,356 @@
+---
+title: Document Formats
+aliases:
+  - PDF Format
+  - Office Formats
+  - EPUB Format
+  - Document File Formats
+tags:
+  - cs
+  - fundamentals
+  - file-formats
+type: concept
+status: complete
+difficulty: fundamentals
+created: "2026-02-19"
+---
+
+# Document Formats
+
+How documents are stored digitally — from PDF's page description model to Office XML's ZIP-based structure and EPUB's web-standards approach.
+
+## Overview
+
+| Format | Structure | Editable | Layout | Use Case |
+|--------|-----------|----------|--------|----------|
+| PDF | Binary object graph | Difficult | Fixed (pixel-perfect) | Print, contracts, archival |
+| DOCX | ZIP of XML | Yes (Word) | Reflowable | Business documents |
+| ODT | ZIP of XML | Yes (LibreOffice) | Reflowable | Open-source documents |
+| EPUB | ZIP of XHTML + CSS | Yes | Reflowable | E-books |
+| RTF | Text markup | Yes | Basic | Legacy interchange |
+| Plain text | Raw bytes | Yes | None | Code, logs, notes |
+| LaTeX | Text markup | Yes (source) | Fixed (compiled) | Academic papers, math |
+
+---
+
+## PDF (Portable Document Format)
+
+Created by Adobe in 1993, now ISO standard 32000. Designed for **fixed-layout** documents that look identical everywhere.
+
+### How PDF Works
+
+A PDF is not a sequence of pages like you'd expect. It's an **object graph** — a collection of numbered objects that reference each other:
+
+```
+┌─────────────────────────────────────────────┐
+│ Header:  %PDF-1.7                            │
+├─────────────────────────────────────────────┤
+│ Body: Numbered objects                       │
+│                                              │
+│  1 0 obj  (Catalog - root of document)       │
+│    → points to Pages object                  │
+│  2 0 obj  (Pages - page tree)                │
+│    → points to individual Page objects        │
+│  3 0 obj  (Page 1)                           │
+│    → points to Content stream + Resources     │
+│  4 0 obj  (Content stream - drawing commands) │
+│    → moveto, lineto, show text, draw image    │
+│  5 0 obj  (Font - embedded or referenced)     │
+│    → TrueType/Type1/CID font data            │
+│  6 0 obj  (Image - embedded raster)           │
+│    → JPEG/CCITT/Flate compressed pixels       │
+│                                              │
+├─────────────────────────────────────────────┤
+│ Cross-Reference Table (xref)                 │
+│   Maps object numbers → byte offsets         │
+├─────────────────────────────────────────────┤
+│ Trailer                                      │
+│   Points to: Catalog, Info dict, xref offset │
+│   startxref [byte offset to xref]            │
+│   %%EOF                                      │
+└─────────────────────────────────────────────┘
+```
+
+### PDF Hex Walkthrough
+
+```
+Offset    Content                          Meaning
+00000000  25 50 44 46 2D 31 2E 37         "%PDF-1.7" header
+00000008  0A 25 E2 E3 CF D3 0A            Binary comment (signals binary content)
+
+          ...objects...
+
+          1 0 obj                          Object 1, generation 0
+          << /Type /Catalog               Catalog dictionary
+             /Pages 2 0 R >>              Reference to object 2
+          endobj
+
+          4 0 obj                          Content stream
+          << /Length 44 >>
+          stream
+          BT                               Begin Text
+          /F1 12 Tf                        Font F1, 12pt
+          100 700 Td                       Move to (100, 700)
+          (Hello, World!) Tj               Draw text
+          ET                               End Text
+          endstream
+          endobj
+```
+
+### PDF Content Streams
+
+Page content uses a PostScript-like drawing language:
+
+| Operator | Meaning | Example |
+|----------|---------|---------|
+| `BT` / `ET` | Begin/end text block | `BT ... ET` |
+| `Tf` | Set font and size | `/F1 12 Tf` |
+| `Td` | Move text position | `100 700 Td` |
+| `Tj` | Show text string | `(Hello) Tj` |
+| `m` | Move to point | `100 200 m` |
+| `l` | Line to point | `300 400 l` |
+| `S` | Stroke path | `S` |
+| `f` | Fill path | `f` |
+| `re` | Rectangle | `50 50 200 100 re` |
+| `Do` | Draw XObject (image) | `/Img1 Do` |
+| `cm` | Transform matrix | `1 0 0 1 100 200 cm` |
+
+### PDF Incremental Updates
+
+PDFs can be updated **without rewriting** — new objects and a new xref table are appended to the end:
+
+```
+[Original PDF content]
+[Original xref]
+[Original trailer + %%EOF]
+
+[New/modified objects]        ← Appended
+[New xref (references new objects)]
+[New trailer + %%EOF]
+```
+
+This is how form filling and digital signatures work — the original content is preserved and new data is layered on. It also means "deleted" content may still exist in the file.
+
+### PDF Versions and Features
+
+| Feature | PDF Version |
+|---------|-------------|
+| Basic text and images | 1.0 (1993) |
+| Interactive forms | 1.2 |
+| JavaScript | 1.3 |
+| Transparency | 1.4 |
+| Embedded multimedia | 1.5 |
+| XFA forms | 1.5 |
+| AES encryption | 1.6 |
+| 3D content | 1.6 |
+| PDF/A (archival) | Based on 1.4-1.7 |
+| PDF 2.0 (ISO 32000-2) | 2.0 (2017) |
+
+---
+
+## Office Open XML (DOCX, XLSX, PPTX)
+
+Microsoft Office's format since 2007. A **ZIP archive** containing XML files, media, and relationships.
+
+### DOCX Structure
+
+```bash
+$ unzip -l document.docx
+  [Content_Types].xml          ← MIME type registry
+  _rels/.rels                  ← Root relationships
+  word/document.xml            ← Main document content
+  word/styles.xml              ← Style definitions
+  word/settings.xml            ← Document settings
+  word/fontTable.xml           ← Font declarations
+  word/theme/theme1.xml        ← Theme (colors, fonts)
+  word/media/image1.png        ← Embedded images
+  word/_rels/document.xml.rels ← Document relationships
+  docProps/core.xml            ← Dublin Core metadata
+  docProps/app.xml             ← Application metadata
+```
+
+### Document.xml Content
+
+```xml
+<w:document xmlns:w="http://schemas.openxmlformats.org/.../wordprocessingml">
+  <w:body>
+    <w:p>                              <!-- Paragraph -->
+      <w:pPr>                          <!-- Paragraph properties -->
+        <w:pStyle w:val="Heading1"/>
+      </w:pPr>
+      <w:r>                            <!-- Run (text span) -->
+        <w:rPr>                        <!-- Run properties -->
+          <w:b/>                       <!-- Bold -->
+        </w:rPr>
+        <w:t>Introduction</w:t>        <!-- Text content -->
+      </w:r>
+    </w:p>
+  </w:body>
+</w:document>
+```
+
+### XLSX Structure
+
+Spreadsheets split data across multiple XML files:
+
+```
+xl/worksheets/sheet1.xml     ← Cell data (row/col/value)
+xl/sharedStrings.xml         ← String table (cells reference by index)
+xl/styles.xml                ← Number formats, fonts, fills
+xl/workbook.xml              ← Sheet names, defined names
+```
+
+Cell values reference the shared strings table by index, so repeated strings are stored once:
+
+```xml
+<!-- xl/sharedStrings.xml -->
+<sst count="2" uniqueCount="2">
+  <si><t>Name</t></si>       <!-- index 0 -->
+  <si><t>Revenue</t></si>    <!-- index 1 -->
+</sst>
+
+<!-- xl/worksheets/sheet1.xml -->
+<row r="1">
+  <c r="A1" t="s"><v>0</v></c>   <!-- "Name" (string index 0) -->
+  <c r="B1" t="s"><v>1</v></c>   <!-- "Revenue" (string index 1) -->
+  <c r="C1"><v>50000</v></c>      <!-- Numeric value (no type = number) -->
+</row>
+```
+
+---
+
+## OpenDocument Format (ODF)
+
+ISO standard, used by LibreOffice and other open-source suites. Also ZIP-based with XML content, but uses different schemas.
+
+| Office XML | ODF Equivalent |
+|-----------|----------------|
+| `.docx` | `.odt` (text) |
+| `.xlsx` | `.ods` (spreadsheet) |
+| `.pptx` | `.odp` (presentation) |
+
+Structure is similar: ZIP containing `content.xml`, `styles.xml`, `meta.xml`, `META-INF/manifest.xml`.
+
+---
+
+## EPUB
+
+The standard e-book format. Essentially a **website in a ZIP file** — XHTML pages + CSS + images + metadata.
+
+### EPUB Structure
+
+```bash
+$ unzip -l book.epub
+  mimetype                     ← Must be first, uncompressed: "application/epub+zip"
+  META-INF/container.xml       ← Points to the OPF file
+  OEBPS/content.opf            ← Package document (manifest + spine)
+  OEBPS/toc.ncx                ← Table of contents (EPUB 2)
+  OEBPS/nav.xhtml              ← Navigation document (EPUB 3)
+  OEBPS/chapter1.xhtml         ← Content pages
+  OEBPS/chapter2.xhtml
+  OEBPS/styles/main.css        ← Stylesheets
+  OEBPS/images/cover.jpg       ← Images
+```
+
+### Key EPUB Files
+
+**content.opf** — The manifest and reading order:
+
+```xml
+<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
+  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
+    <dc:title>The Great Novel</dc:title>
+    <dc:creator>Author Name</dc:creator>
+    <dc:language>en</dc:language>
+    <dc:identifier id="uid">isbn:978-0-123456-78-9</dc:identifier>
+  </metadata>
+
+  <manifest>
+    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
+    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
+    <item id="css" href="styles/main.css" media-type="text/css"/>
+    <item id="cover" href="images/cover.jpg" media-type="image/jpeg"/>
+  </manifest>
+
+  <spine>
+    <itemref idref="ch1"/>
+    <itemref idref="ch2"/>
+  </spine>
+</package>
+```
+
+**mimetype file** — The `mimetype` entry MUST be the first file in the ZIP, stored without compression and without an extra field, so that its content starts at byte offset 38 (30-byte local file header + 8-byte filename). This allows identification without full ZIP parsing:
+
+```
+Offset    Content
+00000000  50 4B 03 04                   ZIP local file header
+...
+0000001E  6D 69 6D 65 74 79 70 65       "mimetype"
+00000026  61 70 70 6C 69 63 61 74       "application/epub+zip"
+          69 6F 6E 2F 65 70 75 62
+          2B 7A 69 70
+```
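+
+The fixed offsets make a signature check possible with plain byte slicing. A minimal Python sketch, assuming a small EPUB-like archive built in memory with the standard `zipfile` module (the helper name `looks_like_epub` is illustrative, not part of any EPUB library):
+
+```python
+import io
+import zipfile
+
+def looks_like_epub(data: bytes) -> bool:
+    """Check the fixed-offset EPUB signature without parsing the ZIP."""
+    return (
+        data[0:4] == b"PK\x03\x04"                  # ZIP local file header
+        and data[30:38] == b"mimetype"              # filename follows the 30-byte header
+        and data[38:58] == b"application/epub+zip"  # stored (uncompressed) content
+    )
+
+# Build a tiny EPUB-like archive: mimetype first, stored without compression.
+buf = io.BytesIO()
+with zipfile.ZipFile(buf, "w") as zf:
+    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
+    zf.writestr("META-INF/container.xml", "<container/>")
+
+is_epub = looks_like_epub(buf.getvalue())
+print(is_epub)
+```
+
+This only confirms the signature; a real validator would still need to parse `META-INF/container.xml` and the package document.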
+
+### EPUB 2 vs EPUB 3
+
+| Feature | EPUB 2 | EPUB 3 |
+|---------|--------|--------|
+| Content | XHTML 1.1 | XHTML5 (XML syntax of HTML5) |
+| Styling | CSS 2.1 | CSS3 |
+| Navigation | NCX (XML) | nav.xhtml (HTML5) |
+| Scripting | No | JavaScript (limited) |
+| Audio/Video | No | HTML5 `<audio>` / `<video>` |
+| MathML | Limited | Full support |
+| Accessibility | Basic | WCAG integration |
+
+---
+
+## RTF (Rich Text Format)
+
+Microsoft's legacy document interchange format. Plain-text markup (not binary, not XML):
+
+```
+{\rtf1\ansi\deff0
+{\fonttbl{\f0 Times New Roman;}}
+{\colortbl;\red255\green0\blue0;}
+\f0\fs24 Normal text. \b Bold text.\b0  \cf1 Red text.\cf0
+\par New paragraph.
+}
+```
+
+RTF is rarely used for new documents but remains relevant as an interchange format — nearly every word processor can read and write it.
+
+---
+
+## Format Comparison for Developers
+
+| Need | Best Format | Why |
+|------|-------------|-----|
+| Pixel-perfect printing | PDF | Fixed layout, embeds fonts |
+| Programmatic doc generation | PDF (via libraries) | wkhtmltopdf, Puppeteer, WeasyPrint, reportlab |
+| Editable business docs | DOCX | Universal Office compatibility |
+| Open standard docs | ODF | ISO standard, no vendor lock-in |
+| E-books | EPUB | Reflowable, e-reader standard |
+| Technical/academic papers | LaTeX → PDF | Best math/citation support |
+| Web-first docs | HTML/Markdown | Already web-native |
+| Data export (tabular) | CSV or XLSX | Depends on whether formatting matters |
+
+### Programmatic PDF Generation
+
+Common approaches for generating PDFs in code:
+
+| Approach | Libraries | Notes |
+|----------|-----------|-------|
+| HTML → PDF | Puppeteer, Playwright, WeasyPrint, wkhtmltopdf | Write HTML/CSS, render to PDF |
+| Direct PDF API | reportlab (Python), iText (Java), PDFKit (Node) | Full control, more complex |
+| Template-based | Typst, LaTeX, Pandoc | Write in markup, compile to PDF |
+| Fill forms | pdf-lib (JS), pypdf (Python), iText | Populate existing PDF templates |
+
+---
+
+## Related
+
+- [[File Formats]] — Parent overview of file format concepts
+- [[File Metadata]] — Metadata in documents (author, dates, tracked changes)
+- [[Archive and Compression Formats]] — ZIP internals that underlie DOCX/EPUB
+- [[Serialization]] — JSON, YAML, and structured data formats
diff --git a/Computer Science/File Formats.md b/Computer Science/File Formats.md
new file mode 100644
index 0000000..79f9d75
--- /dev/null
+++ b/Computer Science/File Formats.md	
@@ -0,0 +1,278 @@
+---
+title: File Formats
+aliases:
+  - File Format
+  - Binary Formats
+  - Data Formats
+tags:
+  - cs
+  - fundamentals
+  - file-formats
+type: concept
+status: complete
+difficulty: fundamentals
+created: "2026-02-19"
+---
+
+# File Formats
+
+How computers organize bytes into meaningful structures — from plain text to images, archives, and executables.
+
+## Why It Matters
+
+Every file is just bytes. The **format** defines what those bytes mean. Understanding file formats helps you:
+
+- Debug corrupt files by reading raw hex
+- Choose the right image/video/document format for a use case
+- Build parsers, converters, and tooling
+- Understand security implications (metadata leaks, polyglot attacks)
+
+---
+
+## Anatomy of a Binary File
+
+Most binary files follow a common structural pattern:
+
+```
+┌──────────────────────────────────────────────────┐
+│  Magic Bytes / Signature   (identifies format)   │
+├──────────────────────────────────────────────────┤
+│  Header                    (metadata, offsets)    │
+├──────────────────────────────────────────────────┤
+│  Body / Chunks / Segments  (actual data)          │
+├──────────────────────────────────────────────────┤
+│  Trailer / Footer          (checksums, EOF mark)  │
+└──────────────────────────────────────────────────┘
+```
+
+### Magic Bytes
+
+The first few bytes of a file that identify its format. Operating systems and tools use these to detect file types regardless of extension.
+
+| Format | Magic Bytes (hex) | ASCII (if readable) |
+|--------|-------------------|---------------------|
+| PNG | `89 50 4E 47 0D 0A 1A 0A` | `.PNG....` |
+| JPEG | `FF D8 FF` | n/a |
+| GIF87a | `47 49 46 38 37 61` | `GIF87a` |
+| GIF89a | `47 49 46 38 39 61` | `GIF89a` |
+| PDF | `25 50 44 46 2D` | `%PDF-` |
+| ZIP | `50 4B 03 04` | `PK..` |
+| SQLite | `53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00` | `SQLite format 3.` |
+| ELF (Linux binary) | `7F 45 4C 46` | `.ELF` |
+| Mach-O (macOS) | `CF FA ED FE` (64-bit, little-endian, as stored on current Macs); `FE ED FA CE` / `FE ED FA CF` in big-endian files | n/a |
+| PE (Windows .exe) | `4D 5A` | `MZ` |
+| WebAssembly | `00 61 73 6D` | `.asm` |
+| FLAC | `66 4C 61 43` | `fLaC` |
+| MP3 (ID3v2) | `49 44 33` | `ID3` |
+| OGG | `4F 67 67 53` | `OggS` |
+| RIFF (WAV/AVI) | `52 49 46 46` | `RIFF` |
+| Gzip | `1F 8B` | n/a |
+| Bzip2 | `42 5A 68` | `BZh` |
+| 7z | `37 7A BC AF 27 1C` | `7z...` |
+| TIFF (LE) | `49 49 2A 00` | `II*.` |
+| TIFF (BE) | `4D 4D 00 2A` | `MM.*` |
+
+The `file` command on Unix uses these signatures (via libmagic) to identify files:
+
+```bash
+$ file photo.jpg
+photo.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI)...
+$ file mystery_file
+mystery_file: ELF 64-bit LSB executable, x86-64
+```
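+
+The same idea can be sketched in a few lines of Python. This is a deliberately small illustration using a handful of signatures from the table above; the `MAGIC` table and `sniff` function are hypothetical names, and libmagic itself applies a far richer rule set:
+
+```python
+# Signatures taken from the table above; longer prefixes are checked first.
+MAGIC = {
+    b"\x89PNG\r\n\x1a\n": "PNG",
+    b"\xff\xd8\xff": "JPEG",
+    b"%PDF-": "PDF",
+    b"PK\x03\x04": "ZIP",
+    b"\x7fELF": "ELF",
+    b"\x1f\x8b": "gzip",
+}
+
+def sniff(data: bytes) -> str:
+    """Return a format name based on leading magic bytes, or 'unknown'."""
+    for sig in sorted(MAGIC, key=len, reverse=True):
+        if data.startswith(sig):
+            return MAGIC[sig]
+    return "unknown"
+
+print(sniff(b"%PDF-1.7\n"))
+print(sniff(b"hello"))
+```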
+
+---
+
+## Format Categories
+
+### Text-Based Formats
+
+Stored as human-readable character sequences. Can be opened in any text editor.
+
+| Format | Structure | Use Case |
+|--------|-----------|----------|
+| CSV/TSV | Delimited rows | Tabular data exchange |
+| JSON | Key-value tree | APIs, config |
+| YAML | Indented key-value | Config, CI/CD |
+| TOML | INI-like sections | Config (Cargo, pyproject) |
+| XML | Nested tags | Legacy APIs, SOAP, SVG, XHTML |
+| Markdown | Lightweight markup | Docs, READMEs |
+| Protocol Buffers (text) | Schema definition | `.proto` files |
+
+**Plain text is not always simple.** Encoding matters — [[Character Encoding]] (UTF-8 vs UTF-16 vs ASCII) determines how characters map to bytes. A BOM (`EF BB BF` in UTF-8) can appear at the start of text files, which some tools mishandle.
+
+### Binary Formats
+
+Not human-readable. Require specialized parsers or hex editors.
+
+| Category | Examples | See Also |
+|----------|----------|----------|
+| Images | JPEG, PNG, GIF, WebP, AVIF, TIFF, BMP | [[Image Formats]] |
+| Audio | MP3, FLAC, WAV, AAC, OGG, Opus | [[Audio and Video Formats]] |
+| Video | MP4, WebM, MKV, AVI, MOV | [[Audio and Video Formats]] |
+| Archives | ZIP, tar, gzip, 7z, brotli, zstd | [[Archive and Compression Formats]] |
+| Documents | PDF, DOCX, XLSX, ODF | [[Document Formats]] |
+| Databases | SQLite, LevelDB | [[Database Engines]] |
+| Executables | ELF, PE, Mach-O, WASM | — |
+| Serialization | Protobuf, MessagePack, CBOR, FlatBuffers | [[Serialization]] |
+
+### Container Formats
+
+Some formats are actually **containers** that hold multiple formats inside:
+
+| Container | Contains | Notes |
+|-----------|----------|-------|
+| ZIP | Arbitrary files | Also the basis for DOCX, XLSX, JAR, APK, EPUB |
+| MP4 (MPEG-4 Part 14) | Video + audio + subtitles + metadata | ISO base media file format |
+| MKV (Matroska) | Any codec combination | Extremely flexible |
+| RIFF | Chunks of typed data | WAV (audio), AVI (video) |
+| OGG | Vorbis, Opus, Theora streams | Open container format |
+| TAR | Files + directory structure | No compression (pair with gzip/bzip2) |
+| TIFF | Multiple images + metadata | Can embed JPEG-compressed frames |
+
+**ZIP-based formats** are surprisingly common. You can rename and unzip them:
+
+```bash
+$ cp document.docx document.zip && unzip document.zip
+Archive:  document.zip
+  inflating: [Content_Types].xml
+  inflating: _rels/.rels
+  inflating: word/document.xml
+  inflating: word/styles.xml
+  ...
+```
+
+Same applies to `.jar`, `.apk`, `.epub`, `.odt`, `.xlsx`.
+
+---
+
+## Metadata
+
+Files carry metadata beyond their primary content — creation dates, authorship, device info, and notably **GPS coordinates** in photos.
+
+See [[File Metadata]] for deep coverage of EXIF, XMP, ID3, and other metadata systems.
+
+### Quick Metadata Overview
+
+| Domain | Standard | Found In | Notable Fields |
+|--------|----------|----------|----------------|
+| Photos | EXIF | JPEG, TIFF, HEIF | GPS coordinates, camera model, exposure settings |
+| Photos | XMP | JPEG, PNG, PDF, TIFF | Extensible, XML-based, edit history |
+| Photos | IPTC-IIM | JPEG, TIFF | Caption, credit, copyright (press/journalism) |
+| Audio | ID3v2 | MP3 | Artist, album, track, cover art |
+| Audio | Vorbis Comment | FLAC, OGG | Flexible key-value tags |
+| Video | MP4 metadata | MP4, MOV | Duration, codec info, GPS, creation time |
+| Documents | PDF metadata | PDF | Author, title, creation tool, XMP |
+| Archives | Filesystem metadata | ZIP, TAR | File permissions, timestamps, paths |
+
+---
+
+## Endianness
+
+Binary formats must decide the **byte order** for multi-byte values:
+
+| Term | Order | Example (0x01020304) | Common In |
+|------|-------|----------------------|-----------|
+| Big-endian (BE) | MSB first | `01 02 03 04` | Network protocols, TIFF (Motorola), Java `.class` |
+| Little-endian (LE) | LSB first | `04 03 02 01` | x86/x64, TIFF (Intel), WASM, most modern formats |
+
+TIFF files declare their endianness in the first two bytes: `II` = Intel (LE), `MM` = Motorola (BE). Many modern formats standardize on little-endian to match dominant CPU architecture.
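+
+The two byte orders from the table can be reproduced with Python's standard `struct` module. A short sketch (the variable names are illustrative):
+
+```python
+import struct
+
+value = 0x01020304
+big = struct.pack(">I", value)      # big-endian: most significant byte first
+little = struct.pack("<I", value)   # little-endian: least significant byte first
+
+print(big.hex(" "))     # 01 02 03 04
+print(little.hex(" "))  # 04 03 02 01
+
+# Reading bytes with the wrong byte order silently yields a different number.
+misread = struct.unpack("<I", big)[0]
+print(hex(misread))     # 0x4030201
+```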
+
+---
+
+## Hex Dump Walkthrough
+
+Reading a hex dump is the most direct way to understand file structure. Here is the start of a real PNG file:
+
+```
+Offset    00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F   ASCII
+00000000  89 50 4E 47 0D 0A 1A 0A  00 00 00 0D 49 48 44 52   .PNG........IHDR
+00000010  00 00 02 00 00 00 02 00  08 06 00 00 00 F4 78 D4   ..............x.
+00000020  FA 00 00 00 04 73 42 49  54 08 08 08 08 7C 08 64   .....sBIT....|.d
+```
+
+Breaking this down:
+
+```
+89 50 4E 47 0D 0A 1A 0A   ← PNG signature (magic bytes)
+00 00 00 0D                ← Chunk length: 13 bytes
+49 48 44 52                ← Chunk type: "IHDR" (image header)
+00 00 02 00                ← Width: 512px
+00 00 02 00                ← Height: 512px
+08                         ← Bit depth: 8
+06                         ← Color type: 6 (RGBA)
+00                         ← Compression: deflate
+00                         ← Filter method: adaptive
+00                         ← Interlace: none
+```
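+
+The same breakdown can be done programmatically. A minimal sketch with Python's `struct` module, using the first 33 bytes of the dump above as input (the CRC is carried along but not verified here):
+
+```python
+import struct
+
+# Signature + IHDR chunk (length, type, data, CRC), copied from the dump.
+raw = bytes.fromhex(
+    "89504E470D0A1A0A"   # PNG signature
+    "0000000D49484452"   # chunk length (13) + chunk type "IHDR"
+    "0000020000000200"   # width, height
+    "0806000000"         # bit depth, color type, compression, filter, interlace
+    "F478D4FA"           # CRC as it appears in the dump
+)
+
+assert raw[:8] == b"\x89PNG\r\n\x1a\n"
+
+# PNG stores all multi-byte integers big-endian.
+length, chunk_type = struct.unpack(">I4s", raw[8:16])
+width, height, depth, color, comp, filt, interlace = struct.unpack(
+    ">IIBBBBB", raw[16:16 + length]
+)
+
+print(chunk_type, width, height, depth, color)
+```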
+
+See [[Image Formats]] for full format breakdowns of JPEG, PNG, GIF, WebP, and more.
+
+---
+
+## Common Patterns Across Formats
+
+### Chunked / Tagged Structure
+
+Many formats organize data as a sequence of labeled chunks, each with a type and length:
+
+```
+┌──────────┬──────────┬──────────────────────┐
+│ Type/Tag │  Length   │   Data (Length bytes) │
+│ (4 bytes)│ (4 bytes)│                       │
+└──────────┴──────────┴──────────────────────┘
+```
+
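+Used by: PNG, RIFF (WAV/AVI), IFF, MP4 (atoms/boxes). Field order varies: RIFF and IFF put the type first, while PNG and MP4 put the length first, and PNG appends a CRC after the data. TIFF IFD entries follow a related tag/type/count/value layout.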
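+
+Walking such a structure takes only a loop that reads a header and skips the payload. A sketch for RIFF, assuming a tiny WAV file generated in memory with the standard `wave` module (the generator name `riff_chunks` is illustrative):
+
+```python
+import io
+import struct
+import wave
+
+# Build a tiny WAV file in memory so the example is self-contained.
+buf = io.BytesIO()
+with wave.open(buf, "wb") as w:
+    w.setnchannels(1)
+    w.setsampwidth(2)
+    w.setframerate(8000)
+    w.writeframes(b"\x00\x00" * 100)   # 100 silent 16-bit samples
+data = buf.getvalue()
+
+def riff_chunks(data: bytes):
+    """Yield (chunk_id, payload_size) for each top-level chunk in a RIFF file."""
+    riff, _size, _form = struct.unpack("<4sI4s", data[:12])
+    assert riff == b"RIFF"
+    pos = 12
+    while pos + 8 <= len(data):
+        # RIFF puts the type first, then a little-endian length.
+        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
+        yield chunk_id, size
+        pos += 8 + size + (size & 1)   # payloads are padded to an even length
+
+chunks = list(riff_chunks(data))
+print(chunks)
+```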
+
+### Offset Tables / Index
+
+Rather than reading sequentially, some formats include an index pointing to data locations:
+
+- **ZIP** — Central directory at end of file points to each file entry
+- **PDF** — Cross-reference table (xref) maps objects to byte offsets
+- **SQLite** — B-tree page index for row lookup
+- **ELF** — Section header table and program header table
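+
+For ZIP, the index lookup can be sketched as follows: locate the End of Central Directory (EOCD) record near the end of the file, then follow its offset to the central directory. The archive here is built in memory with `zipfile`; the parsing uses only `struct` and assumes no ZIP64 extensions and no archive comment containing the signature:
+
+```python
+import io
+import struct
+import zipfile
+
+buf = io.BytesIO()
+with zipfile.ZipFile(buf, "w") as zf:
+    zf.writestr("a.txt", "first")
+    zf.writestr("b.txt", "second")
+data = buf.getvalue()
+
+# The EOCD record is found by scanning backwards from the end of the file.
+eocd_pos = data.rfind(b"PK\x05\x06")
+(_sig, _disk, _cd_disk, _n_here, n_total,
+ cd_size, cd_offset, _comment_len) = struct.unpack(
+    "<4s4H2LH", data[eocd_pos:eocd_pos + 22]
+)
+
+print(n_total, cd_offset, cd_size)
+print(data[cd_offset:cd_offset + 4])   # central directory file header signature
+```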
+
+### Compression
+
+Most modern binary formats compress their payload:
+
+| Algorithm | Used By | Ratio | Speed |
+|-----------|---------|-------|-------|
+| DEFLATE | PNG, ZIP, gzip, HTTP | Good | Moderate |
+| LZ77/LZ78 | GIF (LZW), many others | Moderate | Fast |
+| Brotli | Web (WOFF2, HTTP) | Excellent | Slow compress, fast decompress |
+| Zstandard | Newer archives, databases | Excellent | Fast |
+| LZMA/LZMA2 | 7z, xz | Best ratio | Slowest |
+
+---
+
+## Tools for Inspecting Files
+
+| Tool | Purpose |
+|------|---------|
+| `xxd` | Hex dump with ASCII sidebar |
+| `hexdump -C` | Classic hex dump |
+| `file` | Identify format via magic bytes |
+| `exiftool` | Read/write image and media metadata |
+| `ffprobe` | Inspect audio/video container structure |
+| `pdfinfo` | PDF metadata |
+| `zipinfo` | ZIP archive contents |
+| `readelf` | ELF binary structure |
+| `otool` | Mach-O binary structure (macOS) |
+| `strings` | Extract ASCII strings from binary |
+| `binwalk` | Scan for embedded files/signatures |
+
+---
+
+## Related
+
+- [[Image Formats]] — JPEG, PNG, GIF, WebP, AVIF deep dive
+- [[File Metadata]] — EXIF, GPS, XMP, ID3 metadata systems
+- [[Audio and Video Formats]] — Codecs, containers, streaming
+- [[Archive and Compression Formats]] — ZIP, tar, gzip, zstd, brotli
+- [[Document Formats]] — PDF, Office XML, EPUB internals
+- [[Serialization]] — Protobuf, MessagePack, structured binary data
+- [[Character Encoding]] — UTF-8, ASCII, Unicode
+- [[Database Engines]] — SQLite file format and others
diff --git a/Computer Science/File Metadata.md b/Computer Science/File Metadata.md
new file mode 100644
index 0000000..5c009f4
--- /dev/null
+++ b/Computer Science/File Metadata.md	
@@ -0,0 +1,503 @@
+---
+title: File Metadata
+aliases:
+  - EXIF
+  - EXIF Data
+  - Image Metadata
+  - Photo Metadata
+  - XMP
+tags:
+  - cs
+  - fundamentals
+  - file-formats
+  - media
+  - security
+type: concept
+status: complete
+difficulty: fundamentals
+created: "2026-02-19"
+---
+
+# File Metadata
+
+Data about data — how files carry information beyond their primary content, including camera settings, GPS coordinates, authorship, and edit history.
+
+## Why It Matters
+
+- **Privacy** — Photos from smartphones embed GPS coordinates by default. Sharing a photo can reveal your home address, workplace, or travel patterns.
+- **Forensics** — Metadata reveals what device created a file, when, and sometimes edit history.
+- **Workflow** — Photo editors, DAMs, and media libraries rely on metadata for organization.
+- **Security** — Metadata can leak sensitive information. Many organizations strip metadata before publishing.
+
+---
+
+## EXIF (Exchangeable Image File Format)
+
+The dominant metadata standard for photos. Embedded directly in JPEG, TIFF, HEIF, and WebP files.
+
+### Where EXIF Lives in a JPEG
+
+EXIF data is stored in the APP1 marker segment, immediately after the SOI marker:
+
+```
+FF D8                    ← SOI (Start of Image)
+FF E1 [length]           ← APP1 marker (EXIF container)
+  45 78 69 66 00 00      ← "Exif\0\0" identifier
+  ┌─────────────────────────────────────────────┐
+  │ TIFF Header                                  │
+  │   49 49 (II = little-endian)                 │
+  │   2A 00 (TIFF magic number 42)               │
+  │   08 00 00 00 (offset to first IFD)          │
+  ├─────────────────────────────────────────────┤
+  │ IFD0 (Image File Directory - main image)     │
+  │   [count] [entry] [entry] [entry] ...        │
+  │   [pointer to IFD1]                          │
+  ├─────────────────────────────────────────────┤
+  │ Sub-IFD (EXIF-specific tags)                 │
+  │   [detailed camera settings]                 │
+  ├─────────────────────────────────────────────┤
+  │ GPS IFD (location data)                      │
+  │   [latitude, longitude, altitude, timestamp] │
+  ├─────────────────────────────────────────────┤
+  │ IFD1 (thumbnail image)                       │
+  │   [embedded JPEG thumbnail]                  │
+  └─────────────────────────────────────────────┘
+FF DB ...                ← Rest of JPEG follows
+```
+
+### IFD Entry Format
+
+Each EXIF tag is stored as a 12-byte entry in an Image File Directory:
+
+```
+┌────────────┬────────────┬────────────┬──────────────────────┐
+│ Tag ID     │ Data Type  │ Count      │ Value / Offset       │
+│ (2 bytes)  │ (2 bytes)  │ (4 bytes)  │ (4 bytes)            │
+└────────────┴────────────┴────────────┴──────────────────────┘
+```
+
+If the value fits in 4 bytes, it's stored inline. Otherwise, the field contains an offset pointing to the value elsewhere in the EXIF block.
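+
+A minimal Python sketch of decoding one entry. The entry bytes are hand-built for illustration (Orientation, type SHORT, count 1, value 6, little-endian), and `TYPE_SIZES` covers only the first five TIFF data types:
+
+```python
+import struct
+
+# Hypothetical little-endian IFD entry: tag 0x0112, type 3 (SHORT), count 1, value 6.
+entry = bytes.fromhex("1201" "0300" "01000000" "06000000")
+
+tag, dtype, count, value_field = struct.unpack("<HHI4s", entry)
+
+TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}   # BYTE, ASCII, SHORT, LONG, RATIONAL
+inline = TYPE_SIZES[dtype] * count <= 4        # does the value fit in the 4-byte field?
+
+if inline and dtype == 3:
+    # Inline SHORT values occupy the first two bytes of the field.
+    value = struct.unpack("<H", value_field[:2])[0]
+else:
+    # Otherwise read the field as a 32-bit number (an inline LONG or an offset).
+    value = struct.unpack("<I", value_field)[0]
+
+print(hex(tag), dtype, count, inline, value)
+```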
+
+### Common EXIF Tags
+
+#### Camera & Capture
+
+| Tag ID | Name | Example Value |
+|--------|------|---------------|
+| 0x010F | Make | `Apple` |
+| 0x0110 | Model | `iPhone 15 Pro` |
+| 0x829A | ExposureTime | `1/125` |
+| 0x829D | FNumber | `f/1.8` |
+| 0x8827 | ISOSpeedRatings | `100` |
+| 0x9003 | DateTimeOriginal | `2026:01:15 14:30:22` |
+| 0x920A | FocalLength | `6.86 mm` |
+| 0xA405 | FocalLengthIn35mm | `24 mm` |
+| 0xA433 | LensMake | `Apple` |
+| 0xA434 | LensModel | `iPhone 15 Pro back triple camera 6.86mm f/1.78` |
+| 0x0112 | Orientation | `6` (rotated 90 CW) |
+| 0xA002 | PixelXDimension | `4032` |
+| 0xA003 | PixelYDimension | `3024` |
+
+#### Image Processing
+
+| Tag ID | Name | Example Value |
+|--------|------|---------------|
+| 0x0131 | Software | `Adobe Photoshop 25.0` |
+| 0x9286 | UserComment | Free-form text |
+| 0xA001 | ColorSpace | `1` (sRGB) |
+| 0xA300 | FileSource | `3` (digital camera) |
+| 0xA301 | SceneType | `1` (directly photographed) |
+| 0xA401 | CustomRendered | `0` (normal) or `1` (custom) |
+| 0xA402 | ExposureMode | `0` (auto) |
+| 0xA403 | WhiteBalance | `0` (auto) |
+
+---
+
+## GPS Metadata
+
+The most privacy-sensitive metadata. Smartphone cameras typically embed GPS coordinates whenever the camera app has location access, so photos are geotagged unless the user explicitly disables it.
+
+### GPS IFD Structure
+
+GPS data uses its own set of tags in a dedicated IFD (Image File Directory):
+
+| Tag ID | Name | Format | Example |
+|--------|------|--------|---------|
+| 0x0000 | GPSVersionID | 4 bytes | `2 3 0 0` |
+| 0x0001 | GPSLatitudeRef | ASCII | `N` or `S` |
+| 0x0002 | GPSLatitude | 3 rationals | `47/1, 36/1, 2174/100` |
+| 0x0003 | GPSLongitudeRef | ASCII | `E` or `W` |
+| 0x0004 | GPSLongitude | 3 rationals | `122/1, 19/1, 4522/100` |
+| 0x0005 | GPSAltitudeRef | byte | `0` (above sea level) |
+| 0x0006 | GPSAltitude | rational | `56/1` (meters) |
+| 0x0007 | GPSTimeStamp | 3 rationals | `14/1, 30/1, 22/1` (UTC) |
+| 0x001D | GPSDateStamp | ASCII | `2026:01:15` |
+| 0x000C | GPSSpeedRef | ASCII | `K` (km/h) |
+| 0x000D | GPSSpeed | rational | `0/1` |
+| 0x000E | GPSTrackRef | ASCII | `T` (true north) |
+| 0x000F | GPSTrack | rational | `275/1` (degrees) |
+
+### Reading GPS Coordinates
+
+GPS latitude and longitude are stored as three rational numbers: degrees, minutes, seconds.
+
+```
+GPSLatitude:     47/1, 36/1, 2174/100
+GPSLatitudeRef:  N
+GPSLongitude:    122/1, 19/1, 4522/100
+GPSLongitudeRef: W
+
+Conversion to decimal:
+  Lat  = 47 + 36/60 + 21.74/3600 = 47.6060°N
+  Long = 122 + 19/60 + 45.22/3600 = 122.3292°W
+  → (47.6060, -122.3292) ≈ Seattle, WA
+```
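+
+The conversion above can be sketched in Python with exact rational arithmetic. The function name `dms_to_decimal` and the tuple layout are illustrative choices, not an EXIF library API:
+
+```python
+from fractions import Fraction
+
+def dms_to_decimal(dms, ref):
+    """Convert (degrees, minutes, seconds) rationals plus an N/S/E/W ref to signed decimal degrees."""
+    deg, minutes, seconds = (Fraction(n, d) for n, d in dms)
+    decimal = deg + minutes / 60 + seconds / 3600
+    # South and West are negative in signed decimal notation.
+    return float(-decimal if ref in ("S", "W") else decimal)
+
+lat = dms_to_decimal([(47, 1), (36, 1), (2174, 100)], "N")
+lon = dms_to_decimal([(122, 1), (19, 1), (4522, 100)], "W")
+print(round(lat, 4), round(lon, 4))
+```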
+
+### Viewing and Stripping GPS Data
+
+```bash
+# View all EXIF data including GPS
+exiftool photo.jpg
+
+# View only GPS data (group syntax avoids shell glob expansion of "*")
+exiftool -gps:all photo.jpg
+
+# Strip all GPS data (keeps a backup as photo.jpg_original)
+exiftool -gps:all= photo.jpg
+
+# Strip ALL metadata
+exiftool -all= photo.jpg
+
+# Strip metadata from all JPEGs in a directory
+exiftool -all= -overwrite_original *.jpg
+```
+
+### Privacy Implications
+
+| Platform | Strips GPS on upload? |
+|----------|----------------------|
+| Twitter/X | Yes |
+| Facebook | Yes (but may use internally) |
+| Instagram | Yes |
+| Imgur | Yes |
+| Discord | No (as of 2024) |
+| Email | No |
+| Slack | No |
+| iMessage | Depends on share settings |
+| Google Photos shared links | Configurable |
+
+**Best practice:** Disable GPS tagging in your camera app, or strip metadata before sharing files directly.
+
+---
+
+## EXIF Orientation Tag
+
+One of the most commonly mishandled pieces of metadata. Cameras and phones store photos in sensor orientation and use the Orientation tag to indicate how to display them.
+
+| Value | Transform | Common Cause |
+|-------|-----------|-------------|
+| 1 | None (normal) | Landscape, home button right |
+| 3 | Rotate 180 | Upside down |
+| 6 | Rotate 90 CW | Portrait, home button bottom |
+| 8 | Rotate 90 CCW | Portrait, home button top |
+| 2, 4, 5, 7 | Mirrored variants | Front-facing camera |
+
+**Common bug:** Applications that ignore the Orientation tag display photos sideways or upside down. The pixel data itself is not rotated — only the metadata says how to display it.
+
+---
+
+## XMP (Extensible Metadata Platform)
+
+Adobe's XML-based metadata standard. More flexible than EXIF — supports arbitrary namespaces, nested structures, and arrays.
+
+### Where XMP Lives
+
+| File Type | Location |
+|-----------|----------|
+| JPEG | APP1 marker (separate from EXIF APP1) |
+| PNG | `iTXt` chunk with keyword "XML:com.adobe.xmp" |
+| TIFF | Tag 700 in IFD |
+| PDF | Metadata stream object |
+| MP4 | `uuid` box with XMP UUID |
+| Sidecar | `.xmp` file alongside the original |
+
+### XMP Structure
+
+```xml
+<x:xmpmeta xmlns:x="adobe:ns:meta/">
+  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
+    <rdf:Description
+      xmlns:dc="http://purl.org/dc/elements/1.1/"
+      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
+      xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
+      xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/">
+
+      <dc:title>Sunset at Gas Works Park</dc:title>
+      <dc:creator>Jane Smith</dc:creator>
+      <dc:rights>Copyright 2026 Jane Smith</dc:rights>
+
+      <xmp:CreatorTool>Adobe Lightroom Classic 14.0</xmp:CreatorTool>
+      <xmp:CreateDate>2026-01-15T14:30:22-08:00</xmp:CreateDate>
+      <xmp:ModifyDate>2026-01-16T09:15:00-08:00</xmp:ModifyDate>
+      <xmp:Rating>4</xmp:Rating>
+
+      <photoshop:City>Seattle</photoshop:City>
+      <photoshop:State>Washington</photoshop:State>
+      <photoshop:Country>United States</photoshop:Country>
+
+      <dc:subject>
+        <rdf:Bag>
+          <rdf:li>sunset</rdf:li>
+          <rdf:li>cityscape</rdf:li>
+          <rdf:li>Seattle</rdf:li>
+        </rdf:Bag>
+      </dc:subject>
+
+    </rdf:Description>
+  </rdf:RDF>
+</x:xmpmeta>
+```
+
+### EXIF vs XMP
+
+| Aspect | EXIF | XMP |
+|--------|------|-----|
+| Format | Binary (TIFF-based) | XML text |
+| Extensibility | Fixed tag set | Arbitrary namespaces |
+| Size limit | 64KB in JPEG | Unlimited (practically) |
+| Edit history | No | Yes (e.g. `xmpMM:History`, embedded or in a sidecar) |
+| Readability | Needs parser | Human-readable XML |
+| Standardized by | JEITA/CIPA | Adobe (ISO 16684) |
+| Typical use | Camera capture data | Post-processing, cataloging |
+
+In practice, both coexist — EXIF for camera data, XMP for editorial metadata and tags.
+
+---
+
+## Audio Metadata: ID3
+
+The metadata standard for MP3 files, though some fields have been adopted more broadly.
+
+### ID3v1 (Legacy)
+
+Fixed-size block at the **end** of the MP3 file:
+
+```
+Offset from EOF   Size   Field
+-128               3     Tag identifier: "TAG"
+-125              30     Title
+-95               30     Artist
+-65               30     Album
+-35                4     Year
+-31               30     Comment (or 28 + track number in v1.1)
+-1                 1     Genre (index into predefined list)
+```
+
+Severely limited: 30 characters per field, 80 predefined genres, no album art, no Unicode.
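+
+Because the layout is fixed, an ID3v1 tag can be read with a single `struct.unpack` call. A sketch under the assumption that text fields are Latin-1 and NUL-padded; the sample "file" is fake audio bytes followed by a hand-built tag:
+
+```python
+import struct
+
+ID3V1 = "<3s30s30s30s4s30sB"   # TAG, title, artist, album, year, comment, genre = 128 bytes
+
+def parse_id3v1(data: bytes):
+    """Parse the trailing 128-byte ID3v1 block, or return None if absent."""
+    tail = data[-128:]
+    if len(tail) < 128 or tail[:3] != b"TAG":
+        return None
+    _tag, title, artist, album, year, comment, genre = struct.unpack(ID3V1, tail)
+
+    def text(raw: bytes) -> str:
+        # Fields are padded with NUL bytes (or spaces) up to their fixed width.
+        return raw.split(b"\x00")[0].decode("latin-1").strip()
+
+    return {"title": text(title), "artist": text(artist), "album": text(album),
+            "year": text(year), "comment": text(comment), "genre": genre}
+
+# Hypothetical input: placeholder audio bytes plus a tag built with the same layout.
+tag = struct.pack(ID3V1, b"TAG", b"Time", b"Pink Floyd",
+                  b"The Dark Side of the Moon", b"1973", b"", 1)
+parsed = parse_id3v1(b"\xff\xfb" * 10 + tag)
+print(parsed)
+```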
+
+### ID3v2
+
+Variable-length block at the **start** of the MP3 file (before audio data):
+
+```
+49 44 33           ← "ID3" magic bytes
+03 00              ← Version: 2.3.0
+00                 ← Flags
+XX XX XX XX        ← Tag size (syncsafe integer)
+[Frame] [Frame] ... ← Variable number of tagged frames
+```
+
+Each frame:
+
+```
+┌────────────┬────────────┬───────┬──────────────────┐
+│ Frame ID   │ Size       │ Flags │ Data             │
+│ (4 bytes)  │ (4 bytes)  │ (2B)  │ (variable)       │
+└────────────┴────────────┴───────┴──────────────────┘
+```
+
+Common frame IDs:
+
+| Frame | Name | Contents |
+|-------|------|----------|
+| `TIT2` | Title | Song name |
+| `TPE1` | Lead Artist | Primary performer |
+| `TALB` | Album | Album name |
+| `TRCK` | Track | Track number (e.g., "3/12") |
+| `TDRC` | Recording Date | ISO 8601 date (ID3v2.4; v2.3 uses `TYER`/`TDAT`) |
+| `TCON` | Genre | Genre name or "(index)" |
+| `COMM` | Comments | Description + text |
+| `APIC` | Attached Picture | Cover art (embedded JPEG or PNG) |
+| `USLT` | Unsynced Lyrics | Song lyrics |
+| `TXXX` | User-defined text | Custom key-value pairs |
+
+### Syncsafe Integers
+
+ID3v2 uses "syncsafe" integers where the high bit of each byte is always 0. This prevents the metadata from being mistaken for an MP3 sync word (`FF FB`, `FF FA`, etc.):
+
+```
+Normal integer:     0x00021000 = 135168
+As syncsafe bytes:  00 08 20 00
+  Bit layout: 0AAAAAAA 0BBBBBBB 0CCCCCCC 0DDDDDDD
+  Reassemble: AAAAAAABBBBBBBCCCCCCCDDDDDDD = 28-bit value
+```
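+
+Encoding and decoding come down to shifting in 7-bit groups. A small Python sketch (the function names are illustrative):
+
+```python
+def syncsafe_encode(n: int) -> bytes:
+    """Encode a value below 2**28 as four bytes carrying 7 payload bits each."""
+    if not 0 <= n < 1 << 28:
+        raise ValueError("syncsafe integers hold at most 28 bits")
+    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))
+
+def syncsafe_decode(b: bytes) -> int:
+    """Reassemble the 28-bit value from four syncsafe bytes."""
+    n = 0
+    for byte in b:
+        n = (n << 7) | (byte & 0x7F)
+    return n
+
+encoded = syncsafe_encode(0x00021000)
+print(encoded.hex(" "))
+print(syncsafe_decode(encoded))
+```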
+
+### Vorbis Comments (FLAC, OGG)
+
+An alternative to ID3, used by open formats. Simple key=value pairs with no predefined tag set:
+
+```
+ARTIST=Pink Floyd
+ALBUM=The Dark Side of the Moon
+TITLE=Time
+TRACKNUMBER=4
+DATE=1973
+GENRE=Progressive Rock
+```
+
+Stored in FLAC's `VORBIS_COMMENT` metadata block or OGG's header packets.
+
+---
+
+## Video Metadata
+
+### MP4/MOV Metadata
+
+MP4 files (based on ISO Base Media File Format) store metadata in "boxes" (also called "atoms"):
+
+```
+[moov box]
+  └─[udta box]           ← User data
+      └─[meta box]       ← Metadata container
+          ├─[hdlr]       ← Handler (declares metadata type)
+          ├─[ilst box]   ← iTunes-style metadata
+          │   ├─©nam     ← Title
+          │   ├─©ART     ← Artist
+          │   ├─©day     ← Year
+          │   ├─©gen     ← Genre
+          │   └─covr     ← Cover art
+          └─[XMP_ box]   ← XMP metadata (if present)
+
+  └─[trak box]           ← Track metadata
+      └─[mdia box]
+          └─[minf box]   ← Media info (codec, dimensions, bitrate)
+```
+
+GPS data in MP4 can appear in multiple places:
+
+- `©xyz` atom in `udta` — Apple's location format (`+47.6060-122.3292/`)
+- XMP metadata box
+- GPS track in a dedicated metadata track
+
+### Inspecting Video Metadata
+
+```bash
+# Full metadata dump
+ffprobe -v quiet -show_format -show_streams video.mp4
+
+# Just format-level metadata
+ffprobe -v quiet -show_entries format_tags video.mp4
+
+# EXIF/XMP from video
+exiftool video.mp4
+```
+
+---
+
+## Document Metadata
+
+### PDF Metadata
+
+PDF files carry metadata in two places:
+
+**Info Dictionary** (legacy):
+
+```
+<< /Title (Quarterly Report)
+   /Author (Jane Smith)
+   /Subject (Q4 2025 Financial Results)
+   /Creator (Microsoft Word)
+   /Producer (macOS Quartz PDFContext)
+   /CreationDate (D:20260115143022-08'00')
+   /ModDate (D:20260116091500-08'00')
+>>
+```
+
+**XMP Metadata Stream** (modern):
+
+Embedded as an XML stream object, following the same XMP structure described above. Most modern PDF tools write both for backward compatibility.
+
+```bash
+# View PDF metadata
+pdfinfo document.pdf
+exiftool document.pdf
+
+# Remove metadata
+exiftool -all= document.pdf
+qpdf --linearize --replace-input document.pdf  # rewrite the file so metadata deleted by exiftool cannot be recovered from earlier revisions
+```
+
+### Office Documents (DOCX, XLSX, PPTX)
+
+These are ZIP archives containing XML files. Metadata lives in:
+
+- `docProps/core.xml` — Dublin Core metadata (title, author, dates)
+- `docProps/app.xml` — Application metadata (word count, company, app version)
+- `docProps/custom.xml` — Custom properties
+
+```xml
+<!-- docProps/core.xml -->
+<cp:coreProperties>
+  <dc:title>Project Proposal</dc:title>
+  <dc:creator>Jane Smith</dc:creator>
+  <cp:lastModifiedBy>John Doe</cp:lastModifiedBy>
+  <dcterms:created>2026-01-15T14:30:22Z</dcterms:created>
+  <dcterms:modified>2026-01-16T09:15:00Z</dcterms:modified>
+  <cp:revision>7</cp:revision>
+</cp:coreProperties>
+```
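+
+Since the container is a plain ZIP, these properties can be read with the standard library alone. A sketch that uses a stand-in archive holding only `docProps/core.xml` (a real `.docx` contains many more parts); the namespace URIs are the ones used by the core-properties part and Dublin Core:
+
+```python
+import io
+import zipfile
+import xml.etree.ElementTree as ET
+
+NS = {
+    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
+    "dc": "http://purl.org/dc/elements/1.1/",
+}
+
+CORE_XML = (
+    '<cp:coreProperties xmlns:cp="%(cp)s" xmlns:dc="%(dc)s">'
+    "<dc:creator>Jane Smith</dc:creator>"
+    "<cp:lastModifiedBy>John Doe</cp:lastModifiedBy>"
+    "</cp:coreProperties>" % NS
+)
+
+# Stand-in for a real .docx file, built in memory.
+buf = io.BytesIO()
+with zipfile.ZipFile(buf, "w") as zf:
+    zf.writestr("docProps/core.xml", CORE_XML)
+
+with zipfile.ZipFile(buf) as zf:
+    root = ET.fromstring(zf.read("docProps/core.xml"))
+
+creator = root.findtext("dc:creator", namespaces=NS)
+last_modified_by = root.findtext("cp:lastModifiedBy", namespaces=NS)
+print(creator, "/", last_modified_by)
+```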
+
+**Hidden data risks in Office docs:**
+
+- Track changes / revision history
+- Comments and annotations
+- Hidden rows/columns in spreadsheets
+- Embedded file paths (can reveal username and directory structure)
+- Previous authors visible in revision metadata
+
+---
+
+## Metadata Security Checklist
+
+| Risk | Mitigation |
+|------|-----------|
+| GPS coordinates in photos | Disable location in camera settings, strip before sharing |
+| Author/username in documents | Use Document Inspector (Office) or `exiftool -all=` |
+| Camera serial number in EXIF | Strip EXIF if publishing anonymously |
+| Embedded thumbnails | Strip or regenerate EXIF thumbnails after cropping (they may still show the un-cropped original) |
+| Edit history in XMP | Strip embedded XMP and delete sidecar files |
+| File paths in Office XML | Run Document Inspector before distributing |
+| Creation timestamps | Strip or normalize if anonymity is needed |
+
+---
+
+## Tools
+
+| Tool | Purpose | Formats |
+|------|---------|---------|
+| `exiftool` | Swiss army knife for metadata | JPEG, PNG, TIFF, MP4, PDF, Office, 400+ formats |
+| `identify -verbose` | ImageMagick metadata dump | All image formats |
+| `ffprobe` | Media container metadata | MP4, MKV, WebM, audio |
+| `pdfinfo` | PDF metadata | PDF |
+| `mat2` | Metadata removal tool | Images, docs, audio, video |
+| `jhead` | JPEG header manipulation | JPEG |
+
+---
+
+## Related
+
+- [[File Formats]] — Parent overview of file format concepts
+- [[Image Formats]] — JPEG, PNG, GIF, WebP format internals
+- [[Audio and Video Formats]] — Codec and container formats
+- [[Document Formats]] — PDF, Office XML, EPUB structure
+- [[Web Security]] — Privacy and information leakage
diff --git a/Computer Science/Image Formats.md b/Computer Science/Image Formats.md
new file mode 100644
index 0000000..204de05
--- /dev/null
+++ b/Computer Science/Image Formats.md	
@@ -0,0 +1,359 @@
+---
+title: Image Formats
+aliases:
+  - Image File Formats
+  - Picture Formats
+tags:
+  - cs
+  - fundamentals
+  - file-formats
+  - media
+type: concept
+status: complete
+difficulty: fundamentals
+created: "2026-02-19"
+---
+
+# Image Formats
+
+How digital images are stored — from raw pixel grids to compressed representations, with deep dives into JPEG, PNG, GIF, WebP, and AVIF.
+
+## Overview
+
+| Format | Compression | Transparency | Animation | Lossy/Lossless | Typical Use |
+|--------|-------------|-------------|-----------|----------------|-------------|
+| JPEG | DCT-based | No | No | Lossy | Photos |
+| PNG | DEFLATE | Yes (alpha) | No | Lossless | UI, screenshots, graphics |
+| GIF | LZW | Yes (1-bit) | Yes | Lossless (256 colors) | Simple animations |
+| WebP | VP8/VP8L | Yes | Yes | Both | Web (modern replacement) |
+| AVIF | AV1 intra-frame | Yes | Yes | Both | Web (next-gen) |
+| BMP | None (usually) | No | No | Uncompressed | Legacy, simple storage |
+| TIFF | Various | Yes | No | Both | Print, medical, GIS |
+| SVG | N/A (vector) | Yes | Yes (SMIL) | N/A | Icons, logos, diagrams |
+| HEIF/HEIC | HEVC intra-frame | Yes | Yes | Both | Apple photos |
+| ICO | PNG or BMP | Yes | No | Both | Favicons, Windows icons |
+
+---
+
+## JPEG (Joint Photographic Experts Group)
+
+The most common photo format on the web and in cameras. Designed for photographic content.
+
+### How JPEG Compression Works
+
+```mermaid
+graph LR
+    A[RGB Pixels] --> B[Convert to YCbCr]
+    B --> C[Chroma Subsampling]
+    C --> D[8x8 Block DCT]
+    D --> E[Quantization]
+    E --> F[Entropy Encoding]
+    F --> G[JPEG File]
+
+    style A fill:#E8E8E8
+    style G fill:#90EE90
+```
+
+1. **Color space conversion** — RGB to YCbCr (luminance + two chrominance channels). Human eyes are more sensitive to brightness than color, so chrominance can be compressed more aggressively.
+
+2. **Chroma subsampling** — Reduce chrominance resolution (4:2:0 means half resolution in each axis for color, keeping full resolution for brightness). Each pixel then averages 1 + ¼ + ¼ = 1.5 samples instead of 3, so this step alone cuts the raw data by about 50% with minimal perceptual loss.
+
+3. **Block splitting** — Divide each channel into 8x8 pixel blocks.
+
+4. **DCT (Discrete Cosine Transform)** — Transform each block from spatial domain to frequency domain. Low frequencies (smooth gradients) cluster in the top-left; high frequencies (sharp edges) cluster in the bottom-right.
+
+5. **Quantization** — Divide DCT coefficients by a quantization matrix and round. **This is the lossy step.** Higher quality = smaller divisors = more preserved detail. Many high-frequency coefficients become zero.
+
+6. **Entropy encoding** — Huffman or arithmetic coding compresses the quantized coefficients. Zigzag scan order groups zeros together for efficient run-length encoding.
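+
+Steps 4 and 5 can be sketched in a few lines of Python. This is an illustrative, unoptimized version: `TOY_TABLE` is a made-up quantization table (not one from the standard), and the gradient block is hypothetical sample data.
+
+```python
+import math
+
+N = 8  # JPEG block size
+
+
+def dct_2d(block):
+    """2-D DCT-II of an 8x8 block, using the same scaling as baseline JPEG."""
+    def scale(k):
+        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
+
+    coeffs = [[0.0] * N for _ in range(N)]
+    for u in range(N):          # vertical frequency
+        for v in range(N):      # horizontal frequency
+            total = 0.0
+            for y in range(N):
+                for x in range(N):
+                    total += (block[y][x]
+                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
+                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
+            coeffs[u][v] = scale(u) * scale(v) * total
+    return coeffs
+
+
+def quantize(coeffs, table):
+    """Divide each coefficient by its table entry and round (the lossy step)."""
+    return [[round(c / q) for c, q in zip(c_row, q_row)]
+            for c_row, q_row in zip(coeffs, table)]
+
+
+# Hypothetical table: divisors grow with frequency. Real encoders use tuned tables.
+TOY_TABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]
+
+# Level-shifted horizontal gradient: smooth content with little high-frequency energy.
+gradient = [[16 * x - 56 for x in range(N)] for _ in range(N)]
+
+quantized = quantize(dct_2d(gradient), TOY_TABLE)
+zero_count = sum(value == 0 for row in quantized for value in row)
+print(f"{zero_count} of 64 coefficients quantized to zero")
+```
+
+Because the block only varies horizontally, every coefficient with a non-zero vertical frequency rounds to zero, which is what makes the later run-length and entropy coding effective.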
+
+### JPEG Binary Structure
+
+```
+FF D8                        ← SOI (Start of Image)
+FF E0 [len] "JFIF"          ← APP0: JFIF header (or FF E1 for EXIF)
+FF DB [len] [tables]         ← DQT: Quantization tables
+FF C0 [len] [params]        ← SOF0: Start of Frame (dimensions, channels)
+FF C4 [len] [tables]        ← DHT: Huffman tables
+FF DA [len] [params]        ← SOS: Start of Scan (header; entropy-coded data follows)
+... compressed scan data ...
+FF D9                        ← EOI (End of Image)
+```
+
+Hex dump of a JPEG start:
+
+```
+00000000  FF D8 FF E0 00 10 4A 46  49 46 00 01 01 01 00 48   ......JFIF.....H
+00000010  00 48 00 00 FF DB 00 43  00 08 06 06 07 06 05 08   .H.....C........
+```
+
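+- `FF D8` — Every JPEG starts with this (SOI marker)
+- `FF E0` — APP0 marker (JFIF metadata); the `00 10` that follows is the segment length (16 bytes, counting the length field itself)
+- `4A 46 49 46 00` — "JFIF\0" identifier string
+- `FF DB` — Quantization table follows
+- `FF D9` — Every JPEG ends with this (EOI marker, not visible in the dump above)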
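+
+The marker layout can be walked programmatically. Below is a minimal sketch, assuming a well-formed file; `iter_segments` is a hypothetical helper that stops at SOS and ignores fill bytes and markers without a length field.
+
+```python
+import struct
+
+
+def iter_segments(data: bytes):
+    """Yield (marker, payload) for each header segment of a JPEG byte string."""
+    if data[:2] != b"\xff\xd8":
+        raise ValueError("missing SOI marker (FF D8)")
+    pos = 2
+    while pos + 4 <= len(data):
+        if data[pos] != 0xFF:
+            raise ValueError(f"expected marker prefix FF at offset {pos}")
+        marker = data[pos + 1]
+        # The 2-byte big-endian length counts itself but not the marker bytes.
+        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
+        yield marker, data[pos + 4:pos + 2 + length]
+        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
+            break
+        pos += 2 + length
+
+
+# The 32 bytes from the hex dump above (the dump cuts the DQT segment short).
+sample = bytes.fromhex(
+    "FF D8 FF E0 00 10 4A 46 49 46 00 01 01 01 00 48 "
+    "00 48 00 00 FF DB 00 43 00 08 06 06 07 06 05 08"
+)
+
+for marker, payload in iter_segments(sample):
+    print(f"FF {marker:02X}: {len(payload)} payload bytes available")
+```
+
+A production parser would also need to scan the entropy-coded data for the next marker after SOS, since that data has no length prefix.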
+
+### JPEG Variants
+
+| Variant | How It Differs |
+|---------|---------------|
+| Progressive JPEG | Multiple scans, coarse-to-fine rendering. Better perceived load time on slow connections. |
+| JPEG 2000 | Wavelet-based (not DCT). Better quality at low bitrates. Rare on the web but used in medical imaging and cinema (DCI). |
+| JPEG XL | Modern successor. Supports lossless, HDR, animation. Can losslessly recompress existing JPEG. Browser adoption is uneven: Chrome removed its experimental support in version 110, while Safari added support in version 17. |
+| Motion JPEG | Each video frame is an independent JPEG. Simple but inefficient (no inter-frame compression). |
+
+### JPEG Artifacts
+
+Visible compression artifacts at low quality settings:
+
+- **Blocking** — Visible 8x8 grid boundaries (quantization too aggressive)
+- **Ringing** — Halos around sharp edges (high-frequency loss)
+- **Color banding** — Smooth gradients become stepped (chroma subsampling + quantization)
+- **Mosquito noise** — Shimmering speckles around sharp edges, most noticeable in Motion JPEG video, where the pattern changes from frame to frame
+
+---
+
+## PNG (Portable Network Graphics)
+
+Lossless format designed to replace GIF. Supports full alpha transparency.
+
+### How PNG Works
+
+```mermaid
+graph LR
+    A[Pixel Data] --> B[Filtering per row]
+    B --> C[DEFLATE Compression]
+    C --> D[PNG Chunks]
+
+    style A fill:#E8E8E8
+    style D fill:#90EE90
+```
+
+1. **Filtering** — Each row is filtered to improve compressibility. Five filter types (None, Sub, Up, Average, Paeth) encode each pixel relative to its neighbors instead of as absolute values.
+
+2. **DEFLATE compression** — LZ77 + Huffman coding on the filtered data. Same algorithm as gzip/ZIP.
+
+3. **Chunked storage** — Data organized as typed chunks with length, type, data, and CRC checksum.
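+
+To make the filtering step concrete, here is a small sketch of the Sub filter and the Paeth predictor. The function names are illustrative, and a real encoder would also prepend the filter-type byte to each row and choose a filter per row.
+
+```python
+def sub_filter(row: bytes, bpp: int) -> bytes:
+    """Filter type 1 (Sub): store each byte minus the byte one pixel to its left."""
+    return bytes((row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
+                 for i in range(len(row)))
+
+
+def sub_unfilter(filtered: bytes, bpp: int) -> bytes:
+    """Invert the Sub filter by accumulating the stored differences."""
+    out = bytearray()
+    for i, delta in enumerate(filtered):
+        left = out[i - bpp] if i >= bpp else 0
+        out.append((delta + left) & 0xFF)
+    return bytes(out)
+
+
+def paeth_predictor(left: int, up: int, up_left: int) -> int:
+    """Filter type 4 (Paeth): pick the neighbor closest to left + up - up_left."""
+    estimate = left + up - up_left
+    d_left, d_up, d_up_left = (abs(estimate - n) for n in (left, up, up_left))
+    if d_left <= d_up and d_left <= d_up_left:
+        return left
+    if d_up <= d_up_left:
+        return up
+    return up_left
+
+
+# One row of an 8-bit grayscale gradient (1 byte per pixel).
+row = bytes(range(10, 90, 10))
+filtered = sub_filter(row, bpp=1)
+print(list(filtered))  # → [10, 10, 10, 10, 10, 10, 10, 10]
+```
+
+The gradient turns into a run of identical bytes, which DEFLATE compresses far better than the original increasing values.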
+
+### PNG Chunk Structure
+
+Every PNG chunk follows this layout:
+
+```
+┌──────────────┬───────────────┬────────────┬──────────────┐
+│ Length (4B)   │ Type (4B)     │ Data (var) │ CRC (4B)     │
+│ big-endian   │ ASCII name    │            │ of type+data │
+└──────────────┴───────────────┴────────────┴──────────────┘
+```
+
+Critical chunks (a decoder must understand these):
+
+| Chunk | Name | Contents |
+|-------|------|----------|
+| `IHDR` | Image Header | Width, height, bit depth, color type |
+| `PLTE` | Palette | Color palette; required for indexed-color images (color type 3), optional for types 2 and 6 |
+| `IDAT` | Image Data | Compressed pixel data (can be multiple) |
+| `IEND` | Image End | Empty, marks end of file |
+
+Ancillary chunks (optional):
+
+| Chunk | Name | Contents |
+|-------|------|----------|
+| `tRNS` | Transparency | Simple transparency without full alpha |
+| `tEXt` | Text | Key-value metadata (Latin-1) |
+| `iTXt` | International Text | Key-value metadata (UTF-8) |
+| `gAMA` | Gamma | Display gamma value |
+| `cHRM` | Chromaticity | Color space white point and primaries |
+| `iCCP` | ICC Profile | Embedded color profile |
+| `pHYs` | Physical Dimensions | Pixels per unit (DPI) |
+| `tIME` | Timestamp | Last modification time |
+| `eXIf` | EXIF data | Embedded EXIF metadata (since 2017) |
+
+### PNG Hex Walkthrough
+
+```
+Offset    Hex                                       Meaning
+00000000  89 50 4E 47 0D 0A 1A 0A                   PNG signature
+          │  │           │  │
+          │  "PNG"       │  SUB (stops type display)
+          │              CR LF (detects line ending conversion)
+          0x89 (non-ASCII, detects 7-bit stripping)
+
+00000008  00 00 00 0D                               IHDR chunk length: 13 bytes
+0000000C  49 48 44 52                               Chunk type: "IHDR"
+00000010  00 00 04 00                               Width: 1024
+00000014  00 00 03 00                               Height: 768
+00000018  08                                        Bit depth: 8
+00000019  02                                        Color type: 2 (RGB)
+0000001A  00                                        Compression: 0 (deflate)
+0000001B  00                                        Filter: 0 (adaptive)
+0000001C  00                                        Interlace: 0 (none)
+0000001D  XX XX XX XX                               CRC-32 checksum
+```
+
+The PNG signature is cleverly designed — each byte serves a detection purpose:
+
+- `0x89` — Non-ASCII, detects systems that strip the high bit
+- `PNG` — Human-readable identification
+- `0D 0A` — CR LF, detects DOS-to-Unix line ending conversion (CR LF collapsed to LF)
+- `1A` — Ctrl+Z, stops file display on DOS `type` command
+- `0A` — LF, detects Unix-to-DOS line ending conversion (LF expanded to CR LF)
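+
+The signature check and chunk layout from the walkthrough can be exercised with the standard library alone. This sketch builds a hypothetical 1x1 image in memory and reads it back; `make_chunk` and `iter_chunks` are illustrative helper names, not part of any library.
+
+```python
+import struct
+import zlib
+
+PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
+
+
+def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
+    """Serialize one chunk: length, type, data, CRC-32 over type + data."""
+    crc = zlib.crc32(chunk_type + data)
+    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)
+
+
+def iter_chunks(png: bytes):
+    """Yield (type, data) for every chunk, verifying the signature and each CRC."""
+    if png[:8] != PNG_SIGNATURE:
+        raise ValueError("bad PNG signature")
+    pos = 8
+    while pos < len(png):
+        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
+        data = png[pos + 8:pos + 8 + length]
+        (stored_crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
+        if zlib.crc32(chunk_type + data) != stored_crc:
+            raise ValueError(f"CRC mismatch in {chunk_type!r} chunk")
+        yield chunk_type, data
+        pos += 12 + length
+
+
+# Build a 1x1 RGB image in memory: IHDR, one IDAT, IEND.
+ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
+scanline = b"\x00" + b"\xff\x00\x00"  # filter type 0 (None) + one red pixel
+tiny_png = (PNG_SIGNATURE
+            + make_chunk(b"IHDR", ihdr)
+            + make_chunk(b"IDAT", zlib.compress(scanline))
+            + make_chunk(b"IEND", b""))
+
+for chunk_type, data in iter_chunks(tiny_png):
+    print(chunk_type.decode("ascii"), len(data))
+```
+
+Flipping any byte inside a chunk makes `iter_chunks` raise, which is the corruption check the per-chunk CRC exists to provide.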
+
+### PNG Color Types
+
+| Value | Type | Channels | Common Use |
+|-------|------|----------|------------|
+| 0 | Grayscale | 1 | Black and white images |
+| 2 | RGB | 3 | Full-color photos |
+| 3 | Indexed | 1 (palette lookup) | Simple graphics, small files |
+| 4 | Grayscale + Alpha | 2 | Transparent grayscale |
+| 6 | RGBA | 4 | Full color + transparency |
+
+---
+
+## GIF (Graphics Interchange Format)
+
+Introduced by CompuServe in 1987; animation support arrived with the GIF89a revision in 1989. Limited but ubiquitous.
+
+### GIF Structure
+
+```
+47 49 46 38 39 61          ← "GIF89a" signature
+[Logical Screen Descriptor]  ← Canvas size, global color table flag
+[Global Color Table]         ← Up to 256 RGB entries
+[Extension Blocks]           ← Animation control, comments, etc.
+[Image Descriptor + Data]*   ← One per frame, LZW-compressed
+3B                           ← Trailer byte (end of file)
+```
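+
+As a rough illustration of the first two parts of that layout, the signature and Logical Screen Descriptor can be decoded with `struct`. The header bytes below are hypothetical, and `parse_gif_header` is an illustrative name.
+
+```python
+import struct
+
+
+def parse_gif_header(data: bytes) -> dict:
+    """Decode the 6-byte signature and the 7-byte Logical Screen Descriptor."""
+    signature, width, height, packed, background, _aspect = struct.unpack(
+        "<6sHHBBB", data[:13])
+    if signature not in (b"GIF87a", b"GIF89a"):
+        raise ValueError("not a GIF file")
+    has_global_table = bool(packed & 0x80)
+    return {
+        "version": signature[3:].decode("ascii"),
+        "width": width,
+        "height": height,
+        "has_global_color_table": has_global_table,
+        # The low 3 bits store n, meaning 2 ** (n + 1) palette entries.
+        "global_color_count": 2 ** ((packed & 0x07) + 1) if has_global_table else 0,
+        "background_index": background,
+    }
+
+
+# Hypothetical header: 320x240 canvas with a 256-entry global color table.
+header = b"GIF89a" + struct.pack("<HHBBB", 320, 240, 0xF7, 0, 0)
+info = parse_gif_header(header)
+print(info["version"], info["width"], info["height"], info["global_color_count"])
+```
+
+All multi-byte fields in GIF are little-endian, hence the `<` in the format string.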
+
+### Key Limitations
+
+| Aspect | Limitation |
+|--------|-----------|
+| Colors | 256 per frame (8-bit palette) |
+| Transparency | 1-bit only (fully transparent or fully opaque) |
+| Compression | LZW (Unisys patent, expired in 2003 in the US and 2004 elsewhere; the licensing dispute prompted PNG's creation) |
+| Color depth | No 16-bit, no HDR |
+| Animation | Frame delays in hundredths of a second (not milliseconds), disposal methods, no audio |
+
+### Why GIF Persists
+
+Despite being technically inferior to APNG and WebP for animation, GIF remains dominant because of universal support — every browser, messaging app, and social platform handles GIF. The format is effectively a social media lingua franca.
+
+---
+
+## WebP
+
+Developed by Google, based on VP8 video codec technology. Designed as a universal web image format.
+
+### WebP Structure
+
+WebP uses the RIFF container format:
+
+```
+52 49 46 46  [size]         ← "RIFF" + chunk size (file size minus 8, little-endian)
+57 45 42 50                 ← "WEBP" format identifier
+[VP8 chunk]                 ← Lossy data (VP8 codec), OR
+[VP8L chunk]                ← Lossless data (VP8L codec), OR
+[VP8X chunk + sub-chunks]   ← Extended format (alpha, EXIF, animation)
+```
+
+| Mode | Chunk | Compression |
+|------|-------|-------------|
+| Lossy | `VP8` (stored as `"VP8 "`, padded with a space) | VP8 intra-frame (similar to JPEG but with better prediction) |
+| Lossless | `VP8L` | Predictive coding + entropy coding (better than PNG) |
+| Extended | `VP8X` | Lossy or lossless + alpha + animation + metadata |
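+
+Telling the three modes apart only requires the first 16 bytes of the file. The sketch below assumes a well-formed header; `parse_webp_header` and `MODES` are illustrative names, and the sample bytes are a hypothetical, non-decodable stub.
+
+```python
+import struct
+
+
+def parse_webp_header(data: bytes):
+    """Return (riff_size, first_chunk_fourcc) from the start of a WebP file."""
+    riff, riff_size, form = struct.unpack("<4sI4s", data[:12])
+    if riff != b"RIFF" or form != b"WEBP":
+        raise ValueError("not a WebP file")
+    return riff_size, data[12:16]
+
+
+# Chunk identifiers are four bytes, so the lossy one is "VP8" plus a space.
+MODES = {b"VP8 ": "lossy", b"VP8L": "lossless", b"VP8X": "extended"}
+
+# Hypothetical 20-byte stub: RIFF header plus an empty VP8L chunk.
+sample = b"RIFF" + struct.pack("<I", 12) + b"WEBP" + b"VP8L" + struct.pack("<I", 0)
+riff_size, fourcc = parse_webp_header(sample)
+print(riff_size == len(sample) - 8, MODES[fourcc])  # → True lossless
+```
+
+The RIFF size field counts everything after itself, which is why it equals the file length minus the 8 bytes of `RIFF` and the size field.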
+
+### WebP vs JPEG vs PNG
+
+| Metric | JPEG | PNG | WebP |
+|--------|------|-----|------|
+| Lossy photos | Baseline | N/A | 25-34% smaller than JPEG at equivalent SSIM (Google's figures) |
+| Lossless graphics | N/A | Baseline | ~26% smaller than PNG (Google's figures) |
+| Transparency | No | Yes | Yes |
+| Animation | No | APNG (limited support) | Yes |
+| Browser support | Universal | Universal | ~97% (all modern browsers) |
+| Metadata | EXIF, XMP | tEXt, eXIf | EXIF, XMP |
+| Max dimensions | 65,535 x 65,535 | 2,147,483,647 x 2,147,483,647 | 16,383 x 16,383 |
+
+---
+
+## AVIF (AV1 Image File Format)
+
+Next-generation format based on the AV1 video codec. Backed by the Alliance for Open Media (Google, Apple, Mozilla, Netflix, etc.).
+
+### How AVIF Works
+
+Uses AV1 intra-frame coding — the same techniques that make AV1 video efficient, applied to single frames:
+
+- Advanced prediction modes (directional intra, cross-component)
+- Larger block sizes (up to 128x128 vs JPEG's 8x8)
+- Film grain synthesis (store grain parameters instead of grain pixels)
+- HDR and wide color gamut support (10/12-bit, BT.2020)
+
+### AVIF vs WebP vs JPEG
+
+| Aspect | JPEG | WebP | AVIF |
+|--------|------|------|------|
+| Compression efficiency | Baseline | ~30% better | ~50% better |
+| Encode speed | Fast | Moderate | Slow |
+| Decode speed | Fast | Fast | Moderate |
+| HDR | No | No | Yes (10/12-bit) |
+| Browser support | Universal | ~97% | ~92% |
+| Max dimensions | 65K | 16K | 8K (tiled for larger) |
+| Animation | No | Yes | Yes |
+
+Tradeoff: AVIF achieves the best compression but is significantly slower to encode, making it better suited for static assets than real-time generation.
+
+---
+
+## BMP (Bitmap)
+
+The simplest raster format — essentially raw pixel data with a header.
+
+### BMP Header Structure
+
+```
+Offset  Size  Field
+0x00    2     Signature: "BM" (42 4D)
+0x02    4     File size in bytes
+0x06    4     Reserved (zero)
+0x0A    4     Offset to pixel data
+0x0E    4     DIB header size (40 for BITMAPINFOHEADER)
+0x12    4     Width in pixels
+0x16    4     Height in pixels (negative = top-down)
+0x1A    2     Color planes (always 1)
+0x1C    2     Bits per pixel (1, 4, 8, 16, 24, 32)
+0x1E    4     Compression method (0 = none)
+0x22    4     Image data size
+0x26    4     Horizontal resolution (pixels/meter)
+0x2A    4     Vertical resolution (pixels/meter)
+0x2E    4     Colors in palette
+0x32    4     Important colors
+```
+
+Pixel data is stored **bottom-up** by default (first row in the file is the bottom row of the image) and each row is padded to a 4-byte boundary.
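+
+The padding and bottom-up rules translate into two small formulas. This is a sketch assuming a positive height and a whole number of bytes per pixel; both function names are illustrative.
+
+```python
+def row_stride(width: int, bits_per_pixel: int) -> int:
+    """Bytes per stored row, rounded up to a multiple of 4."""
+    return (width * bits_per_pixel + 31) // 32 * 4
+
+
+def pixel_offset(data_offset: int, width: int, height: int,
+                 bits_per_pixel: int, x: int, y: int) -> int:
+    """File offset of pixel (x, y), with (0, 0) at the top-left of the image.
+
+    Assumes bottom-up storage (positive height) and 24 or 32 bits per pixel.
+    """
+    stride = row_stride(width, bits_per_pixel)
+    file_row = height - 1 - y  # the bottom image row is stored first
+    return data_offset + file_row * stride + x * (bits_per_pixel // 8)
+
+
+# A 3x2 image at 24 bpp: 9 bytes of pixels per row, padded to 12.
+print(row_stride(3, 24))                     # → 12
+print(pixel_offset(54, 3, 2, 24, x=0, y=0))  # → 66
+```
+
+The data offset of 54 used here is the 14-byte file header plus a 40-byte BITMAPINFOHEADER with no palette; a real reader should take it from the field at `0x0A` instead.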
+
+---
+
+## Choosing a Format
+
+| Scenario | Recommended | Why |
+|----------|-------------|-----|
+| Photos on the web | WebP with JPEG fallback | Best size/quality, near-universal support |
+| UI screenshots | PNG | Lossless, sharp edges preserved |
+| Photos with transparency | WebP or PNG | JPEG has no alpha channel |
+| Simple animations | WebP or GIF | WebP is smaller, GIF is more universal |
+| Print/archival | TIFF or PNG | Lossless, high bit depth |
+| Icons/logos | SVG | Scalable, tiny file size, editable |
+| Maximum compression | AVIF | Best ratio but slower encode and less support |
+| Camera raw editing | DNG, CR3, NEF | Unprocessed sensor data, maximum editing latitude |
+| Apple ecosystem | HEIF/HEIC | Default camera format since iOS 11 |
+
+---
+
+## Related
+
+- [[File Formats]] — Parent overview of all file format concepts
+- [[File Metadata]] — EXIF, GPS, XMP metadata systems
+- [[Audio and Video Formats]] — Codecs, containers, streaming formats
+- [[Character Encoding]] — UTF-8, Unicode, text encoding
diff --git a/Tools MOC.md b/Tools MOC.md
index ed37dda..2165776 100644
--- a/Tools MOC.md	
+++ b/Tools MOC.md	
@@ -67,6 +67,7 @@ Development tools, libraries, and infrastructure across languages.
 - [[File Storage]] — Object, block, file, distributed storage
 - [[ORMs & Database Access]] — Data access patterns
 - [[Serialization]] — JSON, YAML, binary formats
+- [[File Formats]] — Binary structure, magic bytes, image/audio/video/document formats
 
 ### Observability