Color in PDF — why “just compress the image” isn’t always possible

PDF color models and whether JPEG can be applied:

  DeviceRGB / CMYK / Gray ("just numbers"): ✓ JPEG
  CalRGB / CalGray (calibrated): ✗ Flate only
  Lab (scientific, absolute): ✗ range doesn't fit
  ICCBased (with ICC profile): ≈ depends on profile
  Separation / DeviceN (Pantone, spot inks): ✗ do not touch
  Indexed (palette up to 256): ✗ Flate only

PDF defines at least five different ways to describe color, and a compressor that conflates them will turn a carefully prepared document into garbage.

Why several kinds of color exist

Each scenario needs different color information:

Five ways to describe color

DeviceGray, DeviceRGB, DeviceCMYK — just numbers

A pixel is one, three, or four numbers (0 to 255 at the usual 8 bits per component), with no tie to a physical color: “red 200” looks bluer on one monitor and yellower on another. This is device-dependent color.

For compression: JPEG fine, no constraints. There’s no calibration to lose.

CalRGB and CalGray — calibrated RGB

The same RGB (or, for CalGray, gray) values plus explicit information about how the device should render them: three white-point coordinates in CIE XYZ space, a gamma, and, for CalRGB, a 3×3 matrix. With that, the reader can convert values into an absolute physical color.
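As a rough illustration of what the reader does with those parameters, here is a minimal Python sketch of the CalRGB-to-XYZ step: each component is raised to its gamma, then mixed through the matrix. The function name `calrgb_to_xyz` and the sample `GAMMA` and `MATRIX` values are assumptions chosen for illustration (the matrix uses sRGB-like primaries), not values taken from any particular file, and white-point adaptation is left out.

```python
def calrgb_to_xyz(a, b, c, gamma, matrix):
    """Convert CalRGB components in 0.0..1.0 to CIE XYZ.

    gamma  -- (GR, GG, GB), one exponent per component
    matrix -- (XA, YA, ZA, XB, YB, ZB, XC, YC, ZC), nine numbers
    """
    # Step 1: apply the per-component gamma.
    ag, bg, cg = a ** gamma[0], b ** gamma[1], c ** gamma[2]
    # Step 2: linear mix through the 3x3 matrix.
    xa, ya, za, xb, yb, zb, xc, yc, zc = matrix
    x = xa * ag + xb * bg + xc * cg
    y = ya * ag + yb * bg + yc * cg
    z = za * ag + zb * bg + zc * cg
    return x, y, z


# Illustrative values only (assumed, sRGB-like primaries).
GAMMA = (2.2, 2.2, 2.2)
MATRIX = (0.4124, 0.2126, 0.0193,
          0.3576, 0.7152, 0.1192,
          0.1805, 0.0722, 0.9505)

if __name__ == "__main__":
    print(calrgb_to_xyz(1.0, 1.0, 1.0, GAMMA, MATRIX))
```

The point of the sketch is that the same three numbers land on different XYZ values as soon as the gamma or matrix changes, which is exactly the information a device-oriented codec ignores.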

For compression: be careful. A standard JPEG encoder converts RGB to YCbCr using ITU-R BT.601 coefficients that assume ordinary DeviceRGB. For CalRGB the conversion produces a color shift.

pdfcompressor disables JPEG for images in CalRGB and CalGray, leaving Flate. Larger files, exact shades.
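For reference, the color transform in question can be sketched as follows, assuming the encoder follows the common JFIF convention (full-range YCbCr with BT.601 luma weights). The function name `rgb_to_ycbcr` is made up for this example; real encoders do the same arithmetic in fixed point and then quantize the result.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF-style full-range RGB -> YCbCr for 8-bit samples.

    The luma weights (0.299, 0.587, 0.114) are the BT.601 ones; they are
    fixed constants and know nothing about any calibration data stored
    alongside the image.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr


if __name__ == "__main__":
    # A neutral gray maps to Y = 128 with both chroma channels at the midpoint.
    print(rgb_to_ycbcr(128, 128, 128))
```

The lossy step happens after this transform, so any error is distributed along axes chosen for uncalibrated RGB rather than for the calibrated space the image was authored in.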

Lab — physically defined color

L*a*b* (Lab) is device-independent. L* is lightness on a 0–100 scale; a* and b* are signed color-opponent axes (roughly -128 to +127). Its gamut covers all visible colors (more than sRGB or Adobe RGB), and it shows up in scientific and archival applications.

For compression: JPEG doesn’t work. JPEG itself is colorspace-agnostic, but the standard encoder pipeline (libjpeg, jpegli) assumes 8-bit unsigned channels and applies an RGB→YCbCr transform. Lab — 0–100 range, signed a/b — doesn’t fit those assumptions; after a naive round-trip the numbers drift, and Lab values come out wrong.

pdfcompressor uses only Flate for Lab. Compression is more modest (3–4× rather than JPEG’s 10–15×), with no color distortion.
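To make the range mismatch concrete, here is a small sketch of how 8-bit image samples map back to Lab values. It assumes 8 bits per component and the default a*/b* range of -100 to 100 that the PDF specification uses when the colorspace gives no Range entry; the function name `lab_from_samples` is hypothetical.

```python
def lab_from_samples(l8, a8, b8,
                     a_range=(-100.0, 100.0),
                     b_range=(-100.0, 100.0)):
    """Map 8-bit samples (0..255) to (L*, a*, b*) by linear interpolation."""
    def interp(sample, lo, hi):
        return lo + sample * (hi - lo) / 255.0

    return (interp(l8, 0.0, 100.0),
            interp(a8, *a_range),
            interp(b8, *b_range))


if __name__ == "__main__":
    # One sample step on the a* axis is 200 / 255 Lab units wide.
    before = lab_from_samples(128, 128, 128)
    after = lab_from_samples(128, 129, 128)
    print(after[1] - before[1])
```

Under these assumptions a single-step error in a stored sample already moves a* or b* by most of one Lab unit, so an encoder that treats the three channels as R, G, and B and mixes them before quantizing has very little room for error.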

ICCBased — same approach with an ICC profile

The most common “correct” model in modern PDFs. The image is described in standard RGB, CMYK, or another space, with an ICC profile attached — a file describing the precise transform into absolute color.

This is what makes pre-press work: the printer receives the PDF, reads its embedded ICC profiles, simulates how the file will look on its presses, and corrects. The profile must be embedded inside the PDF.

For compression: it depends on the profile. In pre-press mode (when the file shows signs of being print-ready), pdfcompressor enables JPEG only for standard RGB profiles such as sRGB and Adobe RGB; everything else uses Flate.

Indexed — palette

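Pixels are indices into a palette of up to 256 entries, the palette itself defined in any of the spaces above. Compress with Flate only; no other codec is safe, because a lossy codec alters the indices and a neighboring index can point to an unrelated color.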
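A toy example of why small errors in index data are not small errors in color. The palette and pixel values are invented for illustration, and the "damaged" list simply simulates the off-by-one errors a lossy codec could introduce.

```python
# Hypothetical 4-entry palette: white, a red, black, a blue.
PALETTE = [(255, 255, 255), (200, 16, 46), (0, 0, 0), (0, 114, 206)]


def decode_indexed(indices, palette):
    """Look up each pixel's index in the palette."""
    return [palette[i] for i in indices]


if __name__ == "__main__":
    original = [1, 1, 1, 1]   # four red pixels
    damaged = [1, 2, 1, 0]    # two indices off by one (simulated lossy error)
    print(decode_indexed(original, PALETTE))
    print(decode_indexed(damaged, PALETTE))
```

In the damaged row, two of the four red pixels come out black and white: adjacent palette slots have no reason to hold similar colors.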

Pattern

Not raster data but a fill pattern — either a tiling pattern (a repeating image) or a shading pattern (a gradient), made of drawing commands and possibly embedded rasters. The compressor walks the pattern and applies the usual rules to its contents; the Pattern colorspace itself is not re-encoded.

Spot colors: Separation and DeviceN

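The most important category for the print industry. A Separation image has a single channel that stores the coverage of one named spot ink, such as Pantone 186 C (a red), which the press prints with its own premixed ink rather than a CMYK mix.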

Push such an image through a YCbCr-based JPEG encoder:

  1. The single-channel image becomes three-channel.
  2. The press no longer sees “spot Pantone 186 C.”
  3. It prints a CMYK approximation of red.
  4. The shade is wrong. On a magazine cover, that’s a defect.

DeviceN generalizes this to multi-channel images for printing in several spot inks at once — black plus gold plus varnish, for example.

pdfcompressor doesn’t touch Separation or DeviceN at all. No JPEG, no downsampling. Optional Flate recompression at most, always lossless.

Rendering intents

ICC-managed PDFs carry a rendering intent, the rule for how the reader should reproduce a color when the source space is richer than the target. PDF defines four of them: Perceptual, RelativeColorimetric, Saturation, and AbsoluteColorimetric.

This metadata is preserved.

The decision logic

Simplified, per image:

if colorspace ∈ {Separation, DeviceN}:
    leave alone
if colorspace ∈ {Lab, CalRGB, CalGray}:
    Flate only, no downsampling
if colorspace = ICCBased:
    if profile is well-known and standard (sRGB, Adobe RGB):
        JPEG ok
    else:
        Flate
if colorspace ∈ {DeviceRGB, DeviceGray, DeviceCMYK}:
    anything goes
if colorspace = Indexed:
    Flate only
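
The same rules as a runnable Python sketch. This is an illustration of the simplified logic, not pdfcompressor's actual code: the names `Codec`, `STANDARD_PROFILES`, and `choose_codec` are invented here, profiles are identified by a plain string label for simplicity, and the downsampling flag is left out. Leaving unrecognized spaces (including Pattern) untouched is an assumption of this sketch.

```python
from enum import Enum


class Codec(Enum):
    KEEP = "leave alone"
    FLATE = "Flate only"
    JPEG_OK = "JPEG allowed"


# Hypothetical labels for the well-known RGB profiles.
STANDARD_PROFILES = {"sRGB", "Adobe RGB"}


def choose_codec(colorspace, profile_name=None):
    """Pick the most aggressive codec that is safe for one image."""
    if colorspace in {"Separation", "DeviceN"}:
        return Codec.KEEP
    if colorspace in {"Lab", "CalRGB", "CalGray", "Indexed"}:
        return Codec.FLATE
    if colorspace == "ICCBased":
        if profile_name in STANDARD_PROFILES:
            return Codec.JPEG_OK
        return Codec.FLATE
    if colorspace in {"DeviceRGB", "DeviceGray", "DeviceCMYK"}:
        return Codec.JPEG_OK
    # Assumption: anything unrecognized is left untouched.
    return Codec.KEEP


if __name__ == "__main__":
    for cs, profile in [("DeviceRGB", None), ("ICCBased", "sRGB"),
                        ("ICCBased", "press profile"), ("Separation", None)]:
        print(cs, profile, "->", choose_codec(cs, profile).value)
```

The branches are ordered from most restrictive to least, so a mistake in classification fails toward the lossless path.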

Numbers

On sets of 1000 documents in DeviceRGB, the JPEG path always fires. On graphic and print PDFs (magazine layouts, labels, catalogs), up to 30% of images fall into restricted categories; savings are more modest there, but shades don't shift. The same compressor makes an office document 60% smaller and a catalog 30% smaller; in both cases the document afterwards looks exactly the same as before.