‘When Math Meets Art’: How JPEG Magic Works

In the world of digital imagery, we rarely stop to think about how photos and graphics appear on our screens in high quality while taking up so little storage space. The secret lies in the JPEG format – an algorithm that skillfully combines mathematics, visual perception, and engineering ingenuity. JPEG doesn’t just compress images; it “understands” what the human eye focuses on and what can be discarded without noticeably affecting perceived quality.

Exploring JPEG reveals not only the technical details but also the underlying philosophy of digital compression. In this article, we’ll examine how the format works, why it was created, and what makes it remain a standard even in today’s technology landscape. We’ll trace the path from pixels to calculations, from signals to complete images, to understand the mechanics – and the subtle magic – behind lossy compression.


How JPEG image compression works

In the age of artificial intelligence, it’s easy to forget that most computational systems are ultimately built for people. And people are imperfect – we make mistakes, have limited perception, and see the world through the lens of our own physiology.

The pioneers of computing understood this well. They recognized that a machine doesn’t always need to be perfectly accurate to be useful. In many cases, approximate methods or heuristic approaches yield better results precisely because they account for human perception. The balance between mathematical precision and practical relevance has been the driving force behind many technological breakthroughs.

One of the clearest examples of this balance is JPEG – a format so familiar that we rarely even notice it. It’s a remarkable piece of engineering born at the intersection of mathematics, psychology, and aesthetics. JPEG doesn’t just compress images; it encodes an understanding of how humans see.

At its core, JPEG relies on lossy compression: it intentionally discards part of the data to dramatically reduce file size while keeping the image “good enough” for the human eye. Behind this apparent simplicity lies a refined science – a blend of signal theory, visual perception, and engineering ingenuity. Even decades after its creation, JPEG remains relevant. But why?

Human perception and computation

We interact with technology through our senses. Fingers press buttons, ears catch notifications, and eyes focus on a mosaic of pixels that form text, color, and motion. These sensory channels serve as our primary interface between biology and electronics. By understanding their limitations, engineers are able to design more intuitive technologies. They know how to make use of the “blind spots” in human perception to create systems that feel closer to how we naturally interpret the world.

Take vision, for example. Our eyes aren’t cameras but complex biological sensors with their own imperfections. When a display shows too few frames per second, motion starts to look jerky and the illusion of continuous movement breaks down. Researchers have found that a frame rate of about 30–60 frames per second is enough to trick the brain, which is why most videos appear smooth to us.

Another key factor is brightness, or more precisely luminance, measured in nits (candelas per square meter). One candela is roughly the luminous intensity of a single candle. So a 1000-nit display, like that of a modern iPhone, would shine about as brightly as a thousand candles if it were one square meter in size.

JPEG as the art of compromise

By understanding the limits of human vision, engineers learned how to turn them into an advantage when developing efficient compression algorithms. JPEG intentionally discards visual data that the human eye is unlikely to notice – these are the “losses” that allow for dramatically smaller file sizes without a visible drop in quality.

Incidentally, JPEG and JPG refer to the same format. The three-letter .jpg extension dates back to MS-DOS and early Windows, whose file systems limited extensions to three characters (the “8.3” filename scheme).

Other image formats, such as PNG, use lossless compression. This means that no data is discarded during saving: every bit of information is preserved, and the decoded image is identical to the original. However, this precision comes at a cost: file sizes can be many times larger. For example, the raw pixel data of a 2592×1944-pixel photo in 24-bit color takes up about 15 MB, and lossless PNG compression typically shrinks a photograph only modestly, while the same image in JPEG format is roughly 0.75 MB – with little to no visible difference to the human eye.
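The arithmetic behind these figures is easy to check. The short Python sketch below computes the raw size of 24-bit RGB pixel data (ignoring file headers) and the ratio to a compressed size; the 0.75 MB figure is the example from the text, not a measurement, and the helper names are hypothetical.

```python
def raw_rgb_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of uncompressed 24-bit RGB pixel data, ignoring any file headers."""
    return width * height * bytes_per_pixel


def compression_ratio(original_bytes: float, compressed_bytes: float) -> float:
    """How many times smaller the compressed data is."""
    return original_bytes / compressed_bytes


raw = raw_rgb_bytes(2592, 1944)
print(raw)                                       # 15116544 bytes, about 15.1 MB
print(round(compression_ratio(raw, 0.75e6), 1))  # 20.2 for a 0.75 MB JPEG
```

A ratio of about 20:1 sits at the upper end of the 10 to 20 range usually quoted for JPEG.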

When JPEG emerged in the early 1990s, it wasn’t just a technical improvement – it was a practical solution to the limitations of the time. Computers were slow, memory was scarce, and formats like Microsoft’s BMP produced enormous files. Transmitting such images over the early Internet was a real challenge.

Engineers found a compromise: a method that drastically reduced file sizes while keeping images visually appealing to the human eye. That’s how JPEG was born – a format designed to “add a touch of magic” by keeping only what we actually see and discarding the rest.


How JPEG works

Since JPEG relies on lossy compression, the key question is what information can be safely discarded. Answering that question well is what lets JPEG reduce file sizes by a factor of 10 to 20 without noticeable loss of quality. The answer is rooted in the physiology of human vision: we are far more sensitive to changes in brightness than to subtle shifts in color.

A classic illustration is an optical illusion in which two tiles appear to be different colors even though they are exactly the same; covering the middle of the tiles with a finger shows that the colors match. The brain automatically compensates for contrast and surrounding context, “filling in” a difference that doesn’t exist. JPEG’s designers exploited such quirks of perception to preserve the information that matters while discarding what the eye barely notices.

An image consists of millions of pixels, and in the standard RGB format, each pixel is described by three numbers – red, green, and blue. Each color channel uses 8 bits, so a pixel is 24 bits in total, allowing for over 16 million possible color combinations. In RGB, (0, 0, 0) represents black, (255, 255, 255) white, and the remaining values form the full color spectrum.

JPEG engineers took a clever approach: separating brightness from color information, because light and contrast primarily determine how we perceive detail. This is the idea behind the YCbCr color space used in JPEG files: Y represents luminance (brightness), while Cb and Cr represent the blue-difference and red-difference color components. The conversion describes the same pixels in a different way and does not by itself reduce the amount of data; what it does is isolate the color information the eye is less sensitive to, preparing it for further compression.
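As a concrete illustration, the sketch below applies the full-range RGB-to-YCbCr conversion defined in the JFIF specification, the file format most .jpg files use. The coefficients come from JFIF; the function name and the round-and-clamp step are this sketch’s own choices.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    # Round to the nearest integer and keep the result in the 0..255 range.
    def clamp(v: float) -> int:
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


print(rgb_to_ycbcr(255, 255, 255))  # white: (255, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))      # pure red: high Cr, low Cb
```

Grays (R = G = B) always land at Cb = Cr = 128, so all of their information sits in Y.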

With luminance handled separately, JPEG takes its first step toward reducing data size by performing chroma subsampling. The idea is simple: some color information can be discarded, because the human eye is far less sensitive to subtle changes in hue than to variations in brightness. The Cb and Cr channels are divided into small groups of neighboring pixels, most commonly 2×2, and each group is reduced to a single color value. (The 8×8 blocks come later, in the DCT stage.) As a result, several neighboring pixels share a common color value. While this reduces the amount of information, the visual appearance remains essentially unchanged. This small but clever step is where the real “magic” of JPEG begins.

For example, one approach is to take the average value of a 2×2 pixel block and apply that average to each pixel. A simpler alternative keeps only the top-left pixel of the block and applies its color to the other three. The JPEG standard does not prescribe which filter an encoder must use; that choice is left to the implementation.
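Both strategies fit in a few lines. This Python sketch subsamples one chroma channel, stored as a list of rows, either by averaging each 2×2 group or by keeping its top-left sample, and shows the decoder-side step of copying each value back into a 2×2 group. It assumes an even width and height; the function names are hypothetical.

```python
def subsample_2x2(channel: list[list[int]], mode: str = "average") -> list[list[int]]:
    """Halve a chroma channel's width and height (as in 4:2:0 subsampling).

    mode="average" replaces each 2x2 group with its rounded mean;
    mode="top_left" keeps only the group's top-left sample.
    Assumes the channel has an even number of rows and columns.
    """
    out = []
    for y in range(0, len(channel), 2):
        row = []
        for x in range(0, len(channel[0]), 2):
            group = [channel[y][x], channel[y][x + 1],
                     channel[y + 1][x], channel[y + 1][x + 1]]
            if mode == "average":
                row.append(round(sum(group) / 4))
            elif mode == "top_left":
                row.append(group[0])
            else:
                raise ValueError(f"unknown mode: {mode}")
        out.append(row)
    return out


def upsample_2x2(channel: list[list[int]]) -> list[list[int]]:
    """Restore the original size by copying each value into a 2x2 group."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


cb = [[100, 102, 50, 50],
      [104, 106, 50, 50]]
print(subsample_2x2(cb))              # [[103, 50]]
print(subsample_2x2(cb, "top_left"))  # [[100, 50]]
print(upsample_2x2([[103, 50]]))      # [[103, 103, 50, 50], [103, 103, 50, 50]]
```

Averaging tends to track the original colors more faithfully, while the top-left variant is cheaper but can skew colors along sharp edges.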

Essentially, this step starts with three channels (Y, Cb, and Cr), preserves full detail in one of them (the luminance component Y), and reduces the resolution of the other two (Cb and Cr) by a factor of four. This scheme is known as 4:2:0 subsampling. In other words, the image goes from three full channels to 1 + ¼ + ¼ = 1.5 channels, meaning that JPEG now uses roughly 50% of the original data.

If we chose to perform even more aggressive subsampling – for instance, averaging Cb and Cr over a 4×4 block instead of 2×2 – the color channels would be reduced by a factor of 16 rather than 4. This would result in roughly 1 + ¹⁄₁₆ + ¹⁄₁₆ = 1.125 channels, or about 37.5% of the original data, corresponding to a compression ratio of approximately 2.67:1.
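The same bookkeeping works for any group size. A small sketch (the function name is hypothetical) that reproduces the 50% and 37.5% figures using exact fractions:

```python
from fractions import Fraction


def data_fraction(block_w: int, block_h: int) -> Fraction:
    """Fraction of the original three-channel data left after keeping Y at full
    resolution and one Cb and one Cr sample per block_w x block_h group."""
    chroma = Fraction(1, block_w * block_h)
    return (1 + chroma + chroma) / 3


for w, h in [(2, 2), (4, 4)]:
    f = data_fraction(w, h)
    print(f"{w}x{h}: {float(f):.1%} of the data, ratio {float(1 / f):.2f}:1")
# 2x2: 50.0% of the data, ratio 2.00:1
# 4x4: 37.5% of the data, ratio 2.67:1
```

Note the diminishing returns: shrinking chroma four times further saves only another 12.5 percentage points, because the full-resolution Y channel dominates.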

This is just the first stage of the JPEG compression process. Next, we’ll look at how brightness and color information can be transformed into signals, allowing us to apply mathematical tools for further compression.

Converting images into signals

A bit of imagination, and some basic math, lets us think of an image as a signal. The goal is to transform the image from the spatial domain into the frequency domain. If you take a row of pixels from one channel (say, luminance) and plot their values on a graph, you can visualize that row as a signal. Since each value ranges from 0 to 255, this kind of graph provides an intuitive way to understand this stage of the process.

Looking at this graph, you can see that rapid changes between pixels correspond to high-frequency signals, while gradual changes correspond to low-frequency signals. Classifying signals by frequency allows us to exploit another feature of human perception: our visual system is less sensitive to high-frequency details. Additionally, most photographs contain far more low-frequency components than high-frequency ones, so it’s often possible to discard some of the high-frequency data without noticeably affecting visual quality.

But how can this be done?

This is achieved using the Discrete Cosine Transform (DCT). It converts a set of pixel values from the spatial domain into a sum of cosine waves, representing the data in the frequency domain. A cosine wave is described by its frequency, amplitude, and phase, but in the DCT the frequencies and phases of the waves are fixed in advance; only the amplitude of each wave, called a DCT coefficient, depends on the pixels and needs to be stored. By expressing the image in this way, it becomes easier to separate low-frequency components from high-frequency ones, preserving the most visually important elements while preparing the data for compression.
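To make this concrete, here is a direct (deliberately slow) Python implementation of the orthonormal one-dimensional DCT-II, the variant JPEG builds on, applied to one row of eight values. The two sample rows are made up for illustration: a smooth ramp and a row that flips between black and white at every pixel.

```python
import math


def dct_1d(samples: list[float]) -> list[float]:
    """Orthonormal DCT-II. Coefficient k is the amplitude of a cosine that
    completes k half-cycles across the row; frequency and phase are fixed."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


smooth = [50, 60, 70, 80, 90, 100, 110, 120]  # gradual ramp: low frequencies
busy = [0, 255, 0, 255, 0, 255, 0, 255]       # rapid alternation: high frequencies

print([round(c) for c in dct_1d(smooth)])
print([round(c) for c in dct_1d(busy)])
```

For the ramp, almost everything ends up in the first two coefficients (the average and the overall slope); for the alternating row, apart from the average, the largest coefficient is the last, highest-frequency one.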

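DCT became the foundation of JPEG not only for its mathematical advantages (for typical image blocks it concentrates most of the signal into a few low-frequency coefficients, and fast algorithms exist to compute it) but also because it was open and free from patent restrictions, making it a practical choice for an international standard.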

When the DCT is applied to an 8×8 pixel block, it produces 64 coefficients, each giving the amplitude of one of 64 fixed two-dimensional cosine patterns with a specific combination of horizontal and vertical frequency. This shows how much each pattern contributes to the overall appearance of the block. This transformation from the spatial domain to the frequency domain is a fundamental step in JPEG compression.
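For reference, the JPEG standard (ITU-T T.81) defines the forward and inverse DCT for an 8×8 block of level-shifted samples s_yx, where x is the column and y the row within the block:

```latex
\[
S_{vu} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx}
\cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}
\]
\[
s_{yx} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v S_{vu}
\cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}
\]
\[
C_k = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
\]
```

Here u and v run from 0 to 7 and index horizontal and vertical frequency, so S_00 sits in the top-left corner of the coefficient matrix and S_77 in the bottom-right.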

In this two-dimensional layout, the top-left corner of the coefficient matrix represents low-frequency information, while the bottom-right corner corresponds to high-frequency information. The top-left coefficient, known as the DC coefficient, is proportional to the block’s average value; the remaining 63 are called AC coefficients. Coefficients with larger magnitudes indicate stronger cosine components, while values near zero indicate weak ones. Since low frequencies typically dominate, the top-left corner usually contains larger values, whereas the bottom-right corner has much smaller ones.

For example, to convert an 8×8 block of a grayscale image to DCT coefficients, we first take pixel values ranging from 0 (black) to 255 (white) and subtract 128 to center them around zero. Next, the 2D DCT coefficients are calculated using the formula defined in the JPEG standard. In a typical block, the top-left corner then contains the largest values, corresponding to low-frequency components, while the bottom-right corner holds much smaller values. This distribution forms the basis for the next stage of JPEG compression: quantization.
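The level shift and the two-dimensional transform can be sketched directly from the standard’s formula. This Python version is slow and straightforward (real codecs use fast factorizations), and the sample block is synthetic: a smooth diagonal gradient chosen for illustration, not data from a real photo.

```python
import math


def dct_2d(block: list[list[int]]) -> list[list[float]]:
    """Level-shift an 8x8 block of 0..255 samples by -128, then apply the 2D
    DCT-II used by JPEG. Row index = vertical frequency v, column = horizontal u."""
    n = 8
    shifted = [[p - 128 for p in row] for row in block]

    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (shifted[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs


# A smooth gradient: brightness rises by 8 per step to the right and downward.
block = [[100 + 8 * (x + y) for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)
for row in coeffs:
    print(" ".join(f"{round(value):5d}" for value in row))
```

Because this block varies only smoothly, the energy sits in the first row and column: the DC coefficient is 224 (one eighth of the sum of the shifted samples), and every coefficient with both u and v nonzero is zero up to rounding.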

For a deeper understanding of the transition from pixels to DCT coefficients, it can be helpful to watch videos that visualize how an image’s frequency structure is formed. It’s important to note that the DCT itself is lossless in principle: applying the inverse transform restores the original data, apart from tiny rounding errors in finite-precision arithmetic. The compromise comes at the quantization stage, where some high-frequency information is discarded to save space.
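The reversibility claim is easy to check numerically. This sketch pairs the orthonormal one-dimensional DCT-II with its inverse (the DCT-III) and shows that a sample row of pixel values comes back unchanged up to floating-point rounding; JPEG applies the same idea in two dimensions.

```python
import math


def dct(samples: list[float]) -> list[float]:
    """Orthonormal DCT-II of a sequence."""
    n = len(samples)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i, x in enumerate(samples))
        for k in range(n)
    ]


def idct(coeffs: list[float]) -> list[float]:
    """Inverse of dct(): rebuild each sample as a weighted sum of the cosines."""
    n = len(coeffs)
    return [
        sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


row = [52, 55, 61, 66, 70, 61, 64, 73]
restored = idct(dct(row))
# The largest difference is on the order of floating-point rounding error.
print(max(abs(a - b) for a, b in zip(row, restored)))
```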

Quantization and loss of redundant data

Quantization is a process in which a large range of values is mapped to a smaller, predefined set, usually through rounding or scaling. A simple analogy is pricing: instead of counting exact cents (e.g., 19.99 UAH), we round to whole units (20 UAH). Some precision is lost, but the overall information remains understandable and useful.

In JPEG, this approach is used to reduce the weight of high-frequency components while leaving low-frequency ones largely intact. DCT organizes frequencies in an 8×8 matrix, which makes it straightforward to divide each element by the corresponding value in a quantization table and round the result.

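The quantization tables themselves were derived from experiments on human visual perception: the example tables published in Annex K of the JPEG standard came from such tests, and encoders are free to use their own, since the tables are stored in each file. After division and rounding, most high-frequency values in the bottom-right corner become zero, while the values in the top-left corner, which carry the main visual details, are preserved.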
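A minimal sketch of the quantization step, using the example luminance table from Annex K of the JPEG standard (real encoders usually scale such a table by a quality setting). The coefficient block is synthetic: values that shrink with frequency, as in a typical smooth block.

```python
# Example luminance quantization table from Annex K (Table K.1) of the JPEG standard.
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coeffs: list[list[float]], table: list[list[int]]) -> list[list[int]]:
    """Divide each DCT coefficient by the matching table entry and round."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


# Synthetic coefficients that shrink with frequency
# (row index v = vertical frequency, column index u = horizontal frequency).
coeffs = [[500 / (1 + u + v) ** 2 for u in range(8)] for v in range(8)]
quantized = quantize(coeffs, LUMA_Q)
for row in quantized:
    print(row)
zeros = sum(value == 0 for row in quantized for value in row)
print(f"{zeros} of 64 coefficients are now zero")  # 44 of 64
```

The table entries grow toward the bottom-right, so the same coefficient size survives in the top-left corner but rounds away at high frequencies.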

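During JPEG decoding, each coefficient is multiplied by the same quantization table entry. This is why the process is considered “lossy”: the rounding performed during encoding cannot be undone, and a coefficient that was rounded to zero stays zero after multiplication, so that information is gone for good. At the same time, the image remains largely unaffected to the human eye, because the discarded data corresponds to details that are barely noticeable anyway.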
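The sketch below makes the point with a handful of hypothetical coefficient values and a single quantization step of 16: after quantizing and then dequantizing, small coefficients come back as zero and large ones come back only approximately.

```python
def quantize(value: float, step: int) -> int:
    """Encoder side: divide by the table entry and round (information is lost here)."""
    return round(value / step)


def dequantize(level: int, step: int) -> int:
    """Decoder side: multiply by the same table entry."""
    return level * step


step = 16
for original in [-415.4, 37.0, 7.3, -5.9, 2.1]:
    level = quantize(original, step)
    print(f"{original:7.1f} -> level {level:3d} -> restored {dequantize(level, step):5d}")
```

Here -415.4 comes back as -416 and 37.0 as 32, while 7.3, -5.9, and 2.1 all come back as 0.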

Quantization is applied to both luminance and color, but the color channels are compressed more aggressively. This is because the human eye is more sensitive to brightness than to color. By experimenting with different quantization tables, you can see how increasing compression affects image quality – yet even at high levels, the image often still appears acceptable to the eye.
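One common way to run such experiments is the quality scaling used by the Independent JPEG Group’s libjpeg, which turns a 1 to 100 “quality” setting into a table by scaling a base table such as those in Annex K. The sketch below follows the libjpeg convention; it is one encoder’s choice, not part of the JPEG standard, and other encoders scale tables differently. Only the first row of each Annex K table is used, to keep it short.

```python
# First rows of the Annex K example tables: luminance (K.1) and chrominance (K.2).
LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]
CHROMA_ROW = [17, 18, 24, 47, 99, 99, 99, 99]


def scale_table(base: list[int], quality: int) -> list[int]:
    """Scale quantization entries by a 1..100 quality setting (libjpeg-style)."""
    quality = max(1, min(100, quality))
    factor = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Integer rounding, then clamp to 1..255 so entries fit 8-bit baseline tables.
    return [max(1, min(255, (q * factor + 50) // 100)) for q in base]


for quality in (90, 50, 10):
    print(quality, scale_table(LUMA_ROW, quality), scale_table(CHROMA_ROW, quality))
```

At quality 50 the base tables are used unchanged; higher settings shrink the divisors (less loss), lower ones enlarge them. The chrominance row has larger entries than the luminance row at every setting, which is how color ends up compressed more aggressively.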

Sequence encoding and Huffman coding

To improve efficiency, JPEG first reads the quantized DCT coefficients in a zigzag order, from the top-left corner to the bottom-right, which places most of the zeros at the end of the sequence. The algorithm then records runs of zeros instead of storing each zero individually, and a single end-of-block code replaces all the trailing zeros. For example, instead of storing all 64 numbers in an 8×8 block, only 17 might need to be saved, yielding substantial savings.
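Here is a sketch of both ideas, assuming a quantized block stored as a list of 8 rows. It generates the zigzag order, then turns the 63 AC coefficients into (zeros-before, value) pairs followed by an end-of-block marker. It deliberately simplifies real JPEG: the DC coefficient is coded separately (as a difference from the previous block’s), and runs longer than 15 zeros need a special ZRL symbol; both are omitted, and the function names are hypothetical.

```python
from __future__ import annotations


def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions in JPEG zigzag order: walk the anti-diagonals from
    the top-left corner, alternating direction on each diagonal."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd diagonals run down-left (row increasing), even ones up-right.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_length_ac(block: list[list[int]]) -> list[tuple[int, int] | str]:
    """Encode the 63 AC coefficients as (zeros_before, value) pairs, then "EOB"
    if the block ends in zeros. Simplified: no ZRL for runs longer than 15."""
    seq = [block[r][c] for r, c in zigzag_order(len(block))][1:]  # skip the DC term
    pairs: list[tuple[int, int] | str] = []
    zeros = 0
    for value in seq:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    if zeros:
        pairs.append("EOB")
    return pairs


# Hypothetical quantized block: a DC value and three nonzero AC coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 31, -3, 2, 1
print(run_length_ac(block))  # [(0, -3), (0, 2), (9, 1), 'EOB']
```

Sixty-three AC values collapse into three pairs and one marker, because the zigzag walk pushes the long run of zeros to the end.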

But the process doesn’t stop there. Each nonzero coefficient is described by three pieces of information: the number of zeros preceding it, the number of bits required to represent its value, and the value itself. The first two are combined into a single symbol, and because some symbols occur far more often than others, frequent symbols receive short codes while rare ones receive longer codes. It is like replacing a common pattern such as “0 0 0 0 0 0 0 0” with a short symbol, such as “A,” while rarer patterns get longer codes. This approach achieves efficient compression without losing any information.
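The “number of bits” part is often called the size category. A minimal sketch, assuming the baseline JPEG representation in which a negative value is written as the one’s complement of its magnitude; the function names are hypothetical:

```python
def size_category(value: int) -> int:
    """Number of bits needed for |value| (0 for zero)."""
    return abs(value).bit_length()


def amplitude_bits(value: int) -> str:
    """Bits written after the Huffman-coded (zero run, size) symbol.
    Positive values use plain binary; negative values are stored as
    value + (2**size - 1), i.e. the one's complement of |value|."""
    size = size_category(value)
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1
    return format(value, f"0{size}b")


for v in (1, -1, 3, -3, 10, -10):
    print(v, size_category(v), amplitude_bits(v))
```

For example, 10 is written as 1010 and -10 as 0101; the decoder tells them apart because a leading 0 bit signals a negative value.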

This principle is the basis of Huffman coding, which JPEG combines with run-length encoding. Together, they compact the data after quantization as tightly as possible. Importantly, this step is not lossy: it simply encodes the already-reduced information without changing its values.
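To show the principle, here is a textbook Huffman construction in Python that gives shorter bit strings to more frequent symbols. The symbol counts are invented. Real JPEG files do not store codes in this form: a DHT segment lists how many codes have each length plus the symbols, and the codes are assigned canonically (many encoders simply use the example tables from Annex K).

```python
import heapq
from itertools import count


def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix code: repeatedly merge the two least frequent subtrees,
    prefixing '0' to codes in one and '1' to codes in the other."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs one bit
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical counts of (zero run, size) symbols in some image.
counts = {"(0,1)": 50, "(0,2)": 20, "EOB": 15, "(1,1)": 10, "(9,1)": 5}
codes = huffman_code(counts)
for sym in counts:
    print(sym, codes[sym])
```

With these counts the most common symbol gets a 1-bit code and the two rarest get 4-bit codes; no code is a prefix of another, so the bit stream can be decoded without separators.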

JPEG decompression

JPEG decompression is essentially the reverse process. First, the Huffman and run-length codes are decoded and the zigzag sequence is put back into an 8×8 matrix of quantized DCT coefficients. Each element is multiplied by its quantization table entry, the inverse DCT is applied, and 128 is added back to each value. Finally, the color channels are scaled back up to full resolution and the image is converted from YCbCr to RGB for display. The result is an image that is very close to the original: at reasonable quality settings, the differences are minimal and barely noticeable. Even at fairly high compression ratios, JPEG delivers remarkably good visual quality.
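The decoder’s final color step can be sketched as the inverse of the full-range YCbCr conversion defined in the JFIF specification. The function name is illustrative, and because values are rounded to 8 bits at each stage, a round trip may differ from the original by a unit or so.

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    """Convert full-range (JFIF) YCbCr back to 8-bit RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)

    # Round to the nearest integer and keep the result in the 0..255 range.
    def clamp(v: float) -> int:
        return max(0, min(255, round(v)))

    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(255, 128, 128))  # (255, 255, 255)
print(ycbcr_to_rgb(76, 85, 255))    # close to pure red
```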

This algorithm remains highly effective and continues to form the foundation of digital imaging. It doesn’t just compress data – it “sees through human eyes,” preserving only what is important. This is why JPEG has endured for decades and remains a widely used standard, even with the emergence of alternatives like WebP, AVIF, and JPEG XL.

What is the reason for JPEG’s success?

Looking more closely, JPEG feels less like a file format and more like a philosophy. Its defining feature is that it embraces imperfection and turns it to its advantage. Instead of recording every single pixel, JPEG asks, “What really matters to the human eye?” – and discards the rest. This is the “magic” that allows us to view high-quality images without wasting excessive resources. Precision isn’t always necessary; what matters is perception. It’s this balance that has helped make modern computers and imaging technologies so efficient.

The success of JPEG can be attributed to three simple principles. First, not all data is equally important: sometimes losing a little isn’t a problem and can even be beneficial. Second, the human eye naturally “optimizes” details: it doesn’t notice every subtle shade, so color information can be compressed more heavily than brightness. Mathematics helps here: the Discrete Cosine Transform (DCT) separates coarse structure from fine detail, so that quantization can discard what matters least and keep the rest. Third, repetitive patterns and zeros can be compressed very efficiently, which is how JPEG turns megabytes of pixel data into small files while keeping the image visually sharp.

The same principles apply in other domains. MP3 and AAC compress music, Dolby formats encode surround sound, and H.264, H.265, and AV1 make video easier to stream. The underlying idea is consistent: compression isn’t just about reducing data – it’s about taking human perception into account.

Conclusions

Even with the emergence of modern formats, JPEG’s achievements remain impressive. Its engineering precision is paired with a deep understanding of human perception, and the acronym itself has entered everyday language. Its simplicity, versatility, and balance between image quality and file size have made it a benchmark of the digital era. And even if new formats eventually take over, the spirit of JPEG, intelligent compression whose losses humans barely notice, will continue to underpin image processing technologies.
