How Big Is Your Video Again? Square Vs Rectangular Pixels

[Alexwlchan] noticed something funny. He knew that not putting a size for a video embedded in a web page would cause his page to jump around after the video loaded. So he put the right numbers in. But with some videos, the page would still refresh its layout. He learned that not all video sizes are equal and not all pixels are square.

For a variety of reasons, some videos have pixels that are rectangular, and it is up to your software to take this into account. For example, when he put one of the suspect videos into QuickTime Player, it showed the resolution as 1920×1080 (1350×1080). The first pair is the stored frame size; the number in parentheses is the size the frame should actually be drawn at once the non-square pixels are taken into account.
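
The pixel aspect ratio falls out of those two widths. Here is a quick sanity check in Python; only the 1920 and 1350 figures come from the QuickTime readout above, and the 45/64 ratio is simply what they reduce to:

```python
from fractions import Fraction

stored_width = 1920    # width of the encoded frame, in samples
display_width = 1350   # width QuickTime says it should be shown at

# How wide each stored sample is, relative to its height
pixel_aspect = Fraction(display_width, stored_width)
print(pixel_aspect)                  # 45/64 -- each pixel is narrower than it is tall
print(stored_width * pixel_aspect)   # 1350, back to the displayed width
```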

So just pulling the stored size out of a video isn’t always enough to know how it will actually look. [Alex] shows his old Python code that returned the wrong numbers and how he fixed it. The mediainfo library seemed promising but suffers from some rounding issues, so instead he calls out to ffprobe, an external program that ships with ffmpeg. That means even if you don’t use Python, you can pull the same trick, or you can go read the ffprobe source code.
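
For the curious, here is roughly what that ffprobe trick looks like from Python. This is a minimal sketch rather than [Alex]’s actual code: it assumes ffprobe is on your PATH and only looks at the first video stream.

```python
import json
import subprocess
from fractions import Fraction

def display_size(path):
    """Return the (width, height) a video should be shown at,
    with non-square pixels taken into account, by asking ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=width,height,sample_aspect_ratio",
            "-of", "json", path,
        ],
        capture_output=True, text=True, check=True,
    )
    stream = json.loads(result.stdout)["streams"][0]

    width, height = stream["width"], stream["height"]
    # ffprobe reports the sample aspect ratio as "num:den"; it can be
    # missing (or "0:1") for streams that don't declare one.
    sar = stream.get("sample_aspect_ratio", "")
    if ":" in sar:
        num, den = (int(part) for part in sar.split(":"))
        if num and den:
            width = round(width * Fraction(num, den))
    return width, height
```

Point it at one of the suspect files and it should come back with something like (1350, 1080) rather than the stored 1920×1080.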

[Alex] admits that there are not many videos that have rectangular pixels, but they do show up.

If you like playing with ffmpeg and videos, try this in your browser. Think rectangular pixels are radical? There has been work on variable-shaped pixels.

11 thoughts on “How Big Is Your Video Again? Square Vs Rectangular Pixels”

    1. Also, one should expect that the monitor itself may have rectangular pixels. At the beginning of this century, 15″ computer monitors with 1024×768 (4:3) and 1280×1024 (5:4) resolutions had LCD panels that were physically more like 3:2. Such panels are still in widespread use in industrial gear.

  1. Indeed, resolution doesn’t tell the whole story. Pixel aspect ratio describes the shape of the pixels and display aspect ratio describes the shape of the frame as a whole.

    I was on a big digitization project and one of the requirements was to use square pixels in our encoding to make the aspect ratio math easier.

  2. The question is, why wouldn’t you use square pixels? If the nominal line resolution is 1920, why would you have 1350 stretched pixels instead?

    One good guess is the Kell factor, because 1350/1920 ≈ 0.7, which happens to be the “standard” Kell factor commonly used in video production. If the video had an actual resolution of 1920 pixels per line for 1920 pixels on screen, you would need to blur it by the same amount in order to mitigate beat pattern artifacts with grid-like objects or fence posts etc.

    If you’re going to do this anyways, you might as well have fewer pixels to begin with. Less data to encode.

    https://en.wikipedia.org/wiki/Kell_factor

    1. I think your logic is a bit circular here. After reading the page you linked, I understand that the Kell factor is a subjective measure of the spatial frequencies you can encode without creating distracting interference patterns, as a ratio against the Nyquist limit for the resolution. If you decrease the resolution of your storage medium, then you would have to decrease the encoded/captured frequencies as well.

      1. It’s a bit difficult to wrap your head around, but it works like this: the display can reproduce an image with a resolution of N times the nominal number of pixels with acceptable interference patterning. N is arbitrarily chosen to be 0.7 as a compromise to balance visual quality.

        What that means is, if you start with a 1920 pixel line, you need to blur it until the effective number of pixels of information it can resolve is 1350. Or, you start with a 1350 pixel line and upscale it to 1920 pixels and then display that.

        Either way, you end up with a 1920 pixel line that contains a blurred image with an effective resolution of 1350 pixels. (A rough way to simulate this is sketched just after this thread.)

        1. “If you decrease the resolution of your storage medium, then you would have to decrease the encoded/captured frequencies as well.”

          Note: the actual image you’re sending to the display is always 1920 pixels wide.

          Your source video file might have any number of pixels per line. When displayed, it simply gets resampled to the 1920 pixels. You don’t blur the 1350 pixel version, because it IS the number of real pixels you can display.

    2. Notably, with an electron beam scanning display (CRT) you can’t make sure that each pixel always lands perfectly on each triad of phosphors on the screen. As the beam scans across, it would have to speed up and slow down to account for slight variations in the curvature of the screen, and you’ll never get that absolutely perfect, nor will it stay absolutely stable and repeatable over time. The effect is that the true location of every pixel “creeps” around.

      With an analog signal on a digital display (LCD), the same problem shows up as slight shifts between the input signal timing and the display’s sample timing. It’s better, so you can push the Kell factor up, but there’s still some discrepancy.

      With a digital signal on a digital screen, the timing becomes perfect, so you can push the Kell factor even closer to 1, but you can still get some of the same screen-door effect because the pattern still needs to land in-phase with the pixel grid. You can’t get perfectly up to the Nyquist frequency unless you’re doing “pixel art”.
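
If you want to see roughly what the blur-then-upscale idea from this thread looks like, one quick experiment is to crush a 1920-wide clip down to 1350 samples and scale it straight back up with ffmpeg. This is only a sketch with placeholder file names, not anything from the original article:

```python
import subprocess

# Throw away horizontal detail by scaling 1920 -> 1350, then scale back up,
# so the output is still nominally 1920x1080 but only resolves about 1350
# pixels' worth of detail per line. File names are placeholders.
subprocess.run(
    [
        "ffmpeg", "-i", "input_1920x1080.mp4",
        "-vf", "scale=1350:1080:flags=lanczos,scale=1920:1080:flags=lanczos",
        "-c:a", "copy",
        "output_1920x1080_softened.mp4",
    ],
    check=True,
)
```

Comparing the result with the original side by side gives a feel for how much detail a 0.7 Kell factor actually throws away.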

  3. This is the pixel/sample duality kicking in again.

    There are two ways to look at a video field: either it’s pixels, representing the colour of a region of the screen at output time, or it’s samples, representing the colour present at the sample location after applying an anti-alias filter to a continuous 2D image that’s been frozen in time.

    In the sample view, all samples are infinitely small points, and we apply a reconstruction filter to determine what to display. If you’ve done digital audio work, this will sound familiar: where an audio signal is sampled along one time dimension only, giving you intensity versus time, a video is sampled along the time dimension, two spatial dimensions, and (for colour) through three different transfer curves (to match the L, M and S cones in the eye), giving you three intensity values for each point in time and space.

    You get square pixels by sampling horizontally and vertically at the same frequency in space; non-square pixels are the case where you’ve chosen different spatial sampling frequencies for each spatial dimension. We know that the maximum detail we can reproduce is limited by the spatial sampling frequency – so increasing the spatial sampling frequency increases the amount of data we have to handle, but also the detail you can see in the image.

    This then leads to an interesting tradeoff with lossy compression; if the codec will remove high frequency information to fit within the bitrate, why encode at a higher sample rate than you need? If 1,200 samples per line is enough to fully reconstruct all the detail that you can encode in the bitrate available, why make the encoder encode 1,920 samples, when you can reduce to 1,280 samples and get the same quality of picture from the decoder after the reconstruction filter?

    With block-based codecs (MPEG-2, H.264/265/266, AV1), you also have to account for the minimum number of blocks needed to supply an I-frame; at 1920×1080 progressive, for example, H.264 needs at least 8,100 blocks to supply an I-frame, whereas 1280×1080 non-square samples reduces that to 5,400 blocks. As there’s a per-block cost in terms of bits, if you know that the extra 2,700 blocks in a 1920 wide frame aren’t going to improve picture quality, you might as well use the non-square 1280×1080 sampling, and free up the bits that the extra blocks would have required for improved picture quality.
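
The block counts in that last paragraph are easy to sanity-check. The little sketch below assumes 16×16 blocks and, like the comment, glosses over the fact that real encoders pad the frame up to a multiple of the block size:

```python
# Reproduce the macroblock arithmetic above: pixels per frame divided by
# pixels per 16x16 block. (Real H.264 encoders pad 1080 lines up to 1088,
# which the round numbers in the comment ignore.)
def approx_blocks(width, height, block=16):
    return width * height / (block * block)

print(approx_blocks(1920, 1080))   # 8100.0
print(approx_blocks(1280, 1080))   # 5400.0
print(approx_blocks(1920, 1080) - approx_blocks(1280, 1080))   # 2700.0 blocks saved
```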
