<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 17, 2015 at 1:57 PM, Björn Blissing <span dir="ltr"><<a href="mailto:bjorn.blissing@vti.se" target="_blank">bjorn.blissing@vti.se</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Looking at this plot it seems like the latency times are varying according to a pattern. My guess is that the screen runs asynchronously with the GPU.</blockquote><div><br></div><div>That it certainly does. Modern LCDs are not tied to the VSYNC/HSYNC and pixel clock signals anymore as the old CRTs were, because there is always an image processing logic inbetween. If for nothing else then for scaling the image to the full resolution of the LCD panel. With packet oriented digital connections like DisplayPort it gets decoupled even further (DVI/HDMI is still essentially emulating the old VGA which in turns emulates the old TV conventions for video, even though the way the signals are encoded electrically is very different). Oh and it gets better - in fact, some monitors completely ignore
VSYNC/HSYNC signals when connected by a digital connection
(HDMI/DisplayPort), because they know how many pixels they need to draw,
so they simply count the incoming bytes, something that was not possible with the purely analog VGA signals and analog monitors.<br><br></div><div>There are basically 4 completely asynchronous systems here:<br><br></div><div>- your code <br></div><div>- the GPU <br></div><div>- the monitor processor<br></div><div>- the LCD panel itself<br></div><div><br></div><div>The first two can be synchronized using VSYNC and/or fences, the second two using the HDMI/DVI/VGA electrical signals, and the monitor processor is synchronized with the LCD panel. But there is no mutual synchronization between these pairs; there are, in fact, buffers between them to account for the differing processing speeds. And you can typically only observe the first and the last one ...<br></div><div><br></div>
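<div>Just to make the "VSYNC and/or fences" part concrete, here is a rough sketch of the first kind of synchronization, i.e. asking the driver to tie the buffer swap to the start of scanout. GLFW is used here purely for brevity - the equivalent calls are wglSwapIntervalEXT on Windows or glXSwapIntervalEXT on X11 if you talk to the window system directly. Note that this only couples the first two items above; the monitor processor and the LCD panel stay out of reach:<br></div><pre>
// Sketch only: request VSYNC from the application side (GLFW is just a
// convenient stand-in for whatever windowing layer you actually use).
#include &lt;GLFW/glfw3.h&gt;

int main() {
    if (!glfwInit()) return 1;
    GLFWwindow* win = glfwCreateWindow(640, 480, "vsync sketch", nullptr, nullptr);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);

    // Ask the driver to synchronize buffer swaps with the start of scanout.
    glfwSwapInterval(1);   // 0 turns VSYNC off again

    while (!glfwWindowShouldClose(win)) {
        glClear(GL_COLOR_BUFFER_BIT);
        // ... render the frame here ...
        glfwSwapBuffers(win);  // may or may not block - that is up to the driver
        glfwPollEvents();
    }
    glfwTerminate();
    return 0;
}
</pre><div><br></div>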
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> So the screen polls the GPU at 60Hz and sometimes the GPU just happens to have a frame ready when the screen starts a scanout. </blockquote><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The
lower limit is just the pure scanout time of the display. But this is
just my theory right now. I do not have the detailed knowledge of the
inner workings of a LCD display.</blockquote> </div><div>The way it works is the opposite - the GPU generates the data at whatever resolution and refresh rate the screen declares it supports (determined via EDID, unless manually overridden in the driver settings) and sends it to the processor in the display. The display then does any processing it needs and only actually flips the pixels on the panel when ready, independently from the GPU. This is always delayed a bit, depending on how much processing the display does. There is no polling; the video connection is essentially one-way only (not counting service data like EDID). <br><br>In comparison, the old CRTs had low latency, because the analog signal from the GPU was driving the deflection coils steering the electron beam practically directly, with no buffering or processing. When the GPU started a new frame by sending VSYNC, the monitor really made the beam jump to the upper left corner at that moment. That also explains why you could generate "weird", unsupported resolutions out of an analog CRT screen and why you could potentially fry it with a resolution or refresh rate too high for it to handle - the deflection coil electronics would typically overheat, drawing too much current. <br></div><div> <br></div><div>I think the "jitter" you are seeing - sometimes very low latency, sometimes over a frame - comes from a different source, namely your program and the VSYNC handling by the GPU. The GPU will always generate the output signal the same way, VSYNC or no VSYNC, otherwise the monitor may not be able to handle it and sync to it. What happens is that sometimes your program "gets lucky" and tells the GPU to swap buffers "just in time" before the start of the next frame - then you have very little latency, because the change gets visible almost immediately (modulo the input latency of the monitor above). On the other hand, sometimes you get unlucky: you swap buffers right after the scanout of the framebuffer has started, and then the GPU will hold your image until the next frame cycle - poof, one frame of latency extra (almost 17 ms at 60Hz) ... And you can have everything in between these two extreme cases.<br><br></div><div>When VSYNC is on, it gets even more complicated, because then you are telling the GPU to synchronize the userspace code with the frame scanout start (not the start of the physical frame on the monitor when your sensor reacts - remember, the GPU has no control at all over the image processor in the monitor!). This is typically an extremely inefficient thing to do from the driver's point of view, because you are stalling the GPU until the new frame is due, so the drivers often "play games" here - like not really blocking your program waiting for the frame start but returning right away and buffering your frame internally. The frame then gets sent out later when convenient (i.e. on the next scanout cycle). They can even hold several frames back like that and block only when this frame queue runs out of space (you were really rendering too fast). Nvidia in particular is known for these VSYNC "shenanigans" in their driver. This is what Robert was talking about with the fences - that is a relatively recent feature that allows you to force the driver to wait until a certain event occurs, e.g. a new frame start or a sync event from an external source (e.g. genlock).<br><br></div>
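<div>To make the fence part a bit more concrete, here is a rough sketch using the core GL_ARB_sync API (glFenceSync / glClientWaitSync) - just an illustration of the general mechanism, with GLEW standing in for whatever GL loader you use. A fence placed right after the buffer swap lets you block until the GPU has really executed everything up to that point, which keeps the driver from queueing frames ahead of you; timing the wait also makes that queueing visible. It still tells you nothing about when the panel actually shows the frame, of course:<br></div><pre>
// Sketch only: put a fence right after the buffer swap and block until the
// GPU has actually executed everything up to that point. With VSYNC on and
// the driver's frame queue full, this wait typically lasts until the next
// refresh; otherwise it returns almost immediately.
#include &lt;GL/glew.h&gt;   // any GL loader works, GLEW is just an example
#include &lt;chrono&gt;
#include &lt;cstdio&gt;

void swapAndWaitForGpu()
{
    // swapBuffers();  // SwapBuffers / glXSwapBuffers / whatever your toolkit calls

    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    auto t0 = std::chrono::steady_clock::now();
    // FLUSH_COMMANDS_BIT makes sure the fence actually reaches the GPU.
    glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                     100u * 1000u * 1000u);        // 100 ms timeout, in nanoseconds
    auto t1 = std::chrono::steady_clock::now();
    glDeleteSync(fence);

    double waitedMs = std::chrono::duration&lt;double, std::milli&gt;(t1 - t0).count();
    std::printf("waited %.2f ms for the GPU\n", waitedMs);
}
</pre><div><br></div>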
<div>The regular VSYNC ON/OFF setting will not guarantee this anymore on today's hardware; it only guarantees that you will not change the framebuffer in the middle of it being drawn (tearing). <br><br></div><div>This has a lot of consequences for "pro" applications requiring active stereo or synchronization to external sources (CAVE, TV studios, etc.). However, an average desktop application (e.g. a 3D game) benefits, because it is typically going to be able to run faster and smoother when not having to block on VSYNC.<br></div><div><br></div><div>This blog explains the issue quite well: <br><a href="http://joostdevblog.blogspot.fr/2011/10/what-no-one-told-you-about-videocard.html">http://joostdevblog.blogspot.fr/2011/10/what-no-one-told-you-about-videocard.html</a><br><br></div><div>Regards,<br><br></div><div>Jan<br></div><div><br><br></div><br></div></div>