Tighten up the sync

Doing initial tests on a desktop computer (Firefox) and an iPad (Safari, iOS 12), the video does sync, but it’s hundreds of milliseconds (and sometimes full seconds) off.

If I may make a suggestion to tighten this up:

Take a page from NTP’s book, and measure the amount of time it takes for the network request to get there, then compensate locally.

From www.ntp.org/ntpfaq/NTP-s-algo.htm:
“Synchronizing a client to a network server consists of several packet exchanges where each exchange is a pair of request and reply. When sending out a request, the client stores its own time (originate timestamp) into the packet being sent. When a server receives such a packet, it will in turn store its own time (receive timestamp) into the packet, and the packet will be returned after putting a transmit timestamp into the packet. When receiving the reply, the receiver will once more log its own receipt time to estimate the travelling time of the packet. The travelling time (delay) is estimated to be half of “the total delay minus remote processing time”, assuming symmetrical delays.”
(I would personally suggest video time as the timestamp, but I have no knowledge of the back-end code)

Typically NTP waits for 5 exchanges before it has enough to trust the numbers, but even doing a single exchange on each command would likely drop the possible difference to under 100ms.
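
To make the arithmetic in the quoted passage concrete, here is a minimal sketch in Python. It is not Watch2Gether's code: the names (Exchange, round_trip_delay, clock_offset, best_exchange, compensated_position) are hypothetical, timestamps are assumed to be in seconds, and network delay is assumed to be symmetric, as in the NTP description. Preferring the lowest-delay sample out of several exchanges is a simplification I am assuming here, not something the quote prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exchange:
    t0: float  # client clock when the request was sent (originate timestamp)
    t1: float  # server clock when the request arrived (receive timestamp)
    t2: float  # server clock when the reply was sent (transmit timestamp)
    t3: float  # client clock when the reply arrived


def round_trip_delay(x: Exchange) -> float:
    """Total delay minus remote processing time, in seconds."""
    return (x.t3 - x.t0) - (x.t2 - x.t1)


def clock_offset(x: Exchange) -> float:
    """Estimated server-minus-client clock offset, assuming symmetric delays."""
    return ((x.t1 - x.t0) + (x.t2 - x.t3)) / 2


def best_exchange(samples: List[Exchange]) -> Exchange:
    """Assumption: trust the exchange with the smallest round trip the most."""
    return min(samples, key=round_trip_delay)


def compensated_position(sent_position: float, one_way_delay: float,
                         rate: float = 1.0) -> float:
    """Position to seek to when a 'play from sent_position' command
    arrives one_way_delay seconds after it was sent."""
    return sent_position + one_way_delay * rate


# Illustrative numbers only: the server clock runs 5 s ahead of the client.
sample = Exchange(t0=100.000, t1=105.060, t2=105.070, t3=100.130)
one_way = round_trip_delay(sample) / 2
print(round_trip_delay(sample), clock_offset(sample))
print(compensated_position(722.321, one_way))
```

With these made-up timestamps the round trip comes out at about 0.120 s and the offset at about 5.000 s, so a command sent at video position 722.321 s would be applied at roughly 722.381 s on the receiving side.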

It occurs to me while testing different video providers (Dailymotion being an especially bad example of this) that there may also need to be some allowance for players snapping to keyframes and other encoding ‘chunks’ (tell the player to seek to 12:02.321 and it may snap to the next keyframe at 12:03.400).
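
One possible way to allow for that snapping, sketched in Python under my own assumptions: after a seek, read back where the player actually landed, and if it is too far from the requested time, treat the landed position as the sync point instead. The helper names (to_seconds, seek_error, sync_target) and the 0.25 s tolerance are hypothetical, and I am assuming the player can report its position after a seek. Different players may snap to different places for the same request, so this only narrows the gap.

```python
def to_seconds(timestamp: str) -> float:
    """Convert 'mm:ss.mmm' or 'hh:mm:ss.mmm' to seconds."""
    total = 0.0
    for part in timestamp.split(":"):
        total = total * 60 + float(part)
    return total


def seek_error(requested: float, landed: float) -> float:
    """How far the player ended up from the requested position, in seconds."""
    return landed - requested


def sync_target(requested: float, landed: float,
                tolerance: float = 0.25) -> float:
    """Position the group should sync to after a seek.

    If the player landed within tolerance, keep the requested time;
    otherwise adopt where it actually landed (tolerance is an assumed value).
    """
    if abs(seek_error(requested, landed)) <= tolerance:
        return requested
    return landed


# The example from the post: asked for 12:02.321, player snapped to 12:03.400.
requested = to_seconds("12:02.321")
landed = to_seconds("12:03.400")
print(seek_error(requested, landed))
print(sync_target(requested, landed))
```

In the example from the post the player is off by a little over a second, which is well outside the assumed tolerance, so the landed position would be the one shared with everyone else.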

Thanks for your feedback. It seems like you have quite some technical knowledge! Watch2Gether syncs the video playback to the extent that is required to create a social watching experience. It’s not syncing to the exact frame since that would require an enormous effort and overhead. Do you have a use case that would require such a close sync?

Thanks for your kind words @florian.
I would say that the use case is the same, which is to say that the lower the latency between two experiences, the more connected the people feel who are experiencing them.

As a rough example I’m sure we’ve all experienced an audio call with significant delay, and how much that can disrupt the flow of communication (especially if it’s bad enough to cause both people to interrupt each other when they try to pause and wait for the other before continuing) and how disconnected that can feel.

According to en.wikipedia.org/wiki/Latency_(audio), for example, “measurable quality of a call degrades rapidly where the mouth-to-ear delay latency exceeds 200 milliseconds”.

Now, there’s not much we can do about the actual transmission of audio/video (in the case of the “voip” call) over the internet as that’s hard-limited by the speed of the signal travelling through the network, but since the video on each side is played from a local buffer there’s no hard limit to how precise the syncing can be.

I suspect that with a more tightly-synced playback time, people could react at the same time to events happening on the video and get a closer feedback loop between them. Those who really wanted to sync up even more closely could call each other through a low-latency network (say landlines/cell) and watch the video through watch2gether.

Thanks for your message. You are right about the feedback loop. The question is how a user perceives the playback position of another user. When they enable their webcams they might hear the audio and see the reaction from the other side, but then the delay of the webcam link comes into play as well. But I agree that these are interesting thoughts, very much to the core of what Watch2Gether is about!

I prefer not to resurrect old threads, but this is the exact issue I am trying to research and correct. The use case is having 2 or 3 or 7 people all hanging out, trading videos and socializing. Inevitably they will try to sing along as a group and make an effort to ratchet up the interactivity of the event through the feeling the music or video gives them. Unfortunately, it all goes to hell rather quickly because it can’t be done. Everyone is “off” time. It does take away from the full enjoyment of the experience and the effort to create a seamless virtual Happy Hour experience.