Linking to a post I made on Head-Fi. Kindly ignore the sonic conclusions/inferences I made; I’m still debating whether it’s placebo or whether I’m really hearing changes. If there were an error on the normal cable, it shouldn’t sound the same every time – errors are random, not constant – so that contradicts my inferences. However, I don’t know what’s happening inside the DAC and how it manages missing packets, so there could be ways this gets masked in the later filter/interpolation stages of the DAC.

What I understand is that USB supports different modes for transferring data.

The normal mode for network/file transfer supports error correction and re-transmission. However, that doesn’t seem to be true for audio. We used to have isochronous transfers (where a specific timing is maintained and data is sent uniformly over time) of 125 µs per microframe, i.e. 8000 microframes per second. For 44.1 kHz this means alternating the samples per packet – 5 samples per packet gives 40000 samples per second and 6 gives 48000, averaging 44000 per second (88000 over two seconds), so the remaining 100 samples per second (200 over two seconds) have to be fitted in or interpolated. If you had audio at 48000 Hz, every packet would carry exactly 6 samples and no interpolation would be necessary.
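A quick sketch of that arithmetic (my own illustration, not code from any USB stack): one common way to spread a sample rate over fixed packet slots is a Bresenham-style accumulator, which naturally produces the 5/6 alternation for 44.1 kHz and a constant 6 for 48 kHz.

```python
# Sketch: how many samples each 125 us isochronous packet carries at a
# given sample rate, with 8000 packets per second. (Hypothetical helper,
# not from any real driver.)
def packet_sizes(sample_rate, packets_per_second=8000, n_packets=8000):
    """Distribute samples over packets using an accumulator, Bresenham-style."""
    acc = 0
    sizes = []
    for _ in range(n_packets):
        acc += sample_rate
        n = acc // packets_per_second   # whole samples owed this packet
        acc -= n * packets_per_second   # carry the fractional remainder
        sizes.append(n)
    return sizes

s44 = packet_sizes(44100)
s48 = packet_sizes(48000)
print(set(s44), sum(s44))  # mix of 5- and 6-sample packets, 44100 total
print(set(s48), sum(s48))  # exactly 6 samples in every packet, 48000 total
```

Over one second the accumulator delivers exactly 44100 samples without any interpolation, which is why a real interface only needs to vary the packet size, not resample.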

Important thing to note: this 125 µs polling time is an override of sorts. The general polling time of USB (full speed) is 1 ms for the other transfers, so we are tightening it by at least 8x, and the link becomes way more stressful and jitter sensitive.
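To put numbers on that tightening (back-of-envelope figures of my own, not quoted from any spec): the same stereo stream splits into much smaller payloads with much shorter deadlines at the 125 µs interval.

```python
# Payload per service interval for a stereo PCM stream, comparing the
# 1 ms full-speed frame against the 125 us high-speed microframe.
# (Illustrative helper; real packets also carry protocol overhead.)
def bytes_per_interval(sample_rate, bits, channels, interval_s):
    samples = sample_rate * interval_s              # average samples per interval
    return samples * channels * (bits // 8)

for interval, name in [(1e-3, "1 ms frame"), (125e-6, "125 us microframe")]:
    b = bytes_per_interval(48000, 24, 2, interval)
    print(f"{name}: {b:.1f} bytes of 48 kHz/24-bit stereo audio")
```

So every deadline is 8x shorter and every payload 8x smaller – missing even one microframe costs a larger fraction of the buffer headroom.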

For asynchronous audio (USB 2.0 onwards), a 1 ms poll time is used with plenty of buffering (negotiated between the DAC and the host during initial enumeration), and the clock is now determined by the DAC. The DAC requests data when it needs it, and the host is supposed to buffer it and send it following the clock of the USB interface in the DAC. So we have been able to remove the effect of jitter quite a bit – but now the computer needs to make sure it responds to these requests in time. It is still not error correcting, since there’s not much time for re-transmission. Also, USB transfer is through serial data packets: your volume control info and the data for both audio channels are sent together with a specific framing structure. The XMOS (or other USB interface) chip is supposed to decode the different channel info back out and send it to the I2S interface of the actual DAC chip. No one knows how it copes if it gets an erroneous data packet or if the stereo information is messed up (just a shuffle, glitch or jitter is enough to create sampling artefacts and mess things up).
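To make the “shuffled stereo information” failure mode concrete, here is a toy model (hypothetical framing of my own, not the actual UAC packet layout): stereo samples travel interleaved L,R,L,R…, and if the receiver slips by one sample when recovering the frame boundary, the channels swap.

```python
# Toy model: interleaved stereo stream and a receiver that may slip by
# one sample position when deinterleaving. Not real USB framing.
def deinterleave(stream, offset=0):
    s = stream[offset:]
    return s[0::2], s[1::2]  # (left, right) as the receiver sees them

left = [10, 20, 30, 40]
right = [-10, -20, -30, -40]
stream = [x for pair in zip(left, right) for x in pair]  # L,R,L,R,...

print(deinterleave(stream))            # correct alignment recovers L and R
print(deinterleave(stream, offset=1))  # a one-sample slip swaps the channels
```

A single misplaced sample is enough to put left-channel data on the right channel for the rest of the packet, which is why framing errors are so much nastier than amplitude noise.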

I do think the channel separation that I subjectively perceive has an explanation – from section 3.4, Inter Channel Synchronization: “It is up to the host software to synchronize the different audio streams by scheduling the correct packets at the correct moment, taking into account the internal delays of all audio functions involved”. I think this is from an old USB spec, but the content should still hold.

USB has 4 pins – power, ground, data+, data- – so data is sent differentially, and there’s little chance of noise issues. All issues can then be safely traced back to “timing” and transistor pull-up/impedance matching. Basically, every digital signal is still an analog waveform with an eye pattern; it is just discretized in terms of usable states. It still needs to be sampled by transistors at the PHY layer, and there are impedance-matching considerations to sit at the optimal point on the load curve for those transistors – 90 ohms differential is the recommended value, iirc. I think the difference has not much to do with noise and isolation, since USB is a differential signal. It’s about stress on the PHY layer, which is the part that samples the signal back into the circuit.
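A tiny simulation of why differential signaling shrugs off noise (my own toy model, not a PHY design): noise that couples equally onto both wires cancels out when the receiver looks only at D+ − D−, so what remains to go wrong is the timing of the sampling decision, not its amplitude.

```python
# Toy model: common-mode noise hits D+ and D- identically, so the
# differential receiver (which decides on D+ - D-) is unaffected.
import random

random.seed(0)
bits = [random.randint(0, 1) for _ in range(1000)]
recovered = []
for b in bits:
    noise = random.gauss(0, 0.4)              # same noise on both lines
    d_plus = (1.0 if b else 0.0) + noise
    d_minus = (0.0 if b else 1.0) + noise
    recovered.append(1 if d_plus - d_minus > 0 else 0)

errors = sum(r != b for r, b in zip(recovered, bits))
print(f"bit errors with common-mode noise: {errors} / {len(bits)}")
```

Even with noise far larger than the signal swing, the differential decision is untouched – which is consistent with the idea that residual problems live in timing and impedance, not in coupled noise.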

Can anyone help me understand how the PHY layer is designed physically, since I believe the reason might lie there? I think if I had access to an XMOS (or other) USB interface that mimics the DAC function, except replacing the DAC with a logger that stores the data from the I2S stream in memory, it would be possible to test what is happening. Or even a bit error rate tester (BERT) that interfaces to USB in asynchronous audio mode.
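For readers unfamiliar with BERTs, here is the principle in miniature (a simplified sketch of my own, not any real instrument): transmit a known pseudo-random pattern (PRBS-7 here), let the channel corrupt some bits, then regenerate the same pattern at the receiver and count disagreements.

```python
# Sketch of a bit error rate test: PRBS-7 generator (x^7 + x^6 + 1),
# a channel that flips bits at ~0.1%, and an error counter.
import random

def prbs7(n, state=0x7F):
    """Generate n bits of the standard PRBS-7 sequence (period 127)."""
    out = []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at x^7 and x^6
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out

random.seed(1)
tx = prbs7(100_000)
rx = [b ^ (1 if random.random() < 1e-3 else 0) for b in tx]  # noisy channel

ref = prbs7(100_000)                       # receiver regenerates the pattern
errors = sum(a != b for a, b in zip(rx, ref))
print(f"measured BER: {errors / len(rx):.1e}")
```

Because both ends can generate the pattern independently from the same seed, no back-channel is needed – exactly why PRBS patterns are the standard stimulus for link testing.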

I am honestly less interested in actually improving audio than in understanding how the protocol works and how it can fail – what the general fault/re-transmission rates are, etc., from a system architecture point of view.

Thanks and Regards,
Manuel Jenkin.
