Odd audio alignment issues in recordings

This one’s going to get a bit wonky, but I’m hoping for a logic check or some advice.

I have MONITOR_REC_OPTION set to:
br (record the received audio once the call is bridged) to recv.wav
bt (record the transmitted audio once the call is bridged) to trans.wav

Both of these files are identical in file size. That is good, because I’m trying to record both legs.

I’m trying to merge them via:
sox -M trans.wav recv.wav output.wav

However, the output file size is almost always 2*N-1000b (2x the input file size, minus 1kb). A WAV file should only have 44 bytes of header, so that’s… concerning.
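Here is a quick sanity check on the size arithmetic. It is a minimal sketch that assumes the inputs and the output are all plain PCM WAV with the minimal 44-byte header and the same sample rate and width. Merging two mono files into one stereo file should keep both audio payloads but only one header. The expected size is therefore 2N - 44, not 2N - 1000. The input size and the helper name are hypothetical.

```python
# Assumes plain PCM WAV with the minimal 44-byte header on the inputs AND the
# output, and identical sample rate / sample width throughout (hypothetical).
HEADER_BYTES = 44

def expected_merge_size(input_size: int, header: int = HEADER_BYTES) -> int:
    """Size of a stereo file built by merging two equal-size mono inputs."""
    payload = input_size - header   # audio bytes carried by each mono input
    return 2 * payload + header     # both payloads, interleaved, plus one header

n = 1_000_044                       # hypothetical input size in bytes
print(expected_merge_size(n))       # 2000044, i.e. 2*n - 44
print(2 * n - 1000)                 # 1999088, the pattern reported above
print(expected_merge_size(n) - (2 * n - 1000))  # 956 bytes unaccounted for
```

If sox writes a larger header than 44 bytes, the real shortfall in audio is even bigger than this estimate.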

The output file has misaligned audio: it sounds like I’m answering questions before they’ve been asked.
This makes me wonder whether sox is somehow stripping leading silence from one of the files, or doing some other funny business.

I know this is Asterisk (or raw, janky shell-script magic) and not FreePBX-specific. Still, this community has been exceptionally helpful on similar issues, so I wanted to pose it here.

  • If the input files are identical in size, should they be identical in duration?
  • Does it seem plausible that sox would strip silence from the call?
  • Any suggestions on how to fix this, so that I get a properly aligned output file?
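On the first question: the duration is frames divided by sample rate, both read from the header. Equal file sizes therefore imply equal durations only when header size, sample rate, sample width and channel count all match. Both legs here come from the same recording setup, so equal sizes probably do mean equal durations. That still says nothing about whether both legs *started* at the same moment. Below is a minimal sketch using Python’s standard `wave` module; the demo helper is hypothetical. It shows two files of identical size with different durations.

```python
import io
import wave

def wav_duration(f) -> float:
    """Duration in seconds, computed from the header: frames / sample rate."""
    with wave.open(f, "rb") as w:
        return w.getnframes() / w.getframerate()

def make_silent_wav(seconds: float, rate: int, width: int = 2, channels: int = 1) -> bytes:
    """Build an in-memory PCM WAV of silence (demo helper, not part of any setup)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(rate)
        w.writeframes(bytes(round(seconds * rate) * width * channels))
    return buf.getvalue()

a = make_silent_wav(2.0, 8000)    # 2 s of 16-bit mono at 8 kHz
b = make_silent_wav(1.0, 16000)   # 1 s of 16-bit mono at 16 kHz
print(len(a), len(b))             # 32044 32044
print(wav_duration(io.BytesIO(a)), wav_duration(io.BytesIO(b)))  # 2.0 1.0
```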

I assume you mean B, not b. Exactly 1 kB (1,000 bytes, as against 1 KB, i.e. 1,024 bytes) sounds odd.

However, .wav file overhead is not fixed at 44 bytes:

- There can be additional metadata chunks.
- The audio may be split across multiple chunks.
- If I correctly interpreted a quick scan of secondary sources, the metadata doesn’t have to come first.

I think you can assume that Asterisk always writes .wav files with 44 bytes of overhead, but I don’t know what sox does.
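To see what a given tool actually wrote, you can walk the RIFF chunks instead of assuming 44 bytes. The sketch below handles standard RIFF/WAVE files only. The empty LIST/INFO chunk in the demo is purely illustrative; it does not claim that sox writes one.

```python
import io
import struct
import wave

def list_chunks(data: bytes):
    """Return (chunk_id, offset, size) for each top-level chunk of a RIFF/WAVE file."""
    riff, _riff_size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks, pos = [], 12                  # first chunk follows the 12-byte RIFF header
    while pos + 8 <= len(data):
        cid, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((cid.decode("ascii", "replace"), pos, size))
        pos += 8 + size + (size & 1)      # chunk bodies are padded to an even length
    return chunks

def non_audio_bytes(data: bytes) -> int:
    """File bytes that are not sample data inside a 'data' chunk."""
    audio = sum(size for cid, _, size in list_chunks(data) if cid == "data")
    return len(data) - audio

# A minimal file written by Python's wave module: fmt + data only.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(bytes(1600))
plain = buf.getvalue()
print(list_chunks(plain))       # [('fmt ', 12, 16), ('data', 36, 1600)]
print(non_audio_bytes(plain))   # 44

# Same audio with an (illustrative, empty) LIST/INFO metadata chunk before the data.
extra = b"LIST" + struct.pack("<I", 4) + b"INFO"
tagged = plain[:36] + extra + plain[36:]
tagged = tagged[:4] + struct.pack("<I", len(tagged) - 8) + tagged[8:]  # fix RIFF size
print(non_audio_bytes(tagged))  # 56
```

Running `list_chunks(open("output.wav", "rb").read())` on the real recv, trans and output files would show whether the 1,000-byte gap is header or audio.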

Did you really mean to merge, rather than mix?

My guess is that Asterisk is allocating up to the next multiple of 512B or 1KB, but sox is stripping some of the padding.

Also, I wouldn’t expect the media to start simultaneously in both directions, and I wouldn’t expect Asterisk to pad out the later starter. I’d expect media to start from the UAS when it sends the 200 OK, and from the UAC when it sends the ACK. Several things could complicate this:

- what is happening upstream;
- the use of late-offer SDP;
- configurations that require comedia to learn the other party’s media address.

I’m not sure when ICE processing happens, but conceivably it adds delays to the media start. If MixMonitor is being used, it looks like the S option might help mitigate this.

Apologies for that.
In ls, the file size difference is exactly 1,000 bytes.

I do indeed believe I mean merge.
I want transmit in the left ear and receive in the right ear.
That said, I’m doing both mix and merge. I have the same audio issue (overlapping/misaligned) with both stereo (merge) and mono (mix).
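For reference, sox’s `-M` merges each input into its own channel, while `-m` mixes the inputs together. The minimal sketch below works on 16-bit samples, and the function names are hypothetical. It shows why both modes inherit the same misalignment: each one pairs sample i of one leg with sample i of the other. If one file’s first sample was captured later, both outputs are shifted. This sketch clips when mixing; sox’s own `-m` may rescale the input volumes instead.

```python
import array

def merge_stereo(left: array.array, right: array.array) -> array.array:
    """'Merge': interleave two mono 16-bit streams into L/R stereo frames."""
    if len(left) != len(right):
        raise ValueError("streams must have the same number of samples")
    out = array.array("h", [0]) * (2 * len(left))
    out[0::2] = left     # e.g. transmit -> left channel
    out[1::2] = right    # e.g. receive  -> right channel
    return out

def mix_mono(a: array.array, b: array.array) -> array.array:
    """'Mix': add the streams sample by sample, clipping to the 16-bit range."""
    return array.array("h", (max(-32768, min(32767, x + y)) for x, y in zip(a, b)))

tx = array.array("h", [100, 200, 300])
rx = array.array("h", [-1, -2, 32767])
print(merge_stereo(tx, rx).tolist())  # [100, -1, 200, -2, 300, 32767]
print(mix_mono(tx, rx).tolist())      # [99, 198, 32767]
```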

To make this a bit uglier, I’m recording by setting this:

MONITOR_REC_OPTION=br(${SS}{MIXMON_DIR}${SS}{YEAR}/${SS}{MONTH}/${SS}{DAY}/recv_${SS}{CALLFILENAME}.${SS}{MON_FMT})bt(${SS}{MIXMON_DIR}${SS}{YEAR}/${SS}{MONTH}/${SS}{DAY}/trans_${SS}{CALLFILENAME}.${SS}{MON_FMT})

Specifically, I mean the br and bt options, which should mean “don’t start until the call is bridged”. A few ms either way I don’t really care about, but I’m hearing reports of roughly 1 to 1.5 seconds of misalignment.
It also appears to favor our agent, if that is relevant: our audio is roughly 1 to 1.5 seconds earlier than the remote party’s audio.
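It helps to compare the two numbers in the same units. The sketch below assumes the recordings are 8 kHz, 16-bit PCM and that the merged output is stereo. That format is an assumption, not something stated in the thread. Under it, the roughly 1,000-byte size gap in the merged file amounts to only about 31 ms of audio. A 1 to 1.5 second offset would take tens of kilobytes. The size gap alone therefore seems unlikely to explain the misalignment. The legs probably began capturing at different times.

```python
def bytes_to_ms(nbytes: int, rate: int = 8000, width: int = 2, channels: int = 2) -> float:
    """Playback time (ms) represented by nbytes of PCM audio; format is assumed."""
    return 1000 * nbytes / (rate * width * channels)

def seconds_to_bytes(seconds: float, rate: int = 8000, width: int = 2, channels: int = 2) -> int:
    """PCM bytes needed to hold the given duration; format is assumed."""
    return round(seconds * rate) * width * channels

print(bytes_to_ms(1000))          # 31.25
print(seconds_to_bytes(1.0))      # 32000
print(seconds_to_bytes(1.5))      # 48000
```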

MixMonitor?

  • S - When combined with the r or t option, inserts silence when necessary to maintain synchronization between the receive and transmit audio streams.
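The idea behind S is to put silence where one stream has nothing yet, so both stay on a common timeline. The offline sketch below illustrates that idea; it is not MixMonitor’s actual implementation. It assumes you know when each leg’s first sample was captured, for example from logs or RTP timestamps. A plain WAV header carries no start time. The function name and parameters are hypothetical.

```python
import array

def align_by_start(tx: array.array, rx: array.array,
                   tx_start: float, rx_start: float, rate: int = 8000):
    """Prepend silence to whichever stream started later so that sample i of
    both streams covers the same instant, then pad the shorter tail with silence."""
    offset = round(abs(tx_start - rx_start) * rate)   # samples of leading silence
    silence = array.array("h", [0]) * offset
    if tx_start < rx_start:
        rx = silence + rx
    else:
        tx = silence + tx
    n = max(len(tx), len(rx))
    tx = tx + array.array("h", [0]) * (n - len(tx))
    rx = rx + array.array("h", [0]) * (n - len(rx))
    return tx, rx

# Toy example at 8 samples/s: receive leg started 0.25 s after transmit leg.
tx, rx = align_by_start(array.array("h", [1, 2, 3, 4]),
                        array.array("h", [5, 6, 7, 8]),
                        tx_start=0.0, rx_start=0.25, rate=8)
print(tx.tolist())  # [1, 2, 3, 4, 0, 0]
print(rx.tolist())  # [0, 0, 5, 6, 7, 8]
```

Once the two streams are aligned this way, merging or mixing them pairs up audio from the same instant.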

Curious. Do you have any fleshed-out examples of this?

No, but I discard empty WAV files (44 bytes). I have always had the ‘S’ option added and never had a complaint. Actually, I am now using ffmpeg, which can use timestamps.
