AWS Polly Recordings .mp3

I am importing system recordings originally done in Amazon Polly. Currently I export them as .mp3 (22050Hz) files, before importing them as .wav in the FreePBX System Recordings GUI.

I notice when playing the file back the quality goes down somewhat. In System Recordings, is there another format that would sound a little better?

Polly also lets me export as:
.mp3: 22050Hz, 16000Hz, 8000Hz
.ogg: 22050Hz, 16000Hz, 8000Hz
.pcm: 16000Hz, 8000Hz

Thanks for any insight here!

Using PCM negates any transcoding needs; it's the lowest-cost format for Asterisk.

@dicko does the sample rate matter?

one of thousands of explanations

I’m assuming that your target audience is external callers. If they are on landlines, the G.711 codec has an 8000 Hz sample rate. Possibly, your trunking provider and their CLEC support and use a codec, e.g. G.722 or Opus, at 16000 Hz on calls from mobiles connected via VoLTE “HD voice”. At present, this is hit-or-miss. If your business deals with consumers, most callers are on mobile and you may wish to do some testing to see if 16 kHz is useful in your situation.

Conversion to a lower sample rate is a significant, unavoidable quality hit, because the higher frequencies disappear. The theoretical limit is half the sample rate; practical systems achieve ~3500 Hz max for G.711.
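To put numbers on the "half the sample rate" limit, here is a small sketch of the arithmetic for the rates Polly offers. The function name `nyquist_hz` is made up for illustration, and these are theoretical ceilings only; real codecs pass somewhat less.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Theoretical highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# Sample rates mentioned in this thread (Polly export options).
for rate in (8000, 16000, 22050):
    print(f"{rate} Hz sampling -> content up to {nyquist_hz(rate):.0f} Hz at best")
```

So a 22050 Hz file played over an 8000 Hz G.711 call loses everything above 4000 Hz in theory, and a bit more in practice.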

I would avoid the .mp3 and .ogg, because those are compression formats with additional quality loss. The simplest solution is to get the 8 kHz PCM from Amazon (as well as the 16 kHz PCM, if applicable to your system).
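If the upload step wants a .wav rather than a bare .pcm file, the raw PCM can be wrapped in a WAV header without touching the samples. The sketch below assumes Polly's PCM output is headerless signed 16-bit little-endian mono, which is how Amazon's documentation describes it; the function name and file names are hypothetical, and the sample rate must match whatever you requested from Polly.

```python
import wave


def pcm_to_wav(pcm_bytes: bytes, wav_path: str, sample_rate: int = 8000) -> None:
    """Wrap raw 16-bit mono PCM in a WAV container (no resampling, no re-encoding)."""
    if len(pcm_bytes) % 2:
        raise ValueError("expected whole 16-bit samples (even number of bytes)")
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)            # assumption: mono
        wav.setsampwidth(2)            # assumption: 16-bit samples
        wav.setframerate(sample_rate)  # must match the rate requested from Polly
        wav.writeframes(pcm_bytes)
```

Usage would look something like `pcm_to_wav(open("greeting.pcm", "rb").read(), "greeting.wav", 8000)`. Because the audio data is copied as-is, this step adds no quality loss of its own.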

If you are very fussy:

Download Amazon’s files to your local PC and edit them with Audacity (or a commercial audio editor if you have one), adjusting level, equalization, etc.

Also, consider using a human voice, which will sound more natural and pleasant, even if it’s not more intelligible. Listen to samples at https://www.fiverr.com/gigs/search?query=ivr . Many of these freelancers will do 100 words for as little as $6 and actually sound pretty good. At the high end, there are some professional voice actresses on the site, typically charging $120 for the first 200 words.

Maybe sometime in the future G.722 will be universal, but that time has not yet come. Use the traditional 64 kbit/s G.711 codec standard, ulaw or alaw. It is a 70-odd-year-old standard that won’t change anytime soon, and 99.99% of your calls use it now. Unless you want to ‘boldly go’, don’t mess with it.
