This has been working like clockwork (no pun intended) since mid-2017, but since last week, when I applied the two most recent available module updates (which I believe were security fixes), it has stopped working and started appending hundreds and hundreds of lines like the following to the end of the call file…
If I delete the link from the spool directory and clean up all of that crap at the bottom, it simply does the same thing the next hour. Strangely, though, if I delete the link and then manually run the shell script from the command line, I get the time announcement just fine. The automated version still fails in the same way on the next invocation, though.
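A possible explanation, offered as a hypothesis rather than a diagnosis: ln without -s creates a hard link, so the entry in /var/spool/asterisk/outgoing/ and /var/lib/asterisk/callfiles/clock.call are two names for the same file on disk. Anything appended through the spooled name also shows up in the source, and deleting the link from the spool removes only one name. The sketch below demonstrates this in a throwaway directory; the file names, the Channel line and the StartRetry values are made up for illustration.

```shell
#!/bin/sh
# Demo: a hard link (ln without -s) is a second name for the same file,
# so appending through one name changes what the other name shows.
# Runs in a throwaway directory; names and contents are illustrative only.

hardlink_append_demo() {
    demo_dir=$(mktemp -d) || return 1
    src="$demo_dir/clock.call"     # stands in for the pristine source call file
    spool="$demo_dir/outgoing"     # stands in for the outgoing spool directory
    mkdir "$spool"

    printf 'Channel: Local/s@example\n' > "$src"    # made-up call-file content
    ln "$src" "$spool/"                              # hard link, like the original cron script

    # Simulate a retry line being appended to the spooled entry (values invented).
    printf 'StartRetry: 1234 1 (1500000000)\n' >> "$spool/clock.call"

    # Deleting the spooled name does not undo the change to the shared file.
    rm "$spool/clock.call"

    # Count the retry lines now present in the source file, then clean up.
    count=$(grep -c '^StartRetry:' "$src")
    rm -rf "$demo_dir"
    echo "$count"
}

echo "StartRetry lines left in the source: $(hardlink_append_demo)"
```

If the real system behaves the same way, every retry line written to the spooled entry would persist in the source and be carried into the next hourly run.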
I can only assume that, since it had been working from mid-2017 until these two module updates were installed last week, something changed in those updates is causing this. I have no idea how to troubleshoot beyond what I’ve done, though.
And Asterisk can obviously read and write to it, since it’s tacking on all of the retry stuff.
I’ve again sanitized the original callfile to remove the StartRetry line that got added by the last manual test. I’ll see if I get a single announcement at 10:00 when the cron job does its thing, followed by failures again. It almost seems as if something has changed that causes that StartRetry line to be written into the call file where perhaps it wasn’t before.
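The manual clean-up could be scripted rather than done by hand. This is a sketch under two assumptions: GNU sed is available (for in-place -i without a backup suffix), and the only unwanted additions are lines beginning with StartRetry: or DelayedRetry:, the two names seen in this thread. The function name strip_retry_lines is hypothetical.

```shell
#!/bin/sh
# Sketch: remove retry bookkeeping lines that were appended to a call file.
# Assumes GNU sed and that only StartRetry:/DelayedRetry: lines need to go.
# strip_retry_lines is a hypothetical helper name.

strip_retry_lines() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # Delete whole lines that begin with either key; all other lines are kept as-is.
    sed -i -E '/^(StartRetry|DelayedRetry):/d' "$file"
}
```

For example, after removing the link from the spool directory: strip_retry_lines /var/lib/asterisk/callfiles/clock.call. GNU sed -i writes a new file and renames it over the old one, so a hard link still sitting in the spool would keep pointing at the old, unstripped contents.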
It fires off all the call files in the queue and will immediately show wtf is going on (you won’t have to wait until the cuckoo tweets), ideally with only one pending file, so there’s only one call to watch.
My 10am announcement did work, but nothing after that. I tried your touch command and got a big old nothing.
I took a deeper look, and two things seem to be happening:
a) the StartRetry and DelayedRetry lines are being written into the callfile
b) the callfile is not getting deleted from the spool directory once it executes the first time
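Both symptoms can be checked in one go. The sketch below reports how many StartRetry:/DelayedRetry: lines a call file has accumulated, its hard-link count (2 or more means a link is still sharing the same file), and whether an entry with the same name is still in the spool directory. It assumes GNU coreutils stat (for -c %h); check_callfile is a hypothetical name.

```shell
#!/bin/sh
# Sketch: report the two symptoms for one call file.
#  a) how many StartRetry:/DelayedRetry: lines the file has accumulated
#  b) whether an entry with the same name is still in the spool directory
# Also prints the hard-link count. Assumes GNU stat; check_callfile is hypothetical.

check_callfile() {
    src=$1          # source call file to inspect
    spool_dir=$2    # outgoing spool directory to look in
    name=$(basename "$src")

    # grep -c prints 0 (and exits 1) when nothing matches; only the count is used.
    retries=$(grep -c -E '^(StartRetry|DelayedRetry):' "$src")
    echo "retry lines in $src: $retries"
    echo "hard links to $src: $(stat -c %h "$src")"

    if [ -e "$spool_dir/$name" ]; then
        echo "still spooled: $spool_dir/$name"
    else
        echo "not in spool: $spool_dir/$name"
    fi
}
```

For example: check_callfile /var/lib/asterisk/callfiles/clock.call /var/spool/asterisk/outgoing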
I have managed to work around the issue by changing the script that’s being called by cron. Based on some notes I found in the callfile documentation online, I replaced the ‘ln’ of the callfile with a copy of the source callfile into /tmp, followed by a move of that copy into the spool directory. Even though it’s an extra step, it guarantees a pristine callfile goes into the spool whenever the script runs, regardless of whatever might have goofed up on the previous run.
So essentially I’ve replaced

ln /var/lib/asterisk/callfiles/clock.call /var/spool/asterisk/outgoing/

with

cp -ap /var/lib/asterisk/callfiles/clock.call /tmp/cf && mv /tmp/cf /var/spool/asterisk/outgoing/
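The same workaround can be wrapped in a small function. This is a hedged variation, not the exact script used here. The Asterisk call-file documentation advises creating the file outside the spool and then moving it in, and mv is only a single atomic rename when source and destination are on the same filesystem; if /tmp is a separate filesystem (tmpfs, for example), mv falls back to copy-and-delete. So this sketch stages the copy in a directory assumed to share the spool’s filesystem. cp -p keeps the source’s modification time, which matters because Asterisk uses a call file’s mtime as the time to process it (a future mtime delays the call). The function name and staging directory are hypothetical, and GNU stat is assumed.

```shell
#!/bin/sh
# Sketch of the copy-then-move workaround as a reusable function.
# Assumptions: GNU coreutils (stat -c %d) and a staging directory on the
# same filesystem as the spool. spool_callfile and the staging path are hypothetical.

spool_callfile() {
    src=$1          # pristine call file, never handed to Asterisk directly
    spool_dir=$2    # Asterisk outgoing spool directory
    stage_dir=$3    # scratch directory, ideally on the same filesystem as spool_dir
    name=$(basename "$src")

    # A rename is only atomic within one filesystem; warn if that does not hold.
    if [ "$(stat -c %d "$stage_dir")" != "$(stat -c %d "$spool_dir")" ]; then
        echo "warning: $stage_dir and $spool_dir are on different filesystems" >&2
    fi

    # Fresh copy under a unique name, so Asterisk never touches the source inode.
    tmp=$(mktemp "$stage_dir/$name.XXXXXX") || return 1
    # -p keeps mode and mtime; an mtime in the past means "process now".
    if ! cp -p "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Move the complete file into the spool in one step.
    mv "$tmp" "$spool_dir/$name"
}
```

For example: spool_callfile /var/lib/asterisk/callfiles/clock.call /var/spool/asterisk/outgoing /var/spool/asterisk/staging (the staging directory is hypothetical and would need to exist and be writable by the user cron runs the script as).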
It would be nice to know what suddenly changed to cause this, but at least the workaround has the system working again. I’ll have to dig in and verify that the other things on the system that rely on callfiles aren’t equally broken, but that’s a job for another day.