Scheduled call file suddenly stopped working

mcisar · December 17, 2019, 4:06pm

I have a system running 10.13.66-22. No configuration changes to the system in at least 2 years, but updated relatively regularly so it is up-to-date.

The system is configured to do a multicast page announcing the time once per hour by calling this shell script…

#!/bin/bash
ln /var/lib/asterisk/callfiles/clock.call /var/spool/asterisk/outgoing/;

The contents of the call file are thus…

channel: Local/397@from-internal
application: SayUnixTime
data: ,,\'silence/2\' \'current-time-is\' IMp
maxretries: 0
retrytime: 60
waittime: 30
callerid: Talking Clock <60>
priority: 1
Setvar: ALERT_INFO=<Bellcore-dr4>

This has been working like clockwork (no pun intended) since mid-2017. Last week I applied the two most recent available module updates (which I believe were security fixes), and since then it has stopped working and has started appending hundreds and hundreds of lines like the following to the end of the call file…

StartRetry: 2620 1 (1576568623)
DelayedRetry: 2620 0 (1576591203)
DelayedRetry: 2620 0 (1576591263)
DelayedRetry: 2620 0 (1576591323)
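
For anyone reading these lines later, here is a minimal sketch for measuring the spacing between retries. It assumes the number in parentheses is a Unix epoch timestamp; `retry_gaps` is a made-up helper name, not an Asterisk tool.

```shell
#!/bin/bash
# Print each status line's timestamp and the gap since the previous one.
# Assumption: the value in parentheses is a Unix epoch timestamp.
retry_gaps() {
    local prev="" kind ts
    while read -r kind _ _ ts; do
        ts=${ts#"("}                          # strip the leading parenthesis
        ts=${ts%")"}                          # strip the trailing parenthesis
        [[ $ts =~ ^[0-9]+$ ]] || continue     # skip anything that is not a status line
        if [[ -n $prev ]]; then
            echo "$kind $ts +$((ts - prev))s"
        else
            echo "$kind $ts"
        fi
        prev=$ts
    done
}

retry_gaps <<'EOF'
StartRetry: 2620 1 (1576568623)
DelayedRetry: 2620 0 (1576591203)
DelayedRetry: 2620 0 (1576591263)
DelayedRetry: 2620 0 (1576591323)
EOF
```

On the sample above, the three DelayedRetry entries come out 60 seconds apart, which lines up with the retrytime: 60 setting in the call file.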

If I delete the link from the spool directory and clean up all of that crap from the bottom of the call file, it simply does the same thing the next hour. Strangely, however, if I delete the link and then manually run the shell script from the command line, I get the time announcement just fine. The automated version still fails in the same way on the next invocation, though.

I can only assume, since it had been working from mid-2017 until these two module updates were installed last week, that something changed within those updates to cause this. I have no idea how to troubleshoot beyond what I’ve done, though.

Any ideas?

Mike

dicko · December 17, 2019, 4:15pm

Check that your system clock is correct

mcisar · December 17, 2019, 4:15pm

Just a quick update… have noticed that even when running the script manually I get a

StartRetry: 2620 1 (1576599220)

line added to the bottom of the call file, and this prevents even the manual run from operating a second time until the call file is sanitized.

Mike

mcisar · December 17, 2019, 4:18pm

Yes, the clock is fine… and the callfile link is being made in the spool directory at the time it should be, it just never fires once there.

dicko · December 17, 2019, 4:19pm

Ownership?

mcisar · December 17, 2019, 4:26pm

644 asterisk asterisk

and Asterisk can obviously read/write to it, since it’s tacking on all of the retry stuff.

I’ve again sanitized the original callfile to remove the StartRetry that got added by the last manual test… will see if I get a one-time at 10:00 when the cron job does its thing and then followed by failures again. Almost seems as if something’s happened that is causing that StartRetry to get written into the call file where perhaps it wasn’t previously.

dicko · December 17, 2019, 4:42pm

From the asterisk cli run

! touch /var/spool/asterisk/outgoing/*

mcisar · December 17, 2019, 4:47pm

Will try that once I get back to my office… what does that command accomplish?

dicko · December 17, 2019, 6:58pm

It fires off all the call files in the queue and will immediately show wtf is going on (you won’t have to wait until the cuckoo tweets), ideally only one pending file so only one call to watch.
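
As far as I understand it, Asterisk holds a call file back while its modification time is in the future, which is why the clock check and the touch go together. A rough sketch for spotting such files, assuming that behaviour; `future_callfiles` is a hypothetical helper name.

```shell
#!/bin/bash
# List files in a directory whose modification time is later than "now".
# future_callfiles is a made-up helper, not part of Asterisk.
future_callfiles() {
    local dir=$1 ref
    ref=$(mktemp) || return 1            # freshly created, so its mtime is "now"
    find "$dir" -type f -newer "$ref"    # anything newer than the reference file
    rm -f "$ref"
}

# Intended use on the PBX (path taken from the touch command above):
# future_callfiles /var/spool/asterisk/outgoing
```

No output means nothing in the spool is stamped in the future, so a file that never fires is being held up by something else.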

mcisar · December 18, 2019, 6:39am

My 10am announcement did work, but nothing after that. Tried your touch command and got a big old nothing happening.

I took a deeper look and two things seem to be happening:
a) the StartRetry and DelayedRetry lines are being written into the callfile
b) the callfile is not getting deleted from the spool directory once it executes the first time
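
Point (a) is what you would expect from a hard link: ln without -s gives the same file a second name, so anything appended to the spool entry is appended to the master as well. A small sketch that shows this in a throwaway directory; the file names are made up and the printf only stands in for whatever Asterisk appends.

```shell
#!/bin/bash
# Hard-link demo in a throwaway directory; all file names here are made up.
demo=$(mktemp -d)
printf 'channel: Local/397@from-internal\n' > "$demo/master.call"

# Same kind of link the cron script creates (ln without -s is a hard link).
ln "$demo/master.call" "$demo/spooled.call"

# Stand-in for the status line that gets appended to the spooled file.
printf 'StartRetry: 2620 1 (1576568623)\n' >> "$demo/spooled.call"

# The master was never written to directly, yet it now holds the line.
linked_count=$(grep -c '^StartRetry' "$demo/master.call")

# Deleting the link removes one name for the file, not the appended data.
rm "$demo/spooled.call"
after_rm_count=$(grep -c '^StartRetry' "$demo/master.call")

echo "StartRetry lines in master: $linked_count (linked), $after_rm_count (after rm)"
rm -r "$demo"
```

That matches what I was seeing: removing the link from the spool never cleaned anything up, and the master had to be sanitized by hand each time.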

I have managed to work around the issue by changing the script that’s being called by cron. Based on some notes I found in some of the callfile documentation online, I replaced the ‘ln’ of the callfile with a copy of the source callfile into /tmp, followed by a move of that copy into the spool directory. Even though it’s an extra step, it guarantees that we’ll have a pristine callfile going into the spool whenever the script runs, regardless of whatever might have goofed up on the previous run.

So essentially I’ve replaced
ln /var/lib/asterisk/callfiles/clock.call /var/spool/asterisk/outgoing/

with
cp -ap /var/lib/asterisk/callfiles/clock.call /tmp/cf && mv /tmp/cf /var/spool/asterisk/outgoing/
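
The same idea as a reusable function, as a rough sketch only. `spool_callfile` is a made-up name, and unlike the one-liner it uses mktemp for a unique staging name and keeps the master's base name in the spool.

```shell
#!/bin/bash
# Copy a pristine master call file to a staging name, then move it into the spool.
# spool_callfile is a hypothetical helper, not an Asterisk tool.
spool_callfile() {
    local src=$1 spool=$2 tmp
    tmp=$(mktemp "${TMPDIR:-/tmp}/callfile.XXXXXX") || return 1
    # Work on a copy so the master itself is never modified.
    cp -p "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    # The spool only ever sees a complete file under its final name.
    mv "$tmp" "$spool/$(basename "$src")"
}

# Intended use from cron (paths from the lines above):
# spool_callfile /var/lib/asterisk/callfiles/clock.call /var/spool/asterisk/outgoing
```

One caveat: if /tmp and the spool sit on different filesystems, mv falls back to copy-and-delete rather than a plain rename, so staging in a directory on the same filesystem as the spool would be the more cautious choice.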

It would be nice to know what changed to cause this suddenly, but at least the workaround has the system working again. I’ll have to dig in and verify that some of the other things on the system that rely on callfiles are not equally broken, but that’s a job for another day.
