Voice pjsip quality issue

OK, I will update this tomorrow. All the IP phones have an AC adapter and are connected to a 48-port Quanta L4B gigabit enterprise switch with every possible VoIP enhancement enabled. Each Fanvil is then connected to a PC through its dedicated port. But I repeat, I have good quality inside my network; only sometimes there is a little delay, but it is acceptable and I think it is related to the bad quality of some Ethernet cables.

So I made 3 captures directly from the utilities menu of the same Fanvil:

  • an echo test (there is a lot of jitter but no lost packets; at first there is no delay, at around 30 s I get about 1/2 s of delay, and this disappears after another 30 s)
    echot_test.tgz (478,1 KB)
  • an external incoming call (quality is about 60% of what I would expect from a good call)
    external_inbound.tgz (1019,8 KB)
  • an internal call between 2 Fanvil endpoints (here the quality is OK)
    internal_call.tgz (339,3 KB)

There seems to be a bug in Fanvil’s packet capture logic, causing the packet timestamps (not the RTP timestamps) to jump forward by one second and then back a few packets later. This occurs once per second.

For example, in the internal call capture, packet 75 has a timestamp 1020 ms after packet 74.

Unfortunately, that is causing Wireshark’s analysis to return bogus results.

Do you have any more recent firmware that may have fixed this? If not, I’ll try to come up with a way to adjust the files.
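
A rough way to spot this kind of jump in a list of capture timestamps is sketched below in Python. The function name, the 0.5 s threshold and the sample timestamps are illustrative assumptions, not values read from the attached captures:

```python
def find_timestamp_jumps(times, max_delta=0.5):
    """Return (index, delta) pairs where the capture timestamp jumps
    forward by more than max_delta seconds or goes backwards."""
    jumps = []
    for i in range(1, len(times)):
        delta = times[i] - times[i - 1]
        if delta > max_delta or delta < 0:
            jumps.append((i, round(delta, 3)))
    return jumps


# Made-up timestamps (seconds) for packets sent every 20 ms, with a
# one-second jump forward and a jump back two packets later.
sample = [0.000, 0.020, 0.040, 1.060, 1.080, 0.100, 0.120]
print(find_timestamp_jumps(sample))  # [(3, 1.02), (5, -0.98)]
```

Each flagged index is a packet whose capture time is not plausible relative to the previous one, which is why Wireshark's jitter and delta figures cannot be trusted for such a file.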

Unfortunately, all the Fanvils are already on the latest version, which they sent me a few weeks ago (Fanvil provides firmware updates by email; the official site is not updated).

Can you use the port monitor/mirror features on the Quanta so your PC can ‘see’ all the packets to and from one Fanvil and capture them with Wireshark?

Or, if that’s tough, post a capture from the PBX.

Thank you for your support. I’m sorry again, but unfortunately I cannot find anything like tcpdump in the telnet interface of these switches… As you suggested, I’m attaching the tcpdump captures taken directly on my FreePBX. I used the command

tcpdump -s0 -T rtp -w /tmp/external.pcap -C50 udp and host 192.168.25.176

where 192.168.25.176 is the Fanvil on my desk. Again, the archives are for the echo test, an internal call, and an external incoming call.
Many thanks
external.tgz (1,2 MB)
echo.tgz (420,6 KB)
internal.tgz (354,8 KB)

The RTP stream from ehiweb has some (relatively minor) anomalies, but it appears that the Fanvil phones handle them poorly.

The above graph shows the delta in arrival times from the trunk (actually, transmission times to the phone). It appears as if there are 11 lost packets and 7 duplicates, and the RTP timestamps are consistent with that. However, the sequence numbers are generated by Asterisk and there are none missing or duplicated.

I strongly suspect that the Fanvil is using sequence numbers rather than timestamps to control its jitter buffer, so these events look like sudden changes in latency. The phone tries to adjust by speeding up or slowing down playback, causing the distorted sound you hear.
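
To illustrate the mismatch, here is a minimal Python sketch that compares RTP sequence numbers with RTP timestamps. It assumes 160 samples per packet (20 ms of 8 kHz audio), ignores timestamp wraparound, and uses made-up packets rather than data from the capture:

```python
SAMPLES_PER_PACKET = 160  # assumption: 20 ms of 8 kHz audio per packet


def compare_seq_and_timestamps(packets):
    """packets: list of (sequence_number, rtp_timestamp) in arrival order.
    Counts sequence gaps, and packets that look missing or repeated
    when judged by the RTP timestamps alone."""
    result = {"seq_gaps": 0, "ts_missing": 0, "ts_repeated": 0}
    for (seq_a, ts_a), (seq_b, ts_b) in zip(packets, packets[1:]):
        # Sequence numbers are 16 bits wide and wrap around.
        if (seq_b - seq_a) % 65536 != 1:
            result["seq_gaps"] += 1
        ts_step = ts_b - ts_a  # timestamp wraparound ignored for brevity
        if ts_step == 0:
            result["ts_repeated"] += 1
        elif ts_step > SAMPLES_PER_PACKET:
            result["ts_missing"] += ts_step // SAMPLES_PER_PACKET - 1
    return result


# Made-up stream: sequence numbers are contiguous, but the timestamps
# skip one packet's worth of audio and then repeat a value.
stream = [(1, 0), (2, 160), (3, 480), (4, 480), (5, 640)]
print(compare_seq_and_timestamps(stream))
# {'seq_gaps': 0, 'ts_missing': 1, 'ts_repeated': 1}
```

A receiver that looks only at the sequence numbers sees a perfect stream here, while one that uses the timestamps sees one lost and one duplicated packet.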

We should capture some RTP on the trunk side, which may give a clue as to whether this is an ehiweb issue or is caused by the connectivity between your server and theirs. I suspect the former, because duplicate packets are normally quite rare (though if there is some hardware in your connection that automatically retransmits unacknowledged packets, lost acks will result in duplicates).

If you want to test with alternate providers, one option is a free number from MessageNet. It’s limited to one channel but should be adequate for testing. For outgoing, look at http://www.anveodirect.com/ (US based) or https://www.voxbeam.com/ (UK based). Both offer a small credit at signup, so you can test without making a payment.

OK, thanks for your analysis, you were very clear. For the moment it is really difficult for me to run this kind of test, because this is a production environment, and even if I try another provider I cannot be sure of the results until I try with a large number of incoming calls. I will certainly try to get a different IP phone model, like a Cisco or something else, and recheck the RTP stream for differences. Just one question: as I already told you, internal calls and the echo test have really good quality, but I have this strange delay that slowly disappears. I mean, for the first 20 seconds the echo test is perfect, 1 to 1; then at around 20 seconds a delay is inserted, and at around 50 seconds it slowly disappears. Do you think this is related? And is there a possible fix? I’ve always thought this was a poor-quality cable problem, but honestly this doesn’t happen with the softphone. Maybe these Fanvils were the wrong choice… or at least they are not so friendly with FreePBX.

But in your second post you implied that 70% of your external calls were bad. If ten consecutive calls to a test number on another provider are all good, the probability of that occurring by chance is only ~0.0006%.
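
The arithmetic behind that figure, as a small Python sketch. It assumes each call is independent and has the same 70% chance of being bad; the function name is only illustrative:

```python
def chance_all_good(p_bad, n_calls):
    """Probability that n_calls independent calls are all good,
    when each call is bad with probability p_bad."""
    return (1 - p_bad) ** n_calls


p = chance_all_good(0.70, 10)
print(f"{p * 100:.4f}%")  # 0.0006%
```

In other words, 0.3 multiplied by itself ten times is about 0.0000059, so ten good calls in a row would be strong evidence that the other provider behaves differently.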

Anyhow, you should at least get a capture of RTP from the trunk on a bad call, as there may be data implicating the provider or the network. Tests with MTR or other network tools may also be useful.

A sudden increase in delay, followed by a gradual decrease, is the normal response of the ‘adaptive jitter buffer’ in most VoIP phones to receiving one or more RTP packets after they were scheduled to be played. While it’s plausible that packets could be occasionally delayed by Asterisk or by your LAN, the fact that it always occurs ~20 seconds into the call leads me to believe that it’s a Fanvil bug. As a test, you could set up a second SIP account on one phone, pointing to the IP address of another. You could then call from one to the other, bypassing the PBX. With a small workgroup switch, you could also bypass the main LAN.
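
As a toy illustration of that jump-then-decay behaviour, here is a minimal Python sketch. It is not Fanvil's algorithm (which is not known here); the base delay, the decay step and the lateness values are all assumptions chosen for the example:

```python
def simulate_playout_delay(lateness_ms, base_ms=40, decay_ms=5):
    """Toy adaptive jitter buffer.
    lateness_ms: how late each packet arrives, in ms.
    Returns the playout delay in effect after each packet."""
    delay = base_ms
    history = []
    for late in lateness_ms:
        if late > delay:
            # Packet missed its playout slot: grow the buffer at once.
            delay = late
        elif delay > base_ms:
            # Packets are on time again: shrink the buffer slowly.
            delay = max(base_ms, delay - decay_ms)
        history.append(delay)
    return history


print(simulate_playout_delay([10, 10, 240, 10, 10, 10]))
# [40, 40, 240, 235, 230, 225]
```

One late packet is enough to push the delay up in a single step, and it then takes many on-time packets to work back down, which matches the pattern described in the echo test.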

Yes, you are right, that is a good point. I will try with Messagenet in the afternoon.

Sorry, can you clarify this? What do you mean by “pointing to the IP address of another”?

Honestly, as you said from the beginning, I think this problem is related to Fanvil bugs; the simple fact that the problems disappear with a Cisco or X-Lite proves it.
I wrote to their support reporting all the analysis from this thread, and they replied that they will investigate.
I will never buy Fanvil again; unfortunately it wasn’t my choice.

You set up a second SIP account on phone A, setting SIP Proxy Server Address to the IP address of phone B. Also, set Dial Without Registered. Phone B is of course not a server so registration will fail. However, it should still be possible to dial B’s number from the second account on A and get connected, without the PBX being involved at all. You can check latency by speaking into the mic on A and listening on B, or vice-versa. If you observe the same varying latency that your internal calls now show, you will have eliminated the PBX as a possible cause and demonstrated that it’s a Fanvil bug.

Now I understand, and it would be very helpful. Unfortunately I think this kind of feature is not available on Fanvil; I tried to set it up like this with no success.

where 192.168.25.173 is a phone in the same local network.
Meanwhile, Fanvil’s support asked me for 2 extensions (through OpenVPN) to test in their lab.

Try putting 192.168.25.173 in the Server Name field (in addition to the SIP Proxy Server Address field).

Did you get the Messagenet trunk working?

Unfortunately this doesn’t work, even when putting it in the Server Name field (I tried everything). As for Messagenet, I just tried it with the same results, only a little better with G.722 (ehiweb doesn’t support G.722). But I think that at this point it’s clear that the problem is related to poor-quality Fanvil firmware: I just tried an old Cisco SPA514G, and there is no delay in the echo test after 30 seconds and the voice quality is much better. Unfortunately I have only one of these Ciscos, so I can’t test internal calls. Anyway, I really want to thank you for your support and explanations.
Regards

Unless these Fanvil guys are real idiots, transmission is probably ok. So on an internal call between Cisco and Fanvil, what you hear on the Cisco is likely realistic.

Some thoughts on possibly salvaging the existing phones:

Configure a Fanvil to register to Messagenet directly. (Remove the Messagenet trunk on the PBX so it doesn’t hijack the calls.) Check incoming call quality. If it’s much better, my theory about sequence numbers is probably correct; there may be a way to get Asterisk to “clean up” the incoming RTP so the Fanvil will render it reasonably well.

You may want to investigate the packet loss in more detail. If caused by your router or LAN, it may be easy to fix. If there are errors on your fiber line, you may be able to get the ISP to fix it.

I tried a ping test to voip.vivavox.it (from Paris) and didn’t see a single packet lost:

Ping statistics for 83.211.227.21:
    Packets: Sent = 2771, Received = 2771, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 18ms, Maximum = 37ms, Average = 19ms

After days of research I ran some new tests:

  • tried isolating 2 Fanvils directly connected to a FreePBX desktop through a dedicated switch
  • tried isolating 2 Fanvils directly connected to a 3CX desktop through a dedicated switch
  • tried isolating 2 Fanvils directly connected to a FreeSWITCH desktop through a dedicated switch

3 different technologies on dedicated hardware, and I always have the following problem: after exactly 30 s the ping from the PBX to the Fanvil increases from 0.5 ms to 240 ms, and obviously an annoying delay is inserted into the call. If the call is between internal extensions, the voice quality is simply preserved and the audio slowly resyncs after some minutes; if the call is external, voice quality is affected. As I already said, this ping delay always happens, on several networks and with different PBX technologies, and it never happens with any other kind of endpoint. Fanvil doesn’t want to acknowledge this bug and tells me it is related to my network setup, even after seeing screenshots of the ping! I will never buy Fanvil phones again; this is obviously a firmware bug.
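
For anyone who wants to check their own ping results for the same step, a minimal Python sketch follows. The function name, the 10x threshold and the sample round-trip times are assumptions for illustration; only the 0.5 ms and 240 ms figures come from the test above:

```python
def first_latency_step(samples, factor=10.0, baseline_count=5):
    """samples: list of (elapsed_seconds, rtt_ms) in time order.
    Returns the first sample whose RTT exceeds `factor` times the
    average of the first `baseline_count` samples, or None."""
    baseline = sum(rtt for _, rtt in samples[:baseline_count]) / baseline_count
    for elapsed, rtt in samples:
        if rtt > baseline * factor:
            return elapsed, rtt
    return None


# Made-up ping results taken every 5 s during a call.
pings = [(0, 0.5), (5, 0.5), (10, 0.6), (15, 0.5), (20, 0.4),
         (25, 0.5), (30, 240.0), (35, 230.0)]
print(first_latency_step(pings))  # (30, 240.0)
```

Running this against pings to different endpoint models on the same switch would show whether the step at 30 s is specific to the Fanvil.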

This certainly appears to be a Fanvil issue. Have you put a different phone on the same network to test? As someone who oversaw our phones and their development, I can tell you it’s very complicated; phone CPUs never have enough horsepower and are always trying to balance out CPU resources.

Yes, I think it is CPU related too… but if this happens exactly after 30 seconds, then there must be something happening after those 30 seconds that overloads the CPU, like a keep-alive process or something similar. Unfortunately I wasted a lot of time searching for the right settings with no results… Only keep alive = 30s seems to affect the delay a little, so I set keep alive = 60s and the delay is a little lower (ping goes from 200-400 ms down to 150 ms).

I tried sending the Fanvil’s syslog to my log server with level critical/error, and exactly after 30 seconds I got these errors:

06-07-2018	17:16:39	User.Error	192.168.25.176	[SIP]  | ERROR  | osip_ua_find:  192.168.25.200 <>  192.168.25.176 
06-07-2018	17:16:39	User.Error	192.168.25.176	[SIP]  | ERROR  | osip_ua_find:  235 <>  235 
06-07-2018	17:16:39	User.Error	192.168.25.176	[SIP]  | ERROR  | osip_ua_find:  192.168.25.200 <>  192.168.25.176 
06-07-2018	17:16:39	User.Error	192.168.25.176	[SIP]  | ERROR  | osip_ua_find:  235 <>  235 
06-07-2018	17:16:39	User.Error	192.168.25.176	[SIP]  | ERROR  | osip_ua_find:  192.168.25.200 <>  192.168.25.176 
06-07-2018	17:16:08	User.Error	192.168.25.176	[SIP]  | ERROR  | The rfc2327 says there should be at least an email or a phone header!- anyway, we don't mind about it.
06-07-2018	17:16:08	User.Error	192.168.25.176	[SIP]  | ERROR  | The content-type is not explicitely set.
06-07-2018	17:16:07	User.Critical	192.168.25.176	[SIP]  | BUG    | Sending ACK!
06-07-2018	17:16:07	User.Error	192.168.25.176	[SIP]  | ERROR  | The rfc2327 says there should be at least an email or a phone header!- anyway, we don't mind about it.
06-07-2018	17:16:07	User.Emerg	192.168.25.176	[MGR]  | FATAL  | >chan 0 gets the key=end down,status=5
06-07-2018	17:16:07	User.Emerg	192.168.25.176	[MGR]  | FATAL  | >chan 0 gets the key=* down,status=5

What is this osip_ua_find: 192.168.25.200 <> 192.168.25.176 error? I think it is related to the qualify event… maybe there is a way to fix this…
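
The timing can be read straight from the log above. The Python sketch below measures the gap between two log lines; it assumes that the "Sending ACK!" line marks the start of the call, which is a guess, and the helper names are only illustrative:

```python
from datetime import datetime

# Two lines copied from the syslog output above (whitespace simplified).
LOG_EXCERPT = """\
06-07-2018 17:16:39 User.Error 192.168.25.176 [SIP]  | ERROR  | osip_ua_find:  192.168.25.200 <>  192.168.25.176
06-07-2018 17:16:07 User.Critical 192.168.25.176 [SIP]  | BUG    | Sending ACK!
"""


def time_of(marker, log_text):
    """Return the time of the first line containing marker, or None."""
    for line in log_text.splitlines():
        if marker in line:
            # The second whitespace-separated field is HH:MM:SS.
            return datetime.strptime(line.split()[1], "%H:%M:%S")
    return None


def gap_seconds(start_marker, end_marker, log_text):
    """Seconds between the first lines matching each marker."""
    start = time_of(start_marker, log_text)
    end = time_of(end_marker, log_text)
    return (end - start).total_seconds()


print(gap_seconds("Sending ACK!", "osip_ua_find", LOG_EXCERPT))  # 32.0
```

Under that assumption the osip_ua_find errors appear 32 seconds after the ACK, which is close to the 30-second mark where the delay starts.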

I FOUND IT!!!
Disabling call history from the web UI completely fixed the problem: no more lag or bad voice quality… I think this is a CPU overload caused by bad management of the CDRs. Fanvil has to fix this! Call history is very useful!
