Version 13 Upgrade Problems

arakhm · September 28, 2015, 9:24pm

I applied the conversion script to the DB “asteriskcdrdb”. Is this the right DB for CDR records?

tm1000 · September 28, 2015, 11:39pm

Well. Yes. However Asterisk itself doesn’t support those characters so it won’t help you.

Marbled · September 28, 2015, 11:45pm

It looks like there’s a problem with displaying UTF-8 data from those tables…

I am not sure why I didn’t spot it before…

The data is properly encoded but it tries to display it as ISO-8859-1 as far as I can tell…

Andrew, even if Asterisk itself doesn’t quite comprehend this, couldn’t you force this to be shown as UTF-8? The data seems to be there and properly encoded, but it’s apparently processed as ISO-8859-1…

Have a nice day!

Nick

Marbled · September 29, 2015, 12:47am

OK, what seems to happen is that they are encoded twice or something similar…

I tried the following character “ș”, which is Romanian and needs either UTF-8 or ISO-8859-16… (I could have tried Cyrillic but there were still some test entries left with that Romanian character and it was easier to start from there…)

In UTF-8 this character is encoded 0xC8 0x99 and this is what I see in the other tables.

In the CDR one though I see 0xC3 0x88 0xE2 0x84 0xA2…

What it seems to do is think that 0xC8 0x99 are two different characters and that they are in ISO-8859-1…

0xC8 = È
0x99 = ??? That’s not valid ISO-8859-1, that’s actually Windows 1252 which is a non-standard superset of ISO-8859-1 invented by Microsoft… It represents the character ™.

That È is later converted from ISO-8859-1 to UTF-8, which gives 0xC3 0x88.
That ™ is later converted to UTF-8 as well (as Windows 1252, since ISO-8859-1 has no ™), which gives 0xE2 0x84 0xA2.

So that explains that weird 0xC3 0x88 0xE2 0x84 0xA2 sequence instead of 0xC8 0x99… The string is actually encoded twice…
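To make the byte arithmetic above easy to check, here is a minimal Python sketch that reproduces the sequence. It assumes the misreading step behaves like Python's `cp1252` codec (Windows 1252); the function name `double_encode` is only illustrative and is not taken from Asterisk, FreePBX or MySQL.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected double encoding of a UTF-8 string."""
    # Step 1: the correct UTF-8 bytes, as seen in the other tables.
    utf8_bytes = text.encode("utf-8")
    # Step 2: something misreads those bytes as single-byte characters
    # (assumption: Windows 1252, a superset of ISO-8859-1's printable range).
    misread = utf8_bytes.decode("cp1252")
    # Step 3: the misread characters are encoded to UTF-8 a second time.
    return misread.encode("utf-8")


if __name__ == "__main__":
    print("ș".encode("utf-8").hex(" "))   # c8 99
    print(double_encode("ș").hex(" "))    # c3 88 e2 84 a2
```

The second line printed matches what shows up in the CDR table, which is consistent with the "encoded twice" theory, although it does not tell us which component does it.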

As to where this is done, I can’t say…

A workaround I could see is to transliterate the Cyrillic entries (convert them from Cyrillic to Latin characters) before they hit the CDR db. I do not know if it’s doable, though, and there is more than one way to transliterate (it most likely differs from one language that uses Cyrillic to the next, and more than one language uses Cyrillic).
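As a rough illustration of the transliteration idea, here is a minimal Python sketch. The mapping table is hypothetical and deliberately tiny (it covers only the letters of one sample word); a real table would depend on the language and on the transliteration scheme chosen, and nothing like this exists in FreePBX as far as I know.

```python
# Hypothetical, incomplete mapping: one possible Latin rendering per letter.
TRANSLIT = {
    "П": "P", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",
}


def transliterate(text: str) -> str:
    """Replace mapped Cyrillic letters; leave everything else untouched."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("Привет <200>"))
```

The result is plain ASCII, so it would survive any single-byte handling on the way to the CDR db, at the cost of losing the original spelling.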

Andrew, anything that comes to mind as to the reason of this double encoding?

Have a nice day!

Nick

tm1000 · September 29, 2015, 1:31am

For cdr entries that is all asterisk. Freepbx does nothing. If asterisk is inputting bad data then it is as I previously stated. Asterisk does not support utf8

Marbled · September 29, 2015, 2:15am

Andrew, is this still true?

What is the following commit supposed to address exactly?
http://svnview.digium.com/svn/asterisk?view=revision&revision=358437

Thank you and have a nice day!

Nick

tm1000 · September 29, 2015, 2:51am

That commit doesn’t mean utf8 works. It just means if the table is set to utf8 asterisk won’t crash.

Unless you can prove otherwise I still think asterisk has issues with utf8. If it doesn’t then great but remember that freepbx is not a middle man in CDRs. Asterisk writes directly to the database with freepbx.

arakhm · September 29, 2015, 6:31am

Andrew, do you think this CDR issue might go away if I switch back to Asterisk 11 from Asterisk 13?

tm1000 · September 29, 2015, 6:36am

Asterisk has never supported UTF8 in cdrs. So no.

arakhm · September 29, 2015, 7:10am

So how could it work before my upgrade to Asterisk13/FreePBX13?

Marbled · September 29, 2015, 7:16am

Hi!

I was told by an Asterisk dev that it’s not supported by them. Certain combinations of databases in certain conditions could support UTF-8, but they made no effort to make this work (I asked around on their IRC channel earlier today but went to sleep after).

I wonder if keeping the database in ISO-8859-1/Latin 1 while at the same time somehow treating what is inside the database as something else (which older FreePBX somehow did) might not be what worked out in the past…

It looks like it’s having the database in UTF-8 that could be the cause of that reencoding and that Asterisk dev was pretty sure they weren’t doing any character manipulation of any kind…

I can see possible solutions to this problem but they all require some dev of some kind I believe…

Have a nice day!

Nick

arakhm · September 29, 2015, 7:32am

Thank you guys, but I’m giving up.
I will wait for the official release of FreePBX 13 and see how a fresh setup handles Cyrillic characters.

Marbled · September 29, 2015, 8:39am

Andrew, there might be a way to handle this without doing too much of a hack…

If the information that gets copied into the CDR database is first encoded using base64 and then decoded for the CDR reports, Asterisk would not actually see the UTF-8 characters…

If that actually works, then UTF-8 support in FreePBX would be even more complete without even having to change anything on the Asterisk side of things…

I do not know why CID Superfecta does it, but there is already CIDSFSCHEME which is encoded in base64…

Sure we can’t store as many characters in the field after that but is the length of those database fields actually important or can we make them larger?
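Here is a minimal Python sketch of the round trip I have in mind, including how much the value grows. The function names `encode_field` and `decode_field` are hypothetical; this only shows the principle and is not how FreePBX or CID Superfecta actually do it.

```python
import base64


def encode_field(text: str) -> str:
    """Turn any UTF-8 text into pure ASCII before it reaches the CDR."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_field(stored: str) -> str:
    """Recover the original text when building the CDR report."""
    return base64.b64decode(stored).decode("utf-8")


if __name__ == "__main__":
    name = "Привет"
    stored = encode_field(name)
    # Base64 emits 4 ASCII characters for every 3 input bytes, so the
    # stored value is roughly a third longer than the UTF-8 byte length.
    print(len(name.encode("utf-8")), len(stored))
    print(decode_field(stored) == name)
```

Since Cyrillic letters already take two bytes each in UTF-8, a six-letter name becomes 16 stored characters, which is why the field length matters.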

Thank you and have a nice day!

Nick

tm1000 · September 29, 2015, 3:34pm

You seem to be missing the fact that freepbx does not insert into the cdr table. Asterisk does directly. Freepbx can’t do anything here.

Marbled · September 29, 2015, 6:10pm

But it inserts information provided by FreePBX or something like CID Superfecta…

If the fields Asterisk inserts in the CDR database (it looks like they are passed to Set calls that assign cdr(cnam) and other similar variables) are encoded using base64, won’t Asterisk blindly use them and insert them in the database?

Could this work?

The CDR report would only need to decode them before producing the report…

It’s only an idea, maybe I am completely wrong about this…

Thank you and have a nice day!

Nick

tm1000 · September 29, 2015, 7:17pm

No. FreePBX (CID Superfecta) sets the CID(name) variable in Asterisk, Asterisk then writes this to the database. NOT FreePBX.

Sure. That could work. I’m not going to spend the time doing that though. Firstly because it’ll piss off everyone who is using CDR outside of FreePBX, which many do.

tm1000 · September 29, 2015, 8:15pm

Here is a proposed solution: http://issues.freepbx.org/browse/FREEPBX-10392

Let me know if it works for you guys and I will then use it

Marbled · September 29, 2015, 11:52pm

Лол (Lol)…

I tried

this: http://blog.oneiroi.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections/
and this: http://stackoverflow.com/questions/3513773/change-mysql-default-character-set-to-utf-8-in-my-cnf

and a few others I have lost the links to (one of which was in cdr_adaptive_odbc.conf)

but I did not stumble on that fix…

It works wonderfully for me both with my test entries and another one I did expressly for Andrey (arakhm)

2015-09-29  18:58:28  1443567508.3  "Привет "<200>  <5555551212>  Dial  7775551212  ANSWERED  00:13

Привет (“Privet”, pronounced roughly “priviett”) is the casual way to say “hi” in Russian. I am not sure that is what Andrey speaks, but these are Cyrillic letters so it’s as good a test as any…

I was going to suggest other things which could be done but as far as I am concerned this is the best fix of them all…

The Asterisk guy I talked to said certain combinations of databases in certain conditions could support UTF-8. With this fix the distro seems to meet these conditions, and it is a relatively controlled environment so it’s a lot easier to make sure it stays that way…

Tstar did a very nice find… I was searching for something like this yesterday but never found this…

Thank you and have a nice day!

Nick

Marbled · October 3, 2015, 5:49pm

OK, I spotted one problem but it’s most likely fixable considering the info I believe comes from the CDR table and the CDR reports work wonderfully now…

UCP (which I never used before today) is unable to display properly encoded UTF-8 but is very happy to display double-encoded UTF-8 (what I described above) for the CDR/CEL stuff…

Properly encoded UTF-8 (which is correctly shown in the CDR reports) is replaced by question marks and the double encoded stuff displays correctly…

Whatever recipe was used to make the CDR reports work with UTF-8 (I mean before Tstar’s fix) should most likely be used with UCP…
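For what it’s worth, here is a minimal Python sketch of how text that was double-encoded the way I described above can be told apart from properly encoded text and repaired. It assumes the misreading step was Windows 1252 (`cp1252` in Python); the function name `repair_double_encoded` is hypothetical, and this is not what UCP or the CDR reports actually do.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of UTF-8 -> Windows 1252 -> UTF-8 double encoding.

    Text that does not look double-encoded is returned unchanged.
    """
    try:
        # Reverse the misreading: map each character back to its single
        # byte, then read the resulting bytes as the UTF-8 they really were.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either a character has no Windows 1252 byte (e.g. Cyrillic) or
        # the bytes are not valid UTF-8, so the text was not double-encoded.
        return text


if __name__ == "__main__":
    print(repair_double_encoded("È™"))
    print(repair_double_encoded("Привет"))
```

One caveat: a string that legitimately contains something like “È™” would be altered as well, so this is a heuristic rather than a safe general fix.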

Thank you and have a nice day!

Nick

arakhm · October 3, 2015, 8:35pm

What a simple solution! That short string works perfectly.
Thanks.