SV: SV: Bug in MySQL code?

Mads Kiilerich mads at kiilerich.com
Thu Dec 24 04:33:30 UTC 2015


On 11/27/2015 01:33 PM, Dominik Ruf wrote:
>
> Great.
> BTW I made another test and it seems the key thing is charset=utf8.
>

TLDR: Lars is right that a default Kallithea installation on MySQL 
stores utf-8 in the database instead of storing unicode and letting the 
database deal with the encoding. I was also right that it generally 
works fine anyway. ;-)

I also tested (with Fedora, mariadb and mysql-python). I tested by
creating a new database, changing the admin user's name to blåbærgrød,
creating a blåbærgrød repository, and inspecting database and file
system content.

Everything worked flawlessly with the default mysql url, with the one
caveat that it stores utf-8 in the database. Sqlalchemy will however
encode and decode it consistently so everything just works ... but I
guess collation order and other "details" might be wrong, and direct
database hacking will be tricky, as Lars found out the hard way in the
initial post.
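
To illustrate why the round trip still works while the stored content
looks wrong, here is a minimal sketch of double encoding. It is not
Kallithea or SqlAlchemy code. It assumes (my assumption, not verified
against the actual setup) that the connection treats the bytes as
latin1, and it uses Python 3 syntax for brevity:

```python
# Sketch of "double encoding": utf-8 bytes pushed through a connection
# that is assumed (hypothetically) to interpret them as latin1.
name = "blåbærgrød"

# The application encodes to utf-8 before handing the value over.
utf8_bytes = name.encode("utf-8")

# The database side reads each byte as one latin1 character, so this is
# what a direct look at the table would show: 'blÃ¥bÃ¦rgrÃ¸d'
stored = utf8_bytes.decode("latin-1")

# Reading back reverses both steps consistently, so the application
# never notices anything odd.
restored = stored.encode("latin-1").decode("utf-8")
```

The application sees `restored == name`, but anything that works on
the stored value directly (collation, manual SQL) sees 13 characters
instead of 10.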

I agree that
    sqlalchemy.db1.url = mysql://kallithea:foobar@localhost/kallithea?charset=utf8
seems to be the right "solution". It works and the database content is
as expected. (Except that this apparently is not fully unicode
compliant, and it would be better to use utf8mb4 ...)
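
On the utf8 versus utf8mb4 point: MySQL's utf8 charset holds at most
three bytes per character, which covers the Basic Multilingual Plane
only. A small sketch of a check for that follows; the helper name
fits_mysql_utf8 is made up for illustration and is not part of any
library:

```python
def fits_mysql_utf8(text):
    """Return True if every character fits in MySQL's 3-byte utf8.

    Hypothetical helper: characters above U+FFFF need four bytes in
    utf-8 and therefore require utf8mb4 on the MySQL side.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# The test name from this thread is fine with plain utf8 ...
danish_ok = fits_mysql_utf8("blåbærgrød")

# ... but a character outside the BMP (here U+1F600) is not.
emoji_ok = fits_mysql_utf8("\U0001F600")
```

So plain utf8 is enough for names like the one tested here, and the
difference only shows up with characters above U+FFFF.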

I don't know the root cause of the weirdness. It might be some (old and 
fixed?) MySQL deficiencies and workarounds in SqlAlchemy ... or 
something in Kallithea that triggers it. I guess it could be the
combination of mysql not being unicode compliant by default and
convert_unicode thus triggering the unnecessary utf8 encoding.
(The encoding parameter described at
http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine.params.encoding
might also play a role ... but that is probably only relevant for
understanding.)

I guess we should change the default mysql uri in the .ini files to use 
charset=utf8?
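
For existing installations, the change amounts to adding one query
parameter to the url. A rough sketch using only the Python standard
library is below; with_charset is a hypothetical helper name, not
something that exists in Kallithea or SqlAlchemy:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(db_url, charset="utf8"):
    """Return db_url with its charset query parameter set to charset.

    Hypothetical helper for illustration: any existing charset
    parameter is replaced, other query parameters are kept.
    """
    parts = urlsplit(db_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "charset"]
    query.append(("charset", charset))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Default-style url without a charset, as in the current .ini files.
new_url = with_charset("mysql://kallithea:foobar@localhost/kallithea")
```

Passing charset="utf8mb4" instead would give the variant mentioned
above, assuming the DBAPI and server in use accept it.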

Each table already specifies mysql_charset utf8 ... but that is 
apparently for something else?

We should probably also improve the documentation to give some advice on
which "DBAPI" to use. Any recommendations?

I guess we also should get rid of all the explicit convert_unicode in 
db.py and .ini and just use Unicode and UnicodeText fields.

Changes in this area could however cause pain for installations that
are happily using mysql with double encoding.

/Mads