SV: SV: Bug in MySQL code?
Mads Kiilerich
mads at kiilerich.com
Thu Dec 24 04:33:30 UTC 2015
On 11/27/2015 01:33 PM, Dominik Ruf wrote:
>
> Great.
> BTW I made another test and it seems the key thing is charset=utf8.
>
TLDR: Lars is right that a default Kallithea installation on MySQL
stores utf-8 in the database instead of storing unicode and letting the
database deal with the encoding. I was also right that it generally
works fine anyway. ;-)
I also tested (with Fedora, mariadb and mysql-python). I tested by
creating a new database, changing the admin user's name to blåbærgrød,
creating a blåbærgrød repository, and inspecting database and file
system content.
Everything worked flawlessly with the default mysql url, with the one
caveat that it stores utf-8 in the database. SQLAlchemy will however
encode and decode it consistently so everything just works ... but I
guess collation order and other "details" might be wrong, and direct
database hacking will be tricky - as Lars found out the hard way in the
initial post.
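To illustrate what "stores utf-8 in the database" looks like from the
outside, here is a minimal sketch in plain Python. It does not touch a
real database; it assumes (my assumption, not verified) that the
connection charset defaulted to latin1, so that every utf-8 byte ends up
stored as one character.

```python
# Illustration only: simulates the round trip, no MySQL involved.
# Assumption: the connection treats the bytes it receives as latin1.
name = "blåbærgrød"

# The application side sends utf-8 encoded bytes.
utf8_bytes = name.encode("utf-8")

# The database side interprets each byte as one latin1 character.
as_seen_in_db = utf8_bytes.decode("latin-1")

# Reading back through the same path reverses the mistake,
# which is why the application never notices anything wrong.
round_tripped = as_seen_in_db.encode("latin-1").decode("utf-8")

print(as_seen_in_db)                  # what direct database inspection would show
print(len(name), len(as_seen_in_db))  # stored text is longer than the real name
print(round_tripped == name)
```

The stored value has more characters than the real name, which is also
why length limits and sorting done by the database itself can be off.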
I agree that
  sqlalchemy.db1.url = mysql://kallithea:foobar@localhost/kallithea?charset=utf8
seems to be the right "solution". It works and the database content is
as expected. (Except that this apparently is not fully unicode compliant,
and it would be better to use utf8mb4 ...)
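The utf8 vs utf8mb4 difference can be checked without a database: MySQL's
utf8 holds at most 3 bytes per character, so anything that needs 4 bytes
in utf-8 will not fit. A small sketch (the helper name fits_mysql_utf8 is
made up for this example):

```python
def fits_mysql_utf8(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in utf-8.

    Hypothetical helper: it only mirrors the 3-byte limit of MySQL's
    "utf8" charset; it does not talk to MySQL.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_mysql_utf8("blåbærgrød"))   # only 1- and 2-byte characters
print(fits_mysql_utf8("\U0001F600"))   # an emoji needs 4 bytes
```

So charset=utf8 is fine for blåbærgrød, but text with characters outside
the Basic Multilingual Plane would need utf8mb4.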
I don't know the root cause of the weirdness. It might be some (old and
fixed?) MySQL deficiencies and workarounds in SQLAlchemy ... or
something in Kallithea that triggers it. I guess it could be the
combination of mysql not being unicode compliant by default and
convert_unicode thus triggering the unnecessary utf8 encoding.
(http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine.params.encoding
might also play a role ... but that is probably only relevant for
understanding.)
I guess we should change the default mysql uri in the .ini files to use
charset=utf8?
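For existing installations, adding the parameter to a url that is already
in an .ini file is a mechanical change. A rough sketch using only the
Python standard library (with_charset is a hypothetical helper, not
something in Kallithea or SQLAlchemy):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(db_url: str, charset: str = "utf8") -> str:
    """Add ?charset=... to a database url unless one is already set."""
    parts = urlsplit(db_url)
    query = dict(parse_qsl(parts.query))
    # Keep an explicitly configured charset (for example utf8mb4) untouched.
    query.setdefault("charset", charset)
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_charset("mysql://kallithea:foobar@localhost/kallithea"))
```

A url that already names a charset is returned unchanged.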
Each table already specifies mysql_charset utf8 ... but that is
apparently for something else?
We should probably also improve the documentation to give some advice on
which "DBAPI" to use. Any recommendations?
I guess we should also get rid of all the explicit convert_unicode in
db.py and .ini and just use Unicode and UnicodeText fields.
Changes in this area could however cause pain for installations that
are happily using mysql with double encoding.
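If such installations ever need to be converted, the per-value repair
could look roughly like the sketch below. It assumes the double encoding
went through latin1 (an assumption; MySQL's latin1 is not exactly the
same as Python's "latin-1" codec), and undo_double_encoding is a made-up
name. Values that were stored correctly are left alone.

```python
def undo_double_encoding(value: str) -> str:
    """Try to turn utf-8-stored-as-latin1 text back into real unicode.

    Hypothetical sketch: returns the value unchanged when it does not
    look double encoded.
    """
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as latin1, or the bytes are not valid utf-8:
        # assume the value was already stored correctly.
        return value


print(undo_double_encoding("blÃ¥bÃ¦rgrÃ¸d"))  # double encoded input
print(undo_double_encoding("blåbærgrød"))     # already correct, kept as-is
```

This is a heuristic: a correctly stored value that happens to be valid
utf-8 when read as latin1 bytes would be changed too, so any real
migration would need a backup and manual review.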
/Mads