Bug: ssl error with postgres and celery

Wed Nov 4 11:30:06 UTC 2020

On Wed, 4 Nov 2020 at 11:36, Valentin Kleibel
(<valentin at vrvis.at>) wrote:
>
> Hi Thomas,
>
> > I don't have answers but several questions.
> >
> > Can you say more about how you set up the database connection? Do you do
> > anything special here? Why is it different with SSL than without?
> About the setup:
> the pg_hba.conf for the database contains the line:
> hostssl all         all         1.2.3.0/24          md5
> which enforces the automatic and transparent use of ssl connections to
> our database server (that is in the subnet 1.2.3.0/24). To reproduce
> this on a local database you could set:
> hostssl all         all         127.0.0.1/32          md5
> to use ssl with it.
> I'm sure of one difference: with SSL, a connection reused by a forked
> process will definitely lead to an error because the connection is
> stateful and will, for security reasons, fail if reused.
>
> > As far as I can see you would just need to give some extra parameters in
> > the sqlalchemy.url setting in your ini file.
>
> We were looking through all documented sqlalchemy.url settings and
> couldn't find anything of use here.

See for example the description here:
https://github.com/sqlalchemy/sqlalchemy/issues/4146
and a reference here:
https://www.postgresql.org/docs/10/libpq-connect.html#LIBPQ-CONNSTRING

As far as I understand it, you should be able to add ?sslmode=verify-full
to the database connection string, possibly with references to
certificate or key files.
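
A minimal sketch of how such a URL could be assembled, using only the
Python standard library. The user, password, host, database name and
certificate path are hypothetical placeholders, and build_pg_url is a
made-up helper name. sslmode and sslrootcert are libpq connection
parameters; that they are passed through from the URL query string to
libpq is my assumption, based on the issue linked above.

```python
from urllib.parse import quote_plus, urlencode


def build_pg_url(user, password, host, dbname, **libpq_params):
    """Build a postgresql:// URL with libpq options in the query string.

    Hypothetical helper for illustration; not part of Kallithea or SQLAlchemy.
    """
    # Percent-encode credentials so characters like '/' or '@' stay unambiguous.
    base = f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}/{dbname}"
    # Keyword arguments become libpq parameters, e.g. sslmode=verify-full.
    query = urlencode(libpq_params)
    return f"{base}?{query}" if query else base


# All values below are placeholders, not taken from a real setup.
url = build_pg_url(
    "kallithea", "s3cret/pw", "db.example.org", "kallithea",
    sslmode="verify-full",
    sslrootcert="/etc/ssl/certs/ca.pem",
)
print(url)
```

The resulting string would then go into the sqlalchemy.url setting of the
ini file.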

>
>
> > But your mail seems to suggest that you have a connection created upfront
> > and would like this one to be reused?
>
> We do not want to reuse a previously created connection but this is how
> the celery workers seem to behave.
>
> > Which are the uwsgi-related settings in your ini file? The 'lazy=true'
> > setting is not a choice for Kallithea and should be enabled in any case.
> > Our uwsgi template for ini files gives this comment as clarification for
> > 'lazy=true': "App *must* be loaded in workers - db connections can't be
> > shared"
>
> The only related setting is lazy=true. As you are saying, the database
> connections can't be shared. This can be achieved in multiple ways and
> one of them (although maybe not the most efficient one) is to use the
> lazy setting in uwsgi.
> Access through uwsgi does work without issues in our setup.

My understanding was that Kallithea creates the db connection in each
worker anyway, and that it shouldn't matter whether SSL is used or
not.

But if I understand things correctly, celery may have multiple
workers, and each worker may create additional forks. If the db
connection is set up once per worker, it would still be shared between
the forks. Does that match your understanding and situation?

I still wonder where the impact of SSL is. This sounds like a generic
problem, but perhaps it works most of the time if not using SSL, and
so we just don't see it?
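
One possible explanation, as a toy model only: TLS protects each record
with a MAC (or AEAD tag) that covers a per-connection sequence number.
After a fork, parent and child each hold a copy of that counter, so once
both write to the same connection the server sees a record with an
unexpected sequence number and rejects it. The sketch below is not real
TLS and has not been checked against this setup; Sender, Receiver and the
key are invented for illustration.

```python
import hashlib
import hmac

KEY = b"session-key-shared-with-server"  # stand-in for negotiated TLS keys


def mac(seq, payload):
    """Tag a payload together with its sequence number (toy record MAC)."""
    return hmac.new(KEY, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()


class Sender:
    """Client side of one connection; keeps a write sequence counter."""

    def __init__(self, seq=0):
        self.seq = seq

    def send(self, payload):
        record = (payload, mac(self.seq, payload))
        self.seq += 1
        return record

    def fork(self):
        # Models os.fork(): the child gets a copy of the current state.
        return Sender(self.seq)


class Receiver:
    """Server side; expects records in strict sequence order."""

    def __init__(self):
        self.seq = 0

    def receive(self, record):
        payload, tag = record
        if not hmac.compare_digest(tag, mac(self.seq, payload)):
            raise ValueError("bad record mac")
        self.seq += 1
        return payload


server = Receiver()
parent = Sender()
server.receive(parent.send(b"SELECT 1"))   # sequence 0, accepted
child = parent.fork()                      # both now believe next is 1
server.receive(child.send(b"SELECT 2"))    # sequence 1, accepted
try:
    server.receive(parent.send(b"SELECT 3"))  # parent reuses 1, server expects 2
    outcome = "accepted"
except ValueError as exc:
    outcome = str(exc)
print(outcome)
```

Without SSL there is no such per-record check, which might be why sharing
goes unnoticed as long as the processes do not talk at the same time.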

>
>
> > Did you use Kallithea in this mode successfully before, perhaps on an older
> > version? Your mail seems to suggest that but I'm not sure.
>
> Yes, we did use Kallithea before successfully with this setup, with
> version 0.4.1. We think the problem arises with the upgrade due to the
> way celery workers seem to behave with the new prefork worker model.
> This change came with the upgrade of celery to version 4.

As far as I can see, prefork was also used in celery 3.
Could it be that other changes make the difference? For example, I see:

https://docs.celeryproject.org/en/stable/history/whatsnew-4.0.html?highlight=prefork#ofair-is-now-the-default-scheduling-strategy

"-Ofair is now the default scheduling strategy

To re-enable the default behavior in 3.1 use the -Ofast command-line option.
..."

>
>
> > Note: I found this related question:
> > https://stackoverflow.com/questions/51466007/how-to-use-psycopg2-properly-in-a-prefork-celery-environment
> > but it does not involve sqlalchemy and does not really give immediate
> > solutions.
>
> While looking around for solutions we found this blog post:
> https://virtualandy.wordpress.com/2019/09/04/a-fix-for-operationalerror-psycopg2-operationalerror-ssl-error-decryption-failed-or-bad-record-mac/
> It is about uwsgi but also describes an alternative to setting
> lazy=true, which is to call engine.dispose in order to make sure
> a new connection is used.
> This is the same solution we also found in the SQLAlchemy documentation
> in the section about using connection pools with multiprocessing or forked
> processes:
> https://docs.sqlalchemy.org/en/13/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork
>
> We implemented this in kallithea/lib/celerylib/__init__.py for the
> celery workers.
> Attached is a diff for the current stable release which works well in
> our setup, although we are not sure whether it has any unwanted side
> effects that we just haven't encountered yet.

Thanks for the patch and links.
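
For the archive, the idea from the SQLAlchemy page can be sketched roughly
as below. This is not the attached diff, only an illustration of the
pattern: make_pool_reset_handler is a hypothetical name, and the wiring to
celery's worker_process_init signal is shown in a comment as an
assumption about where such a hook would be attached.

```python
def make_pool_reset_handler(engine):
    """Return a handler that drops pooled connections inherited via fork.

    `engine` is expected to be a SQLAlchemy Engine (anything with a
    dispose() method works for this sketch).
    """
    def reset_pool(**kwargs):
        # dispose() discards the current connection pool; the next
        # checkout in this process opens a fresh connection.
        engine.dispose()
    return reset_pool


# Possible wiring (assumption; requires celery to be installed):
#
#   from celery.signals import worker_process_init
#   worker_process_init.connect(make_pool_reset_handler(engine), weak=False)
#
# weak=False keeps the closure alive, since nothing else references it.


class FakeEngine:
    """Stand-in used only to exercise the handler without a database."""

    def __init__(self):
        self.disposed = 0

    def dispose(self):
        self.disposed += 1


fake = FakeEngine()
handler = make_pool_reset_handler(fake)
handler(signal=None, sender=None)  # signals pass keyword arguments
print(fake.disposed)
```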

So with that patch, you no longer see issues and you can have
'celery.worker_concurrency' set higher than 1?

I would like to await the analysis of Mads Kiilerich on this.

Thanks,
Thomas