db-create with PostgreSQL fails with user "user"

Fri May 15 20:25:46 UTC 2020

On 5/15/20 9:35 PM, Louis Bertrand wrote:
> I chose PostgreSQL because a) it's already installed on the eventual 
> target server and b) I'm familiar with it. However, where is the 
> trade-off between SQLite and PostgreSQL? Tens, hundreds, thousands? 
> Number of users, transactions? Etc. In other words, I might have saved 
> myself some trouble by simply accepting the default for a small site. 

One advantage of supporting multiple databases is that we don't have to 
choose on behalf of someone else ;-)

Also, I guess the load of a "user" might vary by a factor of 100 
depending on how the system is used. I guess a system with 1000 quite 
active users can easily have less than 1 write operation per second and 
a manageable number of read operations, and work fine with a number of 
worker processes on a single server. In that case I guess sqlite would 
work fine.
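
To make that guess a bit more concrete, here is a rough back-of-envelope 
sketch. Only the 1000 users come from the paragraph above; the writes 
per user and the length of the working day are made-up assumptions, so 
plug in your own numbers.

```python
# Back-of-envelope estimate of the average database write rate.
# All numbers except ACTIVE_USERS are assumptions for illustration only.
ACTIVE_USERS = 1000                  # figure from the paragraph above
WRITES_PER_USER_PER_DAY = 20         # hypothetical: pushes, comments, logins, ...
ACTIVE_SECONDS_PER_DAY = 8 * 3600    # assume activity is packed into 8 hours


def average_writes_per_second(users, writes_per_user_per_day, active_seconds):
    """Return the average number of write operations per second."""
    return users * writes_per_user_per_day / active_seconds


rate = average_writes_per_second(
    ACTIVE_USERS, WRITES_PER_USER_PER_DAY, ACTIVE_SECONDS_PER_DAY
)
print(f"{rate:.2f} writes/second on average")  # prints 0.69 with these inputs
```

This is only an average; peaks will be higher, so leave some headroom 
when comparing against what a single sqlite file can handle.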

I think the biggest difference is in operations. If a DBA is running 
the system, they might want a "real" database. There might be more 
confidence in how PostgreSQL handles scaling and system failures, and it 
allows more powerful integrations with other systems. That said, it is 
very hard to lose critical data in a DVCS hosting system. The important 
data are in the repos directly in the file system - not in the database. 
And worst case, all the clients will still have the data they pushed ... 
and other clients might already have pulled them.

We could perhaps put some generic advice in the docs:

Start out with a single server with sqlite and reliable local storage 
(with regular backup of storage and database). If you need failover, 
make it cold. If users experience delays while the server load is low 
(especially if the workers are busy serving repo data), add more worker 
processes. If the server load is high, consider adding more CPU or 
memory if feasible. If network load is the bottleneck, try solving that. 
Also consider offloading buildfarm load to a simpler or separate system, 
or making the buildfarm "smarter" to decrease the load.
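
The advice above could be sketched as a small decision helper. This is 
just my reading of the paragraph turned into code, not anything that 
exists in the project; the `Observation` fields and the function name 
are hypothetical, and tying the buildfarm advice to high server load is 
my own simplification.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the admin sees on a single-server setup (hypothetical fields)."""
    users_see_delays: bool
    server_load_high: bool
    network_saturated: bool


def suggest_next_steps(obs):
    """Map observed symptoms to the generic advice from the paragraph above."""
    steps = []
    if obs.users_see_delays and not obs.server_load_high:
        # Workers are probably busy serving repo data while the box is idle.
        steps.append("add more worker processes")
    if obs.server_load_high:
        steps.append("add CPU or memory if feasible")
        steps.append("offload or reduce buildfarm load")
    if obs.network_saturated:
        steps.append("address the network bottleneck")
    if not steps:
        # Nothing is wrong: keep the simple starting setup.
        steps.append("stay on a single server with sqlite and regular backups")
    return steps


print(suggest_next_steps(Observation(True, False, False)))
```

Only when none of these steps help would the next paragraph apply.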

Then, if necessary or for other reasons desirable, scale up to use 
something like PostgreSQL (possibly on a separate HA system/cluster ... 
and potentially higher latency), shared storage (network or SAN), and 
multiple (physical) worker servers.

While different existing setups might have hit different limitations, 
do any existing users have comments or advice to add to this?

/Mads