S/W issues: Apache 2.4/mod_fcgid

During a recent hardening/maintenance session, httpd was upgraded to version 2.4, which initially produced lots of PHP segfaults, most probably because content and configs had not been sufficiently adjusted by the admin(s) beforehand.

After some config and content adjustments during a second maintenance session, the setup ran quite well, but during periods of high load lots of different error messages showed up. The types that caught our attention were:

[Sun Mar 05 23:39:40.008981 2017] [fcgid:emerg] [pid 6608:tid 139850686646016] (35)Resource deadlock avoided: [client x.x.x.x:48916] mod_fcgid: can't get pipe mutex
[Sun Mar 05 23:43:43.343519 2017] [fcgid:warn] [pid 7829:tid 139850577540864] (104)Connection reset by peer: [client x.x.x.x:49693] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: ....
[Mon Mar 06 00:03:53.432715 2017] [fcgid:emerg] [pid 8618:tid 139850669860608] (35)Resource deadlock avoided: [client x.x.x.x:47173] mod_fcgid: can't lock process table in pid 8618, referer: ....
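To get a feeling for which of these messages dominates under load, the error log can be tallied by severity and message text. The following is only a minimal sketch: the function name fcgid_summary is made up for illustration, and the log path is an assumption that has to be adapted to the local setup.

```shell
#!/bin/sh
# Hypothetical helper: count mod_fcgid log lines per severity and message.
# Reads the files given as arguments, or stdin if none are given.
fcgid_summary() {
    grep -h 'mod_fcgid:' "$@" \
        | sed -E 's/^.*\[fcgid:([a-z]+)\].*mod_fcgid: ([^,]*).*$/\1 \2/' \
        | sed -E 's/ in pid [0-9]+$//' \
        | sort | uniq -c | sort -rn
}

# Example call (the path is an assumption, not taken from the setup above):
# fcgid_summary /opt/apache2/logs/error_log
```

The first sed keeps only the severity and the message up to the first comma, the second one strips the trailing "in pid NNNN" so that identical errors from different processes are counted together.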

Also, the server had at least 20 PHP zombie processes, which cannot be killed but pile up and exhaust resources:

1745 ?        Z      0:06 [php-cgi] <defunct> 
1753 ?        Z      0:06 [php-cgi] <defunct> 
3340 ?        Z      0:09 [php-cgi] <defunct> 
3341 ?        Z      0:09 [php-cgi] <defunct> 
3509 ?        Z      0:02 [php-cgi] <defunct>
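A zombie has already exited, which is why kill has no effect on it; it only disappears once its parent process reaps it or goes away itself. So the interesting bit is who the parent is. A small sketch for that, where list_zombies is a hypothetical helper name and the ps column selection is just one way to do it:

```shell
#!/bin/sh
# Hypothetical helper: filter `ps` output (read from stdin) down to zombies.
# Expects the columns pid, ppid, stat, comm and skips the header line.
# Prints: <pid> <parent pid> <command>
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Example call on a Linux box:
# ps -eo pid,ppid,stat,comm | list_zombies
```

If all zombies share the same parent PID and that PID belongs to httpd, this would fit the suspicion that the way php-cgi children are spawned and cleaned up is where things go wrong.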

So it seemed to have something to do with the way mod_fcgid starts PHP and locks the underlying processes.

After a little research here and there, a check of how the system was configured gave us:

user@host:/opt/apache2/conf# grep -i mutex */*
extra/httpd-ssl.conf:SSLMutex "file:/opt/httpd-2.2.29/logs/ssl_mutex"
extra/httpd-ssl-domain.conf:Mutex file:/opt/apache2/logs/ssl_mutex

Hmm. One config even points to the wrong directory (the old 2.2.29 tree), and both use the “file” mechanism, i.e. fcntl-based file locking, which seems to be what causes the errors in the first place.

Possible solutions would be to use

Mutex flock:${APACHE_LOCK_DIR} default

or as recommended

Mutex sem
SSLMutex sem

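instead. (Note that the 2.4 upgrade notes list SSLMutex among the directives replaced by Mutex, so the SSLMutex line only matters for a 2.2-style config.) Also, if switching to “sem” produces new errors, it may be because the kernel's limit on semaphore arrays is too low; it can be raised with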

sysctl -w kernel.sem="250 32000 32 1024"
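The four numbers in kernel.sem are, in order, SEMMSL (max semaphores per array), SEMMNS (max semaphores system-wide), SEMOPM (max operations per semop call) and SEMMNI (max number of semaphore arrays), so the last value is the one that matters here. A small sketch to label the current values; describe_sem is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: label the four kernel.sem fields read from stdin.
describe_sem() {
    awk '{ printf "SEMMSL=%s SEMMNS=%s SEMOPM=%s SEMMNI=%s\n", $1, $2, $3, $4 }'
}

# Example calls on Linux:
# describe_sem < /proc/sys/kernel/sem   # current limits
# ipcs -s                               # semaphore arrays currently in use
```

Keep in mind that sysctl -w only changes the running kernel. To survive a reboot, the same setting would have to go into /etc/sysctl.conf (or a file below /etc/sysctl.d, depending on the distribution) as kernel.sem = 250 32000 32 1024.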

So, just to be on the safe side, let’s do both, wait for a good maintenance window, and reload the new config with service apache reload or via the init script, which uses apachectl anyway:

/etc/init.d/apache reload
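Since the mutex settings are touched in more than one file, it does not hurt to let Apache check the syntax before reloading. A minimal sketch, assuming apachectl is in the PATH (on this box it would more likely live below /opt/apache2/bin, which is a guess); safe_reload is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reload only if the config passes the syntax check.
# $1 is the control command to use; defaults to apachectl (assumed in PATH).
safe_reload() {
    ctl=${1:-apachectl}
    "$ctl" configtest && "$ctl" graceful
}

# Example call (path is an assumption):
# safe_reload /opt/apache2/bin/apachectl
```

If configtest fails, the graceful restart is never attempted and the running server keeps its old configuration.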

Great, all PHP zombie processes have vanished as well:

www-data 16178 0.9 0.2 95092 21788 ? S 02:41 0:00 /opt/php5/bin/php-cgi -c /opt/apache2/conf
www-data 16505 1.8 0.2 95100 21628 ? S 02:41 0:01 /opt/php5/bin/php-cgi -c /opt/apache2/conf
www-data 16685 3.9 0.2 93856 20480 ? S 02:41 0:03 /opt/php5/bin/php-cgi -c /opt/apache2/conf
www-data 16819 3.7 0.2 95104 21448 ? S 02:41 0:03 /opt/php5/bin/php-cgi -c /opt/apache2/conf
www-data 16846 1.7 0.2 95364 21692 ? S 02:41 0:01 /opt/php5/bin/php-cgi -c /opt/apache2/conf

The rest now seems to work as well, with no more error messages thrown so far! Wait – after a while, one error recurs, and this very last section should be pretty self-explanatory:

# should probably be set to 500
# d1g 060317
#FcgidMaxRequestsPerProcess 500
FcgidMaxRequestsPerProcess 0
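For context: as far as the mod_fcgid documentation goes, PHP FastCGI processes exit on their own after 500 requests by default (PHP_FCGI_MAX_REQUESTS), and FcgidMaxRequestsPerProcess should be set to a value less than or equal to that, because otherwise mod_fcgid may hand a request to a process that is just about to exit. A value of 0 disables the limit on the mod_fcgid side. The rule can be sketched as a small check; check_request_limits is a hypothetical helper, and both numbers have to be looked up by hand:

```shell
#!/bin/sh
# Hypothetical helper: compare the two request limits.
# $1 = FcgidMaxRequestsPerProcess (0 = no limit in mod_fcgid)
# $2 = PHP_FCGI_MAX_REQUESTS      (defaults to 500 if omitted; 0 = PHP never exits)
check_request_limits() {
    fcgid_max=$1
    php_max=${2:-500}
    if [ "$php_max" -eq 0 ]; then
        echo ok
    elif [ "$fcgid_max" -eq 0 ] || [ "$fcgid_max" -gt "$php_max" ]; then
        echo mismatch
    else
        echo ok
    fi
}

# Example calls:
# check_request_limits 0 500
# check_request_limits 500 500
```

A "mismatch" here only means the two settings do not follow the documented rule; whether that is really the cause of the recurring error still has to be confirmed in the log.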