TL;DR: NGINX was not launching correctly. Since the process was not writing any logs, I had to use strace to find out what was going on.
There was a weird thing going on with one of our NGINX servers. The sequence of events was like this:
1. Our server rebooted (after many, many months of uptime).
2. After reboot, NGINX was running but not responding to requests.
Even when I curled localhost like this, nothing came back:
root@amy:/tmp# curl -v http://localhost
* Rebuilt URL to: http://localhost/
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: localhost
> Accept: */*
>
3. Checked the error and access logs. Nothing had been written to them since the reboot.
That was weird...
4. Did a ps to check whether the process was running at all. It was, but I realized that no NGINX workers had been spawned after launch. How come?
* Connection #0 to host localhost left intact
root@amy:/tmp# ps aux | grep nginx
root 880 0.0 0.1 43424 5968 ? Ss 08:40 0:00 nginx: master process /usr/local/nginx/sbin/nginx -g daemon on; master_process on;
root@amy:/tmp#
Dec 31 08:40:11 amy systemd[1]: Stopping A high performance web server and a reverse proxy server...
Dec 31 08:40:11 amy systemd[1]: Stopped A high performance web server and a reverse proxy server.
Dec 31 08:40:15 amy systemd[1]: Starting A high performance web server and a reverse proxy server...
Dec 31 08:40:15 amy systemd[1]: Started A high performance web server and a reverse proxy server.
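The worker check from step 4 is easy to script, which helps if you want to alert on it. Here is a minimal sketch; `count_nginx_workers` is a hypothetical helper name, and it only assumes that workers show up in ps with the usual "nginx: worker process" title.

```shell
# Count NGINX worker processes in `ps aux`-style output read from stdin.
# count_nginx_workers is a hypothetical helper, not something NGINX ships.
count_nginx_workers() {
    # The [n] bracket keeps this grep from matching its own entry in ps.
    # grep -c prints 0 and exits non-zero on no match, hence the || true.
    grep -c '[n]ginx: worker process' || true
}

# Usage (not run here):
#   ps aux | count_nginx_workers
```

A master process with a count of 0 is exactly the state I was in: systemd reports the service as started, yet nothing is there to answer requests.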
5. Launched strace, attaching it to the NGINX master process by its PID (-f follows the child processes it forks):
# strace -p 513 -s 10000 -v -f
6. On a different terminal, reloaded NGINX:
# systemctl reload nginx
Then strace's output showed me why no workers were being spawned:
[pid 844] prctl(PR_SET_DUMPABLE, 1) = 0
[pid 844] chdir("/tmp/cores") = -1 ENOENT (No such file or directory)
[pid 844] write(16, "2021/12/31 08:38:08 [alert] 844#0: chdir(\"/tmp/cores\") failed (2: No such file or directory)\n", 93) = 93
[pid 846] fstat(20, <unfinished ...>
[pid 844] exit_group(2) = ?
[pid 844] +++ exited with 2 +++
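With -f the trace gets noisy fast, so it can help to keep only the syscalls that failed. A minimal sketch, assuming the trace is in strace's default text format; `failed_syscalls` is a hypothetical helper name.

```shell
# Keep only syscalls that returned -1 together with an errno name
# (ENOENT, EACCES, ...). Reads strace output from stdin.
# failed_syscalls is a hypothetical helper name.
failed_syscalls() {
    # || true so that "no failures found" is not treated as an error.
    grep -E '= -1 E[A-Z]+' || true
}

# Usage (not run here): strace writes to stderr, so either redirect it
# or save the trace with -o and filter the file afterwards.
#   strace -f -p "$pid" 2>&1 | failed_syscalls
```

On the output above this leaves just the chdir("/tmp/cores") line, which is the one that matters.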
7. Turned out that a directory I had configured a long time ago to investigate a SIGSEGV I was having had been deleted on reboot (it lived under /tmp), so workers were failing to spawn. After that, I created the directory again and NGINX was responding to my requests once more.
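To avoid repeating this after the next reboot, the directory can be recreated before NGINX starts. The sketch below assumes the chdir comes from a working_directory directive in the NGINX config (the configuration is not shown in this post, so that is an assumption). Both function names are hypothetical.

```shell
# Print the path given to the first working_directory directive
# found in NGINX configuration text read from stdin.
nginx_working_dir() {
    awk '$1 == "working_directory" { sub(/;$/, "", $2); print $2; exit }'
}

# Create that directory if the directive is present; do nothing otherwise.
ensure_nginx_working_dir() {
    dir=$(nginx_working_dir)
    if [ -n "$dir" ]; then
        mkdir -p "$dir"
    fi
}

# Usage (not run here; the config path is an assumption for this install):
#   ensure_nginx_working_dir < /usr/local/nginx/conf/nginx.conf
```

The simple parser only handles a directive written on a single line with an unquoted path. The worker user also needs write access to the directory if core dumps are the goal, and a location outside /tmp would survive reboots in the first place.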
====
End of (sad) story. Half an hour I will never get back.
Happy New Year!
