
504 Gateway Timeout - S3User Config

I am getting Error 504 - Gateway timeout when trying to view/edit S3 Users.

It seems that every time this page loads, it's fetching live data using radosgw-admin commands.
Since I have over 110M objects in S3, it takes about 2.5 minutes to load.
I can see the user stats commands running for every user.
I am guessing it's timing out waiting on those commands.

I fixed it by changing the proxy timeouts for the server on port 443 in /etc/nginx/sites-enabled/petasan_admin.
It is now:

 

server {
    listen 443 ssl;
    server_name 10.1.7.182;
    ssl_certificate /opt/petasan/config/certificates/server.crt;
    ssl_certificate_key /opt/petasan/config/certificates/server.key;
    location /grafana/ {
        proxy_pass http://stats/;
        proxy_connect_timeout 5s;
        proxy_send_timeout 5s;
        proxy_read_timeout 5s;
    }
    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect http://$http_host/ https://$http_host/;
        proxy_pass http://127.0.0.1:5002;
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;
        proxy_read_timeout 300s;
    }
}
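To double-check which read timeout each location actually gets, a small script can pull proxy_read_timeout out of the config. This is a rough sketch, not a PetaSAN tool. It assumes location blocks without nested braces and timeouts written in plain seconds (e.g. 300s), and it falls back to nginx's 60-second default when the directive is absent. Values with other units such as 5m are not parsed and would wrongly be reported as 60.

```python
import re

# Abbreviated copy of the server block above (only the lines the check needs).
NGINX_CONF = """
location /grafana/ {
    proxy_pass http://stats/;
    proxy_read_timeout 5s;
}
location / {
    proxy_pass http://127.0.0.1:5002;
    proxy_read_timeout 300s;
}
"""

# Matches "location <path> { ... }" blocks that contain no nested braces.
LOCATION_RE = re.compile(r"location\s+(\S+)\s*\{([^{}]*)\}")
# Only seconds are handled: bare ("300") or with an "s" suffix ("300s").
# Other units (e.g. "5m") do not match and fall back to the default below.
TIMEOUT_RE = re.compile(r"proxy_read_timeout\s+(\d+)s?\s*;")

NGINX_DEFAULT_READ_TIMEOUT = 60  # nginx's default proxy_read_timeout, in seconds


def read_timeouts(conf: str) -> dict:
    """Map each location path to its proxy_read_timeout in seconds."""
    result = {}
    for path, body in LOCATION_RE.findall(conf):
        match = TIMEOUT_RE.search(body)
        result[path] = int(match.group(1)) if match else NGINX_DEFAULT_READ_TIMEOUT
    return result


if __name__ == "__main__":
    print(read_timeouts(NGINX_CONF))  # {'/grafana/': 5, '/': 300}
```

The S3 user page is served through "location /", so that entry is the one that has to exceed the page load time.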

 

Perhaps a better way would be to collect user stats via cron into an s3stats file, with a configurable refresh interval.
Alternatively, don't show the stats on the front page and only load them from the user config or info context menu.
The default 60-second timeout would work for most of our S3 users. 120 seconds would be better, but only if we are loading a single user's stats.
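The cron idea could look roughly like the following minimal sketch. It assumes `radosgw-admin user list` prints a JSON array of user ids and `radosgw-admin user stats --uid=...` prints a JSON object with a "stats" key. The cache path /var/cache/s3stats.json and the script itself are hypothetical, and the PetaSAN UI would still need to be changed to read the file.

```python
import json
import os
import subprocess
import tempfile
import time
from typing import Any, Callable, List

# Hypothetical cache location; nothing in PetaSAN reads this file today.
CACHE_PATH = "/var/cache/s3stats.json"


def run_radosgw_admin(args: List[str]) -> Any:
    """Run radosgw-admin with the given arguments and parse its JSON output."""
    out = subprocess.run(["radosgw-admin", *args], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)


def collect_user_stats(run: Callable[[List[str]], Any] = run_radosgw_admin) -> dict:
    """Query stats for every user once and bundle them with a timestamp."""
    stats = {}
    for uid in run(["user", "list"]):
        # No --sync-stats: report the totals RGW already keeps.
        stats[uid] = run(["user", "stats", f"--uid={uid}"]).get("stats", {})
    return {"generated_at": int(time.time()), "users": stats}


def write_cache(data: dict, path: str = CACHE_PATH) -> None:
    """Write to a temp file in the same directory, then rename it into place,
    so a reader never sees a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)


if __name__ == "__main__":
    write_cache(collect_user_stats())
```

Run from cron every few minutes, the page would only have to read one small file, so its load time would no longer grow with the object count.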

It's late here and I hope this makes sense.

Do you have a multisite setup?

How many users do you have?

This is not a multisite setup.

We have 15 now and are adding 2 more in the next week or two.
When I was testing, I had up to 20 or so, but with a very small number of objects.
By the time we are done, we are going to have about 160 to 200 million objects.

This will grow more over time and I might have to adjust the timeout for this page to load all the stats (size and object count) if this remains the same.

Can you try the following:

Open the file:
/usr/lib/python3/dist-packages/PetaSAN/core/ceph/api.py

Find the method named get_rgw_user_stats.

In the body of the method, find the following line:
cmd = "radosgw-admin user stats --uid={} --sync-stats".format(id)

Remove the option "--sync-stats" from this line.

Save and close the file.

On the first 3 nodes, restart the management service:

systemctl restart petasan-admin
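If you would rather keep the option available than delete it outright, the command string could be built behind a flag. This is a hypothetical helper, not PetaSAN's actual api.py code; the shlex.quote call is an addition that the original line does not have.

```python
import shlex


def user_stats_cmd(uid: str, sync_stats: bool = False) -> str:
    """Build the radosgw-admin command that reads one user's stats."""
    cmd = "radosgw-admin user stats --uid={}".format(shlex.quote(uid))
    if sync_stats:
        # Updates the user's stats from the bucket indexes before reporting;
        # this was the slower variant in this thread.
        cmd += " --sync-stats"
    return cmd


print(user_stats_cmd("alice"))        # radosgw-admin user stats --uid=alice
print(user_stats_cmd("alice", True))  # radosgw-admin user stats --uid=alice --sync-stats
```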

I have made the change.

The page now loads in ~38 seconds.
That is about 2 minutes faster and should not timeout.

What is the data refresh rate in this scenario?

Very good. The stats are refreshed within 3 minutes; the interval is defined by rgw_user_quota_bucket_sync_interval.

Can you manually run the command for a specific user id:
radosgw-admin user stats --uid=XX --sync-stats
and see roughly how long it takes?
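A single manual run can be noisy, so one option is to take the median of a few runs with and without the flag. This is a sketch: XX is the placeholder id from the post, and the run parameter exists only so the function can be exercised without a cluster.

```python
import statistics
import subprocess
import time


def time_user_stats(uid: str, sync_stats: bool, runs: int = 3,
                    run=subprocess.run) -> float:
    """Median wall-clock seconds for `radosgw-admin user stats` on one user."""
    cmd = ["radosgw-admin", "user", "stats", f"--uid={uid}"]
    if sync_stats:
        cmd.append("--sync-stats")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    uid = "XX"  # placeholder: replace with a real user id
    for sync in (False, True):
        print(f"sync_stats={sync}: {time_user_stats(uid, sync):.1f} s")
```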

Are you using SSD, HDD, or a mix?

We have a mix of HDD and SSD. Only the S3 data is on HDD, in an EC32 pool.
The metadata and index are on an SSD pool.
I ran the command from the command line for the 3 largest buckets (40 million, 35 million, and 22 million objects), with and without --sync-stats.

Both variants take about the same time, roughly 5 seconds each.
I now have 16 buckets, so the page should load in about 80 seconds or something close to it.
That is still longer than the default 60 seconds.
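The estimate above, written out. It assumes, as the 16 × 5 s reasoning does, that the page makes one stats call per entry and runs them one after another; the 300-second figure is the value set in the nginx config earlier in this thread.

```python
def estimated_load_seconds(num_calls: int, seconds_per_call: float) -> float:
    """Serial estimate: each stats call waits for the previous one to finish."""
    return num_calls * seconds_per_call


DEFAULT_TIMEOUT = 60   # nginx default proxy_read_timeout, in seconds
RAISED_TIMEOUT = 300   # proxy_read_timeout set for "location /" in this thread

estimate = estimated_load_seconds(16, 5)
print(estimate, estimate > DEFAULT_TIMEOUT, estimate > RAISED_TIMEOUT)  # 80 True False
```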

I also cannot get anywhere near 38 seconds again; that might have been an anomaly.
I have run it on all 3 nodes multiple times, and each takes about 130 seconds to load.
Still a lot better than the 210 seconds with --sync-stats in there.