Replication job: is it working?
Ste
125 Posts
October 12, 2022, 5:33 pm
Hi all,
I configured a replication job between two PetaSAN clusters in the same LAN. The job starts at the scheduled time, but it does not seem to do anything, as the throughput graph sticks to 0 (zero) B/s on the destination cluster. Here is the log:
2022-10-12 18:00:27 - Job 00001 | Job instance (2022-10-12 18:00:01) | Executing replication job.
2022-10-12 18:00:23 - Job 00001 | Job instance (2022-10-12 18:00:01) | Creating new snapshot at source disk.
2022-10-12 18:00:07 - Job 00001 | Job instance (2022-10-12 18:00:01) | Source disk has no snapshots , start clearing destination disk.
2022-10-12 18:00:07 - Job 00001 | Job instance (2022-10-12 18:00:01) | Started.
2022-10-12 18:00:07 - Job 00001 | Getting source disk snapshots list.
2022-10-12 18:00:05 - Job 00001 | Getting destination disk snapshots list.
2022-10-12 18:00:04 - Job 00001 | Getting metadata of source disk.
2022-10-12 18:00:02 - Job 00001 | Getting metadata of destination disk.
2022-10-12 18:00:01 - Job 00001 | Stopping destination disk.
2022-10-12 18:00:01 - Job 00001 | Run replication job ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-10-12 17:04:24 - Job 00001 has been updated.
2022-10-12 16:58:31 - Job 00001 has been updated.
2022-10-12 16:56:30 - Job 00001 has been started.
2022-10-12 16:56:28 - Job 00001 has been created.
Am I missing something? Or is it just taking a long time to create the snapshot (10 TB) on the source cluster?
Thanks, Ste
Last edited on October 12, 2022, 5:35 pm by Ste · #1
admin
2,930 Posts
October 12, 2022, 10:16 pm
Can you run
ps aux | grep rbd
on the source cluster node running the replication service, as well as on the destination cluster node that has the destination IP?
I assume you followed the replication guide to set up the replication user, etc.
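For reference, a minimal way to run this check on both ends (the bracketed grep pattern is just a common trick so grep does not match its own process):

# on the source replication node: the export side of the pipeline should be visible
ps aux | grep '[r]bd export-diff'

# on the destination node: a matching import process should be running
ps aux | grep '[r]bd import-diff'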
Ste
125 Posts
October 13, 2022, 9:39 am
Hello, here is the output from the source node:
root 280584 0.0 0.0 6896 2308 ? S Oct12 0:00 /bin/bash -c rbd export-diff rbd/image-00005@rep-snap-00001-00001 - | tee >(md5sum | cut -d ' ' -f 1 > /opt/petasan/config/replication/00001-1665590402_md5_1.txt) | /opt/petasan/scripts/backups/pipe_reader.py /opt/petasan/config/replication/00001-1665590402_progress_uncomp.json | ssh -o StrictHostKeyChecking=no -i /opt/petasan/config/replication/00001-1665590402_1665590425_474326_sshkey.txt root@10.220.12.133 "tee >(md5sum | cut -d ' ' -f 1 > /opt/petasan/config/replication/00001-1665590402_md5_2.txt) | rbd import-diff - Solo_su_HDD/image-00003 -n client.root --keyring=/etc/ceph/ceph.client.root.keyring" 2> /opt/petasan/config/replication/00001-1665590402_progress_import.txt
root 280585 0.0 0.1 950836 83396 ? Sl Oct12 0:04 rbd export-diff rbd/image-00005@rep-snap-00001-00001 -
root 280589 0.0 0.0 6896 228 ? S Oct12 0:00 /bin/bash -c rbd export-diff rbd/image-00005@rep-snap-00001-00001 - | tee >(md5sum | cut -d ' ' -f 1 > /opt/petasan/config/replication/00001-1665590402_md5_1.txt) | /opt/petasan/scripts/backups/pipe_reader.py /opt/petasan/config/replication/00001-1665590402_progress_uncomp.json | ssh -o StrictHostKeyChecking=no -i /opt/petasan/config/replication/00001-1665590402_1665590425_474326_sshkey.txt root@10.220.12.133 "tee >(md5sum | cut -d ' ' -f 1 > /opt/petasan/config/replication/00001-1665590402_md5_2.txt) | rbd import-diff - Solo_su_HDD/image-00003 -n client.root --keyring=/etc/ceph/ceph.client.root.keyring" 2> /opt/petasan/config/replication/00001-1665590402_progress_import.txt
root 1124148 0.0 0.0 6436 656 pts/0 S+ 11:28 0:00 grep --color=auto rbd
And this is from the destination node:
root@petaIBIS03:~# ps aux | grep rbd
root 3523 0.0 0.0 0 0 ? S< Oct12 0:00 [rbd]
root 1058592 0.0 0.0 6436 2604 pts/0 S+ 11:29 0:00 grep --color=auto rbd
I followed the Admin guide vers. 1.8, completing all these steps (in this order):
- added backup/replication role to one node in source cluster and one on dest cluster
- added destination cluster -> connection test successful (root user)
- added replication job -> successfully selected nodes and disks
- added a daily schedule
The job started, but the log is still at the same point as yesterday (my first message in this thread).
Thanks and bye, S.
admin
2,930 Posts
October 13, 2022, 1:16 pm
On the source cluster replication node:
1) Can you show output of
consul kv get PetaSAN/Replication/Active_Jobs/00001
cat /opt/petasan/config/replication/00001-1665590402_progress_import.txt
2) Do you see progress updating when you run:
rbd export-diff rbd/image-00005@rep-snap-00001-00001 - | md5sum
Else, if it is stuck, add the following to /etc/ceph/ceph.conf:
[client]
debug_rbd = 20
debug_rados = 20
log_file = /var/log/ceph/client_rbd.log
and rerun the command and check the log
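For example, assuming the same image and snapshot names used by the job above, the rerun with debug logging could look like this:

# with the [client] debug settings added to /etc/ceph/ceph.conf:
rbd export-diff rbd/image-00005@rep-snap-00001-00001 - | md5sum

# in a second terminal, watch the client log grow while the export runs
tail -f /var/log/ceph/client_rbd.log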
Last edited on October 13, 2022, 1:18 pm by admin · #4
Ste
125 Posts
October 13, 2022, 2:24 pm
Point 1)
root@petasan03:~# consul kv get PetaSAN/Replication/Active_Jobs/00001
{
"job_id": "00001-1665590402",
"job_name": "prova-replica",
"node_name": "",
"uncompressed_transferred_bytes": "0.0 GB",
"uncompressed_transferred_rate": "0.0 MB/s",
"compressed_transferred_bytes": "",
"compressed_transferred_rate": "",
"compression_ratio": "",
"elapsed_time": "22:19",
"pid": "279341",
"progress": "",
"start_time": "2022-10-12 18:00:01"
}
root@petasan03:~# cat /opt/petasan/config/replication/00001-1665590402_progress_import.txt
2022-10-12T18:00:27.422+0200 7f83d1480380 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.root.keyring: (2) No such file or directory
2022-10-12T18:00:27.422+0200 7f83d1480380 -1 AuthRegistry(0x55f08b4a1ea0) no keyring found at /etc/ceph/ceph.client.root.keyring, disabling cephx
2022-10-12T18:00:27.422+0200 7f83d1480380 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.root.keyring: (2) No such file or directory
2022-10-12T18:00:27.422+0200 7f83d1480380 -1 AuthRegistry(0x7ffd531652a0) no keyring found at /etc/ceph/ceph.client.root.keyring, disabling cephx
rbd: couldn't connect to the cluster!
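The "no keyring found" messages above come from the import side: the import command on the destination expects a keyring at /etc/ceph/ceph.client.root.keyring. A quick check on the destination node (assuming the Ceph user really is client.root, as in the import command) might be:

# does the keyring file referenced by the import command exist?
ls -l /etc/ceph/ceph.client.root.keyring

# is there a client.root user defined in the destination cluster?
ceph auth get client.root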
Point 2): the rbd command still hangs after adding the [client] section to /etc/ceph/ceph.conf. Here is a tail of the log file:
root@petasan03:~# tail -50 /var/log/ceph/client_rbd.log
2022-10-13T16:21:11.996+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x7f5a00008818
2022-10-13T16:21:11.996+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705ebfac58 finish_op
2022-10-13T16:21:11.996+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000c2e0 handle_read_object: rbd_data.b24cfb871d634b.00000000000004b4 r=0
2022-10-13T16:21:11.996+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000c2e0 finish: rbd_data.b24cfb871d634b.00000000000004b4 r=0
2022-10-13T16:21:11.996+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a00007380 finish: C_ObjectReadRequest: r=0
2022-10-13T16:21:11.996+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a00007380 finish: got {0=4194304} for [0,4194304] bl 4194304
2022-10-13T16:21:11.996+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ebec470 complete_request: cb=1, pending=0
2022-10-13T16:21:11.996+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ebec470 finalize: r=4194304
2022-10-13T16:21:11.996+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x55705ebff928
2022-10-13T16:21:11.996+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705ebec568 finish_op
2022-10-13T16:21:12.000+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a000048a0 handle_read_object: rbd_data.b24cfb871d634b.00000000000004b6 r=0
2022-10-13T16:21:12.000+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a000048a0 finish: rbd_data.b24cfb871d634b.00000000000004b6 r=0
2022-10-13T16:21:12.000+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a0000ca30 finish: C_ObjectReadRequest: r=0
2022-10-13T16:21:12.000+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a0000ca30 finish: got {0=4194304} for [0,4194304] bl 4194304
2022-10-13T16:21:12.000+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705eb9b700 complete_request: cb=1, pending=0
2022-10-13T16:21:12.000+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705eb9b700 finalize: r=4194304
2022-10-13T16:21:12.000+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x55705eba3908
2022-10-13T16:21:12.000+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705eb9b7f8 finish_op
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000b500 handle_read_object: rbd_data.b24cfb871d634b.00000000000004b3 r=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000b500 finish: rbd_data.b24cfb871d634b.00000000000004b3 r=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a0000c1b0 finish: C_ObjectReadRequest: r=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a0000c1b0 finish: got {0=4194304} for [0,4194304] bl 4194304
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ebec280 complete_request: cb=1, pending=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ebec280 finalize: r=4194304
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x7f5a0000bd58
2022-10-13T16:21:12.008+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705ebec378 finish_op
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a00008790 handle_read_object: rbd_data.b24cfb871d634b.00000000000004b5 r=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a00008790 finish: rbd_data.b24cfb871d634b.00000000000004b5 r=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a00007450 finish: C_ObjectReadRequest: r=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a00007450 finish: got {0=4194304} for [0,4194304] bl 4194304
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ec00be0 complete_request: cb=1, pending=0
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ec00be0 finalize: r=4194304
2022-10-13T16:21:12.008+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x7f5a0000acc8
2022-10-13T16:21:12.008+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705ec00cd8 finish_op
2022-10-13T16:21:12.012+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000a120 handle_read_object: rbd_data.b24cfb871d634b.00000000000004b0 r=0
2022-10-13T16:21:12.012+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000a120 finish: rbd_data.b24cfb871d634b.00000000000004b0 r=0
2022-10-13T16:21:12.012+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a0000ab00 finish: C_ObjectReadRequest: r=0
2022-10-13T16:21:12.012+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a0000ab00 finish: got {0=4194304} for [0,4194304] bl 4194304
2022-10-13T16:21:12.012+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ea88890 complete_request: cb=1, pending=0
2022-10-13T16:21:12.012+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ea88890 finalize: r=4194304
2022-10-13T16:21:12.012+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x7f5a0000a618
2022-10-13T16:21:12.012+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705ea88988 finish_op
2022-10-13T16:21:12.020+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000bac0 handle_read_object: rbd_data.b24cfb871d634b.00000000000004af r=0
2022-10-13T16:21:12.020+0200 7f5a0effd700 20 librbd::io::ObjectRequest: 0x7f5a0000bac0 finish: rbd_data.b24cfb871d634b.00000000000004af r=0
2022-10-13T16:21:12.020+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a00008e20 finish: C_ObjectReadRequest: r=0
2022-10-13T16:21:12.020+0200 7f5a0effd700 10 librbd::io::ReadResult: 0x7f5a00008e20 finish: got {0=4194304} for [0,4194304] bl 4194304
2022-10-13T16:21:12.020+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ebff7b0 complete_request: cb=1, pending=0
2022-10-13T16:21:12.020+0200 7f5a0effd700 20 librbd::io::AioCompletion: 0x55705ebff7b0 finalize: r=4194304
2022-10-13T16:21:12.020+0200 7f5a0effd700 20 librbd::io::ReadResult: 0x7f5a0eff7d60 operator(): moved resulting 4194304 bytes to bl 0x55705ec00778
2022-10-13T16:21:12.020+0200 7f5a0e7fc700 20 librbd::io::AsyncOperation: 0x55705ebff8a8 finish_op
Bye, S.
admin
2,930 Posts
October 13, 2022, 7:53 pm
- Is the cluster healthy on both source and destination?
- Not sure if this is causing an issue, but using root as the replication user could be a problem: we create a key pair for this destination user, which may conflict with the existing source root keys. I'm not sure, but I would recommend you try creating a dedicated replication user.
- It is strange if the rbd export-diff command at the source hangs. I am not sure whether the tail of the log you showed is hung or keeps adding entries; if it keeps adding, it may be working but very slowly. Does the progress stay at 0% even if you leave it for a few minutes?
- Can you do a test with a small image, say 100 GB in size, and see if it is any better? You can also try to run the rbd export-diff command on this smaller image; you can find the snapshot name with the rbd snap ls command (a rough sketch follows below).
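A rough sketch of that smaller test on the source cluster (the pool/image name and snapshot name below are placeholders):

# list snapshots on the test image to find the replication snapshot name
rbd snap ls rbd/small-test-image

# export the diff up to that snapshot and checksum it, to see whether data actually flows
rbd export-diff rbd/small-test-image@my-snap - | md5sum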