Continuous Replication starts failing



  • On several occasions I've experienced failure in the CR process. This is what shows in the job log:

    [Screenshot of the job log: 2016-02-03 10_05_32-xen orchestra]

    Looking at the output from xo-server, here are the errors I am seeing:

    xo:xapi Snapshotting VM xxxx2800 +4s
    xen-api root@192.168.1.36: VM.snapshot_with_quiesce(...) [38ms] =!> XapiError: VM_SNAPSHOT_WITH_QUIESCE_NOT_SUPPORTED(OpaqueRef:89834259-5c9b-7110-5391-2b8bef3e0885) +9s
    xo:xapi task created: VDI Export (xxxx2800_1) +2s
    xo:xapi exporting VDI xxxx2800_1 (from base xxxx2800_1) +1ms
    xo:xapi task created: VDI Export (xxxx2800_0) +12ms
    xo:xapi exporting VDI xxxx2800_0 (from base xxxx2800_0) +0ms
    xo:xapi Creating VM xxxx2800 +2s
    xo:xapi Cloning VDI xxxx2800_0 +174ms
    xo:xapi Deleting VM xxxx2800 +3ms
    xo:api dan@mydomain.com | vm.deltaCopy(...) [6s] =!> Error: missing base VDI (copy of cca2df1b-85cf-4842-965b-4cba7a6448d5) +2ms
    

    Anyone else encountering this or similar failures?

    @olivierlambert Let me know if there is something I can do to assist with troubleshooting / debugging this.

    Dan
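
For anyone comparing notes: in the xo-server output above, failed calls are the lines carrying the `=!>` marker. A minimal sketch for pulling just those lines out of a saved copy of the output follows. It assumes the output has been written to a file; the function names are made up for illustration and are not part of XO.

```shell
#!/bin/sh
# Hypothetical helpers, not part of xo-server.

# failed_calls LOGFILE
# Prints only the lines where a call ended in an error,
# i.e. lines containing the literal "=!>" marker.
failed_calls() {
    grep -F '=!>' "$1"
}

# count_failed LOGFILE
# Prints how many such lines the file contains.
count_failed() {
    grep -cF '=!>' "$1"
}
```

Run against the excerpt above, this would return two lines: the quiesce error and the `missing base VDI` error. Since the export steps continue after the quiesce error, the second one appears to be what actually ends the job.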



  • FYI, the only way I can get things working again is to delete all snapshots for the given VM.



  • Okay, let's find out. Did you move the VDIs from this VM to another SR?



  • No, I made absolutely no changes to the VM or hosts. The job just spontaneously begins to fail. Each time it fails, there is a snapshot left over so the number of snapshots can grow quickly...



  • I imagine you have enough space to create the new snapshot before it fails?



  • Yes, as best as I can tell. This is what XenCenter shows for the SR:
    468.3 GB used of 815.6 GB total (1432.8 GB allocated)



  • @julien-f any opinion on this?



  • Did you remove the VM on the other SR? I mean the "copy"?

    edit: the error message means: "Ok I made the delta, now I want to upload the result on the target VDI.. but I can't find it anymore!"



  • Went back and checked to see when the previous failure occurred. Job ran successfully every 5m from 1/31 @ 8:30am until today @ 9:30 am.



  • @olivierlambert said:

    Did you remove the VM on the other SR? I mean the "copy"?

    edit: the error message means: "Ok I made the delta, now I want to upload the result on the target VDI.. but I can't find it anymore!"

    No, the VM is still there. As I stated earlier, the job just spontaneously began failing. This has occurred several times for me in the last few weeks. I am the only person with access to the hosts, and no changes were being made prior to the initial failure.

    I just checked and there were numerous unattached VDs on the destination SR, likely leftovers from the failed jobs, so I have removed them.

    Right now only one of the VDs is showing for the copied VM. However, the job is still running to recreate the initial copy, so I'm guessing that this is just a by-product.



  • Enough space on the destination SR too? (Sorry for all the questions, I'm just ruling out every possible issue.)



  • Yes. Destination SR from XenCenter: 191.5 GB used of 1090.1 GB total (1008.8 GB allocated)

    Currently this host only houses XO and another small linux VM, so there's plenty of room to replicate this VM from the other host.
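
As a quick sanity check on the XenCenter figures quoted for both SRs, the arithmetic can be sketched as below. The helper names are hypothetical, and reading "allocated above total" as a sign of thin provisioning is an assumption about how the SR is set up, not something stated in the thread.

```shell
#!/bin/sh
# Hypothetical helpers for the SR figures quoted in this thread (values in GB).

# sr_free USED TOTAL
# Prints the physical space still free on the SR.
sr_free() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", total - used }'
}

# sr_overcommit ALLOCATED TOTAL
# Prints how far the virtual allocation exceeds the physical size
# (a negative value means the SR is not overcommitted).
sr_overcommit() {
    awk -v alloc="$1" -v total="$2" 'BEGIN { printf "%.1f\n", alloc - total }'
}

sr_free 468.3 815.6          # source SR free space: 347.3
sr_overcommit 1432.8 815.6   # source SR overcommit: 617.2
sr_free 191.5 1090.1         # destination SR free space: 898.6
```

By these numbers both SRs have several hundred GB free, which fits the answers given above that space does not look like the problem.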



  • Do you have the log of the first failed attempt?

    It would be interesting to understand what the trigger is. Every 5 minutes may be too short, but the scheduler should avoid starting a new job if the previous one hasn't finished.



  • Not that I am aware of. The xo-server process is started by a cron job, and I am not aware of any logs automatically saved by XO.



  • What do you mean by "xo-server process started by a cron job"?

    Also, the xo-server output is where everything is displayed. You don't save it anywhere?



  • This was discussed in this thread over on the Mangolassi forum. There is a cron job that runs each time the server is started, so there isn't a need to manually log in and start xo-server.

    Since this isn't attached to the console, I don't believe the output from xo-server can be retrieved from an earlier timeframe.



  • I'm not sure I understand what this cron job does. To me, cron is a way to schedule something; it's not an init script. Can you explain further?



  • @Danp said:

    Right now only one of the VDs is showing for the copied VM. However, the job is still running to recreate the initial copy, so I'm guessing that this is just a by-product.

    Ok... so now there are two copies of the VM on the destination host. The "original" one (which is missing a VD) and the "new" one (which contains both VDs).

    I'm guessing the missing VD is the issue. Not sure why that happened, but it had to be the result of the CR process in XO.



  • The cron job executes cd /etc && ./xo-start.sh. This file contains the following:

    #!/bin/sh
    cd /opt/xo-server
    sudo npm start 
    

    Without this or a similar mechanism, xo-server doesn't automatically start.
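
Since the output of a process started this way is lost, one option is to have the start script append it to a file. The following is only a sketch under stated assumptions: `run_logged` is a made-up helper and `/var/log/xo-server.log` is a hypothetical path, neither comes from XO itself. Both stdout and stderr are redirected so that nothing printed by xo-server is dropped.

```shell
#!/bin/sh
# Sketch of a start script that keeps xo-server output on disk.

# run_logged LOGFILE COMMAND [ARGS...]
# Writes a timestamped start marker to LOGFILE, then runs COMMAND with
# both stdout and stderr appended to the same file.
run_logged() {
    log_file=$1
    shift
    printf '%s starting: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log_file"
    "$@" >> "$log_file" 2>&1
}

# In xo-start.sh this would replace the bare "sudo npm start"
# (the log path is an assumption, pick any writable location):
#   cd /opt/xo-server && run_logged /var/log/xo-server.log sudo npm start
```

With something like this in place, the output from the first failed run would still be on disk the next time the job starts failing.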



  • This is a kind of init script, not a cron job. Or maybe I don't understand it yet. If it is a cron job, show me your crontab; that should make it clearer.

