Delta Backups Stuck at Merge



  • So this is probably an issue with your XO build from sources (or something missing in your distro) 😕 You should try doing a merge manually with the VHD tool to see exactly where the problem is. But it's not directly related to XO, otherwise we would have the same issue in XOA.



  • @olivierlambert

    Same problem here.

    It seems to hang when retention is processed in delta mode.

    In file /node_modules/xo-server/src/xo-mixins/backups-ng/index.js

    Line 35:

    I added "mergeVhd" to the import:

    import Vhd, {
      chainVhd,
      createSyntheticStream as createVhdReadStream,
      mergeVhd,
    } from 'vhd-lib'
    

    Line 1684:

    I commented out the original mergeVhd call and replaced it with:

        const mergedDataSize = await mergeVhd(handler, path, handler, childPath)
    
    /*
        const mergedDataSize: number = await this._app.worker.mergeVhd(
          handler._remote,
          path,
          handler._remote,
          childPath
        )
    
    */
    

    In "backup.js" (not in backups-ng/index.js), the approach was different.

    In fact, the awaited call this._app.worker.mergeVhd never runs (the await/yield never returns).

    I imagine it's not the right solution, but I haven't found a better way to do it.

    Any ideas?
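    If the worker-based call really never returns, one way to confirm that it is hanging (rather than just slow) is to race it against a timer. This is a minimal sketch under that assumption: `withTimeout` is a hypothetical helper, not part of XO or vhd-lib, and the 10-minute limit in the usage comment is an arbitrary example value.

```javascript
// Hypothetical debugging helper (not part of XO or vhd-lib): race a promise
// against a timer so a call that never settles surfaces as an error.
function withTimeout(promise, ms, label = 'operation') {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    )
  })
  // Clear the timer on both paths so it does not keep the event loop alive.
  // (then() with two handlers instead of .finally() so this also runs on Node 8.)
  return Promise.race([promise, timeout]).then(
    value => {
      clearTimeout(timer)
      return value
    },
    error => {
      clearTimeout(timer)
      throw error
    }
  )
}

// Usage sketch inside an async method of backups-ng/index.js (assumption):
//   const mergedDataSize = await withTimeout(
//     this._app.worker.mergeVhd(handler._remote, path, handler._remote, childPath),
//     10 * 60 * 1000, // arbitrary 10-minute limit
//     'mergeVhd'
//   )

// Demo: a promise that never settles, like the hanging worker call.
const neverSettles = new Promise(() => {})
withTimeout(neverSettles, 100, 'demo merge').catch(err => console.log(err.message))
// prints "demo merge did not settle within 100 ms"
```

    A timeout only tells you that the call is stuck, not why; it is a diagnostic aid, not a fix for the merge itself.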



  • @igordosgore hi,

    The original code should work perfectly. You probably have an environment problem (or double-check that you are on the latest master). Please check that it works with XOA; this will prove it's not a source code problem.

    Edit: you should ping @julien-f or @pdonias; I'm not an XO maintainer 😉



  • @julien-f , @pdonias

    Can you tell me which process launches:

    "/usr/local/bin/node /usr/local/lib/node_modules/xo-server/node_modules/jest-worker/build/child.js" ?

    It seems that the jest-worker worker is not launched on a fresh source install.

    (And the original code tries to communicate with child.js via the jest-worker module.)

    On the commercial version of XO, child.js is spawned at boot (we found it with "ps ax").

    Thanks 🙂



  • Just my 2 cents: XOA uses the same source code from the GitHub repo, so I would find it very strange if it were different. Check how you start your xo-server process, please. It's VERY likely an environment problem on your distro.



  • Found the issue.

    => https://github.com/facebook/jest/issues/7181

    jest-worker cannot launch a worker process if the VM has only 1 CPU, because:

    numWorkers: number (optional)

    • Amount of workers to spawn. Defaults to the number of CPUs minus 1.

    You should detect when there is only 1 CPU and set a minimum of numWorkers = 1 (rather than letting jest-worker calculate the default value).
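    The arithmetic behind the failure, as a sketch: with the documented default of CPUs minus 1, a single-CPU VM gets 1 - 1 = 0 workers, so presumably nothing is ever spawned to run the merge. The helpers below are hypothetical illustrations of that default and of the suggested clamp to a minimum of 1; they are not XO's or jest-worker's actual code.

```javascript
const os = require('os')

// jest-worker's documented default for numWorkers: number of CPUs minus 1.
function defaultNumWorkers(cpuCount) {
  return cpuCount - 1
}

// The fix suggested above: clamp to at least one worker.
// Hypothetical helper, not XO's actual code.
function safeNumWorkers(cpuCount) {
  return Math.max(1, cpuCount - 1)
}

const cpuCount = os.cpus().length
console.log(
  `CPUs: ${cpuCount}, default numWorkers: ${defaultNumWorkers(cpuCount)}, ` +
    `clamped numWorkers: ${safeNumWorkers(cpuCount)}`
)

// Hypothetical wiring: pass the clamped value explicitly instead of relying on
// the default, e.g. new Worker(workerPath, { numWorkers: safeNumWorkers(cpuCount) }).
// Check how your installed jest-worker version exports Worker before using this.
```

    On a 1-vCPU VM the default works out to 0 workers, while the 2-vCPU setup XOA ships with gives 1, which is consistent with bumping the vCPU count being an effective workaround.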



  • Nice catch 🙂 As I said, it was an environment problem. Note: XOA is delivered with 2 vCPUs by default, so you wouldn't have the problem there. BTW, QA is done on XOA, so that is the environment we validate for production.

    Anyway, please report your problem to the XO bug tracker, so our devs can handle a minimal CPU count correctly (even though it's not a good idea to have only one CPU in your XO VM, it shouldn't break the merge process, so it's legitimate to report it!)



  • @igordosgore

    Awesome! Stoked that this was finally resolved!



  • Is someone going to open the issue on GH?

    In the meantime, just bump your vCPU count by 1. 👍



  • @igordosgore Good catch!!! This has been driving me nuts and I'd about given up. Bumped vCPU from 1 to 2 and it is working again!



  • Hey guys. I'm still having a problem similar to this. My vCPU count is 2 and my thread count is also 2; I made the changes suggested in this thread. I had two backup jobs scheduled 30 min apart and somehow they ran into each other. I have no idea why taking a snapshot of 2 running VMs and merging would take over an hour to complete, but that's what happened, since the first job ran over into the next. Both jobs became stuck in the merge state. My questions are:

    1. When they get stuck in the merge state, how do you stop the job? Clicking Stop in the XO interface doesn't stop them.
    2. Why is this still happening?


  • @kevdog Have you updated to the latest source code?



  • @jht3
    I'm using the xoa-community edition, which I updated through the jarli script. I even did a forced rebuild after updating.

    The base is Ubuntu 18.04.2
    $ node -v
    v8.15.1

    $ npm -v
    6.4.1

    $ yarn -v
    1.15.2

    $ n -V
    2.1.12

    $ dpkg -l | grep vhdi
    ii libvhdi-utils 20170223-3 amd64 Virtual Hard Disk image format access library -- Utilities
    ii libvhdi1:amd64 20170223-3 amd64 Virtual Hard Disk image format access library

    vCPU = 2 (2 sockets with 1 core per socket)

    *** Update: after the process crashed for the last 5 days, it finally worked this morning.
    I have no idea why. Sorry guys, I guess I can't give you much at this point. Damn thing seems so temperamental. I'll report back with more useful logs if it happens again; when it was happening, it was just as described above: stuck at the merge state. Something tells me changing to two virtual CPUs doesn't always work.

