Cloud instance performance - running as a cluster?

Jean-Michel
Posts: 15
Joined: Fri Jun 10, 2022 2:12 pm

Post by Jean-Michel »

Hello

We are evaluating running Syncovery in the AWS cloud. Currently we have a Docker instance running on a 4 vCPU / 4 GB Fargate task for testing.

Last night I started a test between an AWS S3 bucket and a GCS bucket with 220 files totaling 4 TB.
The job is running 5 files in parallel.
The transfer speed is around 80-90 MBytes/s, which leaves us with 50 hours for that big batch of transfers.

Unfortunately this performance is too slow for our objectives, as we are looking to continuously transfer about 200 files (4 TB) per day.

Looking at CPU and memory usage, everything is under control: less than 20% of the 4 GB of memory is used, and CPU stays below 50% most of the time, with some peaks at 75%.
We are looking at how we could increase the performance.

We can't run multiple independent Syncovery instances, as we are copying from 1 source to 1 destination. We are afraid that multiple instances running the same job definition would each transfer the same files, creating duplicates and making the problem worse instead of solving it.

So we are wondering whether it is possible to run multiple Syncovery transfer agents in a cluster under the control of a single job scheduler, which would distribute the copy work across those agents.

The other alternative is to try running on a big EC2 instance with a 10Gb network interface and 1 TB of temp storage, but that would be less flexible than a cluster which we could scale up and down as needed.

Thanks for any suggestions

tobias
Posts: 1669
Joined: Tue Mar 31, 2020 7:37 pm

Re: Cloud instance performance - running as a cluster?

Post by tobias »

Hello,
yes this can surely be distributed across multiple instances.

It is important to remember that Syncovery needs to download complete files to TEMP storage, so you need enough hard disk space for 5 simultaneous temporary files on each machine. My intention is to avoid this necessity in the upcoming Syncovery 11, so you will not need so much space.

In general when Syncovery is running on multiple machines, each instance is completely independent, so there will be multiple independent schedulers. The instances do not communicate with each other (yet).

Therefore we must find a way to determine which files are going to be copied by which instance.

In the simplest case, as an example, instance 1 could use a file mask [A-M]* and instance 2 could use a mask [N-Z]*. So instance 1 copies files whose names start with A through M, while instance 2 copies files whose names start with N through Z.
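
To preview how such masks would split a file list before configuring the jobs, a minimal Python sketch could look like the following. It only illustrates the idea and is not Syncovery's own mask matching: the function names and the example file names are hypothetical, and the case-sensitive matching via `fnmatchcase` is an assumption that may differ from how Syncovery treats masks.

```python
from fnmatch import fnmatchcase

# The two masks from the post, keyed by instance number.
MASKS = {1: "[A-M]*", 2: "[N-Z]*"}


def instance_for(filename, masks=MASKS):
    """Return the instance whose mask matches filename, or None if no mask does."""
    for instance, mask in masks.items():
        if fnmatchcase(filename, mask):  # case-sensitive glob match (assumption)
            return instance
    return None


def partition(filenames, masks=MASKS):
    """Group filenames by owning instance; unmatched names are collected under None."""
    plan = {instance: [] for instance in masks}
    plan[None] = []
    for name in filenames:
        plan[instance_for(name, masks)].append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical file names, only for illustration.
    print(partition(["Alpha.mxf", "Nova.mxf", "zulu.mxf", "0001.mxf"]))
```

Anything that ends up under `None` (in this sketch, names starting with a digit or a lowercase letter) would not be copied by either instance, so the masks need to cover every leading character that actually occurs in the bucket.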

Another idea would be that instance 1 copies files in alphanumerical order forwards, and instance 2 backwards, and each instance checks if a file has already been transferred.
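
The forwards/backwards idea can be simulated in a few lines of Python. This is only a sketch under simplifying assumptions: the two instances take strict turns, the shared `done` set stands in for "check if a file has already been transferred", and Python's default string sort stands in for alphanumerical order. None of the names below are Syncovery features.

```python
def simulate(filenames):
    """Simulate two instances walking the same sorted list from opposite ends.

    Instance 1 goes forwards, instance 2 backwards. Each turn, an instance
    takes the next file in its own order that nobody has transferred yet.
    """
    done = set()  # stands in for the "already transferred" check
    orders = {
        1: iter(sorted(filenames)),
        2: iter(sorted(filenames, reverse=True)),
    }
    copied = {1: [], 2: []}
    progress = True
    while progress:
        progress = False
        for instance, order in orders.items():
            for name in order:
                if name not in done:
                    done.add(name)
                    copied[instance].append(name)
                    progress = True
                    break  # one file per turn, then the other instance goes
    return copied


if __name__ == "__main__":
    print(simulate(["a", "b", "c", "d", "e"]))
```

In this simulation the two instances meet in the middle and every file is copied exactly once. With real transfers the check and the copy are not one atomic step, so two instances could still pick the same file near the meeting point unless something like a lock is added.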

As a next step, we can teach the instances to communicate with each other. This can be added either through a PascalScript, or I can add it to Syncovery internally. The communication would have to be done via some files in the file system, where each instance logs which files it is copying, and each instance checks if another instance is already copying a file and would then skip such files.

Alternatively, each instance could create a lock file on the destination (same filename + ".lock") which would then prevent another instance from uploading the file. In fact that could be easier and more reliable than communicating via files in the local file system.
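
As an illustration of the lock file idea, here is a minimal Python sketch that uses a local directory as a stand-in for the destination. The function names are hypothetical, and this is not how Syncovery implements anything. It relies on `os.O_EXCL`, which makes file creation fail if the file already exists; on an object store such as S3 or GCS, the same "create only if absent" guarantee would have to come from the store's own conditional-write mechanism, which is not shown here.

```python
import os


def try_claim(dest_dir, filename, owner):
    """Try to create '<filename>.lock' in dest_dir; return True if we got the lock."""
    lock_path = os.path.join(dest_dir, filename + ".lock")
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the lock already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another instance is already handling this file
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)  # record who holds the lock, for troubleshooting
    return True


def release(dest_dir, filename):
    """Remove the lock file once the upload has finished (or failed)."""
    os.remove(os.path.join(dest_dir, filename + ".lock"))
```

An instance would call `try_claim` before uploading a file, skip the file if it returns `False`, and call `release` afterwards. A stale lock left behind by a crashed instance would need separate handling, for example an age check on the lock file.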

Let me know your thoughts! I think we will be able to implement this quickly.

tobias
Posts: 1669
Joined: Tue Mar 31, 2020 7:37 pm

Re: Cloud instance performance - running as a cluster?

Post by tobias »

Hello,
I wanted to add that I am extremely interested in getting this to work for you and you can expect the highest level of support and implementation speed.

Did you authorize Syncovery with GCS using a service account according to
https://www.syncovery.com/gcs/ ?

Cheers,
Tobias

Jean-Michel
Posts: 15
Joined: Fri Jun 10, 2022 2:12 pm

Re: Cloud instance performance - running as a cluster?

Post by Jean-Michel »

Hi Tobias

Sorry for the late feedback.
Just to update you: we have just purchased a Professional license to complete our month of intensive testing, and we are currently in production, copying roughly 10 TB a day from our customer's Google bucket to our AWS bucket.
We are running Syncovery on a Fargate instance (Linux/x86, 4 vCPU, 8 GB) using a Docker image built from https://github.com/MyUncleSam/docker-syncovery with no changes.
The temp folder is mounted on EFS, an elastic file system that resizes as needed, so we don't have to care about the file size.
The average transfer rate is around 80 MB/s with 3 files in parallel, sometimes less, sometimes more.
We tried 5 files in parallel, but the transfer rate was no better and CPU usage was higher.
Maybe EFS is the limiting factor and we could get close to 100 MB/s with EBS storage, but that would require committing to a size.
That said, the files are between 15 GB and 30 GB each, so with 3 files in parallel we would be OK with 100 GB of temp storage.

CPU usage is fine (<70%) and memory usage is very low, less than 7% of the 8 GB!

So far it is working fine and we don't need clustering.
But I will keep your ideas for possible future use.

Thanks for all your support and great work so far
