Directory Synchronizer

A C# library, published as a GitHub Gist

What It Is

A C# class that performs an efficient one-way sync of files and folders from a source to a target.

Status

As of September 2024, it’s still actively maintained, but I haven’t needed to change anything in a while. It seems to “just work” – I run and observe it every day, and it syncs accurately.

Details

I wrote this as sort of a backup utility that keeps a log of what it does and allows me to version whatever it operates on. I also have some vague plans to use it on this site as a CMS publishing tool – I could operate on Markdown files in a “draft” directory structure, then use this code to “publish” to the production root.

This class synchronizes a directory tree from source to target. It’s a one-way sync, meaning that it modifies the target so it looks like the source – it is not expecting any changes in the target.

With its default implementation (see the note about “content signatures,” below), it’s very fast.

I’m running it on a deep directory structure with about 6,000 files. If there are no changes, it will run in less than one second. Even with dozens of sync operations, it rarely takes more than a second. (Note: this will, of course, be affected by your storage architecture. I’m reading and writing from/to the same SSD, which is clearly an ideal situation.)

To use, in its simplest form:

var syncManager = new FileSystemSyncManager(@"C:\source", @"C:\target");
syncManager.Sync();

That will perform all the activities required to make the target match the source. After the synchronization, syncManager.Log will contain a collection of everything it did.

Here’s the easiest way to persist a log:

// JsonSerializer lives in the System.Text.Json namespace
var logJson = JsonSerializer.Serialize(syncManager.Log);
File.WriteAllText(@"C:\log.json", logJson);

A couple of caveats/decisions:

The above were both arbitrary decisions that fit my usage. I might change those behaviors in the future, or provide some option flags for them.

There are some “events” (really just delegates; the default implementations return true in all cases):

syncManager.BeforeOverwriteExistingFile = (pathInSource, pathInTarget) => {
  // Do something
  return true; // If you return false, it will cancel the operation
};

syncManager.BeforeWriteNewFile = (pathInSource, intendedPathInTarget) => {
  // Do something
  return true; // If you return false, it will cancel the operation
};

syncManager.BeforeDeleteFile = (pathInTarget) => {
  // Do something
  return true; // If you return false, it will cancel the operation
};

I use this to archive the contents of any existing file before it’s overwritten or deleted.
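
For example, here’s roughly how that archiving could be wired up (a sketch only; the archive directory and file-naming scheme are my own invention for illustration, not part of the library):

var archiveDir = @"C:\archive";
Directory.CreateDirectory(archiveDir); // No-op if it already exists

syncManager.BeforeOverwriteExistingFile = (pathInSource, pathInTarget) => {
  // Copy the target file aside before it gets overwritten
  var archiveName = $"{DateTime.Now:yyyyMMdd-HHmmss}-{Path.GetFileName(pathInTarget)}";
  File.Copy(pathInTarget, Path.Combine(archiveDir, archiveName));
  return true; // Allow the overwrite to proceed
};

The same pattern works for BeforeDeleteFile, which takes only the target path.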

How it determines whether file contents have changed is up to you. It does this using a “content signature” – a string that will differ between source and target if the file contents have changed.

In the default implementation, it simply uses LastWriteTime, converted to a Unix timestamp. So, it’s just comparing file write times – if they’re different between source and target, then the file content is assumed to be different, and the source file will overwrite the target file.

This is very fast (because it doesn’t have to read the file contents), and it has been perfectly accurate in my usage so far.
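
In other words, the default signature amounts to something like this (a hypothetical standalone sketch of the concept, not the library’s literal code):

// The default signature is just the last write time as a Unix timestamp
string GetDefaultSignature(FileInfo file)
{
  return new DateTimeOffset(file.LastWriteTimeUtc).ToUnixTimeSeconds().ToString();
}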

However, you can be more exact, if you like –

There is a static delegate on FileRef called GetContentSignature. You can re-implement this to use any other method you like – you just need to return a string that represents the file contents in a way that will be reliably different between source and target if the content has changed.

FileRef.GetContentSignature = (f) => {

  // f is the FileRef object
  // f.File is the FileInfo of the underlying file

  // Do whatever you need to do to represent the content as a string
  return "this is the content signature";

};

An obvious example of a universally accurate signature –

If you’re always dealing with text files, you could literally use the entire file contents as the signature. This is the most brute-force method – it requires reading every file, which slows things way down, and it only works with text – but it would be indisputably accurate.
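
Using the delegate above, that’s a one-liner (assuming your files are small enough to read into memory):

// Brute force: the entire file content is the signature
FileRef.GetContentSignature = (f) => File.ReadAllText(f.File.FullName);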

If you have non-text files, you’d just need to create a string hash of the bytes of the file. Here’s what that would look like (the ToMD5Hash extension method is provided in the code):

FileRef.GetContentSignature = (f) => 
{
  return File.ReadAllBytes(f.File.FullName).ToMD5Hash();
};
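
For reference, such an extension only needs a few lines. This is a sketch of the general shape – the gist’s actual implementation may differ:

using System.Security.Cryptography;

public static class HashExtensions
{
  // Hash the bytes with MD5 and render the result as a hex string
  public static string ToMD5Hash(this byte[] bytes)
  {
    using var md5 = MD5.Create();
    return Convert.ToHexString(md5.ComputeHash(bytes)); // .NET 5+
  }
}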

In my situation (6,000 files), the hash method ran in about 30 seconds, as opposed to less than one second for the default write-time comparison.

(The first time I ran it, it took almost two minutes, but it was never again that slow. I saw the same thing in another instance – the first run took 4-5 times as long as every subsequent run, and after a reboot, the first run is slow again. I have to assume Windows is caching some file data.)

The sole benefit of the hash method is that it doesn’t copy files whose contents are unchanged, even if they were re-written. To test this, I re-deployed a ZIP file to the source without any changes. The hash method did not copy the file over to the target, since the bytes/hash were the same. When I switched back to the default signature based on the last write time, the file copied over, since the write times were different.