I haven’t owned a desktop computer for many years, but recently I had to admit that I can’t get both ultra-portability and enough performance for video editing and machine learning projects in a single computer. So here I am, a happy owner of a fast workstation but struggling to keep my data synced between the two. The painful part is keeping my photo workflow under control. I used to develop my own photo workflow application, and at that time I gave this particular problem a lot of thought and believed I had solved it. But apparently many Lightroom users still face these issues today, so here’s my combined wisdom on designing asset management and workflow software.
Ideally I want to access all of my photos (about 100 000 images from four decades, a couple of terabytes of data) from all of my computers. At home I mostly use my workstation, but while traveling I want to take along a copy of any projects I am currently working on. And naturally, after coming back it must be easy to copy all changes and new photos back to the workstation. Oh, and cloud storage is not an option: internet connectivity is not fast enough where I live and even slower while traveling in the field.
The only way I know of achieving all of this with Lightroom is to store both the database and all photos on an external USB disk and connect it to whichever computer I am using. This kind of works, but it is still cumbersome and really not what I want.
Photo workflow database design
I believe strongly in a non-destructive photo editing workflow: the original file should never be modified. Instead, descriptions of the image processing operations are associated with the original image, along with any other data. This makes it easy to make or revert changes to photos, or to re-render them when image processing technology has improved.
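To make the idea concrete, here is a minimal sketch of how edits could be kept as data next to an untouched original. The class names, the operation names and the file path are all hypothetical and chosen only for illustration; this is not how Lightroom or my own application stores edits.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Operation:
    """One image processing step, stored as data rather than applied to pixels."""
    name: str                 # hypothetical operation name, e.g. "exposure"
    params: dict[str, Any]    # parameters needed to re-run the step later


@dataclass
class EditHistory:
    """Edits associated with an original file that is itself never written to."""
    original_path: str
    operations: list[Operation] = field(default_factory=list)

    def apply(self, name: str, **params: Any) -> None:
        """Record a new operation at the end of the history."""
        self.operations.append(Operation(name, params))

    def revert(self) -> None:
        """Drop the most recent operation, if there is one."""
        if self.operations:
            self.operations.pop()


history = EditHistory("photos/1998/img_0001.tif")  # hypothetical path
history.apply("exposure", stops=0.5)
history.apply("crop", left=0, top=0, width=3000, height=2000)
history.revert()  # removes the crop; the original file is never opened for writing
```

Rendering would mean replaying the recorded operations on a copy of the original pixels, which is why an improved renderer can be applied to old photos later.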
The key design decision in a non-destructive workflow is how this information is associated with the original image file. The most obvious solution is to store all photos in a database. However, the database becomes very large, and if the original photos are needed by any other application they must first be exported from it. Therefore this approach is mostly suited to server-based systems.
A more common approach on workstations is to store the image files in a user-defined directory structure and reference them by file name (or path) from a database that contains the metadata added during the workflow. This is how Lightroom works, and this design was also used originally in my homegrown system.
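A rough sketch of this design, using an in-memory SQLite database, might look like the following. The table name, the columns and the example path are assumptions made up for this illustration and do not reflect the actual Lightroom catalog schema.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the file path is the only link to the image on disk
conn.execute(
    "CREATE TABLE photo_metadata ("
    " path TEXT PRIMARY KEY,"
    " rating INTEGER,"
    " keywords TEXT)"
)
conn.execute(
    "INSERT INTO photo_metadata VALUES (?, ?, ?)",
    ("photos/2012/iceland/img_0042.dng", 4, "glacier,iceland"),
)


def lookup(path):
    """Return (rating, keywords) for a path, or None if the path is unknown."""
    return conn.execute(
        "SELECT rating, keywords FROM photo_metadata WHERE path = ?", (path,)
    ).fetchone()


found = lookup("photos/2012/iceland/img_0042.dng")
```

The image files stay ordinary files that any application can open, but everything the workflow adds depends on the path string stored in the database.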
This approach has several problems as well. It breaks if any of the image files are renamed or moved, and since they are just normal files from the user’s point of view, this will almost certainly happen eventually. Similar problems occur when there are duplicates of the same file. And what should happen if a rendering of the original image is added to the file system indexed by the workflow software? Should it somehow be associated with the original? Updated when the original image is adjusted? Or should its metadata be synced with the original? Photo workflow software needs to include heuristics to resolve this, but the risk of the database losing the link between the files and the metadata stored in it remains (and this has happened to me several times).
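One possible heuristic for finding renamed or moved files again is to record a digest of the file contents at import time and search for it later. The sketch below assumes exactly that; the function names and the catalog layout are hypothetical, and I am not claiming that Lightroom or any other product works this way.

```python
import hashlib
import tempfile
from pathlib import Path


def content_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in chunks to cope with large images."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def relink(catalog: dict[str, str], root: Path) -> dict[str, list[str]]:
    """Map each catalogued path to the files under root with a matching digest.

    catalog maps a recorded path to the digest recorded at import time.
    More than one match means duplicates, which still need a decision.
    """
    by_digest: dict[str, list[str]] = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            by_digest.setdefault(content_digest(p), []).append(str(p.relative_to(root)))
    return {old: by_digest.get(digest, []) for old, digest in catalog.items()}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.raw").write_bytes(b"pixels of photo A")
    catalog = {"a.raw": content_digest(root / "a.raw")}
    (root / "a.raw").rename(root / "renamed.raw")          # the user renames the file
    (root / "copy.raw").write_bytes(b"pixels of photo A")  # and creates a duplicate
    matches = relink(catalog, root)
```

Here the renamed file is found again, but so is the duplicate, so the heuristic narrows the problem down without deciding which copy the metadata belongs to. It also stops working as soon as something writes into the file itself.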
Still another solution is to store this information in the file itself. Lightroom actually does this as well: if the file format supports it, most of the Lightroom data is kept up to date in the file using the XMP metadata format. This way the metadata always follows the original image. One problem solved, but the problems with duplicates and merging conflicting edits remain. More importantly, this approach fundamentally breaks the key principle of “non-destructive editing”: that the original file must never be modified. Just try to keep an offsite backup of a Lightroom catalog up to date when every DNG, TIFF and JPEG file is modified every time Lightroom is started. Besides, if the metadata is modified, can other applications be sure that the actual image remains intact?
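The backup problem can be illustrated with a toy example. The byte strings below are stand-ins I made up; real XMP embedding is format specific and does not simply append text to the file. The point is only that any change inside the file changes its checksum.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 of a byte string, standing in for a whole-file checksum."""
    return hashlib.sha256(data).hexdigest()


original = b"<image data>"           # placeholder for the untouched image file
before = digest(original)

# Toy stand-in for writing metadata into the file
embedded = original + b"<xmp>rating=4</xmp>"
after = digest(embedded)

# True: a checksum-based backup tool now sees a different file
changed = before != after
```

A tool that compares checksums or modification times cannot tell a rating change from a change to the pixels, so it has to treat the whole file as new.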
What I really want is a fully distributed asset management solution, similar to the distributed version control systems used in software development. More about that in the next blog post.