Safe data handling during image updates
-
Currently, when you update an image, one of the first things to happen is that the old image files get deleted. If the upload does not succeed, operations must halt until an image can be uploaded successfully.
This issue is difficult to discuss because the word "image" is used for several things, so going forward:
Logical Images: The concept listed on the Images page, which contains Image Profiles
Profiles: Image Profiles
Image Data Files: The files and folders located at [storage path]\images\

There is at present no way to move or clone Profiles between Logical Images. If you use universal images, you accrue many Profiles with different File Copy configurations, and reproducing them on a new Logical Image is cumbersome and time consuming, on top of other configuration like Model Match (which must first be removed from the old Logical Image). This creates pressure to avoid making a new Logical Image, so it's not viable to "update" by simply taking a new image. The ability to clone an image or migrate Profiles would be generally nice to have, but the correct solution to this problem is to handle the Image Data Files safely in the first place.
The essential premise of data safety is that no single step failing can result in total data loss. Notably, Theopenem will never be qualified to determine whether a captured image is valid and correct; that's my job. Theopenem could, however, do more to detect obvious failure, such as the wrong number of .wim files for the schema. As it stands, though, no amount of failure detection will bring back the already-deleted Image Data Files.
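To make the "obvious failure" idea concrete, here is a minimal sketch of that kind of check. It assumes a made-up layout where each Image Data folder holds a schema.json listing the expected .wim file names; the real schema format will differ, so treat this as illustration only.

```python
import json
from pathlib import Path

def wims_match_schema(image_dir: Path) -> bool:
    """Cheap sanity check: do the .wim files on disk match the schema?

    Assumes a hypothetical schema.json with the expected file names under
    an "images" key. Passing this does not prove an image is valid and
    correct -- it only catches obvious failure like a missing .wim.
    """
    schema = json.loads((image_dir / "schema.json").read_text())
    expected = set(schema["images"])
    actual = {p.name for p in image_dir.glob("*.wim")}
    return expected == actual
```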
The structure should be, broadly: each Logical Image possesses a set of Image Data folders, each with its own schema and .wim files. One of them is marked as Primary, and that is what gets deployed normally. When an image is uploaded, the new Image Data Files get written to the storage location alongside the existing image's files. Once I have validated the new Image Data Files, I can mark them as Primary, or delete them if there's a problem. If the new set is valid, then when I feel comfortable doing so I can delete the old Image Data Files. My preference is usually to keep one known-good version in case I later discover a problem, but structurally there's no good reason to limit the depth of this history beyond reasonable datatype maximums.
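A minimal sketch of how that might look as a data model, under stated assumptions: every type and name here (ImageDataSet, DataSetState, and so on) is invented for illustration and does not reflect Theopenem's actual internals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from pathlib import Path

class DataSetState(Enum):
    PENDING = "pending"    # uploaded, not yet validated by an admin
    PRIMARY = "primary"    # the version deployed by default
    RETIRED = "retired"    # known good, kept as history

@dataclass
class ImageDataSet:
    path: Path             # e.g. [storage path]\images\<image>\<version>\
    state: DataSetState
    uploaded_at: datetime

@dataclass
class LogicalImage:
    name: str
    data_sets: list[ImageDataSet] = field(default_factory=list)

    @property
    def primary(self) -> ImageDataSet | None:
        return next((d for d in self.data_sets
                     if d.state is DataSetState.PRIMARY), None)

    def begin_upload(self, version_dir: Path) -> ImageDataSet:
        """A new upload lands beside the existing versions; nothing is deleted."""
        ds = ImageDataSet(version_dir, DataSetState.PENDING,
                          datetime.now(timezone.utc))
        self.data_sets.append(ds)
        return ds

    def promote(self, new_primary: ImageDataSet) -> None:
        """Mark a validated upload as Primary; the old Primary becomes history."""
        old = self.primary
        if old is not None:
            old.state = DataSetState.RETIRED
        new_primary.state = DataSetState.PRIMARY
```

The point of this shape is that a failed upload leaves, at worst, an orphaned pending folder; no path through it touches the Primary set until an explicit promote.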
This allows deploys to take place while the image is updating. Some organizations will want a way to prevent that, but for me it's very useful. It also allows the new upload to be put into service with zero downtime: in-flight deploys keep their orders and can finish, while new deploys pick up the new Primary. The most inconvenient piece is that there will need to be a workflow that allows non-Primary Image Data Files to be deployed for testing without being made Primary, which itself carries the risk of the upload being erroneously deployed to a different machine before it is known to be good.
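Continuing the sketch above, one way to get both behaviors is to resolve the source path once, when a deploy starts, with an explicit opt-in for testing a pending set (resolve_deploy_source and test_version are, again, invented names):

```python
def resolve_deploy_source(image: LogicalImage,
                          test_version: ImageDataSet | None = None) -> Path:
    """Pick the data set a new deploy reads from, pinned at deploy start.

    Because the path is resolved once, a promotion mid-deploy changes nothing
    for deploys already in flight. test_version is the explicit opt-in for
    deploying a PENDING upload to a chosen test machine without promoting it.
    """
    if test_version is not None:
        return test_version.path
    primary = image.primary
    if primary is None:
        raise RuntimeError(f"{image.name} has no Primary image data set")
    return primary.path
```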
This could all be structured on disk so that Theopenem can re-integrate Image Data Files it finds but didn't expect, such as would occur if I restored the deleted files from my file server's snapshots and replications.
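Continuing the same sketch, that re-integration could be a simple reconciliation scan over the image's storage folder; anything restored from a snapshot shows up as a new pending data set awaiting validation.

```python
def reconcile(image: LogicalImage, image_root: Path) -> list[ImageDataSet]:
    """Adopt version folders found on disk that aren't tracked yet.

    This is what makes snapshot/replication restores work: restoring a
    deleted version folder under [storage path]/images/<image>/ is enough,
    and the next scan picks it up as PENDING, awaiting validation.
    """
    known = {d.path for d in image.data_sets}
    return [image.begin_upload(candidate)
            for candidate in sorted(image_root.iterdir())
            if candidate.is_dir() and candidate not in known]
```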
-
Been going through requests trying to figure out what to work on next, and I think this makes the most sense. This will be a focus of the next release.