Issues deploying Microsoft Surface Pro 6 image
-
To preface: I've been using Aomei Backupper for a while, and I dislike it a lot. I have a bunch of Surface Pro 6s to deploy, and I've done this hundreds of times with Aomei. This time, I built a fresh Windows 10 install, got all the updates done, and booted into Aomei to take the image, but the UI does not display the drive. I can open diskpart and see that the disk is there, the partitions appear normal, the volume is mounted, and I can cd /d to it and look at the Windows folder. Just to establish that this began on a weird foot.
I've taken the opportunity to put more time into trialing Theopenem as a replacement, so I'm here fully recognizing that whatever is going wrong is probably my fault.
Environment
I have a Windows 10-based server hosting Theopenem 1.5.5 and a TrueNAS (BSD) based file server sharing over SMB. I can upload the image both via the COM server and direct to SMB. The schemas on these images appear normal.
Surfaces are the main thing I do. They're very ornery little things, but the 6 is usually pretty good. The storage is soldered to the motherboard, and Microsoft handles the majority of the drivers themselves. The UEFI has extremely few controls: basically which Secure Boot keys to use, TPM on or off, toggles for webcams, wifi, etc., boot configuration, and on some models the option to disable hyperthreading.
The tablets are inside a VLAN with no DNS service due to Aomei reasons; the COM server and SMB server are in the main network.
Symptoms
First of all, and frustratingly, deployment has succeeded twice, but not reproducibly. I deploy one tablet and it works, then I deploy the next tablet and it doesn't: same image, no changes, same UEFI configuration.
Most of the time after the deploy it drops back to the shell, and the message is either
** Deploying Image for Drive 0 **
The operation completed successfully.
The boot configuration data store could not be opened.
The requested system device could not be found
** Closing Active Task **
or
The boot configuration data store could not be opened.
The system cannot find the file specified
** Closing Active Task **
The "cannot find the file specified" case completes much faster; based on the server network graph, it does not actually transfer the OS partition.
Afterwards the system cannot boot from the local disk, and the UEFI does not automatically list a Windows Boot Manager option.
Going into diskpart, I find one of two things.
For "device could not be found", there is one partition, and it is 4 times larger than the disk.
For "cannot find the file specified", 3 out of 4 partitions are present (System, Reserved, and Recovery) but not Basic. This version happens when I've configured Images->SurfacePro6->Image Profiles->Default WIE->Deploy Options->Modify The Schema to make these 3 partitions Fixed.
Logs
I'm not as well versed in the subjects covered in these logs as you might expect, so they haven't been much help to me. I also haven't had the time to fully understand how everything in Theopenem is structured, so let me know if there's another kind of log I should add.
Successful deploy
Immediately following failed deploy ("cannot find the file specified"): 94_9A_A9_27_AF_36-modelmatchdeploy.txt
The first error appears to be on line 256: "Set-Partition : The requested access path is already in use." This rolls forward into a lot of warnings about ignoring files (the whole partition).
Later deploy ("device could not be found"): 94_9A_A9_20_95_38-unregdeploy.txt
There's no literal error that I can identify, so at this point I believe the upload or deploy configurations were faulty for this outcome.
I made a lot of tweaks trying to get these working, and I don't have a good record of how things were configured for any specific log.
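For anyone who wants to triage deploy logs like these without reading them top to bottom, here is a minimal sketch in Python. It only looks for the message fragments quoted in this post. The function names, the labels, and the assumption that the log is readable as UTF-8 are mine, not anything Theopenem provides.

```python
import re

# Fragments copied from the errors quoted in this post; extend as needed.
FIRST_ERROR = re.compile(
    r"Set-Partition\s*:|could not be opened|could not be found|cannot find the file",
    re.IGNORECASE,
)

# Hypothetical labels for the two final failure messages seen so far.
SIGNATURES = {
    "device-not-found": "The requested system device could not be found",
    "file-not-found": "The system cannot find the file specified",
}


def first_error(lines):
    """Return (1-based line number, stripped text) of the first matching line, or None."""
    for number, line in enumerate(lines, start=1):
        if FIRST_ERROR.search(line):
            return number, line.strip()
    return None


def classify(lines):
    """Return the label of the last failure signature found in the log, or None."""
    label = None
    for line in lines:
        lowered = line.lower()
        for name, fragment in SIGNATURES.items():
            if fragment.lower() in lowered:
                label = name
    return label


def triage(path):
    """Read a log file (assumed UTF-8; undecodable bytes are replaced) and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = handle.readlines()
    return {"first_error": first_error(lines), "outcome": classify(lines)}
```

Calling triage("94_9A_A9_27_AF_36-modelmatchdeploy.txt") on a downloaded copy of the log would give the first suspicious line and which of the two outcomes the log ended with, if either.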
Research & Experimenting
Thread with the same final error: [WIE] Problem with RST drivers
Windows itself does not have RST drivers installed; it just has "Microsoft Storage Spaces Controller" and "Surface NVM Express Controller". I'm working on extracting the NVMe controller driver, but the fact that it is able to get some partitions onto the drive without this driver confuses me.
Thread with Set-Partition error: No hard detected for deployment
Similar situation, but the thread was inconclusive.
Putting the NVMe controller driver on the WIE stick didn't help. I tried taking a new image with the driver as well.
Intel RST will not install ("unsupported platform").
-
I have just managed to successfully image several units in a row!
Prior to the streak I switched the partition mode to Standard based on some other threads. This didn't help immediately, but that's how it's currently configured. I'm unclear on exactly what this setting means. Ideally I want to end up with a universal image process, so the image will need to deploy to drives of various sizes.
I've been pulling the USB stick before selecting Deploy, and I switched to using a USB ethernet adapter. If you've never touched a Surface Pro 4 or later, Go, Laptop, Laptop Go, or Book, they use a proprietary magnetic blade connector for docking and charging.
I suspect that this connection is not very stable, and that the specific dock and ethernet drivers may include some handling that smooths it out. This hasn't been a problem with Aomei, but they use a proprietary format I haven't identified, so they may have a more convoluted protocol as well.
This suspicion arose from a number of attempts where partition 3 had loaded 8% of the data before trying to update the boot order. The consistency of 8% argues a bit against arbitrary network interruptions.
I started removing the USB stick because it seemed like it was getting hammered during deploy, flickering constantly and warm to the touch, and in some logs I was seeing weird entries in the drive list. My thought was that it was somehow getting involved in the deploy.
-
At this point I've been reliably deploying these and several other Surface models for a couple of days. It's safe to say the problem is fixed, but I'm not confident I know why.
My best guess is that I had enabled Randomize GUIDs in Admin Settings -> Image Profile Templates, for the LinuxBlock. I first tried Standard Partitions while this was enabled, and it's the only setting I can remember whose state lined up with whether deploys failed or succeeded. Nothing else I tried correlated: I was having failures and successes with those settings in all states.
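Since I didn't keep a record of which settings went with which attempt, here is a rough sketch of how that correlation could be tracked next time. It is plain Python, the setting names are hypothetical, and the sample rows are made up for illustration; they are not my actual records.

```python
from collections import defaultdict


def tally(attempts):
    """Count successes and failures for each (setting, value) pair."""
    counts = defaultdict(lambda: {"ok": 0, "failed": 0})
    for settings, succeeded in attempts:
        for name, value in settings.items():
            counts[(name, value)]["ok" if succeeded else "failed"] += 1
    return dict(counts)


def consistent_settings(attempts):
    """Return the (setting, value) pairs that only ever appear with one outcome."""
    return sorted(
        (
            key
            for key, c in tally(attempts).items()
            if (c["ok"] == 0) != (c["failed"] == 0)
        ),
        key=repr,
    )


# Illustrative rows only: (settings used for the attempt, did it boot afterwards?)
sample_attempts = [
    ({"randomize_guids": True, "partition_mode": "Other"}, False),
    ({"randomize_guids": True, "partition_mode": "Standard"}, False),
    ({"randomize_guids": False, "partition_mode": "Standard"}, True),
    ({"randomize_guids": False, "partition_mode": "Other"}, True),
]

if __name__ == "__main__":
    for name, value in consistent_settings(sample_attempts):
        print(name, value)
```

With rows like these, a setting that shows up with both outcomes drops out, and only settings whose value always went with the same result remain. That is only a hint at a cause, not proof, especially with this few attempts.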
I'd still love to understand what was going wrong, if that is possible to discern from the existing evidence, so that I can avoid or fix it in the future.