Multicast hanging for multiple PCs
-
@theopenem_admin after setting this option in Com Server Multicast settings, after I click start multicast for a group, it says "Could not start the multicast application"
-
Fixed it - had to add a space before "--blocksize", or it wasn't in the command:
07-27-22 08:01 Starting Multicast Session With The Following Command: cmd.exe /c ""C:\Program Files\Theopenem\Toec-API\\private\apps\udp-sender.exe" --file "O:/\images\Win10-EduPro-IDE_Test\hd0\part2.ntfs.lz4" --portbase 9370 --min-receivers 2 --ttl 32 --interface X.X.143.115 --mcast-rdv-address X.X.143.115--blocksize 700" Session Started And Was Forced To Quit, Try Running The Command Manually
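The failure in that log line comes from the extra option being glued onto the previous value (`X.X.143.115--blocksize`). As an illustration only, here is a minimal Python sketch of joining arguments so that a missing leading space cannot cause this. The function name `build_sender_args` is hypothetical and this is not how Theopenem itself assembles the command.

```python
def build_sender_args(base_args, extra_args=""):
    """Join base udp-sender arguments with free-form extra arguments.

    The extra text is split on whitespace and everything is re-joined with
    single spaces, so it does not matter whether the extra text starts
    with a space or not.
    """
    parts = list(base_args) + extra_args.split()
    return " ".join(parts)


# Values taken from the log line above; the IP is masked in the thread.
base = ["--portbase", "9370", "--min-receivers", "2", "--ttl", "32",
        "--mcast-rdv-address", "X.X.143.115"]

command_tail = build_sender_args(base, "--blocksize 700")
print(command_tail)
```

Note that plain whitespace splitting would break an extra argument that contains a quoted path with spaces, so this only suits simple options like the one in this thread.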
-
FWIW, I have the same situation, and the only thing that seemed to work for me was to make sure that all storage settings were set to local, on the main server, and secondary com servers, and to make sure upload/deploy direct to SMB was disabled. Works like a charm after that.
-
@hodgesc I've got the system running as a Proxmox VM with a 50GB C:\ drive and a 1TB O:\ drive, which holds the images. I don't use any secondary servers (although I think having one server per VLAN would work, I don't think it's the right way to do it, as I have a system with 5 or so VLANs) and I've never used the SMB option. Also, "deploying via SMB" is a thing?
-
@eruthon Upload/Deploy Direct to SMB is the last option under Admin Settings --> Imaging Client. To multicast across VLANs, you'd have to have a com server on each VLAN and run the multicast session through there.
-
@hodgesc Does the com server have to be a standalone Windows server with its own installation of TOEM, or could it be just one Windows server with one TOEM install, set up with multiple network interfaces, where I'd add a separate com server for each of those networks?
-
@eruthon what I think you'd actually have to do, if you just want one actual system to run this on multiple network interfaces, is:
1. Clone the TOEC-API site for each IP.
2. Bind each IP to its site in IIS.
3. Set up a "com server" for each IP in the UI.
4. Modify the web.config for each com server and verifydb.
5. Add them to the default cluster as passive servers with the multicast, TFTP, and imaging options.
6. (edit) Go into HeidiSQL and change the root account to have remote access. (almost forgot that)
That should do it, I think.
-
@hodgesc
You are 100% correct. Love seeing people with an understanding of how things work, not to discount others that are still learning.
-
@hodgesc will try, thanks for steps, will let you know if it works!
-
@theopenem_admin I wish I could say I was an expert, but I literally just started using this last Thursday. However, I've been in systems management for 16 years, so I've seen enough to figure things out quickly.
-
@hodgesc ok..
- I opened the IIS config file (c:\windows\system32\inetsrv\config\ApplicationHost.config) and copied the Toec-API site entry; the second one is called Toec-API-129
- Every site until now was bound to "all unassigned" interfaces; I gave the new one the IP address X.X.129.253
- I created a new com server with its new IP address and gave it the same local storage location as the main server
- I copied the Toec-API folder as Toec-API-129 and changed the com server ID in web.config to the one the website generated for the new com server
- I added the new com server to the cluster and gave it those 3 functions
I know I skipped the database steps, but I'd like to know exactly what to do there, please.
Now I can successfully connect to the web interface on both IPs, but PCs on the new VLAN run into a PXE-E99 error when PXE booting (it worked for every PC on the main IP; I added "next-server X.X.129.253" to my isc-dhcp config for the VLAN), while if I leave the old server IP for all VLANs, I get a server timeout.
-
@eruthon Couple of quirks I forgot about.
Some permissions don't seem to copy over.
- You'll need to make sure that IIS_IUSRS has read and execute permissions for the Toec-API-129 folder.
- Within the Toec-API-129 folder, you'll need to make sure IIS_IUSRS has modify rights to the "private" folder. Be sure to apply permissions to child objects.
- Open HeidiSQL, enter the root password (located in the connection string in the Toems-API web.config file), and click Open. At the top, click the user manager icon (looks like 2 people). Click root (the one with your server name as the host) and change the "from host" setting to "Access from anywhere".
-
- added the permissions for the folders (IIS_IUSRS was missing; now it's the same as the original folders);
- changed the database setting; it now shows the "%" character
The PXE boot problem continues, though... I can't PXE boot from any VLAN other than the one with the first network interface. It seems the clients can't read from the TFTP server (which is enabled for both com servers in their configs and in their cluster settings).
-
@eruthon I don’t use PXE yet, but I believe you need to go into the group settings and assign the machines to the com server on their vlan.
-
You probably need to modify your tftp server.
Program files\Theopenem\tftpd32\tftp64_gui
Open settings and make sure that "Bind TFTP to this address" is unchecked.
-
@theopenem_admin it shows the client when it tries to download pxeboot.0, but it shows ERR in the progress bar and in the logs, although I didn't do anything:
Peer returns ERROR <User aborted the transfer> -> aborting transfer [09/08 21:37:25.927]
Edit: also, the bind-to-interface option was already disabled
Edit: EFI started working for the other VLANs, except the one with the new com server
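One way to narrow this down without a PXE client is to send a single TFTP read request from a machine on the affected VLAN and look at the first reply. The sketch below is a minimal probe following the RFC 1350 packet layout; the function names are made up for illustration, and the server IP and file path in the usage note are only the values mentioned in this thread.

```python
import socket
import struct

TFTP_OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def describe_reply(packet):
    """Return a short description of the first packet the server sent back."""
    if len(packet) < 2:
        return "short packet"
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode in (3, 5) and len(packet) >= 4:
        (value,) = struct.unpack("!H", packet[2:4])
        if opcode == 5:
            # ERROR packets carry an error code and a NUL-terminated message.
            message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
            return f"ERROR {value}: {message}"
        return f"DATA block {value}, {len(packet) - 4} bytes"
    return TFTP_OPCODES.get(opcode, "unknown")


def probe(server_ip, filename, timeout=3.0):
    """Send one RRQ to UDP port 69 and describe the first reply, or 'timeout'."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server_ip, 69))
        try:
            packet, _ = sock.recvfrom(2048)
        except socket.timeout:
            return "timeout"
        return describe_reply(packet)
```

For example, `probe("<com server IP>", "proxy/efi64/pxeboot.0")` returning "timeout" would suggest the request or the reply is being lost between VLANs, while a DATA reply would suggest the TFTP server itself is reachable. The probe never acknowledges the data, so the server will log a retry and a timeout for it.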
-
@eruthon I don't know if this plays a role in this or not, but my understanding is that PXE settings are pushed to vlans via DHCP, so if machines on that vlan are still trying to only contact the main rig, it may be a DHCP configuration issue.
-
@hodgesc I played with it a bit.
I have ISC DHCP: the global next-server is set to the main IP, and VLAN 129 has a dedicated next-server on the second IP.
I also have classes in the DHCP server to distinguish legacy and UEFI clients and direct them to their respective boot files, and I have the proxy DHCP setting turned on in TOEM, of course.
I couldn't look into it today, and I think I'll have time maybe on Friday or next week, but I can't see any problem network-wise... Finding some weird setting in our Cisco 3560G would be the best way to resolve this, but multicast for campus TV is working flawlessly, as is FOG multicast deployment. And I don't have enough resources to run a Windows server on every VLAN...
I thought about setting up the SMB option, too, but you mentioned it just creates another headache to tackle, so that won't help, I guess, but I could try it at least.
-
@theopenem_admin @hodgesc
Well, I tried different scenarios, none of which worked perfectly...
When I boot PXE, it either:
- shows timeout of the server (this happens when I contact the new VLAN129 server)
  a. the tftpd logs show multiple tries with 0% progress, each ending in ERR
- downloads the NBP file successfully, but doesn't download the kernel (says "connection reset")
  a. tftpd shows that the peer returns an error: user aborted the transfer
- only once did I successfully get to the Linux environment, set up multicast for 2 clients, join them through the PXE menu and get to approximately 67% of a 30GB image; then it got stuck, and sadly I never managed to get there again
I tried restarting the tftpd32 service many times after changing the config in the com servers and clusters. Tried disabling the TFTP information server for the second com server. Tried changing the next-server IPs for the given VLAN. Tried different options in the tftpd64 GUI.
Of course there are many more combinations I tried, but none worked, except for that one time, when the settings were exactly as @hodgesc wrote. Even then it worked only once; after restarting the multicast and booting the clients again, it went back to the timeouts and not downloading kernels...
The worst thing now is that even if I bind TFTP to only the first interface, delete the next-server option for the new com server, delete the com server from all configs, stop the API for the new com server and disable the second network interface, it doesn't even work as it did before (same timeouts and problems downloading the kernel).
EDIT:
tftpd also shows in some cases: TIMEOUT waiting for Ack block #0 [16/08 12:21:50.206]
Error communicating with VLAN129 com server (everything set up according to guide):
Connection received from X.X.129.238 on port 2027 [16/08 12:50:37.308]
Read request for file <proxy/efi64/pxeboot.0>. Mode octet [16/08 12:50:37.308]
OACK: <tsize=882048,blksize=1468,> [16/08 12:50:37.308]
Using local port 54070 [16/08 12:50:37.308]
Peer returns ERROR <User aborted the transfer> -> aborting transfer [16/08 12:50:37.308]
Error downloading kernel log (TFTP server for com server of VLAN129 disabled in com server cluster):
Connection received from X.X.129.215 on port 2041 [16/08 12:40:45.683]
Read request for file <proxy/efi64/pxeboot.0>. Mode octet [16/08 12:40:45.683]
OACK: <tsize=882048,blksize=1468,> [16/08 12:40:45.683]
Using local port 52369 [16/08 12:40:45.683]
Peer returns ERROR <User aborted the transfer> -> aborting transfer [16/08 12:40:45.683]
Connection received from X.X.129.215 on port 2042 [16/08 12:40:45.763]
Read request for file <proxy/efi64/pxeboot.0>. Mode octet [16/08 12:40:45.763]
OACK: <blksize=1468,> [16/08 12:40:45.763]
Using local port 52370 [16/08 12:40:45.763]
<proxy\efi64\pxeboot.0>: sent 601 blks, 882048 bytes in 0 s. 0 blk resent [16/08 12:40:45.953]
Connection received from X.X.129.215 on port 59651 [16/08 12:40:49.710]
Read request for file <proxy/efi64/pxelinux.cfg/default.ipxe>. Mode octet [16/08 12:40:49.710]
OACK: <blksize=1432,tsize=1141,> [16/08 12:40:49.710]
Using local port 50690 [16/08 12:40:49.710]
<proxy\efi64\pxelinux.cfg\default.ipxe>: sent 1 blk, 1141 bytes in 0 s. 0 blk resent [16/08 12:40:49.710]
Connection received from X.X.129.215 on port 60599 [16/08 12:40:49.710]
Read request for file <proxy/efi64/pxelinux.cfg/01-30-9c-23-6a-6a-40.ipxe>. Mode octet [16/08 12:40:49.710]
File <proxy\efi64\pxelinux.cfg\01-30-9c-23-6a-6a-40.ipxe> : error 2 in system call CreateFile The system cannot find the file specified. [16/08 12:40:49.710]
Connection received from X.X.129.215 on port 4576 [16/08 12:40:49.710]
Read request for file <proxy/efi64/pxelinux.cfg/01-.ipxe>. Mode octet [16/08 12:40:49.710]
File <proxy\efi64\pxelinux.cfg\01-.ipxe> : error 2 in system call CreateFile The system cannot find the file specified. [16/08 12:40:49.710]
This is shown during every successful PXE boot, but the client then tries to download the kernel from the VLAN129 com server, which resets the connection and ends the PXE boot.
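Logs like these are easier to compare when reduced to one line per transfer. The following Python sketch groups tftpd log entries by connection and labels each outcome; it assumes the entry format seen in this thread (message text followed by a `[dd/mm hh:mm:ss.mmm]` timestamp), and the `summarize` name and result labels are my own.

```python
import re

# One log entry = message text followed by a [dd/mm hh:mm:ss.mmm] timestamp.
ENTRY = re.compile(r"(.*?)\s*\[(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]", re.S)


def summarize(log_text):
    """Return one dict per connection with client IP, requested file and result."""
    transfers = []
    current = None
    for message, _stamp in ENTRY.findall(log_text):
        message = message.strip()
        match = re.match(r"Connection received from (\S+) on port (\d+)", message)
        if match:
            current = {"client": match.group(1), "file": None, "result": "unknown"}
            transfers.append(current)
            continue
        if current is None:
            continue  # entry before the first connection line
        match = re.match(r"Read request for file <([^>]+)>", message)
        if match:
            current["file"] = match.group(1)
        elif message.startswith("Peer returns ERROR"):
            current["result"] = "aborted by client"
        elif "The system cannot find the file" in message:
            current["result"] = "file not found"
        elif re.search(r": sent \d+ blks?,", message):
            current["result"] = "sent"
    return transfers


# Sample copied from the first log excerpt above.
sample = (
    "Connection received from X.X.129.238 on port 2027 [16/08 12:50:37.308] "
    "Read request for file <proxy/efi64/pxeboot.0>. Mode octet [16/08 12:50:37.308] "
    "OACK: <tsize=882048,blksize=1468,> [16/08 12:50:37.308] "
    "Using local port 54070 [16/08 12:50:37.308] "
    "Peer returns ERROR <User aborted the transfer> -> aborting transfer [16/08 12:50:37.308]"
)

for transfer in summarize(sample):
    print(transfer["client"], transfer["file"], transfer["result"])
```

In the second excerpt above, the aborted request that carries `tsize` is followed immediately by a successful transfer of the same file, so an abort on its own may not indicate a failure; what matters is whether a "sent" entry for that file follows.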
-
Well, tried drastic measures and reinstalled the whole server from zero...
Now I can PXE boot on BIOS machine, but can't download kernel due to connection reset...
And I can't even PXE boot on a UEFI machine due to "PXE-E99: Unexpected network error".
I found out I hadn't copied everything I needed in ApplicationHost.config, so this time I tried to copy everything regarding Toec-API to its duplicate application pool, site and location.