
Server Project

It would appear that macOS Mojave is the last version that can support Radeon GPUs with just PCI Passthrough. macOS versions past Mojave won't allow simple GPU passthrough to work, even with Lilu and WhateverGreen kexts loaded. macOS VMs may now require a vBIOS/option ROM to be passed to them as well. However, such a feature may only be available in vSphere 7.X.
More info on some of the features listed here:
 
The only other option I can see is using something like Clover to dynamically load a compatible vBIOS. But, that will require further research...
 
As mentioned in the previous update(s), the macOS VM is currently unable to use any AMD GPUs that I've passed through to it. This issue only appeared after upgrading from Mojave. There are two potential solutions for this issue (pertaining to device initialisation):
  • convert to OpenCore and use SSDT to inject vBIOS
  • use ESXi 7, VMX settings to load "option ROM" (vBIOS)
I've also seen a few posts online, indicating that the latter option may be available in ESXi 6.7. However, I have yet to confirm this. I'm tempted to try upgrading the DL580 G7 to ESXi 6.7u3, to see if that exposes the VMX options I'd need. In the worst case scenario, I won't be able to use AMD GPUs with this VM as long as I'm using the DL580 G7, and the VM runs Monterey.


Now for better news...


OSRM will run just as well in an application container as it would in a system container:
I can leave that in a Podman container now, and not be concerned about potential performance penalties.

I also encountered a thread yesterday, mentioning this repo:
AD CS can be made compatible with ACME clients, to allow for easier certificate renewal automation.

The vSphere version target for the DL580 Gen9 has been moved, from 6.7 to 7.0.
 
Still looking into solutions for using newer cards in the DL580 G7, until I can move to the DL580 Gen9. From what I've seen in documentation, I could try disabling unneeded PCIe devices to free up resources for other PCIe devices:
However, I'm not sure which ones to disable yet. I may have to open a support ticket with HPE:
That will take a while to investigate. Still need to get vBIOS for the FirePro S9300 X2, to re-test the VMX parameters.

hMailServer is no longer actively maintained. I'll be attempting a migration to Stalwart this year. But need a way to either migrate or archive and access e-mails handled and generated with the previous mail server. Currently looking into MailStore for that.

On a side note, I'm taking another shot at RADIUS with ClearBox Enterprise RADIUS server. As usual, the MikroTik Chateau isn't playing nice. Same results as last time, with TekRADIUS OD. I'm starting to wonder if I should just ditch the idea of having LTE failover in the future...
 
I finished installing and configuring MailStore Server, in preparation for the move from hMailServer to Stalwart. Evaluation of ClearBox Enterprise RADIUS server has been delayed indefinitely (best candidate tested). Project:ArcZ has changed a bit more, swapping LightDM for ly. Working on releasing an ISO for a small group of testers. The ISO repair for the Windows 10 VM appears to have been successful -- no issues since completion in mid-February. Swapped the current PDU for one with more outlets, since I was running out of usable ones. Too many appliances have chunky rectangular plugs, that block adjacent outlets on the PDU. The next version of the server project has moved on from 400GB SAS SSDs to 800GB ones. It appears that running TrueNAS as a VM, in production, is no longer discouraged:
If such is the case, I may no longer need the DL380 Gen9. If I had known (late last year) that such a change-up was coming, I would not have gotten a dedicated file server. But, it's here now...
 
The month of March has been very eventful. At first, I was looking into whether I should split the Windows Server VM into 2-3 different VMs instead:
During this brief period, I was also reviewing some security policy changes/software patches that were suggested in ManageEngine Endpoint Central. One of the software patches was for MariaDB, which would require me to check version compatibility with each app/service accessing it. Knowing my luck, things were bound to get complicated on day 5.

I then found multiple pages from iXsystems, stating that it's safe to virtualise TrueNAS Scale. I'd already spent money on the DL380 Gen9 for that, but I guess there's no use getting peeved about that. This simply means that I can get away with one less physical server in my rack (and less power draw), so there is a plus side to it. Most of the monetary loss is still there, but I can at least use the SSDs (and the discrete HBA) planned for it elsewhere.

On that same day, the VM for Project:ArcZ also threw warnings related to deprecated options/hooks in the image build config file (initcpio). The older Artix OpenRC VM did not give the same warning. I got help from a contact on Discord, to correct the deprecated config parameters. Two days later, I was installing a service pack for Endpoint Central.

The next day, I was testing the Nextcloud Social app, and found out that I finally had to configure .well-known/webfinger (CardDAV/CalDAV related) for the instance. I started looking into how to edit the Nextcloud container's config for it. Attempts for this concluded on the 21st. I committed changes to the .htaccess file in Nextcloud itself, and to the subdomain > custom location(s) in NGINX Proxy Manager (reverse proxy). Neither method worked, leaving me with no clear path forward. I'll have to leave self-hosting federated services for later.

Five days later, I was reviewing FreePBX extension configs when I decided to buy more DID numbers to use in FreePBX. I also attempted to install Sunshine gamestream server via MacPorts, only for it to fail at the installation step. I'll have to look into that later as well.

Four days later, I was advised to move /boot/efi to its own dedicated partition (/efi) while updating GRUB on Project:ArcZ. I spent the next 2 days working on it, with help from the same Discord contact. At this point, if you couldn't tell, they're pretty amazing! Still need to write a pacman hook for auto-generating GRUB configuration whenever GRUB gets updated. I then started work on a dedicated VoIP VLAN for FreePBX the next day. Work for this concluded on the 22nd.
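For the pacman hook mentioned above, here's a rough sketch of what it could look like. It's written as a small Python helper that renders the hook text, so it can be checked before being dropped into /etc/pacman.d/hooks/. The package target (grub), the output path (/boot/grub/grub.cfg), and the function name are assumptions for illustration, not what Project:ArcZ actually ships. Note that this only regenerates the config; it does not re-run grub-install.

```python
# Sketch only: renders an alpm hook that regenerates grub.cfg after the
# grub package is installed or upgraded. Paths and names are assumptions.

HOOK_TEMPLATE = """[Trigger]
Operation = Install
Operation = Upgrade
Type = Package
Target = grub

[Action]
Description = Regenerating GRUB configuration...
When = PostTransaction
Exec = /usr/bin/grub-mkconfig -o {cfg_path}
"""


def render_grub_hook(cfg_path: str = "/boot/grub/grub.cfg") -> str:
    """Return the hook file contents for the given grub.cfg location."""
    return HOOK_TEMPLATE.format(cfg_path=cfg_path)


# Example: print the hook, then save it as (hypothetical name)
# /etc/pacman.d/hooks/90-grub-mkconfig.hook
# print(render_grub_hook())
```

The hook file needs the .hook suffix for pacman to pick it up. If grub.cfg lives somewhere else after the /efi move, pass that path in instead.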

After that, I was applying and testing more security policy changes through Endpoint Central. On the 25th, I decided to remove the * (wildcard) user from SoftEther VPN, due to the rapid increase in reported software vulnerabilities. Now, each VPN user has to be explicitly defined with an AD-linked account. On the 26th, I started clearing out TimeShift backups on the Artix OpenRC VM (backup partition ran out of space for new backups).

This morning, the Windows Server VM reported an unexpected shutdown from the previous night -- even though I had issued the last shutdown command myself. I checked the Event Logs, and found multiple warning/error events from yesterday and today. Investigation and remediation for it is ongoing...
 
After seeing a notification in the Server Manager (mentioned unexpected power event) I had the Windows Server VM perform a check-disk on next power-on and checked Event Viewer. That's where I started seeing errors and event IDs that I hadn't encountered before. I ended up doing the same on the Windows 10 VM. Here are some (not all) of the things I had to review, mask, and/or remediate in the last 24 hours:
Still more for me to take on in the coming months. Some of these started popping up after taking actions suggested in ManageEngine Endpoint Central, as security policies/configurations (like the RPC-related one). While most of the heavy-lifting in Endpoint Central is done (may set up an MDM certificate for Apple devices), I now have to start doing the same in Wazuh XDR. The work never stops.

On a side note, I also need a Redis replacement for the Nextcloud instance...
 
Changed Cloudflare WAF settings recently, so that only connections from certain regions are allowed (GeoIP).
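As a rough illustration of the GeoIP rule, the sketch below builds a Cloudflare rule expression that matches traffic from outside an allow-list of countries (the rule's action would then be set to Block). The country codes and the function name are placeholders, not my actual settings, and the field name assumes the current ip.src.country field rather than the older ip.geoip.country.

```python
# Sketch only: builds a custom-rule expression string for a
# "block everything outside these countries" rule. Codes are placeholders.

def block_outside_countries(codes):
    """Return an expression matching requests NOT from the given countries."""
    unique = sorted({c.strip().upper() for c in codes if c.strip()})
    if not unique:
        raise ValueError("at least one country code is required")
    quoted = " ".join(f'"{c}"' for c in unique)
    # Sets in the rules language are space-separated values inside braces.
    return f"(not ip.src.country in {{{quoted}}})"


# Example with placeholder codes:
# block_outside_countries(["us", "ca"])
```

The resulting string can be pasted into the expression editor of a custom rule, with the action chosen separately.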

With FreePBX Distro losing support soon, I'll have to move to Debian by the end of May.

May start experimenting with ClearBox RADIUS server again, once FreePBX has been moved to Debian.

The macOS VM has gone back to chewing through USB cards, which means that I can't directly connect a BluRay player to that VM at this time. I'll have to see if I can start ripping and uploading DVD/BluRay images to it over the network.
 
After the recent BSODs that the Windows VM has been experiencing, I'm starting to wonder if the Titan Xp is dying. They have started to appear more often, and aren't just appearing with graphically-intensive workloads. I have a replacement GPU coming in the mail.
 
Server Project Changelogs


06/01/2024
  • Temporarily changed configuration of Artix OpenRC from 96GB RAM to 128GB, for OSRM map file processing.

06/09-10/2024
  • Created and signed CSRs for expiring SSL certificates.

06/17/2024

06/18/2024
  • Initial setup/configuration of LibreTranslate (Docker Compose).
  • Created Compose file draft for FlareSolverr.

06/22/2024

06/25-27/2024
  • Updated to Nextcloud 29.X.
  • Removed LibreSign app.

06/26/2024
  • Rise in ambient temperatures noted.

07/07/2024
  • Thermal shutdown(s) recorded, due to high ambient temperatures.

07/16/2024
  • Confirmed root cause of unexpected server/host shutdowns.

07/18/2024

07/24/2024
  • Updated MariaDB to 10.6.18.
  • Nightly backups identified as likely root cause of MariaDB performance issue.
  • Changed backup schedules/types to minimise performance impact.

07/30/2024

08/02/2024

08/04/2024
 
So, OSM Tile Server is being held off for now. I still can't find any indication that Nextcloud will officially support use of 3rd party tile servers, and I'm hoping that I just suck at reading. Really want to have a feature like that. I will buy the RAM and storage needed to support it XD

Had to delay moving from Wazuh 4.8.X to 4.9, due to issues that are being reported (where I may have to re-install if I attempt the upgrade). Hoping that an incremental version of 4.9.X will address the issue.

Currently taking another go at configuring TekRADIUS OD, to see if I can get MFA set up for remote connections. Will have to obtain a commercial license to continue though.

Also started taming my cringe folder redirection GPO, because I made a few rookie mistakes. Ended up following initial advice from this article, and have sent the following folders back to where they belong: Contacts, Favourites, Links, Searches, AppData

But the last one was a pain to fix. AppData (Roaming) is in constant use -- see these pages for more details. And yes, it's likely that you'll have to dive into the Registry if you did this, in addition to possibly reinstalling any applications that use the AppData/Roaming folder. I got off easy, because most of the apps I use (that were affected) are not mission-critical.

Lastly, I've created the Debian VM that I'll be migrating FreePBX to. Now to just make a final backup and restore it to the new VM.
 
So, I tried asking about potential alternative distros for FreePBX...

And my thread looks like it got hijacked:
I may just install LMDE and see how it goes, since I can't get a definitive answer.


I'm also considering a move from unGoogled Chromium to LibreWolf.

However, Mozilla's recent actions have me wondering if I should avoid Firefox and its forks altogether. While I do not dislike Mozilla's overall objective(s), I disagree with some of their recent actions. Doing a 30% layoff and then shoveling on the corpo-word-salad did not instill confidence in me. That was around the time I saw that they may no longer be receiving money from Google (not that I wanted them dependent on Google).

The browser-agnostic move would be to start replacing any browser extensions with services I host instead. First move would be an ICAP server, and some changes to the current DNS settings. Hopefully, I can pull this off *before* uBlock Origin stops working for me.


Further Reading:
 
Here to provide an edit for the previous comment (replacing the last paragraph):

The browser-agnostic move would be to start replacing any browser extensions with services I host instead. First move would be a proxy server that supports cosmetic filtering (may involve an ICAP server), and some changes to the current DNS settings. Hopefully, I can pull this off before uBlock Origin stops working in Chromium-based browsers...
 
In an effort to reduce reliance on browser features/functionality, I've added Privaxy to the server project. This proxy server will provide content blocking/cosmetic filtering as a network service.

Had to re-deploy Wazuh XDR, since the move from 4.8 to 4.9 broke the instance.

I'm also deciding between RCDevs OpenOTP and Authentik, for MFA purposes. Will be used to secure SoftEther VPN, which is the remote access entry point.

MariaDB backups have been re-implemented as a dedicated script, which will run 4 times per day. Each backup is a full backup -- no incremental backups at this time.
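Here's a minimal sketch of how a full-backup script like that could be structured. This is not the actual script; the backup directory, file naming, and function names are made up for illustration, and it assumes mariadb-dump is on the PATH with credentials supplied through an option file. Scheduling (4 runs per day) would be left to cron or Task Scheduler.

```python
# Sketch only: builds a timestamped target path and a mariadb-dump command
# for a full (non-incremental) backup. Paths and names are assumptions.
import subprocess
from datetime import datetime
from pathlib import Path


def backup_path(backup_dir: Path, now: datetime) -> Path:
    """One file per run, named by timestamp so runs never overwrite each other."""
    return backup_dir / f"mariadb-full-{now:%Y%m%d-%H%M%S}.sql"


def dump_command(target: Path) -> list:
    """Full dump of every database; --single-transaction avoids locking InnoDB tables."""
    return [
        "mariadb-dump",
        "--all-databases",
        "--single-transaction",
        f"--result-file={target}",
    ]


def run_backup(backup_dir: Path) -> Path:
    """Run one full backup and return the file that was written."""
    target = backup_path(backup_dir, datetime.now())
    backup_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(dump_command(target), check=True)  # raises on failure
    return target


# Example (hypothetical directory): run_backup(Path("/srv/backups/mariadb"))
```

Since every run is a full dump, old files would need to be pruned by a separate retention step.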

FreePBX 17 will be installed on Project:ArcZ instead of Debian. All focus is now on creating a working installer ISO for Project:ArcZ...
 
Privaxy's whitelist has grown considerably in size since initial setup. While more sites work with it now, than during initial setup, there are still webpages that are negatively impacted when running through it. Getting Privaxy configured properly is going to take a while.

AD backups are finally working, which is a relief (running a domain controller).

While upgrading MariaDB to 10.6.20, it was discovered that a previous upgrade had moved MariaDB to a new install path, while also leaving the old installation behind in an orphaned folder. This had to be fixed, and it took an entire evening to clean up the residuals.

Investigating Kanidm and OpenOTP as MFA options.

Tempted to change TrueNAS Scale's RAM allocation to 192GB.
 
It has been an active month, to say the least.

Days before the winter holidays, my smartphone (Asus Zenfone 8 Flip) stopped working almost completely. No mobile data (3G/LTE/5G) and barely-intermittent ability to handle calls/SMS. When contacting T-Mobile (my carrier at the time), they claimed that my device has somehow been locked by a previous carrier. That didn't make sense, since I originally purchased it new/unlocked. It didn't come through a previous carrier.

In the days that followed, I'd end up purchasing a 2nd (known-working/spare) phone to test their re-locked claim and to make sure that the modem/antenna on my original phone hadn't somehow stopped working. After more testing and research, I found that T-Mobile may have dropped support for my device. This would then get confirmed, weeks into the New Year, with an automated SMS -- after I had already decided to switch to a new carrier.

On the server project, things went from tame to wild. I was supposed to move to the DL580 Gen9 over the holidays, but that got delayed. I updated Azure AD Connect on December 28th, which required some registry edits. On the 29th, I ended up creating a dedicated certificate for encrypting telecommunications. This was to be used for SSL/STARTTLS and call encryption (hMailServer and FreePBX). While the old mail server didn't accept the certificate, FreePBX did. Since hMailServer is currently due to be replaced, it not being able to use the certificate wasn't much of an issue. On January 9th, I was reviewing this in relation to the current plans for the next phase (which involves a macOS VM). On January 13th, the Windows Server VM started showing some new errors in Event Viewer. Then the server PSOD'd, because macOS seemingly killed another, brand new USB card. In the grand scheme of things, this was a temporary scare -- but one that seemingly leaves me with no way to physically connect USB devices to that VM (in the long term). I had to remove the USB card from PCI Passthrough on that VM, sadly. In addition to that, the Windows Server VM had started throwing one more new error.

On January 15th, I made the decision to use a dedicated hotspot (as opposed to activating multiple devices directly through the carrier). The hotspot in question is a NETGEAR Nighthawk M6 Pro (5G). On the 18th, while further testing call encryption, I ended up reading this. On January 19th, the real challenges began. The 8TB SAS HDDs started going bad, one by one. The Artix VM (Docker container host) was the first to start throwing errors. It came in on Sunday, and I was immediately forced to re-check the SMART data on my large-capacity drives. I attempted an emergency drive clone that day, which ran late into the evening and didn't work out. That spilled into the next day. I ended up being saved by the Timeshift backups, which had a recent-enough backup to not cause major disruptions...

Is what I would have said if this didn't happen right afterward. The reverse proxy troubleshooting that I ended up doing would only lead into MeshCentral troubleshooting, once I figured out how to configure NGINX (since I replaced NGINX Proxy Manager with it). On January 26th, I had to replace the 8TB SAS HDD for the Windows Server VM as well. That was handled through drive cloning (but not before stopping all services first). However, MailStore Server's database had to be restored from a backup. It somehow got corrupted during the cloning process, even though it wasn't running at the time. Another service that took a hit was Nextcloud -- specifically, its database (associated tables). That led to MariaDB troubleshooting, which just ended recently. This morning, 2-3 RAM sticks seemed to have gone bad -- leaving me with 16GB less RAM until they (or the memory cartridge) are replaced. I'm currently monitoring the macOS VM, to make sure its 8TB SAS HDD doesn't go out before replacement. Will need to remove 3x 8TB SAS HDDs, a(nother) USB card, and the RAM at some point. I've already purchased replacements for everything but the USB card, since I'm considering USB-over-Ethernet. The storage enclosure that the 8TB HDDs sit in also has a strange caveat -- the UID LEDs don't work well with vSphere. I'll have to figure out how to remove the correct HDDs.

I'm hoping that nothing else comes up, so that I can begin to get back on track. I've delayed the server migration to August of this year, so that I'll hopefully have enough CTO to cover a few weeks.
 
Picking up from where the last project update left off...

On the morning of February 2nd, the DL580 G7 ran an automatic memory test when I powered it on. Interestingly, it found nothing wrong. That same day, I re-checked drive health for some of the large capacity (4TB/8TB) HDDs. While half of the drives checked showed signs of wear, none of them were marked with IMPENDING FAILURE yet. I had time to preemptively start cloning/replacing the worn-out disks first. By the end of the month, I'd end up replacing all (but one) of the older 3.5-inch SAS HDDs with more HGST 8TB SAS HDDs. I'd also end up replacing all (but one) of the old 300GB SAS HDDs (HP-branded), used for VM (OS) storage.
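For anyone wanting to script that kind of health re-check, here's a small sketch that pulls the overall health line out of smartctl-style text for SAS/SCSI drives. It assumes the output contains a "SMART Health Status:" line (as smartctl -H prints for SCSI devices); the sample strings and function names are made up for illustration, and this isn't necessarily the tool I used.

```python
# Sketch only: extracts the health verdict from smartctl-style SCSI output.
# Sample text and helper names are assumptions for illustration.
import re
from typing import Optional

HEALTH_LINE = re.compile(r"^SMART Health Status:\s*(.+)$", re.MULTILINE)


def parse_health(smartctl_output: str) -> Optional[str]:
    """Return the text after 'SMART Health Status:', or None if absent."""
    match = HEALTH_LINE.search(smartctl_output)
    return match.group(1).strip() if match else None


def is_failing(status: Optional[str]) -> bool:
    """Anything other than a plain OK counts as a drive to clone/replace first."""
    return status is not None and status.upper() != "OK"


# Example with made-up output:
# status = parse_health("SMART Health Status: OK\n")
# is_failing(status)
```

A missing health line returns None and is not treated as failing, so drives that report nothing still need a manual look.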

The RDMs (large HDDs) used by the Windows VMs were able to be cloned with regular cloning/partitioning tools. I used Macrorit Partition Expert for this. The backup HDD used by Artix was pretty easy to replace, since I could swap in the new HDD and just start backups again. The 8TB HDD for the macOS VM was a bit tedious, since Disk Utility couldn't clone or restore to the new drive. I ended up manually copying everything to the new drive instead.

On February 9th, I started looking into alternatives for FreePBX. Sangoma was treating their community (and long-time contributors) pretty poorly, and I had no desire to directly support them anymore.

On February 10th, the replacement 800GB SAS SSDs arrived in the mail. I tested them, to the best of my ability, and found no meaningful errors. Initial cloning attempts failed pretty consistently. On the 12th, I installed HPE's SSACLI to assist with managing and swapping out the drives. By February 13th, I had settled on using Storage vMotion instead of drive cloning. I'd have to re-apply the unlocker for the macOS VM afterward, since that stopped working after installing SSACLI (and rebooting the server). SSACLI seems to have also made it so that the UID LEDs, on the front bays, work again.

On February 19th, I started messing around with FreeBSD. If things had worked out, I would have had FusionPBX running on it...

On February 20th, I encountered an issue that temporarily broke pacman. I was forced to start including pacman-static on all Arch-based installs after that one.

On February 22nd, I found out about Incredible PBX. It was easier to configure, and wouldn't require paid support/classes on a recurring basis. By March 1st, I finally got around to installing LMDE 6 for the OS (since it's closer to Debian). The FreeBSD VM had to go. While moving to Incredible PBX isn't exactly a bold move, it does allow me to rely less on Sangoma.

On March 6th, I moved from a stable build of SoftEther VPN server to the latest 5.X build.

On March 13th, I restored the previous FreePBX backup to Incredible PBX. Had to remove a few problematic modules, for features I never got around to using.

Just finished migrating the VMs (guest datastores) to HUSMM1680ASS204's. Only macOS still uses a 4TB SAS HDD.
 
This weekend, I may try to upgrade the DL580 G7 to ESXi 6.7u3b. August 2025 will mark the 5th year of operation for this server.
 
Currently considering adding FydeOS (after the move to the Gen9). While this could allow me to reduce reliance on buying replacement devices every few years, I'd then be reliant on Google not axing ChromeOS/ChromiumOS. Also considering ditching vSphere 8 for Proxmox VE...
 