This morning I poked a little fun at Dell for a frustrating user experience I encountered on their website. I decided to continue in that vein – mostly for cathartic reasons – by sharing a frustrating issue with Microsoft System Center Data Protection Manager (DPM).
Take a look at the screenshot below – which illustrates the memory usage on the Technology Toolbox DPM server:
Note that I captured this screenshot back in September of last year (when I spent some time investigating the issue and trying to improve the situation). It shows the typical memory usage for the VM over a 24-hour period.
Here’s the corresponding screenshot I just captured from System Center Operations Manager:
While there are some noticeable differences in the memory utilization over a typical day (specifically, the “sawtooth” pattern throughout the day), the general behavior is roughly the same. By that I mean the DPM server has plenty of available RAM for most of the day, but each night the memory consumption by the nonpaged pool increases dramatically for an extended period of time (i.e. several hours) – almost to the point where the server runs out of memory.
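For anyone who wants to watch the nonpaged pool without Operations Manager, here is a minimal Python sketch. It is not what produced the graphs above. It assumes a Windows host and calls the Win32 GetPerformanceInfo function through ctypes. The helper names (nonpaged_share, is_spike, sample_windows) and the 50% threshold are hypothetical choices for illustration only.

```python
import ctypes
import sys


class PerformanceInformation(ctypes.Structure):
    """Mirrors the Win32 PERFORMANCE_INFORMATION structure (psapi.h)."""
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("CommitTotal", ctypes.c_size_t),
        ("CommitLimit", ctypes.c_size_t),
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", ctypes.c_uint32),
        ("ProcessCount", ctypes.c_uint32),
        ("ThreadCount", ctypes.c_uint32),
    ]


def nonpaged_share(nonpaged_pages, total_pages):
    """Fraction of physical RAM held by the nonpaged pool (both in pages)."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return nonpaged_pages / total_pages


def is_spike(share, threshold=0.5):
    """Flag a sample as a spike; the 0.5 default is an arbitrary assumption."""
    return share >= threshold


def sample_windows():
    """Return (nonpaged_pages, total_pages, page_size) from GetPerformanceInfo."""
    info = PerformanceInformation()
    info.cb = ctypes.sizeof(info)
    psapi = ctypes.WinDLL("psapi", use_last_error=True)  # Windows only
    if not psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError(ctypes.get_last_error())
    return info.KernelNonpaged, info.PhysicalTotal, info.PageSize


if __name__ == "__main__":
    if sys.platform != "win32":
        sys.exit("This sketch only samples memory on Windows.")
    nonpaged, total, page_size = sample_windows()
    share = nonpaged_share(nonpaged, total)
    mib = nonpaged * page_size / (1024 * 1024)
    print(f"Nonpaged pool: {mib:.0f} MiB ({share:.1%} of RAM), spike={is_spike(share)}")
```

Run on a schedule (say, every few minutes) and logged to a file, something like this would be enough to confirm whether the nightly spike lines up with a particular DPM job window.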
During my investigation, I determined that the large memory spike is due to the ReFS driver. I believe this is ultimately caused by DPM/ReFS removing old backups (i.e. deleting a large number of backup files). Note that Modern Backup Storage (MBS) – introduced in DPM 2016 – stores backups on ReFS disks.
Note: In case you are wondering how I identified the ReFS driver as the culprit, I captured a memory dump of the VM during one of the instances of “low memory” and used the !poolused extension in WinDbg to display the memory usage summary. I subsequently confirmed this using the PoolMon utility.
If, like me, you’ve been running DPM for a number of years, you’ve probably come across one or more of the following items on the Internet:
- REFS issues (server lockups, high CPU, high RAM)
- High memory usage - DPM 2016 with Modern Backup Storage
- DPM 2016 MBS Performance downward spiral
- FIX: Heavy memory usage in ReFS on Windows
- ReFS volume using DPM becomes unresponsive on Windows Server 2016
- How To Optimize ReFS Performance with System Center Data Protection Manager?
Most, if not all, of these resources ultimately point you towards implementing some registry tweaks in order to force ReFS to use less memory. Well, I wish I could tell you that approach worked for Technology Toolbox.
If memory serves, I spent several hours that week trying out various registry changes to reduce the nightly memory spikes in the nonpaged pool. None of them helped – and, yes, I did reboot the VM after making each registry change (and subsequently waited until the following day to inspect the memory usage pattern overnight).
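Before rebooting and waiting another day, it can help to confirm which ReFS settings are actually present. The sketch below is an illustration under assumptions, not a record of what was tried here: the three value names and the key path are the ones I recall from Microsoft's "FIX: Heavy memory usage in ReFS" article, so verify them against that article before relying on them. The function names (read_refs_settings, windows_reader) are hypothetical.

```python
# Key path and value names as recalled from Microsoft's ReFS memory article
# (an assumption; check the article before using them).
REFS_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"
REFS_VALUES = (
    "RefsEnableLargeWorkingSetTrim",
    "RefsNumberOfChunksToTrim",
    "RefsEnableInlineTrim",
)


def read_refs_settings(reader):
    """Collect each setting via reader(name); missing values map to None."""
    settings = {}
    for name in REFS_VALUES:
        try:
            settings[name] = reader(name)
        except FileNotFoundError:
            settings[name] = None  # value not set, so the default applies
    return settings


def windows_reader():
    """Build a reader backed by the real registry (Windows only)."""
    import winreg  # imported lazily so the module loads on other platforms

    key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REFS_KEY)

    def read(name):
        value, _value_type = winreg.QueryValueEx(key, name)
        return value

    return read


if __name__ == "__main__":
    for name, value in read_refs_settings(windows_reader()).items():
        print(f"{name}: {'(not set)' if value is None else value}")
```

Passing the reader in as a function keeps the registry access separate from the reporting logic, so the same code can be exercised with a fake reader on a machine that is not the DPM server.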
In the end, I simply decided to “punt” the issue and increased the RAM for the VM from 6 GB to 8 GB. Since then, the DPM server has been running smoothly.
How much RAM will your DPM server need? Well, I suspect that depends on the particular workload you have configured. For Technology Toolbox, DPM is currently configured with the following protection groups:
- Clients - Gold (synchronizes the BackedUp folders on a few Windows clients every hour; recovery points configured every 2 hours starting at 8:00 AM; retention range is 10 days)
- Critical Files (synchronizes all file server volumes every 30 minutes; recovery points configured every 2 hours starting at 7:00 AM; retention range is 10 days)
- Domain Controllers (backs up the complete “System State” of domain controllers every day at 8:00 PM; retention range is 5 days)
- Hyper-V (backs up 59 different VMs – as well as the “Host Component” for each hypervisor cluster node – every day starting at 11:00 PM; retention range is 5 days)
- SQL Server Databases (backs up 42 databases on three servers every 15 minutes, with an “Express Full Backup” at 6:00 PM every day; retention range is 10 days)
- SQL Server Databases (TEST) (backs up 20 databases on a “test” SQL Server instance every 4 hours, with an “Express Full Backup” at 7:00 PM every day; retention range is 10 days)
Adding them all up, this results in approximately 2,700 jobs in DPM each day (which I suspect is on the low end compared to large organizations).
I vaguely recall one of the posts I came across previously mentioning a server with 384 GB of memory. Yikes! (Unfortunately, I was unable to find a link for that in the quick search I just ran.)