Thermal shutdowns are tied to a limit called “TjMax,” the maximum junction temperature. When a CPU or GPU reaches its maximum allowable temperature as measured by internal diodes, the component sends a “distress” signal and triggers a hard shutdown, immediately powering off the system and preventing further heat build-up. This is a good thing: it stops the heat from permanently damaging components and gives the system builder a chance to fix the problem.
There are a few main reasons for thermal shutdowns. A disconnected or misconnected fan is the most common cause, but it’s also possible that heatsinks are improperly mounted, overclocks are too ambitious, or the case isn’t well ventilated for the components used.
What are the Symptoms of an Overheating PC?
Symptoms of thermal issues can be summed up as:
- Decreased performance, noticeable even in basic Windows navigation and input response. High temperatures will steadily degrade system performance until the system either throttles its clocks or shuts down.
- Complete shutdown without warning, often after running a moderately demanding program.
- Clock-rates lower than they should be.
Troubleshooting an Overheating PC
We’d strongly recommend using CAM or another monitoring tool, such as AIDA64 (free edition), to check your temperatures quickly. The free version does not offer logging, but you won’t need it. You could also try HW Monitor+ or SpeedFan. Some CPUs will not correctly report their temperatures to the software; it depends on the CPU and the software version used. You may have to try a few tools to get accurate readings. In general, almost all Intel CPUs report accurately to AIDA64, and all GPUs that we’ve tested (at GamersNexus, that is) report accurately to GPU-Z or AIDA64.
Use these tools to monitor thermals while running the applications that trigger the shutdowns. Keep an eye on the temperature as the workload progresses. If you see the CPU and/or GPU reach ~80°C and continue climbing, there may be cause for concern.
Once you’ve seen enough data to feel confident that thermals are a problem (and if either component starts hitting 90°C, just shut it down), it’s time to troubleshoot connectivity issues.
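As a rough illustration of the rule of thumb above, here is a minimal Python sketch that classifies a series of temperature readings you have noted down from a monitoring tool. The function name `assess` and the constant names are hypothetical, and the readings are typed in by hand; the sketch does not read sensors itself. The 80°C and 90°C thresholds come from the guidance above, and "still climbing" is assumed to mean the latest sample is higher than the previous one.

```python
from typing import List

WARN_C = 80.0      # "cause for concern" level from the text (if still climbing)
CRITICAL_C = 90.0  # "just shut it down" level from the text


def assess(samples_c: List[float]) -> str:
    """Classify temperature samples (oldest first, in degrees Celsius)."""
    if not samples_c:
        raise ValueError("need at least one sample")
    latest = samples_c[-1]
    if latest >= CRITICAL_C:
        return "critical"
    # Assumption: "climbing" means the latest sample exceeds the previous one.
    still_climbing = len(samples_c) >= 2 and latest > samples_c[-2]
    if latest >= WARN_C and still_climbing:
        return "concern"
    return "ok"


if __name__ == "__main__":
    print(assess([62.0, 71.5, 79.0]))  # below 80, so "ok"
    print(assess([78.0, 81.0, 84.5]))  # above 80 and rising, so "concern"
    print(assess([85.0, 88.0, 91.0]))  # at or above 90, so "critical"
```

A reading that sits above 80°C but is falling is reported as "ok" here, which matches the "and continue climbing" wording; tighten that if you prefer to be more cautious.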
Check the CPU fan header on the motherboard, which powers the CPU cooler’s fan. If that’s good, also check that the CPU pump (if using a liquid cooler) is connected properly. You can press lightly on the top of the CPU pump (while it’s on) to feel whether the pump is working. If you feel a light vibration, then it’s good. If you feel significant heat in the tubes themselves, the pump is probably not running.
The next item is BIOS. Press ‘Del’ during boot to get into UEFI/BIOS and take a look at the reported CPU temperatures, then look at the fan speed settings. You can configure custom fan curves or settings here. If “silent” is presently selected, try switching to auto or max (just temporarily, for troubleshooting purposes) and see if temps become more reasonable. If they do, it’s possible that your case is choking air intake. This can be resolved with better case positioning in the room (is a fan pressed up against something?) or with more intake/exhaust fans.
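To make the idea of a custom fan curve concrete, here is a small Python sketch of how a curve maps temperature to fan speed. The curve points, the `CURVE` name, and the `fan_duty` function are all hypothetical, and linear interpolation between points is an assumption; your UEFI/BIOS may step between points differently.

```python
from typing import List, Tuple

# Hypothetical curve: (temperature in degrees Celsius, fan duty in percent),
# sorted by temperature. These values are illustrative, not recommendations.
CURVE: List[Tuple[float, float]] = [
    (30.0, 20.0),
    (50.0, 40.0),
    (70.0, 70.0),
    (80.0, 100.0),
]


def fan_duty(temp_c: float, curve: List[Tuple[float, float]] = CURVE) -> float:
    """Return the fan duty (percent) for a temperature by linear interpolation."""
    # Clamp to the first and last points outside the curve's range.
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    # Find the segment containing temp_c and interpolate within it.
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    raise ValueError("curve must be sorted by temperature")


if __name__ == "__main__":
    for temp in (25.0, 60.0, 75.0, 85.0):
        print(f"{temp:.0f} C -> {fan_duty(temp):.0f}% fan duty")
```

A "silent" profile is essentially a curve like this shifted toward lower duty at each temperature, which is why temporarily switching to auto or max is a useful test.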
Finally, we’d recommend checking that the coldplate is actually making contact with the CPU. If the cooler is improperly installed, its coldplate can hover just above the CPU, where it does effectively nothing: it relies on direct contact to conduct heat away and is not a convection unit. Remount as necessary.
What if the GPU is Overheating?
GPU fan speeds can be controlled with CAM or another similar tool. Use one of these utilities to measure GPU temperatures and match them against your fan speed. It may be the case that you need to manually increase fan speeds to reduce overall system temperatures.
– Steve Burke, GamersNexus