EUE Types
From FaHWiki
Core independent
"Time out" Errors
3 hours since checkpoint written... Folding@home Core Shutdown: EARLY_UNIT_END
These errors can occur when a machine enters S3 (standby) without pausing or terminating the work thread. When the machine resumes (returns to S0, the working state), the client sees that nothing has happened for at least 3 hours, assumes the WU must be faulty, and discards it.
EDIT: Apparently this has been corrected. I'll leave the explanation until we can confirm that this is no longer true.
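The behaviour described above can be illustrated with a minimal sketch. This is not the client's actual code: the function name `checkpoint_timed_out` and the example timestamps are hypothetical, and only the 3 hour threshold comes from the log message. It shows why a long standby looks the same as a stalled WU when only wall-clock time is compared.

```python
from datetime import datetime, timedelta

# Threshold taken from the log message "3 hours since checkpoint written".
CHECKPOINT_TIMEOUT = timedelta(hours=3)


def checkpoint_timed_out(last_checkpoint, now, timeout=CHECKPOINT_TIMEOUT):
    """Return True if at least `timeout` of wall-clock time has elapsed
    since the last checkpoint (hypothetical check, for illustration only)."""
    return now - last_checkpoint >= timeout


# Hypothetical timeline: checkpoint written, machine sleeps, machine resumes.
last_checkpoint = datetime(2008, 1, 1, 22, 0)
resumed_at = datetime(2008, 1, 2, 7, 0)  # nine hours later, after standby

# The work thread made no progress while suspended, so the elapsed
# wall-clock time alone exceeds the threshold.
print(checkpoint_timed_out(last_checkpoint, resumed_at))
```

A check of this kind cannot tell a suspended machine from a hung core, which is consistent with the EUE being reported right after resume.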
GROMACS and derivatives specific
Gromacs cannot continue further
Windows Specific
Gromacs cannot continue further. Going to send back what have done. logfile size: xxxxx Writing xxxxx bytes of core data to disk... ... Done. Folding@home Core Shutdown: EARLY_UNIT_END
Quit - 101 Fatal Error (LINCS warnings)
Quit 101 - Fatal error: Step -2, time -0.002 (ps) LINCS WARNING relative constraint deviation after LINCS: max 0.038678 (between atoms 193 and 194) rms 0.001725 (#QNAN0) ...snip... Folding@home Core Shutdown: EARLY_UNIT_END
The (#QNAN0) marker does not always appear in this message.
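When comparing reports, it can help to pull the numbers out of a LINCS warning. The following is a sketch only: the regular expression is derived from the single sample shown on this page, so other cores or versions may format the line differently, and the function name `parse_lincs_warning` is hypothetical.

```python
import re

# Pattern derived from the sample message on this page (an assumption:
# other core versions may word or space the line differently).
LINCS_RE = re.compile(
    r"max\s+(?P<max>\d+\.\d+)\s+"
    r"\(between atoms (?P<a>\d+) and (?P<b>\d+)\)\s+"
    r"rms\s+(?P<rms>\d+\.\d+)"
)


def parse_lincs_warning(text):
    """Extract deviation values and atom indices from a LINCS warning.

    Returns None if the text does not match the sample format.
    """
    match = LINCS_RE.search(text)
    if match is None:
        return None
    return {
        "max_deviation": float(match.group("max")),
        "atoms": (int(match.group("a")), int(match.group("b"))),
        "rms": float(match.group("rms")),
        # The (#QNAN0) marker is not always present.
        "has_qnan": "#QNAN0" in text,
    }


sample = (
    "Quit 101 - Fatal error: Step -2, time -0.002 (ps) LINCS WARNING "
    "relative constraint deviation after LINCS: max 0.038678 "
    "(between atoms 193 and 194) rms 0.001725 (#QNAN0)"
)
print(parse_lincs_warning(sample))
```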
Quit 101 - Fatal error: NaN detected: (ener[xx])
Quit 101 - Fatal error: NaN detected: (ener[xx]) ...snip... Folding@home Core Shutdown: EARLY_UNIT_END
xx can be 0, 11, 12, 13, 18, 20, or perhaps other values.
From Gromacs.org: The error comes from coordinates being NaN (not a number). The physical reason for this is that particles come too close to each other.
The most frequent cause of this error is an unstable CPU, which inserts incorrect results into the simulation. This in turn can cause molecules to be reported in different (wrong) positions, which sometimes manifests itself as this error.
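To illustrate why a single bad value ends the run, here is a small sketch of how NaN arises and spreads in floating-point arithmetic. It is not Gromacs code: the `ener` list, its values, and the function `first_bad_energy_index` are hypothetical stand-ins for the core's energy terms.

```python
import math


def first_bad_energy_index(ener):
    """Return the index of the first NaN or infinite entry, or None."""
    for i, value in enumerate(ener):
        if math.isnan(value) or math.isinf(value):
            return i
    return None


# Stand-ins for two terms that have both overflowed, as can happen
# when particles end up far too close to each other.
repulsion = math.inf
attraction = math.inf

# inf - inf has no defined value, so the result is NaN.
ener = [1.5, repulsion - attraction, -3.2]

# Any further arithmetic on a NaN yields NaN again, so the total is
# unusable even though the other terms are fine.
total = sum(ener)

print(first_bad_energy_index(ener), math.isnan(total))
```

Once a NaN enters the coordinates or energies, every quantity computed from it is also NaN, so the simulation cannot recover and the core stops.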
Quit 101 - Fatal error: ci = ...
Quit 101 - Fatal error: ci = -2147483648 should be in 0 .. xxxx [FILE nsgrid.c, LINE 215] Simulation instability has been encountered. ...snip... Folding@home Core Shutdown: EARLY_UNIT_END
Quit 101 - Fatal error: XTC Error
Quit 101 - Fatal error: XTC error ...snip... - Could not open results file Folding@home Core Shutdown: EARLY_UNIT_END
From Gromacs.org: If your system has relatively slow disk-IO, and/or you write frames and energies out very often, and/or you have a very large system the performance might be limited by disk access. In that case, you might consider writing fewer frames to your trajectories (.xtc and especially .trr or .trj) and energy file (.ene or .edr).
When applied to F@H, this usually means the core cannot access one of the frame/trajectory files. Since this constitutes a serious error, the core quits and causes an EUE.
Quit 101 - Fatal error: Too many box elem corrections
Quit 101 - Fatal error: Too many box elem corrections 1 ...snip... Folding@home Core Shutdown: EARLY_UNIT_END
This error is reported very infrequently and, from the limited reports received to date, appears to be specific to certain WUs.
AMBER specific
NaN/Inf detected e[0]
NaN/Inf detected e[0] Going to send back what have done. logfile size: 33750 - Writing 34270 bytes of core data to disk... ... Done. Folding@home Core Shutdown: EARLY_UNIT_END
Note: Inf is short for Infinity.
QMD specific
Core cannot continue further
Core cannot continue further. Going to send back what have been done. logfile size: 19691 - Writing 20211 bytes of core data to disk... Done: 19699 -> 4588 (compressed to 23.2 percent) ... Done. Folding@home Core Shutdown: EARLY_UNIT_END
NaN/Inf detected V[0]
NaN/Inf detected V[0] Going to send back what have been done. logfile size: 47106 - Writing 47626 bytes of core data to disk... Done: 47114 -> 8208 (compressed to 17.4 percent) ... Done. Folding@home Core Shutdown: EARLY_UNIT_END
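The messages catalogued on this page can be matched against a log excerpt mechanically. The sketch below is one possible way to do that, not an official tool: the patterns are taken only from the samples above, and the labels, `EUE_PATTERNS`, and `classify_eue` are hypothetical names chosen for illustration.

```python
import re

# (pattern, label) pairs built from the sample messages on this page.
# More specific messages come first so they win over generic ones.
EUE_PATTERNS = [
    (r"hours since checkpoint written", "Core independent: time out"),
    (r"LINCS WARNING", "GROMACS: LINCS warnings"),
    (r"NaN detected: \(ener\[\d+\]\)", "GROMACS: NaN detected"),
    (r"Fatal error: ci = ", "GROMACS: ci out of range"),
    (r"XTC error", "GROMACS: XTC error"),
    (r"Too many box elem corrections", "GROMACS: box elem corrections"),
    (r"Gromacs cannot continue further", "GROMACS: cannot continue"),
    (r"NaN/Inf detected e\[0\]", "AMBER: NaN/Inf detected"),
    (r"NaN/Inf detected V\[0\]", "QMD: NaN/Inf detected"),
    (r"Core cannot continue further", "QMD: cannot continue"),
]


def classify_eue(log_text):
    """Return the label of the first matching pattern, or None."""
    for pattern, label in EUE_PATTERNS:
        if re.search(pattern, log_text, re.IGNORECASE):
            return label
    return None


excerpt = (
    "Quit 101 - Fatal error: NaN detected: (ener[11]) "
    "Folding@home Core Shutdown: EARLY_UNIT_END"
)
print(classify_eue(excerpt))
```

A log that matches none of the patterns returns None, which is a hint that the EUE is of a type not yet listed here.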

