CoreStatus codes

From FaHWiki

Known CoreStatus codes shown by FAH client v5.xx.

Main CoreStatus Codes

64

Folding@home Core Shutdown: FINISHED_UNIT
CoreStatus = 64 (100)
Sending work to server

66

Folding@home Core Shutdown: INTERRUPTED
CoreStatus = 66 (102)
+ Shutdown requested by user. Exiting.
***** Got a SIGTERM signal (15)
Killing all core threads
Folding@Home Client Shutdown.

Also see GPU Error Codes
There are unexplained reports of this message appearing even though nobody interrupted the WU.

72

Folding@home Core Shutdown: BAD_WORK_UNIT
CoreStatus = 72 (114)
Sending work to server

Possibly memory issues, see:
Folding-community: vgt's post in Folding@Home Common Errors
Folding-community: bad work unit

Folding@home Core Shutdown: EARLY_UNIT_END
CoreStatus = 72 (114)
Sending work to server

The EARLY_UNIT_END error is returned whenever a WU dies with a known error but can still provide useful results. All this error code signifies is that the results can be returned even though the MD calculations failed. The points received are proportional to the work done. The causes can be machine related (see link below). It is also possible for several machines to EUE on the same WU at the same point, while yet another machine completes it (100%).
Folding-community: Uncle_Fungus' post in EUE #771 - Partial Log Pasted

Read more at Early Unit End for descriptions and possible causes.

99

CoreStatus = 63 (99)

This error has been noted when switching from one version of the Windows SMP client to another. Apparently a WU started with one version of MPI cannot be completed with the other version. If you need to switch versions, either use the -oneunit flag to finish the current WU first, or discard the active WU.
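
The log excerpts on this page share a common shape: an optional "Core Shutdown:" line followed by a "CoreStatus = hex (decimal)" line. As a minimal sketch, assuming the log is available as plain text, the pairs could be pulled out as shown here. The function name parse_core_status is hypothetical and is not part of any FAH tool.

```python
import re

# Matches lines such as "CoreStatus = 6E (110)" or "CoreStatus = FFFFFFF6 (-10)".
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((-?\d+)\)")
# Matches lines such as "Folding@home Core Shutdown: FINISHED_UNIT".
SHUTDOWN = re.compile(r"Folding@[Hh]ome Core Shutdown: (\w+)")


def parse_core_status(log_text):
    """Return a list of (shutdown_name, hex_code, decimal_code) tuples.

    shutdown_name is None when no "Core Shutdown:" line appeared since
    the previous CoreStatus line. (Hypothetical helper for illustration.)
    """
    results = []
    name = None
    for line in log_text.splitlines():
        match = SHUTDOWN.search(line)
        if match:
            name = match.group(1)
            continue
        match = CORE_STATUS.search(line)
        if match:
            results.append((name, match.group(1).upper(), int(match.group(2))))
            name = None
    return results
```

Feeding it the excerpt for code 64 would yield one tuple holding FINISHED_UNIT, "64" and 100, which can then be looked up in the sections of this page.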

Other CoreStatus Codes

Non-fatal

62

CoreStatus = 62 (98)
+ Restarting core (settings changed)

6E

Folding@Home Core Shutdown: CORE_OUTDATED
CoreStatus = 6E (110)
+ Core out of date. Auto updating...

If the newly downloaded core still fails with the same error then it is likely that there is an assignment issue server-side at Stanford. Report the case and delete the WU.
Folding-community: Core Status 6e .. FahCore_7a.exe (not?)
In extreme cases you'll need to download the beta core manually for normal operation:
Folding-community: DoubleTop's post in Core Status 6e .. FahCore_7a.exe (not?)
Downloading FAH Core files manually

Fatal

-10

CoreStatus = FFFFFFF6 (-10)
Client-core communications error: ERROR 0xfffffff6
This is a sign of more serious problems, shutting down.

CAL returns this error code when it cannot initialize the GPU. Verify that there is a monitor or dummy plug on that card and that the Windows desktop has been extended to it. Verify that you're running the correct version of the .dll files.

12

Error: Could not write local file.  Exiting.
- Shutting down core
CoreStatus = 12 (18)
Client-core communications error: ERROR 0x12

This error occurs if the (SMP) client is started but the files of the current work unit (as registered in the queue.dat) have been deleted. This error has also been reported on ext4 filesystems.

61

Folding@home Core Shutdown: UNKNOWN_ERROR 
CoreStatus = 61 (97) 
+ Client running with incorrect SMP settings for work unit.  Please check settings and restart 

This error occurs when the -smp switch is not used when restarting the v6 SMP client in the middle of a work unit. The example above is from a Linux SMP Beta client.

63

CoreStatus = 63 (99)
+ Error starting Folding@Home core.

Generically, this error is reported when some initial fahcore startup requirement is not met.

Permission issues? (FAH V5.04)
Folding-community: Error starting Folding@Home core.

With the SMP for Windows client, a CoreStatus = 63 can be a permissions problem, an MPI registration problem, or both. To fix the problem, set the properties of the fah.exe file to "Run as Administrator" and then run the "install.bat" file again to register MPI and the SMPD service.

Failure to install .NET 2.0 (Win 2K, XP) prior to installing the SMP client can also result in this error.

v5.91 and v6.x
Folding-community: Error starting Folding@Home core.

65

Instability has been encountered.
CoreStatus = 65 (101)
Core internal error: SPECIAL_EXIT

This refers to instability in the simulation rather than in the machine.

6F

Folding@home Core Shutdown: BAD_FILE_FORMAT
CoreStatus = 6F (111)
The core could not recognize the format of the provided work file. 

The disk may be corrupted, or there may be a memory error.

Folding-community: BAD_FILE_FORMAT?

70

VerifyARCFile: Checksums don't match (frame 1). Failed verification
Error: ARC file integrity could not be confirmed. Exiting

Folding@home Core Shutdown: BAD_FRAME_CHECKSUM
CoreStatus = 70 (112)
+ The core could not validate the current work unit for processing.
Deleting current work unit & continuing... 

75

Size of work/wudata_xx.xtc not what saved.
Folding@home Core Shutdown: FILE_IO_ERROR
Couldn't delete work/core78.sta.
CoreStatus = 75 (117)
Error opening or reading from a file.
Deleting current work unit & continuing... 
- Couldn't open work/wudata_xx.chk
- Couldn't open work/wudata_xx.chk
Writing local files
Completed xxxxxxx out of xxxxxxx steps (xx)

or

Folding@home Core Shutdown: INTERRUPTED
CoreStatus = 75 (117)
Error opening or reading from a file.
Deleting current work unit & continuing...

or

- Checksums don't match (work/wudata_01.xtc)
Premature end of file when checksumming (1037560 bytes left)
- Could not calculate checksum (work/wudata_01.xtc)
Checksum not what expected.

Folding@home Core Shutdown: FILE_IO_ERROR
CoreStatus = 75 (117)
Error opening or reading from a file.
Deleting current work unit & continuing... 

or

- Error: Bad work unit. Digital signatures don't match
Error: Could not open work file

Folding@home Core Shutdown: FILE_IO_ERROR
CoreStatus = 75 (117)
Error opening or reading from a file.
Deleting current work unit & continuing... 

These errors indicate an I/O hardware problem, or perhaps an antivirus (AV) program preventing FAH from writing or reading certain work files.

77

Folding@home Core Shutdown: UNKNOWN_ERROR
CoreStatus = 77 (119)
Client-core communications error: ERROR 0x77
Deleting current work unit & continuing...

Possibly a full disk.

The system may also have run out of virtual memory. (Increase the size of the paging file.)

79

Gromacs error.
Folding@home Core Shutdown: UNKNOWN_ERROR
CoreStatus = 79 (121)
Client-core communications error: ERROR 0x79
Deleting current work unit & continuing...

This error can occur with these lines preceding the error message above:

- Couldn't open work/wudata_xx.chk 
- Couldn't open work/wudata_xx.chk 
Couldn't open for writing 
Writing local files

In this case the error is caused by the core being unable to open, and therefore write to, its checkpoint file. Check that the permissions on the files are correct and that you didn't run out of space on the disk.
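
Both checks can be scripted. The following is a rough sketch, not an official tool: it reports whether a work directory is writable by the current user and how much free space the disk holding it has. The function name check_work_dir and its parameter are hypothetical; the directory name "work" is taken from the log lines above.

```python
import os
import shutil


def check_work_dir(work_dir):
    """Return (writable, free_bytes) for the given work directory.

    writable reflects the current user's permissions on the directory;
    free_bytes is the free space on the filesystem that contains it.
    (Hypothetical helper for illustration.)
    """
    writable = os.access(work_dir, os.W_OK)
    free_bytes = shutil.disk_usage(work_dir).free
    return writable, free_bytes
```

Calling check_work_dir("work") from the client's directory would return a pair such as (True, some number of bytes). A False first value or a very small second value would be consistent with the failure described above.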

This error also can be caused by memory errors which may be related to overclocking or wrong voltages or simply by bad RAM. If this error occurs when the core is just starting, there's a reasonable chance that it was an "unable to allocate" issue such as running out of space in the paging file or a memory fragmentation issue.

This error also can be caused by a WU which is corrupted during downloading.

Folding-community: p1488 r6 c9 g2, dies at 98% - Linux & error 79

7A

This appears in various forms but seems to be directly related to calculation errors detected by a GPU. Whether the errors are GPU hardware errors or are inherent in the WU is currently unknown. One known cause is switching user sessions in Windows whilst folding, or the use of Microsoft's Remote Desktop (both of which disable and reset the GPU, crashing the FahCore). Another known cause, for certain driver versions, is allowing the Windows power-savings mode to "turn off the monitor". (This problem has been fixed in many driver versions.)

Run: exception thrown during GuardedRun
Run: exception thrown in GuardedRun -- Gromacs cannot continue further.
Going to send back what have done -- stepsTotalG=10000000
Work fraction=0.2249 steps=10000000.
logfile size=14814 infoLength=14814 edr=0 trr=23
- Writing 15350 bytes of core data to disk...
Done: 14838 -> 4465 (compressed to 30.0 percent)
... Done.
mdrun_gpu returned -1
Going to send back what have done.
- Writing 558 bytes of core data to disk...
... Done.
.
Folding@home Core Shutdown: UNSTABLE_MACHINE
CoreStatus = 7A (122)
mdrun_gpu returned 
NANs detected on GPU
Folding@home Core Shutdown: UNSTABLE_MACHINE
CoreStatus = 7A (122)
Sending work to server
Project: xxxx (Run xx, Clone xxx, Gen xx)
- Read packet limit of 540015616... Set to 524286976.
- Error: Could not get length of results file work/wuresults_01.dat
- Error: Could not read unit 01 file. Removing from queue.

7B

CoreStatus = 7B (123)
Client-core communications error: ERROR 0x7b
Deleting current work unit & continuing...
CoreStatus = 7B (123)
Sending work to server
Project: xxxx (Run xx, Clone xx, Gen xx)

The first 7B EUE shown is an unknown error from the fahlog of SMP clients. Because 0x7b is not defined in the F@h client or SMP fahcore, it is believed to be a Windows or SMPD/MPICH error code. Known causes are unstable systems from too much overclocking, changing network settings while the client is running, or stopping and restarting the client. The error is also caused by Windows updates that need to restart your machine. See the List of Known Issues.

Folding-community: Error 0x7b
Folding-community: Kasson's post in Error 0x7b
Folding-community: List of known issues - SMP Windows Client

Here's one more possibility:
(0X7B Problem with Windows SMP Client solved by MichaelO)
The 0X7B Error that many folks have gotten when using the Windows SMP client is not an error from the Folding Client, but rather from Windows itself according to the Folding Forums, Wiki, etc. I had increasingly started having this problem on one of my machines. This morning I was notified that my Acronis backups had failed in the Verify Step. Upon researching this error in the Acronis forums, it was suggested that the Acronis error was caused by a memory fault. I found this a little hard to swallow at first, but after having 0X7B problems this morning after rebooting to try and solve my backup issue, I decided to start changing the memory. Sure enough, once I replaced my 2 x 1024 memory pair, the problems in both Acronis and the Folding Client disappeared.

Based on this, I would suggest that if you start getting errors on the folding client, you should first try testing your installed memory thoroughly. Windows XP is fairly forgiving about memory faults but Vista is more stringent, and the SMP Client, especially on a dual core, is also not very forgiving once the memory starts to go bad.

The second 7B EUE shown is an example of how an EUE is supposed to work, where the error is reported back to Pande Group, and hopefully partial credit is given for that work unit listed.

7E

CoreStatus = 7E (126)
Client-core communications error: ERROR 0x7e
Folding@Home will go to sleep for 1 day as there have been 5 consecutive Cores executed which failed
to complete a work unit.
(To wake it up early, quit the application and restart it.)
If problems persist, please visit our website at http://folding.stanford.edu for help.
+ Sleeping...


This can be caused by incorrect file permissions. FahCore*.exe and mpiexec must be executable.
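
On Linux and other Unix-like systems, the execute bit can be checked with a few lines of script. This is only a sketch under assumptions: it supposes the core files and mpiexec sit directly in the client directory, and the function name and client_dir parameter are hypothetical. The check is not meaningful on Windows, where execute permission works differently.

```python
import glob
import os


def non_executable_files(client_dir):
    """List FahCore*.exe and mpiexec files in client_dir lacking execute permission.

    Only regular files are considered. (Hypothetical helper for illustration.)
    """
    candidates = glob.glob(os.path.join(client_dir, "FahCore*.exe"))
    mpiexec = os.path.join(client_dir, "mpiexec")
    if os.path.exists(mpiexec):
        candidates.append(mpiexec)
    return sorted(
        path for path in candidates
        if os.path.isfile(path) and not os.access(path, os.X_OK)
    )
```

Any path in the returned list is a candidate for chmod +x.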

Folding-community: core not found; client going to sleep
Folding-community: How to fold in Freebsd
Folding-community: Linux FAH Error: cannot execute binary file

7F

CoreStatus = 7F (127)
Client-core communications error: ERROR 0x7f
Deleting current work unit & continuing... 

Folding-community: forgot the -local flag
Folding-community: (Unknown--redownloaded)
Folding-community: (Unknown--redownloaded)
The cause is unknown, but the best guess is that mpiexec can't be started by the client.
Mpiexec should already be in the local directory. If it's not, try adding it and report the results.

FF

CoreStatus = FF (255)
Sending work to server
Project: xxxx (Run x, Clone x, Gen x)
Trying to send all finished work units

FF is a generic segfault return code. This has many possible causes, but the first thing to check for is overclocking/overheating or a hardware failure in the memory subsystem.

89

CoreStatus = 89 (137)
Client-core communications error: ERROR 0x89
Deleting current work unit & continuing...

Triggered by the OS, probably due to insufficient memory.

8B

CoreStatus = 8B (139)
Client-core communications error: ERROR 0x8B
Deleting current work unit & continuing...

Triggered by the OS, probably due to overclocking/overheating or a memory failure.
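
One possible reading of codes 89 and 8B, offered as an assumption rather than something this page confirms: Unix shells report a process killed by signal N as exit status 128 + N. Under that reading, 137 would be signal 9 (SIGKILL, consistent with the OS killing a process that ran out of memory) and 139 would be signal 11 (SIGSEGV, a segmentation fault). The sketch below hard-codes the two Linux signal numbers; the function name is hypothetical.

```python
# Linux signal numbers for the two cases discussed; hard-coded so the
# sketch does not depend on which signals the local platform defines.
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV"}


def guess_signal(decimal_status):
    """Return a signal name if the status looks like 128 + signal, else None.

    Assumes the shell-style "128 + signal number" convention applies,
    which this page does not confirm.
    """
    return SIGNAL_NAMES.get(decimal_status - 128)
```

With this table, guess_signal(137) gives "SIGKILL" and guess_signal(139) gives "SIGSEGV", while codes outside the table give None.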

0

Unix (Linux and Mac OSX) Specific

CoreStatus = 0 (0)
Client-core communications error: ERROR 0x0
Deleting current work unit & continuing...

Read more at Error 0x0 and 0x1.

1

Windows, Linux and OSX

CoreStatus = 1 (1)
Client-core communications error: ERROR 0x1
Deleting current work unit & continuing...

Read more at Error 0x0 and 0x1.
Folding-Community: Can't stop client without losing WU

FFFFFFF6 (-10) or 0xfffffff6

CoreStatus = FFFFFFF6 (-10) 
Client-core communications error: ERROR 0xfffffff6 
This is a sign of more serious problems, shutting down.

(Probably ATI only.) A -10 error right off the bat means that CAL didn't initialize correctly. It might mean that the board is not supported by the drivers. It also might mean that you need to install a new FAH client, since the CAL dlls are distributed with the client.

As for the 'late' -10 errors, those point to a CAL/Brook failure.
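
The pairing of FFFFFFF6 with -10 follows from reading the eight-digit hex value as a signed 32-bit number. A minimal sketch of that conversion follows, assuming (as the pairs on this page suggest) that only eight-digit codes with the top bit set are negative. The function name is hypothetical.

```python
def core_status_to_decimal(hex_code):
    """Convert a CoreStatus hex string (no "0x" prefix) to its decimal form.

    Assumption: eight-digit codes with the top bit set are 32-bit
    two's-complement values; shorter codes are plain unsigned numbers.
    """
    value = int(hex_code, 16)
    if len(hex_code) == 8 and value >= 0x80000000:
        value -= 0x100000000
    return value
```

The same rule reproduces the other pairs listed on this page, for example FFFFFFFF as -1 and C0000005 as -1073741819, while short codes such as 7A stay positive (122).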

FFFFFFFF (-1) or 0xffffffff

CoreStatus = FFFFFFFF (-1) 
Client-core communications error: ERROR 0xffffffff 
Deleting current work unit & continuing...

Possibly a variation of errors 0 and 1. This error has also been common from the Windows SMP beta client when the MPICH service becomes unregistered (for unknown reasons). Running the install.bat file again usually resolves the problem.

This error has also been seen with the ATI GPU2 client when running OEM-based drivers. Download and install the latest driver from ATI.

This error has also been seen with the NV GPU2 client, in the middle of a work unit. Apparently if the card crashes in the middle of a work unit, the same error will appear as if using the wrong driver.

C0000005

Windows and F@H GUI Specific

CoreStatus = C0000005 (-1073741819)
Client-core communications error: ERROR 0xc0000005
Deleting current work unit & continuing...

This is a known Windows memory error that occurs when the v5.x GUI client has the GUI open while finishing and uploading a work unit. Workarounds include updating the video driver (doesn't always help), keeping the GUI closed near the end of a work unit, or switching to the console client and using a third-party utility to see the pretty pictures and monitor the client's progress.

It can also be caused by faulty memory or a bad memory controller, so you should consider both possibilities.

C000008F

CoreStatus = C000008F (-1073741681)
Client-core communications error: ERROR 0xc000008f
Deleting current work unit & continuing...

C0000135

CoreStatus = C0000135 (-1073741515)
Client-core communications error: ERROR 0xc0000135
This is a sign of more serious problems, shutting down.

Error C0000135 is a Windows error which means it was unable to locate a component. It could be an installation error if the .dll files used by FAH are not where they are supposed to be. It could also mean the machine was infected by a virus that was only partially removed.

GPU Error Codes

mdrun_gpu returned -1

See CoreStatus 7A above.

mdrun_gpu returned 50

mdrun_gpu returned 50 
GPU was interrupted 
Folding@home Core Shutdown: INTERRUPTED 
CoreStatus = 66 (102) 
+ Shutdown requested by user. Exiting. 
Folding@Home Client Shutdown.

Two suggested causes for mdrun_gpu error 50 are unstable GPU hardware or loss of DirectX context.

Loss of DirectX context can be caused by CTRL-ALT-DEL (to open Task Manager, right-click on the task bar instead), by screensavers that leave the DirectX environment, by hibernation and/or screen-lock procedures which terminate DirectX, and by some logout procedures such as from a remote console.

mdrun_gpu returned 99

mdrun_gpu returned 99 
GPU failed to initialize
Folding@home Core Shutdown: INTERRUPTED 
CoreStatus = 66 (102) 
+ Shutdown requested by user. Exiting. 
Folding@Home Client Shutdown.

Caused by incorrectly installed drivers, or by multiple cards present and configured incorrectly.

This most often occurs with multi-GPU setups, where the second card won't initialize. Usually this is due to the second card not being enabled in "display properties", which prevents the GPU client from accessing the GPU. Other situations that cause this include having an unsupported card as the primary GPU in a multi-GPU system. The GPU client will only detect the primary card, and will refuse to initialize any cards present in the system.


mdrun_gpu returned 114

mdrun_gpu returned 114 
Going to send back what have done.
Folding@home Core Shutdown: EARLY_UNIT_END
CoreStatus = 72 (114)
Sending work to server

We need more info on this error. This has been seen in Windows Vista, when the GPU client was installed in the C:\Program Files directory tree. It has also been seen when installed elsewhere, but with UAC still enabled. First, move the client out of Program Files. Then either disable UAC, or set the fah executable and the fahcore to run as admin.
