TOPIC
Occasionally a Unix-style operating system is shut down in such a manner that it cannot write cached data or directory information back to disk. In rare circumstances, one or more of the system's volumes may not have been properly flagged as "dirty." Because the operating system then fails to see that such a file system needs repair, the automatic recovery fsck does not run before the drive is mounted. Sometimes the damage is such that the volume fails to mount, which brings the matter to the administrator's attention. However, if the volume mounts successfully and the system is put back in service without precautionary maintenance, continued writing may cause additional damage.
DISCUSSION
Fortunately, these occurrences are quite rare, but the savvy user is well advised to learn the circumstances in which such a situation is prone to occur, and to intervene appropriately as a precaution. In the case of servers, it is always best to assume that any disorderly shutdown can lead to such a situation.

Review of a great many support calls reveals that most of these situations stem from storms or other power outages, brownouts, power supply failures, kicked power cords, shutdowns or restarts forced by an operator from front-panel buttons or the keyboard, or unresponsiveness caused by hardware failure. All of these are very sudden events. A brownout or a spike caused by a nearby lightning strike often seems to be the cause of the more severe situations.

If performed promptly, recovery of these unflagged dirty file systems can be routinely successful, but because the dirty flag isn't set, the system itself does not recognize the need for autorecovery. Therefore, the system administrator must know when to do routine manual maintenance after any of the improper shutdown situations described. This is especially important for servers, because the data of many users may be at risk.

A typical Unix-style system has one root file system that contains the system software, and may have one or more data volumes mounted within this root file system. If a system is believed to have suffered a traumatic shutdown, the first step in recovery is to start up partially, to "single-user" mode, which gives a command-line shell that can perform a file system check ("fsck") on the system file system before it is mounted. For more information on this process, see the following article:

Article: 24501 "Mac OS X Server: Why, When, and How to Run fsck for File System Maintenance."

Here are a few very important notes about using fsck in these situations:

1) If the default Unix-based partition is completely full, or almost completely full, it may be necessary to delete a few files before there is enough scratch space on the disk for fsck to write recovered or temporary data.

2) If the file system isn't flagged "dirty," the usual fsck command believes the file system to be clean, and drops back to the command line without doing maintenance. To circumvent this and force the fsck utility to run, add the "-f" flag to the fsck command:

   fsck -y -f

   The root (boot) file system is assumed.

3) Any time that fsck makes any changes whatsoever to the drive, the final line before the prompt returns states: "***** FILE SYSTEM WAS MODIFIED *****." If your file system check ends with such a message, it is important to run the check again, repeating until this message no longer appears. This is because repairs that give the system access to parts of the file system that were unintelligible before may unearth previously undetected damage.

4) If repairs cannot be made, or if an error is presented regarding difficulties reading the file system's superblocks (the master table of pointers to directory elements, which begins at block 8 of the volume), fsck can be directed to use one of the alternate copies of the superblocks. The first of these is block 16 on most Unix-style systems. Thus, to recover using a superblock mirror, the command becomes:

   fsck -y -f -b16

   Note that because the original default superblocks at block 8 are rewritten from the mirrored data at block 16, the message "***** FILE SYSTEM WAS MODIFIED *****" occurs. Therefore, it is necessary to run fsck again, but without the redirection to a superblock mirror. Only the first pass of fsck should use the -b flag, because once that pass is done, the default location has the same information as the mirror.
5) In the very, very unusual event that the superblocks at location 16 are also illegible, one may suspect that extensive damage has occurred, but the file system may still be salvageable if other superblock mirror locations are readable. To learn where other superblocks are located, use the "disk" command:

   disk /dev/rhd0a
   disk> scan

   The disk utility scans the partition for superblock mirror locations, listing each as it is found. To exit the disk utility, type "q" or "quit." For a summary of disk commands, type "?". For more information about the disk utility, which can initialize, partition, and query a disk in many ways, type the following command at the prompt:

   man disk

6) Whether the -b flag is used or not, if fsck is run three to five times and repairs are still being made, or especially if the same repairs are made over and over, one may conclude either that the data is truly damaged or that there is a hardware fault. This may include worn media, damaged media, a frayed, cracked, or loose data cable, or a bad controller. If it is severe,

7) Have room on disk (df)

8) lost+found?

9) When fsck runs, it echoes to the screen each problem it discovers, asking if you want it fixed, repaired, or salvaged, and automatically answering "y" if you entered the -y flag on the command. Errors such as "cannot stat /dev/rhd0a" or "cannot determine file system type of /dev/rsd0a" could mean that you have entered the command incorrectly (targeted a drive that doesn't exist, isn't powered up, or isn't a UFS or Mac OS Extended (HFS+) file system type). These errors may also mean exactly what they say: the disk may be damaged, or for other reasons may not be communicating properly. Cables, connectors, and SCSI termination (if applicable) should be reviewed carefully, especially if recent changes have been made to storage subsystems.

10) It is important to bear in mind that an improperly terminated SCSI system can behave properly for a while, only to become suddenly stubborn. Make no assumptions here unless you enjoy reformatting and reconfiguring systems. With internal SCSI devices, it is especially important to verify that termination jumpers are removed from all but the last device, because raw mechanisms are almost always terminated when received.

11) Once fsck runs without error, you may continue your use of the computer with one of the following commands:

   exit (leaves the single-user shell)
   reboot (restarts and proceeds through a normal startup)
   shutdown (powers the computer off)

For additional details about fsck, enter "man fsck" at the command prompt or see article 24501: "Mac OS X Server: Why, When, and How to Run fsck for File System Maintenance."
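The repeat-until-clean procedure in notes 3 and 6 can be sketched as a small shell function. This is only an illustration, not part of the fsck utility: the names FSCK_CMD, MAX_PASSES, and fsck_until_clean are hypothetical, and treating a non-zero exit status without the "MODIFIED" banner as a failed check is an assumption, since exit codes differ between fsck implementations.

```shell
#!/bin/sh
# Sketch only: rerun a file system check until it stops reporting changes.
# FSCK_CMD and MAX_PASSES are hypothetical names introduced for this example.
FSCK_CMD=${FSCK_CMD:-"fsck -y -f"}   # the forced check described in note 2
MAX_PASSES=${MAX_PASSES:-5}          # note 6: stop after three to five passes

# Return codes (chosen for this sketch):
#   0  a pass finished without the "FILE SYSTEM WAS MODIFIED" banner
#   1  repairs were still being made after MAX_PASSES passes
#   2  the check itself failed (assumption: non-zero exit and no banner)
fsck_until_clean() {
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        echo "Pass $pass: $FSCK_CMD"
        # $FSCK_CMD is deliberately unquoted so its flags are split into words.
        if output=$($FSCK_CMD 2>&1); then status=0; else status=$?; fi
        echo "$output"
        case $output in
            *"FILE SYSTEM WAS MODIFIED"*)
                # Repairs were made, so another pass is needed (note 3).
                pass=$((pass + 1))
                ;;
            *)
                if [ "$status" -ne 0 ]; then
                    echo "Check failed with status $status; review the messages above."
                    return 2
                fi
                echo "Clean after $pass pass(es)."
                return 0
                ;;
        esac
    done
    echo "Still modified after $MAX_PASSES passes; suspect damaged data or hardware."
    return 1
}

# Usage, from the single-user shell:
#   fsck_until_clean
```

The sketch does not attempt the superblock-mirror recovery from note 4. Because only the first pass should use the -b flag, that pass is better run by hand before starting a loop like this one.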
Document Information
Product Area: Mac OS System Software
Category: Mac OS X Server
Sub Category: Troubleshooting
Copyright © 2000 Apple Computer, Inc. All rights reserved.