Tech Note: TN9508
Date 23 August 1995
Revision: 1.0.0
Subject: Server Recovery Technique for NW 4.1 and Palindrome
4.0a3 Backup Products.
Scope: A technique for restoring a NW 4.1 server after a
complete hardware loss, using several floppy diskettes
and Palindrome's 4.0 backup software. NDS is presumed
to still be intact via replication.
It is also assumed that this is the server that the
Palindrome Backup Software is installed on.
This technique is documented elsewhere; this note states its
assumptions, streamlines the process, adds detail, and makes it
maintainable.
WARNING: Successful implementation of this procedure requires
extensive preparation and testing to polish the
technique and customize it for your own network.
Do not attempt to implement this procedure in a "live"
disaster recovery without thoroughly testing it first.
ABSTRACT:
---------
Normally, if you suffer a total loss of a NW 4.1 server's hardware, in order
to restore it, you must first get the new hardware, install DOS on the C:
partition, then install NetWare, then install Palindrome, then restore
everything else from tape.
This alternative procedure will allow you to skip the steps of reinstalling
NetWare and Palindrome, because most of that is already backed up to tape.
If you keep a couple of floppies with the bare minimum of NetWare and
Palindrome files needed to run the server and get the tape spinning, you
can get the rest of the server back from tape. The goal of this procedure
is to save you time in a disaster recovery scenario.
This, of course, assumes that you also have a backup system that is in good
working order, and that your tapes, spare hardware, and recovery floppies
are available at the time of the disaster. Good foresight and initial
planning can ensure all of this.
Several large Palindrome customers helped to develop this process, provided
the testing environment, and did much of the groundbreaking work.
The one common factor at all sites was that there were no common factors.
Different NW 4.1 networks behave differently, and different customers have
different requirements and expectations of the outcome. The only way
for Palindrome to provide this technique is to present it in a generalized
format, and to also instruct the reader in how to make it work for them.
A procedure is documented in the manuals for 4.0a; however, for some
customers the manual was too specific to meet their needs, and for others it
was too general. . . go figure. This Tech Note is by no means the last word
on this technique, but it is a further attempt at refinement.
1.0 Overview:
-------------
There are probably many ways to do this technique. Some of these ways
probably do not involve using floppy disks. The only real requirement is,
since the assumption is we're starting with a server with NO software on it,
that we have a removable media device from which we can boot the machine.
Therefore, this technique could be done with a MO drive, possibly temporarily
connected to the server's SCSI bus, or this could be done quite easily with
a server equipped with a floptical drive.
There are two main scenarios that this procedure is designed to help you
deal with. SCENARIO 1 is when you have a complete hardware loss, and need
to restore not only your server volumes but the DOS
partition of the server as well. SCENARIO 2 is the more common case in
which your C: partition is still intact, but your SYS: volume and other
NetWare partitions have been lost due to corruption or partial hardware
loss (failure of a separate disk array or controller).
Methods that have been worked up in the past include using floppy disks
with data compressed by PKZIP to reduce the number of media changes.
The decompression steps could be avoided by simply using more disks.
This tech note assumes that the version of DOS underlying the NetWare OS
on the server is version 6.22, so DOS's disk compression software
is available. Files are still reasonably compressed, but the copying
operations are simpler, and so are the updating and maintenance procedures.
By reading and understanding this tech note, you can adapt the procedures
outlined herein to fit specific needs or different techniques
(e.g. zipped compression rather than DOS 6.22 compression, or use of
floptical media).
There will be several main steps to this "recovery system". First, you must
determine the set of data that will be required to make your server operate
at the level you want. Suggestions are given here, but we can make no
assumptions in this tech note as to your LAN drivers, other protocols,
unique device drivers, etc. Then you should create the recovery sets,
consisting of three main portions: boot diskette, DOS partition recovery set,
and server recovery set. You should then test implementation, first by
simulating loss of your server's SYS: volume, then by simulating loss of the
DOS partition as well (simulating a total failure of the system hard drive,
or complete hardware loss). Next, you should establish procedures for
maintaining the system: manufacturing recovery sets, keeping the files on
the recovery sets up to date, and keeping the documentation up to date
as steps in the procedure change due to software changes, etc.
Keeping up with maintenance helps ensure that the "recovery system" does
not become obsolete over time, so you can be confident that you can recover
whenever disaster strikes.
2.0 Preparation:
----------------
2.1.0 Hardware
For each server you intend to use this recovery procedure on, you need to
make sure that you can duplicate the hardware configuration as closely as
possible, or at least be able to make allowances for any variations you
may encounter (i.e., if a complete hardware loss occurs and the
exact hardware configuration is not available for duplication, be aware
that drivers may need to be changed, etc.).
Also make sure that spare hardware is on hand at the time of the recovery.
The middle of a server recovery procedure is a bad time to find out that
your new SCSI card is bad, and a spare is not available.
2.1.1 Hardware Documentation
Document all adapter settings, device configurations, SCSI IDs for tape
drives, hard drives, etc., the slot numbers that cards are mounted in,
processor types, and anything else that you would have to set up on the new
hardware.
It's important that the backup hardware be IDENTICALLY configured when you
attempt this restore technique, because once you restore the Palindrome
System Control Database, the Palindrome software will talk to the Device
Configuration that was set up. The only way to change this device
configuration is from the client, and therefore, you would have to restore
the server software first. This would defeat the purpose of the floppy
recovery procedure, and that is why Hardware Documentation is vitally
important to the success of this process.
(By "backup hardware", I am referring to the SCSI Host Adapter Card, tape
drives, autochangers, identical firmware, identical SCSI drivers,
identical SCSI IDs and slot numbers).
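One way to check a replacement machine against the documented configuration
is to record both as simple name/value lists and compare them. The following
is a minimal sketch in Python, run on any workstation; it is not part of the
Palindrome or NetWare software, and the setting names and values shown are
hypothetical examples only.

```python
def config_differences(documented, replacement):
    """Return settings that differ between two hardware descriptions.

    Both arguments are plain dicts mapping a setting name to its value.
    The result maps each mismatched setting to a
    (documented_value, replacement_value) pair; a setting missing on
    one side is reported with None in that position.
    """
    differences = {}
    for key in sorted(set(documented) | set(replacement)):
        old = documented.get(key)
        new = replacement.get(key)
        if old != new:
            differences[key] = (old, new)
    return differences


# Hypothetical example values, for illustration only.
documented = {"scsi_adapter_slot": 3, "tape_drive_scsi_id": 4,
              "scsi_driver": "AHA1740.DSK"}
replacement = {"scsi_adapter_slot": 3, "tape_drive_scsi_id": 5,
               "scsi_driver": "AHA1740.DSK"}
print(config_differences(documented, replacement))
```

An empty result suggests the replacement matches the documentation for every
setting that was recorded; it says nothing about settings nobody wrote down.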
2.1.2 Spare Reference Diskettes
Make backup copies of any hardware reference diskettes that are used for
EISA setup, etc. as well as configuration files, so that hardware can be
set up exactly the same as before.
2.2.0 Document NetWare Configuration
Document the server's NetWare partitions, volumes' names, sizes, block
sizes, compression, migration, and sub-allocation settings, name space
support, etc.
2.2.1 PALSDUMP
Obtain the program PALSDUMP.NLM from the Palindrome Installation Disk4,
in the \TOOLS directory. This file should be made available in your
server's \SYSTEM directory. Load PALSDUMP at the server console, then
make a printout of SYS:\PALSDUMP.DAT. This will be an important reference
during the recovery process. You should obtain an updated copy of this
document each time you make a change to the server. It records the
server's memory settings and configuration, the AUTOEXEC.NCF and STARTUP.NCF
files, and the modules loaded in memory. This information is critical for
the recovery process.
NOTE: All the information you have just gathered needs to be kept in a
secure but reachable location. This information should usually not change
often, so we recommend you duplicate it, and store copies both off site and
on site. When any of this information changes, update the documents,
duplicate them, and store them. They should also be kept with the standard
procedure that should be followed during the recovery process, as well as
this documentation.
2.3.0 Recovery Diskettes
This procedure rests on the assumption that your servers use MS-DOS 6.22,
and that the data on the diskettes is compressed with the DRVSPACE utility
that comes with DOS 6.22. As long as you have DRVSPACE installed on your
computer, when you make a boot disk, that boot disk will also have DRVSPACE
installed on it. When the machine boots to DOS 6.22 with DRVSPACE, it is
automatically running. Any compressed-volume floppies will automatically
be mounted as DRVSPACE volumes when they are first read. This means that
a compressed 3.5" High Density floppy should have
about 2.7 MB of capacity (which could change, depending on the
compressibility of the contents).
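The capacity figure above can be turned into a rough disk-count estimate when
planning a recovery set. The sketch below is a workstation-side helper in
Python, not part of DOS or DRVSPACE; the function name is an assumption, and
the 2.7 MB default is only the rough estimate quoted above.

```python
import math

# Assumed usable capacity of one DRVSPACE-compressed 3.5" HD floppy,
# expressed in bytes of original (uncompressed) data. The 2.7 MB figure
# is the rough estimate from the text; real results depend on the files.
ASSUMED_CAPACITY_BYTES = int(2.7 * 1024 * 1024)


def floppies_needed(file_sizes, capacity=ASSUMED_CAPACITY_BYTES):
    """Estimate how many compressed floppies a recovery set needs.

    file_sizes: iterable of file sizes in bytes.
    This is a lower bound: it ignores the fact that a single file
    cannot be split across two disks.
    """
    total = sum(file_sizes)
    if total == 0:
        return 0
    return math.ceil(total / capacity)
```

Treat the result as a minimum and prepare a few spare compressed floppies,
since poorly compressible files will lower the real capacity.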
2.3.1 Verifying DRVSPACE is "installed"
DOS 6.22 can be installed without DRVSPACE. If you cannot find the
DRVSPACE.EXE file in the DOS directory, then it's likely that DRVSPACE
wasn't installed, and you should obtain your DOS 6.22 disks, and re-run
SETUP.
Run DRVSPACE. The user interface can be either command line or menu
driven. If it runs, that means it's installed. If it's installed, then
instead of just IO.SYS and MSDOS.SYS being loaded when the machine boots,
DRVSPACE.BIN will also be loaded (before CONFIG.SYS is executed). This
installs the disk compression code, which recognizes compressed volume
files, and mounts them as a volume automatically.
2.3.2 Preparing floppies
Assuming that A: is a 3.5" HD floppy drive, you can take a formatted
floppy disk, and use the command:
DRVSPACE /COMPRESS A: /RESERVE=0
(Save yourself some time by doing this to a floppy BEFORE you have
copied any data to it. The process will run much faster.)
Prepare as many floppies as you think you will need. One floppy
should be uncompressed. This will be your emergency boot floppy.
2.3.3 Defining "recovery sets"
There will be three "recovery sets" that your floppies belong to.
In the first recovery scenario, (total HW loss), you will need all
three sets: Set 1, the boot floppy. Set 2, the DOS partition
recovery. Set 3, the Server recovery. The second recovery scenario
only requires Set 3, because you do not need to boot from a floppy,
or recover the DOS partition.
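The relationship between scenarios and recovery sets can be written down as a
small lookup table, which may be handy in a checklist script. This is a
minimal Python sketch; the names SETS_BY_SCENARIO and sets_required are
illustrative assumptions.

```python
# Recovery sets as defined in the text: 1 = boot floppy,
# 2 = DOS partition recovery, 3 = server recovery.
SETS_BY_SCENARIO = {
    1: (1, 2, 3),  # Scenario 1: total hardware loss
    2: (3,),       # Scenario 2: C: intact, NetWare volumes lost
}


def sets_required(scenario):
    """Return the recovery set numbers needed for a scenario (1 or 2)."""
    try:
        return SETS_BY_SCENARIO[scenario]
    except KeyError:
        raise ValueError("unknown scenario: %r" % (scenario,))
```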
2.3.4 Managing Recovery Set Data
On another workstation, or server volume, you should make a \C directory.
Under this \C directory, you should create what will be essentially a
mirror of what exists on the C: partition of your server. Create
subdirectories for \DOS and \NWSERVER, and a \RECOVERY subdirectory.
\RECOVERY will contain the files for Set 3. Any other files or directories
that you usually keep on your DOS partition should be kept under \C.
Because of the sensitive nature of some of this data, you should strictly
limit access using NetWare privileges.
When data on your DOS partition needs to be updated, (like drivers, etc.)
you do not need to down your server to copy data there. All you need to
do is run RCONSOLE, and select the menu option "Transfer Files to Server".
As your destination directory, enter C:\.
2.3.5 Creating "Recovery Set 1"
Format your uncompressed floppy with FORMAT A: /S. This will transfer
the operating system to that floppy. (Alternatively, you could use the
SYS C: A: command.) You should also prepare a rudimentary set of
startup files (AUTOEXEC.BAT and CONFIG.SYS) to set up the DOS environment.
Later, the AUTOEXEC.BAT could be set up to type a text file with
recovery instructions. . .
The boot disk should also have as many necessary tools and diagnostics
as you can cram onto it: FDISK.EXE, FORMAT.COM, ATTRIB.EXE, etc.
Definitely have XCOPY.EXE available. All of these files will get
copied to the server's C:\DOS directory during recovery.
You will also need to create four files: RECOV.BAT, AERECOV.BAT,
STAE.BAT, and CFGRECOV.SYS. These automate the recovery process: RECOV.BAT
is run from the boot floppy, AERECOV.BAT is the temporary AUTOEXEC.BAT used
during recovery, CFGRECOV.SYS will be the normal CONFIG.SYS for the server,
and at the end, STAE.BAT will become the standard AUTOEXEC.BAT for the
server.
2.3.6 Creating "Recovery Set 2"
To copy the files for the DOS partition recovery set, use the
Windows 3.1 File Manager (FM). Display the \C directory in FM,
select the file at the top of the list, hold down the SHIFT key, and click on
the file at the bottom of the list. That will select ALL the files. Now
click on the selected block and, holding down the button, drag it to the A:
icon (with a compressed floppy in the drive). FM will copy as many files as
will fit, and when the floppy has no more room, it will prompt you to change
media, and continue the copy. You should copy all files from \C that are
NOT in the \RECOVERY tree. This set is intended to restore your DOS partition
to its normal status. DO NOT INCLUDE THE AUTOEXEC.BAT FROM C:\ ! During
recovery, RECOV.BAT on the BOOT disk installs AERECOV.BAT as C:\AUTOEXEC.BAT
so that the file copies are automated, and a normal AUTOEXEC.BAT in this set
would overwrite it.
2.3.7 Creating "Recovery Set 3"
The files necessary for recovery set three include the BARE MINIMUM
files needed to make your NW 4.1 server operate and connect to the tape
drive, plus the BARE MINIMUM Palindrome 4.0 files required to perform
the restore of the REST of the data on SYS:. Under \RECOVERY, there should
be 3 directories: \SYSTEM, \LOGIN, and \PAL. Under the \LOGIN directory,
there should be an \NLS directory.
On your server, you should have a \C\RECOVERY\SYSTEM directory to store these
files (which, on your DOS partition, will be stored in C:\RECOVERY\SYSTEM):
SERVER.MLS (server license file for the server you are restoring)
IPXS.NLM
SPXS.NLM
ROUTE.NLM (required?)
RSPX.NLM (optional)
REMOTE.NLM (optional)
EDIT.NLM (optional)
TIMESYNC.CFG
TIMESYNC.NLM
SMDR.NLM
TSA410.NLM
TLI.NLM
DS.NLM
AFTER311.NLM
STREAMS.NLM
MATHLIB.NLM
CLIB.NLM
DSAPI.NLM
DSI.NLM
INSTALL.NLM
NWSNUT.NLM
MSM.NLM
xxxxxx.LAN (your appropriate LAN driver)
xxxxxTSM.NLM (depends on network type, ETHERTSM, TOKENTSM, etc)
xxxxx.DSK (SCSI or disk driver, e.g. AHA1740.DSK)
xxxxx.DSK (supp. SCSI, e.g. ASPITRAN.DSK)
RECOV-?.NCF (automates console process, below)
(Depending on your individual server's configuration, you may need other
files to run, e.g. if you are running other protocols, optical hardware,
TCP/IP management software, etc. You need to test these configurations to
determine the bare minimum configuration that is required.)
Under \C\RECOVERY\LOGIN\NLS, you need 8 files.
*.001
Under \C\RECOVERY\PAL, you need 11 files.
PALREST.NLM
PALMEDIA.NLM
PALALDRV.NLM
PALSDRV.NLM
PAL.NLM
PALLIB.NLM
PALJSRVR.NLM
ARNANDX.RSF
ARNADAT.RSF
PALSHELL.NLM
PALFCOPY.NLM
Now all of these files can be similarly copied to their recovery disks
as described in step 2.3.6 using File Manager.
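Before copying a directory to floppies, it can help to confirm that every
required file is actually present in the master copy. The sketch below is a
workstation-side Python helper, offered as an illustration only; the function
name is an assumption, and the list shown is the set of eleven Palindrome
files named above. The \SYSTEM list can be checked the same way once the
site-specific driver names have been filled in.

```python
import os

# The eleven Palindrome files listed above for \C\RECOVERY\PAL.
PAL_REQUIRED = [
    "PALREST.NLM", "PALMEDIA.NLM", "PALALDRV.NLM", "PALSDRV.NLM",
    "PAL.NLM", "PALLIB.NLM", "PALJSRVR.NLM", "ARNANDX.RSF",
    "ARNADAT.RSF", "PALSHELL.NLM", "PALFCOPY.NLM",
]


def missing_files(directory, required_names):
    """Return the required file names not present in `directory`.

    Names are compared case-insensitively, as DOS does. Only the
    directory itself is examined, not its subdirectories.
    """
    present = {name.upper() for name in os.listdir(directory)}
    return [name for name in required_names if name.upper() not in present]
```

An empty list means nothing on the required list is missing; it does not
verify versions or dates.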
2.3.8 Managing recovery sets.
Now label each of the disks: Set 1, Disk 1, etc. You can create a
"virtual directory" file of each of the disks by running:
DIR A: /S >> RECOV.LST
which appends a listing of the disk's contents to the text file RECOV.LST.
Do this for each disk, and you will have a master
record of all the files on all the recovery disks. Keep this file on your
boot floppy, and in your \C directory. Use this list to account for each
file, and also to track dates and versions.
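A similar master list can be produced for the \C master directory itself, so
that the floppies can be compared against it. The following Python sketch is
an illustration under assumptions: the function name and the tab-separated
line format are hypothetical choices, and it does not reproduce the layout of
DIR output.

```python
import os
import time


def build_manifest(root):
    """Return one text line per file under `root`.

    Each line holds the path relative to `root`, the size in bytes,
    and the last-modified date (YYYY-MM-DD, local time), tab-separated.
    Lines are sorted by path so two manifests can be compared easily.
    """
    lines = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            info = os.stat(full)
            stamp = time.strftime("%Y-%m-%d", time.localtime(info.st_mtime))
            relative = os.path.relpath(full, root)
            lines.append("%s\t%d\t%s" % (relative, info.st_size, stamp))
    return sorted(lines)
```

Saving the output after each update and comparing it with the previous copy
shows which files changed size or date.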
2.4.0 Further Preparations.
Now go over section 3.0, below. Create documentation that goes through
the process, step by step. Keep in mind the expertise level of the person
who will be executing the process. Very inexperienced people are capable of
carrying this out, as long as everything is spelled out and it is documented
what to expect.
2.4.1 Testing
Run a few dry runs on a test server first. If you intend for a non-technical
person to carry out the procedure, have one do it now, and watch them. If
they cannot carry it out, or have ANY questions, write the questions down and
work the answers into the documented procedure. Anything you can explain now
will prevent them from balking in a trouble situation.
2.4.2 Final Preparations
Now that you have gathered all the information and materials
required to successfully recover a server, you should compile
them into a "Recovery Kit". The disks and documentation should all be
marked with a "last updated" date, and duplicates should be made of
everything (diskettes, documentation, etc.). The duplicates should be kept
where they can be easily accessed, and another copy might be kept off site in
a vault, etc., preferably with your off site backup tapes.
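The "last updated" dates make it possible to spot kit items that predate the
most recent change to the server. The following is a minimal Python sketch of
that check; the function name and the item labels used with it are
hypothetical.

```python
from datetime import date


def stale_items(last_updated, last_server_change):
    """Return kit items whose "last updated" date predates the change.

    last_updated: dict mapping an item label to a datetime.date.
    last_server_change: datetime.date of the most recent server change.
    """
    return sorted(label for label, updated in last_updated.items()
                  if updated < last_server_change)
```

Any item returned by this check is a candidate for re-making before the kit
is relied on.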
3.0 Execution:
--------------
Do not follow these steps unless you are testing this procedure. This
procedure assumes that specific documentation has been created for the
environment you are in, and the person running the procedure.
If you are under Scenario 1, total hardware loss, where you must restore the
DOS partition on the server, then start at the beginning. If you only need
to restore the NetWare partition(s), then start at step 3.2.0.
3.0.1 OH NO!
This step assumes you are starting from either a replacement box or a server
whose hard disks all had to be replaced or formatted after a crash or
hardware failure.
3.0.2 Retrieve Your "Recovery Kit"
All of the documentation and floppy disks that you created and prepared in
the preceding portions of this document comprise a recovery kit.
3.1.0 Start Recovery
Power on the server with the BOOT floppy in the A: drive of the server. The
operating system will load. Enter the correct time and date.
Run the FDISK program to set up the partitions
of the C: drive. Check the documentation created in 2.2.0 for the sizes of
the partitions. Now, format the C: partition. These steps are probably the
most complicated ones, and therefore need to be documented very carefully.
3.1.1 Make C: Bootable (DOS 6.22)
Once the C: partition is established and formatted, you can run the RECOV.BAT
file on the A: drive. RECOV.BAT will first run the SYS command, and copy two
files to the C: drive, which are then renamed to AUTOEXEC.BAT and CONFIG.SYS.
Finally, it instructs the user to remove the boot floppy, and reboot the
machine.
RECOV.BAT
REM Make C: bootable (transfers IO.SYS, MSDOS.SYS, DRVSPACE.BIN)
SYS A: C:
COPY A:\AERECOV.BAT C:\AUTOEXEC.BAT
COPY A:\CFGRECOV.SYS C:\CONFIG.SYS
MD C:\DOS
COPY A:\*.* C:\DOS
ECHO Now remove the floppy disk from the drive, and power the
ECHO machine down, wait 10 seconds, and turn it back on.
3.1.2 Begin Recovery Set 2, 3
When the server boots, the AUTOEXEC.BAT will have commands for restoring the
C: partition from the floppy disks:
PATH C:\;C:\DOS
PROMPT $P$G
VERIFY ON
ECHO Please put Recovery Disk; Set 2, Disk 1 into drive A:. . .
PAUSE
XCOPY A:\*.* C: /s /y /v
ECHO Please put Recovery Disk; Set 2, Disk 2 into drive A: . . .
PAUSE
XCOPY A:\*.* C: /s /y /v
.
.
.
continue for as many disks as are in the recovery sets 2 and 3.
COPY C:\DOS\STAE.BAT C:\AUTOEXEC.BAT
This, obviously, will copy all the files from the floppies down to their
home on the C: drive. This will put the C: partition back into a
recoverable state. If you are testing Scenario 2, then this is where you
pick up. . .
3.2.0 Creating the NetWare Partition
From C:\NWSERVER, now, you should type SERVER.EXE. This will start the
NetWare OS. Refer to the PALSDUMP output in the Recovery Kit documentation.
When all the NW 4.1 patches load, it will then prompt you to enter the
file server name, and internal network address. The printout of the
AUTOEXEC.NCF will supply this information. Now type:
LOAD C:\RECOVERY\SYSTEM\INSTALL.NLM
When the INSTALL screen comes up, you need to manually:
from the main menu;
select Install Options, from that menu;
select Disk Options, from that menu;
Modify the disk partitions and re-create the former NetWare disk
partitions according to the Recovery Kit documentation.
<ESC> escape out to the main menu;
choose Install Options, from that menu;
select Volume Options, from there;
re-create the former volumes, again, according to the information in
your Recovery Kit.
<ESC> escape back to the main menu.
select Install Options, from there;
choose License Option, and license the server;
press F3 to select a different path,
type in C:\RECOVERY\SYSTEM
<ALT><TAB> to the server console.
3.3.0 Installing NDS
Now Type:
C:\RECOVERY\SYSTEM\RECOV-1.NCF
which does the following:
SEARCH ADD C:\RECOVERY\SYSTEM
SEARCH ADD C:\RECOVERY\PAL
LOAD CLIB
LOAD PALFCOPY C:\RECOVERY\*.* SYS:\ /s
LOAD (your LAN driver, statement copied from AUTOEXEC.NCF from
palsdump output in preparation phase.)
BIND ( again, copy this statement from your documented
AUTOEXEC.NCF)
LOAD DS
LOAD TSA410
LOAD (SCSI driver, if it wasn't loaded in the STARTUP.NCF file)
Now go to a workstation and log into NDS, and run NETADMIN or NWADMIN.
Find the server, and delete its volume objects.
At the server console, type TIME, and see if time is synchronized on
the network. If it is, then you can proceed to install NDS onto the
server:
<ALT><ESC> back to the install screen.
from Installation Options menu,
select Directory Options / Install Directory Services On To This
server.
select the original tree in which the server resided, then log
in and re-establish the server into the tree with the same
context it had before.
Note. Since the server object was already there, you will
get an error message: "An NCP server object . . . already
exists in the context . . . Press <enter> to continue."
Install-4.1-389
select the correct time zone
Verify the time parameters, and press <F10> when they are
correct.
Login
Now wait while the server displays "Scanning for Directory
objects. . .". This may take a while.
Make sure the server is in the correct context on the next
screen that comes up.
press <F10>.
You will be asked "Save Directory Information and Continue?"
Answer YES and press <enter>.
You will see the message "Delete the existing NCP server
object and continue?" Choose YES.
You will see: "Installing Directory Services"
When that finishes, you will see "This server was installed without
a replica" Hit <enter>.
When you see a message: "Directory Services has been successfully
installed" hit <enter>.
After you see the message "Reading the Disks for volume
information" hit <enter>.
Directory Services installation is now finished. Next, you need to
install the mounted volumes into NDS.
Return to the Installation Options menu, choose Directory Options.
choose Upgrade Mounted Volumes into the Directory. Make
sure you install all of the volumes.
now exit the INSTALL program.
3.4.0 Restoring Palindrome System Control Database
Make sure that the most RECENT MANAGED BACKUP tape is mounted in the
tape drive. How this is accomplished depends on the exact type of
tape unit you have. If the tape you have is NOT the most recent,
you will be prompted before the restore actually occurs, and you
will have the opportunity to change the tape if it's the incorrect
one.
Now type: SEARCH DEL 2
SEARCH ADD SYS:\PAL
LOAD PAL
this will take a while to load.
When the PAL screen comes up, select from the main menu:
RECOVER SYSTEM CONTROL DATABASE.
Use the TAB key to select the volume and path of the \PAL
installation directory. Volume: server_name/SYS:
Path: \PAL
Using the TAB key, go down to the auto login user and password,
use "full named context" (e.g. .CN=archivist.OU=sysengr.O=Palindrome)
type in the password, and user name.
Select START RECOVERY
You will get an APSC-8 warning, press <enter> to continue.
This may take a while to access the tape.
When this is done, the Palindrome System Control Databases will be
recovered. You can tell it is finished when the STATE says
"closing media" and does not change for some time. You escape from
that screen by pressing <enter>.
3.5.0 Recovering SYS volume
From the PAL.NLM main menu, select BACKUP or RESTORE RESOURCES.
Press the TAB key to highlight SYS, and press <enter>.
Select RESTORE.
The rest of the data should come back from tape, first the file
history database, then the directories and trustees, then the
volume's data.
3.5.1 Recovering the AUTOEXEC.NCF file.
LOAD PALSHELL /OP=RO server_name/SYS: /F+\SYSTEM AUTOEXEC.NCF /PA /Q
3.5.2 Recovering other resources
The process in 3.5.0 can be repeated for each other resource that needs to
be restored.
4.0 Maintenance:
----------------
Whenever files need to be updated, or the process needs to be updated
because of some change in hardware configuration or software
configuration, you should review a master copy of the documentation,
make all changes necessary, and re-make the copies, dating and revising
them. Throw out the copies of the old procedure in their various
locations, and replace them with the new copies. Disks can similarly
be updated, using Windows' File Manager. However, if the number of
disks must change, remember also to change the AERECOV.BAT file to
reflect the changes, to update all the floppy sets, AND to use
RCONSOLE to update the contents of the C: partition. (An update
checklist should be part of this procedure and its documentation.)
All the tasks that could be performed automatically have been automated,
when possible. Operations that required interaction with a user
interface were obviously not automated, however, we've "heard" of some
automatic keystroke-faking tools out there that could theoretically
be used to automate even this.
In conclusion, this is a very difficult task to document, because of
the different environments that exist, and vastly different hardware.
This is an attempt at streamlining the technique as much as technically
feasible. Of course, some degree of experimentation and customization
is expected. Also, the speed at which this technique can be performed
depends a lot on how fast Directory Services can replicate changes
across your network. On some networks, it can take time for updates
to synchronize; therefore, there are technical limits to the feasibility
of this procedure. Only testing, dry runs, and drills can work out
the bugs until you have a smoothly running procedure that can get your
network back up in an emergency.