
Digital Archeology with Drive-Independent Data Recovery:
Now, With More Drive Dependence!

[ELEN E9002 Research Project Final Report - Summer 2011]

Christopher Fenton
chf2110@columbia.edu

Introduction

The goal of this project was to recover the data from an 80 Megabyte CDC 9877 disk pack
that potentially contains system software for a Cray-1 supercomputer, software that may be of some
minor historical interest. Recovering data from obsolete digital media is quite challenging for a
variety of reasons: functioning hardware can be difficult to come by, and difficult to interface with
even if you have it, and magnetic media can degrade over time, especially if not stored in an
archival environment. The target media for this project is a disk pack with three double-sided
14"-diameter platters, giving six surfaces: five data surfaces and one 'servo' surface, which provides
alignment data for the other five.

Figure 1: How data is stored on the disk pack (from pg. 74 of [5]). Labeled regions: head loading
zone; outer guard band (REV EOT), 24 tracks of positive dibits; 823 servo tracks; inner guard band
(FWD EOT), 56 tracks of negative dibits.

The initial plan for this project was to attempt to build a custom magnetic sensing platform
that would allow me to recover the data without a working CDC 9762 disk drive. Research from the
University of Maryland [1] had suggested that this might be a feasible approach for data recovery.
Unfortunately, this scheme presented a number of difficulties which eventually proved
overwhelming.

The primary challenge was the relatively high data density. The disk stores data at a
maximum linear density of 6000 bits per inch, on 823 concentric data tracks that are 2.5 mils wide.
This means any particular bit might occupy a mere ~64x4 microns, a fairly tiny target that would
require extreme precision to sense. A magnetic sensor was located [2] that actually had adequate
precision (an active sensing area of only 1x2 microns), but the problem then became one of actually
positioning the sensor, which was eventually determined to be insurmountable. Nearly all magnetic
disk drives work by allowing the read/write head to 'float' above the surface of the disk. If a disk
is rotating quickly enough, a thin layer of air will 'stick' to the surface of the platter. A magnetic
read/write head in a disk drive effectively acts like a wing riding on this thin layer of air, which
allows it to fly a few microns above the surface of the disk and to automatically adjust to minor
variations in the surface height of the disk.
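As a quick check on these figures, the footprint of a single bit follows directly from the stated densities: one bit length is 1/6000 of an inch and one track width is 2.5 mils. The small helpers below (names are purely illustrative) convert both to microns, along with the track pitch implied by 400 tracks per inch; they are a back-of-the-envelope sketch, not part of the original project.

```python
"""Back-of-the-envelope bit-cell geometry for the disk pack (illustrative helpers)."""

UM_PER_INCH = 25_400.0  # 1 inch = 25.4 mm = 25,400 microns


def bit_cell_um(bits_per_inch: float, track_width_mils: float) -> tuple[float, float]:
    """Return (track width, bit length) in microns.

    bits_per_inch    -- maximum linear recording density along a track
    track_width_mils -- written track width in thousandths of an inch
    """
    width_um = track_width_mils / 1000.0 * UM_PER_INCH
    length_um = UM_PER_INCH / bits_per_inch
    return width_um, length_um


def track_pitch_um(tracks_per_inch: float) -> float:
    """Centre-to-centre spacing of adjacent tracks, in microns."""
    return UM_PER_INCH / tracks_per_inch


if __name__ == "__main__":
    w, l = bit_cell_um(6000, 2.5)
    print(f"bit cell: {w:.1f} x {l:.2f} microns")          # bit cell: 63.5 x 4.23 microns
    print(f"track pitch: {track_pitch_um(400):.1f} microns")  # track pitch: 63.5 microns
```

The track width and the 400 tracks/inch pitch both come out to about 63.5 microns, which is why the open-loop positioning stage described later has to resolve steps much finer than a single track.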

Unfortunately, the initial plan had been to use stepper motors and gears to rigidly position
the head over the platter, with the platter mounted to a turntable. The turntable could then spin
relatively slowly while an analog-to-digital converter quickly sampled the data. It soon became
clear that it would be impossible to position the sensor vertically close enough to the surface to
sense bits accurately while maintaining enough clearance to avoid collisions. Additionally, because
the servo data for all five data surfaces is stored on a separate surface, both the servo surface and
the targeted data surface would need to be sensed simultaneously, which in turn meant leaving the
disk pack intact and working within incredibly constrained physical dimensions.

An Exercise in Disk Drive Rehabilitation

At this point in the project, it became obvious that what was really needed was a multi-head
sensing assembly engineered specifically to 'fly' above the surface of the disk. This also meant that
the disk needed to be mounted securely and spun quite quickly (a few thousand RPM), and that the
analog-to-digital sampling needed to be performed that much faster. Given unlimited resources and
time, these are surmountable problems. Given the time and resource constraints of this project,
however, it meant that I needed to find a working CDC 9762 disk drive.

I contacted Gil Carrick, Director of the fledgling Museum of Information Technology at
Arlington, in Arlington, TX, whose website happened to mention that the museum had a few of
these drives in storage. After some lengthy logistical discussion, Gil agreed to lend us two CDC
9762 disk drives (in unknown condition), a CDC TB216-A Field Test Unit (FTU) designed for
testing and calibrating the drives, and a spare disk pack for testing. We also acquired a Customer
Engineering ("CE") pack from John Bachellier¹ of MBI-USA, a company that specializes in vintage
computer equipment. A CE pack (as well as the FTU) is needed to align and calibrate the disk heads
in the event that a head needs to be replaced or the drive has somehow become misaligned.

Figure 2: The two CDC 9762 Disk Drives shortly after arrival.

All of the equipment finally arrived on July 21st, allowing me to begin work. The first
setbacks occurred almost immediately. Both drives had been sitting in some form of storage for at
least two decades, and had acquired a fairly thorough coating of grime and/or filth. Disk drives are
extremely precise, complicated electromechanical systems that effectively can't tolerate any
particulate contamination, so cleaning alone was going to be a challenge. Additionally, I had
initially been working under the assumption that I had full documentation (including electrical
schematics) for these drives [3], which would be an immense aid in debugging and repairing them.
Unfortunately, it appears that CDC produced multiple versions of the drive under the "CDC 9762"
label. Both of the drives I was working with were manufactured in 1976, and appear to be CDC's
earliest version of the drive. The documentation I had available belonged to a later version of the
same drive, manufactured as late as 1985. Although the drive's mechanical parts were virtually
identical between versions, the newer drives contained a nearly completely reworked electrical
subsystem (each drive is controlled by a 'logic cage' containing sixteen circuit boards connected
through a wire-wrapped backplane, as well as a handful of other boards scattered through the
machine).

¹ MBI-USA initially had a CE pack compatible with our disk drive that had been in their inventory
for a decade or more, but it was apparently purchased by a customer from the US Navy while I was
in negotiations with them. John Bachellier was able to contact a personal friend of his who
happened to own one and was able to sell it to us.

Both drives, when powered on, immediately asserted their internal 'fault' signals. The
machine with the lowest number of hours on its lifetime counter (a mere 38,000 or so) was chosen
for serious cleaning and debugging, and a week or so of cleaning ensued before any serious electrical
debugging was attempted. One of the largest problems encountered during cleaning was that the
entire case of the drive was lined with 1/4"-thick noise-canceling foam that had degraded over time.
Any contact with the foam caused it to crumble into dust, which would be potentially disastrous if it
contaminated the disk cavity, and ultimately all of it had to be carefully removed. Additional
problems came from the large number of spiders that had taken up residence inside the disk drive,
as well as a 3"-diameter (thankfully abandoned) "mud dauber" wasp nest [4] that had been
constructed within the drive.

Figure 3: The spacious former home of a family of computer-savvy wasps

During the cleaning process, an internal status panel was located within the drive that indicated the
'fault' signal was being generated by an internal voltage fault. The disk drives internally use ±42 V,
±20 V, ±12 V, and ±5 V, and the problem was eventually tracked down to a short circuit on the
+20 V supply. Through a process of elimination, the fault was determined to be on a logic card in
slot 1 of the logic cage, although there were no obvious faults visible on the card. A replacement
card taken from the 'spare' machine cleared the fault and allowed the machine to continue its boot
process.

At this point, the FTU was set up and appeared to pass all of its internal diagnostics
(thankfully, documentation for the FTU was available). When the FTU was connected to the disk
drive, however, the drive remained unresponsive to queries. The same process was repeated with
the spare disk pack installed in the drive; this time the drive spun up the disk and, after a
30-second delay, promptly burnt out a fuse on its +42 V power supply and re-asserted its internal
fault signal. According to the available documentation, the primary use of the +42 V supply is to
drive the large voice coil responsible for positioning the head assembly. The head assembly, which
requires extreme positioning precision, is constrained to move along a single axis by a system of
bearings and guide rails. Some kind of lubricant appeared to have dried out and congealed on the
rails and bearings, effectively cementing the head assembly in place. When the drive attempted to
power the coil to load the heads as part of its initialization process, the coil was unable to move,
and the resulting power surge blew the internal fuse. Extensive cleaning of the rails and bearings
ensued, but movement continues to be significantly stiffer than intended, potentially causing
positioning errors.

Figure 4: The coil and head assembly for a similar model of drive (from pg. 49 of [5]). Labeled parts:
actuator, upper rail, lower rail. Notes on the original drawing: not all heads are shown; the carriage
also has lower rear bearings, not shown.

As a debugging feature, the coil and head assembly can be disconnected from the power
amplifier and positioned manually over the disk, as long as the disk is spinning faster than 3000
RPM (the minimum speed required for the heads to fly). This procedure was attempted with the
spare disk pack installed, and the drive actually asserted its 'ready' light, which I believe means it
had successfully sensed valid servo data and completed its initialization process. Unfortunately,
within 30 seconds of the heads being loaded, the drive began emitting a high-pitched whine,
suggesting that head-to-disk contact was taking place. The drive was powered down and the disk
pack and heads were carefully examined. This revealed that Head #4 (which reads the bottom
surface of the lowest data platter) had 'crashed' into the disk surface and scraped away a concentric
ring of oxide material, permanently damaging the platter. This is a good time to point out the
advantages of not experimenting with your primary source material when performing digital
archeology experiments!

The offending read head was removed from the drive, carefully cleaned to remove the layer
of oxide that had been deposited on it, and set aside until further notice. The spare disk pack was
then loaded into the drive once again (now with only four read heads) and spun up, and the heads
loaded successfully without further incident.

Figure 5: Exposed read head following cleaning

With the coil reconnected to the power amplifier and the drive left to continue initialization
on its own, the drive now progressed to the point where it would spin up the disk and attempt to
seek to the first data track (Track 0), before quickly retracting the heads and re-asserting its internal
fault light. According to an initialization flow chart for a different drive model in the same family
[5], which so far appears to apply to this drive as well, the drive seems to be reaching a 350
millisecond timeout without locking onto the start of the servo data while attempting to perform a
'load seek' operation. This could be due to a number of factors, but the most likely explanations
currently seem to be:

• Due to friction in the rail and bearing system, the coil cannot move quickly enough to lock
onto the servo data before the timeout expires.

• The disk and/or servo read head has suffered damage from head-to-disk contact, and is
unable to function properly.

• The magnetic servo data on the disk pack being used has degraded over time, and the signal
is not strong enough for the drive electronics to sense it properly.

• Due to the large number of electrolytic capacitors used in the system, and their tendency to
'dry out' over time and suffer from somewhat unpredictable failure modes, the analog
sensing electronics could be behaving improperly (this is the likely cause of the +20 V short
mentioned earlier).

Drastic Measures

With time rapidly running out before this project's end-of-summer deadline, it became apparent
that debugging the myriad potential failures of the disk drive's electronic control system would lead
to little but frustration and heartache. A more direct approach was needed: as much of the disk
drive's electronics as possible had to be bypassed. As mentioned earlier, schematics were not
available for much of the drive's electrical subsystem, but as fate would have it, schematics were
available for the drive's internal analog "read amplifier" (a fairly simple circuit that amplifies the
weak magnetic signal coming directly from the read head itself). If the read-head assembly could be
positioned appropriately, the low-level analog data could be recorded directly from the disk and
post-processed off-line to recover the underlying data.

To test this hypothesis, our poor test disk pack was once again installed and spun up, and an
oscilloscope was used to observe the (remarkably intact!) analog data signal coming directly from
the read amplifier.

Figure 6: Analog data snapshot clearly showing MFM-encoding pattern

With confirmation that the amplifier was intact and working properly, a plan was formulated
to quickly implement the necessary positioning and data logging system, completely bypassing the
rest of the drive's problematic control system. For a more modern drive, this would be a daunting
design challenge. Fortunately, 35 years of technical progress have provided a number of useful tools
for tackling such a problem quickly. A high-speed data logging system based on a Field
Programmable Gate Array (FPGA), along with a high-precision stepper motor and controller, was
chosen to provide ample (some would say overkill) margin.

Drive Control and Data Recording System

Figure 7: Proposed block diagram of drive control and recording system. Blocks shown: Read Heads
0-3 (with head select), comparator (analog data in, digital data out), FPGA with SRAM buffer,
stepper motor controller (positioning control), and a USB link to the computer.

Positioning Sub-System

The actual data on the disk is recorded with a track density of 400 tracks per inch. Feedback
from the disk's servo sensor normally lets the drive know exactly when its heads are centered over
the intended data track. Without the drive's control electronics working (including any feedback
from the servo mechanism), a completely 'open-loop' control system would be needed. A mechanism
driven by a stepper motor would be mounted directly behind the voice coil and used to slowly 'step'
the entire coil-and-read-head assembly forward across the surface of the disk. If the linear
resolution of the positioning system is sufficiently high, heavy oversampling guarantees (if
somewhat inefficiently) that every data track is accurately sensed at least once.

The positioning system was built from a modified MakerBot Thing-O-Matic [6] Z-axis
positioning stage mounted on a custom, laser-cut acrylic frame. The frame was designed to mount
securely to the rear of the disk drive and sit snugly behind the voice coil. The stepper motor has a
resolution of 200 steps/revolution, while the acme lead screw it drives has 13 threads/inch and four
'starts' (which means that it requires 3.25 revolutions to advance the nut one inch). This alone would
give a linear resolution of only 650 steps/inch, insufficient to guarantee that we appropriately
oversample data stored at 400 tracks/inch. Fortunately, the MakerBot Industries stepper motor
controller thoughtfully supports 1/8 'micro-stepping,' so we can effectively increase the resolution of
our motor by a factor of eight. This brings us to a total of 5200 steps/inch, allowing us to record 13
samples per data track and effectively guaranteeing at least one accurate sample per track.
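The resolution arithmetic above can be reproduced in a few lines. This is an illustrative sketch (the function names are ours); the default parameter values are the ones given in the text: a 200-step motor, a 13 threads/inch four-start lead screw, and 1/8 micro-stepping.

```python
"""Open-loop positioning resolution of the lead-screw stage (illustrative)."""

UM_PER_INCH = 25_400.0


def steps_per_inch(full_steps_per_rev: int = 200,
                   threads_per_inch: int = 13,
                   starts: int = 4,
                   microsteps: int = 8) -> float:
    """Linear resolution of a stepper driving a multi-start lead screw.

    A screw with `starts` independent threads advances `starts` thread
    pitches per revolution, so it needs threads_per_inch / starts
    revolutions to move the nut one inch.
    """
    revs_per_inch = threads_per_inch / starts  # 13 / 4 = 3.25
    return full_steps_per_rev * revs_per_inch * microsteps


def samples_per_track(steps_in: float, tracks_per_inch: float = 400) -> float:
    """How many positioning steps fall within one track pitch."""
    return steps_in / tracks_per_inch


if __name__ == "__main__":
    full = steps_per_inch(microsteps=1)
    micro = steps_per_inch()
    print(full, micro)                     # 650.0 5200.0
    print(samples_per_track(micro))        # 13.0
    print(round(UM_PER_INCH / micro, 2))   # 4.88 (microns per micro-step)
```

Without micro-stepping, only 650 / 400 = 1.625 steps would fall within each track, so some tracks could be sampled just once and off-center; at 5200 steps/inch each step is under 5 microns against a track pitch of about 63.5 microns.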

Figure 8: Positioning robot with stepper motor

Control and Data Logging Sub-System

The heart of the control and data-logging sub-system is a Digilent Nexys2 FPGA
development board. FPGAs allow one to rapidly create high-speed digital logic systems with
nanosecond-level control. For each step of the positioning system, the output from each of the
four remaining sensors is fed through a high-speed comparator and eventually logged by a computer
for later analysis. The comparator acts as a 1-bit analog-to-digital converter, which is sufficient
resolution to decode the 'modified frequency modulation' (MFM) technique used to encode the data.
Each 'bit' passes under the magnetic sensor in approximately 103 nanoseconds (about 9.6
Megabits/second), so to ensure accuracy, our FPGA records a sample every 12.5 nanoseconds (80
Megabits/second, or roughly 8x faster). The disk nominally rotates at 3600 revolutions per minute
(RPM), so capturing one complete data track requires recording for 16.67 milliseconds. Continuing
with our design theme of including a healthy 'margin' in our sampling, the FPGA buffers 67
milliseconds of data (roughly 4 revolutions) at a time into an on-board SRAM chip before sending it
back to the control computer over a high-speed USB interface.
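The timing numbers in the paragraph above fit together as a simple budget. The sketch below only restates that arithmetic; the constants are the values quoted in the text (12.5 ns sample period, about 103 ns per bit, 3600 RPM, 67 ms buffer), and the function name is hypothetical.

```python
"""Sampling-rate and buffer arithmetic for the FPGA logger (illustrative)."""

SAMPLE_PERIOD_S = 12.5e-9  # one comparator sample every 12.5 ns
BIT_PERIOD_S = 103e-9      # approximate time for one bit to pass the head
RPM = 3600                 # nominal spindle speed
BUFFER_S = 67e-3           # length of one SRAM snapshot


def timing_budget() -> dict[str, float]:
    """Derive per-bit, per-revolution and per-buffer sample counts."""
    rev_s = 60.0 / RPM  # seconds per revolution
    return {
        "sample_rate_hz": 1.0 / SAMPLE_PERIOD_S,
        "samples_per_bit": BIT_PERIOD_S / SAMPLE_PERIOD_S,
        "revolution_ms": rev_s * 1e3,
        "samples_per_revolution": rev_s / SAMPLE_PERIOD_S,
        "revolutions_per_buffer": BUFFER_S / rev_s,
        "samples_per_buffer": BUFFER_S / SAMPLE_PERIOD_S,
    }


if __name__ == "__main__":
    for name, value in timing_budget().items():
        print(f"{name:>24}: {value:,.2f}")
```

This works out to about 8.2 samples per bit, roughly 1.33 million samples per revolution, and just over four revolutions (about 5.36 million samples per sensor) in each 67 ms buffer.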

The FPGA is controlled via its USB interface from a driver written in C++ that is running on
the data-logging computer. The FPGA also contains a small amount of logic to advance the stepper
motor when directed by the computer.
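The division of labour described here amounts to a simple host-side loop: request a snapshot, store it, then tell the FPGA to advance the stepper. The original driver was written in C++ and its USB interface is not described in this report, so the sketch below substitutes hypothetical `capture` and `advance` callables for those calls; only the loop structure is meant to be illustrative. It also computes how many steps a single pass over the 823 data tracks needs at 13 steps per track.

```python
"""Host-side open-loop scan loop (hypothetical interface; illustrative only)."""

from typing import Callable, Iterator, Tuple

STEPS_PER_TRACK = 13
DATA_TRACKS = 823


def scan_surface(capture: Callable[[], bytes],
                 advance: Callable[[], None],
                 n_steps: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (step_index, snapshot) pairs, advancing the head after each capture.

    `capture` and `advance` are placeholders for whatever USB commands the
    real driver issues; they are assumptions, not the original C++ API.
    """
    for step in range(n_steps):
        yield step, capture()  # record one multi-revolution snapshot
        advance()              # then move the stage by one micro-step


def steps_for_full_surface(tracks: int = DATA_TRACKS,
                           steps_per_track: int = STEPS_PER_TRACK) -> int:
    """Number of micro-steps needed to cover every data track once."""
    return tracks * steps_per_track


if __name__ == "__main__":
    position = 0

    def fake_capture() -> bytes:
        return bytes([position % 256])

    def fake_advance() -> None:
        global position
        position += 1

    for step, snap in scan_surface(fake_capture, fake_advance, 3):
        print(step, snap)
    print(steps_for_full_surface())  # 10699
```

Using a generator keeps the host free to write each snapshot to disk as it arrives rather than holding a whole surface in memory.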

Figure 9: The FPGA (1), analog comparator (2) and stepper motor controller (3)

Putting It All Together

With the positioning system and control and recording electronics completed, the entire
setup was mounted to the disk drive for testing.


Figure 10: Final setup with electronics and positioning robot mounted

Figure 11: Positioning robot securely mounted behind voice coil

Figure 12: The moment of truth - the Cray-1 disk pack installed in the drive

An oscilloscope was used to verify that the analog data being read out from the disk was
being appropriately converted to digital form by the comparator, and the data being sampled by the
FPGA was tested and confirmed using a known data pattern.

Figure 13: Analog data (yellow) versus inverted comparator output (blue)

With everything tested and working as intended, the system was first used to record the four
data surfaces of the Cray-1 disk pack accessible via the remaining read heads. At this point, the 5th
read head, which had been removed from the drive (and carefully cleaned) following the earlier
head crash, was re-installed. Typically, re-installing a read head is followed by a delicate
re-alignment procedure to ensure that the sensor is in perfect vertical alignment with the servo
head. Fortunately, our recording system ignores the servo data completely, conveniently allowing us
to forgo the alignment procedure (which would also have required working drive electronics). With
the now-clean read head reinstalled, the Cray-1 disk pack was re-installed, prayers were issued to
the disk drive gods, and the head assembly was loaded. The cleaning procedure was apparently
effective, as the head loaded without incident, and the remaining surface of the Cray-1 pack was
successfully scanned. With the Cray-1 disk pack scanned, the test disk pack was also scanned in a
similarly uneventful manner (albeit at somewhat lower spatial resolution for the sake of timeliness)
to provide a set of comparison data. All told, over 34 Gigabytes of data were recorded from the
Cray-1 disk pack, and 8.75 Gigabytes from the test disk pack.

Future Work

With the target disk pack imaged at as high a resolution as was practical, an enormous
amount of data was generated. Actually recovering the data will likely be every bit as challenging as
getting the raw data off of the disk, and a great deal of signal processing and analysis work remains.
At a basic level, the following steps will need to be performed:

• For each 'sample,' a single revolution of the disk will need to be isolated from within the 67 ms
snapshot (perhaps merging the data from all four revolutions to increase accuracy).

• All of the samples will need to be analyzed to determine which ones are properly 'centered'
over data tracks, and which ones contain noise.

• Once a proper 'track' has been extracted, it needs to be analyzed to determine the beginning
and end of the track, as well as how many data 'sectors' it contains.

• With each track divided into proper sectors, the binary data 'payload' can be extracted from
the raw MFM-encoded data.

• With the actual data extracted from each sector, work will need to be done to extract the
underlying file system structure, as well as individual files.
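Two of the steps above lend themselves to a short sketch: merging the revolutions in a snapshot, and turning the comparator's bit stream back into MFM data bits. The code below is a minimal illustration under strong simplifying assumptions, not the method applied to the real captures: it assumes a constant spindle speed (so one revolution is a fixed number of samples), assumes the comparator output toggles once per flux reversal, decodes intervals by simple rounding rather than with a phase-locked loop, and leaves the clock/data phase to the caller (in practice it would be found from a sector's sync pattern). All function names are hypothetical.

```python
"""Toy revolution merging and MFM decoding for 1-bit captures (illustrative)."""

SAMPLES_PER_REV = round(80e6 * 60 / 3600)  # 1,333,333 at 80 MS/s and 3600 RPM


def merge_revolutions(snapshot, rev_len=SAMPLES_PER_REV, n_revs=4):
    """Majority-vote each sample position across n_revs consecutive revolutions.

    Ties (possible with an even n_revs) fall back to the first revolution.
    Assumes constant speed; real captures would need alignment first.
    """
    if len(snapshot) < rev_len * n_revs:
        raise ValueError("snapshot shorter than the requested revolutions")
    revs = [snapshot[k * rev_len:(k + 1) * rev_len] for k in range(n_revs)]
    merged = []
    for i in range(rev_len):
        votes = sum(r[i] for r in revs)
        if 2 * votes > n_revs:
            merged.append(1)
        elif 2 * votes < n_revs:
            merged.append(0)
        else:
            merged.append(revs[0][i])
    return merged


def mfm_encode(data_bits):
    """Reference MFM encoder: each data bit becomes a (clock, data) pair.

    A clock 1 is written only between two consecutive 0 data bits.
    """
    channel, prev = [], 0
    for bit in data_bits:
        channel += [1 if (prev == 0 and bit == 0) else 0, bit]
        prev = bit
    return channel


def samples_to_channel(samples, samples_per_halfcell):
    """Turn comparator samples into channel bits (1 = flux reversal).

    Each level change marks a reversal; the gap to the next change, measured
    in half-bit-cells and rounded, gives the number of channel bits between
    them. Returns the channel bits starting at the first reversal, plus the
    half-cell index of that reversal (needed to pick the clock/data phase).
    """
    edges = [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]
    if not edges:
        return [], None
    channel = [1]
    for a, b in zip(edges, edges[1:]):
        n = max(1, round((b - a) / samples_per_halfcell))
        channel += [0] * (n - 1) + [1]
    return channel, round(edges[0] / samples_per_halfcell)


def mfm_data_bits(channel, data_phase):
    """Keep every other channel bit; data_phase (0 or 1) selects the data slots."""
    return channel[data_phase::2]
```

With roughly 8.24 samples per bit, a real capture would use `samples_per_halfcell` of about 4.12; valid MFM reversals are then spaced about 8, 12, or 16 samples apart, which is what the rounding step relies on.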

Although the actual data analysis is beyond the scope of this paper, some very preliminary
analysis shows somewhat promising results. As a simple experiment, a series of 39 samples (~3
data tracks) was extracted from roughly the middle of the surface recorded by head #0 (steps
5000-5038). Each sample was analyzed for long, contiguous streams of sampled 1's or 0's, under the
assumption that valid data tracks might contain such features while noisier inter-track samples
would be less likely to contain them.
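The run-length test just described is easy to express in code. This sketch assumes each sample is a sequence of 0/1 comparator values and uses the 24-sample threshold from Figure 14 (24 samples is 300 ns at the 12.5 ns sample period); the function names are hypothetical and this is not the report's own analysis code.

```python
"""Count long runs of identical samples in one capture (illustrative)."""

from itertools import groupby


def long_run_counts(samples, min_len=24):
    """Return (runs_of_ones, runs_of_zeros) that are at least min_len long."""
    ones = zeros = 0
    for value, run in groupby(samples):
        if sum(1 for _ in run) >= min_len:
            if value:
                ones += 1
            else:
                zeros += 1
    return ones, zeros


def run_profile(snapshots, min_len=24):
    """Apply long_run_counts to each step's snapshot, in step order."""
    return [long_run_counts(s, min_len) for s in snapshots]


if __name__ == "__main__":
    demo = [1] * 30 + [0] * 25 + [1] * 10 + [0] * 24
    print(long_run_counts(demo))  # (1, 2)
```

`itertools.groupby` collapses consecutive equal values into one group, so each group is exactly one run; the per-step `run_profile` output is the kind of series plotted against step in Figure 14.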

Figure 14: Occurrences of 24+ continuous 1's (blue) and 0's (red) vs. distance (x-axis: step, 1-39)

This data was captured with a theoretical spatial resolution of 13 samples per data track, so
if the number of long sequences of 1's or 0's is correlated (negatively or positively) with the sensor
being properly centered over the data track, we would expect to see a pattern recurring roughly every
13 steps. Figure 14 clearly agrees with this expectation, suggesting that this might be a useful metric
for identifying properly 'centered' samples.
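One way to make the "roughly every 13 steps" observation quantitative is to compute the autocorrelation of the per-step counts and pick the lag at which it peaks. This is a swapped-in technique, not one used in the report; the sketch assumes a list of per-step counts (for example, the blue or red series of Figure 14) and restricts the search to lags up to half the series length. The demo data is synthetic, not the measured counts.

```python
"""Estimate the repeat period of a per-step metric via autocorrelation (illustrative)."""


def autocorrelation(values, lag):
    """Mean product of mean-removed values separated by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)


def dominant_period(values, min_lag=2, max_lag=None):
    """Lag (in steps) with the highest autocorrelation in [min_lag, max_lag]."""
    if max_lag is None:
        max_lag = len(values) // 2
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(values, lag))


if __name__ == "__main__":
    # Synthetic stand-in for Figure 14: a burst of counts every 13 steps.
    counts = [10 if i % 13 < 4 else 0 for i in range(39)]
    print(dominant_period(counts))  # 13
```

With only 39 steps (three periods), the estimate is coarse; capping the lag at half the series keeps each autocorrelation value averaged over at least as many pairs as the lag itself.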

Conclusions

This project has been an interesting and somewhat promising foray into the nascent world of
digital archeology. The world is currently undergoing a rapid shift from easily readable, long-lasting,
low-density archival media such as paper or microfilm to hyper-dense digital storage media. As we
hurtle toward an all-digital future, it is worth pausing for a moment to consider some of the
challenges associated with maintaining long-term access to digital media. Within the past
thirty-five years, the CDC 9762 disk drive used for this project has gone from cutting-edge storage
technology to vanishingly rare antique. Fortunately, the same technological forces that have left this
drive laughably obsolete have also given us the tools to allow a single engineer to potentially
overcome these challenges. Digital archeology as a field, for both historical and forensic reasons, is
likely to continue to grow in importance for the foreseeable future.

References

[1] C. Tse, C. Krafft, I.D. Mayergoyz, and D.I. Mircea, "System and Method for High-Speed
Massive Magnetic Imaging on a Spin-Stand," US Patent 7,005,849 (2006).

[2] "TMR Magnetic Microsensor Probe." MicroMagnetics, Inc., 2011. Accessed 28 Aug. 2011.
<http://www.micromagnetics.com/product_page_stj030.html>

[3] "CDC Storage Module Drive - BK4XX / BK5XX Hardware Maintenance Manual." Bitsavers.org, 2011.
Accessed 27 Aug. 2011. <http://bitsavers.org/pdf/cdc/discs/smd/>

[4] "Mud Dauber - Wikipedia, the free encyclopedia." <http://en.wikipedia.org/wiki/Mud_dauber>

[5] "CDC Storage Module Drive - BK6XX / BK7XX General Description, Operation, Theory of
Operation, Discrete Component Circuits." Bitsavers.org, 2011. Accessed 27 Aug. 2011.
<http://bitsavers.org/pdf/cdc/discs/smd/83322320H_BK6xx_BK7xx_GeneralDescription_Jul80.pdf>

[6] "MakerBot Thing-O-Matic 3D Printing Kit." MakerBot Industries, LLC, 2011. Accessed 30 Aug. 2011.
<http://store.makerbot.com/makerbot-thing-o-matic.html>
