Researcher access is currently not available pending redesign. This material has been retained for reference and was current information as of late 2002.

CDX and DAT Legend

The default first line of a CDX file is:
CDX A b e a m s c k r V v D d g M n
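
As a rough illustration, the header line can be split into its field letters. This is a minimal sketch that assumes the letters are separated by whitespace and that the first token is the literal `CDX` marker; the function name `parse_cdx_header` is hypothetical and not part of any Archive tool.

```python
# The default header line, copied from the legend above.
DEFAULT_HEADER = "CDX A b e a m s c k r V v D d g M n"


def parse_cdx_header(line):
    """Return the field letters named by a CDX header line.

    Assumption: tokens are whitespace-separated and the first
    token is the literal marker "CDX".
    """
    tokens = line.split()
    if not tokens or tokens[0] != "CDX":
        raise ValueError("not a CDX header line: %r" % line)
    return tokens[1:]


fields = parse_cdx_header(DEFAULT_HEADER)
print(len(fields), "fields:", " ".join(fields))
```

Each letter in the returned list names one column of the lines that follow the header, in order.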


The letters used in DAT files and CDX files are as follows:


A canonized url
B news group
C rulespace category ***
D compressed dat file offset
F canonized frame
G multi-column language description (* soon)
H canonized host
I canonized image
J canonized jump point
K Some weird FBIS what's changed kinda thing
L canonized link
M meta tags (AIF) *
N massaged url
P canonized path
Q language string
R canonized redirect
U uniqueness ***
V compressed arc file offset *
X canonized url in other href tags
Y canonized url in other src tags
Z canonized url found in script
a original url **
b date **
c old style checksum *
d uncompressed dat file offset
e IP **
f frame *
g file name
h original host
i image *
j original jump point
k new style checksum *
l link *
m mime type of original document *
n arc document length *
o port
p original path
r redirect *
s response code *
t title *
v uncompressed arc file offset *
x url in other href tags *
y url in other src tags *
z url found in script *
# comment

* in Alexa-made dat file
** in Alexa-made dat file meta-data line
*** future data
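
The legend can be used to label the columns of a CDX line. The following is a minimal sketch, assuming each data line holds one space-separated value per letter in the header. The names `LEGEND` and `parse_record` are hypothetical, the dictionary covers only the letters in the default header, and the sample line is invented purely for illustration; it is not taken from a real CDX file.

```python
# Descriptions copied from the legend, limited to the letters
# that appear in the default header line.
LEGEND = {
    "A": "canonized url",
    "b": "date",
    "e": "IP",
    "a": "original url",
    "m": "mime type of original document",
    "s": "response code",
    "c": "old style checksum",
    "k": "new style checksum",
    "r": "redirect",
    "V": "compressed arc file offset",
    "v": "uncompressed arc file offset",
    "D": "compressed dat file offset",
    "d": "uncompressed dat file offset",
    "g": "file name",
    "M": "meta tags (AIF)",
    "n": "arc document length",
}

HEADER = "CDX A b e a m s c k r V v D d g M n"

# Invented placeholder values, one per header letter (assumption:
# fields are separated by single spaces and contain no spaces).
SAMPLE_LINE = (
    "example.com/ 20020101000000 192.0.2.1 http://example.com/ "
    "text/html 200 aaa bbb - 100 200 300 400 example.arc.gz - 1234"
)


def parse_record(header_line, data_line):
    """Map each header letter to the value in the same column."""
    letters = header_line.split()[1:]  # drop the "CDX" marker
    values = data_line.split(" ")
    if len(values) != len(letters):
        raise ValueError("expected %d fields, got %d"
                         % (len(letters), len(values)))
    return dict(zip(letters, values))


record = parse_record(HEADER, SAMPLE_LINE)
for letter, value in record.items():
    print("%s  %-32s %s" % (letter, LEGEND.get(letter, "unknown"), value))
```

Keying the result by letter rather than by position means the same lookup still works if a file declares its columns in a different order in its own header line.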

