MUDPIE no. 3

Museum and University Data, Program, and Information Exchange

SYMPOSIUM ON INFORMATION PROBLEMS IN NATURAL SCIENCES

This symposium was held December 18-20, in Mexico City, under the sponsorship of the Universidad Nacional de Mexico and the Smithsonian Institution. It brought together about 60 people with a mutual interest in methods of keeping up with the large amounts of information available and accumulating. The focus was on various methods of coping with data pertaining to large collections, and work in progress was discussed. The storage of information by museums, such as catalogue records, and the rapid retrieval of this information on demand, received considerable attention. Several papers covered attempts to make bibliographic materials easier to manipulate. Several methods of preparing, handling, and writing systematic keys were described, as was work with storage and retrieval of data derived from specific research projects.

Conclusions drawn by one participant [J. A. Peters] after three days of papers and discussion include the following (it should be clear that these conclusions probably reflect my biases, and cannot be interpreted as representing the results of the symposium in any way):

1. There are two distinct approaches to storage and retrieval of data. The first involves information pertinent to a specific problem, but of such a magnitude that it is difficult if not impossible to handle it without electronic methods. The second involves information that is not immediately in demand, and not necessarily pertinent to any existing problem, but that represents the total body of information about a specific group of objects, with constant demand for changing parts of that body, although its total is never required at any one time.
The first approach is used by an individual or group in a specific research project, and the stored information, finally summarized in a single publication or a series of publications, might as well be discarded upon completion. The second is used by many different workers interested in a multitude of projects, is constantly growing as a consequence of additional collections of objects, and its value increases both with use and with age.

2. It is now completely feasible to perform both tasks, in the sense that computers have the capacity to handle the masses of information. Use of computer methods seems to be equally practical and wise, in fact almost mandatory, in the first approach mentioned above. There was, however, a considerable division of opinion concerning the practicality and wisdom of computer use in the second approach. It seems clear that it can be practical, in the sense that money has been made available for several such projects currently in progress, although it has so far invariably been funded by sources extraneous to the budget of the museum or university concerned. Representatives of smaller museums repeatedly pointed out that it cannot be practical for their institutions, because their budgets will never be large enough. Even were one to assume complete practicality of such automation, however, the challenges as to its wisdom persist. These challenges include problems of reliability of taxonomic determinations, accuracy of data in the catalogues, frequency of demand by users, the non-repetitive nature of the data, and others. It is clear that these challenges will not be answered until the supporters of the second approach have supplied acceptable answers to some of the day-to-day queries concerning the collections of a museum, and have given the scientific community accurate, detailed, and non-ambiguous data on costs.

3.
It is apparent that a considerable amount of duplication of effort is taking place in the area of storage and retrieval of museum data. Many different institutions or individual workers are preparing methods for encoding locality data, scientific names, and bibliographic information, and in many cases the methods will not be compatible with each other in the future. Worse still, errors that were experienced and solved by earlier workers are made again and again, and require repeated laborious solutions by the next generation. Close coordination of work on mutual problems is mandatory.

In the hope that the abstracts of the program might prove of interest and perhaps of use to the readers of MUDPIE, they are reproduced in part on the following pages. The rest will be included with a later issue.

Smithsonian Institution
January 10, 1968

2. "AN INFORMATION STORAGE AND RETRIEVAL SYSTEM FOR BIOLOGICAL AND GEOLOGICAL DATA - DESIGN CONSIDERATIONS"
Reginald Creighton
Smithsonian Institution
Washington, D.C.

The system design attempts to achieve optimum processing economy while permitting the system user maximum freedom of activity regarding data preparation, query expression and output presentation. Due to the disparity of data parameters involved (birds, crustacea, and rocks) a completely generalized software system is untenable. A technique for retrieval program generation was devised wherein a custom program, tailored to the specific queries submitted and to the specific data format involved, is created on each processing cycle.

Despite the diversity of data parameters encompassed, four major elements of similarity have been identified. All participants wish to access data by taxon name, by geographical or political location, by catalog number, and in certain cases by stratigraphic designators.
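The four access paths just named can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the Smithsonian design, whose details the abstract does not give: the `Specimen` and `SpecimenFile` names, their fields, and the three sample records are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Specimen:
    # Hypothetical record layout; the abstract does not specify one.
    catalog_number: int
    taxon: str
    locality: str
    stratum: Optional[str] = None  # only some collections record this


class SpecimenFile:
    """Toy specimen file with one lookup table per access path."""

    def __init__(self):
        self.by_catalog = {}
        self.by_taxon = defaultdict(list)
        self.by_locality = defaultdict(list)
        self.by_stratum = defaultdict(list)

    def add(self, specimen):
        # Every record is entered under each key it actually carries.
        self.by_catalog[specimen.catalog_number] = specimen
        self.by_taxon[specimen.taxon.lower()].append(specimen)
        self.by_locality[specimen.locality.lower()].append(specimen)
        if specimen.stratum is not None:
            self.by_stratum[specimen.stratum.lower()].append(specimen)

    def catalog_interval(self, low, high):
        """Return specimens whose catalog number lies in [low, high]."""
        numbers = sorted(n for n in self.by_catalog if low <= n <= high)
        return [self.by_catalog[n] for n in numbers]


# Invented sample records, one each for birds, crustacea, and rocks.
demo = SpecimenFile()
demo.add(Specimen(101, "Turdus migratorius", "Maryland"))
demo.add(Specimen(102, "Callinectes sapidus", "Virginia"))
demo.add(Specimen(250, "Basalt", "Oregon", stratum="Miocene"))

for specimen in demo.catalog_interval(100, 200):
    print(specimen.catalog_number, specimen.taxon, specimen.locality)
```

Keeping a separate table per access path means a query on any one key never has to scan the whole file, at the cost of updating every table whenever a record is added.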
The data base, comprising specimen and bibliographic data, is maintained in a dynamic phylogenetic sequence and is accessed by four indices: phylogenetic, geopolitical, catalog number (intervals) and stratigraphic.

Other basic tenets of the system include:
1. All consistently formatted data is searchable.
2. No abbreviation or user coding is imposed by the system. The computer translates natural language input into code and retranslates output.
3. User training is minimized.
4. A machine independent language is used (to enable exchange of software as well as of data).

The system is now being improved and implemented by the Information Systems Division, Smithsonian Institution, Washington, D.C.

KEYWORD ELEMENTS: Retrieval program generation, dynamic phylogeny, human factors of query, machine independent query, generalized data collection, locational hierarchy, common denominator indices, synonymy rectification, hierarchical indices, biological numericlature, query batch optimization.

3. "PREPARATION AND MANIPULATION OF SYSTEMATIC KEYS USING TIME-SHARED COMPUTERS"
James A. Peters
Smithsonian Institution
Washington, D.C.

A key for the identification of the genera of Neotropical snakes has been constructed using a time-shared computer and the language called "BASIC". The simplicity of this language has allowed the computer to use only fourteen characteristics, which provide all the information necessary for the identifications. Other characteristics can be included in the program when there is a single genus, and the computer writes the additional information in English, with the pertinent generic names. A new genus, or one not previously included in the key, can be added with great ease. One need only challenge the computer with the fourteen characteristics of the new genus, because the computer then makes use of an existing generic name. This can be compared with the new genus, and the two distinguished, to facilitate the introduction of the new one. All genera known to date from Guatemala to Argentina are in the key. Other specialists who wish to use the key may request directly a copy that is accepted by the computer or, if they do not have a time-shared computer, may send a list of specimens with the fourteen characteristics to challenge our computer.

4. "A COMPUTERIZED INFORMATION RETRIEVAL SYSTEM FOR TAXONOMY"
D. J. Rogers
University of Colorado
Boulder, Colorado

5. "COMPUTER APPLICATIONS IN TAXONOMIC LITERATURE"
Gilbert S. Daniels
Hunt Botanical Library
Pittsburgh, Pa.

Bibliographical research at the Hunt Botanical Library on the early literature of botany has been organized to make maximum use of computer data handling. Lengthy citations for more than 25,000 volumes are stored on computer tapes and are updated and recollated as further research is carried out on each volume. Data on additional volumes can be merged as it becomes available. Sorting can be done on any category of information contained in the individual volume records, with hard copy produced by the computer in a form suitable for direct publication by photo offset. Ancillary research projects not conceived of initially are greatly facilitated by the versatile access and sorting provided by the computer. Further applications and implications for taxonomic literature studies are discussed in this paper.

6. "SOME ASPECTS OF ON LINE INFORMATION RETRIEVAL LANGUAGES"
G. K. Hutchinson
Computer Center
Texas Technological College
Lubbock, Texas

Information retrieval techniques of interest to museums include: 1) KWIC, 2) Current Awareness, 3) Qwick Qwery, and 4) On Line Dialogues.
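For readers unfamiliar with the first of these techniques, KWIC (keyword in context) indexing lists each title once under every significant word it contains, with the surrounding words shown alongside. The Python sketch below is a minimal illustration and does not come from the paper: the stop-word list, the column width, and the function name `kwic` are assumptions, and the two sample titles are taken from this program.

```python
# Words too common to be useful as index entries (an assumed, minimal list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic(titles, width=30):
    """Return (keyword, line) pairs sorted by keyword.

    Each line shows the keyword at a fixed column, with up to `width`
    characters of the title before it and `width` characters from the
    keyword onward.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip('.,;:"').lower()
            if not key or key in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]
            right = " ".join(words[i:])[:width]
            entries.append((key, f"{left:>{width}}  {right}"))
    return sorted(entries)


TITLES = [
    "A Computerized Information Retrieval System for Taxonomy",
    "Computer Applications in Taxonomic Literature",
]

for keyword, line in kwic(TITLES):
    print(line)
```

Because every significant word becomes an entry point, a reader scanning the sorted column can find a title without knowing how it begins.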
The on-line search is potentially the most powerful, but requires considerable effort and thoughtful consideration as to its language design, hardware configuration and, perhaps more important, the matching of the language to the hardware via the software implementation. Retrieval techniques will be briefly reviewed. Implementation considerations and evaluation criteria will be discussed.

7. "TELECOMMUNICATION AND ON LINE ACCESS TO COMPUTERS"
Nicholas J. Suszynski
Smithsonian Institution
Washington, D.C.

Direct access to the computer from the deskside of a user who has a need to process his data on a computer is highly desirable. Several manufacturers make remote terminals for the computers which facilitate such interaction. Video data terminals (basically a television tube with a typewriter keyboard) connected to a computer by way of a telephone line are by far the best method devised as yet for inputting data to the computer and for making requests against the computer based data bank. Video data terminals are also considerably faster and generally more accurate than other means of inputting data (keypunches, verifiers, etc.). If this is the case, why then aren't they used more widely? There are several reasons for the scarcity of remote usage, and the two major ones are discussed in depth.

Despite the pressure from the computer industry and the users, the common carrier monopoly is slow in improving its telecommunication capability. The common carriers' 10 + 10 cycle (ten years in development followed by ten years of manufacturing, as compared to the computer industry's 5 + 5) will continue into the foreseeable future. This means that current characteristics of the communication channel (noise level, frequency response, capacity of channel, etc.) will change very slowly. Also, since the users of computer data banks will be separated from
their computers by tens, hundreds, and often thousands of miles, the economic impact of the telecommunication costs is crucial. The technology of data transmission is discussed only briefly, and the main concentration is on its economic aspects, showing with slides and charts the costs of various lines in relation to the distance.

The second major obstacle to remote multi-processing is the scarcity of computer systems (hardware and software) capable of economically processing remote messages in the foreground and, in the background, batch processing the normal load of the data center (such things as updating of the files with the most current information, etc.). There are isolated exceptions to this, and they are in the area of dedicated systems: systems that are offered for a special mission, such as airline reservation systems or a demand deposit accounting system in the banking industry, or general purpose systems having few options and many limitations, which normally are offered in a large city in order to capitalize on the relatively low cost of telephone usage. These general purpose systems are usually offered by the service bureaus.

The paper will strive to show that a nation-wide computer utility with direct access to various data banks, although technologically feasible, is not economical as yet; that the biggest single cost, in addition to the initial data conversion needed to establish the data bank, is the cost of data transmission; and that, because of this, these networks will form at first in the large cities, where the cost of a local telephone call is negligible. In the immediate future (2 to 3 years), rather advanced and economical time-shared systems will be available within cities. Their use outside the city will depend largely on the size of the telephone bill that will come with it. An overview of currently available commercial systems will be presented.

8. "THE OKLAHOMA MUSEUMS INVENTORY PROJECT"
A. F. Ricciardelli
University of Oklahoma
Norman, Oklahoma

During the period 1965-67, a pilot study utilizing the museums in the State of Oklahoma was conducted to devise a workable system for inventorying ethnological collections. It is hoped that such a system may be applicable on a nation-wide basis and a union index of ethnological materials created. Specifically, this study investigated such matters as the time and cost required to inventory ethnological specimens, to determine whether an inventory procedure could be devised which would insure reliability and consistency in the data collected, and to determine the most efficient means for storage and retrieval of that data, including automatic data processing systems. Extensive data has been collected on these topics, and various conclusions and recommendations are presented. As a by-product of this study, a workable file containing the data for the Oklahoma museums now exists at the Stovall Museum of the University of Oklahoma and is available for use by scholars.

9. "A GENERALIZED APPROACH TO INFORMATION RETRIEVAL: THE BIRD AND TAXIR SYSTEMS"
Robert Brill
University of Colorado
Boulder, Colorado

10. "A WORLD PLANT GERM PLASM RECORD SYSTEM"
C. F. Konzak, Washington State University
K. W. Finlay, University of Adelaide, Australia
B. Sigurbjörnsson, International Atomic Energy Agency, Vienna, Austria
G. Delhove, FAO, Rome, Italy

Studies are now well advanced on an internationally coordinated system adapted for the storage, retrieval and processing of records and data by computers. These studies are being conducted by a working group on international standardization in crop research data recording. This working group was established on the recommendation of a group of scientists assembled at IAEA headquarters in Vienna in December, 1965, at the invitation of the Directors General of FAO and IAEA.
The working group was given the task of investigating the possibilities for standardization of data recording procedures and developing a model system for records on accessions of food, feed and fiber plants.

A series of studies on a preliminary trial system for storage and retrieval of records by computer was successfully completed in February, 1967 by the coordinator and colleagues, with the guidance and assistance of other working group members and the IAEA computer staff. A number of preliminary recommendations on data recording procedures were prepared for review.

The FAO-IAEA Working Group on Standardization in Crop Research Data Recording will also participate in the International Biological Program (IBP) Section on Germ Plasm Exploration, Maintenance and Evaluation. As a part of this program, some standardized procedures have been proposed for records on new germ plasm obtained from plant explorations, from induced mutations and from hybridization or selection programs, as well as for records on existing collections of germ plasm being maintained at stations over the world.

A model for the organization and integration of internationally standardized records has been developed, and it is proposed to establish a coordinated world plant germ plasm record system, including the central record file at FAO in Rome.

Wide scale field tests of the proposed system are being planned. Holders of a number of the major world collections of wheat germ plasm stocks will be asked to participate in this test. The system will also be used in plant exploration and adaptation projects of IBP and in programs coordinated by FAO and FAO/IAEA.

11. "PILOT PROJECT FOR THE AUTOMATIC RETRIEVAL OF INFORMATION FROM THE NATIONAL HERBARIUM OF THE ..."
L. Alonso, L. Scheinvar and A. Gomez-Pompa
Instituto de Biología, U.N.A.M., Mexico

The pilot project for automatic information retrieval carried out at the National Herbarium of the Instituto de Biología is presented. The 3,500 specimens of Pteridophyta were taken as the basis. The format and coding followed are described, with examples of the systems that were used.

The program was written in the SPAR language, and the Bendix CDC G-20 computer of the Centro de Calculo Electronico of the U.N.A.M. was used.

The coding of the queries and some selected answers are presented. Some practical applications for the operation of a herbarium and its better utilization are mentioned.

Papers 12 and 13 were neither abstracted nor presented.