124 PSYCHOLOGY IN HUMAN AFFAIRS

of norms, or answer frequencies, are available). To achieve objectivity, test items are usually stated in some manner that requires no writing. For example, true-false items are statements that the testee recognizes and labels as being either true or false. (Columbus discovered America in 1607. True, False) Multiple-choice items are statements that may be completed by any one of a number of alternate answers. The testee must recognize and choose the right one. (The sum of 5 and 8 is: 10, 17, 23, 13, 14.) Completion items are statements that are left unfinished, and the testee must fill in the missing concepts. (Columbus discovered America in ________.) Matching items are usually arranged in two lists (e.g., a list of dates and a list of battles), and the testee must connect the items in one list with their mates in the other list.

Tests are standardized by being given to a large number of people. Then, tables of frequencies are constructed so that a score can be compared to the scores made by others. In fact any test, whether published or homemade, must have some sort of norms or the scores are meaningless. This is likewise true of data in any field. A child weighs 36 lb. This is meaningless unless we know how much other children of his age and height weigh. To standardize a test is to find out what scores other people make on it. A high score is high only when compared with the scores made by other people.

Also, published tests usually have satisfactory (or, at least, known) reliability and validity. A high reliability means that a test can be repeated with the same results. If a pupil makes a high score on a test one day and a low score on the same test the next day, the test is unreliable. The reliability of a test may be computed in various ways. It may be given to the same pupils on different occasions. However, so many other variables (physical condition, intervening experiences, etc.)
may cause differences in scores that it is unwise to condemn a test on this basis. It is better to compare the score made on a part of the test, usually the even-numbered items, with the score made on another part of the test, the odd-numbered items. If half the test gives relatively the same score as the other half, it is judged to be reliable. Some tests have two forms, which are equated, and either form may be used. When pupils make the same score on both forms, taken at different times, the test is then said to be reliable.

The validity of a test refers to its worth or value in doing what it is supposed to do. If a history test measures historical knowledge, it is then said to be valid. The validity of a test is usually determined by comparing it with some other (presumably better) measure of the same characteristic. If the scores made on a history test compare