A test must be valid if its results are to be trusted. Validity can be defined in several ways.
“It is the extent to which a test measures what it is supposed to measure”.
“Validity is the subjective judgment made on the basis of experience and empirical indicators”.
Validity is “the agreement between test score or measure and the quality it is believed to measure” (Kaplan and Saccuzzo 2001).
In simple terms, validity refers to the meaningfulness of a test. This meaningfulness operates at two levels. At the level of design, the test should be constructed according to its requirements. At the level of context, the test should be used only for the purpose for which it was designed. A mathematics test cannot be used to assess a learner's writing skills: used outside its intended context, it is not valid.
There are various kinds and aspects of validity. They are discussed here according to their relevance and importance.
1. Face Validity
Face validity is not, strictly speaking, a scientific aspect of validity. It refers to how the test appears to the general public, test-takers and other lay or non-specialist people. A test with face validity has the characteristics that people expect of a test. These include proper printing, a governing body, an appropriate manner of administering the test, subjective/essay-type questions and so on. Thus a mathematics test will not be considered valid on its face if it contains no numerical questions, because numerical questions are what lay people expect from a mathematics test.
A test that lacks such features may be rejected by the test-takers, and the governing body may face criticism. Experts attach little weight to face validity, but a test should still look the way an ordinary person expects a test to look. Face validity is therefore a concern for the test designer and paper setter.
2. Predictive Validity
Predictive validity concerns the future performance and success of the learner. A test with predictive validity provides valid information about how the learner will perform in the future. It therefore tests the situations and skills that the learner will encounter or perform later. Entry tests and selection tests are examples of tests that must have predictive validity.
Language aptitude tests should have predictive validity because they test present skills in order to predict future performance. Proficiency tests also rely on predictive validity. Diagnostic and achievement tests involve other types of validity, but they should have predictive validity as well, since they too are linked to the future performance of the learner.
Predictive validity is calculated by statistical correlation. The validity coefficient is usually calculated by comparing success in the test with later success in the job. In this way the validity of the test is checked and improved for future tests. A coefficient of 0.6 is considered high, which indicates that not every test has strong predictive validity.
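The calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the validity coefficient is taken to be the Pearson correlation between test scores and a later criterion measure; the function name `validity_coefficient` and the sample scores are hypothetical and serve only as an example.

```python
import math


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure.

    Assumption: the validity coefficient is computed as a plain Pearson r.
    Each position in the two lists belongs to the same test-taker.
    """
    if len(test_scores) != len(criterion_scores):
        raise ValueError("both lists must have one entry per test-taker")
    n = len(test_scores)
    if n < 2:
        raise ValueError("at least two test-takers are needed")

    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    dev_x = [x - mean_x for x in test_scores]
    dev_y = [y - mean_y for y in criterion_scores]

    # Sum of cross-products and sums of squared deviations.
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")

    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: entry-test scores and later job-performance ratings.
entry_test = [52, 61, 70, 75, 83, 90]
job_rating = [3.1, 2.8, 3.9, 3.5, 4.4, 4.2]

r = validity_coefficient(entry_test, job_rating)
print(f"validity coefficient r = {r:.2f}")
```

A value close to 1 would mean that test-takers who scored well on the test also did well on the criterion; a value near 0 would mean the test tells us little about later performance.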
3. Concurrent Validity
The only major difference between predictive validity and concurrent validity is time. Predictive validity relates to the future, while concurrent validity relates to the present. When the future becomes the present, predictive validity becomes concurrent validity. Instead of comparing a test with future or job performance, we compare two tests. The two tests are usually taken at about the same time, and typically one is written and the other oral. The comparison is used to limit criterion-related errors. The most suitable situation is the comparison of a new test with an already established test or criterion to determine the validity and meaningfulness of the new test.
A test with concurrent validity thus demonstrates its validity in a given field. Concurrent validity is a statistical measure that requires a quantifiable criterion. Not all criteria are quantifiable, but the statistical approach assumes that they are. The validity coefficient is used to compare the two tests.
4. Content Validity
This is the aspect of validity that appeals to the expert. It concerns the extent to which the test represents the content from which it is constructed.
Content validity is required in achievement tests. These tests represent a body of content or a syllabus and should be constructed from it. Similarly, diagnostic tests should have content validity because they look for particular deficiencies of the learner within a given set of skills or syllabus. The chief examiner, advisor and others can check that the test represents the content it is meant to test.
Teaching materials should have their own validation, that is, predictive and construct validity. Otherwise the content validity of the test will be of little use. The teaching of speaking skills, for example, should involve materials and examples from native speakers that teach the student appropriate speaking skills.
Content validity can also be applied in proficiency tests. Learners will have to perform in certain situations, so the test can represent the skills those situations require. These areas can be specified before the exam, just like the syllabus of an achievement test. This involves more guesswork than in tests for which a syllabus exists.
5. Construct Validity
A construct is a theory, usually a psychological one, that explains a particular mental process, such as learning. It describes how the learner learns the language, which factors are involved, what the nature of language is, and so on.
A test is constructed on the basis of such a theory to evaluate certain factors or indicators in the learner's language and so measure his ability. The test has construct validity if it represents the aspects of the particular theory on which it is based.
It should be kept in mind that a construct may be wrong. A test with construct validity then becomes meaningless, because the theory on which it is built is false. In that case the problem lies with the construct, not with the test. Materials and syllabuses should also be evaluated for construct validity, to establish whether they represent and teach the language according to the theory of language and language learning.
Questions relating to each type of validity evidence are presented below.
Content
1. Do the evaluation criteria address any extraneous content?
2. Do the evaluation criteria of the test address all aspects of the intended content?
3. Is there any content addressed in the task that should be evaluated through the test, but is not?
Construct
1. Are all of the important facets of the intended construct evaluated through the scoring criteria?
2. Are any of the evaluation criteria irrelevant to the construct of interest?
Criterion (Predictive + Concurrent)
1. How do the scoring criteria reflect competencies that would suggest success on future or related performances?
2. What are the important components of the future or related performance that may be evaluated through the use of the assessment instrument?
3. How do the scoring criteria measure the important components of the future or related performance?
4. Are there any facets of the future or related performance that are not reflected in the scoring criteria?
Validity is a pre-test concern: tests should be developed in such a way that they are valid and meaningful. A three-step approach can be helpful in this regard.
1. First, clearly state the purpose and objectives of the assessment.
2. Next, develop scoring criteria that address each objective.
3. Finally, check the scoring criteria against the objectives. If one of the objectives is not represented in the score categories, then the rubric is unlikely to provide the evidence necessary to examine that objective. If some of the scoring criteria are not related to the objectives, then, once again, the appropriateness of the assessment and the test is in question.
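The check in the third step can be sketched as a simple comparison of two lists. This is only an illustration under stated assumptions: the function name `check_alignment`, the representation of a rubric as a mapping from each criterion to the objectives it addresses, and the sample objectives and criteria are all hypothetical.

```python
def check_alignment(objectives, criteria_to_objectives):
    """Compare stated objectives with what each scoring criterion addresses.

    criteria_to_objectives maps a criterion name to the set of objectives
    it is meant to provide evidence for (an empty set means none).
    Returns two sorted lists: objectives that no criterion addresses, and
    criteria that address none of the stated objectives.
    """
    stated = set(objectives)
    covered = set()
    unrelated = []

    for criterion, targets in criteria_to_objectives.items():
        relevant = set(targets) & stated
        if relevant:
            covered |= relevant
        else:
            # This criterion scores something the objectives never asked for.
            unrelated.append(criterion)

    uncovered = sorted(stated - covered)
    return uncovered, sorted(unrelated)


# Hypothetical writing assessment.
objectives = ["organisation", "grammar", "vocabulary"]
rubric = {
    "paragraph structure": {"organisation"},
    "verb agreement": {"grammar"},
    "handwriting": set(),
}

uncovered, unrelated = check_alignment(objectives, rubric)
print("objectives without a criterion:", uncovered)   # ['vocabulary']
print("criteria without an objective:", unrelated)    # ['handwriting']
```

In this example the rubric gives no evidence about vocabulary, and it scores handwriting although no objective mentions it. Both findings would call the appropriateness of the assessment into question.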
Sources of Invalidity
Validity can suffer for various reasons, some of which are discussed below.
1. Lack of reliability indicates that the test is not valid. The converse does not hold: a test may be reliable and consistent in its results and still be meaningless or irrelevant in a certain context. Reliability should therefore be checked to prevent invalidity.
2. Content and construct under-representation is a situation in which important aspects of the content and construct are not included in the test. The results are then unlikely to reveal the student's true abilities within the construct or content that the test was supposed to measure.
3. Content and construct over-representation is a situation in which the test covers more than the intended content and construct, that is, irrelevant parts are also included in the test. This can be further divided into two situations:
1. In the first, the over-representation makes the test easier: the test-taker can pick up clues from the test to solve some problems, and the resulting guessing increases invalidity.
2. In the second, the over-representation makes the test harder, and it becomes difficult for the student to score appropriately. This is not the fault of the student. The construction of the test prevents him from performing well.
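The first source of invalidity listed above, lack of reliability, can be illustrated with a number. The sketch below relies on a standard result from classical test theory that is not stated in this text and is used here as an assumption: the validity coefficient cannot exceed the square root of the product of the reliability of the test and the reliability of the criterion. The function name `max_validity` and the reliability values are hypothetical.

```python
import math


def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper limit on the validity coefficient under classical test theory.

    Assumption: validity <= sqrt(test reliability * criterion reliability).
    Both reliabilities are coefficients between 0 and 1.
    """
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical test with low reliability, compared with a perfect criterion.
print(f"{max_validity(0.49):.2f}")        # prints 0.70
# Hypothetical case where the criterion is also measured with some error.
print(f"{max_validity(0.81, 0.64):.2f}")  # prints 0.72
```

The limit works in one direction only. Low reliability caps the validity a test can reach, but high reliability does not guarantee that the test is valid.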