Tuesday, May 11, 2010

Definition: Integrity

http://www.clir.org/pubs/reports/pub92/lynch.html

Integrity

When we say that a digital object has "integrity," we mean that it has not been corrupted over time or in transit; in other words, that we have in hand the same set of sequences of bits that came into existence when the object was created. The introduction of appropriate canonicalization algorithms allows us to consider the integrity of various abstractions of the object, rather than of the literal bits that make it up, and to reduce questions about those abstractions to tests of equality between the sets of bit sequences that the canonicalization algorithm produces.
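A minimal Python sketch of the idea, assuming a hypothetical canonicalization for plain text (UTF-8 decoding, Unicode NFC normalization, LF line endings, no trailing whitespace). The function names are illustrative and do not come from the report. Two objects whose literal bits differ can still compare equal once both are reduced to canonical form, and a digest of that canonical form can serve as a surrogate for the abstraction.

```python
import hashlib
import unicodedata


def canonicalize_text(data: bytes) -> bytes:
    """Hypothetical canonicalization for plain-text objects.

    Decodes UTF-8, applies Unicode NFC normalization, converts CRLF/CR
    line endings to LF, and strips trailing whitespace on each line.
    """
    text = data.decode("utf-8")
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).encode("utf-8")


def same_abstraction(a: bytes, b: bytes) -> bool:
    """Equal at this level of abstraction if the canonical bits are equal."""
    return canonicalize_text(a) == canonicalize_text(b)


def canonical_digest(data: bytes) -> str:
    """SHA-256 of the canonical form, usable as a digest surrogate."""
    return hashlib.sha256(canonicalize_text(data)).hexdigest()


if __name__ == "__main__":
    # Precomposed "é", CRLF endings, trailing spaces.
    original = "caf\u00e9 notes\r\nline two  \r\n".encode("utf-8")
    # "e" plus combining acute accent, LF endings, no trailing spaces.
    transferred = "cafe\u0301 notes\nline two\n".encode("utf-8")
    print(original == transferred)                  # False: literal bits differ
    print(same_abstraction(original, transferred))  # True: canonical forms match
```

The choice of canonicalization defines which abstraction is being protected: this one treats line endings and trailing spaces as insignificant, but any change to the words themselves still changes the canonical bits.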

When we seek to test the integrity of an object, however, we encounter paradoxes and puzzles. One way to test integrity is to compare the object in hand with a copy that is known to be "true."[5] Yet if we have a secure channel to a known true copy, we can simply take a duplicate of that copy. We do not need to worry about the accuracy of the copy in hand, unless the point of the exercise is to establish whether the copy in hand is correct (for example, to detect an attempt at fraud) rather than simply to be sure that we have a correct copy. These are subtly different questions.[6]
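The paradox can be made concrete with a small sketch (hypothetical function name, Python standard library only). Whether or not the comparison succeeds, the bytes we end up relying on are the known-true copy, so the comparison itself only answers the separate question of whether the copy in hand was altered.

```python
import hmac
from typing import Tuple


def check_against_true_copy(copy_in_hand: bytes,
                            true_copy: bytes) -> Tuple[bool, bytes]:
    """Compare the copy in hand with a known-true copy obtained over a
    secure channel. Returns (copy_in_hand_is_intact, bytes_to_rely_on)."""
    # compare_digest returns False when the inputs differ in length.
    intact = hmac.compare_digest(copy_in_hand, true_copy)
    # Either way, the known-true copy is what we would rely on; the
    # comparison only tells us whether the copy in hand was altered,
    # which matters when the goal is detecting fraud.
    return intact, true_copy


if __name__ == "__main__":
    true_copy = b"report as issued"
    print(check_against_true_copy(b"report as issued", true_copy))   # (True, b'report as issued')
    print(check_against_true_copy(b"report as amended", true_copy))  # (False, b'report as issued')
```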

If we do not have secure access to an independently maintained, known true copy of the object (or at least a digest surrogate), then our testing of integrity is limited to internal consistency checking. If the object is accompanied by an authenticated ("digitally signed") digest, we can check whether the object is consistent with the digest (and thus whether its integrity has been maintained) by recomputing the digest from the object in hand and comparing it with the authenticated digest. But our confidence in the integrity of the object is only as good as our confidence in the authenticity and integrity of the digest. We have only shifted the locus of the question: if the digest is authentic and accurate, then we can trust the integrity of the object. Verifying integrity is thus no different from verifying the authenticity of a claim that "the correct message digest for this object is M," a claim that does not itself name the object. The linkage between claim and object is established by association and context: the claim is kept bound to the object, perhaps within the scope of a trusted processing system such as an object repository.
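The sketch below walks through the two steps described above. As an assumption, it uses an HMAC-SHA256 tag from Python's standard library in place of a true digital signature: an HMAC authenticates with a shared secret key rather than a public/private key pair, so it is only a stand-in. The names `DigestClaim`, `make_claim`, and `verify_integrity`, and the key value, are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class DigestClaim:
    """A claim that 'the correct message digest for this object is M'.

    The claim does not name the object; it is bound to the object only
    by being kept alongside it.
    """
    digest: bytes  # M: SHA-256 of the object as created
    tag: bytes     # HMAC over M, standing in for a digital signature


def make_claim(obj: bytes, key: bytes) -> DigestClaim:
    digest = hashlib.sha256(obj).digest()
    tag = hmac.new(key, digest, hashlib.sha256).digest()
    return DigestClaim(digest, tag)


def verify_integrity(obj: bytes, claim: DigestClaim, key: bytes) -> bool:
    # Step 1: is the digest itself authentic? If not, step 2 proves nothing.
    expected_tag = hmac.new(key, claim.digest, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_tag, claim.tag):
        return False
    # Step 2: internal consistency; recompute the digest from the object in hand.
    return hmac.compare_digest(hashlib.sha256(obj).digest(), claim.digest)


if __name__ == "__main__":
    key = b"repository-secret-key"  # hypothetical key held by the repository
    obj = b"the object as created"
    claim = make_claim(obj, key)
    print(verify_integrity(obj, claim, key))         # True: object matches claim
    print(verify_integrity(obj + b"!", claim, key))  # False: object altered
    # Altering the object and recomputing its digest does not help an
    # attacker who cannot produce a valid tag without the key.
    tampered = obj + b"!"
    forged = DigestClaim(hashlib.sha256(tampered).digest(), claim.tag)
    print(verify_integrity(tampered, forged, key))   # False: claim not authentic
```

The object is only examined after the claim has been authenticated, and the whole result rests on the secrecy of the key: the question moves from the object to the digest, and from the digest to the key.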

In the digital environment, we also commonly encounter what might be termed "situational" integrity, i.e., the integrity of derivative works. Consider questions such as "Is this an accurate transcript?", "Is this a correct translation?", or "Is this the best possible version given a specific set of constraints on display capability?" Here we are raising a pair of questions: one about the integrity of a base object, and another about the correctness of a computation or other transformation applied to the object. (To be comprehensive, we must also consider the integrity of the result of the computation or transformation after it has been produced.) These questions usually come down to trust in the source or provider of the computation or transformation, and thus to authentication of that source, or to the validity, integrity, and correctness of the code.
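One way to picture these questions is a hypothetical derivation record that binds a digest of the base object, a digest identifying the transformation code, and a digest of the result. This is an illustrative sketch, not a scheme from the report. Digests can show that the base, the code, and the output are the ones named in the record; they cannot show that the code is correct, which is why the check falls back on a set of trusted transformations and, for deterministic transformations, on re-running them.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Set


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DerivationRecord:
    """Hypothetical record of how a derivative work was produced."""
    base_digest: str       # digest of the base object
    transform_digest: str  # digest identifying the transformation code
    output_digest: str     # digest of the result, taken when it was produced


def check_derivative(base: bytes, transform_code: bytes, output: bytes,
                     record: DerivationRecord,
                     trusted_transforms: Set[str]) -> Dict[str, bool]:
    """Answer each integrity question separately."""
    code_digest = sha256_hex(transform_code)
    return {
        # Is the base object the one the derivative was made from?
        "base_intact": sha256_hex(base) == record.base_digest,
        # Is the code the one named in the record, and do we trust it?
        "transform_trusted": (code_digest == record.transform_digest
                              and code_digest in trusted_transforms),
        # Has the result been preserved since it was produced?
        "output_intact": sha256_hex(output) == record.output_digest,
    }


def reproduces(base: bytes, output: bytes,
               transform: Callable[[bytes], bytes]) -> bool:
    """For a deterministic transformation, re-run it and compare outputs."""
    return transform(base) == output


if __name__ == "__main__":
    base = b"source text of the base object"
    code = b"bytes.upper"  # stands in for the transformation's source code
    output = base.upper()
    record = DerivationRecord(sha256_hex(base), sha256_hex(code), sha256_hex(output))
    print(check_derivative(base, code, output, record, {sha256_hex(code)}))  # all three True
    print(reproduces(base, output, bytes.upper))  # True
```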
