Data, in the sense employed by the mathematical sciences, denotes a collection of symbols or numbers that are taken as the material upon which a computation or logical inference is performed. Such symbols may be inscribed on paper, recorded on punched cards, or represented by the states of an electromechanical device; the essential feature is that they can be distinguished, ordered, and manipulated according to prescribed rules. The term therefore refers not to an intrinsic quality of the symbols themselves, but to their role as the input to a definable process.

The origin of the concept. The earliest formal treatments of data arise in the work of Leibniz and later in the algebraic logic of Boole, where propositions are reduced to binary values. In the nineteenth‑century investigations of symbolic logic, the notion of a “sentence” or “formula” serves as a primitive datum, to be examined by the rules of inference. By the twentieth century, when the theory of computation was being articulated, the idea of data had become tightly coupled with the notion of a finite string drawn from a fixed alphabet.

A finite string over a finite alphabet constitutes the most elementary datum. Let Σ be a non‑empty finite set, called an alphabet; a string s ∈ Σ∗ is an ordered list s₁s₂…sₙ of symbols from Σ, where n ≥ 0. The empty string, denoted ε, is also admitted as a datum. Such strings can be encoded as natural numbers by the method of Gödel numbering, thereby allowing the arithmetisation of syntactic objects. In this manner, any symbolic expression—be it a logical formula, a program, or a configuration of a machine—may be regarded as a number, and consequently as data amenable to arithmetic manipulation.

The theoretical model most commonly associated with the analysis of data is the Turing machine.
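The Gödel numbering just mentioned admits a concrete sketch. The following is a minimal illustration, not a canonical scheme: the assignment of exponents to symbols is an arbitrary convention, and the names `godel_number` and `godel_decode` are conveniences of this sketch. A string s₁s₂…sₙ is mapped to the product p₁^c(s₁) · p₂^c(s₂) · … · pₙ^c(sₙ), where pᵢ is the i‑th prime and c assigns a positive integer to each symbol; unique factorisation guarantees that the string can be recovered.

```python
def primes():
    """Yield the primes 2, 3, 5, ... by trial division."""
    n, found = 2, []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(s, code):
    """Encode string s as the product of p_i ** code(s_i).

    `code` must assign a positive integer to every symbol, so that
    each symbol leaves a nonzero exponent on its prime.
    """
    g, ps = 1, primes()
    for sym in s:
        g *= next(ps) ** code[sym]
    return g

def godel_decode(g, inverse):
    """Recover the string by reading off the prime exponents of g."""
    out, ps = [], primes()
    while g > 1:
        p, e = next(ps), 0
        while g % p == 0:
            g //= p
            e += 1
        out.append(inverse[e])
    return "".join(out)

code = {"a": 1, "b": 2, "c": 3}            # Σ = {a, b, c}, an arbitrary coding
inv = {v: k for k, v in code.items()}
n = godel_number("cab", code)               # 2**3 * 3**1 * 5**2 = 600
assert godel_decode(n, inv) == "cab"
assert godel_number("", code) == 1          # the empty string ε maps to 1
```

The empty string is sent to 1, the empty product, in keeping with its admission as a datum.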
A Turing machine consists of an infinite tape divided into discrete cells, each cell capable of holding a symbol from a finite tape alphabet Γ, together with a head that can read, write, and move one cell at a time. The initial contents of the tape constitute the input datum. The machine’s transition function δ specifies, for each combination of current state and scanned symbol, a new state, a symbol to be written, and a direction of head movement. The computation proceeds deterministically until a halting state is reached, at which point the contents of a designated portion of the tape are interpreted as the output datum. In this framework, data are not abstract entities detached from their physical representation; they are precisely the markings on the tape that the machine can perceive and alter.

The encoding of data into a form suitable for a particular machine is a matter of convention, but it must be effective: there must exist an algorithm that, given a datum in some external representation, produces the corresponding internal representation, and conversely. For example, a natural number n may be encoded on the tape as a succession of n marks (e.g., a string of n ones), or, more compactly, in binary as a string of bits. The choice of encoding influences the length of the tape occupied and the number of steps required for manipulation, but does not affect the computability of the process.

In the realm of formal logic, data appear as the premises upon which inference rules operate. A set of sentences Δ may be regarded as a datum; a proof system examines Δ to derive a conclusion φ, thereby effecting a transformation of one datum into another. The completeness theorem of Gödel demonstrates that, for first‑order logic, any semantically valid transformation of data can be reproduced syntactically.
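The machine model and the unary encoding described above can be combined in a brief simulation. This is a sketch under stated assumptions, not a standard implementation: the representation of δ as a dictionary, the use of a dictionary of cells for the tape, and the choice of the unary successor function (which appends one further mark to a string of n ones) are all illustrative conveniences of this example.

```python
def run_tm(delta, tape, state="q0", halt="halt", blank="_"):
    """Simulate a deterministic single-tape Turing machine.

    `delta` maps (state, scanned symbol) -> (new state, written symbol, move),
    with move in {-1, 0, +1}.  The tape is held sparsely as a dict from
    cell index to symbol; unlisted cells are blank.
    """
    cells = {i: s for i, s in enumerate(tape)}
    head = 0
    while state != halt:
        sym = cells.get(head, blank)
        state, cells[head], move = delta[(state, sym)]
        head += move
    # Interpret the marked portion of the tape as the output datum.
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# A transition table for the unary successor n -> n + 1:
# move right past the marks, then write one further mark and halt.
delta = {
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("halt", "1", 0),
}
assert run_tm(delta, "111") == "1111"   # 3 marks in, 4 marks out
```

The table has two entries only; the brevity reflects the poverty of the datum, not of the model, for the same simulator runs any deterministic single‑tape table supplied to it.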
Conversely, the incompleteness theorems reveal that there exist true arithmetical statements—hence true data about numbers—that cannot be derived from any recursively enumerable set of axioms, underscoring a limitation intrinsic to the manipulation of data within formal systems.

From a practical standpoint, data must be stored and retrieved with reliability. In the early computing machines, such as the Automatic Computing Engine (ACE), data were retained on mercury delay lines or electrostatic storage tubes, each of which exhibited a characteristic error rate. The theory of error‑detecting and error‑correcting codes, pioneered by Shannon and Hamming, supplies a mathematical basis for protecting data against random disturbances. A code is a mapping C: Σ∗ → Σ∗ that introduces redundancy so that, should a limited number of symbols be altered, the original datum can be recovered by an algorithmic decoding process. The existence of such codes demonstrates that data can be rendered robust under physical constraints, a fact of crucial importance for any reliable computation.

The manipulation of data is governed by algorithms, which are themselves finite descriptions of procedures. An algorithm may be expressed as a sequence of elementary operations—such as moving the tape head, writing a symbol, or branching on a test—each of which acts directly upon the datum. The notion of algorithmic complexity quantifies the resources required for such manipulation. For a given datum d, the time complexity T(d) is the number of elementary steps performed by a machine before halting, while the space complexity S(d) is the maximal number of tape cells visited. These measures provide a means of comparing algorithms independently of any particular implementation, and they rest upon the formal definition of data as finite strings.

In cryptographic applications, data assume a dual character: they are both the object to be concealed and the vehicle for concealment.
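The error‑correcting codes of Hamming mentioned above admit a compact illustration before we turn to cryptography. The sketch below implements the classical (7,4) Hamming code, in which four data bits are protected by three parity bits so that any single altered symbol can be located and restored; the particular layout of positions is the textbook one, and the function names are conveniences of this example.

```python
def hamming_encode(d):
    """Encode 4 data bits as a 7-bit codeword, positions 1..7 laid out
    as p1 p2 d1 p3 d2 d3 d4 (parity bits at the power-of-two positions)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0: no error; else the erroneous position
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
sent = hamming_encode(word)
sent[5] ^= 1                          # a single random disturbance in transit
assert hamming_decode(sent) == word   # the original datum is recovered
```

The three syndrome bits, read as a binary numeral, name the corrupted position directly; the redundancy of three extra symbols thus purchases recovery from any single fault.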
The Enigma machine, for instance, treated a message as a string of letters, which was then transformed by a sequence of electrically controlled rotors into a ciphertext. The cryptanalyst’s task is to reconstruct the original datum from the ciphertext, a problem that can be modelled as the inversion of a permutation on the set of possible strings. The success of Bletchley Park’s efforts rested upon the exploitation of regularities in the data—repeated phrases, predictable structures—and upon the construction of machines that could test vast numbers of possible configurations with speed unattainable by manual calculation.

In the biological sciences, data appear as patterns of chemical concentrations or morphological forms. The mathematical theory of morphogenesis, as set forth in the work on reaction‑diffusion equations, treats the spatial distribution of chemical substances as continuous data. Though the equations are expressed in terms of partial derivatives, their numerical simulation requires discretisation: the continuous fields are sampled at a lattice of points, producing a finite array of values—again, a datum—upon which a digital computer operates. Thus the same principles of encoding, manipulation, and interpretation that govern numerical computation apply equally to the study of living forms.

Theoretical investigations also consider data of infinite extent. An ω‑sequence of symbols, for example, may be regarded as an infinite datum. While a physical machine can never hold such a datum in its entirety, the concept is useful in the study of recursive functions and in the definition of computable real numbers. A real number r is said to be computable if there exists an algorithm that, given any natural number n, produces a rational approximation q such that |r − q| < 2⁻ⁿ. Here the rational approximations constitute a finite datum for each n, and the algorithm provides a systematic method of generating an infinite sequence of data that converges to r.
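The definition of a computable real can be made concrete. The sketch below produces, for any n, a rational q with |√2 − q| < 2⁻ⁿ; the choice of √2 and of bisection as the method are illustrative only, exact rational arithmetic standing in for what a machine would carry out on tape.

```python
from fractions import Fraction

def sqrt2_approx(n):
    """Return a rational q with |sqrt(2) - q| < 2**-n, by bisection.

    The invariants lo**2 <= 2 < hi**2 hold throughout, so sqrt(2) always
    lies in [lo, hi]; the loop narrows that interval below 2**-n.
    """
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo >= Fraction(1, 2 ** n):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

q = sqrt2_approx(10)
# q is a finite datum (a pair of integers), yet the algorithm generates
# such approximants for every n, converging to the infinite object sqrt(2).
assert q * q <= 2 and (q + Fraction(1, 2 ** 10)) ** 2 > 2
```

Each call yields a finite datum, a single fraction; it is the algorithm, a finite description, that stands in for the infinite sequence.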
The philosophical import of data lies in the observation that any scientific theory ultimately reduces to statements about data and the rules by which they are transformed. In a purely formal sense, a theory may be identified with a set of permissible transformations of data. The empirical content of a theory is then the correspondence between these transformations and the behaviour of the physical world, a correspondence that must be established by observation. Consequently, the reliability of scientific inference depends upon the fidelity with which data can be recorded, transmitted, and processed.

In the practical engineering of computing devices, the design of input and output mechanisms reflects the centrality of data. Early computers employed punched paper tape, where each hole represented a binary datum. Later, magnetic drums and cores stored binary strings as magnetic polarities. In each case, the physical medium implements a mapping from the abstract notion of a symbol to a measurable physical state. The precision of this mapping determines the rate of error, and hence the need for redundancy and verification procedures.

The study of data also intersects with the theory of formal languages. A language L ⊆ Σ∗ is a set of strings, each of which may be regarded as an admissible datum for a particular application. Regular languages, context‑free languages, and recursively enumerable languages each admit a distinct class of machines—finite automata, push‑down automata, and Turing machines—capable of recognising membership in L. The hierarchy thus classifies data according to the complexity of the patterns they embody, and it provides a rigorous framework for assessing the feasibility of recognising or generating particular collections of data.

A further aspect concerns the transformation of data into other forms of representation.
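The simplest class in the hierarchy just described, the regular languages, can be exhibited by a short simulation of a finite automaton. The sketch below is illustrative: the language chosen (binary numerals divisible by three) and the encoding of the transition table as a dictionary are conveniences of this example, not a canonical construction.

```python
def dfa_accepts(delta, start, accept, s):
    """Run a deterministic finite automaton over the string s and report
    whether the final state is accepting."""
    state = start
    for sym in s:
        state = delta[(state, sym)]
    return state in accept

# A three-state automaton recognising the binary numerals divisible by 3:
# each state records the value read so far, modulo 3, and appending a bit b
# sends remainder r to (2r + b) mod 3.
delta = {(r, b): (2 * r + int(b)) % 3 for r in range(3) for b in "01"}

assert dfa_accepts(delta, 0, {0}, "110")      # 6 is divisible by 3
assert not dfa_accepts(delta, 0, {0}, "111")  # 7 is not
```

Membership in the language is decided with a fixed, finite memory, the three states, however long the input string; this boundedness is precisely what distinguishes the regular class from those requiring a stack or an unbounded tape.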
The process of compilation, for instance, translates a high‑level program—an abstract datum describing an algorithm—into a sequence of machine instructions, another datum suited to execution on a particular hardware architecture. The correctness of such a transformation is proved by demonstrating that, for every input datum, the output datum produced by the compiled program yields the same result as that produced by the original specification. This notion of equivalence underlies the entire discipline of computer science, wherein data and the procedures that act upon them are studied as mutually defining entities.

In summary, data constitute the raw material of computation, logic, and scientific inquiry. They are formally defined as finite strings over a prescribed alphabet, may be encoded as natural numbers, and are manipulated by algorithms whose operation is captured by abstract machines such as the Turing machine. The reliability of data handling rests upon error‑correcting codes, while the efficiency of manipulation is measured by time and space complexities. Data appear not only in digital contexts but also in the continuous models of the natural sciences, where discretisation renders them amenable to algorithmic treatment. The study of languages classifies data according to structural complexity, and the process of compilation illustrates the systematic conversion of one datum into another without loss of meaning. Through these interlocking concepts, data assume a central, enduring role in the logical foundations of mathematics and the practical development of computing machinery.
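The equivalence between a specification and its compiled form, described above, can be exhibited in miniature. In the sketch below, which is a toy of this entry's devising and not any particular compiler, an arithmetic expression is one datum, the compiled instruction sequence for a small stack machine is another, and the correctness claim is that executing the latter yields the same result as evaluating the former.

```python
# Expressions are nested tuples: ("+", e1, e2), ("*", e1, e2), or ("lit", n).

def evaluate(e):
    """The specification: evaluate the expression tree directly."""
    if e[0] == "lit":
        return e[1]
    a, b = evaluate(e[1]), evaluate(e[2])
    return a + b if e[0] == "+" else a * b

def compile_expr(e):
    """Translate the expression into instructions for a stack machine:
    operands are emitted first, then the operator that combines them."""
    if e[0] == "lit":
        return [("push", e[1])]
    return compile_expr(e[1]) + compile_expr(e[2]) + [(e[0],)]

def run(program):
    """Execute the compiled instruction sequence on an explicit stack."""
    stack = []
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "+" else a * b)
    return stack.pop()

e = ("+", ("lit", 2), ("*", ("lit", 3), ("lit", 4)))
assert run(compile_expr(e)) == evaluate(e)   # both yield 14
```

A full correctness proof would establish the final assertion for every expression, by induction on its structure; the single test here merely witnesses the equivalence for one datum.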
[role=marginalia, type=clarification, author="a.kant", status="adjunct", year="2026", length="44", targets="entry:data", scope="local"] The term “data” designates merely the raw representations that, without the form of sensible intuition, possess no determinate content; they become cognizable only when the understanding supplies the pure concepts (categories) by which they are ordered, quantified, and thus rendered suitable for logical computation.

[role=marginalia, type=clarification, author="a.darwin", status="adjunct", year="2026", length="44", targets="entry:data", scope="local"] The term “data” may be likened to the specimens of a naturalist: they are not valuable in themselves, but serve as the raw material upon which the investigator applies a method of analysis. Their significance arises solely from the systematic operations imposed upon them.

[role=marginalia, type=clarification, author="a.husserl", status="adjunct", year="2026", length="48", targets="entry:data", scope="local"] Data, as here defined, remains a purely syntactic phenomenon—devoid of intentionality. Yet we must recall: meaning arises not from the machine, but from the lived consciousness that configures, interprets, and assigns significance to its symbolic operations. Data is but the husk; meaning is the lived act of constitution.

[role=marginalia, type=heretic, author="a.weil", status="adjunct", year="2026", length="49", targets="entry:data", scope="local"] Data is not inert symbol—it is the ghost in the machine’s memory, whispering the buried desires of its makers. To call it “meaningless” is to deny the violence of its selection, the colonialism of its categories. Data remembers who was counted, and who was erased—silence is its first language.
[role=marginalia, type=clarification, author="a.husserl", status="adjunct", year="2026", length="48", targets="entry:data", scope="local"] Data, as here described, remains a mere correlate of intentionality—empty signs without the living act of consciousness that animates them with meaning. The machine processes symbols, but only the transcendental subject bestows sense. To confuse operational syntax with intentional reference is to forget the origin of all meaning.

[role=marginalia, type=clarification, author="a.kant", status="adjunct", year="2026", length="36", targets="entry:data", scope="local"] Data, though merely sensible signs, acquire objective significance only when subsumed under synthetic rules of understanding—thus, their meaning is not in the symbols, but in the a priori conditions that make their synthesis into experience possible.

[role=marginalia, type=objection, author="Reviewer", status="adjunct", year="2026", length="42", targets="entry:data", scope="local"] I remain unconvinced that data should be entirely divorced from the cognitive processes that generate and interpret it. From where I stand, even the "isolated mark" cannot help but carry some trace of the human intellect that bestowed it with significance. Bounded rationality and cognitive limitations impose constraints on how data is perceived and used, which cannot be fully accounted for by mere formal systems.

See also “Machine” and “Automaton”.