User:Average/Kolmogorov Quotient

From HackerspaceWiki

Let's say you want to explore the notion of quantifying the amount of elegance a programming language provides. That is, the amount by which a high-level language reduces complexity via its language constructs.

We'll define this idea of "simplification" as a factor of text-wise reduction (fewer characters needed to express a complex concept, à la AlgorithmicInformationTheory) and another, less easy-to-quantify concept of maintainability.

Fleshing out this latter concept: it clearly has to do with how easily one can establish programmer consensus for a given task; i.e. how many programmers of the language would put it back the same way you've expressed it, or would otherwise agree on the best implementation among different implementations of the same problem.

I will define the Kolmogorov Quotient so that higher numbers for a given language denote a reduction in the complexity of solving the problem in the given language.

The metric for "text-wise reduction" should incorporate a constant factor based on (non-identifier) symbols used in the language and source text. These factors will be the same across all languages implemented (i.e. designed) for a given architecture (e.g. VonNeumannArchitecture vs. Symbolics) and will be a measure of the significance of the symbol; i.e. the topology of the language tokens. (These terms, alas, will also need to be fleshed out and defined.)
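As a very rough sketch of what a symbol-weighted count might look like, the snippet below weights non-identifier symbols more heavily than identifier characters. The weight values and the identifier/symbol split are hypothetical placeholders, not the "topology of the language tokens" measure the text says still needs to be defined.

```python
import string

# Hypothetical weights: non-identifier symbols count more heavily than
# identifier characters, standing in for their structural significance.
SYMBOL_WEIGHT = 2.0      # assumed weight for punctuation/operators
IDENTIFIER_WEIGHT = 1.0  # assumed weight for letters, digits, underscore

def weighted_count(source_text):
    """Character count with non-identifier symbols weighted more heavily."""
    identifier_chars = set(string.ascii_letters + string.digits + "_")
    total = 0.0
    for ch in source_text:
        if ch.isspace():
            continue  # whitespace carries no weight in this sketch
        total += IDENTIFIER_WEIGHT if ch in identifier_chars else SYMBOL_WEIGHT
    return total
```

For example, `weighted_count("a = b + c")` charges 1.0 each for the three identifiers and 2.0 each for `=` and `+`, giving 7.0.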

Once the basic premise and the methodology above are agreed to, it is only a matter of a rough constant of difference for any specific implementation/architecture. That is, as long as the architecture is the same across all measurements, the number should be valid and comparable between languages.

But it could be implemented something like so: Pick a language "close to the machine", like C or Assembly, and measure the number of bytes of source/machine code it uses to implement a standard suite(*) of common, non-threaded programming tasks (base_language_count). Then code the exact same functionality in the language you want to measure (without external libraries) and count the number of bytes of source code (test_language_count).

KQuotient = (base_language_count / test_language_count)
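A minimal sketch of that measurement, assuming each suite task has been implemented in one source file per language and that "bytes of source code" simply means file size on disk:

```python
import os

def k_quotient(base_source_path, test_source_path):
    """Ratio of base-language source size to test-language source size.

    Higher values mean the test language expresses the same task more
    compactly than the base ("close to the machine") language.
    """
    base_language_count = os.path.getsize(base_source_path)
    test_language_count = os.path.getsize(test_source_path)
    return base_language_count / test_language_count
```

Summing the byte counts over the whole suite before dividing, rather than averaging per-task quotients, would keep one tiny task from dominating the result; either convention works as long as it is fixed across languages.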

One should also record, for discussion, the number of "equal programs". By which I mean, the number of programs that fluent programmers of the language agree are the best and equivalent solutions to the problem (equivalent programs: programs whose output is the same for every input). Languages which have a large number of equal programs say something interesting about either the programming language or the types of programmers the language attracts. What it says, I don't know.... :^)
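Deciding whether two programs give the same output for every input is undecidable in general, so in practice one would approximate it. The sketch below, with hypothetical helper names, tests candidate solutions on a finite sample of inputs and groups them into observational-equivalence classes:

```python
def observationally_equal(prog_a, prog_b, sample_inputs):
    """Approximate equivalence: same output on every sampled input.

    Agreement on a finite sample only suggests, and cannot prove,
    equivalence over all inputs.
    """
    return all(prog_a(x) == prog_b(x) for x in sample_inputs)

def count_equal_programs(programs, sample_inputs):
    """Group candidate solutions into equivalence classes and return the
    size of the largest class -- a rough count of "equal programs"."""
    classes = []
    for prog in programs:
        for cls in classes:
            if observationally_equal(prog, cls[0], sample_inputs):
                cls.append(prog)
                break
        else:
            classes.append([prog])
    return max(len(cls) for cls in classes) if classes else 0
```

The human-judgment half of the definition (which solutions programmers agree are *best*) is not captured here; this only mechanizes the "equivalent output" half.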

The ratio should always be greater than 1.0. I suppose one could game the metric by designing a language with single-letter keywords, but those shouldn't count.

(*) "standard suite of common programming tasks...": I see two main categories:

  • Data-processing suite, limited to simple text I/O (computation towards the machine)
  • GUI suite (computation towards the user)

The purpose of this idea is to end the tiring language wars about whose language is the best. By giving a quantitative metric, people can at least argue better.


Some useful terms:

  • Expressivity: The readability of program code, based on, for example, the understandability of token use and keywords; using keywords and tokens in their most universally agreed-to (non-programmatic) sense.
  • Succinctness: The conciseness of program code; how small, but still parseable, the program is.