A misleading dynamism

This second article in the series attempts to clarify what distinguishes Python and Javascript from C and C++. We will also discuss how these two worlds differ, both in their programming philosophy and in their translation philosophy. Let us begin with a natural analogy that illustrates the problem at hand.

Translation and interpretation

Suppose you are a translator whose task is to translate a speech from one language into another. Concretely, the speech corresponds to the computer program, the original language is any programming language, and the target language is assembly (strictly speaking, machine code, of which assembly is the human-readable form), the only language the computer understands. This translation can be performed in two very different contexts, giving rise to two distinct linguistic professions.

Translation: the regular translator has the entire original text at his disposal, with as much time as necessary to translate it. He can therefore read the text several times to analyse its style and settle on a translation strategy that ensures coherence. He will consider how to translate names, or a recurring expression used in different contexts. After this analysis, he starts actually translating the text, more or less linearly but sometimes in several passes, each one bringing a batch of corrections to the text in the target language. The goal is to produce, without tight time constraints, the best possible new self-contained text.

Interpretation: the interpreter has to work in real time. The speech is being delivered, and the foreign audience is waiting impatiently for what is being said. The interpreter has a sliding-window view of the speech: his immediate memory holds at most the last sentence, and he has no knowledge of what is going to be said next. In this very different state of mind, coherence can hardly be maintained beyond a single sentence. Each moment spent searching for an idiomatic expression adds delay relative to the speech, which keeps going. The interpreter's goal is to produce, as fast as possible, an understandable flow in the target language that stays reasonably faithful to the original speech.

Dynamic and static languages

Let us get back to computer science. In most cases, the translation and execution of programs follow the model of regular translation: the program is translated (compiled) once to produce an assembly program (an executable), which computers can then run as many times as we want. This model suits so-called static languages, in which a compilation step is carried out before the execution step. There are nonetheless two use cases that do not fit this process, and they correspond precisely to where Python and Javascript are used.

The first use case is inside web browsers. When you load a web page, the Javascript code that comes with it has to be executed as fast as possible for the page to be rendered. Interpretation is then paramount, because we cannot make the user wait for compilation on top of waiting for the network to fetch the page data. We will return to this problem later, because it is actually more subtle than it looks, and extremely instructive.

The second use case is software development itself. Software development often follows a cycle: modify the code, run the program, analyse the output, then modify the code again according to that analysis. With a static language, however, the “execution” step of this cycle is significantly slowed down by compilation, during which the developer receives no information relevant to his debugging. The fact that Python is interpreted therefore makes it an ideal candidate for projects with a fast-paced development cycle, typically the hundred-line prototype script that data scientists are so fond of. Python and Javascript are called dynamic because they are interpreted. It is important not to be misled by the connotations of the words “static” and “dynamic”: these are merely two execution modes, each suited to particular use cases.

A problem of types

We now have to tackle the delicate matter of type systems, on which opinions diverge. The discussion that follows is rather dry, but essential. The linguistic metaphor will help once again. A typed program can be compared to a text in which each word has been annotated with its grammatical category, for instance:

<<The>(definite article) <dog>(noun)>(nominal group)
<catches>(verb, present simple)
<<the>(definite article) <ball>(noun)>(nominal group, direct object)

As this simple sentence shows, writing a typed program is more painful than writing an untyped one. Why, then, impose this rigor on ourselves? The answer is tightly bound to the program's execution mode.

When a program is compiled, the compiler (the translator) uses these type annotations to prove a certain number of properties about the program's variables. Take an example: if the program says a: int (read “a is of type int”), b: int and print(a + b), then the compiler can prove that the addition a + b is valid (integers can be added), and that the function used to print the result is the one that prints an integer. The compiler can then produce a sequence of assembly instructions that does what the program says, and this sequence is then executed by the computer.
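To make the compilation step concrete, here is a minimal Python sketch of a toy compiler for the example above. It assumes a made-up representation in which the program is a tuple and the declarations are a dictionary of type names; the names `type_of`, `gen`, `compile_program` and `CompileError` are hypothetical and serve only as an illustration. The type checks run once, before anything executes, and the printing function is chosen at that point; the returned function contains no checks at all.

```python
class CompileError(Exception):
    """Raised before execution when the program is ill-typed."""


# Hypothetical toy program: declarations a: int, b: int, then print(a + b).
DECLS = {"a": "int", "b": "int"}
PROGRAM = ("print", ("add", "a", "b"))

# Printing functions, one per type, selected at compile time.
PRINTERS = {"int": lambda v: f"{v:d}", "str": lambda v: v}


def type_of(expr, decls):
    """Infer the type of an expression from the declarations alone."""
    if isinstance(expr, str):  # a variable name
        if expr not in decls:
            raise CompileError(f"undeclared variable {expr}")
        return decls[expr]
    op, left, right = expr
    if op != "add":
        raise CompileError(f"unknown operator {op}")
    lt, rt = type_of(left, decls), type_of(right, decls)
    if lt == rt == "int":
        return "int"
    raise CompileError(f"cannot add {lt} and {rt}")


def gen(expr):
    """Turn an already type-checked expression into a closure without checks."""
    if isinstance(expr, str):
        return lambda env: env[expr]
    _, left, right = expr
    lhs, rhs = gen(left), gen(right)
    return lambda env: lhs(env) + rhs(env)


def compile_program(program, decls):
    """Check types once, pick the printer, and return the 'executable'."""
    op, expr = program
    if op != "print":
        raise CompileError(f"unknown statement {op}")
    printer = PRINTERS[type_of(expr, decls)]  # raises CompileError if ill-typed
    code = gen(expr)

    def run(env):
        text = printer(code(env))
        print(text)
        return text

    return run


executable = compile_program(PROGRAM, DECLS)
executable({"a": 2, "b": 3})  # prints 5, with no type check at run time
```

Calling `compile_program(PROGRAM, {"a": "int", "b": "str"})` instead raises `CompileError` without anything being run, which mirrors a compiler refusing to produce an executable.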

Let us now interpret the same piece of code. Interpretation works on a stream, which means that the interpreter receives the program symbol by symbol. Here are the interpretation steps:

a: int and b: int: we keep this information in memory.
a + b: this is an addition, so we must verify that both operands are integers. The interpreter therefore emits assembly instructions that go into memory, read the types of a and b, and compare them to the expected type (integer). Once these checks have passed, the interpreter emits and executes the assembly instruction that performs the addition.
print: the interpreter goes into memory to retrieve the type of the value it is about to print, and loads the function that can display this type accordingly. Only then is the result displayed.
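The same toy program can be run by a minimal interpreter sketch, again in Python and again with hypothetical names (`interp_add`, `interp_print`, `RuntimeTypeError`). We assume that each value in memory is stored together with a type tag, which is what lets the interpreter read the types of a and b at each step. The checks live inside the functions, so they are repeated every time those functions run.

```python
class RuntimeTypeError(Exception):
    """Raised during execution when an operation receives the wrong types."""


# Display functions, looked up by type tag at run time.
PRINTERS = {"int": lambda v: f"{v:d}", "str": lambda v: v}


def interp_add(memory, x, y):
    """Read the type tags of x and y from memory, check them, then add."""
    tx, vx = memory[x]
    ty, vy = memory[y]
    if tx != "int" or ty != "int":  # this check runs on every call
        raise RuntimeTypeError(f"cannot add {tx} and {ty}")
    return ("int", vx + vy)


def interp_print(tagged):
    """Look up the value's type to choose the display function, then print."""
    tag, value = tagged
    text = PRINTERS[tag](value)  # this lookup also happens on every call
    print(text)
    return text


# Memory holds (type tag, value) pairs; a: int and b: int have been recorded.
memory = {"a": ("int", 2), "b": ("int", 3)}
interp_print(interp_add(memory, "a", "b"))  # prints 5
```

Had b been stored as `("str", "3")`, the problem would only surface as a `RuntimeTypeError` once execution reached the addition.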

Notice that interpretation entails a substantial number of checks (a few per “real” instruction), all performed while the program is running. When compiling, these checks are done once and for all and do not appear in the generated assembly program: if the program is not valid, the compiler reports an error and generates no assembly program. The interpreter, by contrast, can only raise a runtime error when it reaches the faulty instruction.
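Real Python behaves as just described. In the short sketch below, the function `faulty` (a made-up name) contains an ill-typed line that adds a string to an integer, yet defining it raises no error; the `TypeError` only appears when a call actually reaches that line.

```python
def faulty(flag):
    if flag:
        # Ill-typed: str + int. Python accepts the definition anyway.
        return "total: " + 42
    return "fine"


print(faulty(False))  # prints fine: the bad line was never reached

try:
    faulty(True)  # now the interpreter reaches the bad line
except TypeError as err:
    print("runtime error:", err)
```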

Compiling a typed language makes it possible to prove part of the program's correctness, catching a significant number of bugs that might have crept in and avoiding runtime errors. In a dynamic language, however, type annotations are of no use here, because they do not spare the interpreter its runtime checks. There is therefore no incentive for the programmer to make the effort of declaring types explicitly. That is why Javascript and Python programs traditionally carry no type annotations.
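Python has since gained optional type hints, and they illustrate the point well: the standard CPython interpreter records these annotations but does not check them when the code runs, so they spare none of the runtime checks. A minimal sketch, with a made-up function `add`:

```python
def add(a: int, b: int) -> int:
    # The annotations say "int", but nothing enforces them at run time.
    return a + b


print(add(2, 3))      # 5
print(add("2", "3"))  # 23: two strings were accepted and concatenated
```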

Two competing philosophies

The previous paragraph raises a question: what is the point of using a dynamic language, given that it will always be handicapped performance-wise by its numerous runtime checks? First, let us remember that these languages are perfectly suited to the use cases discussed above. Beyond those, however, dynamic languages are used more and more in static contexts, where a program is deployed once and run many times, such as server-side code (see Django or Node.js).

It is easy to adopt a theoretician's stance, disgruntled to see that the common folk do not grasp the mathematical superiority of static languages. But psychological factors must be taken into account, and their influence is significant. What follows is not based on facts or data; it is a personal, though plausible, interpretation.

Static languages have been developed for some forty years, and the first generation is starting to show its age. C and C++, which remain the standard, each have a type system, primitive in C and exuberant in C++, and both sabotage themselves by exposing a void pointer (void *) that can point to anything, which leads to countless bugs. On the other hand, the new wave of strongly typed languages (OCaml, Haskell, Rust) rests on better-founded theories, but gives a feeling of writing “proofs” that is rather unappealing to developers who are not well versed in mathematics. In these strongly typed languages, the compiler has access to much more information and proves more properties of the program as it works, exposing more bugs and preventing runtime errors. Nevertheless, fighting the compiler during debugging is very frustrating, because it feels as if we were not even good enough to run the program. The developer's only hope is then the error messages, which are extremely hard to make readable when the compiler is complex. All of this makes for a rather steep learning curve.

By contrast, a dynamic language gives the developer the impression of being in control of his program: whenever a runtime error occurs, he can locate precisely the instruction it stems from, add print calls, and so on. The absence of type annotations also allows a de facto polymorphism (to be explained in a later article) that lends itself to hacking and gives programming a more creative side. Dynamic languages thus occupy an interesting niche among programming languages: they are accessible and conceptually easy to use, letting the user understand more quickly what is happening.
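As a brief preview, and assuming this de facto polymorphism is what is commonly called duck typing, here is a small Python sketch. The hypothetical function `describe_all` declares no types and accepts any object that supports `len()`, including a user-defined class (`Playlist`, also made up) that was never designed with that function in mind.

```python
def describe_all(items):
    """Describe anything that has a length; no declared type is required."""
    return [f"{type(x).__name__} of length {len(x)}" for x in items]


class Playlist:
    """A user-defined class: supporting len() is all describe_all needs."""

    def __init__(self, songs):
        self.songs = songs

    def __len__(self):
        return len(self.songs)


print(describe_all(["abc", [1, 2], {"k": 0}, Playlist(["a", "b", "c", "d"])]))
# ['str of length 3', 'list of length 2', 'dict of length 1', 'Playlist of length 4']
```

The flip side is that passing an object without a length, such as an integer, is only detected at run time, as a `TypeError` raised by `len()`.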

Conclusion

In an idealized world, static languages would be the tool of choice for professional developers whose priorities are performance and freedom from bugs, whereas a dynamic language is a good starting point for a less experienced developer. Using a dynamic language in a static context is not a crime (as long as performance is not a must-have), and it does work. But how many hours of debugging could types have spared? Starting a prototype in Python is generally a good choice, but an internal alarm should go off when the project grows beyond a thousand lines of code: is my language the right choice given the scope and execution context of my program? If the answer is no, it becomes worthwhile to invest some time in learning a typed (and static) language. The static/dynamic debate nonetheless remains a divisive issue in the programming community, and it involves even subtler points that will be dealt with in a future article.

Tales of compilation and computational transcriptions
