History and present of Natural Language Processing


The beginnings (1950–1970)

Natural Language Processing (NLP) is a discipline with a long history. It was born in the 1950s as a sub-area of Artificial Intelligence and Linguistics, with the aim of studying the problems derived from the automatic generation and understanding of natural language. Although works from earlier periods can be found, it was in 1950 that Alan Turing published the article “Computing Machinery and Intelligence,” which proposed what is now called the Turing test as a criterion of intelligence.


NLP bridges the communication gap between computers and humans, and has its origins in the early ideas of machine translation (MT) that emerged around World War II. The Georgetown experiment in 1954 involved the automatic translation of more than sixty Russian sentences into English, in a collaboration between IBM and Georgetown University. The authors claimed that within three to five years machine translation would be a solved problem. Actual progress was much slower, however, and after the ALPAC report in 1966, which found that ten years of research had not lived up to expectations, funding for machine translation was drastically reduced.

One notably successful natural language processing system developed in the 1960s was SHRDLU (a name taken from the letter order on Linotype keyboards rather than an acronym), written by Terry Winograd at MIT. SHRDLU was a natural language system that manipulated “blocks” within a restricted-vocabulary world. One could give it instructions in colloquial language such as “Can you put the red cone on top of the green block?” and the program understood enough to execute the action. Another example was ELIZA, a simulation of a Rogerian psychotherapist written by Joseph Weizenbaum between 1964 and 1966. With no information about human thought or emotion, ELIZA provided a surprisingly human-like interaction. When the “patient” exceeded its very small knowledge base, ELIZA could fall back on a generic response, for example answering “My head hurts” with “Why do you say your head hurts?”. ELIZA was perhaps the first antecedent of the conversational bot and one of the first programs to attempt the Turing test.
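The head-hurts exchange above can be reproduced with a few lines of pattern matching. The sketch below is a hypothetical miniature in the spirit of ELIZA, not Weizenbaum's original script: the two rules and the fallback response are invented for illustration.

```python
import re

# Each rule maps a regex over the "patient's" input to a response template.
# These rules are illustrative, not ELIZA's actual DOCTOR script.
RULES = [
    (re.compile(r"my (.+) hurts", re.IGNORECASE),
     "Why do you say your {0} hurts?"),
    (re.compile(r"i am (.+)", re.IGNORECASE),
     "How long have you been {0}?"),
]

GENERIC = "Please tell me more."  # fallback when no rule matches

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Reflect the captured phrase back at the user
            return template.format(*match.groups())
    return GENERIC

print(respond("My head hurts"))       # Why do you say your head hurts?
print(respond("The weather is nice")) # Please tell me more.
```

The trick is that the program reflects the user's own words back as a question, which creates the illusion of understanding without any model of meaning.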

…a little later (1970–1990)

During the 1970s, many programmers began writing “conceptual systems,” which structured real-world information into computer-understandable data, and by the 1980s most natural language processing systems were based on complex sets of handwritten rules. However, thanks to the steady increase in computational power and the gradual decline in the dominance of Chomskyan theories of language, the late 1980s saw a revolution in natural language processing with the introduction of machine learning algorithms. A new paradigm was born.

NLP with machine learning (1990–2000)

Some of the earlier machine learning algorithms, such as decision trees, produced rigid rule systems similar to the existing handwritten rules. Part-of-speech tagging, however, introduced the use of Hidden Markov Models (HMMs) into natural language processing, and research increasingly focused on statistical models, which make soft, probabilistic decisions.
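To make the HMM idea concrete, here is a toy Viterbi decoder for part-of-speech tagging. The two tags, three words, and all the probabilities are invented for illustration; a real tagger estimates these tables from an annotated corpus.

```python
# Toy HMM: start, transition, and emission probabilities are made up.
tags = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}                          # P(tag at position 0)
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},                # P(next tag | tag)
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"dogs": 0.5, "bark": 0.1, "loudly": 0.4},  # P(word | tag)
        "VERB": {"dogs": 0.1, "bark": 0.8, "loudly": 0.1}}

def viterbi(words):
    # best[t] = (probability, tag sequence) of the best path ending in tag t
    best = {t: (start[t] * emit[t][words[0]], [t]) for t in tags}
    for word in words[1:]:
        best = {
            t: max(
                (best[prev][0] * trans[prev][t] * emit[t][word],
                 best[prev][1] + [t])
                for prev in tags
            )
            for t in tags
        }
    # Return the tag sequence of the overall most probable path
    return max(best.values())[1]

print(viterbi(["dogs", "bark"]))  # ['NOUN', 'VERB']
```

The “soft” decisions mentioned above are visible here: every tag remains possible at every position, and the decoder simply keeps the most probable path rather than firing a hard rule.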

Machine learning procedures can use statistical inference to produce models that are robust to unfamiliar inputs (e.g., containing words or structures never seen before) and to erroneous inputs (e.g., misspelled or accidentally omitted words), both of which are very common in real-world data. Earlier language processing systems were designed by hand-coding a set of rules, such as grammars, or by devising heuristic rules for stemming. The machine learning paradigm instead uses statistical inference to learn such rules automatically from large numbers of typical real-world examples. Learned systems can therefore be made more accurate simply by providing more input data; hand-coded systems, on the other hand, can only be improved by increasing the complexity of the rules, which is a much harder task.

In the 1990s, the popularity of statistical models increased dramatically. More recent research has increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired responses, or from a combination of annotated and unannotated data. In general, this task is much more difficult than supervised learning and produces less accurate results for a given amount of input data. However, there is an enormous amount of unannotated data available, including, among other things, the entire contents of the World Wide Web.

Neural networks are back in vogue (2000–2020)

In 2001, Yoshua Bengio and his team proposed the first neural network-based language model, using a feed-forward network. In this type of network, data moves in only one direction: from the input nodes, through the hidden nodes, and on to the output nodes. A feed-forward neural network has no cycles or loops, which distinguishes it from recurrent neural networks.
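The one-directional flow described above can be shown in a few lines. This is a minimal sketch of a feed-forward pass with arbitrary random weights, not a trained language model; the layer sizes and activation choice are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # input (4 features) -> hidden (3 units)
W2 = rng.normal(size=(3, 2))  # hidden -> output (2 units)

def forward(x):
    hidden = np.tanh(x @ W1)   # data flows forward into the hidden layer
    logits = hidden @ W2       # then on to the output layer; no loop back
    return np.exp(logits) / np.exp(logits).sum()  # softmax over outputs

probs = forward(np.array([1.0, 0.0, -1.0, 0.5]))
print(probs)  # two probabilities summing to 1
```

Note that nothing in `forward` ever feeds an output back into an earlier layer; adding such a connection is exactly what turns this into a recurrent network.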

In the 2010s, representation learning and neural network-based machine learning methods became widespread in natural language processing, due in part to a series of results showing that such techniques can achieve state-of-the-art performance on many natural language tasks, such as question answering. Deep neural network-based approaches can thus be seen as a new paradigm, distinct from purely statistical natural language processing.

In 2011, Apple’s Siri became known as one of the world’s first successful NLP/AI assistants used by general consumers. Within Siri, the automated speech recognition module translates the owner’s words into digitally interpretable concepts. The voice-command system then matches those concepts to predefined commands, initiating specific actions. For example, if Siri asks, “Do you want to hear your balance?”, it understands a “Yes” or “No” answer and acts accordingly.

By using machine learning techniques, the user’s speech pattern does not have to match predefined utterances exactly; the sounds only have to be reasonably close for an NLP system to translate the meaning correctly. Combining a dialog manager with NLP makes it possible to build a system that can carry on a conversation and sound like a human, with questions, prompts, and back-and-forth responses. Our modern AIs, however, still can’t pass the Turing test, and currently don’t sound like real humans. At least not yet.

The future is here: NO-CODE NLP (2020~∞)

Extracting knowledge from a corpus of documents without writing code is possible. As simple as that. You don’t have to install Python, R, or any other programming language, and you need no development environments. You won’t have to struggle with dependency, syntax, or library-update errors. If you are more comfortable with workflows and visual programming, you have the no-code NLP option.

Areas such as Customer Experience, Customer Success, and Product Management, among others, will benefit from this type of tool. Here we mention five of the most important ones: DeepTalk, KNIME Analytics Platform, Orange Data Mining, Obviously, and Levity.

Because these are visual programming tools, the learning curve for someone new to programming is shallower. All of them have training channels with tutorials that let you start experimenting very easily. And the results are not just a “hello world”: they are, or can be, real. It all depends on the data you have.
