Word Vector

Since ages, our world is bound to human languages. We always wanted to automate a lot of this work. But we always faced a big obstacle - human languages. Nobody could come up with a logical algorithm that could contain a language like English.

The developments in machine learning provided us a new way of handling this problem. And today, Natural Language Processing has taken us far ahead of what we had imagined. We still have a long way to go. But things are moving really fast. Here are some of the important concepts of NLP.

Logical Speech?

There is almost no logic to any language that humans speak. Most of our languages have evolved over time - based on convenience and random acceptance. (Only exception is Sansrit that was formally created and finalized before it could be used). One can never produce an algorithm that can map an English sentence to its meaning. Yet, the mind has absolutely no problem doing this job! How does it manage this?

NLP is a science of trying to understand the mind's way of understanding a language and producing sentences. Essentially, language cannot work with a point in time sample. That is, a word by itself does not carry much meaning in any languge. Its meaning is in relation to the words before it. Recurrent Neural Networks are used to correlate words in relation to previous words.

But, the fact is just a few words is not enough. The meaning of a word is in relation to the sentence, and the sentences before and the events that occurred before and the events that happened years ago... that could go trace to the big bang itself! The meaning of anything that we say today can be entirely different in the context of an event in the past.

We have made a huge progress in this domain but it is far from completion. We are yet to guess how we can identify sarcasm, implied messages, etc. These require a lot of background knowledge about the subject of discussion. Yet, there is a lot that we have achieved. Let us look at the major concepts that enabled us to get things thus far.

Word Vector

Machines are good at numbers. If we want them to learn a language, we must map it to numbers. But that is so simple. Do we assign numbers to alphabets? Or words? Or sentences? If we somehow do assign them numbers, what would it mean to add or subtract these numbers? Would these operations be valid? Why do we need them after all?

Let's look at this in some more detail. All the machine learning we saw before was based on minimizing the difference between the real and calculated output. All the TensorFlow and PyTorch and SageMaker and the millions of lines of code that has gone into this domain is essentially trying to reduce this "error function" - that maps the difference between the real and calculated outputs.

Great, but what is the difference between words? How do we calculate the error and how do we minimize it? In order to achieve this, we need a mathematical representation for words. We should map the words as vectors in an n-dimensional space - such that the similar words are close to each other. Only then can we think of using our typical algorithms on a natural language.

Wow! Now that is an interesting concept. The next obvious question is how do I do it? Very few had the tenacity to just count all the words in English language. Now you are telling me to map each of these words into an n-dimensional space, while ensuring such constraints! How would you do that? There is a simple technique for going about this.

It is based on the assumption that if two words are similar, they are replaceable.

That sounds quite intuitive. Now, let's formalize this statement. The context of a word (that appears in a sentence) can be defined as the set of n words before and after that word. When we scan a massive data set of different types of literature; we see many different words - in different contexts. Using this, we can generate a map of each word along with the different context's in which it appears; along with the reverse map of each context along with the different words that appear in that context.

Thus, each context has a set of words that it can contain. These words are similar. We can now push this similarity into the contexts (the context is made of words after all) - to identify similar contexts. Iteratively, this can help us generate a vector for each word.

The accuracy of this representation naturally depends upon the number of dimensions in the vector - that depends upon the number of words we include in the context.