Artificial Intelligence


AI for Fake News Detection


Social media is great for networking and sharing insights with your friends and contacts. But this wonderful technology has an unfortunate side effect: fake news, which spreads like wildfire and can fuel hatred and incite violence. Here is a study on using AI to detect such fake news.


Introduction to PyTorch


PyTorch is an open-source machine learning library used for developing and training neural-network-based deep learning models. It is primarily developed by Facebook's AI Research group. PyTorch builds on core Python concepts like classes, data structures and loops, which makes the code familiar to the eye and a lot more intuitive to understand.
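
Here is a minimal sketch of that flavour - the network shape and the random data below are made up purely for illustration:

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    def forward(self, x):
        return self.layers(x)

model = TinyNet()
x = torch.randn(16, 4)           # a batch of 16 made-up samples
y = torch.randint(0, 2, (16,))   # made-up class labels
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                  # compute gradients, as in one training step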


AWS Transcribe


The Amazon Transcribe service can be used to recognize speech in audio files and convert it to text. It can identify the individual speakers in an audio clip. We can use it to convert audio to text and to create applications that incorporate the content of audio files.
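
For instance, a transcription job can be started on an audio file stored in S3 with a few lines of boto3 - the bucket, file and job names here are hypothetical:

import boto3

transcribe = boto3.client('transcribe')
transcribe.start_transcription_job(
    TranscriptionJobName='demo-job',                     # hypothetical job name
    Media={'MediaFileUri': 's3://my-bucket/audio.mp3'},  # hypothetical S3 location
    MediaFormat='mp3',
    LanguageCode='en-US')

# The job runs asynchronously; check its status (and later the transcript URI)
job = transcribe.get_transcription_job(TranscriptionJobName='demo-job')
print(job['TranscriptionJob']['TranscriptionJobStatus'])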


AWS Rekognition


Amazon Rekognition can be used to add image and video analysis to applications. For any given image or video, the Rekognition API can identify objects, people, text, scenes, and activities.
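
A small sketch with boto3 - the bucket and image names are hypothetical:

import boto3

rekognition = boto3.client('rekognition')
response = rekognition.detect_labels(
    Image={'S3Object': {'Bucket': 'my-bucket', 'Name': 'photo.jpg'}},  # hypothetical image in S3
    MaxLabels=10)
for label in response['Labels']:
    print(label['Name'], label['Confidence'])   # detected objects with confidence scores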


What is NLP?


For ages, our world has been bound to human languages. We have always wanted to automate much of the work expressed in them, but we kept running into a big obstacle - human language itself. Nobody could come up with a logical algorithm that could capture a language like English.


Face Recognition


Face recognition is one of the many surprises that AI research has brought to the world. It is a subject of curiosity for many techies who would like a basic understanding of how it works. Let us take a dip into the subject.


Convolutional Neural Networks


With the latest developments in Big Data, we are no longer short of the data or infrastructure needed to train an AI model. Algorithmic advances are now pushing the test errors of intelligent systems towards the Bayes optimal error.


What is Computer Vision?


Computer Vision is the branch of AI that trains machines to derive meaningful information from digital images and videos - identifying objects, reading text and understanding scenes - and to act on what they see.


AWS SageMaker


SageMaker is one of the fundamental AWS offerings; it helps us through all stages of the machine learning pipeline - build, train, tune and deploy. It provides a simple Jupyter Notebook UI that can be used to script basic Python code.


Machine Learning on AWS


AWS provides several services for solving machine learning problems at different levels - from high-performance EC2 instances and the scalable SageMaker to specialized services like Textract, Comprehend, DeepRacer and many more.


Simple network with TensorFlow


Sometimes, a few lines of code say a lot more than many pages of theory. Especially in a subject like Machine Learning, one is often left with the feeling: "it all sounds great, but how do I actually do it?" Let's have a look at a simple implementation of a classification algorithm using TensorFlow.
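
As a rough sketch of the idea (not necessarily the exact implementation discussed in the article), here is a tiny classifier built with the Keras API bundled with TensorFlow, trained on the small Iris data set:

import tensorflow as tf
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)               # 150 samples, 4 features, 3 classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, epochs=50, verbose=0)           # train quietly for 50 epochs
print(model.evaluate(X, y, verbose=0))          # [loss, accuracy]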


Introduction to TensorFlow


TensorFlow is an open-source software library from Google, designed for dataflow programming across a range of tasks. It is a symbolic math library and is largely used for machine learning applications such as neural networks. It was originally developed by the Google Brain team for internal Google use.


Mean Shift with SkLearn


Mean shift clustering uses a sliding window to find dense areas among the data points. It is a centroid-based algorithm: its goal is to locate the centre point of each group/class, and it works by updating candidate centre points to be the mean of the points within the sliding window.
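
A quick sketch with Scikit-learn, run on a handful of made-up 2-D points:

from sklearn.cluster import MeanShift
import numpy as np

# A few made-up points forming two loose groups
X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],
              [8, 8], [8.3, 7.9], [7.8, 8.2]])
ms = MeanShift()            # the window bandwidth is estimated automatically if not given
ms.fit(X)
print(ms.cluster_centers_)  # centre point located for each group
print(ms.labels_)           # cluster assigned to each point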


Logistic Regression with SkLearn


Logistic Regression is a very important concept in Supervised Machine Learning, because it lets you apply the powerful techniques of regression-based learning to classification problems.
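
In Scikit-learn this boils down to a few lines - shown here on the Iris data set purely for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=200)   # a regression-style model used as a classifier
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))         # classification accuracy on held-out data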


Neural Networks with SkLearn


Neural networks offer endless possibilities for holding complex models with great precision. Libraries like TensorFlow provide performance tuned for specialized architectures, but Scikit-learn gives us a clean and simple implementation of the basic models.
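
A minimal sketch using Scikit-learn's MLPClassifier - the hidden layer size is arbitrary, just for illustration:

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
clf.fit(X, y)            # a small feed-forward network with one hidden layer
print(clf.score(X, y))   # accuracy on the training data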


Random Forest with SkLearn


A forest is essentially a collection of trees. In machine learning too, a "Random Forest" is an ensemble of "Decision Trees". The Random Forest algorithm fixes a lot of the issues we noticed with the Decision Tree.
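
A quick illustration with Scikit-learn - the number of trees here is just the common default:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0)   # an ensemble of 100 decision trees
clf.fit(X, y)
print(clf.predict(X[:3]))   # predictions for the first three records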


Regression with SkLearn


Scikit-learn provides easy implementations of most regression algorithms. We can check out the major ones here. Like most machine learning code, these modules are concept-heavy and code-lite: the code itself is trivially small, and a couple of lines will do the job.
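
For example, fitting a straight line to some made-up numbers really does take only a couple of lines:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])    # made-up inputs
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # outputs roughly equal to 2 * x
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)           # learned slope and intercept
print(reg.predict([[6]]))                  # prediction for a new input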


SVM with SkLearn


Support Vector Machine (SVM) is a classification technique. It tries to divide the available data geometrically. If the input data has N features, the data is plotted as points in an N-dimensional space; the algorithm then identifies an (N-1)-dimensional structure that separates the groups.
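
A small sketch with Scikit-learn's SVC, using a linear kernel on the Iris data set for illustration:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear')   # look for a linear separating hyperplane
clf.fit(X, y)
print(clf.score(X, y))       # accuracy on the training data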


KNN with SkLearn


The KNN algorithm is inspired by a tendency of the human mind - to go along with the crowd. Conceptually, KNN just looks at the known points around the query point and predicts that its outcome is similar to theirs.
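
A minimal sketch with Scikit-learn - the choice of 5 neighbours is arbitrary, just for illustration:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)   # look at the 5 nearest known points
clf.fit(X, y)
print(clf.predict(X[:3]))                   # predictions follow the surrounding crowd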


K-Means with SkLearn


K-Means is an interesting way of identifying clusters in the given data. Conceptually, we can think of the process as follows: if we want to identify K clusters in the data set, we start by picking K separate points in the data space.
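
A quick sketch with Scikit-learn, on a few made-up points that form two obvious groups:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])   # made-up points in two obvious groups
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the K centre points the algorithm converged to
print(km.labels_)            # cluster assigned to each point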


The SkLearn Library


The Scikit-learn library has ready implementations of most basic machine learning algorithms. Most machine learning libraries are based on the principle of "concept-heavy and code-lite".


Decision Tree with SkLearn


Decision Tree is an interesting concept that mimics a very common way our mind approaches a classification problem. Suppose our training data set has n features. We can take one feature at a time and split the data elements on that feature; the two nodes thus obtained can be split further based on the remaining features.
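
A minimal sketch with Scikit-learn - the depth limit here is arbitrary, just to keep the tree small:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)   # split on one feature at a time, up to depth 3
clf.fit(X, y)
print(clf.score(X, y))   # accuracy on the training data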


Affinity Propagation with SkLearn


Affinity Propagation is one of the more important clustering algorithms, and also more elaborate than the others. It takes the similarity between pairs of data points as input and starts by considering every data point as a potential exemplar.
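
A small sketch with Scikit-learn - here the pairwise similarities are computed from a few made-up points:

import numpy as np
from sklearn.cluster import AffinityPropagation

X = np.array([[1, 2], [1, 4], [1, 0],
              [8, 2], [8, 4], [8, 0]])   # made-up points
ap = AffinityPropagation(random_state=0).fit(X)
print(ap.cluster_centers_indices_)       # indices of the points chosen as exemplars
print(ap.labels_)                        # cluster assigned to each point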


Classification with SkLearn


The Iris data set is a small data set of just 150 records, about the classification of a flower called Iris. That may not be enough for solving a real-life problem, but because of its small size it is often quoted in the academic world.
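
The data set ships with Scikit-learn, so loading and inspecting it takes only a couple of lines:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)      # (150, 4): 150 records with 4 features each
print(iris.target_names)    # the three Iris species to classify
print(iris.feature_names)   # petal and sepal measurements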


Neural Networks Architectures


It is almost impossible to solve every problem from first principles. Researchers have proposed many neural network architectures that can be a starting point for our work.


Recurrent Neural Networks


An RNN feeds its own output back as one of the inputs for the next step. The output of one step carries the information content of all inputs prior to that step, so by passing it into the next step we effectively pass in everything that has happened so far.
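
A rough sketch of the idea using Keras' SimpleRNN layer, where the state computed at each step is fed into the next - the sequences below are random, purely for illustration:

import numpy as np
import tensorflow as tf

# Made-up data: 32 sequences, each with 10 time steps of a single value
X = np.random.rand(32, 10, 1)
y = np.random.rand(32, 1)

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(8, input_shape=(10, 1)),  # state from each step feeds into the next
    tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=2, verbose=0)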


Convolutional Neural Networks


The size of a fully connected neural network grows very quickly with the number of inputs. This is felt very strongly when we work with images: even a tiny image of 64x64 RGB pixels implies 64 x 64 x 3 = 12288 input values.
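
A small Keras sketch shows the contrast - model.summary() reports only a few hundred weights for the convolution layer, whereas a dense layer connected directly to the 12288 inputs would need over a million:

import tensorflow as tf

# A 64x64 RGB input has 12288 values; a dense layer of 100 units over it would
# need 12288 * 100 + 100 weights, while the 3x3 convolution below needs only 448.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax')])
model.summary()   # prints the parameter count of each layer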


A Quick Introduction to R


R is a programming language and free software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.


NumPy


NumPy is the most important library for computational tasks in Python. It combines the ease and flexibility of scripting with very high performance, and it is the foundation of almost every analytics or machine learning task in Python.
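
A tiny illustration of its vectorized style:

import numpy as np

a = np.arange(1_000_000)     # a million integers, created without a Python loop
b = a * 2 + 1                # element-wise arithmetic runs in fast compiled code
print(b[:5])                 # [1 3 5 7 9]
print(a.reshape(1000, 1000).sum(axis=0)[:3])   # column sums of a 1000x1000 view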


Imbalanced Data


Models learn from the data we provide. If the data is imbalanced, the resulting model is bound to be biased towards the majority class - increasing the chance of over-fitting to the imbalance. We need to take special care to avoid such a problem.
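
One simple precaution, sketched here with Scikit-learn on made-up data, is to weight the rare class more heavily during training:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Made-up data where roughly 90% of the samples belong to one class
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(class_weight='balanced', max_iter=200)   # give the rare class more weight
clf.fit(X, y)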


Error Analysis


Error analysis is, well, the analysis of errors - you don't need to be told that. In fact, the whole theory of error analysis is just as intuitive. But people tend to miss some of its points in real projects.


Dimensionality Reduction


We all understand that more data means better AI. That sounds great! We need all that data. But we need to look into ways of streamlining the available data so that it can be compressed without losing value. Dimensionality reduction is an important technique that achieves this end.
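
A quick sketch with Scikit-learn's PCA, compressing the four Iris features into two components:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
pca = PCA(n_components=2)               # compress 4 features down to 2
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)    # how much of the variance each component retains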


Training - Best Practices


Theory and implementation are often just poles apart. It is important to understand the theory. But it is even more important to understand how to put it into practice. For that, we need an idea about the kind of issues involved in solving real world problems and the typical solutions employed.


Overfitting


Overfitting is one major problem that can lead to disappointment after a lot of effort. Underfitting is not as bad, because we find out about it while training; overfitting is a surprise in the field! Hence, it is important to eliminate the possibility of overfitting.
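
One simple way to spot it, sketched with Scikit-learn: compare the score on the training data with the score on held-out data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # an unconstrained, easily over-fit tree
print(clf.score(X_train, y_train))   # typically close to 1.0 on the training data
print(clf.score(X_test, y_test))     # a noticeably lower score on unseen data signals overfitting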


Neural Networks


A lot of Machine Learning is inspired by how the human mind works. The concept of Neural Networks goes one step further - to take inspiration from the way Neurons are laid out in the human brain.


Random Forest


A forest is essentially a collection of trees. In machine learning too, a "Random Forest" is an ensemble of "Decision Trees". The Random Forest algorithm fixes a lot of the issues we noticed with the Decision Tree.


Ensemble Algorithms


Often, a team performs much better than individuals. Each member of the team can contribute a unique aspect to the final product. Each provides a different viewpoint, and if we can productively combine them, we end up with a wonderful solution. Ensemble Learning is based on this concept.


Clustering


Clustering is a primary component of unsupervised learning. This involves identifying related items or clusters out of the available data set. Here are some of the key algorithms and concepts involved in the process of clustering.


Classification


A lot of our decisions are based on classification - choosing between alternatives. The decision could be based on several abstract parameters. We know that the parameters contribute to the decision. But we have no idea how. We can train a machine learning model based on such data using the classification algorithms.


Regression


Regression algorithms work by iteratively pushing a model closer to the desired fit. Among the different regression algorithms, Linear Regression is the simplest to understand; other forms of regression extend this concept to get better outcomes.


Reinforcement Learning


In simple words, supervised learning is a kind of micromanagement: at each point, on each step, the machine is corrected by measuring how wrong it is. Reinforcement learning instead rewards the right as much as it penalizes the wrong - based on the eventual outcome rather than on each tiny step.


Unsupervised Learning


Unsupervised learning, as the name suggests, is about learning without supervision - simply making sense of the data at hand. This is not just a theoretical fantasy: unsupervised learning has great application in analyzing data that we know nothing about.


Supervised Learning


Supervised learning is about learning from labelled examples - inputs paired with known outputs - so that the trained model can predict the output for new, unseen inputs. Most of the classification and regression techniques covered here fall under this umbrella.


Statistics - a Refresher


We rarely know everything. Most problems we solve in real life are based on the generalization of our limited knowledge. Statistics is the formal technique that helps us make sense out of such data. Statistics provides a conceptual baseline for machine learning.


Basics of Calculus


Calculus (literally 'small pebble'), is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations.


Linear Algebra - a Refresher


Mathematics is the basis of all engineering - even more so for abstract sciences like Machine Learning. Linear Algebra is at its core, so it is very important to understand linear algebra before we proceed with understanding Machine Learning.