**Mathematical & Statistical Subjects in Artificial Intelligence Algorithms**

AI & Mathematics

Shafi

Published in

The Startup

10 min read · Sep 29, 2020

Exploring Mathematical and Statistical Subjects of AI Algorithms.

Overview:

1. Introduction
2. Why mathematics is required
3. Subfields and modules of AI
4. Mathematical subjects and required topics
5. Applying the mathematical subjects in a neural network: a multi-class classification example
6. Conclusion

This article is divided into two sections: Section-I (items 1–4) gives theoretical explanations of the mathematical subjects, and Section-II (item 5) applies those concepts to a neural network for multi-class classification.

Section — I Theoretical Explanation

Introduction:

AI algorithms are built on mathematics and statistics, and this article explains why mathematics matters in AI. The maths behind AI algorithms is hard to understand and has a steep learning curve. AI algorithms rely on mathematical subjects even when their concepts are borrowed from other disciplines (for example, the biological neuron for artificial neural networks).

Why Mathematics: Below are a few reasons why mathematics is needed in AI.

Without mathematics you cannot get a clear picture of the internal workings of any algorithm.
Most researchers write their papers using equations, formulas, techniques and results, and show how the required subjects combine to accomplish a task from a purely mathematical perspective. You need to understand mathematics to follow the notation, the subjects and the applied techniques.
In some complex AI projects, such as self-driving cars (SDC), robotics and NLP, you need to define your own framework on top of existing frameworks such as PyTorch, TensorFlow and Keras. To develop such projects you have to know the internal workings of AI algorithms well.
Sometimes you have to fine-tune an algorithm by changing its parameter values. If you do not understand the algorithm and the mathematics inside it, you will not be able to fine-tune it.
As an AI architect or researcher you cannot convey experimental results informally; you have to explain them mathematically.

Modules or Fields in AI

There are many modules in AI; I have listed a few of them following the book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

To see the purpose of each module in AI, refer to the diagram below; even a newcomer can follow the road map of modules.

AI & its Fields/Modules/Sub-Modules purpose

**Purpose of AI subjects/fields, for newcomers**

An outline of the AI fields is categorized in the following diagram.

**The fields of AI, broadly categorized into 5 groups**

Mathematical subjects and concepts are used in almost all areas of AI, not only in Machine Learning and Deep Learning.

How the AI fields and their required mathematical subjects and concepts are involved in algorithms will be covered briefly in the next article.

AI-Mathematical subjects and required topics

This section goes through each subject, lists the major concepts required, and briefly notes where and how they are used in AI algorithms. Having seen them here, the reader will recognize them while learning and developing algorithms.

**Required to understand the concepts, notations and advanced subjects.**

Basic formulas, functions, exponentials, logarithms, Euclidean distance, plane, hyperplane, linear and non-linear functions, slope, curves and basics such as the parabola and circle, etc.

**Abstract, Linear and Vector Algebra. Linear Algebra is a computation tool in AI**

Introduction: Algebra has multiple branches, such as Abstract Algebra, Vector Algebra and Linear Algebra.

Abstract Algebra: Laws of Algebra, Groups, Homomorphism, Isomorphism, Ring Theory, etc.

The following topics are required in Linear Algebra and Vector Algebra. Note that Vector Algebra has only a few concepts, and some textbooks cover them under Linear Algebra.

Linear Algebra Concepts: Vectors; Matrices, including types of matrices (Identity, Inverse, Adjoint); Tensors; properties of matrices (Trace, Determinant, Orthogonal, Projections, Symmetric, Singular, etc.); product rules (Inner product, Outer product, Vector-Matrix product, Matrix Multiplication, Linear Combination of Vectors, Hadamard product); decompositions (Eigenvalue Decomposition, SVD, etc.); advanced concepts used in Quantum Computing (QC): Hilbert Spaces, Tensor product, Hermitian, Unitary, etc.

You can refresh your Linear Algebra for AI and QC with the article below, which covers almost all the topics required in both fields.

Linear Algebra in Artificial Intelligence & Quantum Computing

Linear Algebra Usage Introduction: Linear Algebra is the primary computation tool in both Artificial Intelligence (AI)…

medium.com

Concepts of Vectors applied in ML and Other areas:

**Concepts, Types and Usages of Vectors**

**Deals with Reasoning and Uncertainty**

Descriptive Statistics: Mean, Variance, Median, Mode, Standard Deviation, Covariance, Expectations, Distributions (Bernoulli, Uniform, Normal (univariate and multivariate), Poisson, Binomial, Exponential, Gamma), Joint and Marginal Distributions, Probability, Axioms of Probability, Conditional Probability, Random Variables, Bayes' Rule (most important), Chain Rule, Estimation of Parameters: MLE (Maximum Likelihood Estimation) and MAP (Maximum A Posteriori), Bayesian Networks or Probabilistic Models or Graphical Models.

You can see the power of Probability in AI in this article.

The Power of Probability in AI

This blog explains basic Probability theory concepts which are applicable to major areas in Artificial Intelligence…

medium.com

**Major Statistical Concepts used in AI**

**Deals with changes in parameters, functions, errors and approximations.**

Derivatives: rules of derivatives (addition, product, division, chain rule), derivatives of hyperbolic functions (tanh), applications of derivatives such as minima and maxima, etc.; Integration (if you are using transformations).

Note: We do not use scalar derivatives directly, but they help in understanding vector and matrix calculus as well as numerical computation.

Multivariable Calculus, Partial Derivatives, Gradient Algorithms.

Combining Calculus with Linear Algebra: Vector Calculus and Matrix Calculus are the most important in Machine Learning and Deep Learning.

Vector & Matrix Calculus concepts: Gradient, Chain Rule, Jacobians, Hessian.

The following diagram describes the Gradient Descent algorithm, which is used in Back-propagation (BP) in a neural network architecture to optimize the parameters.

BP is described in the Neural Network implementation section.

**Gradient Descent Algorithm working on Parameters or Weights in an ML/DL algorithm**
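The update rule behind Gradient Descent can be sketched in a few lines of Python. This is a minimal illustration only: the one-parameter cost J(w) = (w - 3)^2, the learning rate and the number of steps are assumptions chosen for the example, not values from this article.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Assumed toy cost J(w) = (w - 3)^2, whose derivative is dJ/dw = 2 * (w - 3).
def grad_J(w):
    return 2.0 * (w - 3.0)


w_opt = gradient_descent(grad_J, w0=0.0)
print(w_opt)  # very close to 3.0, the minimum of the assumed cost
```

In a neural network the single number w becomes the full set of weights and biases, and the gradient comes from Back-propagation, but the update step has the same form.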

**Used to measure the uncertainty in algorithms**

Concepts: Entropy (Shannon Entropy), Information Gain, Cross Entropy, Kullback-Leibler (KL) Divergence. Entropy measures the disorder of a distribution.

Below is the Shannon Entropy diagram, which describes distributions.
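To see how entropy measures disorder, here is a small sketch using the standard definitions of Shannon entropy, cross entropy and KL divergence. The two example distributions are made up for illustration, and the code assumes q is non-zero wherever p is non-zero.

```python
import math


def entropy(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum p_i * log2(q_i)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL divergence D(p || q) = H(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)


# Hypothetical distributions over 4 outcomes.
uniform = [0.25, 0.25, 0.25, 0.25]   # maximum disorder
peaked = [0.97, 0.01, 0.01, 0.01]    # one outcome is almost certain

print(entropy(uniform))                 # 2.0 bits
print(entropy(peaked))                  # much lower than the uniform case
print(kl_divergence(peaked, uniform))   # positive: the distributions differ
```

The uniform distribution has the highest entropy because every outcome is equally likely, while the peaked one is almost predictable and so carries little uncertainty.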

Sets, Sequences, Limits, Metric Spaces, Single-valued and Continuous Functions, Convergence, Divergence and Taylor Series.

**Convergence and divergence of parameters in a model**

**Computation methods & Optimizing with respect to constraints**

Extrema, Minima, Maxima, Saddle Point, Overflow, Underflow, Directional Derivative, Convex, Concave, Convexity, Lagrange’s inequality.

The following concepts are used in optimizing the weights in ML & DL:

**Minimum, Maximum, Saddle Point, Convex and Concave, shown graphically**

Optimize Cost (minimize / maximize)

Introduction: Operational Research (OR) is the study of applying mathematics to business questions. It is a sub-field of Applied Mathematics. OR uses mathematics and statistics to answer optimization questions.

Algorithms & Statistics:

OR relies heavily on algorithms, mathematics and statistics. The most important algorithms in OR are optimization algorithms: algorithms that try to find a maximum or minimum.

Optimization: The challenge is to find the best possible solution to a question, given a set of constraints. Optimization can be the maximization or minimization of a cost or benefit.

In OR we mainly apply optimization techniques to a cost function.

**Basics for Logic, Algorithms and proofs**

Sets, Functions, First-order Logic, Relations, Data Structures, Algorithms, Time & Space Complexity of Algorithms, Recursion, Combinatorics, Trees, Graphs, Finite-state Machines, Dynamic Programming, etc.

Please note that some subjects or concepts, such as Probability, Matrices, Boolean Algebra and Languages, are part of Discrete Mathematics, but they are covered in their respective fields.

The diagram below mentions only the well-known DM concepts that are applied in algorithms. Various other concepts such as Finite Automata, Formal Languages, Boolean Algebra, Probability and Matrices are left out to avoid confusion and overlap.

**DM concepts applied in algorithms for various usages**

**Many other subjects/concepts will come into the picture and have to be learned as and when required**

Miscellaneous subjects/concepts: Transformations (Laplace, Z and Fourier Transformations), activation functions (Sigmoid, Softmax, Softplus, Tanh, etc.), Signal Processing, the Biological Neuron concept, Topology, Physics basics & Control Theory, etc. Only a few subjects/concepts are mentioned here; the list is not exhaustive.

Section-II Applied Mathematical Concepts in Neural Networks

Let us combine the subjects mentioned above in one algorithm and see how they work together. For this I use a multi-class text classification example with a neural network architecture, and explain how the maths subjects are involved in completing the task.

The following diagram explains how the maths subjects are involved in a neural network. The ML algorithm is implemented as a neural network so that the reader can understand both learning techniques in one shot.

A neural network architecture has many nodes in each layer, and can have many layers between the input and output layers. In this example I use 1 hidden layer and 1 output layer along with the input layer.

Layers for Multi-class Classification Algorithm:

Input layer: Features or dimensions as input, in the form of vectors.

Hidden layer: We can have multiple hidden layers, with multiple neurons in each layer. In this example we use only one hidden layer.

Output layer: The softmax function produces a probability distribution over the classes.
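As a rough sketch of what the output layer does, the softmax function below turns raw scores into probabilities. The three scores are hypothetical, and subtracting the maximum before exponentiating is a common safeguard against overflow rather than something specific to this article.

```python
import math


def softmax(z):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(z)                            # shift by the max to avoid overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]   # hypothetical output-layer scores for 3 classes
probs = softmax(scores)

print(probs)       # the largest score gets the largest probability
print(sum(probs))  # sums to 1 (up to floating-point rounding)
```

The predicted class is simply the index with the highest probability.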

Neural network architectures are built on the concept of neurons. All such architectures, like NN, CNN, RNN, generative models, autoencoders, decoders, etc., are part of Deep Learning and work on artificial neural networks.

**Neural Network Architecture for Multi-class Classification**

The following diagram compares a biological neuron and an artificial neuron.

**(a) Biological Neural Network (BNN) & (b) Artificial Neural Network (ANN) and representing in BNN**

Artificial Neural Network for Multi-Class Classification.

Neural network training is done in two passes: Feedforward Propagation (or Forward Propagation) and Backward Propagation (or Back Propagation).

Every node in a layer is an element of a vector, so every layer is represented as a vector. Feedforward Propagation computes a linear combination of weights and inputs (the inputs in the input layer, and the node values in the hidden layer); this is done with a vector-matrix product followed by the addition of a bias vector.
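To make the linear combination concrete, here is a minimal sketch of the feedforward step for one layer in plain Python. The layer sizes, the weight and bias values, and the use of sigmoid as the activation are all illustrative assumptions, not values from this article.

```python
import math


def matvec(W, x):
    """Matrix-vector product: one weighted sum (linear combination) per row of W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical sizes: 3 input features, 2 hidden nodes.
x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2, 0.3],
     [-0.3, 0.2, -0.1]]
b = [0.5, -0.5]

z = [s + b_i for s, b_i in zip(matvec(W, x), b)]   # z = W x + b
a = [sigmoid(v) for v in z]                        # element-wise activation

print(z)  # approximately [1.9, -0.7]
print(a)  # each value lies between 0 and 1
```

Plain lists keep the sketch dependency-free; a real implementation would normally hand the vector-matrix product to a linear algebra library.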

Since we have 2 layers, hidden and output, Feedforward and Back Propagation are each computed in 2 phases.

Phase-1 Feedforward:

Let us define the intermediate variables in the above neural network.

**Defining Intermediate Variables in Neural Network**

Know the dimensions of the parameters:

**Dimensions of Intermediate Variables**

I covered Matrices and Vectors in Deep Learning in detail in this article.

Linear Algebra- How uses in Artificial Intelligence ?

Understand How Linear Algebra is applying in AI.

medium.com

Let’s calculate the intermediate variables in Phase-1.

**Calculation of Feedforward Propagation for Hidden Layer**

Phase-2 Feedforward:

Let’s calculate the intermediate variables in Phase-2. Now the hidden layer is the input to the output layer.

**Calculation of Feedforward Propagation for Output Layer**
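The two feedforward phases can be sketched in equation form as follows. The notation is an assumption and may differ from the article's own: $x \in \mathbb{R}^{n}$ is the input vector, $h$ is the number of hidden nodes, $k$ is the number of classes, and the hidden activation is taken to be the sigmoid $\sigma$.

```latex
% Phase-1: hidden layer
z^{(1)} = W^{(1)} x + b^{(1)}, \qquad
W^{(1)} \in \mathbb{R}^{h \times n}, \quad b^{(1)} \in \mathbb{R}^{h}

a^{(1)} = \sigma\bigl(z^{(1)}\bigr), \qquad a^{(1)} \in \mathbb{R}^{h}

% Phase-2: output layer
z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}, \qquad
W^{(2)} \in \mathbb{R}^{k \times h}, \quad b^{(2)} \in \mathbb{R}^{k}

\hat{y} = \operatorname{softmax}\bigl(z^{(2)}\bigr), \qquad \hat{y} \in \mathbb{R}^{k}
```

Checking that the dimensions line up, for example that $W^{(2)} a^{(1)}$ is a $k \times h$ matrix times an $h$-vector, is a quick way to catch mistakes before doing any calculus.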

After Feedforward Propagation completes, Back Propagation begins. BP is done in 2 phases: Phase-1 at the output layer and Phase-2 at the hidden layer.

Phase-1 Back Propagation:

BP starts from where Feedforward stops, starting with the cost function J or H. BP involves many of the mathematical subjects, such as Real Analysis, Numerical Computation, Convex Optimization, optimization algorithms such as Gradient Descent and its variants, Matrix Calculus/Vector Calculus, etc.

Chain Rule and Derivatives of Sigmoid and Softmax:
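For reference, these are the standard forms of the derivatives named above, written in assumed notation ($z$ for a pre-activation value, $a$ for an activation, $\hat{y}$ for the softmax output over $k$ classes, $\delta_{ij}$ for the Kronecker delta):

```latex
% Sigmoid and its derivative
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\frac{d\sigma}{dz} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

% Softmax and its partial derivatives
\hat{y}_i = \frac{e^{z_i}}{\sum_{c=1}^{k} e^{z_c}}, \qquad
\frac{\partial \hat{y}_i}{\partial z_j} = \hat{y}_i\,\bigl(\delta_{ij} - \hat{y}_j\bigr)

% Chain rule for a weight w that feeds z, which feeds a, which feeds the cost J
\frac{\partial J}{\partial w}
  = \frac{\partial J}{\partial a}\,
    \frac{\partial a}{\partial z}\,
    \frac{\partial z}{\partial w}
```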

Intermediate Variables and Back Propagation:

**Back Propagation goes in reverse order of Forward Propagation**

Cost Function for Multi-Class Classification
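A common cost for multi-class classification with a softmax output is the cross-entropy J = -Σ y_i ln(ŷ_i), where y is the one-hot target and ŷ is the predicted distribution. The sketch below assumes that form of J for a single example; the target and the two predictions are made-up values.

```python
import math


def cross_entropy_cost(y_true, y_pred):
    """J = -sum_i y_i * ln(y_hat_i) for one example with a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


y_true = [0.0, 1.0, 0.0]          # assumed one-hot target: class 1 is correct
confident = [0.05, 0.90, 0.05]    # hypothetical softmax outputs
unsure = [0.40, 0.30, 0.30]

print(cross_entropy_cost(y_true, confident))  # small cost: prediction is good
print(cross_entropy_cost(y_true, unsure))     # larger cost: prediction is poor
```

Only the probability assigned to the correct class contributes, so pushing that probability toward 1 drives the cost toward 0.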

We differentiate the cost function with respect to the parameters in each layer, i.e.,

**Generic Formula for param derivative**

Starting from the output layer parameters, mathematically it can be described as

**Output Layer’s Weight param derivatives**

In the above formula, the derivative of the first part is

Next, differentiate the second part in Equation (1)

Substituting the derivatives of the first and second parts in Equation (1)

In the same way, we need to differentiate J with respect to the bias

**Differentiating Cost J with respect to Bias in output layer**
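The output-layer derivatives can be sketched as follows, under stated assumptions: the cost is the cross-entropy $J = -\sum_i y_i \ln \hat{y}_i$ with a one-hot target $y$, the output is $\hat{y} = \operatorname{softmax}(z^{(2)})$ with $z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}$, and the chain is split into two factors, which may not match the exact split used in Equation (1).

```latex
% Chain rule for the output-layer weights
\frac{\partial J}{\partial W^{(2)}}
  = \frac{\partial J}{\partial z^{(2)}}\,
    \frac{\partial z^{(2)}}{\partial W^{(2)}}

% First factor: softmax combined with cross-entropy
\frac{\partial J}{\partial z^{(2)}} = \hat{y} - y

% Second factor: z^{(2)} is linear in W^{(2)}, so it contributes a^{(1)}
\frac{\partial J}{\partial W^{(2)}} = (\hat{y} - y)\,\bigl(a^{(1)}\bigr)^{\top}

% Bias: z^{(2)} depends on b^{(2)} with derivative 1
\frac{\partial J}{\partial b^{(2)}} = \hat{y} - y
```

The first factor follows from the softmax derivative $\partial \hat{y}_i / \partial z_j = \hat{y}_i(\delta_{ij} - \hat{y}_j)$ and the fact that the entries of a one-hot $y$ sum to 1.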

Phase-2 Back Propagation:

Here I expand the chain-linked terms and substitute them in the exact places without much explanation, because more detail could cause confusion.

The following diagram shows clearly what Forward and Back Propagation output at each layer.

**Forward and Back Propagation yields activation and derivatives of parameters**
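Putting both passes together, here is a minimal sketch of Forward and Back Propagation for a network with one hidden layer. It is written under assumptions that are not taken from the article's figures: sigmoid hidden activation, softmax output, cross-entropy cost, and a tiny hypothetical network with made-up weights. A finite-difference check compares one analytic derivative with a numerical estimate, and one Gradient Descent step is applied to the output-layer weights.

```python
import math


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """Feedforward: hidden layer (sigmoid), then output layer (softmax)."""
    W1, b1, W2, b2 = params
    a1 = [sigmoid(s + b) for s, b in zip(matvec(W1, x), b1)]
    y_hat = softmax([s + b for s, b in zip(matvec(W2, a1), b2)])
    return a1, y_hat


def cost(params, x, y):
    """Cross-entropy cost J for one example with a one-hot target y."""
    _, y_hat = forward(params, x)
    return -sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)


def backward(params, x, y):
    """Back Propagation: output layer first, then hidden layer."""
    W1, b1, W2, b2 = params
    a1, y_hat = forward(params, x)
    # Phase 1 (output layer): dJ/dz2 = y_hat - y for softmax + cross-entropy.
    dz2 = [p - t for p, t in zip(y_hat, y)]
    dW2 = [[d * a for a in a1] for d in dz2]          # outer product dz2 * a1^T
    # Phase 2 (hidden layer): push dz2 back through W2, then through sigmoid.
    da1 = [sum(W2[k][j] * dz2[k] for k in range(len(dz2)))
           for j in range(len(a1))]
    dz1 = [d * a * (1.0 - a) for d, a in zip(da1, a1)]  # sigmoid'(z) = a(1 - a)
    dW1 = [[d * v for v in x] for d in dz1]           # outer product dz1 * x^T
    return dW1, dz1, dW2, dz2                         # bias gradients equal dz


# Hypothetical tiny network: 3 inputs, 2 hidden nodes, 3 classes.
W1 = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1]]
b1 = [0.0, 0.1]
W2 = [[0.2, -0.3], [-0.1, 0.4], [0.3, 0.1]]
b2 = [0.0, 0.0, 0.0]
params = (W1, b1, W2, b2)
x, y = [1.0, 0.5, -1.0], [0.0, 1.0, 0.0]

dW1, db1, dW2, db2 = backward(params, x, y)
J0 = cost(params, x, y)

# Central finite-difference estimate of dJ/dW1[0][1] to check the analytic value.
eps = 1e-6
W1[0][1] += eps
J_plus = cost(params, x, y)
W1[0][1] -= 2 * eps
J_minus = cost(params, x, y)
W1[0][1] += eps
numeric = (J_plus - J_minus) / (2 * eps)
print(dW1[0][1], numeric)   # the two values should nearly agree

# One gradient descent step on the output-layer weights (assumed rate 0.5).
lr = 0.5
for k in range(len(W2)):
    for j in range(len(W2[k])):
        W2[k][j] -= lr * dW2[k][j]
J1 = cost(params, x, y)
print(J0, J1)               # the cost after the step is lower than before
```

Training repeats this forward pass, backward pass and update over the whole training set for many epochs, updating every weight and bias rather than only the output-layer weights.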

In simple terms, we train on the entire training set. Once the set number of epochs is completed or the minimum is reached, the parameters are optimized and the model should give good accuracy on unseen data. You can see more about Deep Learning usages and how different AI fields are incorporated in learning (ML/DL).

Maths and stats subjects are very important; AI without them is like a human body without a soul. You can treat the mathematical subjects as pay-as-you-go: whenever a requirement comes up, pick up the subject and start working. But the subjects mentioned above are the minimum required to understand any kind of topic or concept in AI algorithms.

References:

Deep Learning

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine…

www.deeplearningbook.org

Matrix Calculus for Deep Learning: https://arxiv.org/pdf/1802.01528.pdf

Yes you should understand backprop

When we offered CS231n (Deep Learning class) at Stanford, we intentionally designed the programming assignments to…

medium.com

Artificial Intelligence: A Modern Approach by Stuart Russell, Peter Norvig.

What is Operations Research?

A comprehensive introduction to the field of Operations Research

towardsdatascience.com

Written by Shafi