
guntas-13/ML_Scratch


Linear Regression 2-dimensional

Given a set of $N$ data points $(x_i, y_i)$, the goal is to find the best linear map $f: \mathbb{R} \to \mathbb{R}$ such that $f(x) = mx + b$ fits the data points. In simpler terms, we assume the relation between the dependent variable $y$ and independent variable $x$ is linear and try to find the optimal $m$ and $b$ such that some error function is minimized.

Loss/Error Function

$$\begin{equation} E = \frac{1}{N}\sum_{i = 1}^{N}(y_i - \widehat{y}_i)^2 \end{equation}$$

where $\widehat{y}_i = mx_i + b$; hence

$$\begin{equation} E = \frac{1}{N}\sum_{i = 1}^{N}(y_i - (mx_i + b))^2 \end{equation}$$

Optimal $m$ and $b$

$$\begin{equation} \frac{\partial E}{\partial m} = -\frac{2}{N}\sum_{i = 1}^{N} x_i \left(y_i - (mx_i + b)\right) \end{equation}$$ $$\begin{equation} \frac{\partial E}{\partial b} = -\frac{2}{N}\sum_{i = 1}^{N}\left(y_i - (mx_i + b)\right) \end{equation}$$

Gradient Descent

We arrive at the desired $m$ and $b$ by iteratively updating them in the direction of steepest descent of the error function. The learning rate $L$ has to be specified.

$$\begin{equation} \bar{m} = m - L\frac{\partial E}{\partial m} \end{equation}$$ $$\begin{equation} \bar{b} = b - L\frac{\partial E}{\partial b} \end{equation}$$
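A minimal NumPy sketch of this update loop; the function name, the initial values of $m$ and $b$, the default learning rate, and the epoch count below are illustrative choices, not fixed by the derivation:

import numpy as np

def fit_line(x, y, L = 0.01, epochs = 1000):
    # Gradient descent on E(m, b) for y ≈ m*x + b
    m, b = 0.0, 0.0
    N = len(x)
    for _ in range(epochs):
        y_hat = m * x + b
        dE_dm = -(2 / N) * np.sum(x * (y - y_hat))   # ∂E/∂m
        dE_db = -(2 / N) * np.sum(y - y_hat)         # ∂E/∂b
        m -= L * dE_dm
        b -= L * dE_db
    return m, b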

Visualizing the Loss Function $E(m, b)$


Visualizing the Gradient Descent every 5 epochs

Since $E(m, b)$ has only a single global minimum (it is a paraboloid!), setting an appropriate learning rate $L$ is highly important. Too large a rate overshoots and skips past the minimum, and the iterates may or may not wobble back down into the well even if the number of epochs is increased.

Gradient Descent

For a multivariable function $F(\mathbf{v})$ in $N$ dimensions, differentiable at a point $\mathbf{v}$, $F$ decreases fastest in the direction of the negative of the gradient at that point, denoted $-\nabla F(\mathbf{v})$.

Evaluating the descent

$$\begin{equation} \mathbf{v}_{i + 1} = \mathbf{v}_{i} - L \nabla F(\mathbf{v}_{i}) \end{equation}$$

where $L \in \mathbb{R}_{+}$ is the learning rate, and

$$\mathbf{v} = \begin{bmatrix} x_1 \\\ x_2 \\\ \vdots \\\ x_N \end{bmatrix}$$

and

$$\nabla F(\mathbf{v}) = \begin{bmatrix} \frac{\partial F}{\partial x_1} \\\ \frac{\partial F}{\partial x_2} \\\ \vdots \\\ \frac{\partial F}{\partial x_N} \end{bmatrix}$$

Hence, the equation becomes:

$$\begin{equation} \begin{bmatrix} x_1^{(i + 1)} \\\ x_2^{(i + 1)} \\\ \vdots \\\ x_N^{(i + 1)} \end{bmatrix} = \begin{bmatrix} x_1^{(i)} \\\ x_2^{(i)} \\\ \vdots \\\ x_N^{(i)} \end{bmatrix} - L \begin{bmatrix}\frac{\partial F}{\partial x_1} \\\ \frac{\partial F}{\partial x_2} \\\ \vdots \\\ \frac{\partial F}{\partial x_N} \end{bmatrix} \end{equation}$$

Descent to a minimum

Notice that the decrease in $F(\mathbf{v})$ is guaranteed only down to the nearest well, which may or may not be the global minimum. We may run for a specified number of epochs or terminate once a set tolerance is reached.
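A minimal sketch of this general update loop; the function name, the default learning rate, the epoch count, and the tolerance are illustrative choices:

import numpy as np

def gradient_descent(grad_F, v0, L = 0.1, epochs = 1000, tol = 1e-8):
    # Repeatedly step against the gradient until the step becomes tiny
    v = np.asarray(v0, dtype = float)
    for _ in range(epochs):
        step = L * grad_F(v)
        v = v - step
        if np.linalg.norm(step) < tol:   # terminate at a set tolerance
            break
    return v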

2-dimensional example

Let

$$F(\mathbf{v}) = F\left(\begin{bmatrix} x \\\ y \end{bmatrix}\right) = \sin(x)\cos(y)$$

then the gradient

$$\nabla F(\mathbf{v}) = \begin{bmatrix}\frac{\partial F}{\partial x} \\\ \frac{\partial F}{\partial y} \end{bmatrix} = \begin{bmatrix} \cos(x)\cos(y) \\\ -\sin(x)\sin(y) \end{bmatrix}$$

and starting from an initial point $\mathbf{v}_0$, we may reach the nearest local minimum as:

$$\begin{equation} \begin{bmatrix} \bar x \\\ \bar y \end{bmatrix} = \begin{bmatrix} x \\\ y \end{bmatrix} - L \begin{bmatrix} \cos(x)\cos(y) \\\ -\sin(x)\sin(y) \end{bmatrix} \end{equation}$$
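Using the generic loop sketched above on this example (the starting point $\mathbf{v}_0 = (2, 1)$ is arbitrary):

grad_F = lambda v: np.array([np.cos(v[0]) * np.cos(v[1]),
                             -np.sin(v[0]) * np.sin(v[1])])
v_min = gradient_descent(grad_F, v0 = [2.0, 1.0], L = 0.1)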

$F(x, y) = \sin(x)\cos(y)$


Visualizing the Descent


Convolution

For two functions $f(x)$ and $g(x)$, the convolution $(f * g)(t)$ is defined as:

$$\begin{equation} (f * g)(t) = \int_{-\infty}^{\infty} f(\tau) \cdot g(t - \tau) \, d\tau \end{equation}$$

for discrete samples that we deal with:

$$\begin{equation} y[n] = f[n] * g[n] = \sum_{k = -\infty}^{\infty} f[k] \cdot g[n - k] \end{equation}$$

If $f$ has $N$ samples and $g$ has $M$ samples, then the convolved signal has $N + M - 1$ samples. A basic rule: "flip one of the functions, overlap it with the stationary one, multiply and add, and then traverse over."
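A quick NumPy check of this rule on two arbitrary sample arrays (np.convolve performs exactly this flip-and-slide sum):

import numpy as np

f = np.array([1, 2, 3])          # N = 3 samples
g = np.array([0, 1, 0.5, 2])     # M = 4 samples
y = np.convolve(f, g)            # flip one signal, overlap, multiply-add, traverse
print(len(y))                    # 6 = N + M - 1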

2D Convolution


Implementing the 2D Convolution

$$A = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

when zero-padded by 1 pixel gives:

$$A' = \begin{bmatrix}0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 3 & 0 \\ 0 & 4 & 5 & 6 & 0 \\ 0 & 7 & 8 & 9 & 0 \\ 0 & 0 & 0 & 0 & 0\end{bmatrix}$$
This is achieved as:

A_padded = np.pad(A, 1, mode = "constant")

Also, before proceeding with the convolution, the kernel must be flipped left-right and then upside-down:

$$ker = \begin{bmatrix}a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} ⟶ \begin{bmatrix}c & b & a \\ f & e & d \\ i & h & g \end{bmatrix} ⟶ \begin{bmatrix}i & h & g \\ f & e & d \\ c & b & a \end{bmatrix} = ker'$$

This is achieved by:

ker_flipped = np.flipud(np.fliplr(ker))

fliplr denotes a left-right flip and flipud an up-down flip. Choose a stride of $1$ and perform the convolution by taking kernel-sized chunks of the padded $A'$, multiplying them elementwise with $ker'$, and summing:

$$\begin{bmatrix}0 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 4 & 5 \end{bmatrix} \cdot \begin{bmatrix}i & h & g \\ f & e & d \\ c & b & a \end{bmatrix} = elt_1$$



$$\begin{bmatrix}0 & 0 & 0 \\ 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \cdot \begin{bmatrix}i & h & g \\ f & e & d \\ c & b & a \end{bmatrix} = elt_2$$
$$\vdots$$
$$\begin{bmatrix}5 & 6 & 0 \\ 8 & 9 & 0 \\ 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix}i & h & g \\ f & e & d \\ c & b & a \end{bmatrix} = elt_N$$



Notice the dimensions of the final output matrix:

$$\begin{equation} R_{\text{height}} = \frac{A_{\text{height}} + 2\cdot\text{padding} - ker_{\text{height}}}{\text{stride}} + 1 \end{equation}$$ $$\begin{equation} R_{\text{width}} = \frac{A_{\text{width}} + 2\cdot\text{padding} - ker_{\text{width}}}{\text{stride}} + 1 \end{equation}$$

Function for the convolution

import numpy as np

def convolve2d(image, kernel, padding, stride):
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape

    # Output dimensions from the formula above
    output_height = (image_height + 2 * padding - kernel_height) // stride + 1
    output_width = (image_width + 2 * padding - kernel_width) // stride + 1
    output = np.zeros((output_height, output_width))

    # Zero-pad the image and flip the kernel left-right, then up-down
    padded_image = np.pad(image, padding, mode = "constant")
    kernel = np.flipud(np.fliplr(kernel))

    # Slide the kernel over the padded image in steps of `stride`
    for i in range(output_height):
        for j in range(output_width):
            chunk = padded_image[i * stride : i * stride + kernel_height,
                                 j * stride : j * stride + kernel_width]
            output[i, j] = np.sum(chunk * kernel)

    return output
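As a quick check, convolving the $3 \times 3$ matrix $A$ above with an arbitrarily chosen $3 \times 3$ kernel at padding $1$ and stride $1$ gives a $3 \times 3$ result, matching the dimension formula:

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
ker = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])   # example kernel (values are arbitrary)
R = convolve2d(A, ker, padding = 1, stride = 1)
print(R.shape)                                       # (3, 3) = ((3 + 2*1 - 3) // 1 + 1, ...)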

Edge Detection using Kernels - Sobel Operators

$$\mathbf{G_x} = \begin{bmatrix}1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$$ $$\mathbf{G_y} = \begin{bmatrix}1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$

Obtain two images $A_x$ and $A_y$ for detecting vertical and horizontal edges as:

$$\mathbf{A_x} = \mathbf{G_x} * \mathbf{A} = \begin{bmatrix}1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * \mathbf{A}$$ $$\mathbf{A_y} = \mathbf{G_y} * \mathbf{A} = \begin{bmatrix}1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * \mathbf{A}$$

The final edge-detected image is obtained as:

$$\mathbf{A_{\text{sobel}}} = \sqrt{\mathbf{A_x}^2 + \mathbf{A_y}^2}$$

Original Image $(3264, 4928)$


Sobel Image $(3264, 4928)$


$A_x$ and $A_y$


Original - Grayscale - Sobel


Final wrapped-up function for edge detection

import numpy as np
import matplotlib.pyplot as plt

def edge_detect(image_org):
    padding, stride = 1, 1

    # Convert RGB to grayscale using the standard luminance weights
    rgb_weights = [0.2989, 0.5870, 0.1140]
    image = np.dot(image_org, rgb_weights)

    Gx = np.array([[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]])
    Gy = np.array([[1.0, 2.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -1.0]])

    image_height, image_width = image.shape

    output_height = (image_height + 2 * padding - 3) // stride + 1
    output_width = (image_width + 2 * padding - 3) // stride + 1
    A_sobel = np.zeros((output_height, output_width))

    padded_image = np.pad(image, padding, mode = "constant")
    Gx = np.flipud(np.fliplr(Gx))
    Gy = np.flipud(np.fliplr(Gy))

    # Gradient magnitude sqrt(Ax^2 + Ay^2) at every position
    for i in range(0, output_height, stride):
        for j in range(0, output_width, stride):
            A_sobel[i, j] = (np.sum(padded_image[i : i + 3, j : j + 3] * Gx)**2 + np.sum(padded_image[i : i + 3, j : j + 3] * Gy)**2)**0.5

    plt.imsave("Edge.jpeg", A_sobel, cmap = "gray")
    fig, ax = plt.subplots(nrows = 1, ncols = 2, figsize = (15, 15))
    ax[0].imshow(image_org)
    ax[0].set_title("Original Image")
    ax[1].imshow(A_sobel, cmap = "gray")
    ax[1].set_title("Edge-Detected")
    plt.show()
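A possible way to call it, assuming Matplotlib's plt.imread for loading (the file name below is a placeholder, not a file in the repository):

image_org = plt.imread("photo.jpeg")   # hypothetical RGB input image
edge_detect(image_org)                 # saves Edge.jpeg and shows the side-by-side comparison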

Pooling

import numpy as np

def max_pool(image, kernel_size, stride):
    image_height, image_width, channels = image.shape
    kernel_height, kernel_width = kernel_size[0], kernel_size[1]

    output_height = (image_height - kernel_height) // stride + 1
    output_width = (image_width - kernel_width) // stride + 1
    output = np.zeros((output_height, output_width, channels))

    # Take the maximum of each kernel-sized window, channel by channel
    for c in range(channels):
        for i in range(0, output_height * stride, stride):
            for j in range(0, output_width * stride, stride):
                output[i // stride, j // stride, c] = np.max(image[i : i + kernel_height, j : j + kernel_width, c])
                # Replace np.max() with np.mean() for Average Pooling
                # output[i // stride, j // stride, c] = np.mean(image[i : i + kernel_height, j : j + kernel_width, c])

    final = output.astype(np.uint8)
    return final
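Called with a $3\times3$ window and stride $3$ (as in the images below), each pass shrinks the image by roughly a factor of $3$ per side. A minimal sketch, assuming the input is loaded with plt.imread (the file name is a placeholder):

image = plt.imread("photo.jpeg")                           # hypothetical RGB input, e.g. shape (6069, 4855, 3)
pooled = max_pool(image, kernel_size = (3, 3), stride = 3)
print(image.shape, "->", pooled.shape)                     # (6069, 4855, 3) -> (2023, 1618, 3)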

Original Image $(6069, 4855)$


Max Pooled using a $3\times3$ filter and stride $= 3$; final image: $(2023, 1618)$


Further Max Pooled using a $3\times3$ filter and stride $= 3$; final image: $(674, 539)$


Average Pooled using a $3\times3$ filter and stride $= 3$; final image: $(2023, 1618)$

