
Deep Learning Fundamentals: Neural Networks Explained

Deep learning has revolutionized AI. Let's break down the fundamental concepts that power modern machine learning systems.

What is Deep Learning?

Deep learning uses artificial neural networks with multiple layers to learn hierarchical representations of data.

Neural Network Basics

The Neuron

python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum + bias
    z = np.dot(inputs, weights) + bias
    # Activation function (ReLU)
    return max(0, z)
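
For intuition, here is a minimal usage of the neuron above; the input values, weights, and bias are arbitrary placeholders chosen only for illustration:

python
# Hypothetical example values
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.1, -0.2])
bias = 0.1

print(neuron(inputs, weights, bias))  # weighted sum + bias = -0.42, so ReLU outputs 0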

Multi-Layer Network

python
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x
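
As a quick sanity check, the model can be instantiated and run on random data; the sizes below are placeholders, not recommendations:

python
import torch

# Hypothetical sizes: 4 input features, 16 hidden units, 3 output classes
model = SimpleNN(input_size=4, hidden_size=16, output_size=3)
x = torch.randn(8, 4)      # a batch of 8 random examples
logits = model(x)          # shape: (8, 3)
print(logits.shape)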

Key Concepts

1. Activation Functions

  • ReLU - Most common choice; helps mitigate vanishing gradients
  • Sigmoid - Squashes values into (0, 1), useful for binary probabilities
  • Softmax - Turns a vector of scores into a probability distribution, used for multi-class classification
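
All three are built into PyTorch; a minimal sketch applying them to a dummy tensor:

python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, 0.0, 3.0])   # arbitrary pre-activation values

print(F.relu(z))                      # negatives clipped to 0
print(torch.sigmoid(z))               # each value squashed into (0, 1)
print(F.softmax(z, dim=0))            # values normalized to sum to 1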

2. Loss Functions

  • MSE (mean squared error) - Regression tasks
  • Cross-Entropy - Classification tasks
  • Custom losses - Task-specific optimization
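
A brief sketch of the two built-in PyTorch losses; the tensors are arbitrary examples:

python
import torch
import torch.nn as nn

# MSE for regression: compare predicted values against continuous targets
mse = nn.MSELoss()
pred = torch.tensor([2.5, 0.0, 1.0])
target = torch.tensor([3.0, -0.5, 1.0])
print(mse(pred, target))

# Cross-entropy for classification: raw logits vs. integer class labels
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes
labels = torch.tensor([0, 2, 1, 2])
print(ce(logits, labels))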

3. Backpropagation

The algorithm that trains neural networks by:

  • Forward pass - compute predictions
  • Calculate loss
  • Backward pass - compute gradients
  • Update weights

python
import torch

# Training loop: assumes model, criterion, inputs, targets and
# num_epochs have already been defined
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Common Architectures

Convolutional Neural Networks (CNNs)

Perfect for image data:

python
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        # 64 * 8 * 8 assumes 32x32 inputs (two 2x2 poolings: 32 -> 16 -> 8)
        self.fc = nn.Linear(64 * 8 * 8, 10)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)
        return self.fc(x)
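
Running the model on a dummy batch makes the expected input shape explicit (3-channel, 32x32 images; the batch size is arbitrary):

python
import torch

model = SimpleCNN()
images = torch.randn(16, 3, 32, 32)   # batch of 16 random 32x32 RGB images
logits = model(images)                # shape: (16, 10)
print(logits.shape)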

Transformers

State-of-the-art for NLP:

  • Self-attention mechanism
  • Positional encoding
  • Multi-head attention
  • Feed-forward layers
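
A full transformer is beyond this post, but the core self-attention step can be sketched in a few lines; the dimensions below are arbitrary illustrative choices:

python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # Project the input into queries, keys and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product attention scores between all token pairs
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)
    # Each output token is a weighted mix of all value vectors
    return weights @ v

# Illustrative sizes: a sequence of 5 tokens with 8-dimensional embeddings
d = 8
x = torch.randn(5, d)
w_q, w_k, w_v = torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)
print(self_attention(x, w_q, w_k, w_v).shape)   # (5, 8)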

Training Best Practices

  • Data Preprocessing
    - Normalization
    - Augmentation
    - Train/val/test splits
  • Regularization
    - Dropout
    - L1/L2 regularization
    - Batch normalization
  • Optimization
    - Adam optimizer (good default)
    - Learning rate scheduling
    - Gradient clipping
  • Monitoring
    - Training vs. validation loss
    - Early stopping
    - TensorBoard visualization
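
Several of these practices show up directly as a few extra lines in a training setup. A rough sketch, with all hyperparameter values chosen arbitrarily for illustration:

python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),              # dropout for regularization
    nn.Linear(64, 2),
)

# weight_decay adds L2 regularization; lr and decay values are placeholders
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... forward pass, loss and loss.backward() as in the training loop above ...
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
    optimizer.step()
    scheduler.step()                # learning rate scheduling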

Common Pitfalls

  • Overfitting - Model memorizes training data
  • Vanishing gradients - Use ReLU, batch norm
  • Poor initialization - Use Xavier/He initialization
  • Learning rate - Too high → divergence, too low → slow training
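
Initialization is easy to get right in PyTorch; a minimal sketch applying He (Kaiming) initialization to the linear layers of a model, with made-up layer sizes:

python
import torch.nn as nn

def init_weights(module):
    # He (Kaiming) initialization suits ReLU networks; Xavier suits tanh/sigmoid
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
model.apply(init_weights)   # applies init_weights to every submodule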

Conclusion

Deep learning is powerful, but it rewards a solid grasp of the fundamentals. Start with simple architectures, understand the math, and gradually tackle more complex problems.
