Deep Learning Fundamentals: Neural Networks Explained
Deep learning has revolutionized AI. Let's break down the fundamental concepts that power modern machine learning systems.
What is Deep Learning?
Deep learning uses artificial neural networks with multiple layers to learn hierarchical representations of data.
Neural Network Basics
The Neuron
```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum + bias
    z = np.dot(inputs, weights) + bias
    # Activation function (ReLU); np.maximum also works element-wise on a batch
    return np.maximum(0, z)
```
Multi-Layer Network
```python
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x
```
Key Concepts
1. Activation Functions
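- ReLU - Most common default for hidden layers; its gradient is 1 for positive inputs, which helps mitigate vanishing gradients
- Sigmoid - Outputs values in (0, 1), suited to binary probabilities
- Softmax - Turns a vector of scores into a probability distribution for multi-class classification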
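The three activations can be written in a few lines of NumPy. This is a minimal sketch for intuition; the function names are illustrative, and frameworks such as PyTorch ship their own versions. The softmax subtracts the maximum score first, which leaves the result unchanged but avoids overflow in `exp` for large inputs.

```python
import numpy as np

def relu(z):
    # max(0, z) element-wise; gradient is 1 for z > 0 and 0 otherwise
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # Shift by the max for numerical stability, then normalize so each row sums to 1
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(sigmoid(0.0))  # 0.5
print(softmax(z))    # three probabilities, largest for the score 3.0
```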
2. Loss Functions
- MSE (mean squared error) - Regression tasks
- Cross-Entropy - Classification tasks
- Custom losses - Designed around task-specific objectives
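Both standard losses are short formulas. The NumPy sketch below is illustrative (the names `mse` and `cross_entropy` are our own); like PyTorch's `nn.CrossEntropyLoss`, the cross-entropy here takes raw scores (logits) and integer class labels, and computes the log-softmax stably with the log-sum-exp trick.

```python
import numpy as np

def mse(predictions, targets):
    # Mean of squared differences, typical for regression
    return np.mean((predictions - targets) ** 2)

def cross_entropy(logits, labels):
    # logits: (batch, num_classes) raw scores; labels: (batch,) integer class ids
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the correct class, averaged over the batch
    return -np.mean(log_probs[np.arange(len(labels)), labels])

print(mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))  # 2.0
print(cross_entropy(np.array([[0.0, 0.0]]), np.array([0])))  # log(2), about 0.693
```

A confident, correct prediction drives the cross-entropy toward 0, while a uniform guess over two classes costs log(2).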
3. Backpropagation
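Backpropagation applies the chain rule to compute the gradient of the loss with respect to every weight, working backward from the output layer. Each training step combines it with an optimizer:
- Forward pass - compute predictions
- Loss - compare predictions with the targets
- Backward pass - compute gradients via backpropagation
- Update weights - the optimizer adjusts each parameter using its gradient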
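To see what the backward pass actually computes, here is the chain rule worked by hand for a single sigmoid neuron with a squared-error loss. It is a minimal sketch on made-up random data, not how frameworks implement autograd; the finite-difference check confirms the hand-derived gradient, and the last lines perform one plain gradient-descent update.

```python
import numpy as np

# Hypothetical toy data: one neuron y_hat = sigmoid(w . x + b), loss = (y_hat - y)^2
rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = 1.0
w = rng.normal(size=3)
b = 0.1

def forward(w, b):
    z = np.dot(w, x) + b
    y_hat = 1.0 / (1.0 + np.exp(-z))
    loss = (y_hat - y) ** 2
    return z, y_hat, loss

# Forward pass
z, y_hat, loss = forward(w, b)

# Backward pass: chain rule from the loss back to the parameters
dloss_dyhat = 2.0 * (y_hat - y)
dyhat_dz = y_hat * (1.0 - y_hat)  # derivative of the sigmoid
dloss_dz = dloss_dyhat * dyhat_dz
grad_w = dloss_dz * x             # dz/dw = x
grad_b = dloss_dz                 # dz/db = 1

# Sanity check: central finite difference for w[0]
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0] += eps
w_minus[0] -= eps
numeric = (forward(w_plus, b)[2] - forward(w_minus, b)[2]) / (2 * eps)
print(grad_w[0], numeric)  # the two values should agree closely

# Weight update: one step of plain gradient descent
lr = 0.1
w = w - lr * grad_w
b = b - lr * grad_b
```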
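```python
import torch
import torch.nn as nn

# Illustrative setup: SimpleNN from above on random regression data
model = SimpleNN(input_size=10, hidden_size=32, output_size=1)
criterion = nn.MSELoss()
inputs, targets = torch.randn(64, 10), torch.randn(64, 1)
num_epochs = 100
# Training loop
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    # Backward pass: clear old gradients, then backpropagate
    optimizer.zero_grad()
    loss.backward()
    # Update weights
    optimizer.step()
```
Common Architectures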
Convolutional Neural Networks (CNNs)
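Well suited to image data, because each convolution reuses the same small filters at every position in the image. The layer sizes below assume 3-channel 32×32 inputs (e.g. CIFAR-10) and 10 output classes:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)   # 3 -> 32 channels, size unchanged
        self.pool = nn.MaxPool2d(2, 2)                # halves height and width
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)  # 32 -> 64 channels
        self.fc = nn.Linear(64 * 8 * 8, 10)           # 32x32 -> 16x16 -> 8x8 after two pools

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        return self.fc(x)
```
Transformers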
The dominant architecture in modern NLP, built from:
- Self-attention mechanism
- Positional encoding
- Multi-head attention
- Feed-forward layers
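The core of a Transformer layer is scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, usually combined with positional encodings because attention alone ignores word order. The NumPy sketch below shows a single attention head plus the sinusoidal encoding from the original Transformer paper; the helper names, sizes, and random weights are illustrative assumptions, not a production implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model); W_*: (d_model, d_k) projection matrices
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Every position scores every other position; scaling keeps scores moderate
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

def sinusoidal_positional_encoding(seq_len, d_model):
    # Sine on even dimensions, cosine on odd dimensions (d_model assumed even)
    positions = np.arange(seq_len)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model)) + sinusoidal_positional_encoding(seq_len, d_model)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Multi-head attention runs several such heads in parallel, each with its own projections, and concatenates their outputs; PyTorch provides this as `nn.MultiheadAttention`.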
Training Best Practices
- Data Preprocessing
  - Normalization
  - Augmentation
  - Train/val/test splits
- Regularization
  - Dropout
  - L1/L2 regularization
  - Batch normalization
- Optimization
  - Adam optimizer (a good default)
  - Learning rate scheduling
  - Gradient clipping
- Monitoring
  - Training vs validation loss
  - Early stopping
  - TensorBoard visualization
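Two of these practices are easy to sketch framework-free. Gradient clipping by global norm rescales all gradients together when their combined L2 norm exceeds a threshold (preserving their direction), and early stopping halts training once validation loss stops improving for a set number of epochs. The helpers and the validation-loss sequence below are hypothetical, for illustration only.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # grads: list of gradient arrays; rescale all of them if the combined norm is too large
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

class EarlyStopping:
    # Signal a stop when validation loss has not improved by at least min_delta
    # for `patience` consecutive epochs
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means "stop training"

# Made-up validation losses: improvement stalls after the third epoch
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.72, 0.75, 0.6]):
    if stopper.step(val_loss):
        print(f"stopping at epoch {epoch}, best val loss {stopper.best}")
        break
```

In PyTorch, `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` performs the same clipping in place and is typically called between `loss.backward()` and `optimizer.step()`.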
Common Pitfalls
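- Overfitting - Model memorizes the training data and generalizes poorly; validation loss rises while training loss keeps falling
- Vanishing gradients - Gradients shrink toward zero in early layers; mitigate with ReLU and batch normalization
- Poor initialization - Use Xavier (Glorot) initialization for tanh/sigmoid layers and He initialization for ReLU layers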
- Learning rate - Too high → divergence, too low → slow training
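Initialization and vanishing signals are closely linked. The sketch below pushes a random batch through 20 ReLU layers and records the final activation scale under three schemes: He normal, Xavier uniform, and a deliberately poor fixed standard deviation of 0.01. The depth, width, and helper names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier: U(-a, a) with a = sqrt(6 / (fan_in + fan_out)); aimed at tanh/sigmoid layers
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # He/Kaiming: N(0, 2 / fan_in); the factor 2 offsets ReLU zeroing about half its inputs
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def tiny_normal(fan_in, fan_out):
    # Poor choice for comparison: fixed std of 0.01 regardless of layer size
    return rng.normal(0.0, 0.01, size=(fan_in, fan_out))

# One random batch through 20 ReLU layers of width 512
x = rng.normal(size=(256, 512))
stds = {}
for name, init in [("he", he_normal), ("xavier", xavier_uniform), ("tiny", tiny_normal)]:
    h = x
    for _ in range(20):
        h = np.maximum(0.0, h @ init(512, 512))
    stds[name] = h.std()
print(stds)
```

With He initialization the activations keep a roughly constant scale; Xavier halves their mean square at every ReLU layer, and the fixed 0.01 scheme collapses them almost to zero, starving earlier layers of gradient. PyTorch offers both schemes as `torch.nn.init.kaiming_normal_` and `torch.nn.init.xavier_uniform_`.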
Conclusion
Deep learning is powerful, but using it well requires understanding the fundamentals. Start with simple architectures, understand the math, and gradually tackle more complex problems.