Deep Learning Systems

2.1 Artificial neurons and network architecture

Artificial neurons are the building blocks of neural networks, mimicking biological neurons to process information. They consist of inputs, weights, summation and activation functions, and outputs, working together to simulate the flow of information in neural networks.

Neural network architecture organizes neurons into layers, including input, hidden, and output layers. Different layer types serve specific purposes, and network depth and width determine model complexity. Common architectures like feedforward, CNNs, and RNNs are designed for various tasks.

Artificial Neuron Structure and Function

Structure of artificial neurons

  • Artificial neuron components mimic how biological neurons process information

    • Inputs receive data from external sources or other neurons (sensor readings, outputs from previous layers)
    • Weights determine the importance of each input and are adjusted during training
    • Summation function combines the weighted inputs to aggregate information
    • Activation function determines the neuron's output from the summation, introducing non-linearity
    • Output carries the final result to the next layer or serves as the network output
  • Neuron operation simulates information flow in biological neural networks

    1. Receive input signals from connected neurons or external sources
    2. Multiply inputs by corresponding weights to emphasize important features
    3. Sum weighted inputs to consolidate information
    4. Apply activation function to introduce non-linearity and bound output
    5. Produce output for next layer or as final network result
  • Mathematical representation formalizes the neuron computation (see the code sketch after this list)

    • Output = $f(\sum_{i=1}^n w_i x_i + b)$ encapsulates entire neuron operation
      • $f$ activation function (sigmoid, ReLU)
      • $w_i$ weights learned during training
      • $x_i$ inputs from previous layer or external data
      • $b$ bias term allows shifting activation function
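As a minimal sketch (not from the original notes), the formula above can be written as a single Python function; the inputs, weights, and bias below are arbitrary example values, and sigmoid is just one possible choice of $f$.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation=sigmoid):
    """Compute f(sum_i w_i * x_i + b) for a single artificial neuron."""
    z = np.dot(w, x) + b          # weighted sum of inputs plus bias
    return activation(z)          # non-linearity bounds the output

# Illustrative values only: 3 inputs with hand-picked weights and bias
x = np.array([0.5, -1.2, 3.0])    # inputs from previous layer or external data
w = np.array([0.4, 0.7, -0.2])    # weights learned during training
b = 0.1                           # bias shifts the activation function
print(neuron_output(x, w, b))     # a single value in (0, 1)
```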

Architecture of neural networks

  • Neural network layers organize neurons for efficient information processing
    • Input layer receives initial data (image pixels, text embeddings)
    • Hidden layers process information between input and output, extracting features
    • Output layer produces final network results (class probabilities, regression values)
  • Layer types serve different purposes in network architecture
    • Fully connected (dense) layers connect all neurons between adjacent layers
    • Convolutional layers apply filters for feature extraction (edge detection, texture analysis)
    • Recurrent layers process sequential data with memory (time series, natural language)
  • Network depth and width determine model complexity and capacity
    • Depth (number of hidden layers) affects the abstraction level of learned features
    • Width (number of neurons in each layer) influences information capacity
  • Common architectures designed for specific tasks
    • Feedforward neural networks for general-purpose tasks (classification, regression)
    • Convolutional neural networks (CNNs) for image and spatial data processing
    • Recurrent neural networks (RNNs) for sequential data and time series analysis
  • Information flow describes data propagation through the network (see the sketch after this list)
    • Forward propagation moves data from input to output during inference
    • Backward propagation transmits error signals from output to input during training to update weights
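As a minimal sketch of the layer organization and forward propagation described above, the code below builds a tiny fully connected network in NumPy. The layer sizes and random weights are illustrative assumptions; in practice the weights would be learned via backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())       # subtract max for numerical stability
    return e / e.sum()

# Assumed sizes for illustration: 4 input features, 8 hidden units, 3 output classes
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input -> hidden (fully connected)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # hidden -> output (fully connected)

def forward(x):
    """Forward propagation: data flows from input through hidden to output."""
    h = relu(W1 @ x + b1)         # hidden layer extracts features
    return softmax(W2 @ h + b2)   # output layer produces class probabilities

x = rng.normal(size=4)            # one example input vector
print(forward(x))                 # three probabilities summing to 1
```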

Types of activation functions

  • Sigmoid function introduces non-linearity and bounds output
    • Formula: $f(x) = \frac{1}{1 + e^{-x}}$ maps input to (0, 1) range
    • Output range: (0, 1) useful for binary classification
    • Properties: Smooth, differentiable, suffers from vanishing gradient problem
  • Hyperbolic tangent (tanh) function provides zero-centered alternative to sigmoid
    • Formula: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ maps input to (-1, 1) range
    • Output range: (-1, 1) allows for both positive and negative outputs
    • Properties: Zero-centered, steeper gradient than sigmoid, still prone to vanishing gradient
  • Rectified Linear Unit (ReLU) addresses vanishing gradient problem
    • Formula: $f(x) = \max(0, x)$ allows positive values to pass through unchanged
    • Output range: [0, ∞) introduces sparsity in activations
    • Properties: Non-linear, computationally efficient, helps mitigate vanishing gradient
  • Leaky ReLU allows small negative values to pass through
    • Formula: $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant (e.g., 0.01)
    • Properties: Allows small negative values through, addressing the dying ReLU problem by preventing neurons from becoming permanently inactive
  • Softmax function used for multi-class classification problems (see the sketch after this list)
    • Formula: $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ converts raw scores into a probability distribution
    • Outputs probabilities that sum to 1 across all classes
    • Useful for final layer in classification networks (image recognition, sentiment analysis)
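A minimal NumPy sketch of the activation functions listed above; the implementations follow the formulas given in this list, with $\alpha = 0.01$ for Leaky ReLU, and the test vector is an arbitrary example.

```python
import numpy as np

def sigmoid(x):
    """(0, 1) output; smooth but prone to vanishing gradients."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """(-1, 1) output; zero-centered alternative to sigmoid."""
    return np.tanh(x)

def relu(x):
    """[0, inf) output; sparse activations, cheap to compute."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Lets a small negative slope through to avoid dead neurons."""
    return np.maximum(alpha * x, x)

def softmax(x):
    """Probabilities over classes that sum to 1 (max subtracted for stability)."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
for f in (sigmoid, tanh, relu, leaky_relu, softmax):
    print(f.__name__, f(x))
```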

Role of bias in neurons

  • Bias definition adds flexibility to neuron's decision-making
    • Constant term added to weighted sum of inputs
    • Allows shifting the activation function left or right, adjusting the neuron's sensitivity
  • Purpose of bias increases model flexibility and expressiveness
    • Enables neuron to fire even when all inputs are zero
    • Helps capture underlying patterns in data not represented by input features alone
  • Effect on decision boundary influences neuron's classification behavior
    • Bias term moves decision boundary without changing its orientation
    • Allows fine-tuning of decision regions in feature space
  • Training bias learned during network optimization
    • Adjusted along with weights during backpropagation
    • Helps network fit underlying data distribution more accurately
  • Bias-variance tradeoff (a statistical concept distinct from the neuron's bias term) balances model complexity and generalization
    • High bias: Underfitting, oversimplified model fails to capture important patterns
    • Low bias: Risk of overfitting, complex model may memorize training data
  • Implementation techniques for incorporating bias in neurons
    • Often implemented as extra input with constant value of 1
    • The weight connected to this input serves as the bias term, allowing a unified weight-update process (see the sketch below)
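A minimal sketch of the implementation trick above: appending a constant input of 1 and treating its weight as the bias gives exactly the same weighted sum as adding a separate bias term, so a single update rule can cover weights and bias. The numbers are arbitrary examples.

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])     # original inputs
w = np.array([0.4, 0.7, -0.2])     # weights
b = 0.1                            # bias term

# Explicit bias: weighted sum plus a separate constant
z_explicit = np.dot(w, x) + b

# Bias folded in as an extra input fixed at 1, with its weight acting as the bias
x_aug = np.append(x, 1.0)          # [x1, x2, x3, 1]
w_aug = np.append(w, b)            # [w1, w2, w3, b]
z_folded = np.dot(w_aug, x_aug)

print(z_explicit, z_folded)        # identical values, so one update rule covers both
```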