2.1 Artificial neurons and network architecture

4 min read • July 25, 2024

Artificial neurons are the building blocks of neural networks, mimicking biological neurons to process information. They consist of inputs, weights, summation and activation functions, and outputs, working together to simulate the flow of information in neural networks.

Neural network architecture organizes neurons into layers, including input, hidden, and output layers. Different types serve specific purposes, and network depth and width determine model complexity. Common architectures like feedforward, CNNs, and RNNs are designed for various tasks.

Artificial Neuron Structure and Function

Structure of artificial neurons

  • Artificial neuron components mimic how biological neurons process information

    • Inputs receive data from external sources or other neurons (sensor readings, outputs from previous layers)
    • Weights determine the importance of each input and are adjustable during training
    • Summation function combines the weighted inputs, aggregating information
    • Activation function determines the neuron's output based on the summation, introducing non-linearity
    • Output produces the final result transmitted to the next layer or as the network output
  • Neuron operation process simulates information flow in biological neural networks

    1. Receive input signals from connected neurons or external sources
    2. Multiply inputs by corresponding weights to emphasize important features
    3. Sum weighted inputs to consolidate information
    4. Apply activation function to introduce non-linearity and bound output
    5. Produce output for next layer or as final network result
  • Mathematical representation formalizes neuron computation (see the sketch after this list)

    • Output = $f(\sum_{i=1}^{n} w_i x_i + b)$ encapsulates the entire neuron operation
      • $f$: activation function (sigmoid, ReLU)
      • $w_i$: weights learned during training
      • $x_i$: inputs from previous layer or external data
      • $b$: bias term, allows shifting the activation function
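
To make the computation concrete, here is a minimal Python sketch of a single artificial neuron; the input values, weights, and bias are arbitrary illustrative numbers, not taken from the text:

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Compute f(sum_i w_i * x_i + b) for a single artificial neuron."""
    z = np.dot(w, x) + b      # weighted sum of inputs plus bias
    return activation(z)      # activation introduces non-linearity

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # example inputs (arbitrary)
w = np.array([0.8, 0.1, -0.4])   # example weights (would be learned)
b = 0.2                          # example bias
print(neuron_output(x, w, b, sigmoid))  # a single scalar in (0, 1)
```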

Architecture of neural networks

  • Neural network layers organize neurons for efficient information processing
    • Input layer receives initial data (image pixels, text embeddings)
    • Hidden layers process information between input and output, extracting features
    • Output layer produces final network results (class probabilities, regression values)
  • Layer types serve different purposes in network architecture
    • Fully connected (dense) layers connect all neurons between adjacent layers
    • Convolutional layers apply filters for feature extraction (edge detection, texture analysis)
    • Recurrent layers process sequential data with memory (time series, natural language)
  • Network depth and width determine model complexity and capacity
    • Depth (number of hidden layers) affects the abstraction level of learned features
    • Width (number of neurons in each layer) influences information capacity
  • Common architectures designed for specific tasks
    • Feedforward neural networks for general-purpose tasks (classification, regression)
    • Convolutional neural networks (CNNs) for image and spatial data processing
    • Recurrent neural networks (RNNs) for sequential data and time series analysis
  • Information flow describes data propagation through network
    • Forward propagation moves data from input to output during inference
    • Backward propagation transmits error signals from output to input during training, updating the weights (forward pass sketched below)
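
As a rough illustration of how layers compose during forward propagation, the sketch below wires up a small fully connected feedforward network in NumPy; the layer sizes (4 inputs, 8 hidden units, 3 outputs) and random weights are placeholder assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Placeholder layer sizes: 4 inputs -> 8 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def forward(x):
    h = relu(W1 @ x + b1)   # hidden layer: weighted sum + bias, then ReLU
    return W2 @ h + b2      # output layer: raw scores (logits)

x = rng.normal(size=4)      # one example input
print(forward(x))           # three output values
```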

Types of activation functions

  • Sigmoid function introduces non-linearity and bounds output
    • Formula: $f(x) = \frac{1}{1 + e^{-x}}$ maps input to the (0, 1) range
    • Output range: (0, 1) useful for binary classification
    • Properties: Smooth, differentiable, suffers from vanishing gradient problem
  • Hyperbolic tangent (tanh) function provides zero-centered alternative to sigmoid
    • Formula: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ maps input to the (-1, 1) range
    • Output range: (-1, 1) allows for both positive and negative outputs
    • Properties: Zero-centered, steeper gradient than sigmoid, still prone to vanishing gradient
  • Rectified Linear Unit (ReLU) addresses vanishing gradient problem
    • Formula: $f(x) = \max(0, x)$ allows positive values to pass through unchanged
    • Output range: [0, ∞) introduces sparsity in activations
    • Properties: Non-linear, computationally efficient, helps mitigate vanishing gradient
  • Leaky ReLU allows small negative values to pass through
    • Formula: $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant (e.g., 0.01)
    • Properties: Allows small negative values to pass through, addresses the dying ReLU problem by preventing neurons from becoming permanently inactive
  • Softmax function used for multi-class classification problems
    • Formula: $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ outputs probabilities that sum to 1 across all classes
    • Useful for the final layer in classification networks (image recognition, sentiment analysis); see the sketch below for implementations of all five functions
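
For reference, here is a self-contained Python sketch of the five activation functions discussed above; the sample input vector is arbitrary, and the softmax subtracts the maximum before exponentiating, a standard numerical-stability step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # output in (0, 1)

def tanh(z):
    return np.tanh(z)                       # output in (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)               # output in [0, inf), sparse

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)    # small slope for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))               # shift by max for stability
    return e / e.sum()                      # probabilities summing to 1

z = np.array([-2.0, 0.0, 3.0])              # arbitrary sample inputs
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), softmax(z), sep="\n")
```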

Role of bias in neurons

  • Bias definition adds flexibility to neuron's decision-making
    • Constant term added to weighted sum of inputs
    • Allows shifting the activation function left or right, adjusting the neuron's sensitivity
  • Purpose of bias increases model flexibility and expressiveness
    • Enables neuron to fire even when all inputs are zero
    • Helps capture underlying patterns in data not represented by input features alone
  • Effect on decision boundary influences neuron's classification behavior
    • Bias term moves decision boundary without changing its orientation
    • Allows fine-tuning of decision regions in feature space
  • Training bias learned during network optimization
    • Adjusted along with the weights during backpropagation
    • Helps network fit underlying data distribution more accurately
  • Bias-variance tradeoff balances model complexity and generalization
    • High bias: Underfitting, oversimplified model fails to capture important patterns
    • Low bias: Risk of overfitting, complex model may memorize training data
  • Implementation techniques for incorporating bias in neurons
    • Often implemented as an extra input with a constant value of 1 (see the sketch below)
    • The weight connected to this input serves as the bias term, allowing a unified weight update process
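
A minimal sketch of this implementation trick, with arbitrary example values: appending a constant 1 to the input vector and folding the bias into the weight vector yields the same weighted sum, so a single update rule covers both.

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # original inputs (arbitrary values)
w = np.array([0.8, 0.1, -0.4])   # weights (arbitrary values)
b = 0.2                          # explicit bias term

# Equivalent formulation: append a constant input of 1 and fold the bias
# into the weight vector, so one update rule covers weights and bias alike.
x_aug = np.append(x, 1.0)
w_aug = np.append(w, b)

assert np.isclose(np.dot(w, x) + b, np.dot(w_aug, x_aug))
print(np.dot(w_aug, x_aug))      # same weighted sum either way
```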

Key Terms to Review (18)

Accuracy: Accuracy refers to the measure of how often a model makes correct predictions compared to the total number of predictions made. It is a key performance metric that indicates the effectiveness of a model in classification tasks, impacting how well the model can generalize to unseen data and its overall reliability.
Activation Function: An activation function is a mathematical operation applied to the output of a neuron in a neural network that determines whether the neuron should be activated or not. It plays a critical role in introducing non-linearity into the model, allowing the network to learn complex patterns and relationships in the data.
Autoencoder: An autoencoder is a type of artificial neural network used to learn efficient representations of data, typically for the purpose of dimensionality reduction or feature learning. It consists of two main parts: the encoder, which compresses the input into a lower-dimensional representation, and the decoder, which reconstructs the input from that representation. This architecture allows autoencoders to capture essential features of the data while minimizing reconstruction error.
Backpropagation: Backpropagation is an algorithm used for training artificial neural networks by calculating the gradient of the loss function with respect to each weight through the chain rule. This method allows the network to adjust its weights in the opposite direction of the gradient to minimize the loss, making it a crucial component in optimizing neural networks.
Batch size: Batch size refers to the number of training examples utilized in one iteration of model training. This concept is crucial as it directly impacts how models learn from data and influences the overall efficiency of the training process. The choice of batch size affects memory usage, the stability of gradient updates, and ultimately, the performance of the model during and after training.
Bias: Bias refers to a systematic error in data processing or decision-making that can lead to unfair outcomes or misrepresentations. In the context of artificial intelligence and machine learning, bias can emerge from the data used to train models or the design of algorithms, affecting the performance and fairness of AI systems. Understanding bias is crucial as it impacts both the technical aspects of model training and the ethical considerations related to AI deployment and decision-making.
Convolutional Neural Network: A Convolutional Neural Network (CNN) is a type of deep learning model specifically designed for processing structured grid data, such as images. CNNs utilize layers of convolutional filters to automatically detect features and patterns, making them particularly effective for tasks like image recognition and classification. The architecture of CNNs often includes pooling layers and fully connected layers, allowing them to capture spatial hierarchies in data while reducing dimensionality and improving computational efficiency.
Epoch: An epoch is a complete pass through the entire training dataset during the training process of a machine learning model. Each epoch allows the model to learn from the data, update weights, and refine its understanding of patterns, which is essential for effective training. The number of epochs can significantly impact the model's performance, where too few epochs might lead to underfitting and too many can cause overfitting.
Feedforward Neural Network: A feedforward neural network is a type of artificial neural network where connections between the nodes do not form cycles. In this architecture, information moves in one direction—from input nodes, through hidden nodes, and finally to output nodes. This structure allows for straightforward data processing and is foundational in understanding how more complex networks function.
Gradient descent: Gradient descent is an optimization algorithm used to minimize the loss function in machine learning models by iteratively adjusting the parameters in the direction of the steepest descent of the loss function. This method is essential for training models, as it helps find the optimal weights that reduce prediction errors over time.
Input layer: The input layer is the first layer in a neural network where data enters the model for processing. It serves as the bridge between the raw input data and the subsequent layers, ensuring that the information is appropriately formatted for further computations. The input layer plays a crucial role in determining how data is presented to the network, influencing the performance of the entire model.
Layer: In the context of artificial neural networks, a layer is a collection of artificial neurons that process input data and pass their output to subsequent layers. Layers are fundamental to the architecture of neural networks, influencing how data flows and how information is transformed at each step of processing. Different types of layers, such as input, hidden, and output layers, work together to enable the network to learn complex patterns in data.
Loss function: A loss function is a mathematical representation that quantifies how well a model's predictions align with the actual target values. It serves as a guiding metric during training, allowing the optimization algorithm to adjust the model parameters to minimize prediction errors, thus improving performance.
Node: A node is a fundamental building block in artificial neural networks, representing an artificial neuron that processes input data and generates output. Each node receives signals from other nodes, applies a mathematical function to these inputs, and then transmits the result to subsequent nodes in the network. This structure enables complex computations and learning by mimicking how biological neurons communicate.
Output layer: The output layer is the final layer in a neural network that produces the predicted output for a given input, transforming the learned features from previous layers into a usable format. This layer directly influences the final prediction of the model, whether it be a classification label or a continuous value, making it essential for task-specific performance. Its structure and activation functions are critical as they determine how the information from preceding layers is interpreted and transformed into actionable results.
Overfitting: Overfitting occurs when a machine learning model learns not only the underlying patterns in the training data but also the noise, resulting in a model that performs well on training data but poorly on unseen data. This is a significant challenge in deep learning as it can lead to poor generalization, where the model fails to make accurate predictions on new data.
Recurrent Neural Network: A recurrent neural network (RNN) is a class of neural networks designed to recognize patterns in sequences of data, such as time series or natural language. Unlike traditional feedforward neural networks, RNNs maintain a form of memory by using loops within their architecture, allowing them to process input sequences of varying lengths and capture temporal dependencies between data points. This makes them particularly powerful for tasks involving sequential data, bridging concepts like artificial neurons and network architecture, dynamic computation graphs, and the implementation and evaluation of deep learning models.
Regularization: Regularization is a set of techniques used in machine learning to prevent overfitting by introducing additional information or constraints into the model. By penalizing overly complex models or adjusting the training process, regularization encourages simpler models that generalize better to unseen data. It’s essential for improving performance and reliability in various neural network architectures and loss functions.