Light

Formal Language Theory

🔤Formal Language Theory Unit 2 – Regular Languages & Finite Automata

Regular languages and finite automata form the foundation of formal language theory. They provide a simple yet powerful framework for describing patterns in strings, using concepts like alphabets, strings, and languages. Deterministic and non-deterministic finite automata are key tools for recognizing regular languages. These abstract machines, along with regular expressions, offer equivalent ways to represent and manipulate regular languages, with applications in various fields of computer science.

Study Guides for Unit 2

2.1

Deterministic finite automata (DFA)

6 min read

2.2

Nondeterministic finite automata (NFA)

6 min read

2.3

Regular expressions

4 min read

2.4

Equivalence of DFAs, NFAs, and regular expressions

4 min read

2.5

Pumping lemma for regular languages

5 min read

2.6

Closure properties of regular languages

4 min read

2.7

Minimization of finite automata

5 min read

Key Concepts and Definitions

Alphabet ( $\Sigma$ ) finite, non-empty set of symbols used to form strings and languages
String finite sequence of symbols from an alphabet
Language ( $L$ $L$ ) set of strings over a given alphabet
- Can be finite or infinite
- Denoted as $L \subseteq \Sigma^*$ , where $\Sigma^*$ represents the set of all possible strings over the alphabet $\Sigma$
Empty string ( $\epsilon$ ) string with no symbols, included in every language
Regular language language that can be described by a regular expression or accepted by a finite automaton
Deterministic Finite Automaton (DFA) finite state machine that accepts or rejects strings, transitioning deterministically between states
Non-deterministic Finite Automaton (NFA) finite state machine that can have multiple transitions for the same input, and can transition on $\epsilon$

Regular Languages: Basics and Properties

Regular languages are the simplest class of languages in the Chomsky hierarchy
Can be described using regular expressions or accepted by finite automata (DFAs or NFAs)
Closed under various operations, such as union, intersection, concatenation, and Kleene star
- If $L_1$ and $L_2$ are regular languages, then $L_1 \cup L_2$ , $L_1 \cap L_2$ , $L_1 \cdot L_2$ , and $L_1^*$ are also regular languages
Have a finite number of equivalence classes under the Myhill-Nerode relation
Pumpable property if a language is regular, then there exists a constant $p$ (pumping length) such that any string in the language longer than $p$ can be pumped
Not all languages are regular (e.g., $\{a^nb^n \mid n \geq 0\}$ , the language of balanced parentheses)

Finite Automata: Structure and Types

Finite automata are abstract machines that accept or reject strings
Consist of a finite set of states, a transition function, a start state, and a set of accepting states
Two main types Deterministic Finite Automata (DFAs) and Non-deterministic Finite Automata (NFAs)
- DFAs have a unique transition for each input symbol and state
- NFAs can have multiple transitions for the same input symbol and state, and can also have $\epsilon$ -transitions
Formally, a DFA is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ $(Q, Σ, δ, q_{0}, F)$ , where
- $Q$ is the finite set of states
- $\Sigma$ is the input alphabet
- $\delta$ is the transition function $Q \times \Sigma \rightarrow Q$
- $q_0$ is the start state
- $F \subseteq Q$ is the set of accepting states
NFAs have a similar structure but with a transition function $\delta Q \times (\Sigma \cup \{\epsilon\}) \rightarrow \mathcal{P}(Q)$ , where $\mathcal{P}(Q)$ denotes the power set of $Q$
Every NFA can be converted to an equivalent DFA using the subset construction algorithm

Regular Expressions and Their Equivalence

Regular expressions are a concise way to describe regular languages
Consist of symbols from the alphabet, along with operators union ( $|$ $∣$ ), concatenation ( $\cdot$ $\cdot$ ), and Kleene star ( $*$ $*$ )
- Union $L_1 | L_2$ represents the set of strings that are in either $L_1$ or $L_2$
- Concatenation $L_1 \cdot L_2$ represents the set of strings obtained by concatenating a string from $L_1$ with a string from $L_2$
- Kleene star $L^*$ represents the set of strings obtained by concatenating zero or more strings from $L$
Regular expressions can also include parentheses for grouping and the empty string $\epsilon$
Equivalent in power to finite automata (DFAs and NFAs)
- For every regular expression, there exists an equivalent NFA
- For every NFA, there exists an equivalent regular expression
Can be converted to finite automata using algorithms like Thompson's construction or the state elimination method

Closure Properties of Regular Languages

Regular languages are closed under various operations, meaning that applying these operations to regular languages results in another regular language
Closed under union if $L_1$ and $L_2$ are regular languages, then $L_1 \cup L_2$ is also a regular language
Closed under intersection if $L_1$ and $L_2$ are regular languages, then $L_1 \cap L_2$ is also a regular language
Closed under concatenation if $L_1$ and $L_2$ are regular languages, then $L_1 \cdot L_2$ is also a regular language
Closed under Kleene star if $L$ is a regular language, then $L^*$ is also a regular language
Closed under complement if $L$ $L$ is a regular language, then its complement $\overline{L}$ $\overline{L}$ is also a regular language
- The complement of a language $L$ is the set of all strings over the alphabet that are not in $L$
These closure properties allow for the construction of new regular languages from existing ones and are useful in proving the regularity of languages

Pumping Lemma and Its Applications

The Pumping Lemma is a powerful tool for proving that certain languages are not regular
States that for every regular language $L$ $L$ , there exists a constant $p$ $p$ (the pumping length) such that any string $s \in L$ $s \in L$ with $|s| \geq p$ $∣ s ∣ \geq p$ can be decomposed into three parts $s = xyz$ $s = x yz$ , satisfying
- $|xy| \leq p$
- $|y| \geq 1$
- $xy^iz \in L$ for all $i \geq 0$
To use the Pumping Lemma to prove that a language $L$ $L$ is not regular
- Assume $L$ is regular
- Choose a string $s \in L$ with $|s| \geq p$
- Decompose $s$ into $xyz$ according to the Pumping Lemma
- Show that there exists an $i \geq 0$ such that $xy^iz \notin L$ , contradicting the Pumping Lemma
Examples of non-regular languages that can be proven using the Pumping Lemma
- $L = \{a^nb^n \mid n \geq 0\}$
- $L = \{ww \mid w \in \{a, b\}^*\}$ (the language of palindromes)
The Pumping Lemma is a necessary but not sufficient condition for regularity, meaning that some non-regular languages may satisfy the Pumping Lemma

Limitations of Regular Languages

Regular languages have several limitations in terms of the types of patterns they can describe
Cannot count or match an arbitrary number of occurrences of a symbol
- For example, the language $\{a^nb^n \mid n \geq 0\}$ is not regular because it requires matching the number of $a$ 's with the number of $b$ 's
Cannot recognize nested or recursive structures
- The language of balanced parentheses $\{(^n)^n \mid n \geq 0\}$ is not regular because it requires matching opening and closing parentheses
Cannot express languages that require an unbounded amount of memory to recognize
- The language of palindromes $\{ww^R \mid w \in \{a, b\}^*\}$ is not regular because recognizing palindromes requires remembering an arbitrary number of symbols
These limitations stem from the finite nature of the automata that recognize regular languages (DFAs and NFAs)
More powerful language classes, such as context-free languages and recursively enumerable languages, can describe patterns that regular languages cannot

Real-World Applications and Examples

Regular languages and finite automata have numerous applications in computer science and beyond
Lexical analysis in compilers regular expressions are used to specify the valid tokens of a programming language, and finite automata (usually DFAs) are used to recognize these tokens
Pattern matching in text editors regular expressions are used to search for and manipulate specific patterns in text files
Network protocol validation finite automata can model the valid sequences of messages in network protocols (e.g., TCP handshake)
Hardware design finite state machines, which are essentially finite automata, are used to design digital circuits and control systems
Natural language processing regular expressions are used for tasks like tokenization, sentence segmentation, and named entity recognition
Bioinformatics regular expressions are used to search for patterns in DNA and protein sequences

Examples of regular languages in practice

Email addresses

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

Phone numbers

(\+\d{1,3}\s?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}

Simple arithmetic expressions
```
[0-9]+(\+|-|\*|/)[0-9]+
```

About Us

About Fiveable Blog Careers Testimonials Code of Conduct Terms of Use Privacy Policy CCPA Privacy Policy

Resources

Cram Mode AP Score Calculators Study Guides Practice Quizzes Glossary Crisis Text Line Request a Feature

Stay Connected

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

About Us

About Fiveable Blog Careers Testimonials Code of Conduct Terms of Use Privacy Policy CCPA Privacy Policy

Resources

Cram Mode AP Score Calculators Study Guides Practice Quizzes Glossary Crisis Text Line Request a Feature

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

Back

Glossary

🔤Formal Language Theory Unit 2 – Regular Languages & Finite Automata

Study Guides for Unit 2

Key Concepts and Definitions

Regular Languages: Basics and Properties

Finite Automata: Structure and Types

Regular Expressions and Their Equivalence

Closure Properties of Regular Languages

Pumping Lemma and Its Applications

Limitations of Regular Languages

Real-World Applications and Examples

About Us

Resources

Stay Connected

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

About Us

Resources

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

Back

2.1 Deterministic finite automata (DFA)