PGAS languages like UPC and Coarray Fortran offer a shared memory view for distributed systems. They simplify parallel programming by allowing data access across nodes through a global address space, while maintaining the performance benefits of distributed memory.

These languages extend C and Fortran with PGAS features, aiming to boost productivity in Exascale computing. They provide a balance between ease of use and performance, addressing challenges in large-scale parallel programming.

Overview of PGAS languages

  • PGAS (Partitioned Global Address Space) languages provide a shared memory programming model for distributed memory systems, simplifying parallel programming for Exascale computing
  • PGAS languages allow programmers to access and manipulate data across multiple nodes using a global address space, while maintaining the performance benefits of distributed memory architectures
  • Two prominent PGAS languages are UPC (Unified Parallel C) and Coarray Fortran, which extend the C and Fortran languages respectively to support PGAS concepts

UPC and Coarray Fortran

  • UPC is an extension of the C programming language that incorporates PGAS features, allowing programmers to write parallel code using familiar C syntax and semantics
  • Coarray Fortran extends Fortran with coarrays, which are distributed arrays that can be accessed and manipulated by multiple processes simultaneously
  • Both UPC and Coarray Fortran aim to provide a more productive and efficient way to write parallel programs for Exascale systems compared to traditional message passing approaches

Key characteristics of PGAS

  • PGAS languages provide a global address space that is logically partitioned across multiple processes or threads, enabling each process to access both local and remote data
  • The global address space is typically divided into private and shared regions, with each process having its own private memory space and a portion of the shared memory space
  • PGAS languages support one-sided communication, allowing processes to access remote data directly without explicit coordination with the remote process, reducing communication overhead
  • Synchronization mechanisms are provided to ensure data consistency and prevent race conditions when accessing shared data across multiple processes

Partitioned global address space

  • The partitioned global address space is a key concept in PGAS languages, enabling a shared memory view of distributed memory systems
  • In PGAS, the global memory is logically partitioned across multiple processes or nodes, with each process having a portion of the global address space
  • This partitioning allows for efficient local memory access while still providing a global view of the memory space

Logical partitioning of memory

  • The global memory space in PGAS is logically divided into partitions, with each partition assigned to a specific process or node
  • Each process has fast access to its local partition of the global memory, while accessing data in remote partitions may incur communication overhead
  • The logical partitioning of memory allows programmers to exploit data locality and minimize remote memory access for improved performance

Local vs global memory access

  • PGAS languages distinguish between local and global memory access, with local access being faster than global access
  • Local memory access refers to a process accessing data within its own partition of the global address space, which typically involves no communication overhead
  • Global memory access involves a process accessing data in a remote partition, which requires communication between processes and may have higher latency and lower bandwidth compared to local access

Implications for performance

  • The performance of PGAS applications depends on the balance between local and global memory access, as well as the efficiency of communication between processes
  • Minimizing global memory access and optimizing communication patterns can significantly improve the performance of PGAS applications
  • Proper data distribution and locality-aware programming techniques are crucial for achieving high performance in PGAS languages, especially at Exascale

UPC (Unified Parallel C)

  • UPC is an extension of the C programming language designed for parallel programming using the PGAS model
  • UPC adds new keywords, data types, and constructs to the C language to support parallel programming, while maintaining backward compatibility with standard C

Extensions to C language

  • UPC introduces the THREADS identifier, an integer expression giving the number of threads executing a parallel program, and MYTHREAD, which identifies the calling thread
  • The shared keyword is used to declare variables that are accessible by all threads in the global address space
  • UPC also provides synchronization primitives, such as barriers and locks, to coordinate access to shared data and prevent race conditions (see the sketch below)
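
A minimal UPC sketch of these extensions, assuming a UPC compiler (Berkeley UPC or GNU UPC, for instance); the array size and variable names are purely illustrative:

    #include <upc.h>      /* UPC keywords, THREADS, MYTHREAD, upc_barrier */
    #include <stdio.h>

    #define N 256

    shared int data[N * THREADS];   /* shared array: legal in static and dynamic THREADS modes */
    int local_sum;                  /* private variable: one independent copy per thread */

    int main(void) {
        /* Each thread initializes the elements it has affinity to (i % THREADS == MYTHREAD). */
        for (int i = MYTHREAD; i < N * THREADS; i += THREADS)
            data[i] = i;

        upc_barrier;                /* wait until every thread has finished initializing */

        local_sum = 0;
        for (int i = MYTHREAD; i < N * THREADS; i += THREADS)
            local_sum += data[i];

        printf("Thread %d of %d: partial sum = %d\n", MYTHREAD, THREADS, local_sum);
        return 0;
    }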

PGAS memory model in UPC

  • In UPC, the global address space is partitioned into shared and private regions, with each thread having its own private memory space and a portion of the shared space
  • Shared arrays are distributed across the threads in a round-robin (cyclic) fashion by default, but programmers can specify custom data layouts by adding a blocking factor to the shared declaration (the layout qualifier)
  • UPC provides pointers-to-shared and private pointers to distinguish between pointers that reference shared and private memory, respectively (see the sketch below)
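
The following sketch illustrates the default cyclic layout, a custom blocking factor, and the two pointer kinds; the blocking factor B and the array sizes are arbitrary choices for illustration.

    #include <upc.h>

    #define B 4                                  /* illustrative blocking factor */

    shared int cyclic_arr[THREADS * 8];          /* default layout: block size 1 (round-robin) */
    shared [B] int blocked_arr[THREADS * 8];     /* custom layout: blocks of B elements per thread */

    int main(void) {
        shared int *p = &cyclic_arr[0];          /* pointer-to-shared: may reference any thread's data */
        int *q;                                  /* private pointer: local memory only */

        if (MYTHREAD == 0)
            *p = 42;                             /* write through a pointer-to-shared */

        /* A pointer-to-shared with local affinity can be cast to a private pointer
           for faster access to the local part of the shared array. */
        if (upc_threadof(&blocked_arr[MYTHREAD * B]) == MYTHREAD) {
            q = (int *)&blocked_arr[MYTHREAD * B];
            *q = 1;                              /* local write without remote-access overhead */
        }

        upc_barrier;
        return 0;
    }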

UPC parallel programming constructs

  • UPC supports parallel loops using the upc_forall construct, which distributes loop iterations across the available threads based on an affinity expression
  • The upc_barrier statement is used to synchronize all threads at a specific point in the program, ensuring that all threads have completed their work before proceeding
  • UPC also provides functions for collective communication, such as upc_all_broadcast and the typed upc_all_reduceT family (upc_all_reduceD for doubles, for example), which perform operations across all threads (see the sketch below)
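
A sketch of a upc_forall vector addition follows; the affinity expression &c[i] makes each thread update only the elements it owns. The array length is illustrative.

    #include <upc.h>
    #include <stdio.h>

    #define N 16

    shared double a[N * THREADS], b[N * THREADS], c[N * THREADS];

    int main(void) {
        /* The fourth clause is the affinity expression: iteration i runs on the
           thread that owns &c[i], so every write in the loop body is local. */
        upc_forall (int i = 0; i < N * THREADS; i++; &c[i])
            c[i] = a[i] + b[i];

        upc_barrier;             /* all threads finish their iterations before continuing */

        if (MYTHREAD == 0)
            printf("vector add of %d elements complete\n", N * THREADS);
        return 0;
    }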

UPC shared vs private variables

  • UPC distinguishes between shared and private variables, with shared variables accessible by all threads and private variables only accessible by the owning thread
  • Shared variables are declared using the shared keyword and are distributed across the threads in the global address space
  • Private variables are declared without the shared keyword and are only accessible within the local memory space of each thread

Synchronization mechanisms in UPC

  • UPC provides various synchronization mechanisms to ensure data consistency and prevent race conditions when accessing shared variables
  • UPC supports barriers, which synchronize all threads at a specific point in the program, ensuring that all threads have completed their work before proceeding
  • Locks of type upc_lock_t are used to protect critical sections of code and prevent multiple threads from simultaneously accessing shared data
  • UPC also provides non-blocking communication primitives, such as upc_memput_nb and upc_memget_nb from the UPC 1.3 non-blocking transfer extensions, which allow computation and communication to overlap (a lock-based sketch follows this list)
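
Below is a lock-based sketch of a safe update to a shared counter; the variable names are illustrative, and upc_all_lock_alloc is the standard collective lock constructor.

    #include <upc.h>
    #include <stdio.h>

    shared int counter = 0;          /* shared scalar, affinity to thread 0 */
    upc_lock_t *lock;                /* each thread holds its own copy of the lock pointer */

    int main(void) {
        lock = upc_all_lock_alloc(); /* collective: all threads receive the same lock */

        upc_lock(lock);              /* enter the critical section */
        counter += 1;                /* safe read-modify-write of shared data */
        upc_unlock(lock);            /* leave the critical section */

        upc_barrier;

        if (MYTHREAD == 0) {
            printf("counter = %d (expected %d)\n", counter, THREADS);
            upc_lock_free(lock);     /* release the lock object */
        }
        return 0;
    }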

Coarray Fortran

  • Coarray Fortran is an extension of the Fortran programming language that supports PGAS programming using coarrays
  • Coarrays are distributed arrays that can be accessed and manipulated by multiple processes simultaneously, providing a shared memory view of distributed data

Fortran extensions for PGAS

  • Coarray Fortran introduces a codimension, declared with square brackets (or the codimension attribute), to create coarrays whose corresponding copies exist on every image (process)
  • The sync all and sync images statements are used to synchronize access to coarrays, ensuring data consistency and preventing race conditions
  • Coarray Fortran also provides collective subroutines for communication, such as co_sum and co_broadcast (standardized in Fortran 2018); the sketch below shows a coarray declaration and a collective sum
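
A minimal Coarray Fortran sketch, assuming a compiler with Fortran 2018 coarray support (gfortran with OpenCoarrays, ifx, or Cray Fortran); the names are illustrative.

    program coarray_sum
      implicit none
      integer :: total[*]        ! scalar coarray: one copy of total on every image
      integer :: me, n

      me = this_image()          ! index of the executing image (1-based)
      n  = num_images()          ! total number of images

      total = me                 ! each image writes its own copy
      sync all                   ! every image has assigned its value before the collective

      call co_sum(total)         ! collective sum; the result replaces total on every image

      if (me == 1) print *, 'sum over', n, 'images =', total   ! expect n*(n+1)/2
    end program coarray_sum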

Coarray syntax and semantics

  • Coarrays are declared by appending a codimension in square brackets to the variable declaration (or by using the codimension attribute), for example real :: a(10)[*]
  • Each image has its own local copy of a coarray, and the codimension (coshape) specifies how images are indexed when referring to those copies
  • Coarray elements can be accessed using the usual array indexing syntax, with the addition of a cosubscript in square brackets to name the remote image (see the sketch below)
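
The sketch below shows remote access with a cosubscript: each image reads the first element of its right-hand neighbour's copy. The program and variable names are illustrative.

    program neighbour_read
      implicit none
      real :: u(10)[*]                               ! each image holds its own 10-element array
      integer :: me, right

      me = this_image()
      right = merge(1, me + 1, me == num_images())   ! wrap around to image 1 at the end

      u = real(me)                                   ! fill the local copy
      sync all                                       ! neighbours must finish writing before we read

      u(10) = u(1)[right]                            ! one-sided get from the remote image

      sync all
      if (me == 1) print *, 'image 1 received', u(10), 'from image', right
    end program neighbour_read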

Coarray data distribution

  • Coarray Fortran does not scatter the elements of a coarray across processes; instead, every image holds its own complete copy, and the codimension declaration determines how images are addressed
  • The cobounds given in the square brackets define the coshape: a coarray declared with a single codimension such as [*] addresses images linearly, from 1 up to the number of images
  • Programmers can declare multiple codimensions (for example [np, *]) to map images onto a logical grid, which is convenient for expressing block-style domain decompositions (see the sketch below)
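
A sketch of a two-dimensional coshape follows; np is an illustrative constant, and the example assumes the number of images is a multiple of np.

    program grid_mapping
      implicit none
      integer, parameter :: np = 2       ! illustrative: 2 rows of images
      real :: tile(8,8)[np,*]            ! images addressed as an np x (num_images()/np) grid
      integer :: row, col

      ! this_image with a coarray argument returns this image's cosubscripts.
      row = this_image(tile, 1)
      col = this_image(tile, 2)

      tile = real(row * 100 + col)       ! each image fills its own local tile

      sync all
      if (this_image() == 1) print *, 'image 1 sits at grid position', row, col
    end program grid_mapping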

Synchronization with coarrays

  • Coarray Fortran provides synchronization mechanisms to ensure data consistency and prevent race conditions when accessing coarray elements
  • The sync statements are used to synchronize access to coarrays, ensuring that all participating images have completed their updates before any image reads the data
  • The sync all statement synchronizes all images, while sync images synchronizes a subset of images specified by an integer scalar or array (see the sketch below)
  • Coarray Fortran also provides critical sections and locks for more fine-grained synchronization of shared data access
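
The sketch below combines sync images for pairwise, neighbour-only synchronization with a critical block; the program is illustrative and assumes at least two images.

    program neighbour_sync
      implicit none
      integer :: stage[*]
      integer :: me, n, left_stage

      me = this_image()
      n  = num_images()

      stage = me                                ! each image publishes a value
      ! Pairwise synchronization: wait only for the images we actually depend on.
      if (me > 1) sync images (me - 1)
      if (me < n) sync images (me + 1)

      if (me > 1) left_stage = stage[me - 1]    ! safe: the left neighbour has passed its sync

      ! A critical block: at most one image executes the enclosed code at a time.
      critical
        if (me > 1) print *, 'image', me, 'read', left_stage, 'from its left neighbour'
      end critical

      sync all
    end program neighbour_sync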

Performance considerations

  • Achieving high performance in PGAS languages requires careful consideration of data distribution, communication patterns, and synchronization
  • Minimizing remote memory access, optimizing communication, and balancing computation and communication are key factors in maximizing the performance of PGAS applications

Minimizing remote memory access

  • Remote memory access in PGAS languages typically incurs higher latency and lower bandwidth compared to local memory access
  • To minimize remote memory access, programmers should strive to distribute data across processes in a way that maximizes local access and reduces the need for remote communication
  • Techniques such as data replication, caching, and prefetching can help reduce the impact of remote memory access on application performance

Optimizing communication patterns

  • Efficient communication is crucial for the performance of PGAS applications, especially at large scales
  • Programmers should aim to minimize the number and size of messages exchanged between processes, using techniques such as message aggregation and collective communication operations
  • Overlapping computation and communication can help hide communication latency and improve overall application performance, as sketched below using UPC's non-blocking transfers
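
As a sketch of such overlap, the fragment below uses the optional UPC 1.3 non-blocking transfer library (assumed available as upc_nb.h, e.g., in Berkeley UPC); the buffer sizes and names are illustrative.

    #include <upc.h>
    #include <upc_nb.h>    /* optional UPC 1.3 non-blocking transfer library */

    #define N 1024

    shared [N] double remote_buf[N * THREADS];   /* one block of N doubles per thread */
    double local_buf[N], work[N];

    int main(void) {
        size_t next = (MYTHREAD + 1) % THREADS;

        for (int i = 0; i < N; i++) { local_buf[i] = MYTHREAD; work[i] = i; }

        /* Start a one-sided put into the next thread's block, then keep computing. */
        upc_handle_t h = upc_memput_nb(&remote_buf[next * N], local_buf,
                                       N * sizeof(double));

        for (int i = 0; i < N; i++)              /* independent work overlaps the transfer */
            work[i] = work[i] * 2.0 + 1.0;

        upc_sync(h);                             /* wait for the transfer to complete */
        upc_barrier;
        return 0;
    }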

Balancing computation and communication

  • Achieving a balance between computation and communication is essential for the scalability and performance of PGAS applications
  • Programmers should aim to distribute the computational workload evenly across processes while minimizing the communication overhead
  • Techniques such as load balancing, asynchronous communication, and communication-computation overlap can help achieve a better balance and improve application performance

Scalability of PGAS applications

  • The scalability of PGAS applications depends on various factors, including the problem size, data distribution, communication patterns, and synchronization requirements
  • To ensure good scalability, programmers should aim to minimize global synchronization, exploit data locality, and use efficient communication primitives
  • Proper performance analysis and tuning are essential for identifying and addressing scalability bottlenecks in PGAS applications

Comparison of UPC and Coarray Fortran

  • UPC and Coarray Fortran are both PGAS languages but differ in their language features, syntax, and performance characteristics
  • Understanding the differences between these languages can help programmers choose the most suitable language for their specific application and performance requirements

Language features and syntax

  • UPC is based on the C programming language and extends it with PGAS features using keywords such as shared and upc_forall
  • Coarray Fortran, on the other hand, extends Fortran with coarrays and uses constructs such as the codimension declaration and the sync statements
  • The syntax and programming style of UPC and Coarray Fortran reflect their respective base languages, which may influence the choice of language for programmers with different backgrounds

Performance tradeoffs

  • The performance of UPC and Coarray Fortran applications can vary depending on factors such as the problem size, data distribution, communication patterns, and compiler optimizations
  • UPC's performance is often influenced by the efficiency of its shared memory access and the overhead of its synchronization primitives
  • Coarray Fortran's performance depends on the efficiency of its coarray communication and synchronization, as well as the optimization capabilities of the Fortran compiler
  • Comparative studies have shown that the performance of UPC and Coarray Fortran can be similar for certain applications, but the specific performance characteristics may vary depending on the problem and the implementation details

Interoperability with other languages

  • Both UPC and Coarray Fortran can interoperate with other programming languages and parallel programming models, such as MPI and OpenMP
  • UPC can interface with C and C++ code, allowing programmers to leverage existing libraries and code bases
  • Coarray Fortran can interoperate with other Fortran code and can also interface with C and other languages using Fortran's interoperability features
  • The interoperability of UPC and Coarray Fortran with other languages and programming models is important for integrating PGAS into existing applications and workflows

PGAS vs message passing

  • PGAS languages, such as UPC and Coarray Fortran, offer an alternative to traditional message passing models like MPI for parallel programming
  • Understanding the differences between PGAS and message passing can help programmers choose the most appropriate programming model for their specific application and performance requirements

Productivity and ease of use

  • PGAS languages aim to provide a more productive and user-friendly programming model compared to message passing
  • The shared memory abstraction in PGAS allows programmers to access and manipulate distributed data using familiar programming constructs, such as arrays and pointers
  • PGAS languages often require less explicit communication and synchronization compared to message passing, which can simplify the development of parallel applications
  • However, the learning curve for PGAS languages may be steeper for programmers who are already familiar with message passing models like MPI

Performance at scale

  • The performance of PGAS and message passing applications at scale depends on various factors, such as the problem size, communication patterns, and hardware characteristics
  • Message passing models like MPI have been widely used and optimized for large-scale parallel applications, with extensive support for efficient communication and synchronization primitives
  • PGAS languages, while offering productivity advantages, may face challenges in terms of performance at extreme scales due to the overhead of remote memory access and the need for efficient synchronization
  • The scalability of PGAS applications depends on the ability to minimize remote memory access, optimize communication patterns, and leverage hardware support for efficient PGAS operations

Suitability for different problem domains

  • The choice between PGAS and message passing depends on the specific characteristics of the problem domain and the application requirements
  • PGAS languages are well-suited for applications with irregular data structures, dynamic communication patterns, and fine-grained data sharing, such as graph algorithms and adaptive mesh refinement
  • Message passing models like MPI are often preferred for applications with regular communication patterns, bulk synchronous parallelism, and coarse-grained data exchange, such as stencil computations and matrix operations
  • Hybrid programming models that combine PGAS and message passing can offer the best of both worlds, allowing programmers to leverage the strengths of each model for different parts of the application

Advanced topics in PGAS

  • As PGAS languages continue to evolve and mature, several advanced topics have emerged that are relevant for Exascale computing and beyond
  • These topics include hybrid programming, support for irregular data structures, fault tolerance, and the integration of PGAS with other parallel programming models

Hybrid programming with PGAS and MPI

  • Hybrid programming models that combine PGAS languages with message passing models like MPI can offer the benefits of both approaches
  • In a hybrid PGAS-MPI model, PGAS can be used for fine-grained, irregular communication within a node, while MPI can be used for coarse-grained, regular communication between nodes
  • Hybrid programming can help optimize the performance and scalability of PGAS applications by leveraging the strengths of each programming model for different aspects of the application
  • However, hybrid programming also introduces additional complexity and requires careful design and tuning to achieve optimal performance

Irregular data structures in PGAS

  • PGAS languages have been traditionally used for applications with regular data structures and communication patterns, but there is growing interest in supporting irregular data structures
  • Irregular data structures, such as graphs and unstructured meshes, pose challenges for PGAS languages due to their dynamic nature and non-uniform data access patterns
  • Research efforts have focused on extending PGAS languages with support for irregular data structures, such as global pointers, distributed containers, and partitioned global address space maps
  • Efficient support for irregular data structures in PGAS can enable a wider range of applications to benefit from the productivity and performance advantages of PGAS programming

Fault tolerance in PGAS applications

  • Fault tolerance is a critical concern for Exascale computing, as the increasing scale and complexity of systems make failures more likely
  • PGAS languages face challenges in providing efficient fault tolerance mechanisms due to their global address space abstraction and the need for consistent data access across processes
  • Research efforts have explored various fault tolerance techniques for PGAS, such as checkpoint-restart, message logging, and redundant computation
  • Integrating fault tolerance into PGAS languages and applications requires careful consideration of the trade-offs between performance, scalability, and resilience, as well as the development of efficient and transparent fault tolerance mechanisms

Key Terms to Review (17)

Bandwidth: Bandwidth refers to the maximum rate at which data can be transferred over a communication channel or network in a given amount of time. It is a critical factor in determining system performance, especially in high-performance computing, as it affects how quickly data can be moved between different levels of memory and processors, impacting overall computation efficiency.
Coarray Fortran: Coarray Fortran is an extension of the Fortran programming language that introduces the Partitioned Global Address Space (PGAS) model, allowing for easy parallel programming. It enables multiple processes to share data in a distributed memory environment by providing a simple syntax for accessing remote data, making it easier to develop applications that run on high-performance computing systems. This feature is particularly relevant in the context of exascale computing, where performance and scalability are crucial.
Coarray Fortran 2008: Coarray Fortran 2008 is an extension of the Fortran programming language that introduces support for parallel programming through a Partitioned Global Address Space (PGAS) model. It allows multiple instances of a program, called images, to communicate and share data with each other efficiently, making it suitable for high-performance computing applications. This feature enables developers to write code that can run on distributed memory systems while maintaining the simplicity and familiarity of the Fortran language.
Data partitioning: Data partitioning refers to the process of dividing a large dataset into smaller, manageable pieces, often to improve performance and enable parallel processing. This technique is essential for optimizing the efficiency of computation in high-performance environments, allowing multiple processes or threads to work on different segments of data simultaneously. Effective data partitioning ensures balanced workloads, minimizes communication overhead, and enhances overall scalability.
Distributed memory model: The distributed memory model is a parallel computing architecture where each processor has its own local memory, and processors communicate with one another through explicit message passing. This model enables scalability and efficiency in high-performance computing, as it allows multiple processors to work on different parts of a problem simultaneously while minimizing memory bottlenecks. The distributed memory model is particularly relevant for programming models that support partitioned global address space, allowing programmers to utilize languages designed for this architecture effectively.
GasNet: GasNet is a communication library designed for Partitioned Global Address Space (PGAS) programming models, providing low-level network communication support. It acts as an abstraction layer that enables efficient communication between processes, allowing PGAS languages like UPC (Unified Parallel C) and Coarray Fortran to utilize high-performance networking features without needing to manage the complexities of the underlying hardware directly. GasNet facilitates one-sided communication, which is crucial for achieving high performance in distributed memory systems.
Global Address Space: A global address space refers to a unified memory model that allows all processes in a parallel computing environment to access memory locations as if they are part of a single, shared memory. This concept is fundamental for programming models that aim to simplify communication and data sharing among distributed systems by allowing different nodes to read and write to a common memory space seamlessly.
High-Performance Computing: High-performance computing (HPC) refers to the use of supercomputers and parallel processing techniques to solve complex computational problems at high speeds. HPC systems are designed to handle vast amounts of data and perform a large number of calculations simultaneously, making them essential for tasks such as simulations, data analysis, and modeling in various fields like science, engineering, and finance.
Latency: Latency refers to the time delay experienced in a system, particularly in the context of data transfer and processing. This delay can significantly impact performance in various computing environments, including memory access, inter-process communication, and network communications.
Locality: Locality refers to the principle that the performance of a computational task can be significantly improved by minimizing the distance data has to travel between memory and processors. This concept is crucial in parallel computing, particularly when working with Partitioned Global Address Space (PGAS) languages, where understanding and managing data locality allows for more efficient memory access patterns and reduced communication overhead.
OpenCoarrays: OpenCoarrays is an open-source library and runtime that implements the coarray (PGAS) features of Fortran for compilers such as gfortran. It supplies the communication layer needed to share data across images on distributed memory architectures, letting programs use the language's simple, intuitive coarray syntax while running efficiently on multiple processing units.
Original UPC Specification: The original UPC specification refers to the first formal design of the Unified Parallel C (UPC) programming language, which is a parallel extension of the C programming language specifically designed for shared-memory and distributed-memory architectures. This specification establishes the foundational principles and syntax that allow programmers to express parallelism in a straightforward manner, facilitating efficient multi-threaded applications in high-performance computing environments.
PGAS vs. MPI: PGAS (Partitioned Global Address Space) and MPI (Message Passing Interface) are two different programming models used for parallel computing. PGAS languages like UPC and Coarray Fortran allow for a shared memory-like view of data, enabling easier data access across different nodes, while MPI focuses on message passing between distributed processes. Understanding the differences between these two models is essential for optimizing performance in high-performance computing applications, especially as we move towards exascale computing.
PGAS vs. Shared Memory: PGAS (Partitioned Global Address Space) and shared memory are two different programming models used for parallel computing. While shared memory allows multiple threads to access the same memory space, PGAS divides the memory into partitions that can be accessed by different processes, making it easier to manage data locality and reduce communication overhead. This distinction influences how languages like UPC and Coarray Fortran handle parallelism, allowing developers to optimize performance and scalability in high-performance computing environments.
Scientific simulations: Scientific simulations are computational models used to replicate and analyze complex systems or phenomena in various scientific fields. They allow researchers to explore scenarios that may be difficult or impossible to study in the real world, enabling predictions and insights into behavior, interactions, and outcomes. This is particularly relevant in programming environments that support parallel computing, as well as in cutting-edge applications involving artificial intelligence.
Shared memory model: The shared memory model is a programming paradigm where multiple processes or threads can access a common memory space to read and write data. This model allows for efficient communication between processes, as they can directly share data without the need for explicit message passing. It is particularly important in parallel computing, enabling faster data access and manipulation, especially when utilizing PGAS languages that optimize memory access patterns.
UPC: UPC stands for Unified Parallel C, which is a parallel programming language based on the C programming language. It allows developers to write applications that can efficiently utilize multiple processors or cores, making it well-suited for high-performance computing. By supporting a Partitioned Global Address Space (PGAS) model, UPC facilitates easier data sharing and communication among processes, which is essential for scalable applications in Exascale computing environments.