Functional Data Structures:Introduction and Stacks,A Simple Example
A functional data structure is a data structure that is suitable for implementation in a functional programming language, or for coding in an ordinary language like C or Java using a functional style. Functional data structures are closely related to persistent data structures and immutable data structures—in fact, the three terms are often used interchangeably.
However, there are subtle differences.
• The term persistent data structures refers to the general class of data structures in which an update does not destroy the previous version of the data structure, but rather creates a new version that co-exists with the previous version. See Chapter 31 for more details about persistent data structures.
• The term immutable data structures emphasizes a particular implementation technique for achieving persistence, in which memory devoted to a particular version of the data structure, once initialized, is never altered.
• The term functional data structures emphasizes the language or coding style in which persistent data structures are implemented. Functional data structures are always immutable, except in a technical sense discussed in Section 40.4.
In this chapter, we will discuss the main issues surrounding the implementation of data structures in functional languages, and illustrate these issues with several extended exam- ples. We will also show how to adapt functional data structures to a mainstream language such as Java, for use when a persistent data structure is required. Readers wishing more details about functional data structures should consult Okasaki [24].
Data Structures in Functional Languages
Functional programming languages differ in several important ways from ordinary programming languages like C or Java, and these differences can sometimes have a large effect on how data structures are implemented. The main differences (at least from the perspective of data structures) are immutability, recursion, garbage collection, and pattern matching.
Immutability
In functional languages, variables and records cannot be modified, or mutated, once they have been created.∗ Many textbook data structures depend critically on the ability to mutate variables and records via assignments. Such data structures can be difficult to adapt to a functional setting.
Recursion
Functional languages frequently do not support looping constructs, such as for-loops or while-loops, because such loops depend on being able to mutate the loop control variable.
Functional programmers use recursion instead.†
Garbage Collection
Functional languages almost always depend on automatic garbage collection. Because objects are immutable in functional languages, they are shared much more widely than in ordinary languages, which makes the task of deciding when to deallocate an object very complicated. In functional languages, programmers ignore deallocation issues and allow the garbage collector to deallocate objects when it is safe to do so.
Pattern Matching
Pattern matching is a method of defining functions by cases that are essentially textual analogues of the kinds of pictures data-structure designers often draw. Pattern matching is not supported by all functional languages, but, when available, it allows many data structures to be coded very concisely and elegantly.
Functional Data Structures in Mainstream Languages
Even if you are programming in a mainstream language, such as C or Java, you may find it convenient to use a functional data structure. Functional data structures offer three main advantages in such a setting: fewer bugs, increased sharing, and decreased synchronization.
Fewer Bugs
A very common kind of bug arises when you observe a data structure in a certain state and shortly thereafter perform some action that assumes the data structure is still in that same state. Frequently, however, something has happened in the meantime to alter the state of the data structure, so that the action is no longer valid. Because functional data structures are immutable, such alterations simply cannot occur. If someone tries to change the data structure, they may produce a new version of it, but they will in no way effect the version that you are using.
Increased Sharing
Precisely to avoid the kinds of bugs described above, programmers in mainstream languages are careful to limit access to their internal data structures. When sharing is unavoidable, programmers will often clone their data structures and share the clones rather than granting access to their own internal copies. In contrast, functional data structures can be shared safely without cloning. (Actually, functional data structures typically perform a substantial amount of cloning internally, but this cloning is of individual nodes rather than entire data structures.)
Decreased Synchronization
Again, precisely to avoid the kinds of bugs described above, programmers in concurrent settings are careful to synchronize access to their data structures, so that only a single thread can access the data structure at a time. On the other hand, because functional data structures are immutable, they can often be used with little or no synchronization. Even simultaneous writes are not a problem, because each writer thread will get a new version of the data structure that reflects only its own updates. (This assumes, of course, an application where participants do not necessarily want to see changes made by all other participants.)
Stacks: A Simple Example
Stacks (see Chapter 2) represented as singly-linked lists are perhaps the simplest of all data structures to make persistent. We begin by describing functional stacks supporting four main primitives:
• empty: a constant representing the empty stack.
• push(x,s): push the element x onto the stack s and return the new stack.
• top(s): return the top element of s.
• pop(s): remove the top element of s and return the new stack.
We can see right away several differences between this interface and the interface for ordinary stacks. First, for ordinary stacks, push and pop would implicitly change the existing stack s rather than returning a new stack. However, the hallmark of functional data structures is that update operations return a new version of the data structure rather than modifying the old version. Second, ordinary stacks would support a function or constructor to create a fresh, new stack, rather than offering a single constant to represent all empty stacks. This highlights the increased sharing possible with functional data structures. Because pushing an element onto the empty stack will not change the empty stack, different parts of the program can use the same empty stack without interfering with each other.
Figure 40.1 shows an implementation of stacks in Haskell [28], a popular functional programming language.
Figure 40.2 shows a similar implementation in Java.
Like all code fragments in this chapter, these implementations are intended only to illustrate the relevant concepts and are not intended to be industrial strength. In particular, all error handling has been omitted. For example, the top and pop operations should check whether the stack is empty. Furthermore, programming conventions in Haskell and Java have been ignored where they would make the code harder for non-fluent readers to understand. For example, the Haskell code makes no use of currying, and the Java code makes no attempt to be object-oriented.
The Haskell code illustrates a simple use of pattern matching. The declaration
states that stacks have two possible shapes, Empty or Push, and that a stack with the Push shape has two fields, an element and another stack. The tags Empty and Push are called constructors. Later function declarations can match against these constructors. For example, the declaration
top(Push(x,s)) = x
says that when top is called on a stack with the Push shape, it returns the contents of the first field. If desired, more clauses can be added to deal with other shapes. For example, a second clause could be added to the definition of top to handle the error case:
top(Push(x,s)) = x
top(Empty) = ...signal an error...
How do these implementations achieve persistence? First, consider the push operation. Calling push creates a new node containing the new element and a pointer to the old top of stack, but it in no way alters the old stack. For example, if the old stack s contains
the numbers 3, 2, 1 and we push 4, then the new stack sl contains the numbers 4, 3, 2, 1.
Figure 40.3 illustrates the relationship between s and sl. Notice how the nodes containing 3, 2, and 1 are shared between both stacks. Because of this sharing, it is crucial that the nodes are immutable. Consider what would happen if nodes could be changed. If we were to change the 3 to a 5, for example, perhaps by calling an operation to update the top element of s, that change would affect not only s (which would now contain 5, 2, 1) but also sl (which would now contain 4, 5, 2, 1). Such unintended consequences would make sharing impossible.
Next, consider the pop operation, which simply returns the next pointer of the current node without changing the current node in any way. For example, Figure 40.4 illustrates the result of popping the stack sl to get the stack sll (which shares its entire representation with the original stack s). Notice that, after popping sl, the node containing 4 may or may not be garbage. It depends on whether any part of the program is still using the sl stack.
If not, then automatic garbage collection will eventually deallocate that node.
Comments
Post a Comment