=head1 TITLE Perl 6 Theories - Grand unified object model =head1 SYNOPSIS # theory representing a total order theory Ordered{::T} { multi infix:«<=» (T $x, T $y --> Bool) {...} multi infix:«>=» (T $x, T $y --> Bool) { $y <= $x } multi infix:«>» (T $x, T $y --> Bool) { not $x <= $y } ... } # a cons list data type union MyList[::T] ( MyNil, MyCons(T $.head, MyList[T] $.tail), ); # Lists of ordered things are lexographically ordered # (read <= as "if" rather than "less than or equal to") model Ordered{MyList[::T]} <= Ordered{::T} { multi infix:«<=» (MyNil, _) { true } multi infix:«<=» (MyCons, MyNil) { false } multi infix:«<=» (MyCons ($h1, $t1), MyCons ($h2, $t2)) { when $h1 <= $h2 { true } when $h2 > $h1 { false } default { $t1 <= $t2 } } # the theory automatically fills in <, >, >=, ==, etc. } =head1 DESCRIPTION If Perl 6 is to have a strongly typed dialect but still maintain its dynamicity, it must have a theoretical model for its typing. Strong typing without a model ends up with kludges and special cases (see C++). This document presents a model that is precise and general, but still feels like Perl. The root of this model is the distinction between interface and implementation. In particular, it holds the idea of a I very dearly; that is, if you can create an object that pretends to be of a particular type, then it can be used anywhere that a real object of that type can. In fact, this proposal destroys the concept of a mock object, by saying that any object that conforms to the interface of a type I of that type. You can never refer to a concrete type explicitly, and you cannot create a concrete type without creating an interface to go along with it. Before we get to the type theory, there are two pieces of machinery that need to be built: tuples and pattern constructors. =head2 Tuples The first piece of machinery we need is the tuple. The tuple underlies all of the semantics here, but you never really see it. Perl 6's tuple is much richer than tuples of other languages; in short, it is an object representing a parameter list. The tuple has three parts: positional, named, and block, any of which can be empty. It is constructed using the comma. my $tuple = (1, :foo(2), 3, 4):{ $_ + 1 }; This tuple has three positionals, C<1,3,4>; one named, C<< foo => 2 >>; and one block, C<{ $_ }>. The block parameters are specified as adverbs on the tuple as a whole. If you want a block inside a positional, just put one there. If you want a pair inside a positional, enclose it in parentheses (that is, named elements are I determined; in the tuple (1,$x), $x will always be positional, whether or not it contains a pair): my $tuple = ({ $_ + 1 }, (foo => 2)); # two positionals Unlike in most other languages, a single-argument tuple is not the same as a value. To create a single-argument tuple (which ought to be an uncommon thing), use a null comma at the end: my $tuple = ("foo",); # a tuple with one positional Tuples, like lists, may be flattened into other tuples: my $a = (1,2,3); # a tuple with positionals 1,2,3 my $b = (0, [,]($a), 4); # a tuple with positionals 0,1,2,3,4 As a special case, function application will automatically (lexically) tuplize a single argument: foo($x); # equivalent to foo($x,) And therefore, you can think of all functions as taking only a single argument: the tuple of the argument list. This will be important later. Lists and tuples should interact in a reasonable way so that you can pretend that they're the same thing (but in a strongly typed dialect, they are quite different). XXX handwave Since a tuple is essentially an argument list, a tuple pattern is essentially a parameter list. The meaning of the slurpy scalar C<*$x> has changed. Now it is just like a slurpy list, except that it gathers the rest of the signature into a tuple C<$x> instead of a list. To facilitate the technique that the slurpy scalar was originally introduced for, we can use this: sub first (*[ $x, *@xs ]) { $x } The upshot is that you can easily delegate an arbitrary argument list to another function: sub foo (*$args) { bar([,] $args) } # call bar with whatever args foo # was called with Tuples also have a pattern constructor, comma, for use in signatures and bindings. So you can say: multi snd (($x, $y)) { $y } C takes only one argument (no, I mean that in the usual sense, not the every-function-takes-exactly-one-argument sense), where that argument is a two-tuple, and it extracts the second element. Now, what the heck is a pattern constructor? =head2 Pattern Constructors Before we get to exactly what a pattern constructor is, let's define what a pattern is. A pattern is a thing that can extract "subinformation" out of an object as it's being bound. It appears in argument lists like so: rule parameter { ? | ? ? } There are several patterns with special syntax: square brackets for matching inside arrays, curly brackets for matching out of hashes, bare parentheses for matching a tuple, all numeric and string literals (for matching themselves under C<===>), perhaps among others. These either require a set of syntactic categories (like C<< pattern:term:<...> >>) or a straight-up grammar munge. The patterns that are easy to define are the I. These are given by the rule: rule normal_pattern { ? } (If the C is omitted, the empty tuple C<()> is assumed) So you'll have things like: multi traverse (Leaf ($v), *&code) { &code($v) } multi traverse (Branch ($l, $r), *&code) { traverse($l):&code; traverse($r):&code; } Where that first line could have also been either of: multi traverse ($leaf Leaf ($v), *&code) {...} multi traverse (Tree _ Leaf ($v), *&code) {...} But those don't add anything to the readability in this case. How does one define a normal pattern constructor? By placing names into the C grammatical category. A normal pattern constructor takes a type and returns a tuple, which is then unified against the given tuple pattern. For instance, I could define an identity pattern that would just return the one-tuple of the object: sub pattern: ($obj) { ($obj,) } Or the typed version: sub pattern: (::T $obj --> (::T,)) { ($obj,) } You could use this pattern to give more than one name to an argument: sub mul3 ($x Id ($y Id ($z))) { $x + $y + $z } Other than that, it's not useful, of course. But that goes to show that if the pattern constructors are typed, they imply those type constraints on the corresponding argument: sub pattern: (Tree $t --> (Any,)) {...} multi traverse (Tree _ Leaf ($value)) {...} The C in the second declaration is redundant; the C pattern already implies it. =head2 Classes and Roles There are two ways to create a concrete type, otherwise known as a class. One is with the C declarator, the other is the C declarator (which we'll get to in a bit). A class defined by its I (or attributes) and its I (or methods). class Account { has $.balance; # instance data and interface has $.name; # instance data and interface has $._password; # instance data method withdrawl ($amount, $password) { # interface # ... } } This definition does two things: it defines the package C with a C constructor (which puts together the instance data), and it defines the role C that contains pretty much the body of the class, specifying its interface with default implementations (when you say C, you are specifying the interface method C with a default implementation of object state). Whenever you say C in type position (after a C or in a signature), you are referring to this role. Whenever you say C elsewhere, you are referring to the package. To subclass a class, you say that a new class C the old one: class CreditAccount { does Account; method withdrawl ($amount, $password) { # try to borrow some money } } Of course, you are specializing the role C, not the class. In fact, the class doesn't exist at all; it is just a notation to define the package, concrete type, and role all at once. =head2 Unions and Factories Perl 6 supports union types (also called case types or algebraic data types). Union types are to factores what classes are to roles (more on that later). To demonstrate the syntax, let's define a simple binary tree: union Tree ( Leaf($value), Branch(Tree $left, Tree $right), ); The case types (C and C) are defined as a I followed by a tuple pattern; i.e. an argument list. Like C, there is no such thing as the union C; it is just a notation to define a bunch of related things: It defines the packages C, C, and C. It defines (so far empty) roles C and C, and the I C: # automatically defined factory Tree { generator Leaf ($value) {...} generator Branch (Tree $left, Tree $right) {...} } A factory is the dual of a role: it defines a bunch of functions that I Cs (where roles define functions that I trees). C is the corresponding dual to C (it returns the thing in question). We'll see why there's all this vocabulary later. Unions can be parameterized over one or more types: union Tree[::T] ( Leaf(T $value), Branch(Tree[T] $left, Tree[T] $right), ); And the return type of the constructor functions can be stated explicitly to form a GADT: union AST[::] ( # :: is a siglet meaning "a type" Const(::T --> AST[::T]), Cond(AST[Bool], AST[::T], AST[::T] --> AST[::T]), Succ(AST[Int] --> AST[Int]), ... ); If the argument names to the constructors are given C<$.attribute> form, then attribute accessors are defined on the type: union Tree ( Leaf($.value), Branch(Tree $.left, Tree $.right), ); Leaf(4).value; # 4 Branch(Leaf(2), Leaf(3)).right.value; # 3 Leaf(4).left; # method found but pattern match failed Unions can be extended: union AnnotatedTree does Tree ( Annotation($.str, Tree $.tree), ); When you extend a union, you create a superset (but still a subtype). That is, everything that is a C is also an C (superset), but an C may be used anywhere a C can. =head2 Theories Roles and factories are special cases of a much more general structure called a I. A theory declares a general algebraic structure over several types. A good example of a theory that is not a role or a factory is the definition of a ring (roughly an integer-like thing): theory Ring{::R} { multi infix:<+> (R, R --> R) {...} multi prefix:<-> (R --> R) {...} multi infix:<-> (R $x, R $y --> R) { $x + (-$y) } multi infix:<*> (R, R --> R) {...} # only technically required to handle 0 and 1 multi coerce: (Int --> R) {...} } This says that in order for a type R to be a ring, it must supply these operations. The operations are necessary but not sufficient to be a ring; you also have to obey some algebraic laws (which are, in general, unverifiable programmatically), for instance, associativity over addition: C<(a + b) + c == a + (b + c)>. To I the theory, that is, declare that some type obeys these laws, you I it: model Ring{Int} { # define all needed operations } C is really just a shorthand for an anonymous theory that models another: theory { models Ring{Int}; # definitions } Saying that a theory C another theory says that if you obey the laws of the former, then you automatically obey the laws of the latter. How do we actually use theories? We can specify constraints over type variables with them: multi pow (::T $x, Int $power --> ::T <= Ring{::T}) { my $result = $x; for (2..$power) { $result *= $x } return $result; } This says that C works on any first argument, as long as it belongs to a type that models C. C<< <= >> is to be read as "where" or "requires" or "is implied by". It needs a better spelling. Roles and factories reduce to theories using a I. So the role: role Foo { method bar () {...} } Is equivalent to the theory: theory Foo{::T} { multi bar (T) {...} } Because the role specifies a topic type (which we named C<::T> when we converted to a theory), and C defines a multi with its first argument being the topic type. It is an error to define a C when there is no topic type. Conversely, a factory just reduces to a theory with a topic type. C defines a multi with its I type being the topic type. But roles and factories are a bit more special than theories. They have one additional property each: =over =item * If every A is also a B, and R is a role, then R{B} implies R{A}. =item * If every A is also a B, and F is a factory, then F{A} implies F{B}. =back That is, every subset of something that does a role also does that role, and every superset of something that does a factory also does that factory. This is self-evident when you notice that roles can only have methods and factories can only have generators. When you specify a type in a type signature, you're really specifying a type variable with a constraint: multi foo (Bar $x) {...} Is equivalent to: multi foo (::T $x <= Bar{::T}) {...} And so a role that returns the type the role defines is still a role, even though it appears to depend on the type contravariantly: role Foo { method bar (--> Foo) {...} } Is equivalent to: theory Foo{::T} { multi bar (T --> Foo) {...} # which is multi bar (T --> ::U <= Foo{U}) {...} } That is, you don't return the I type, you just return one that conforms to the same interface. Finally, let's look at the full reduction of our C union type and try to understand it: union Tree ( Leaf($value), Branch(Tree $left, Tree $right), ); This turns into: theory Leaf{::T} { # expanded from "role Leaf" multi pattern: (T --> (Any,)) {...} } theory Branch{::T} { multi pattern: (T --> (::U, ::V) <= Tree{::U} && Tree{::V}) {...} } theory Tree{::T} <= Leaf{::T} || Branch{::T} { # note the or multi Leaf ($value --> T) {...} multi Branch (::A $left, ::B $right --> T <= Tree{::A} && Tree{::B}) {...} } Whew, good thing we don't have to write that. As you can see, even though it looks like C is behaving covariantly by looking at the union, it isn't. A branch accepts any two things that behave like this C, not the tree itself. =head2 Vocabulary Here is a summary of the object-oriented vocabulary in Perl: =over =item * C The big one in the structures department. Specifies a general algebra over one or more types. =item * C A I, I theory, that is, a theory over a single type where the functions defined within only take the type as a parameter, never return it. =item * C A I, I theory, that is, a theory over a single type where the functions defined within only return the type, never take it as a parameter. =item * C A notation that simultaneously defines a role, a concrete type, and a package of the given name. =item * C A notation that simultaneously defines a factory together with a concrete type, a role, and a package for each case in the factory. =item * C A notation for a zero-parameter theory that models another theory. =item * C Declares implication over structures. If one structure models another, that means that if a type obeys the former, then it obeys the latter. =item * C A unary shorthand for C. C is a shorthand for C. =item * C The big one in the functions department. Specifies a multiply-implemented function dispatched based on the involved types. =item * C Goes with roles. Defines a multi that implicitly takes the topic type as its first parameter. =item * C Goes with factories. Defines a multi that implicitly returns the topic type. =back =head2 Notes There's a little problem with the GADT form: union AST[::] ( Const(::T --> AST[::T]), ); This convolves to: theory AST{::A, ::B} { multi Const(::T --> ::U <= AST[::T]{::U}) {...} } Which doesn't even say anything about C<::B>. It should be: theory AST{V} { multi Const(::T --> ::U <= V{::T, ::U}) {...} } That is, it is a theory-valued theory. I guess this makes sense, as all types are really theories. But so far type variables can only assume concrete types; that's the cool thing about them. You can never explicitly mention a concrete type, but that's the only thing a variable can be, so you have to go through interfaces. I suppose theory-valued theories are necessary if we want GADT (and there are probably some other uses as well), but I fear what it will do to the type inferencer. Maybe there is another way. UPDATE: There is, and here it is. Each case gets its own theory, which lifts the return value from the constructor. Let's consider a two-constructor union: union AST[::] ( Const(::T --> AST[::T]), TestZero(AST[Int] --> AST[Bool]), ); This convolves to: theory AST::Const{::T, ::ASTT} { multi Const(T --> ASTT) {...} } theory AST::TestZero{::ASTBool} { multi TestZero(::T --> ASTBool <= AST{::TInt, ::T} && Int{::TInt}) {...} } # I'll use => as a shorthand for "implies"; i.e. a => b is # equivalent to !a || b theory AST{::Param, ::Tree} <= AST::Const{::Param, ::Tree} && (Bool{::Param} => AST::TestZero{::Tree}) { } So it all reduces to complex logic between concrete (non-variable) theories C and type variables C<::V> in the form C.