= Laziness A draft on the runtime view of how Lazy things (like Arrays) work. # by fglock The text currently has 2 main sections: - a review of the Perl 6 Bible, with some interpretation and guessing - implementation notes. Each section has 3 subsections: - Native data - about the internal representation of data types - Values - about the basic Perl 6 "OO" things - Containers - about the glue that hold things together in data structures == TODO - include drawings - comparing with Ruby/Python should help explaining some concepts - this text is supposed to talk about all these things: Scalar / Array / Hash Container Tuple / List / Generator / Iterator / Range file iterator Signature / Argument List Array / Hash slice enum tie / proxy / rw / env Constrained Types (eg: Int / Even / 0..100 where Even) Piddle lazy string / int / num = Part I - Perl 6 Bible Readings References: - S02 Bits and Pieces - for "native data" - S09 Data Structures - for Object types - A bit of each of the other AES = Native data Native data is built with a thin, (possibly non-OO) library over the virtual machine data types. For example, you may need a wrapper to implement 1-bit values, if the virtual machine only provides bytes. - native "singular" value Native values are: num, int, uint, bit, complex, ref. Native values can have a bit-size specification: int1, int16. A "native container" is whatever a "ref" value points to. A "num" value must support IEEE floating point conventions - which means there are native "Inf" and "NaN". Native values don't need to be able to represent "undef". - native "plural" value Native plural values are: array, str, hash. A "str" is an array of "int". There may be separate implementations for fixed-size and for auto-extending allocation. These are fast, in-memory structures provided by the virtual machine. These structures are assumed to be one-dimensional. Native hash is mostly used internally, for holding "namespaces": Classes, variables, labels. There may be separate implementations for string-indexed and for object-indexed hashes. = Value Class Values can be "singular" or "plural". Singular Values are: Str, Int, Num, Pair, Ref, Junction, Code, Block, Class, Grammar, Macro ... Plural Values are things that look like single-dimensional arrays: List, Generator, Iterator. -- List, Lazy List, Tuple classes List is implemented as a native Array, or as a contiguous Stack area if this is supported by the virtual machine. List is set of things. List is a subclass of "Value". List is one-dimensional. A List may be built with a mix of Values and Containers. A List might exist in 4 "states": @a = ( 7, 20..30 ); - this is the Array that will be used in the examples @a = ( ( (Cell = 7), ), ( Generator 20..30) ); - this is more-or-less what it looks like internally - unflattened List ( 1, 5..10, @a ) - a List built from a Scalar, a Range, and an Array - lazily flattened with "*" operator ( 1, 5..10, (Scalar := @a[0]), (Generator (Scalar := @a[1])..(Scalar := @a[10])) ) - the Array was "flattened", but we still have generators. - non-lazily flattened (eager) with "**" operator ( 1, 5, 6, 7, 8, 9, 10, (Scalar := @a[0]), (Scalar := @a[1]), (Scalar := @a[2]), ..... ) - everything is flattened - fully evaluated (Tuple) ( 1, 5, 6, 7, 8, 9, 10, @a[0].value, @a[1].value, @a[2].value, ..... ) - fully evaluated and immutable List only implements a subset of the Array API. A List slice is a new List - elements are shallow copied. List don't support adding, removing or substituting elements. List has no Cells - thus you can't bind a Container to a List element or slice. ??? $a := (1,2,3)[2]; # compile error - or is this creating a Constant ??? If a List is built with Generators, it is a Lazy List. -- Generator / Range Classes Generator is an object that creates Values on demand. Generator is "functional" rather than "physical memory" Range is a special case, in which there is a start, an end, and a step. #At runtime, you only access Generator elements through Containers. Generator also implements "match": 5 ~~ 1..10 The "*" operator flattens a Generator eagerly. TODO - detail the behaviour of "Str Range" -- Iterator Class TODO =<> The "*" operator flattens an Iterator eagerly. -- List Slice A List Slice is a shallow copy of a part of a List. A List Slice is just a List - there is not a separate Class for it. = Container Class Containers are the glue that hold Values together in data structures. The container classes are: Scalar, Array, Hash. Array, like List, is Lazy by default. Hash and Scalar can be lazy too, but they are not Lazy by default. == Scalar Container Class Scalar is a "singular Value" Container. One of the reasons to use a container is that you can easily change it's semantics by replacing what's inside it - that is, a Container implements a "variable": value '1' - always means 'one' container ('1') - it means 'one' too, but you can increment it to mean 'two' == Array Container Class The Array class defines all objects that implement the Array API. Array is a subclass of "Container" Array contains an ordered set of Cells. Array is multi-dimensional. Array is not meant to be subclassed - use "roles" to make different implementations. The actual implementation depends on the "Eager" or the "Lazy" roles. Other implementations can be specified. --- Eager Role An Array object that wraps a native array. --- Lazy Role An Array object that generates Cells on demand. This is the default implementation. --- Shape (Role or parameter? trait?) --- Compact Array Role (piddle) TODO - piddle is different from Compact -- Container Slice A Container Slice is a specialized Array/Hash, in which all the elements are bound to objects inside other Containers (Array, Hash or Scalar). A Lazy Container Slice contains pointers to Generators that are inside other Containers. Container Slice is multi-dimensional. == Hash The Hash class defines all objects that implement the hash API. Hash is like an Array, in which the Cell indexes can be of any type (not just numbers). Hash is multi-dimensional. = Part II - Internals == Generator Internals - TODO == Generator API - match This operator is used in a match statement: 5 ~~ 1..10 - shift / pop TODO - not sure if "Generator" is an iterator (read-only) or uses destructive read (shift/pop) - sum / reverse / map / zip - flatten Takes all Values from the Generator, and returns an Array (possibly Native). - elems The number of elements === This part of the Generator API is only used by the runtime internals. - head(n) This operator is used by the Array splice operator. Takes "n" Values from the Generator, like "shift" does. The result can be a Generator (when n is big), an Array (possibly Native), a single Value (when n == 1), or Void (when n <= 0). TODO - not sure if "Generator" is an iterator (read-only) or uses destructive read (head/tail) - tail(n) This operator is used by the Array splice operator. Takes "n" Values from the back of the Generator, like "pop" does. The result can be a Generator (when n is big), an Array (possibly Native), a single Value (when n == 1), or Void (when n <= 0). TODO - check if "Generator" is an iterator (read-only) or uses destructive read (head/tail) - is_contiguous This is used for the optimization of Slice, when the sequence is used as a slice index. is_contiguous() is True is the generated sequence is like "2 .. 10". = Array internals == Array (base Class) TODO == Eager (Role) An Eager Array contains a Native Array. TODO - explain "Cell" Contents: 1 2 3 Internals: [ 1, 2, 3 ] == Lazy (Role) A Lazy Array contains a "specs" Array, which contains Generators, or Native Arrays. Contents: 1 2 3 Internals: [ [ 1, 2, 3 ] ] Contents: 10 20 30 100..10000 50 400..900 Internals: [ [ 10, 20, 30 ], Generator( 100, 10000 ), [ 50 ], Generator( 400, 900 ) ] = Array API - TODO Array can do "zip", "map", "grep", splice, push, pop, store, fetch. Array can be "bound" and "tied". == map() map() applies a 'Code' to an Array state: 1: @a = 1..100; # 1, 2, 3, ... 2: @b = @a.map:{ $_ * 2 }; # 2, 4, 6, ... 3: 4: @a[2] = 0; # 1, 0, 3, ... - original has changed 5: say @b[1..3]; # 2, 4, 6, ... - result didn't change Line 1: Container: @a Internals: [ Generator( 1, 100 ), ] Line 2: Container: @_TEMP_ Internals: [ Generator( 1, 100 ), ] Container: @a Internals: [ RefArray( @_TEMP_ ), ] Container: @b Internals: [ MapArray( @_TEMP_, { $_ * 2 } ), ] Line 4: Container: @_TEMP_ Internals: [ [ 1, 2 ] Generator( 3, 100 ), ] Container: @a Internals: [ [ 1, 0 ], SpliceArray( @_TEMP_, 2 ), ] Container: @b Internals: [ MapArray( @_TEMP_, { $_ * 2 } ), ] == splice() This pseudo-code implements splice(). It assumes that the internal representation of Array can do .head(n) and .tail(n) - see "Generator" for details. sub splice ( @array, $offset? = 0, $length? = Inf, @list? = () ) { my ( @head, @body, @tail ); if ( $offset >= 0 ) { @head = @array.head( $offset ); if ( $length >= 0 ) { @body = @array.head( $length ); @tail = @array; } else { @tail = @array.tail( -$length ); @body = @array; } } else { @tail = @array.tail( -$offset ); @head = @array; if ( $length >= 0 ) { @body = $tail.head( $length ); } else { @body = @tail; @tail = $body.tail( -$length ); } }; @array = ( @head, @list, @tail ); return @body; } = Hash internals A Hash contains a Native Hash. Hash keys can be objects, but the objects must be immutable Contents: a:1 b:2 c:3 Internals: { a:1 b:2 c:3 } A Hash may contain Lazy items. This structure contains a Native Array, which contains Generators or Native Hashes. This structure permits near O(1) complexity for most operations. Contents: a:1 b:2 c:3 Internals: [ { a:1, b:2, c:3 } ] Contents: a:10 b:20 c:30 (1:100 .. 10000:10100) d:50 Internals: [ { a:10 b:20 c:30 }, Generator( 1:100, 10000:10100 ), { d:50 }, ] = Hash API - TODO = Slice API - TODO = Slice Internals - TODO - hash slice / array slice __END__