GitHub - twhitehead/trellis

Branches Tags
Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
include		include
tests		tests
README		README
Repository files navigation

% Program Overview
% Tyson Whitehead


Notation
========

* The suffix *s* is added multiple times to indicate nestings of
  plurals (e.g., individualss would be a collection of a collection of
  individuals).
* the prefix *r* is used to indicate the reduced versions of data
  (e.g., rindividualss would be a collection of a collection of
  individual reduction data)


System
======

The system consists of space, world, variety, and individual data:

space
  ~ the size of the world and what boundaries are periodic,
world
  ~ global user defined parameters (e.g., current simulation year, etc.),
variety
  ~ per variety user defined parameters (e.g., growth rate, etc.), and
individual
  ~ per individual user defined parameters (e.g., height, etc.).

The world, variety, and individual data is conceptually organized into
a three level tree (i.e., one world, an arbitrary number of varieties
for this world, and an arbitrary number of individuals for each of
these varieties).  The parts are maintained separately to minimize
mutation operations.  The world doesn't contain the varieties, nor do
the varieties contain the individuals.  Instead, there is a world,
there is a collection of varieties, and there is a collection of a
collection of individuals:

world
  ~ the one world
varieties
  ~ the collection (for the world) of variety
individualss
  ~ the collection (for the world) of collections (for the varieties)
    of individuals

Simulation
==========

To start things, the space and world are initialized, an arbitrary
number of varieties are created, and, for each variety, an arbitrary
number of individuals are created.  This gives a space and a (world,
varieties, individualss) tuple.  These two represent the simulation at
any one moment in time.  The simulation progresses by alternating
between performing reduction and generation across the current (world,
varieties, and individualss) tuple.

1. The reduction generates the (rworld, rvarieties, rindividuals_in,
   rindividuals_out) tuple (e.g., highest individual in a variety,
   number of individuals in a variety, average height of individuals
   around each individual, etc.) from the current (world, varieties,
   individualss) tuple.
2. The generation creates a new (world, varieties, individualss) tuple
   (e.g., growing an individual by replacing it with an identical but
   higher individual in the same spot, spawning new individuals of the
   same variety but smaller, etc.) from the current (world, varieties,
   individualss) and (rworld, rvarieties, rindividuals_in,
   rindividuals_out) tuples.


Reduction
=========

Reduction is performed across across the (world, varieties,
individualss) to extract the (rworld, rvarieties, rindividualss)
parameters of interest.

rworld
  ~ reduction across all (world, variety, individual) tuples
rvarieties
  ~ collection of per variety reductions across all (world, variety,
    individual) tuples
rindividualss_in
  ~ collection of collections of per individual reductions across all
    (world, variety, individual) tuples where the later individual is
    in the the former individual's in range
rindividualss_out
  ~ collection of collections of per individual reductions across all
    (world, variety, individual) tuples where the former individual is
    in the the later individual's out range

The reductions are initialized by calling a user provided function
with the data available at that level.  Another user provided function
is then called to update this data for each relevant (world, variety,
individual) tuple below that level.

rworld
------

* reduced across all (world, variety, individual) tuples at the world
  level (i.e., a single instance),
* rworld is initialized by calling a user provided function with
  (world),
* rworld is updated by repeatedly calling a user provided function
  with each (world, variety, individual) tuple, and
* can be used to calculate desired world level statistics (e.g.,
  highest individual in the simulation).

rvarieties
----------

* reduced across all (world, variety, individual) tuple at the variety
  level (i.e., a single instance for each variety),
* each rvariety is initialized by calling a user provided function
  with the (world, variety) tuple for the variety,
* each rvariety is updated by repeatedly calling a user provided
  function with each (world, variety, individual) tuple such that
  (world, variety) is the tuple for the variety, and
* can be used to calculate desired variety level statistics across all
  individuals of that variety (e.g., highest individual for each
  variety).

rindividualss_in
----------------

* reduced across all (world, variety, individual0, individual1) tuples
  at the individual level (i.e., a single instance for each
  individual),
* each rindividual_in is initialized by calling a user provided
  function with the (world, variety, individual) tuple for the
  individual,
* the in region for each individual is given by a user provided function
  passed the (world, variety, individual) tuple for the individual,
* each rindividual_in is updated by repeatedly calling a user provided
  function with each (world, variety0, variety1, individual0,
  individual1) tuple such that (world, variety0, individual0) is the
  tuple for the individual and (world, variety1, individual1) is the
  tuple for the other individual in that individual's in range, and
* can be used to calculate desired individual level statistics across
  all individuals encompassed by each individual's in range (e.g.,
  tallest individual encompassed by the in range of each individual).

rindividualss_out
-----------------

* reduced across all (world, variety, individual0, individual1) tuples
  at the individual level (i.e., a single instance for each individual)
* each rindividual_out is initialized by calling a user provided
  function with the (world, variety, individual) tuple for the
  individual,
* the out region for each individual is given by a user provided function
  passed the (world, variety, individual) tuple for the individual,
* each rindividual_out is updated by repeatedly calling a user
  provided function with each (world, variety0, variety1, individual0,
  individual1) tuple such that (world, variety0, individual0) is the
  tuple for the other individual whose out range the individual is in
  and (world, variety1, individual1) is the tuple for the individual,
  and
* can be used to calculated desired individual level statistics across
  all individuals whose out range encompasses each individual (e.g.,
  tallest individual whose out reach encompasses each individual).


Generation
==========

The next step is generated from outside in.  Each (individual) is
mapped to a list of new individuals; each (variety, individuals) tuple
is mapped to a list of new (variety, individuals); and the (world,
varieties, individualss) tuple is mapped to a new (world, varieties,
individualss) tuple.

For the individual and variety levels the corresponding component
in the final (world, varieties, individualss) can be

1. deleted by returning an empty list,
2. left alone by returning a list with the a copy of the input,
3. updated by returning a list with an updated copy of the input,
4. replaced with an arbitrary number of new ones by returning
   a list of the replacement elements.

As there is only a single world returned, the world level only
supports the second and third of these.

individual
----------

* each individual is mapped to a list of new individuals by calling a
  user defined function with the (world, variety, individual,
  rworld, rvariety, rindividual_in, rindividual_out) tuple for the
  individual,
* for each variety, the combined collection of new individuals and
  from this mapping of current individuals becomes the set of
  individuals mapped with that variety at the variety level

variety
-------

* each variety and its associated individuals (produced by the above
  individual mappings) are mapped to a list of new varieties and
  associated individualss by calling a user defined function with
  the (world, variety, individuals, rworld, rvariety) tuple for the
  variety
* the combined collection of new varieties and associated individualss
  from this mapping of current varieties and associated individualss
  become the set of varieties and associated individualss mapped with
  the world at the world level

world
-----

* the world and its associated varieties and individualss (produced
  above by the variety mappings) are mapped to a new world and
  associated varieties and individualss by calling a user defined
  function with the (world, varieties, individualss, rword) tuple
* the new world and associated varieties and individualss from this
  mapping becomes the new (world, varieties, individualss) tuple


Complexity
==========

The most complex part of the program is performing the individual to
individual reductions efficiently.  A double loop across all
individuals (i.e., for each individual check every other individual to
see it it falls in the desired region) would be $\mathcal O(N^2)$,
were $N$ is the total number of individuals.^[Saying $f(x)$ is
$\mathcal O(g(x))$ means $\limsup_x f(x)/g(x) < \infty$ (i.e., $f(x)$
is eventually bound by constant multiple of $g(x)$).]

To avoid this, the system maps the individuals to their location along
a space filling 1D Z-curve and stores them in this sorted order.  The
mapping to a Z-curve is given by interleaving the binary digits of the
coordinates.  A key property of this mapping is that the Z-value of
the upper-left and lower-right coordinates of a rectangular region
bound the Z-values of all points within that region.

Combined with storing individuals in sorted order by their Z-value,
this allows all the individuals in a rectangular region to be located
in better than $\mathcal O(N)$.  The algorithm works by alternating
between Z-values and individuals because not all Z-values between the
lower and upper and bounds (the Z-values of the upper-left and
lower-right corners of the region) fall within the region.

- The first Z-value is the upper-left corner of the region (the
  Z-value of the upper-left corner of rectangular region is the the
  smallest Z-value within that region).
- The first individual with a greater or equal Z-value is then located
  (this is $\mathcal O(\mathrm{ln}(N))$ due to the individuals
  being sorted by Z-value).
- This located individual and all subsequent ones are processed until
  the first individual whose Z-value does not fall in the region is
  located.
- If this individual's Z-value falls outside of the upper-bound (the
  Z-value of the lower-right corner) then all individuals have been
  located.
- Otherwise the next point at which the Z-curve intersects the
  region is calculated and the process resumes with locating the
  first individual with a greater or equal Z-value.

The number of iterations in the above algorithm is a combination of
the number of individuals in the region plus the number of times the
algorithm switches between being inside and outside the the region.
The later is bound by the number of times the Z-curve enters and
exists the region.  This, in turn, is bound by a multiple (a function
of the precision of the Z-curve) of the region's boundary.

Assuming the number of individuals in a region is proportional to the
area of the region, the former (the number of individuals in the
region) will dominate the later (the number of times the Z-curve
enters and exits the region).  Assuming some upper bound on the size
of all regions, it follows that finding all individuals in a given
region is $\mathcal O(\mathrm{ln}(N))$.

The complexity of finding all individuals in the (globally bounded)
regions about all individuals is then $\mathcal O(N \mathrm{ln}(N))$.
The sorting of all individuals by their Z-values (required for the
algorithm) is also $\mathcal O(N \mathrm{ln}(N))$.  It follows that
the overall complexity is $\mathcal O(N \mathrm{ln}(N))$.


Future
======

The current system can't handle overlapping in and out regions where
the individuals themselves fall outside of both regions (i.e.,
individuals are located at a point within their regions so it is
possible for the overlap of the regions to not include the
individuals).  The out version is also less desirable because it is a
scatter operation.  This forces simultaneous reducers to synchronize
or maintain separate reduction space for later merger.

Replacing the individual_in and individual_out aggregation/reductions
with just a single individual reduction to address these issues.

individual
----------

* reduced across all (world, variety, individual0, individual1) tuples
  at the individual level (i.e., a single instance for each individual)
* each rindividual is initialized by calling a user provided function
  with the (world, variety, individual) tuple for the individual,
* the region for each individual is given by a user provided function
  passed the (world, variety, individual) tuple the individual,
* each rindividual is updated by repeatedly calling a user
  provided function with each (world, variety0, variety1, individual0,
  individual1) tuple such that (world, variety0, individual0) is the tuple
  for the individual and (world, variety1, individual1) is the tuple
  for the other individual whose range overlaps, and
* can be used to calculated desired individual level statistics across
  all individuals whose ranges overlap with each individual (e.g.,
  tallest individual whose reach overlaps).

The precision of the Z-curve plays a critical role in determining the
constant factor in the overall $\mathcal O(N \mathrm{ln}(N))$
complexity.  Less precise Z-values mean less stepping in and out of
regions (and thus a smaller constant factor), but they also mean the
Z-value filtering is more crude so more individuals are processed as
being potentially in a region.  Early profile results indicate that
there could be significant performance gains to be made by reducing
the current precision of the Z-values to obtain some optimal balance
between these two effects.