Trees configuration reference
===============================================================================

Copyright 2016-2017 Mellanox.

License
-------------------------------------------------------------------------------

See LICENSE file.

Applied to
-------------------------------------------------------------------------------

SHARP v1.1.0

Terminology
-------------------------------------------------------------------------------

- **Aggregation tree** is a reduction tree. It describes the data reduction topology
  available for collective operations.
- **Aggregation Node (AN)** is a node of an Aggregation Tree implemented in SwitchIB-2,
  including the hardware (ASIC) and the local firmware implementation.
- **Compute port** is an IB port in a server that runs an MPI application and is used for
  network communication.
- **Aggregation Manager (AM)** is the central entity used for system-level
  configuration and management.

Overview
-------------------------------------------------------------------------------

AM supports two ways of defining Aggregation Trees. It can build trees
automatically for a quasi fat-tree topology based on switch ranking, or it can
load a manually created configuration.
A trees configuration file is a text file used for static tree configuration.

Trees configuration file location
-------------------------------------------------------------------------------

AM has a configuration parameter that controls the location of the trees configuration file.
The parameter can be passed through the command line, a configuration file, or an
environment variable.


```
-t, --trees_file <value>:
        SHArP trees file
        If NULL, calculate trees automatically
        default value: (null)
```

The `<SHARP FOLDER>/sbin/sharp_benchmark.sh` script can be used as a reference for
running AM. This script creates a configuration file and runs AM.

Configuration file format
-------------------------------------------------------------------------------

The configuration file contains one element per line. Each line has
a prefix and an identification. The file format supports comments: lines that begin
with `#` are comments and are ignored during parsing.
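
The line-level rules above can be sketched in a few lines of Python. This is an illustration only, not AM's parser: the function name `classify_line` is hypothetical, and the handling of blank lines and of leading whitespace is an assumption based on the examples at the end of this document.

```python
def classify_line(line):
    """Classify one line of a trees configuration file.

    Returns a (kind, prefix, identification) tuple, where kind is
    "blank", "comment", or "element".
    """
    stripped = line.strip()
    if not stripped:
        # Assumption: blank lines are skipped, as in the examples.
        return ("blank", None, None)
    if stripped.startswith("#"):
        # Lines that begin with '#' are comments.
        return ("comment", None, None)
    # The first whitespace-separated token is the prefix;
    # the remainder of the line is the identification.
    parts = stripped.split(None, 1)
    identification = parts[1].strip() if len(parts) > 1 else ""
    return ("element", parts[0], identification)
```

For example, `classify_line("tree 0")` yields the prefix `tree` and the identification `0`.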

A trees configuration file includes a list of Aggregation Trees. Each tree has a list
of nodes and parent-child relations. A parent-child relation can be a relation
between two Aggregation Nodes or between an Aggregation Node and a compute port.
The first type defines the static, inner part of the tree. The second type defines
the lower layer of the tree, which is connected only while a job that uses the compute ports
is running.

Connections between ANs are mandatory for the tree definition. Connections
between compute ports and ANs are optional. If a connection between an AN and a
compute port is absent from the configuration file, AM connects the port to the AN
running on the directly connected switch. If that switch is not SHARP capable,
AM doesn't connect the port to any AN; in this case, manual configuration is
the only way to attach the port to the aggregation tree. The configuration file
overrides the automatic AN-to-compute-port assignment. A user has to define all AN-to-AN
connections in the tree and can customize the connections between compute
ports and ANs.
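
The assignment rule for compute ports can be summarized as a small decision function. This is a sketch of the rule as described above, not AM's implementation; the function name and both dictionary arguments are hypothetical stand-ins for information AM obtains from the configuration file and from fabric discovery.

```python
def resolve_an_for_port(port_guid, configured, direct_switch_an):
    """Pick the AN that serves a compute port within one tree.

    configured:       port GUID -> AN GUID taken from computePort entries
    direct_switch_an: port GUID -> AN GUID of the directly connected
                      switch, or None if that switch is not SHARP capable
    Returns the AN GUID, or None if the port stays unattached.
    """
    # An explicit entry in the configuration file always wins.
    if port_guid in configured:
        return configured[port_guid]
    # Otherwise fall back to the AN on the directly connected switch,
    # which may be None when the switch is not SHARP capable.
    return direct_switch_an.get(port_guid)
```

A port that has no explicit entry and sits behind a switch that is not SHARP capable resolves to `None`, which corresponds to the case where manual configuration is required.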

The following prefixes are supported:

| Prefix         | Description                                | Identification          |
|----------------|--------------------------------------------|-------------------------|
| `tree`         | Defines a tree                             | int: 0-63               |
| `node`         | Defines a tree node                        | {Description} and GUID  |
| `subNode`      | Defines children of the node               | {Description} and GUID  |
| `computePort`  | Defines compute port connected to the node | GUID                    |


- `tree` defines an Aggregation Tree. The tree has an integer ID in the
   [0..63] interval. In the current version up to 63 trees are supported. All lines
   from the line with the `tree` prefix up to the next line with a `tree` prefix
   or EOF belong to the tree.

   ```
   tree <tree-id>
   ```

- `node` defines an Aggregation Node. The IB node description and port GUID can be used
   for AN identification.

   ```
   node {<node description>} [GUID:<port_guid_num>]
   ```

   If AM can't find a unique Aggregation Node using the node description, it uses
   the port GUID as a key. The port GUID is a hex integer with a `0x` prefix. It can be
   found in `ibnetdiscover` output.

   If a user doesn't set a custom node description,
   `Mellanox Technologies Aggregation Node` is used as the default. In that case
   the port GUID is the only way to identify the Aggregation Node. In SwitchIB-2,
   Aggregation Nodes are usually connected to virtual port 37.

- `subNode` defines a child Aggregation Node of the nearest preceding `node`.
   A sub-node is identified in the same way as a node.

   ```
   subNode {<node description>} [GUID:<port_guid_num>]
   ```

   A leaf AN is an AN without children. For a leaf AN it is enough to list it as a
   `subNode` of its parent AN; no separate `node` element is needed.

- `computePort` defines a connection between the nearest preceding `node` and a compute port.
   The Aggregation Node will serve that compute port at runtime.

   ```
   computePort {<node description>} [GUID:<port_guid_num>]
   ```

   For a compute port, the description string is metadata only. AM
   searches for the port using only the GUID.
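
To make the grammar of the four prefixes concrete, here is a minimal parser sketch in Python. It is not the parser used by AM: the function name `parse_trees`, the returned data layout, and the error messages are all hypothetical, and details the text leaves open (for example, how a repeated tree ID is treated) are resolved by assumption and marked in the comments.

```python
import re

# One element per line: a known prefix followed by its identification.
ELEMENT_RE = re.compile(r"^(tree|node|subNode|computePort)\s+(.*)$")
# Identification of node, subNode and computePort: {description} [GUID:0x...]
IDENT_RE = re.compile(r"^\{([^}]*)\}\s*(?:GUID:(0x[0-9a-fA-F]+))?$")


def parse_trees(text):
    """Parse trees configuration text into {tree_id: [node, ...]}.

    Each node is a dict with keys "desc", "guid", "children", "ports".
    """
    trees = {}
    nodes = None      # node list of the tree currently being read
    current = None    # most recent `node` element
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumption) and comments
        match = ELEMENT_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: unknown prefix")
        prefix, rest = match.group(1), match.group(2).strip()
        if prefix == "tree":
            tree_id = int(rest)
            if not 0 <= tree_id <= 63:
                raise ValueError(f"line {lineno}: tree ID out of [0..63]")
            # Assumption: a repeated tree ID extends the same tree.
            nodes = trees.setdefault(tree_id, [])
            current = None
            continue
        if nodes is None:
            raise ValueError(f"line {lineno}: element before any tree")
        ident = IDENT_RE.match(rest)
        if ident is None:
            raise ValueError(f"line {lineno}: bad identification")
        desc = ident.group(1).strip()
        guid = int(ident.group(2), 16) if ident.group(2) else None
        if prefix == "node":
            current = {"desc": desc, "guid": guid, "children": [], "ports": []}
            nodes.append(current)
        elif current is None:
            raise ValueError(f"line {lineno}: {prefix} without a node")
        elif prefix == "subNode":
            current["children"].append({"desc": desc, "guid": guid})
        else:
            # computePort: the port is looked up by GUID only.
            if guid is None:
                raise ValueError(f"line {lineno}: computePort needs a GUID")
            current["ports"].append(guid)
    return trees
```

Each `subNode` and `computePort` entry is attached to the most recent `node` line, which mirrors the "nearest preceding `node`" rule described in the list above.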


Aggregation trees dump
-------------------------------------------------------------------------------

AM can dump the structure of the created aggregation trees. The `generate_dump_files`
parameter enables dump file creation. The tree structure is written to the
`dump_trees_structure.txt` file.

```
--generate_dump_files <value>:
        Dump internal state to files for debug and diagnostics
        default value: FALSE
```

`dump_dir` controls the location of AM dumps.


```
--dump_dir <value>:
        Path to dump files directory
        default value: .
```

`dump_trees_structure.txt` includes only the connections between ANs, without
compute ports. It can be used as a basis for a custom configuration.
`dump_trees_state.txt` also includes the mapping between ANs and
compute ports.

Examples
-------------------------------------------------------------------------------

1.  In this example, the tree configuration defines 4 trees, each with 9 ANs. All
    compute ports will be connected automatically to ANs running
    on directly connected switches. Each tree has two layers: 8 leaf switches
    and one backbone root switch.

	```
	tree 0
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300a2d788
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8

	tree 1
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300bf8558
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8

	tree 2
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300bf8578
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8

	tree 3
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300bf85f8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8
	```

2.  In this example, compute port GUID:0x7cfe900300726fd2 in tree 0 is connected
    to a non-directly connected AN (GUID:0x7cfe900300ac2cb8). All other
    compute ports in the tree are connected automatically to ANs running on
    directly connected switches. In the other trees, port GUID:0x7cfe900300726fd2
    is also connected to its default AN.

	```
	tree 0
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300a2d788
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8
	computePort { } GUID:0x7cfe900300726fd2

	tree 1
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300bf8558
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8

	tree 2
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300bf8578
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8

	tree 3
	node {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300bf85f8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b466c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46688
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46668
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46648
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46628
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b46608
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300b465c8
	subNode {Mellanox Technologies Aggregation Node} GUID:0x7cfe900300ac2cb8

	```
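
Two-layer configurations like the ones above are repetitive, so they are convenient to generate. The following Python sketch produces the same layout from a list of root GUIDs and a list of leaf GUIDs. The function name `two_layer_trees` and its parameters are hypothetical; the GUIDs must come from the actual fabric (for example, from `ibnetdiscover` output), and the sketch assumes every AN uses the default node description.

```python
DEFAULT_DESC = "Mellanox Technologies Aggregation Node"


def two_layer_trees(root_guids, leaf_guids, description=DEFAULT_DESC):
    """Return trees configuration text with one tree per root AN.

    Every tree gets its own root `node` and the same set of leaf
    `subNode` entries, as in the examples above.
    """
    lines = []
    for tree_id, root in enumerate(root_guids):
        if tree_id:
            lines.append("")  # blank line between trees for readability
        lines.append(f"tree {tree_id}")
        lines.append(f"node {{{description}}} GUID:{root:#x}")
        for leaf in leaf_guids:
            lines.append(f"subNode {{{description}}} GUID:{leaf:#x}")
    return "\n".join(lines) + "\n"
```

Calling it with the four root GUIDs and eight leaf GUIDs from the first example reproduces that configuration. Leaf ANs appear only as `subNode` entries, since a leaf AN does not need its own `node` element.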
