This test checks both the performance and the correctness of SHARP operations.
This test supports both HOST and CUDA buffers

Build
=====

To build the tests, just type `make`.

If SHARP is not installed in /opt/mellanox/sharp, you can specify SHARP_HOME.
If CUDA is not installed in /usr/local/cuda, you may specify CUDA_HOME.

This test depends on MPI to work. Please make sure you have "mpicc" in your environment.

```shell
$ make SHARP_HOME=/path/to/sharp
```
For CUDA support:
```shell
$ make CUDA=1 CUDA_HOME=/path/to/cuda SHARP_HOME=/path/to/sharp
```

Usage
=====


```shell
   $mpirun <mpi_options> -x ENABLE_SHARP_COLL=1  -x SHARP_COLL_LOG_LEVEL=3 sharp_coll_test <test options>
```
Example:

1. Run allreduce barrier perf test on 2 hosts using port mlx5_0
```shell
	$ mpirun -np 2 -H host1,host2 --map-by node -x ENABLE_SHARP_COLL=1 -x SHARP_COLL_LOG_LEVEL=3 ./sharp_coll_test  -d mlx5_0:1 --iters 10000 --skip 1000 --mode perf --collectives allreduce,barrier
```

2. Run allreduce perf test on 2 hosts using port mlx5_0 with CUDA buffers
```shell
	$ mpirun -np 2 -H host1,host2 --map-by node -x ENABLE_SHARP_COLL=1 -x SHARP_COLL_LOG_LEVEL=3 ./sharp_coll_test  -d mlx5_0:1 --iters 10000 --skip 1000 --mode perf --collectives allreduce -M cuda
```

3. Run allreduce perf test on 2 hosts using port mlx5_0 with Streaming aggregation from 4B to 512MB
```shell
	$ mpirun -np 2 -H host1,host2 --map-by node -x ENABLE_SHARP_COLL=1 -x SHARP_COLL_LOG_LEVEL=3 -x SHARP_COLL_ENABLE_SAT=1 ./sharp_coll_test  -d mlx5_0:1 --iters 10000 --skip 1000 --mode perf --collectives allreduce -s 4:536870912
```

For more test options: 
```shell
	$sharp_coll_test -h
```
