Tuning¶
How to get Maestro Core to perform well.
Architecture¶
The Pool Manager is a multi-threaded application. It needs memory to hold metadata, and this footprint grows with the number of CDOs in the workflow.
The Pool Manager communicates with pool clients, that is, the Maestro resources on
the client side. Applications using Maestro Core transparently use extra
threads for Maestro: threads for pool communication (MSTRO_PM_PC_NUM_THREADS) and
threads to handle operations (MSTRO_OPERATIONS_NUM_THREADS), which
parallelise pool operations at the network level.
Knobs¶
Maestro features a range of environment variables that can be used for tuning.
MSTRO_LOG_LEVEL
Log level (Error|Warning|Info|Debug). (Optional)
MSTRO_LOG_MODULES
Selection of modules that should log. (Optional) A typical setting is
MSTRO_LOG_LEVEL=Info MSTRO_LOG_MODULES="stats"
which tells Maestro Core to record logs up to Info level, but only for the stats module. This is typically used for a benchmarking run, where we do not want to record logs that would slow us down but still want the stats report at mstro_finalize() time.
MSTRO_LOG_COLOR
Log color. (green|blue|brightblue|…) (Optional) Typically helps to visually tell apart short logs coming from a couple of apps.
MSTRO_LOG_DST
Select logging output channel. (stdout|stderr|syslog) (Optional)
MSTRO_TRANSPORT_DEFAULT
Which transport method to choose by default. (RDMA|GFS|MIO) (Optional)
MSTRO_TRANSPORT_GFS_DIR
Directory for GFS transport. (Optional)
MSTRO_OFI_NUM_RECV
Number of concurrent receive operations per endpoint. (Optional)
MSTRO_PM_PC_NUM_THREADS
Number of threads servicing OFI completion events. (Optional)
MSTRO_OPERATIONS_NUM_THREADS
Number of threads handling Maestro operations. (Optional)
Maestro relies on OFI for network operations, so the usual OFI knobs can also be tuned.
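As a minimal sketch of how these knobs might be combined for a benchmarking run, the Python helper below builds an environment with logging restricted to the stats module and launches a command with it. The helper names `maestro_env` and `run_with_knobs` are hypothetical and not part of Maestro Core; only the variable names come from the list above, and the child process is a stand-in for a real Maestro-enabled application.

```python
import os
import subprocess
import sys

# Variable names are taken from the list above; everything else is illustrative.
KNOWN_KNOBS = {
    "MSTRO_LOG_LEVEL",
    "MSTRO_LOG_MODULES",
    "MSTRO_LOG_COLOR",
    "MSTRO_LOG_DST",
    "MSTRO_TRANSPORT_DEFAULT",
    "MSTRO_TRANSPORT_GFS_DIR",
    "MSTRO_OFI_NUM_RECV",
    "MSTRO_PM_PC_NUM_THREADS",
    "MSTRO_OPERATIONS_NUM_THREADS",
}


def maestro_env(base=None, **knobs):
    """Return a copy of `base` (default: os.environ) with the given knobs set."""
    unknown = set(knobs) - KNOWN_KNOBS
    if unknown:
        # Catch typos early: a misspelled variable name would never be read.
        raise ValueError(f"unknown Maestro knob(s): {sorted(unknown)}")
    env = dict(os.environ if base is None else base)
    env.update({name: str(value) for name, value in knobs.items()})
    return env


def run_with_knobs(command, **knobs):
    """Run `command` (a list of strings) with the given knobs in its environment."""
    return subprocess.run(command, env=maestro_env(**knobs), check=True)


if __name__ == "__main__":
    # Stand-in child process that echoes one variable; replace with a real app.
    run_with_knobs(
        [sys.executable, "-c", "import os; print(os.environ['MSTRO_LOG_MODULES'])"],
        MSTRO_LOG_LEVEL="Info",
        MSTRO_LOG_MODULES="stats",
    )
```

Rejecting unknown names is a design choice of this sketch: a mistyped environment variable would otherwise go unnoticed, since nothing would read it.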
Telemetry¶
Maestro Core log lines look like
[I:pm] Simple_Pool_Manager:0 1 CQ-H-0-0 (nid00001 777) 22222479341864000: mstro_pm__handle_join_phase2(pool_manager.c:2540) JOIN message received. Caller Client:2 is now known as app #2
which reads as
[<log level>:<log module>] <component_name>:<rank_id> <app_id> <thread_id> (<hostname> <pid>) <timestamp>: <function>(<file>:<lineno>) <message>
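The field layout above can be turned into a small parser, for instance to filter logs by module or thread. The following is a sketch only: the regular expression is derived from the single example line shown here, so the function name `parse_log_line` is hypothetical and the pattern is an assumption that may need adjusting for log lines with a different shape.

```python
import re

# One named group per field of the layout above (pattern inferred from the example).
LOG_LINE = re.compile(
    r"\[(?P<level>[^:\]]+):(?P<module>[^\]]+)\] "
    r"(?P<component>\S+):(?P<rank>\d+) "
    r"(?P<app_id>\S+) (?P<thread>\S+) "
    r"\((?P<hostname>\S+) (?P<pid>\d+)\) "
    r"(?P<timestamp>\d+): "
    r"(?P<function>[^\s(]+)\((?P<file>[^:()]+):(?P<lineno>\d+)\) "
    r"(?P<message>.*)"
)

# Fields that are plain integers in the example line.
INT_FIELDS = ("rank", "pid", "timestamp", "lineno")


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


if __name__ == "__main__":
    example = (
        "[I:pm] Simple_Pool_Manager:0 1 CQ-H-0-0 (nid00001 777) "
        "22222479341864000: mstro_pm__handle_join_phase2(pool_manager.c:2540) "
        "JOIN message received. Caller Client:2 is now known as app #2"
    )
    for key, value in parse_log_line(example).items():
        print(f"{key}: {value}")
```

The level and app id are kept as raw strings because the example alone does not establish their full range of values.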
Profiling¶
A few utilities shipped with Maestro Core can complement the reports of existing profiling tools when analysing Maestro-enabled workflows:
$(MAESTRO_PATH)/examples/core_bench
runs a benchmark that shows some basic numbers
$(MAESTRO_PATH)/visualise/vis.py
provides an in-browser interactive visualisation of a Maestro-enabled workflow
$(MAESTRO_PATH)/examples/transport_bars.py
plots timings of Maestro operations by transport, taking Maestro logs as input
Scheduling¶
TODO