Using Tensorboard efficiently in AzureML
Begin logging stats to tensorboard from your training scripts by following this AzureML documentation.
This tutorial focuses on improving the client side experiment. Azure SDKs give basic functionality to view tensorboard logs in local machine.
This tutorial wil enable
- Viewing tensorboard logs across experiments and workspaces.
- Running tensorboard in a VM and share links
- More control over the tesorboard process
The approach is to download the experiment logs using azureml-sdk APIs and starting tensorboard manually.
First, install azureml sdk
conda install notebook ipykernel
ipython kernel install --user --name aml --display-name "Python (aml)"
pip install azureml-sdk[notebooks]
In case of pyyaml error
pip install --upgrade azureml-sdk\[notebooks\] --ignore-installed PyYAML
Check current running processes in port 6006. Import azureml packages.
! ps -axu | grep 6006
from azureml.core import Experiment, Workspace
from azureml.core.workspace import Workspace
from azureml.tensorboard import Tensorboard
expr_runs = []
Workspace 1
subscription_id = None # enter from portal
resource_group = None # enter from portal
workspace_name = None # enter from portal
ws = Workspace(subscription_id, resource_group, workspace_name)
print(ws)
expr_runs.extend([
(ws, 'WS1_myamlexp1','latest'),
(ws, 'WS1_myamlexp2','running')
])
Workspace 2
subscription_id = None # enter from portal
resource_group = None # enter from portal
workspace_name = None # enter from portal
ws = Workspace(subscription_id, resource_group, workspace_name)
print(ws)
expr_runs.extend([
(ws, 'WS2_myamlexp1','running'),
(ws, 'WS2_myamlexp1','WS2_myamlexp_12324')
])
Fetch run ids from azure
runs = []
print([e[1] for e in expr_runs])
max_runs = 10
for ws, experiment_name, run_id in expr_runs:
exp =Experiment(workspace=ws, name=experiment_name, _create_in_cloud=False)
for run in exp.get_runs():
if run_id == 'running' and run.status == 'Running':
runs.append(run)
elif run_id == 'latest':
runs.append(run)
break;
elif run_id == 'all' or run.id == run_id:
runs.append(run)
if len(runs) >= max_runs:
break
elif run.id == run_id:
runs.append(run)
break
print(f'Found {len(runs)} runs')
Set Port
port = '6006'
local_root = 'logs_'+port
Delete previous runs
! ls $local_root
! rm -rf $local_root
! mkdir $local_root
Download logs to local disc
Azure API documentations
download_file for a single file
download_files for bulk download
def download():
import os
for run in runs:
output = os.path.join(local_root, run.id)
print(f'\nChecking event files for {run.id} status = {run.status}, output = {output}')
run.download_files(prefix = 'logs/',output_directory=output)
print(os.listdir(local_root))
download()
Run tensorboard
To keep it running in background, run this in a shell and don’t run the below commands. But unless restarted, tensorboard keeps showing of deleted logs too.
(azureml_py36_tensorflow) elr@elr-tensorboard:~/notebooks$ nohup tensorboard --logdir krishan/logs_6006 --host 0.0.0.0 --port 6006 & 2> krishan/tb_run.log
!dir $local_root
!tensorboard --logdir $local_root --host 0.0.0.0 --port $port
Open https://<IP>:6006
in your browser to view tensorboard graphs
Stop this cell and run two last cells to refresh the graphs and download new data.
If not intending to stop tensorboard while refreshing, run tensorboard in background for your specific port.