Latest Posts
-
Pax Layer Basics
In this blog we are going to use a new library called praxis to create neural networks.
-
Building a classifier using JAX
In this blog, we will create a realistic dataset for binary classification. Then, we will use the JAX library along with other JAX-ecosystem libraries like Flax and Optax to train a logistic regression model.
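Below is a minimal sketch of the kind of Flax/Optax training step such a post builds toward; the model definition, input shapes, and hyperparameters here are illustrative assumptions, not the post's exact code.
# Sketch only: Flax/Optax logistic regression training step
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class LogisticRegression(nn.Module):
    @nn.compact
    def __call__(self, x):
        return nn.Dense(features=1)(x)   # a single dense layer gives the logit

model = LogisticRegression()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 2)))   # assume 2 input features
optimizer = optax.sgd(learning_rate=0.1)
opt_state = optimizer.init(params)

def loss_fn(params, x, y):
    logits = model.apply(params, x).squeeze(-1)
    return optax.sigmoid_binary_cross_entropy(logits, y).mean()

@jax.jit
def train_step(params, opt_state, x, y):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss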
-
Using TPU runtimes in Colab for JAX
In Colab, the TPU runtime can be selected from the menu:
Runtime -> Change runtime type -> Hardware accelerator
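A quick sanity check after switching the runtime, as a sketch (device names and core counts vary by Colab runtime):
# Verify that JAX can see the TPU (illustrative check)
import jax
print(jax.devices())        # should list TPU devices when the TPU runtime is active
print(jax.device_count())   # a Colab TPU runtime typically exposes 8 cores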
-
functools partial
The functools module is for higher-order functions: functions that act on or return other functions. In general, any callable object can be treated as a function for the purposes of this module.
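A small illustration of functools.partial, the function the post focuses on (the example values are made up):
# functools.partial fixes some arguments of a function and returns a new callable
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)
print(square(5))  # 25
-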
Building a simple java app with external dependencies
This blog describes how we can build a simple Java application using a couple of external dependencies. This can help in decoupling certain components from a big corporate application and testing them in isolation.
-
Installing tensorflow on M1 Macs
This blog provides step-by-step instructions for installing TensorFlow on M1 MacBooks with Apple Silicon.
-
File transfer from blob storage using azure cli
Downloading and uploading files from blob storage should be simple, but I often come across a lot of errors while doing so. This blog documents the steps that have worked for me.
-
Running ML Training code on a VM
Krishan Subudhi 12/08/2020
-
Train a Covid19 Tweet sentiment classifier using Bert
Setup
-
Automatically activate conda environment in PowerShell for VSCode
VSCode automatically links conda environments in the integrated terminal through the Python extension.
-
Host python code documentation using azure app service CI CD pipeline
This blog gives step-by-step guidance on:
- Creating a web app for Python Flask (a minimal app sketch follows this list).
- Deploying to Azure.
- Adding AAD authentication.
- Creating a CI/CD pipeline using Azure DevOps.
- Creating documentation using mkdocs.
- Uploading the mkdocs documentation to a separate static HTML web app.
- Setting up CI/CD for mkdocs.
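For reference, a minimal Flask app of the kind such a web app starts from (the module name and route are hypothetical, not the post's exact code):
# app.py -- hypothetical minimal Flask app
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello from Azure App Service!"

if __name__ == "__main__":
    # Azure App Service typically serves the app through gunicorn; the built-in server is for local testing only
    app.run(host="0.0.0.0", port=8000)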
-
Visualizing Bert Embeddings
Set up TensorBoard for PyTorch by following this blog.
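As a sketch of what gets visualized, here is one way to push a few BERT word embeddings into the TensorBoard projector (the model name, word list, and indexing are illustrative assumptions):
# Illustrative only: write BERT embeddings for a few words to TensorBoard
import torch
from transformers import BertTokenizer, BertModel
from torch.utils.tensorboard import SummaryWriter

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

words = ["king", "queen", "apple", "orange", "paris", "london"]
inputs = tokenizer(words, return_tensors="pt", padding=True)
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state[:, 1, :]  # take the first word-piece of each word

writer = SummaryWriter("runs/bert_embeddings")
writer.add_embedding(embeddings, metadata=words)
writer.close()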
-
Using PyTorch 1.6 native AMP
This tutorial provides step-by-step instructions for using native AMP introduced in PyTorch 1.6. It is often good to try things out with simple examples, especially when they relate to gradient updates. Scientists need to be careful while using mixed precision and should write proper test cases; a single misstep can result in model divergence or unexpected errors. This tutorial uses a simple 1x1 linear layer and converts an FP32 model training to mixed precision model training. Weights and gradients are printed at every stage to ensure correctness.
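A minimal native-AMP loop, as a sketch (the data, learning rate, and printout are illustrative, not the tutorial's exact code):
# Sketch: FP32 -> mixed precision training with torch.cuda.amp (PyTorch >= 1.6)
import torch

model = torch.nn.Linear(1, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 1, device="cuda")
y = 3 * x

for step in range(5):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()              # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                     # unscales gradients, then calls optimizer.step()
    scaler.update()
    print(step, loss.item(), model.weight.grad)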
-
Zero shot NER using RoBERTa
import torch
import transformers
-
Type faster using RoBERTa
The goal of the experiment is to detect and correct the mistakes made during fast typing on a phone while using the swipe feature. Fast gestures in swipe currently produce some wrong results, and there is no flagging or correction done after a sentence is typed. The user has to go back and check correctness or reduce the swiping speed. Using language models, we can detect the mistakes and improve the typing speed.
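The underlying idea can be sketched with a masked-language-model fill-in (the model name and sentence are assumptions, not the post's code):
# Sketch: let RoBERTa suggest a replacement for a suspicious word
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")
for candidate in fill_mask("I will <mask> you at the airport tomorrow."):
    print(candidate["token_str"], round(candidate["score"], 3))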
-
Using Tensorboard efficiently in AzureML
Begin logging stats to tensorboard from your training scripts by following this AzureML documentation.
-
Using Tensorboard in Pytorch
Clear everything first
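For context, a minimal TensorBoard logging loop in PyTorch looks roughly like this (the log directory and values are illustrative):
# Sketch: log a scalar to TensorBoard from PyTorch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")
for step in range(100):
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.close()
# then view with: tensorboard --logdir runs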
-
Using Tensorboard in Tensorflow-Keras (windows version)
# https://www.tensorflow.org/install/pip
# !pip install tensorboard
# !pip install tensorflow-cpu
-
PowerShell bashrc equivalent
Linux has a file called .bashrc which gets executed whenever a new terminal starts. This .bashrc file is generally used for
-
Resize image using Python
Resize an image in Python using the Pillow library.
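The core of it, as a sketch (file names and target size are made up):
# Sketch: resize an image with Pillow
from PIL import Image

img = Image.open("input.jpg")
resized = img.resize((224, 224))   # (width, height) in pixels
resized.save("output.jpg")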
-
Issue with Gradient accumulation while using apex (Fix included)
- Nvidia Apex is used for mixed precision training. Mixed precision training provides faster computation using tensor cores and a lower memory footprint.
- Gradient accumulation is used to accommodate a bigger batch size than what the GPU memory supports. If my gradient accumulation is 2, I will be doing optimizer.step() once every 2 steps. For steps where the optimizer is not stepping, only the gradients are accumulated (see the sketch after this list).
- In distributed training, gradients are averaged across all the processes at every loss.backward step, which is also called the all-reduce step.
- Apex mixed precision training does the communication in floating point 16.
- Even with floating point 16, doing the reduction at every step can be costly. To avoid reduction at every step, an obvious optimization is to skip the reduction when the optimizer is not stepping.
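Gradient accumulation itself, stripped of Apex and distributed training, looks roughly like this (shapes and values are illustrative):
# Sketch: plain-PyTorch gradient accumulation with accumulation factor 2
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 2

for step in range(8):
    x, y = torch.randn(4, 10), torch.randn(4, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()       # gradients accumulate in .grad across steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                         # weights update only every N steps
        optimizer.zero_grad()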
-
NVIDIA mixed precision training
Amp: Automatic Mixed Precision
-
Undo a git rebase
Suppose you did a git rebase in your local branch but mistakenly rebased onto an older branch and pushed the changes to remote. Here is how to revert your changes and go back to the previous state.
-
Challenges of using HDInsight for pyspark
The goal was to do analysis on the following dataset using Spark without downloading large files to the local machine.
-
Insertion transformer summary
-
Spark Quickstart on Windows 10 Machine
Apache Spark™ is a unified analytics engine for large-scale data processing.
-
PyTorch distributed communication - Multi node
Writing Distributed Applications with PyTorch
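As a taste of the topic, a minimal all-reduce across processes can be sketched like this (assumes the rendezvous environment variables are set by a launcher such as torchrun):
# Sketch: torch.distributed all-reduce across ranks
import torch
import torch.distributed as dist

def run():
    dist.init_process_group(backend="gloo")        # use "nccl" for multi-GPU nodes
    rank = dist.get_rank()
    tensor = torch.ones(1) * rank
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)  # every rank ends up with the sum over all ranks
    print(f"rank {rank}: {tensor.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    run()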
-
Bert Attention Visualization
#!pip install pytorch_transformers
#!pip install seaborn
import torch
from pytorch_transformers import BertConfig, BertTokenizer, BertModel
-
How to create a new docker image
Steps to create, test and push a docker image
-
LAMB paper summary
-
Bert Memory Consumption
This document analyses the memory usage of Bert Base and Bert Large for different sequence lengths. Additionally, the document provides memory usage without grad and finds that gradients consume most of the GPU memory for one Bert forward pass. It also analyses the maximum batch size that can be accommodated for both Bert base and large. All the tests were conducted on Azure NC24sv3 machines.
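The kind of measurement involved can be sketched as follows (the model name, batch size, and sequence length here are placeholders):
# Sketch: peak GPU memory for one BERT forward pass
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased").cuda()
inputs = torch.randint(0, 30000, (8, 512), device="cuda")   # batch of 8, sequence length 512

torch.cuda.reset_peak_memory_stats()
outputs = model(inputs)                                      # keeps grad buffers; wrap in torch.no_grad() to compare
print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")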
-
Introduction to Transformers
What are Transformers?
-
Contingency table and Chi-squared distribution
Contingency table
Contingency tables and chi-squared distributions are used to determine whether two categorical variables are independent or not.
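A quick worked example of the test (the counts below are invented for illustration):
# Sketch: chi-squared test of independence on a 2x2 contingency table
from scipy.stats import chi2_contingency

table = [[30, 10],    # e.g. group A: category 1 vs category 2
         [20, 40]]    # e.g. group B: category 1 vs category 2

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)
# a small p-value suggests the two categorical variables are not independent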
-
Azure Machine Learning Tutorial
Original Documentation Video link: https://channel9.msdn.com/Events/Connect/Microsoft-Connect–2018/D240/
-
PyTorch IsRead Predictor on my email
This is a GRU-based RNN classifier that predicts the probability of a user reading an email, trained on his/her email data.
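A bare-bones version of such a model might look like this (the vocabulary size, dimensions, and inputs are hypothetical, not the post's model):
# Sketch: GRU classifier producing a read probability per email
import torch
import torch.nn as nn

class IsReadClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        _, hidden = self.gru(embedded)              # hidden: (num_layers, batch, hidden_dim)
        return torch.sigmoid(self.fc(hidden[-1]))   # read probability per email

model = IsReadClassifier()
print(model(torch.randint(0, 10000, (2, 30))))      # two emails of 30 tokens each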
-
PyTorch BERT
#! pip install pytorch-pretrained-bert
Using BERT
-
PyTorch RNN
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle.
-
Word analogy using Glove Embeddings
Word Embeddings
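The classic analogy query can be sketched with pre-trained GloVe vectors via gensim (the model name and download are assumptions, not the post's code):
# Sketch: king - man + woman ~= queen with GloVe vectors
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")   # downloads the vectors on first use
result = glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # "queen" is expected near the top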
-
Convolution Explained
Convolution
Convolution is the building block of Convolutional Neural Networks (CNNs). CNNs are used both for image and text processing. Online diagrams do a great job explaining CNNs; I, however, failed to find a good diagram explaining the convolution operation itself. This diagram aims to explain the details of the convolution operation in a neural network. I have also provided Python scripts explaining the details of the convolution operation inside PyTorch.
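For a concrete feel of the operation, here is a tiny 2D convolution in PyTorch (the input and kernel values are illustrative, not the post's script):
# Sketch: a single 2x2 kernel convolved over a 4x4 input
import torch
import torch.nn.functional as F

image = torch.arange(16.0).reshape(1, 1, 4, 4)    # batch=1, channels=1, 4x4 input
kernel = torch.tensor([[[[0.0, 1.0],
                         [2.0, 3.0]]]])           # one 2x2 filter
output = F.conv2d(image, kernel, stride=1, padding=0)
print(output.shape)   # torch.Size([1, 1, 3, 3]): (4 - 2)/1 + 1 = 3 per spatial dimension
print(output)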
-
Activation and Loss function implementations
Deep learning Functions
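As an example of the kind of functions covered, a few standard activations implemented with NumPy (textbook formulas, not the post's exact code):
# Sketch: common activation functions in NumPy
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    shifted = x - np.max(x)        # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

print(sigmoid(np.array([-1.0, 0.0, 1.0])))
print(softmax(np.array([1.0, 2.0, 3.0])))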
-
How to use Jekyll!
-
Quickstart
- Install a full Ruby development environment
-
Install Jekyll and bundler gems
gem install jekyll bundler
bundle add jekyll-sitemap
bundle install
-
Create a new Jekyll site at ./myblog
jekyll new myblog
-
Change into your new directory
cd myblog
-
Build the site and make it available on a local server
bundle exec jekyll serve
-
Python Flask web application in azure linux
Even though the tutorial involves Azure, the instructions will work on any Ubuntu-based Linux machine.