Kanishk Jain


I am a Physics PhD candidate in the Berman lab at Emory University, studying the Physics and Neuroscience of animal behavior using Machine Learning.

Open to work starting May 2023!


Open-source deep learning tools for Markerless Pose Estimation

Example postural tracking on a rat dataset using DeepLabCut. The video dataset on the left was recorded by my brilliant colleagues Elena Menichini and Tomaso Muzzu at the Saleem lab, UCL London.



Markerless pose estimation of animals (particularly rodents) in unrestricted environments has played a key role in my doctoral research. Markerless is important here, since it means no piercings, IR markers, or odors on the body or skeleton that might impede or affect the natural behavior of the animal. Having access to a reliable and remarkably accurate way of extracting pose from videos has opened many new avenues of experimental design in the Neuroscience and Physics of behavior that were previously closed. There are now quite a few such network architectures and toolboxes, and it is becoming increasingly hard to stay on top of all these cool methods. So here I keep a note of such methods for my own future reference, and maybe you will find a new tool to add to your arsenal as well.

I will do my best to update this post as I come across any new additions.

Oh wait - I almost forgot one of the most important aspects of these tools. They are open-source software, free to use (yay!) and licensed with minimal restrictions. Go open research!

So here goes a list (in no specific order):

  1. DeepLabCut (or DLC)
    DeepLabCut features a ResNet (an overview here) backbone with a few convolutional layers added on top to produce heatmaps. It uses transfer learning, starting from pre-trained weights, to ease the training process, and does surprisingly well with a training dataset of just a few hundred frames. Since it first came out there have been quite a few significant updates, including nice GUIs for labelling and training, 3D triangulation from 2-camera recordings, and recent support for multi-animal tracking.

  2. sLEAP
    Social LEAP (sLEAP) tackles social (aka multi-animal) tracking with two approaches - a top-down approach, which finds individual animals first and then tracks body parts on each animal (in cropped single-animal images), and a bottom-up approach, which finds all instances of body parts in the image and groups them into individuals. The authors present their approach as backbone-agnostic and show examples with ResNet and UNet architectures as the backbone.

  3. DeepPoseKit
    DPK uses a stacked encoder-decoder-like model for tracking and provides a nice, slightly customized implementation of the DLC and LEAP (the precursor of sLEAP, with a lightweight but capable network) architectures, employing connections between body parts in the heatmaps at the output layers. It also allows a tradeoff between network speed and accuracy by cleverly allowing the heatmap dimension to be tuned.
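
All three networks above output one heatmap per body part, so it may help to see how a heatmap becomes a coordinate. Below is a minimal sketch, not the actual decoding code of any of these toolboxes: the function name `decode_heatmaps` and the `stride` parameter (the ratio of image size to heatmap size) are my own assumptions for illustration. It simply takes the peak of each heatmap and scales it back to image pixels.

```python
import numpy as np

def decode_heatmaps(heatmaps, stride=1.0):
    """Return one (x, y, confidence) row per body part.

    heatmaps: array of shape (n_parts, height, width), one map per body part.
    stride:   hypothetical image-to-heatmap scale factor (assumption).
    """
    heatmaps = np.asarray(heatmaps, dtype=float)
    n_parts, height, width = heatmaps.shape

    # Flatten each map so a single argmax finds the peak of every body part.
    flat = heatmaps.reshape(n_parts, -1)
    peak_index = flat.argmax(axis=1)
    rows, cols = np.unravel_index(peak_index, (height, width))

    # The peak value serves as a crude confidence score.
    confidence = flat[np.arange(n_parts), peak_index]

    # Scale heatmap coordinates back to image pixels (x = column, y = row).
    return np.stack([cols * stride, rows * stride, confidence], axis=1)

# Toy example: two body parts on a 4 x 5 heatmap, image 8x larger than heatmap.
heatmaps = np.zeros((2, 4, 5))
heatmaps[0, 1, 3] = 0.9   # part 0 peaks at row 1, column 3
heatmaps[1, 2, 0] = 0.6   # part 1 peaks at row 2, column 0
keypoints = decode_heatmaps(heatmaps, stride=8)
print(keypoints)
```

A smaller heatmap means a larger stride and therefore coarser coordinates, which is one way to picture the speed versus accuracy tradeoff mentioned for DPK. Real toolboxes typically go beyond a plain argmax, for example with sub-pixel refinement.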

In addition to these networks, there are also numerous other toolkits that enhance the capabilities of the tracking toolkits above. In no particular order, these are -

  1. Anipose
    Anipose provides 3D tracking from keypoints detected in 2D images using DeepLabCut.

  2. Open Monkey Studio
    Provides a very nice implementation of 3D tracking of rhesus macaques using multi-camera systems, and is backbone-agnostic (DLC, DPK, LEAP).

  3. LiftPose3D
    LiftPose3D is a tool that predicts 3D skeletons from 2D tracked points - from a single camera! You do need some 3D tracking data to train the network first.

  4. OptiFlex
    OptiFlex uses optical flow to predict subsequent heatmaps from model outputs (DLC, DPK, LEAP), then applies a convolution that is essentially a weighted sum over a batch of consecutive flow-predicted heatmaps to generate the final heatmap for body part inference.

  5. DeepGraphPose
    DGP is built upon DLC and uses a probabilistic graphical model that subjects the heatmaps from DLC to spatial and temporal constraints using structured variational inference.
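
Several of the tools above turn 2D keypoints from multiple cameras into 3D points. As a rough illustration of the underlying geometry, here is a minimal sketch of linear (DLT) triangulation of a single keypoint from two calibrated views. This is a textbook method and not the code used by Anipose or any other toolbox listed here; the camera matrices, the helper names `triangulate_point` and `project`, and the test point are all made up for the example.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices.
    points_2d:   list of (x, y) pixel observations, one per camera.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, point_3d):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras sharing intrinsics K, offset by 0.2 units along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# Simulate a tracked body part seen by both cameras, then recover it in 3D.
true_point = np.array([0.1, -0.05, 2.0])
observations = [project(P1, true_point), project(P2, true_point)]
estimate = triangulate_point([P1, P2], observations)
print(estimate)
```

With noise-free observations the estimate matches the simulated point up to numerical precision. With real tracking output the 2D detections are noisy, so practical pipelines usually add camera calibration, filtering, or further optimization on top of a step like this.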
