One million molecules docked


I did my postdoc in a lab where docking one million molecules was easy. Although a select few now dock even billions of molecules (Ref1, Ref2), docking one million molecules is still a challenge for most people, especially on a low budget.

Docking of one molecule. Pose #4 (of the 10 poses) overlaps exactly with the crystallographic ligand (green), which is assumed to be the correct solution. In the experiment described here, this procedure was repeated for more than one million molecules.

To address this challenge, I used the VirtualFlow screening workflows and 108 CPUs of my Threadripper to dock 1.25 million molecules from the REAL library within 16 hours. Docking was done with QuickVina 2, using the settings described in this Nature paper from 2020 for a Stage-1 screen (exhaustiveness = 1). On average, each molecule took about 5 seconds on one CPU.
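The per-molecule figure can be checked with a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted above and assumes, as a simplification, that all 108 CPUs were busy for the full 16 hours (ramp-up and the tail of the run are ignored). The variable names are mine and are not part of VirtualFlow.

```python
# Rough throughput check for the run described above.
# Assumption: all 108 CPUs were fully busy for the whole 16 hours.
molecules = 1_250_000   # molecules docked
cpus = 108              # CPUs used for docking
wall_hours = 16         # wall-clock duration of the run

# Total compute spent, in CPU-seconds.
cpu_seconds = cpus * wall_hours * 3600

# Average cost of one molecule on one CPU.
seconds_per_molecule = cpu_seconds / molecules

# Overall throughput of the whole machine.
molecules_per_hour = molecules / wall_hours

print(f"{seconds_per_molecule:.2f} CPU-seconds per molecule")
print(f"{molecules_per_hour:,.0f} molecules per hour")
```

Under these assumptions the result is just under 5 CPU-seconds per molecule, which matches the average quoted above. The same three numbers can be swapped out to estimate how long a different library size or CPU count would take.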

16 hours for 1.25 million molecules will not impress experts. But for CADD Consulting, this is a great step forward. Up next: docking 1 million molecules in the cloud. How fast will it be, and what will it cost?

CPU usage of docking jobs
CPU usage on the Threadripper over the 16 hours. The maximum is about 85% because 108 of the 128 CPUs were used.

Since starting out as an independent CADD consultant, I have come to appreciate cost-effective software. The setup here used only freely available software. This comes at a price: I had to test the Slurm cluster software and do quite some debugging of the VirtualFlow scripts before I could dock one million molecules. This is the price you may pay for free software. What is your experience with free software?