Docking billions of molecules


The vast chemical space available for screening is expanding rapidly. Molecular docking is a powerful method to screen molecules in silico. In-house clusters often fail to scale effectively for this task, making cloud-based docking an attractive alternative.[1]

Advancements in Large-Scale Docking

In 2020, the open-source drug discovery platform VirtualFlow made ultra-large docking screens practical.[2] Building on this, VirtualFlow 2.0 (VF) was introduced in 2023, enabling adaptive virtual screens of 69 billion molecules.[3] CADD Consulting GmbH tested VirtualFlow screens[4] against the ‘ready-to-dock’ 69 billion molecule Enamine library.[5] Our evaluation focused on setup, user-friendliness, docking speed, and cost-effectiveness of VirtualFlow 2.0.

Methods and Results from VirtualFlow 2.0 Evaluation

The findings of this evaluation were first presented as a poster at the European User Group Meeting (UGM) of the Chemical Computing Group (CCG) in Lyon in May 2024.

Feel free to download and redistribute the poster below. (663 KB, CC-BY-4.0 license)

We docked between 1.0 and 1.4 million molecules from the SPARSE10 REAL library and from fully enumerated Enamine REAL libraries using Qvina2 with exhaustiveness=1. The targets for this study were glucokinase and CSF1R.

Since VF is optimized for AWS, we ran the screens there. The docking speed observed with Qvina2 on AWS was moderate, at 6.5 seconds per molecule per vCPU. This translated to compute costs of $30 to $40 per 1 million molecules.
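The relation between docking speed and cost can be checked with a short calculation. The sketch below converts 6.5 seconds per molecule per vCPU into vCPU-hours per million molecules and derives the effective price per vCPU-hour implied by the $30 to $40 range. The implied price is a back-calculation, not a quoted AWS rate, and the 1,000-vCPU fleet size is a hypothetical value chosen only to illustrate wall-clock time.

```python
SECONDS_PER_MOLECULE_PER_VCPU = 6.5   # observed docking speed (from the text)
MOLECULES = 1_000_000
COST_RANGE_USD = (30.0, 40.0)         # observed cost per million molecules


def vcpu_hours(n_molecules, seconds_per_molecule=SECONDS_PER_MOLECULE_PER_VCPU):
    """Total compute in vCPU-hours, assuming perfectly parallel docking."""
    return n_molecules * seconds_per_molecule / 3600.0


def implied_price_per_vcpu_hour(cost_usd, n_molecules):
    """Effective price per vCPU-hour implied by a total cost."""
    return cost_usd / vcpu_hours(n_molecules)


def wall_clock_hours(n_molecules, n_vcpus):
    """Idealized wall-clock time; ignores start-up and I/O overhead."""
    return vcpu_hours(n_molecules) / n_vcpus


hours = vcpu_hours(MOLECULES)
low, high = (implied_price_per_vcpu_hour(c, MOLECULES) for c in COST_RANGE_USD)

print(f"{hours:,.0f} vCPU-hours per million molecules")
print(f"implied price: ${low:.4f} to ${high:.4f} per vCPU-hour")
print(f"wall clock on 1,000 vCPUs (hypothetical): {wall_clock_hours(MOLECULES, 1000):.1f} h")
```

One million molecules therefore correspond to roughly 1,800 vCPU-hours, so the observed cost implies an effective rate of about 2 cents per vCPU-hour.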

Notably, the top 500 molecules from the CSF1R primary screen featured one cluster of diazanaphthalen-ones, which are predicted to form key hydrogen bonds to the kinase hinge. This scaffold is an interesting variation on the well-known quinazolinones and merits further testing.

Discussion and Outlook: The Impact of ATG-VS

VirtualFlow stands out in three ways: it is open source, it can be run using free software, and it offers the largest available library, 69 billion molecules as of today. Its implementation is a great example of open-source software engineering, with comprehensive documentation. That said, setting up the system requires substantial Linux expertise and a proactive approach to problem-solving.

Adaptive Target-Guided Virtual Screening (ATG-VS) is a cost-efficient method for screening the 69 billion molecule space. The preprint[3] does not yet include biological validation experiments, which will be crucial for confirming the efficacy of this approach.

At a docking cost of $30 to $40 per 1 million molecules using AWS and Qvina2, this method is significantly more economical than comparable screening processes. We anticipate that this will shorten the time needed to identify promising hits and, consequently, the time-to-market compared to current technologies.
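To put the per-million cost in perspective, the sketch below extrapolates it to an exhaustive, non-adaptive screen of all 69 billion molecules. It assumes cost and compute scale linearly with library size and ignores storage and data transfer. This is a back-of-the-envelope upper bound that shows why an adaptive strategy matters. It is not an estimate of what an ATG-VS run costs, since the fraction of the library that ATG-VS actually docks is not given here.

```python
LIBRARY_SIZE = 69_000_000_000          # molecules in the full library (from the text)
COST_PER_MILLION_USD = (30.0, 40.0)    # observed docking cost range (from the text)
SECONDS_PER_MOLECULE_PER_VCPU = 6.5    # observed docking speed (from the text)


def brute_force_cost(n_molecules, cost_per_million):
    """Cost of docking every molecule, assuming linear scaling."""
    return n_molecules / 1_000_000 * cost_per_million


def brute_force_vcpu_hours(n_molecules, seconds_per_molecule):
    """Compute needed to dock every molecule, in vCPU-hours."""
    return n_molecules * seconds_per_molecule / 3600.0


cost_low, cost_high = (brute_force_cost(LIBRARY_SIZE, c) for c in COST_PER_MILLION_USD)
total_hours = brute_force_vcpu_hours(LIBRARY_SIZE, SECONDS_PER_MOLECULE_PER_VCPU)

print(f"exhaustive screen: ${cost_low:,.0f} to ${cost_high:,.0f}")
print(f"compute: {total_hours:,.0f} vCPU-hours")
```

Under these assumptions, docking the entire library would cost on the order of $2 million to $3 million, so any method that reaches comparable hits while docking only a small fraction of the space reduces cost roughly in proportion to that fraction.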

References

[1] Tingle BI and Irwin JJ (2023). JCIM. Large-Scale Docking in the Cloud. DOI.
[2] Gorgulla C. et al. (2020). Nature. An open-source drug discovery platform enables ultra-large virtual screens. DOI.
[3] Gorgulla C. et al. (2023). bioRxiv [Preprint]. VirtualFlow 2.0 – The Next Generation Drug Discovery Platform Enabling Adaptive Screens of 69 Billion Molecules. DOI.
[4] GitHub repository for VirtualFlow: https://github.com/VirtualFlow/
[5] VirtualFlow website: https://virtual-flow.org/