QUILT-1M is a new histopathology dataset that addresses the shortage of comprehensive data by drawing on the wealth of educational content available on YouTube. It consists of 1 million image-text samples, making it the largest vision-language dataset for histopathology.
Dataset scarcity has long held back histopathology research. QUILT-1M addresses this with rich textual descriptions drawn from educational videos, and it pairs each image with multiple sentences, offering diverse perspectives on the same image.
To curate the dataset, the research team combined models, algorithms, and human knowledge databases. They then expanded it with data from Twitter, research papers, and PubMed, and evaluated its quality using various metrics.
QUILTNET, the model trained on QUILT-1M, outperforms existing models such as BiomedCLIP on a range of zero-shot tasks across different sub-pathologies. The results highlight the potential of QUILT-1M to benefit both computer scientists and histopathologists.
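For readers unfamiliar with zero-shot evaluation, here is a minimal sketch of how a CLIP-style vision-language model is typically used to classify an image without task-specific training: the image embedding is compared against the embeddings of one text prompt per class, and the most similar prompt wins. This is an illustration only, not QUILTNET's code. The embeddings, the prompt wording, and the temperature value are all made-up assumptions; in practice the vectors would come from the model's image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, prompt_embs, temperature=0.07):
    """Return a probability per class prompt via softmax over scaled similarities.

    The temperature is an assumed value chosen for illustration.
    """
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[label]) / temperature for label in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

# Hypothetical toy embeddings standing in for real encoder outputs.
image_emb = [0.9, 0.1, 0.2]
prompt_embs = {
    "a histopathology image of tumor tissue": [1.0, 0.0, 0.1],
    "a histopathology image of normal tissue": [0.0, 1.0, 0.1],
}

probs = zero_shot_classify(image_emb, prompt_embs)
prediction = max(probs, key=probs.get)
print(prediction)
```

Because the classes are defined only by text prompts, the same model can be evaluated on a new sub-pathology simply by swapping in a different set of prompts.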
Overall, QUILT-1M is a significant advancement in histopathology, providing a large, diverse, and high-quality dataset. It opens up new possibilities for research and the development of more effective histopathology models.
About the Author
Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She has a keen interest in the field of AI and ML.