
2024 FARR Workshop Brings Together Experts for Innovation at Intersection of AI and Data

Written by Kimberly Mann Bruch, SDSC Communications


Funded by the U.S. National Science Foundation, the FARR Research Coordination Network (RCN) focuses on three themes: the intersection of the FAIR Principles and Machine Learning, AI Readiness, and AI Reproducibility. The network held a workshop on October 9-10 in Washington, D.C., bringing together interdisciplinary researchers from academia and government to address the community's challenges and advancements.


"The scientific community has scarcely figured out how to implement the FAIR principles, which strive to make research objects such as data, software and machine learning models findable, accessible, interoperable and reusable. Now we layer on the new challenges of promoting 'Open Science' in the age of deep learning: generative AI and LLMs," said FARR-RCN Principal Investigator Christine Kirkpatrick, who also directs the Research Data Services Division at the School of Computing, Information and Data Sciences' San Diego Supercomputer Center (SDSC) at UC San Diego. "Our workshop emphasized gaps such as a lack of FAIR implementation advice for these topics, the myriad definitions of AI reproducibility, which are sometimes in conflict with one another, and gaps in AI readiness across domains like geoscience and biomedical research. The community was tremendously optimistic about AI readiness driving better metadata and data curation, and about the momentum from the computer science community to influence other domains through its requirement of AI reproducibility checklists for paper submissions at premier AI conferences."


Kirkpatrick said that key insights included discussions on improving AI reproducibility, FAIR data practices and leveraging large language models (LLMs) to increase the FAIRness of domain-specific applications. Themes that emerged across the workshop discussions included the need for automation in dataset preparation, better standards for metadata and the role of private AI models tailored to specific contexts.
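To make the metadata point concrete, the short Python sketch below shows the kind of machine-actionable record such standards aim for, using schema.org's Dataset vocabulary in JSON-LD. The dataset name, DOI and field values are hypothetical illustrations, not outputs of the workshop.

    import json

    # A minimal machine-actionable dataset description using schema.org's
    # Dataset type in JSON-LD, one common building block for making data
    # findable and interoperable. All values below are hypothetical.
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example streamflow observations",        # hypothetical
        "description": "Daily streamflow measurements curated for model training.",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "identifier": "https://doi.org/10.0000/example",  # hypothetical DOI
        "keywords": ["hydrology", "machine learning", "AI-ready"],
        "variableMeasured": "streamflow (m^3/s)",
    }

    # Serialized JSON-LD can be embedded in a dataset landing page so that
    # search engines and harvesting pipelines can index the record.
    print(json.dumps(record, indent=2))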


One of the liveliest sessions was on AI reproducibility, where experts highlighted three dozen sources of irreproducibility, such as hyperparameter optimization and random weight initialization, as well as methods to improve reproducibility through better tools and community outreach. Another key discussion focused on the challenges of making data AI-ready from the perspective of repository providers. Aside from implementation advice, presenters pondered how to prioritize innovations, especially for smaller or under-resourced repositories.
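As a small illustration of the randomness problem, a common first mitigation is to pin every random seed before training. The sketch below assumes a PyTorch workflow (PyTorch and NumPy are illustrative choices, not tools named by the workshop); it narrows, but does not eliminate, run-to-run variation.

    import random

    import numpy as np
    import torch

    def set_seed(seed: int = 42) -> None:
        # Pin the seeds that drive random weight initialization and data
        # shuffling, two of the irreproducibility sources noted above.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Prefer deterministic kernels; some GPU operations have no
        # deterministic implementation, so warn instead of failing.
        torch.use_deterministic_algorithms(True, warn_only=True)

    set_seed(42)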


Participants heard from each of the AI Readiness cohort projects, including Sanjib Sharma of Howard University. With a mini-grant from FARR, Sharma and his collaborator, Yogesh Bhattarai, incorporated generative AI and LLMs into an undergraduate Earth Sciences course. Another mini-grant recipient, Denys Godwin from Clark University, discussed a project to map rooftop solar across New England using a combination of AI models. The project's output will eventually be incorporated into a rooftop solar registry to aid other researchers currently struggling with inaccurate AI assessments drawn from satellite imagery.


The sessions also underscored the growing importance of interdisciplinary collaboration, with the AI research community recognizing gaps in standards and best practices across fields. The need for sustained effort in FAIR and AI readiness was made clear, and participants voiced the need for increased funding, training and community-driven solutions. In summary, the workshop set the stage for future research directions, prioritizing AI reproducibility, AI-ready data and effective community engagement toward a more integrated, reproducible and FAIR-aligned AI ecosystem.


For more details on the workshop, refer to the FARR Workshop 2024 website.


