Implementing best practices in bioinformatics: issues of data and computational tool access that will affect data science in the near future

David Molik

Scientific Informatics Developer, Bioinformatics Shared Resource, Cold Spring Harbor Laboratory

Bioinformatics research depends on the computational analysis tools, software libraries, and data libraries that scientists create. Publishing a tool or library serves a dual purpose: it establishes the scientific validity of the software and announces its availability. Publication often falls short in practice: software dependencies may not be documented or automatically included, data may not be reproducible because the methods and software used to create it are not included, and code may go unmaintained while the software environments around it change. Inevitably, the usability and scientific validity of code and data affect their use and citation. Introducing version control, modularized code, and other Best Practices for Scientific Computing can provide benefits such as reproducibility and ease of use. However, adoption of these practices can be slow to nonexistent. Experience in contending with attitudes, designing solutions, and teaching provides the footing from which the benefits of these best practices can be reached. We review activities and experiences aimed at teaching best practices and at improving the analytic abilities of users. We present centralized software repositories, examples of documentation, education, and software used to analyze data in bioinformatics.