Open Call project, 2021: Neuroscience data processing and analysis

Total developer time: 4 months

Contact person: Dr. Rebecca Mease, Institute of Physiology and Pathophysiology

Outline

Voltage recordings from in vivo mouse brain experiments produce terabytes of raw data, which then require post-processing and a variety of further analyses. There are three main bottlenecks in the existing analysis setup that limit the possible size and length of experiments:

  • Processing/analysis software constraints limit the number of probes and the density of experimental data that can be used
  • RAM constraints limit the length of experiments
  • Spike sorting is done manually, which is time-consuming and does not scale well to larger experiments

The goal of the project is to overcome these limitations by improving the processing scripts, transferring the analysis pipeline to HPC resources, using standardized data formats and tools, and replacing the manual curation step with an automated comparison of multiple spike sorting algorithms, as sketched below.
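
A sketch of what the automated alternative to manual curation can look like with spikeinterface's comparison module: run several sorters on the same recording and keep only the units they agree on. The recording loader, sorter names, and folder arguments below are illustrative assumptions, not the project's exact configuration.

```python
# Minimal sketch: agreement-based curation across several spike sorters.
# The recording loader, sorter names, and output folders are assumptions.
import spikeinterface.extractors as se
import spikeinterface.sorters as ss
import spikeinterface.comparison as sc

recording = se.read_spikeglx("path/to/recording")  # placeholder recording

# run each sorter through spikeinterface's unified interface
sortings = [
    ss.run_sorter(name, recording, output_folder=f"out_{name}")
    for name in ("ironclust", "tridesclous", "spykingcircus")
]

# compare the outputs and keep units found by at least two sorters
comparison = sc.compare_multiple_sorters(
    sorting_list=sortings, name_list=["IC", "TDC", "SC"]
)
agreement = comparison.get_agreement_sorting(minimum_agreement_count=2)
print(agreement.get_unit_ids())
```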

SSC Role

  • Improve/expand the existing Python implementation for data preprocessing & standardization to Neurodata Without Borders (NWB); a minimal NWB sketch follows this list.
  • Guide a student in optimizing Generalized Linear Modeling (GLM) code for batch processing (see the batching sketch after this list).
  • Robustify the existing Generalized Linear Model Cross-Correlation (GLMCC) connectivity algorithm.
  • Transfer knowledge of sustainable development practices & version control to our group, including assistance/mentorship in making existing analysis scripts more robust and compatible with NWB.
  • Time permitting: Develop a Python implementation of the Demixed Principal Component Analysis (dPCA) algorithm that runs on large datasets.
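
For the first item above, a minimal sketch of what standardization to NWB looks like with pynwb; the session details, data shapes, and sampling rate are placeholders.

```python
# Minimal pynwb sketch: wrap raw voltage data in an NWB file.
from datetime import datetime
from uuid import uuid4

import numpy as np
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

nwbfile = NWBFile(
    session_description="example in vivo recording session",  # placeholder
    identifier=str(uuid4()),
    session_start_time=datetime.now().astimezone(),
)

# raw voltage traces: time samples x channels (placeholder data)
data = np.random.randn(30_000, 32)
nwbfile.add_acquisition(
    TimeSeries(name="voltage", data=data, unit="volts", rate=30_000.0)
)

with NWBHDF5IO("session.nwb", "w") as io:
    io.write(nwbfile)
```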
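
For the GLM batch-processing item, a common pattern is to treat each neuron's fit as an independent job and distribute the fits over CPU cores; the data shapes, the Poisson family, and the use of statsmodels and joblib below are illustrative assumptions rather than the actual project code.

```python
# Sketch: batch many independent per-neuron Poisson GLM fits in parallel.
# Data shapes and libraries (statsmodels, joblib) are assumptions.
import numpy as np
import statsmodels.api as sm
from joblib import Parallel, delayed

rng = np.random.default_rng(0)
design = sm.add_constant(rng.normal(size=(5000, 10)))  # time bins x covariates
counts = rng.poisson(1.0, size=(5000, 64))             # time bins x neurons

def fit_neuron(y):
    # one Poisson GLM per neuron; returns the fitted coefficients
    return sm.GLM(y, design, family=sm.families.Poisson()).fit().params

# independent fits distributed over all available CPU cores
params = Parallel(n_jobs=-1)(
    delayed(fit_neuron)(counts[:, i]) for i in range(counts.shape[1])
)
```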

Results

  • Reproducible HPC software installation
    • Installation scripts
      • Scripts to reproduce full installation on HPC
    • Documentation
      • How to register for and use HPC
    • Example scripts
      • Example analysis scripts with HPC job submit scripts
    • setup-jupyter
      • Command line tool for creating and interacting with Jupyter notebook jobs on HPC
    • Added phy to bwVisu
      • Worked with URZ to develop a Singularity container to deploy phy to bwVisu (a remote GUI HPC service)
  • Enhanced processing pipeline
    • mease-elabftw
      • Python library for interacting with eLabFTW and extracting NWB metadata from experiments (see the usage sketch at the end of this section)
    • Extended mease-lab-to-nwb
      • Spike sorting and processing pipeline
      • Added tests and created a repository of test data
      • Fixed bugs, improved performance, and migrated to the new spikeinterface API (see the pipeline sketch at the end of this section)
      • Added laser stimulus data to the output
      • Added integration with mease-elabftw to attach NWB metadata from the eLabFTW experiment to the output
    • Improved performance of the closed-source library sonpy
      • Created pybind11-numpy-example to motivate & explain the improvement
  • Improved performance of existing analysis/modelling codes using CPU and GPU resources, and ported them to HPC
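
As referenced in the mease-elabftw items above, a hypothetical usage sketch for the library; the function name `get_nwb_metadata`, the token handling, and the experiment ID are assumptions rather than confirmed API.

```python
# Hypothetical mease-elabftw usage: fetch NWB-ready metadata for one
# eLabFTW experiment. The function name, env-var token handling, and
# experiment ID are assumptions, not the library's confirmed API.
import mease_elabftw

# assumes an eLabFTW access token is configured, e.g. via ELABFTW_TOKEN
metadata = mease_elabftw.get_nwb_metadata(experiment_id=123)

# the returned dict could then seed pynwb's NWBFile constructor
print(metadata["NWBFile"]["session_description"])
```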
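
And a minimal sketch of the kind of preprocessing-plus-sorting chain the migrated mease-lab-to-nwb pipeline runs on the new spikeinterface API; the CED loader, filter settings, sorter choice, and folder argument are illustrative assumptions.

```python
# Sketch of a spikeinterface (>=0.90 API) preprocessing/sorting chain.
# The CED loader, filter settings, and sorter choice are assumptions.
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss

recording = se.read_ced("session.smrx")  # CED file, read via neo

# preprocessing is lazy: filters run chunk by chunk at sorting time,
# which keeps RAM usage bounded for long recordings
recording = spre.bandpass_filter(recording, freq_min=300, freq_max=6000)
recording = spre.common_reference(recording, operator="median")

sorting = ss.run_sorter("ironclust", recording, output_folder="ironclust_out")
```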