Books

Parallelization of Queries on Genomic Data

Apr 5, 2015 - LAP Lambert Academic Publishing

Recent progress in bioinformatics and especially high-throughput sequencing has enabled us to sequence and analyze genomes of many individuals, which can lead to improved diagnostics and treatment for patients suffering from genetic diseases. To achieve this, tools used in both clinical and research environments need to be enhanced to handle large amounts of data. This book analyzes application-level parallelization of database query processing by means of sharding as a technique for improving performance and scalability of an open-source search engine for genomic variants. We describe the challenges of designing and implementing a data access layer, the core of which is a general sharding framework. The approach allows the system to use multiple processors as well as multiple machines when querying the underlying data. This enables the system to scale in a near-linear fashion as more servers are added, with many queries achieving even superlinear speedup. This book should be useful to software engineers and scientists interested in an intriguing problem in the area of parallelization as well as anyone curious about what happens under the hood of modern genome analysis systems.


Journals

Federated discovery and sharing of genomic data using Beacons

Mar 4, 2019 - Nature Biotechnology

The Beacon Project (https://github.com/ga4gh-beacon/) is a Global Alliance for Genomics & Health (GA4GH) initiative that enables genomic and clinical data sharing across federated networks. The project is working toward developing regulatory, ethics and security guidance to ensure proportionate safeguards for distribution of data according to the GA4GH-developed “Framework for Responsible Sharing of Genomic and Health-Related Data”. Here we describe the Beacon protocol and how it can be used as a model for the federated discovery and sharing of genomic data.


Registered access: authorizing data access

Aug 2, 2018 - European Journal of Human Genetics

The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model—“registered access”—to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.


Theses

Beacon Network: A System for Global Genomic Data Sharing

Feb 15, 2016 - Master's thesis

Beacon is a genetic mutation sharing platform developed by the Global Alliance for Genomics and Health. It defines a web service that answers questions of the form "Do you have information about the following mutation?" and responds with either "yes" or "no", potentially along with additional information. This work presents the Beacon Network, a search engine across the world's public beacons, and the driving force behind the specification of the Beacon API. The system enables global discovery of genetic mutations, federated across a large and growing network of shared genetic datasets.
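
The yes/no question and its federation across several beacons can be illustrated with a small in-memory sketch. This is not the Beacon API or the Beacon Network implementation: the `Variant`, `Beacon` and `query_network` names, the fields of the query, and the coordinates are all hypothetical and chosen only for illustration. A real beacon would be a remote web service rather than a local object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified mutation query: a position plus alleles."""
    chromosome: str
    position: int
    reference: str
    alternate: str


@dataclass
class Beacon:
    """Toy in-memory beacon (assumption: a real one is a remote service)."""
    name: str
    known_variants: set = field(default_factory=set)

    def has_variant(self, variant: Variant) -> bool:
        # The core answer is just yes or no for the queried mutation.
        return variant in self.known_variants


def query_network(beacons, variant):
    """Ask every beacon the same question and collect answers by name."""
    return {beacon.name: beacon.has_variant(variant) for beacon in beacons}


# Made-up coordinates, used only to exercise the sketch.
example_variant = Variant("13", 1000, "G", "C")
other_variant = Variant("1", 2000, "A", "T")

network = [
    Beacon("beacon-a", {example_variant}),
    Beacon("beacon-b", {other_variant}),
    Beacon("beacon-c"),
]

answers = query_network(network, example_variant)
found_anywhere = any(answers.values())
print(answers, found_anywhere)
```

Keeping the per-beacon answers separate, rather than returning a single combined flag, lets a caller see which datasets report the mutation, which is the discovery use case the abstract describes.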


Parallelization of Query Processing in MedSavant

Feb 10, 2014 - Master's thesis

This work analyzes application-level parallelization of database query processing by means of sharding. The approach is driven by the need for better performance and scalability of MedSavant, an open-source search engine for genomic variants. The result of this thesis is a module for MedSavant providing a complete data access layer, the core of which is a general framework for application-level sharding. The approach allows the application to utilize multiple processors as well as machines when querying the underlying data. This enables the system to scale in a near-linear fashion as more servers are added, with many queries achieving even super-linear speedup.
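
As a rough illustration of application-level sharding, the sketch below splits rows across several shards, sends the same filter to every shard in parallel, and merges the partial results. It is not the MedSavant module: the modulo routing rule, the function names and the row layout are assumptions made for this example, plain Python lists stand in for separate databases, and threads stand in for concurrent calls to remote servers.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_for(key: int, shard_count: int) -> int:
    """Hypothetical routing rule: choose a shard from the row key."""
    return key % shard_count


def build_shards(rows, shard_count):
    """Distribute rows across shards; each list stands in for one database."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(row["id"], shard_count)].append(row)
    return shards


def query_shard(shard, predicate):
    """Run the filter on a single shard (a local scan in this sketch)."""
    return [row for row in shard if predicate(row)]


def sharded_query(shards, predicate):
    """Scatter the query to all shards in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(
            pool.map(lambda shard: query_shard(shard, predicate), shards)
        )
    merged = [row for part in partial_results for row in part]
    # Sort so the merged result does not depend on how rows were sharded.
    return sorted(merged, key=lambda row: row["id"])


# Made-up variant-like rows, used only to exercise the sketch.
rows = [
    {"id": i, "chromosome": "1" if i % 2 else "2", "position": 100 * i}
    for i in range(20)
]
shards = build_shards(rows, shard_count=4)
hits = sharded_query(
    shards, lambda row: row["chromosome"] == "1" and row["position"] > 500
)
print([row["id"] for row in hits])
```

Because every shard holds only a fraction of the rows, each per-shard scan is smaller than a scan of the full dataset, which is the property that lets a design like this benefit from additional processors or servers.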


Application for Education Plans Management on Enterprise Portal Platform

Jun 24, 2011 - Bachelor's thesis

The thesis describes basic principles, methods and standards related to the development of applications for enterprise portals. It analyses management of education plans of employees in an IT company and gives a description of the portal application we designed and implemented, which allows us to manage these individual education plans. The thesis was developed for the company IBA CZ as part of the Association of Industrial Partners of the Faculty of Informatics.