The complete 1000 Genomes Project is now available on Amazon Web Services (AWS) as a public data set. AWS and the US National Institutes of Health (NIH) made the announcement at the White House Big Data Summit. Amazon Web Services LLC is part of US-based online retailer Amazon.com.
The announcement makes the largest collection of human genetic data available to researchers worldwide, free of charge. The project is an international research effort, coordinated by a consortium of 75 companies and organisations, to establish a detailed catalogue of human genetic variation.
The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can now access on AWS for use in disease research. The 1000 Genomes Project aims to include the genomes of more than 2,600 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the public data set this year.
The NIH is part of the US Department of Health and Human Services, and serves as one of the data coordinators for the 1000 Genomes Project.
Public Data Sets on AWS provide a centralised repository of public data stored in Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Block Store (Amazon EBS). The data can be accessed directly from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), eliminating the need for organisations to move the data in-house and then procure enough technology infrastructure to analyse it effectively. AWS's highly scalable compute resources are reportedly being used to power big data and high-performance computing applications such as those found in science and research.
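As a rough illustration of what direct access to a public data set in Amazon S3 might look like, the sketch below lists a few objects in a bucket over the S3 REST interface using only the Python standard library. The bucket name `1000genomes` is an assumption (the announcement does not name it), as is the premise that the bucket allows anonymous listing; the helper names `build_list_url`, `parse_listing` and `list_objects` are hypothetical and chosen for this example only.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public data set lives in a bucket called "1000genomes".
# The article does not state the bucket name.
BUCKET = "1000genomes"

# XML namespace used by S3 bucket-listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned S3 ListObjectsV2 request URL for a bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Extract (key, size_in_bytes) pairs from a ListBucketResult document."""
    root = ET.fromstring(xml_data)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_objects(bucket=BUCKET, prefix="", max_keys=10):
    """Fetch and parse one page of a bucket listing (needs network access)."""
    url = build_list_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url) as response:
        return parse_listing(response.read())
```

Calling `list_objects()` from an EC2 instance, or from any machine with internet access, would return one page of object keys and sizes if the assumptions above hold. Real analyses would more likely use an AWS SDK or Amazon EMR than raw HTTP requests.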