Blazegraph, a provider of highly scalable software for running complex graph and machine learning algorithms, has announced version 2.1.0. The release brings significant updates that give users faster, easier access to key data sets, including new support for processing geospatial coordinates and optimized queries against the National Center for Biotechnology Information's (NCBI) PubChem database.
In addition, Blazegraph 2.1.0 delivers new tools that enable semantic search on even the largest data sets published as Linked Open Data, a format heavily used in global publishing, cultural and open government projects. To deliver the speed and performance needed to work with these massive data sets, version 2.1.0 includes significant improvements to its bulk load and query performance.
Blazegraph 2.1.0 users are already running complex SPARQL queries to quickly uncover new insights. For instance, Wikidata, the free knowledge base community, has deployed version 2.1.0 to power its query service. Data experts are using its geospatial capabilities to create visualizations such as a graph of shared state borders in the United States, a map of all earthquakes, and a map of chemical elements and their discovery locations.
Another Blazegraph user, Seven Bridges, is a biomedical data analysis company selected by the National Cancer Institute to develop the Cancer Genomics Cloud program. This first complete ecosystem gives cancer researchers immediate access to one of the world’s largest genomic data sets, The Cancer Genome Atlas (TCGA), and the computational resources to analyse it.
Businesses and governments are making their data accessible for complex analysis and deep learning. To utilize this data, researchers in a wide range of fields — including materials science, precision medicine, genomics and cyber security — need new tools to achieve insights and innovative results. Blazegraph 2.1.0 makes it even easier for businesses and researchers to leverage graph databases in these complex, data-intensive use cases with its exceptional combination of standards support and features for building graph applications at very large scale, up to 50 billion nodes on a single server.
Blazegraph 2.1.0 provides a new API for storing latitude and longitude coordinates directly within the database, so that users can integrate even the largest geospatial searches into their queries. This feature supports a wide range of capabilities, from proximity searches to more complex routing and topology analysis. It has been shown to deliver sub-second graph queries that geolocate mobile devices over billions of edges on the Amazon EC2 platform.
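To make the idea of a proximity search concrete, here is a minimal Python sketch of what such a search computes: given stored latitude and longitude pairs, return the entries within a given radius of a center point. It is not Blazegraph's API and says nothing about how Blazegraph indexes coordinates; the function names, the sample places and the use of the haversine great-circle formula are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, radius_km):
    """Return the names of all points no farther than radius_km from center.

    points: dict mapping a name to a (lat, lon) tuple in degrees.
    This is a linear scan; a real database would use a spatial index instead.
    """
    clat, clon = center
    return [
        name
        for name, (lat, lon) in points.items()
        if haversine_km(clat, clon, lat, lon) <= radius_km
    ]


# Hypothetical sample data, coordinates rounded.
PLACES = {
    "Berlin": (52.52, 13.405),
    "Potsdam": (52.39, 13.065),
    "Munich": (48.137, 11.575),
}

if __name__ == "__main__":
    # Places within 50 km of central Berlin.
    print(within_radius(PLACES, (52.52, 13.405), 50))
```

A linear scan like this does not scale to billions of edges, which is why a database-level geospatial index matters for the use cases described above.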
PubChem is a public repository for information on millions of chemical substances and their biological activities. Consisting of three interlinked databases (substance, compound and bioassay), PubChem is a critical resource for life science and materials science applications.
Blazegraph 2.1.0 includes a pre-configured integration with the PubChem vocabulary, enabling researchers to download the PubChem core data set into an Amazon EC2 instance with minimal set-up and configuration. They can search billions of chemical structures and combine that data with other information to research interactions, develop new compounds, and build new applications.
More governments and corporations are using the W3C standards known as Linked Data (LD) to overcome the challenges of sharing data and making it transparent and readily searchable. Widely used in applications such as open government, publishing and global heritage projects, this data is messy, disconnected and subject to unexpected structural updates. Data scientists need tools to connect and query data from many different and shifting sources. However, today's tools are not capable of scaling to handle the vast quantities of information available in LD projects. Blazegraph 2.1.0, with integrated support for emerging data indexing and interchange standards such as JSON-LD and Linked Data Fragments (LDF), will become the platform of choice for exploring and analyzing this data.
Also included in this release is the Blazegraph-based TPF Server, an LDF server that provides a Triple Pattern Fragment (TPF) interface using Blazegraph Database as the backend. Developed in partnership with Dr. Olaf Hartig, a researcher at the Hasso Plattner Institute, the TPF server fills the gap in delivering open data at true web scale, and will be critical as LD becomes more widely used.
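For readers unfamiliar with the interface, a Triple Pattern Fragment request amounts to a single triple pattern in which subject, predicate and object may each be fixed or left open, and the server answers with one page of matching triples plus metadata such as a count of matches. The Python sketch below illustrates that idea over an in-memory list. It is not the Blazegraph-based TPF Server or its API: the `Triple` type, the `tpf_page` function, the field names in the result and the sample data are hypothetical, and a real server would also return hypermedia controls and typically an estimated rather than exact count.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def tpf_page(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
    page: int = 1,
    page_size: int = 2,
) -> dict:
    """Return one page of triples matching the pattern (s, p, o).

    A component set to None acts as a wildcard. Pages are numbered from 1.
    """
    matches = [
        t
        for t in triples
        if (s is None or t.s == s)
        and (p is None or t.p == p)
        and (o is None or t.o == o)
    ]
    start = (page - 1) * page_size
    return {
        "total": len(matches),  # exact here; real servers may only estimate
        "page": page,
        "triples": matches[start : start + page_size],
        "has_next": start + page_size < len(matches),
    }


# Hypothetical sample data.
DATA = [
    Triple("ex:Ohio", "ex:borders", "ex:Indiana"),
    Triple("ex:Ohio", "ex:borders", "ex:Kentucky"),
    Triple("ex:Ohio", "ex:borders", "ex:Michigan"),
    Triple("ex:Ohio", "ex:capital", "ex:Columbus"),
]

if __name__ == "__main__":
    first = tpf_page(DATA, s="ex:Ohio", p="ex:borders", page=1)
    print(first["total"], first["triples"], first["has_next"])
```

Because each request is this simple, a client can evaluate a larger query by combining many small fragment requests, which keeps the per-request cost on the server low.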
Since its inception, Blazegraph has been committed to delivering solutions for data scientists and researchers who need to work with billion-edge data sets but have hit a scalability wall with other graph database solutions. Blazegraph 2.1.0 enables them to load and process data sets at least 20 percent faster than previously possible. Other enhancements include "out of the box" compatibility with popular frameworks, such as the text indexing library Apache Lucene 5.5.0. As part of this release, Blazegraph announced that it will migrate its open source releases to GitHub in the future. Releases will still be available on SourceForge.