O’Reilly’s new survey explores tools and best practices for advanced analytics and AI

O'Reilly, a source for insight-driven learning on technology and business, has announced the results of its 'Evolving Data Infrastructure' survey, which explores the tools companies are using for their advanced analytics and Artificial Intelligence (AI) projects and the best practices they have acquired along the way.

The research found that more than half (58 percent) of the companies surveyed are either building or evaluating data science platforms - which are essential for companies keen on growing their data science teams and machine learning capabilities - while 85 percent already have data infrastructure in the cloud.

Other key findings reveal that companies are building or evaluating solutions in the foundational technologies needed to sustain success in analytics and AI. These include data integration and Extract, Transform and Load (ETL) (60 percent), data preparation and cleaning (52 percent), data governance (31 percent), metadata analysis and management (28 percent) and data lineage management (21 percent).

Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure with at least one of the seven top cloud providers, with nearly two-thirds (63 percent) using Amazon Web Services (AWS). The results also showed that users of AWS, Microsoft Azure or Google Cloud Platform (GCP) tended to use multiple cloud providers.

The use of durable cloud storage is prevalent. Sixty-two percent of all respondents indicated they used at least one of the following: Amazon S3 or Glacier, Azure Storage, or Google Cloud Storage.

The survey also highlights that data scientists and data engineers are in demand. When asked what skills their teams needed to strengthen, 44 percent said data science and 41 percent said data engineering.

Respondents used a variety of streaming and data processing technologies. About half of the respondents (49 percent) used either Apache Spark or Spark Streaming, while other popular tools included open source projects (Apache Kafka, Apache Hadoop) and their related managed services in the cloud (Elastic MapReduce, AWS Kinesis).

Business intelligence uses a mix of open source and managed services. When it comes to SQL, respondents favored open source tools (Spark SQL, Apache Hive) and managed services in the cloud (AWS RedShift, Google BigQuery).

Although a majority (60 percent) aren’t using serverless technologies, nearly one-third (30 percent) are already using AWS Lambda. In fact, 38 percent indicated that they were using at least one serverless technology - a pattern that remained consistent across geographic regions.

O’Reilly will present the full research findings at the Strata Data Conference, to be held on March 25-28, 2019 in San Francisco at the Moscone Center. The conference will bring together cutting-edge science and new business fundamentals to help attendees build a solid foundation for their AI strategy and machine learning initiatives. The event programming offers a deep dive into emerging data science techniques and technologies, including case studies, in-depth tutorials and emerging best practices.

Registration is now open and a limited number of media passes are available for qualified journalists and analysts.

