Visual indexing tool promises to make public data residing on various cloud services simpler for data scientists to discover.
Quilt Data today launched a visual indexing tool that promises to make all the public data residing on various cloud services simpler for data scientists to discover. To foster adoption of that tool, Quilt Data also unfurled open.quiltdata.com, an online portal based on the company's software through which data analysts and scientists can use Elasticsearch-based search to discover 4,000 TB of public data spanning more than 100 topics from within a standard web browser.
While Quilt Data is initially focused on Amazon Web Services (AWS), the startup plans to extend the reach of its visual indexing tool to a wider variety of S3-compatible public data sources, as well as to Microsoft Azure and Google Cloud Platform, says Quilt Data CTO Aneesh Karve.
The goal is to reduce the amount of time organizations waste trying to discover relevant public data, says Karve.
“You will be able to make more efficient use of your data scientists,” says Karve.
The productivity of data scientists is becoming a major issue. After paying data scientists six-figure salaries, many organizations are not seeing a return on that investment as quickly as they would like, because data wrangling has proven to be a more complicated challenge than many initially anticipated.
Once data scientists do discover public data, it can be massive. The Quilt Data platform makes it easier to browse, search and visualize data sets such as Terrain Tiles, which contains more than 1.2 billion tiles that map elevations across the planet. Another example of a massive data source is the Jupyter Notebook archive, which contains more than 1.2 million searchable Jupyter notebooks that demonstrate how to manipulate and model data. The Quilt platform makes it easier to identify which specific subsets of all that data are relevant to a business based on the queries run against the index.
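Search engines such as Elasticsearch rely on inverted indexes, which map each term to the items that contain it, so a query can narrow a huge collection to a small matching subset. The toy sketch below illustrates that general idea only; it is not Quilt's or Elasticsearch's implementation, and the catalog keys and descriptions are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog: object keys mapped to short text descriptions.
# These entries are made up for illustration and are not real Quilt data sets.
CATALOG = {
    "terrain/tile_001.tif": "elevation terrain tile alps",
    "terrain/tile_002.tif": "elevation terrain tile andes",
    "notebooks/regression.ipynb": "jupyter notebook linear regression model",
    "notebooks/elevation_plot.ipynb": "jupyter notebook plot elevation data",
}


def build_inverted_index(catalog: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of object keys whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for key, text in catalog.items():
        for term in text.lower().split():
            index[term].add(key)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return keys matching every query term (boolean AND), sorted for stable output."""
    terms = query.lower().split()
    if not terms:
        return []
    # Copy the first posting set so the index itself is never mutated.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


if __name__ == "__main__":
    idx = build_inverted_index(CATALOG)
    print(search(idx, "elevation notebook"))  # ['notebooks/elevation_plot.ipynb']
```

Intersecting the per-term sets is what lets a two-word query discard everything except the one notebook that mentions both terms; production engines add ranking, tokenization and sharding on top of this basic structure.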
The Quilt Data platform consists of a web application, a Python client and a suite of backend services. That approach allows Quilt Data to copy any public data a user discovers directly into a virtual private cloud that the customer retains control over. Just as significant, Quilt indexes the actual data rather than merely providing a link to a web page, says Karve. In addition, Quilt provides a mechanism through which organizations can more easily apply version control to public data sources that tend to be updated frequently, adds Karve.
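One common way to version a frequently updated data set is to record a content hash for every object and then hash that manifest to get a single snapshot identifier. The sketch below shows that generic content-addressing technique under stated assumptions; it is not presented as Quilt's actual mechanism, and the function names and sample files are hypothetical.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one object's contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(objects: dict[str, bytes]) -> dict[str, str]:
    """Map each logical key to the digest of its contents."""
    return {key: file_digest(data) for key, data in sorted(objects.items())}


def version_id(manifest: dict[str, str]) -> str:
    """Derive one identifier for the whole snapshot by hashing a canonical JSON manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List keys that were added, removed, or whose contents changed between snapshots."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


if __name__ == "__main__":
    # Hypothetical snapshots of a public data set taken on two different days.
    monday = {"prices.csv": b"day,price\n1,10\n", "README.md": b"v1"}
    tuesday = {"prices.csv": b"day,price\n1,10\n2,11\n", "README.md": b"v1"}

    m1, m2 = build_manifest(monday), build_manifest(tuesday)
    print(version_id(m1) != version_id(m2))  # True
    print(changed_keys(m1, m2))  # ['prices.csv']
```

Because the identifier depends only on contents, re-downloading an unchanged source yields the same version, while any upstream edit produces a new one and the manifest diff shows exactly which files moved.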
Quilt Data is also making available Quilt Business, a commercial application that organizations can buy to visually index their own data.
At the heart of many data science projects is an effort to analyze the data an organization already has against publicly available data. The challenge is first identifying which public data is relevant enough to feed into an artificial intelligence (AI) model alongside proprietary data, and then determining how often to either update that AI model or replace it altogether. Those decisions will be largely driven by how often additional relevant data sources are discovered. The IT challenge, of course, will come down to implementing the management processes that enable those updates to actually occur.