A few weeks ago, I covered Tableau’s release of a daily-updated dataset offering a streamlined presentation of the Johns Hopkins University Center for Systems Science and Engineering’s (JHU’s) global COVID-19 dataset. It was an important step in democratizing the data, allowing people to connect to it and analyze it on a self-service basis. I was one such person, and I performed a few simple analyses of my own, which I shared in the post.
Meanwhile, people are hungry for more. Data enthusiasts and epidemic specialists want access to measures beyond confirmed cases and deaths, as well as demographic data that goes beyond the scope of COVID-19 itself. There is a lot of public data out there, but tracking it down, cleaning it, blending it and modeling it is not trivial. So now various companies in the data space are working to address those pain points and make it easier to work with this wider range of data.
OLAP on COVID
Let’s start with AtScale, the San Mateo- and Boston-based company that focuses on OLAP over big data in the cloud. The company is today announcing its COVID-19 Cloud OLAP Model, ready for drill-down analysis. AtScale hosts the model on its own platform and makes it available for querying free of charge. Datasets include Starschema: COVID-19 Epidemiological Data, which is available through the Snowflake Data Exchange, and data from Boston Children’s Hospital’s COVIDNearYou.org. AtScale says its model is updated daily, as are the source datasets.
To access the AtScale model, interested parties can request access here. AtScale will respond with an email containing login information and connection instructions. Attached to that email are fully developed Excel and Tableau workbooks based on the model (the Excel ones are pictured above). Users can open these workbooks, plug in their unique user ID and password, and then start slicing, dicing and analyzing.
Databricks provides easy data access, launches hackathon
Meanwhile, Databricks, whose Spark-based platform acts as a workbench for data engineers and data scientists, is also adding value to the COVID-19 data scene. For starters, Databricks has made various COVID-19 datasets available natively on its platform (on both the Amazon Web Services and Microsoft Azure clouds). Specifically, developers can find the data in the “/databricks-datasets/COVID/” directory built into the Databricks File System (DBFS), on either the paid service or the free Community Edition. In other words, spin up any Databricks cluster and the COVID-19 data will automatically be in its file system. The company has also created sample notebooks that show how to open and analyze the data; details of the datasets and links to the notebooks can be found in a blog post by Databricks’ Denny Lee.
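As a rough illustration of what finding the data in the file system might look like, the sketch below walks a directory and lists the files under it. It assumes DBFS is exposed to Python through a local `/dbfs` mount on the cluster, which may not hold on every edition or runtime. The helper name `list_dataset_files` is hypothetical and is not part of any Databricks API.

```python
import os


def list_dataset_files(root):
    """Return sorted file paths under root, relative to root.

    Returns an empty list when root does not exist, so the sketch
    also runs harmlessly outside a Databricks cluster.
    """
    if not os.path.isdir(root):
        return []
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


if __name__ == "__main__":
    # Assumption: DBFS is visible at /dbfs on the cluster's local file system.
    covid_root = "/dbfs/databricks-datasets/COVID/"
    for path in list_dataset_files(covid_root):
        print(path)
```

Inside a Databricks notebook, the built-in `dbutils.fs.ls` utility is another way to browse the same directory without relying on the local mount.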
In addition to making the data available, and in coordination with its upcoming Spark + AI Summit virtual event, Databricks is launching a related hackathon under the banner “Data teams unite!” Teams participating in the hackathon are asked to focus on COVID-19, climate change or challenges in their own communities (using open data resources provided by national, regional, state and local authorities). Because the summit is a virtual event this year, and is free, the company expects a significant increase in attendance and hopes to see robust participation in the hackathon. Teams of up to four people can participate. Three finalist teams will be selected, and Databricks will make direct donations to charities of the teams’ choice; the grand prize winner will also receive free training and a ticket to a future Spark + AI event. The hackathon begins today, and submissions are due June 12. Judging takes place between June 15 and 19.
Looker and others
A number of other companies have offerings of their own. For example, just yesterday, Looker, now part of Google Cloud, announced its COVID-19 Data Block, which includes LookML models, ready-to-run dashboards and Looker “Explores” (which allow ad hoc slicing and dicing of the data). The Looker offering uses COVID-19 data that its parent has made available free of charge on its BigQuery service (details here), and is offered on a hosted instance of Looker, which is also free. Data in the models is sourced from JHU, The New York Times, The COVID Tracking Project, Definitive Healthcare, the Kaiser Family Foundation, and Italy’s Dipartimento della Protezione Civile.
And there is more. Starschema and Snowflake have teamed up to offer a data share that is preloaded with COVID-19-related data (it’s one of the data sources AtScale uses in its model). The share is available to current Snowflake customers or those with trial accounts; request access here. Yellowbrick is providing free access to its data warehouse service to help researchers and companies actively working on a COVID-19 vaccine (details here). MariaDB is offering healthcare, medical and academic non-profit organizations fighting COVID-19 free access to MariaDB SkySQL. Location intelligence-focused HERE Technologies is offering its Coronavirus COVID-19 tracking site. Not enough? Even more resources are available at data.world’s Coronavirus (COVID-19) Data Resource Hub.
There are many resources out there that go far beyond CSV files. Specialists focused on the crisis have plenty of choices, which should help them get to insights (and, hopefully, good policies and effective protocols) faster. And if you’re a non-specialist who finds yourself at home because of lockdown and in need of a project to focus on, you can benefit from all these great COVID-19 data resources too.
Updated April 22 at 12:40 p.m. ET to revise Databricks’ hackathon due date and judging period from May 29 and June 1 through June 5 to June 12 and June 15 through June 19, respectively.