Since the beginning of the COVID-19 outbreak, many developers quickly began tapping into the growing number of authoritative data sources to better understand, visualise and model the data – mainly out of curiosity (me) or for a business purpose for their company or clients.
Reports, Dashboards and data sources started popping up across the social networks that cover Business Intelligence (BI) topics. Early on these ranged from basic reports (not very visually appealing, and with some unfortunate inaccuracies); today there are many professionally designed, enterprise-level solutions that add real value to individuals and businesses globally.
As a professional Microsoft developer working in Power BI, Power Apps, Office 365 and Azure, I focus here on development in Power BI – but of course any BI tool can be used to achieve similar outputs.
Many Power BI examples published by individuals can be viewed on the community Data Stories Gallery:
Johns Hopkins University quickly became the go-to source of truth, having first seen it cited as a source on the news myself – and after some quick searching online, it also seemed they were way ahead of other sources by offering the data on GitHub, in a machine-readable CSV format updated daily. Many global and national organisations contribute to the data, including the WHO and ECDC.
GOV.UK, NHS, ONS and PHE have also publicised datasets in varying formats that I have later incorporated into the data model.
Power BI has great methods of connecting to web sources – and of scheduling refreshes and transforming source formatting to suit the developer's or customer's requirements.
If you need it – have a look yourself:
JHU CSSE have used ArcGIS to create a very informative and interactive report which, having now observed many examples elsewhere, seems to have served as the inspiration for many others – and rightly so.
Johns Hopkins University
This report itself has evolved over time, most recently to include testing in the US, and the data has also undergone change – not always to a developer's benefit! So, a word of caution on using any open data source that needs to be refreshed on a schedule or may inform decision making: check the source regularly! Put in place data quality checks and regular reviews of the underlying sources – that's a given for anything obtained from the web rather than an organisational data source such as a Data Warehouse, SQL database or internally maintained source such as Excel or CSV.
GOV.UK have kept the approach simple, on a single page – but have not yet offered a direct URL or repository for the data (that I have found, at least).
So this cannot currently be automated easily without a programmatic solution – hopefully this will improve in future.
Actual data aside, BI always has to serve the needs of the end user/consumer, enabling them to interpret the data and gain insight.
I will admit, some other BI tools offer some better out of the box visualisations and ways to represent data in exciting ways. However, as Power BI is typically an enterprise solution for business – not ‘fun’ (I know – arguable if you are a nerd!), the out of the box visuals and Microsoft’s AppSource for additional visuals has always enabled me to meet all requirements to date. The solution then offers scripting solutions in R or Python to create custom visuals.
That being said – there are already some great examples of design done well with COVID-19 data, mainly via JHU. I have already mentioned the Data Stories Gallery – but let's face it, whether content is visually appealing, engaging, informative and accurate is a matter of opinion – there are some great examples, and some not so great.
So, to more engaging examples – two of the best I consider are Financial Times and Information is Beautiful.
Financial Times offers a great “visual narrative of the spread of Covid-19”, updated regularly, with engaging charts and visuals that are not easy to reproduce in common BI tools.
Information is Beautiful I actually found from a post by Jorge Segarra on the Global Power BI Facebook group in mid-March, and have kept up with it since. The clue is in the title really, with excellent graphical representations of the data that engage and inform the consumer well. Again, not easily reproducible in BI tools. As with the FT, the data is from JHU and other reliable sources.
Purely out of professional curiosity, and to test some methods of connecting Power BI to web sources, which I don't get to use often, I have also developed a report in Power BI using the JHU CSSE data, GOV.UK, RSS feeds and even Wikipedia to build a simple data model and set of report pages.
I considered setting up an Azure Data Factory pipeline to source, process and schedule the updates into a SQL DB – but am now glad I didn't, given the straightforwardness of the JHU & GOV.UK sources – and the fact there have been frequent changes to the structure and accuracy of the data, even though it does get corrected at source. Power BI does enough with transformation steps in Power Query to get the data into a suitable format.
If you are not a developer, you probably will not need to read on – so to see my full report – check out this link.
Create an automated Power BI Dashboard to visualise COVID-19 data
If you haven’t used Power BI before – there are some great resources from Microsoft Guided Learning to get you started; the purpose of this article is not to teach end to end, but to cover the overall high-level steps. If you would however like training/coaching directly, please feel free to get in touch.
The main challenge of using web sources, as mentioned above, is that the data is outside of my control, so I need to ensure the data model is flexible enough to respond to any changes without wholesale or complex changes to the model later. Therefore, the approach aims to keep the queries as basic as possible.
The primary objective is to connect to accurate, authoritative open COVID-19 data set(s) that can be refreshed, preferably automatically, and ideally (now available) direct from the UK government – exactly matching the source and what you might see on the news, to build trust in the approach:
Secondary objectives are to design a report that expands on what JHU achieved, takes inspiration from other reports I have seen, and goes a bit further by incorporating world populations and RSS connectivity to news feeds. No doubt I will add new sources once areas like testing and geographical data become more mature and commonplace.
Third and last is to incorporate the design from my free Power BI templates and ‘theme’ it to reflect COVID-19-specific content, to show how designs and themes can be aligned to a specific subject – however disappointing the subject matter is…
Power BI – Get Data
Once you have Power BI Desktop open, you can connect to various sources. For this abridged version, we will connect to JHU Time Series dataset.
The other purpose of this article is to provide generic advice for acquiring data from online sources, so you can re-use the same principles/steps for any other source.
If you do not need to follow all these steps, and just want to jump to the quick fix – scroll down to the end of each section and grab the code.
First, we get the data
Then we can add the URL of the raw CSV file itself. We will need to repeat this for the three separate files we need:
We will use techniques to combine the files for ease of data modelling.
I would advise following the steps in the quick video below, as the URLs have changed before: the Time Series files were renamed some time ago. Grab the URL of the file from the Raw selection:
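As an aside, the “Raw” URL always follows the same GitHub pattern, so it can be built programmatically – useful if the files are renamed again and you want to script the check. A minimal Python sketch (the repo owner and file path below reflect the JHU CSSE layout at the time of writing and may change):

```python
# Sketch: build the raw.githubusercontent.com URL for a file in a GitHub
# repo, so the same pattern can be reused if the JHU time-series files
# are renamed again. Owner/repo/path reflect the layout at time of writing.

def github_raw_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Return the Raw download URL for a file in a public GitHub repo."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"

confirmed_url = github_raw_url(
    "CSSEGISandData",
    "COVID-19",
    "master",
    "csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",
)
print(confirmed_url)
```

The same function gives you the Deaths and Recovered URLs by swapping the file name at the end of the path.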
You can then paste the URL of the file into Power BI:
NOTE: If you wish to follow the steps below and combine the three files, you will need to load the Deaths and Recovered first – then append these to the confirmed using the quick-fix code at the end of this section.
You will need to provide credentials to connect to the source. As this is a Public resource, we can use the Anonymous option and Connect
Here we get a preview of the data source. I always Transform Data when bringing it into Power BI – do not just load. This allows you to preview the data, apply transformations and check data types.
As this is a raw CSV file, there are quite a few steps we will need to apply and repeat to get the data into a useable format for our report. Check the File Origin and Delimiter relevant for your source data.
Once we have Transformed the data, it is now in the ‘Power Query’ window of Power BI. This is where the data magic happens up front, before we can do any visuals, Report pages, Dashboards etc. This is the most important part of BI – the Data Model.
Without going into every individual step, the general considerations for our source data are that the CSV file is time based: a new column is added at the far right of the file each day to hold the latest day's data. We also get Province/State, Country/Region, and Latitude and Longitude to geolocate the daily values.
This is the same across the Confirmed, Deaths and Recovered CSV files.
Therefore, we can make our data modelling life easy by applying common steps across the three files and combine them.
First step is promoting the headers of the file – notice the columns are called Column1, Column2 etc. What we want is the second row as Headers.
We can use a feature to do this for us:
Power BI does generally apply a Changed Type step automatically, reading the previewed rows and applying what it thinks the column types should be. In any data set, this is where the source data needs to be known by the developer, or at least reviewed manually, to ensure there are no ‘funny’ values – for example, row 440,003 actually containing text instead of a date.
From using the JHU data, the data types are pretty consistent – and as it is used the world over, it is well curated and checked, considered accurate, and has been corrected where historically wrong. Therefore we can trust it throughout, and any errors will be shown to us when we load the data.
Next – in a typical data model, having the dates as individual columns is not useful for us – we would have to make a measure for each column equalling a date, and add a new one each day. No chance.
What we need to do is Unpivot all other columns except for the first four columns – leaving us with an Attribute (the Date) and a Value (the numerical Confirmed, Deaths or Recovered for the given Date).
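The unpivot can be sketched in a few lines of Python – using only the standard library here so it runs anywhere, though in practice `pandas.melt` (or Power Query's Unpivot Other Columns, as above) does the same in one call. The sample data below is made up, but mimics the JHU column layout:

```python
import csv, io

# Sketch of the "Unpivot Other Columns" step: the time-series file has
# one column per date, which we turn into one row per (place, date, value)
# pair. The sample mimics the JHU layout; real files have many more dates.

sample = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",United Kingdom,55.38,-3.44,0,1\n"
    ",Italy,41.87,12.57,0,2\n"
)

fixed = ["Province/State", "Country/Region", "Lat", "Long"]
rows = []
for record in csv.DictReader(sample):
    for col, value in record.items():
        if col not in fixed:  # every remaining column is a date header
            rows.append({**{k: record[k] for k in fixed},
                         "Date": col, "Value": int(value)})

print(len(rows))  # 2 places x 2 dates = 4 rows
```

Notice the result is exactly the Attribute/Value shape described above: the first four columns repeat per row, with the date and the count as the final two columns.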
As we are going to combine the three file types, we need the Source of the file to be added to each row – so we can use it as a Filter context or calculation for KPIs. This example is Confirmed, we will repeat for Deaths and Recovered.
As the source data has US date formats (M/DD/YY), and being based in the UK I need the more readable and system-compatible DD/MM/YY format, there are a few steps we need to apply, which you can see in the final quick-fix entry at the end of this section. This is the result:
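The equivalent conversion in Python is a one-liner once the US format string is known (Power Query achieves the same with a locale-aware Change Type step):

```python
from datetime import datetime

# The JHU column headers use US-style M/DD/YY dates such as "3/1/20".
# Parse with the US format, then render in the UK DD/MM/YY format.

def parse_us_date(text: str) -> datetime:
    return datetime.strptime(text, "%m/%d/%y")

d = parse_us_date("3/1/20")
print(d.strftime("%d/%m/%y"))  # 01/03/20 - 1st of March, not 3rd of January
```

The key point is to parse with the source locale first; reinterpreting "3/1/20" as a UK date would silently shift it to January.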
We will use the Confirmed data source as the ‘base’ table, to which Deaths and Recovered – with the same steps applied – are appended. I have named my tables:
- COVID-19 = Confirmed
- COVID Deaths
- COVID Recovered
Then, on the left-hand side menu of Power Query, I added my queries to a new Group for ease and tidiness – you can group by source, data type etc. depending on your data model and number of source tables. Right-click a table in the left menu, then Move to Group – here you can add to an existing Group or create a New Group. I named the base table so that I can hide Deaths and Recovered from the Report view, keeping a user-friendly name for the base table.
As the latter two queries are going to be appended to the base table, we can disable the load of these two tables by right-clicking on the Deaths and Recovered tables, then un-ticking Enable load. When the base table refreshes, it will still pull through the full query from each.
Now we can append the Deaths and Recovered to the base ‘COVID-19’ query table using the Append feature. Select the base table first and ‘Append Queries’ rather than ‘Append Queries as New’, which would create a new table.
Then, in the Append dialog, move the three tables to the right-hand list by Adding them.
This now provides us with a clean 7 column dataset. When the source data updates daily, and a new column is added and any history corrected – we will still only have 7 columns, just a taller table every day.
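The Source-tagging and Append steps can be sketched as follows – pure Python for illustration (the sample rows are made up; `pandas.concat` or Power Query's Append would be the usual tools):

```python
# Sketch of the Append step: each unpivoted table is tagged with a Source
# label, then the three are stacked into one tall table. The seven keys
# per row match the 7-column dataset described above.

def tag(rows, source):
    """Add a Source column to every row of an unpivoted table."""
    return [{**r, "Source": source} for r in rows]

confirmed = [{"Province/State": "", "Country/Region": "Italy",
              "Lat": "41.87", "Long": "12.57", "Date": "1/23/20", "Value": 2}]
deaths    = [{"Province/State": "", "Country/Region": "Italy",
              "Lat": "41.87", "Long": "12.57", "Date": "1/23/20", "Value": 0}]
recovered = [{"Province/State": "", "Country/Region": "Italy",
              "Lat": "41.87", "Long": "12.57", "Date": "1/23/20", "Value": 0}]

combined = (tag(confirmed, "Confirmed") + tag(deaths, "Deaths")
            + tag(recovered, "Recovered"))
print(len(combined), len(combined[0]))  # 3 rows, 7 columns
```

Because new days arrive as new rows rather than new columns, the schema stays fixed and nothing downstream breaks when the source grows.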
In a later iteration of development, I wanted to see the impact of cases against the population of a country – I had not seen this done anywhere else at the time. When I first started working on this, Italy was just starting to surpass China in total number of cases – but clearly this wasn't relative to the size of the country's population – so I needed a lookup for populations.
Wikipedia Country Populations
Whilst I would not typically condone Wikipedia as a production data source, there are few authoritative single sources of global populations drawn from multiple censuses and other methods. For this article it serves to satisfy my curiosity from a COVID-19 perspective and to show how a web page can be used as a data source – ideally you would use a reputable source, with confidence in the accuracy of the data, published in a useable format.
Following the same steps above, we can get Populations from this page:
Which gives us a table of data for ‘Sovereign states and dependencies by population’:
What we are looking for is the Country and Population column – we do not really need anything else from this table. This will serve as our lookup dimension of ‘all’ Countries – assuming all are present and correct.
When we connect to this in Power BI, following the steps above, the recently updated connector allows us to see a preview of the table itself, correctly named – as the webpage's HTML provides Power BI with the named ranges.
After importing the first HTML table, there is some clean-up to do, removing the citation links and suffixes from the country names. See the full code in the quick-fix sections. Notice from the above to the below, the [b] has been removed from after China in the first row.
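The clean-up amounts to stripping trailing bracketed footnote markers such as "[b]" from the names. A small Python sketch of the same idea (Power Query does this with Replace Values / Extract steps):

```python
import re

# Strip footnote markers like "[b]" or "[note 1]" that Wikipedia appends
# to some country names in the HTML table.

def clean_country(name: str) -> str:
    """Remove any trailing [bracketed] annotation and surrounding spaces."""
    return re.sub(r"\s*\[[^\]]*\]\s*$", "", name).strip()

print(clean_country("China[b]"))        # China
print(clean_country("United Kingdom"))  # United Kingdom
```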
This leaves us with a lookup table for countries:
Last step in Power Query between the JHU and Wikipedia data is to link the Countries. As we do not have a code, GUID or reference number – and we just have to text match the Country name, there is a bit of work to do to match them.
For example, the JHU data has a Country label of ‘US’ for the United States, normal for a source created by a US team – but the Wikipedia lookup is labelled as ‘United States of America’ – so we need to match them.
I ended up doing quite a few rows of matching, plus research to ensure outlying smaller islands etc. matched up to the ‘mother country’ (many for France). Today a few new countries and sources have been added to the JHU data that cannot be matched to Wikipedia – cruise liners such as the Diamond Princess, for example, appear as their own source in the JHU data but are of course not countries. So this will continue to be a manual task of reviewing the JHU vs Wiki data and adding a text-replace step as needed. Again, not a production-proof solution – but this type of activity is common in data modelling, and would normally be rectified in the source system/data. Full code is in the quick-fix section.
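In code terms the matching is just a lookup table of text replacements. A small illustrative subset in Python – the mapping below is not the full list, the exact labels on both sides have changed over time, and unmatched labels (the cruise liners) simply fall through unchanged:

```python
# Sketch of reconciling JHU country labels with the Wikipedia lookup via
# a replacement mapping. Illustrative subset only; the labels on both
# sides change over time and need regular review.

JHU_TO_WIKI = {
    "US": "United States of America",
    "Korea, South": "South Korea",
    "Taiwan*": "Taiwan",
}

def to_wiki_name(jhu_name: str) -> str:
    """Translate a JHU label to the Wikipedia label, or keep it as-is."""
    return JHU_TO_WIKI.get(jhu_name, jhu_name)

print(to_wiki_name("US"))                # United States of America
print(to_wiki_name("Diamond Princess"))  # unmatched, kept as-is
```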
We can now make a Relationship between the Country (Wikipedia) column and the Country/Region (JHU) column:
As you can see in the Relationship diagram, there are a few columns I have hidden in the Report view (greyed out, with a small eye symbol). I do this so that any other user of this PBIX file – a colleague editing it in the Power BI service, or a client it is handed over to – is directed to use only the ‘master’ or parent column of the relationship. Any other table sources with similar attributes should also be joined to the parent column, and that is the one used in reports for visuals, slicers, filters etc. Any other data sources will then correctly filter or slice against the parent column.
The eagle-eyed will also notice I created a Hierarchy in the base COVID-19 table, with Province/State underneath Country/Region. This is only used where Province/State is taken from the COVID-19 table, as there is no equivalent Province/State column in the now-parent Wikipedia lookup table. It would only be used for visuals of COVID-19 table data, as it's unlikely the COVID-19 table will be used as a parent in any new relationship joins.
Power BI Report Design
There you go – if you have made it this far, we now have a very basic data model from two online sources, JHU and Wikipedia. We will briefly cover the other data sources I have brought in to date at the end of the article, and include the quick-fix code.
In terms of design – I have implemented many different styles and designs over the last few years in different organisations.
I ended up a few months ago pulling these concepts together, alongside inspiration from Microsoft’s designs shared on the Power BI blog. With permission, I have extended what Chris Hamill and Miguel Myers contributed to in the September 2019 Sales & Returns example report.
I have used my Free Power BI Template as the base design, and incorporated an iStock photo background to make it specifically COVID-19 recognisable. The design was achieved in Photoshop, but this can be replicated in PowerPoint.
Whilst many developers have done some extensive work on creating detailed Report pages, time series analysis and even predictive modelling using Python and Machine Learning – I have only dabbled with this concept to date and will keep extending and updating posts as and when I extend the concept.
I couldn't resist, however, looking at the time series view and incorporating a Ribbon chart and the latest Slicer Hierarchy feature in Power BI (left of the image) to test the underlying alignment of cases across all countries associated with the United Kingdom globally.
The last feature to include was based on a Client’s similar requirement – to get up to the minute links to reputable global News feeds. This concept is a distillation of the approach. Production solutions may include using Microsoft Flow (Power Automate) or Azure Logic Apps to productionise the RSS feeds into SQL databases or Power BI datasets, enabling the storing of history and links to content.
Most major websites have an RSS feed, either for blog or news content. If the provider follows the globally recognised RSS standard, then common data types and column names are provided, making it straightforward for developers to merge data sources.
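That shared `<channel>`/`<item>` structure is what makes merging feeds easy. A Python sketch of pulling out the common fields the Flow relies on – the sample feed below is made up, but real publisher feeds follow the same RSS 2.0 shape:

```python
import xml.etree.ElementTree as ET

# Sketch: extract the common RSS 2.0 fields from a feed. The sample XML
# is invented; real feeds expose the same <channel>/<item> structure.

sample_rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Health News</title>
  <item>
    <title>Coronavirus: case numbers update</title>
    <link>https://example.com/post/1</link>
    <pubDate>Mon, 06 Apr 2020 09:00:00 GMT</pubDate>
    <description>Summary text here.</description>
  </item>
</channel></rss>"""

channel = ET.fromstring(sample_rss).find("channel")
items = [{
    "title": item.findtext("title"),
    "link": item.findtext("link"),
    "publishDate": item.findtext("pubDate"),
    "summary": item.findtext("description"),
} for item in channel.findall("item")]

print(items[0]["title"])
```

The dictionary keys here deliberately mirror the streaming dataset fields described later, so rows from any publisher land in the same columns.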
Having initially produced this as a similar approach of appending queries, I wanted to showcase using Microsoft Flow (sorry, Power Automate – not used to the product name change yet!) to push the entries into a Power BI dataset using the REST APIs.
Firstly, we need to set up a Streaming Dataset in the Power BI service. You will ideally have a Workspace dedicated to the data subject. In this example, my Workspace is aptly named COVID-19 on my organisation's tenant.
NOTE: Data from Live/Streaming/Power BI Dataflows cannot be combined with other data sources, so it must be a standalone report – consider your reporting concept if you plan to include streaming data in your report. Therefore, for the purposes of this article's demonstration, we will also push the rows into a SharePoint List. You could also push to a SQL database and use DirectQuery to enable real-time reporting!
Power BI Streaming Dataset
With the new-look Power BI service enabled, the screens may look slightly newer than you are used to if you were an avid user of Power BI in early 2020 – the new look has only just been released, in early April 2020.
From the target Power BI Workspace, we can create a ‘+ New’ selection of Streaming Dataset:
Name the dataset appropriately, and add the following fields as text. Note I could have applied ‘publishDate’ as a Date type – but, following my own advice, I had reviewed the datasets manually for some time in Power Query, and there are variants of date types from different publishers: some have DateTime and some have Date/Time/Timezone. By bringing all sources in as text, we can handle the variants in Power Query later.
The preview at the bottom of the panel shows the JSON format of the dataset. Once complete, you can save it by pressing Done
We will create a similar dataset in SharePoint – as two of the column names are already reserved in SharePoint, namely Title and ID – we will use the Title column and create an id_rss column.
As we only have a few columns, I have added them manually using the ‘+ Add column’ option from the list headers. Ensure you set the summary, primaryLink and logo as ‘Multiple Lines of Text’ to accommodate longer URLs.
Microsoft Flow (Power Automate)
Our datasets are now ready for data. We can use Flows to ‘listen’ for new feeds from the source RSS URL, then ‘push’ the rows of the RSS post into a new row in the dataset.
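Under the hood, the Power BI ‘Add rows to dataset’ action is a POST of a `{"rows": [...]}` JSON body to the dataset's push URL. A hedged Python sketch of building that payload – the URL below is a placeholder (the real one comes from the Streaming Dataset's API info panel), the field names must match your dataset definition exactly, and the actual HTTP POST is omitted so the sketch stays self-contained:

```python
import json

# Sketch of the JSON body the Power BI push-rows REST API expects:
# POST {push URL}/rows with a {"rows": [...]} envelope. PUSH_URL below
# is a placeholder, not a real endpoint.

PUSH_URL = "https://api.powerbi.com/beta/<tenant>/datasets/<dataset-id>/rows?key=<key>"

def rows_payload(entries):
    """Wrap feed entries in the {"rows": [...]} envelope Power BI expects."""
    return json.dumps({"rows": entries})

payload = rows_payload([{
    "title": "Coronavirus: case numbers update",
    "publishDate": "2020-04-06T09:00:00Z",
    "primaryLink": "https://example.com/post/1",
    "summary": "Summary text here.",
}])
print(payload[:12])
```

Flow/Power Automate does all of this for you, which is why it is the low-code option here – but knowing the underlying API is useful if you later want to push rows from a script instead.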
Once you have logged into Power Automate using your organisational account (assuming you have a suitable Licence) – you can create a new Flow from the main navigation. You want an Automated – from blank, so it runs itself.
Give your Flow a name, and choose your Flow’s trigger by searching for RSS ‘When a new feed is published’
You can then build the steps of adding the URL of the RSS feed, configuring the Power BI push rows and the SharePoint List create item to accommodate new RSS feed posts. Notice the ‘Add rows to dataset’ (Power BI) and ‘Create item’ (SharePoint) actions are within an ‘Apply to each’ control – for each value in the RSS feed, the action will push a new row into both datasets.
Also notice the Logo URL and source is text, and I have self-sourced the URLs from a source that I trust will be available for the foreseeable future (Wikipedia, again – I really don't trust it that much, so am being a bit of a hypocrite here!). Ideally, for a production scenario, you would use a self-hosted image, either from your organisation's website, a repository, or Azure blob storage (my preference).
The orange RSS options that are entered in each box are from the original Trigger’s option set, you select each one from the pop-out menu as you go – being careful to select the correct option if naming is similar.
The last step when importing the data from SharePoint or SQL is to filter on the Title, to include only the feeds that are relevant. Most consistently, the titles in this use case include ‘coronavirus’ rather than ‘COVID-19’ or variants of ‘Covid’, so I included an OR statement to filter the title for all appropriate feeds, as the root RSS feed is either all of the producer's content or a health-specific feed. I did not need to see Saudi oil posts in the output for this case 😊
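The OR filter amounts to a case-insensitive "title contains any of these terms" test. A Python sketch of the same logic (the sample titles are invented):

```python
# Sketch of the title filter: keep a feed entry when its title mentions
# any of the relevant terms, since root feeds carry all of a publisher's
# content.

TERMS = ("coronavirus", "covid")

def is_relevant(title: str) -> bool:
    """True when the title contains any relevant term, ignoring case."""
    lowered = title.lower()
    return any(term in lowered for term in TERMS)

titles = [
    "Coronavirus: case numbers update",
    "Oil prices fall again",
    "New Covid-19 testing centres open",
]
print([t for t in titles if is_relevant(t)])
```

Lower-casing before matching is the important detail – it is what lets one test catch ‘Covid’, ‘COVID-19’ and ‘coronavirus’ in any capitalisation.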
After bringing the data in – from SharePoint for my COVID-19 report, so I can blend it with the JHU and Wikipedia data sources – and doing the filtering mentioned, I ended up with a simple extract from an RSS feed, refreshable in the Power BI service up to 8 times a day with a Pro licence, or 48 times a day with Premium. Had I pushed the rows into a SQL database, I could use DirectQuery to keep it real time.
NOTE: Setting the Data Category against the columns for the Logo as ‘Image URL’ and Link as ‘Web URL’ allows the data to be presented in a table visual to show the logo image and provide the user a direct link to the feed. Setting the Publish Date to Date/Time/Timezone works for the meantime, until perhaps one of the publishers changes the date type – so I could use the Created column from SharePoint, or add a custom function in Flow to add the current DateTime when Flow adds the record to the Streaming Dataset.
The disadvantage of using a Power BI Streaming dataset is that the column types cannot be changed, or set to certain types, in the Edit mode of the Power BI service or in Power BI Desktop – so data types should be configured in the Flow where possible. But where your reporting requirements do not need ‘special’ column types to produce a straightforward streaming Dashboard or Report, this is still a valid method – and the principle of sharing this approach is simply to showcase what can be done. Again, a SQL database or SharePoint list is preferable, so we have control over the data types and refresh frequency.
There you have it – a few technical methods in Power BI, Flow (Power Automate), SharePoint and Power BI service to create a BI report – that meets the objectives and is extensible over time.
Since beginning to write this post, I have also added a manual dataset from GOV.UK (bearing in mind I live in the UK), which I have used to match the values seen on the news. The JHU data values don't match those shown in cabinet news briefings or in the media – I assume because JHU gets the data from the ECDC rather than GOV.UK. Therefore, some images above have been replaced with GOV.UK data, and GOV.UK report pages are included in the report.
The next step is to automate the acquisition of data from GOV.UK using other tools, such as Python or Selenium – as the data can currently only be acquired by pressing a hyperlink on the web page, until such time as the data is supplied in a repo or via a direct URL. There are many online tools that can do this, but I prefer to have complete control over the automation, processing and storing of data.
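One possible shape for that automation is to scan the page's HTML for anchor tags pointing at CSV downloads. A standard-library Python sketch – the HTML below is invented for illustration, and the real GOV.UK page structure would need inspecting (and may change), which is exactly why checking the source regularly matters:

```python
from html.parser import HTMLParser

# Sketch: find CSV download links in a page's HTML. The sample HTML is
# made up; a real page would be fetched first (e.g. with urllib.request).

class CsvLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href of any <a> whose target ends in .csv
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith(".csv"):
                self.links.append(href)

finder = CsvLinkFinder()
finder.feed('<p><a href="/downloads/cases.csv">Download data</a>'
            '<a href="/about">About</a></p>')
print(finder.links)  # ['/downloads/cases.csv']
```

Once the link is found, the CSV can be downloaded on a schedule and loaded into a database, giving the control over processing and storage mentioned above.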
So, we are left with a few simple report pages, matching authoritative sources in a COVID-19 specific design.
I hope this may have helped you in some way, please feel free to get in touch or feedback. And use the Power Query Advanced Editor code samples below.