Course project: Snapshot of SARS-CoV-2 infections in Romania as of 08 Oct 2020

As I mentioned in the previous article, the final step of a Python and Pandas course was to do an exploratory data analysis and visualization of a real-world dataset. I had already been meaning to do something like this, so I used the final project as a pretext to take a snapshot of the Covid-19 infections in Romania.

The entire project is available at this link, but since it’s full of (maybe noobly written) Python code, I’m going to summarize it here:

I started the project on 27 Sept 2020 and submitted it on 10 Oct 2020, so some conclusions and thoughts reflect that period.

Total infections status at 10 Oct 2020

Daily evolution of new cases

Daily evolution of tests, new cases, and percentage of positive tests

This is a Plotly graph: it can be zoomed, has hover interactions, and offers more options than are apparent at first glance. It is best viewed on a desktop or laptop.

As the black line in the plot above suggests, the percentage of positive tests appears to be rising (as of 08.10.2020), although the number of tests has stayed roughly the same.
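For reference, the percentage of positive tests is just new cases divided by tests for the same day. Here is a minimal sketch of that calculation in pandas; the column names (`tests`, `new_cases`) and all the numbers are made up for illustration and are not taken from the project's dataset.

```python
import pandas as pd

# Made-up numbers for illustration only; not the real Romanian data.
daily = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-10-06", "2020-10-07", "2020-10-08"]),
        "tests": [25000, 25000, 25000],
        "new_cases": [2000, 2500, 3000],
    }
).set_index("date")

# Share of that day's tests that came back positive, in percent.
daily["positive_pct"] = 100 * daily["new_cases"] / daily["tests"]

print(daily)
```

In this toy example the number of tests is flat while new cases go up, so the positive percentage rises, which is the same pattern the black line shows.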

Total infections – age-group distribution (08 Oct 2020)


One of the things I wanted to find out was whether there was an increase in cases among school-age groups. Only part of that information is available in the dataset.

10-19 age group evolution over time

Even though the number of cases has been rising since July (fig. 1 above), the 10-19 age group percentage (fig. 2) doesn't seem to have increased substantially at the time of writing (2020-10-08).
Given the time it takes for the virus to show its effect, a later look at this graph might tell a different story.
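To make the distinction between "more cases" and "a bigger share of cases" concrete, here is a small sketch. It assumes the percentage in question is the age group's share of all cases on a given date, and it uses a hypothetical long-format table (`date`, `age_group`, `total_cases`) with invented numbers.

```python
import pandas as pd

# Made-up cumulative case counts per age group; not the real dataset.
cases = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-09-01"] * 3 + ["2020-10-01"] * 3),
        "age_group": ["0-9", "10-19", "20+"] * 2,
        "total_cases": [100, 300, 9600, 200, 600, 19200],
    }
)

# One row per date, one column per age group.
by_age = cases.pivot(index="date", columns="age_group", values="total_cases")

# Each age group's share of all cases on that date, in percent.
share_pct = by_age.div(by_age.sum(axis=1), axis=0) * 100

# The series that a "10-19 over time" plot would show.
teen_share = share_pct["10-19"]
print(teen_share)
```

In these invented numbers the case count in the 10-19 group doubles, but so does everything else, so the group's share stays flat. That is the kind of situation the two figures above describe.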


The next thing I wanted to analyze was the percentage of population that got sick in each county.

Percentage of county population that got infected so far (08 Oct 2020)
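A rough sketch of how such a per-county percentage can be computed: join the case totals with a population table and divide. The county names are real, but the table layout, column names, and every number below are assumptions made for the example.

```python
import pandas as pd

# Made-up figures; the county names are real, the numbers are not.
cases = pd.DataFrame(
    {"county": ["Cluj", "Iasi", "Timis"], "total_cases": [7000, 4000, 2100]}
)
population = pd.DataFrame(
    {"county": ["Cluj", "Iasi", "Timis"], "population": [700000, 800000, 700000]}
)

# validate= raises an error if a county appears more than once on either side.
per_county = cases.merge(population, on="county", how="left", validate="one_to_one")

# Confirmed cases as a percentage of the county's population.
per_county["infected_pct"] = 100 * per_county["total_cases"] / per_county["population"]

per_county = per_county.sort_values("infected_pct", ascending=False)
print(per_county)
```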

And here’s an animated graph showing how that percentage evolved over time.
It is best viewed on a desktop or laptop (check out the play button and the date slider).

Evolution of the percentage of county population that got infected


But out of all the questions I had about the data, the one that ended up producing the most interesting findings was this:
Is there a correlation between the county population and chance of getting infected?
To see that, we’re going to take a snapshot of the latest day and of the end of each of the last 5 months, and plot county population against the percentage of people infected.
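One possible way to pick those snapshot dates is to take the last available day in each calendar month, which for the current, unfinished month is simply the latest day in the data. The sketch below assumes a daily table with a `date` column; the table itself is a placeholder with made-up totals.

```python
import pandas as pd

# Hypothetical daily range roughly covering the period of the project.
days = pd.date_range("2020-05-01", "2020-10-08", freq="D")

# Placeholder cumulative totals, one row per day; not real data.
history = pd.DataFrame({"date": days, "total_cases": range(len(days))})

# Last available day in each calendar month. For the incomplete
# current month this is the latest day present in the data.
month_key = history["date"].dt.to_period("M").rename("month")
snapshot_dates = history.groupby(month_key)["date"].max()

# Keep only the rows that fall on a snapshot date.
snapshots = history[history["date"].isin(snapshot_dates)]
print(snapshots)
```

With data running from May to 08 Oct 2020, this keeps six days: the five month-ends from May to September plus the latest day.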

County population vs percentage of infected per county

At the time of writing (08.10.2020), as we can see in the scatterplots above, there appears to be a correlation between a county's population and the percentage of its people that got infected.
The greater the population, the greater the chance of infection.
The only outlier appears to be the capital, Bucharest, but we don’t have the full picture here since I haven’t taken into account the number of tests made.
Ignoring the tests made, if the above trendline and hypothesis hold, a major increase in the percentage of infected people among the capital’s population is due.
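For anyone who wants to put a number on "appears to be a correlation", here is a hedged sketch using a Pearson correlation and a straight-line least-squares fit. It assumes a linear trendline, which may not be the kind used in the plots above, and the four data points are invented purely to make the example run.

```python
import numpy as np
import pandas as pd

# Made-up county figures for illustration; not the real data.
counties = pd.DataFrame(
    {
        "population": [200_000, 400_000, 600_000, 800_000],
        "infected_pct": [0.30, 0.55, 0.65, 0.90],
    }
)

# Pearson correlation between population and infected percentage.
r = counties["population"].corr(counties["infected_pct"])

# Least-squares straight line through the points (degree-1 polynomial).
slope, intercept = np.polyfit(counties["population"], counties["infected_pct"], deg=1)

# What that line would predict for a hypothetical, much larger county.
predicted_pct = slope * 1_800_000 + intercept

print(f"r = {r:.3f}, predicted = {predicted_pct:.2f}%")
```

Extrapolating a fitted line far beyond the range of the points it was fitted on, as the last step does, is exactly the kind of leap the hypothesis about the capital relies on, so it should be read with the same caution.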


This analysis isn’t the most scientific or insightful, but it provides a snapshot of current SARS-CoV-2 infections and a starting point for the more in-depth ones that I’ll continue to make on this subject.