A Method for Evaluating the Risk of Exposure to COVID-19 by Using Location Data

One of the main reasons for the widespread dissemination of COVID-19 is that many infected people are asymptomatic. Consequently, they are likely to spread the virus to others as they go about their everyday lives. This emphasizes the importance of targeting high-risk groups for the diagnosis of COVID-19 (with real-time PCR techniques). However, the necessary technology and resources may be limited in certain towns, cities or countries. Thus, the challenge is to establish a criterion for prioritizing the suspected cases most in need of testing. The aim of the present study was to develop a method for evaluating the risk of exposure to COVID-19 infection based on geolocation data. The risk is expressed as a score intended to help allocate COVID-19 tests to the suspected cases with the highest probability of exposure. The method can be implemented quickly with readily accessible open-source tools. A simulation was conducted with data from four people, one of whom was designated as infected. The results show the feasibility of assessing the risk of exposure with the new methodology. Additionally, the data obtained might provide insights into the sometimes complicated patterns of virus propagation.


INTRODUCTION
COVID-19 is spreading across the world at an alarming rate, making the global outbreak a significant public health problem [1]. One of the main reasons for the large-scale spread of COVID-19 is that an estimated 80% of carriers are asymptomatic or have mild symptoms [2]. Thus, many infected people could unknowingly spread the virus [3], [4]. Since the virus that causes COVID-19 can remain on surfaces and in aerosols for many days under certain conditions, people may become infected by touching contaminated objects long after the carrier has departed [5].
Consequently, timely testing of suspected cases is crucial for clinical management and outbreak control. According to the World Health Organization, the decision to test an individual should be based on the clinical (symptoms) and epidemiological (contact with a confirmed case) factors associated with the likelihood of infection [6].
For suspected cases, COVID-19 testing by nucleic acid amplification (RT-PCR) is recommended [6]. Unfortunately, the resources for such tests may be limited in some towns, cities or countries. Therefore, a criterion is needed to decide when to perform a test once the growing number of cases begins to exceed the available resources. Moreover, people who knew they had a high risk of exposure would be more inclined to self-quarantine, even if asymptomatic.
The aim of the present study was to develop a method for evaluating the risk of exposure to COVID-19 infection based on geolocation data. The risk is expressed as a score intended to help allocate COVID-19 tests to the suspected cases with the highest probability of exposure. To implement the method, a webpage is created for uploading the geolocation data of confirmed COVID-19 cases from the previous days. The geolocation data of a suspected case is then entered into the same webpage to calculate an exposure risk score. This score is computed from the number of places where one or more confirmed cases and the suspected case were near each other during overlapping periods. A time window after the departure of the confirmed cases is also considered, to reflect the survival time of the virus.

Google location data
Google allows the owner of a device (e.g., a smartphone) to request his or her location history at https://www.google.com/maps/timeline. The data is sent to the user's e-mail account as a zip file containing one folder for each year in which Google collected this information. The location data for each month is stored in a JSON file containing data objects for every day of the month. The placeVisit objects record the places where the user stayed for some period of time, and the AOI can be retrieved from these objects.
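The following is a minimal sketch, in Python (the language the authors report using), of how the placeVisit objects of one monthly file could be read. It assumes the field names of the 2020-era Semantic Location History export (`timelineObjects`, `placeVisit`, `location.latitudeE7`/`longitudeE7`, `duration.startTimestampMs`/`endTimestampMs`); other export versions may use different fields. The `Visit` class and `parse_place_visits` function are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Visit:
    """One placeVisit: where the device stayed and when (hypothetical container)."""
    lat: float
    lon: float
    arrival: datetime
    departure: datetime


def _ms_to_utc(ms: str) -> datetime:
    # Assumed format: milliseconds since the Unix epoch, stored as a string.
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


def parse_place_visits(raw: str) -> List[Visit]:
    """Extract placeVisit objects from one monthly location-history JSON file."""
    data = json.loads(raw)
    visits = []
    for obj in data.get("timelineObjects", []):
        pv = obj.get("placeVisit")
        if pv is None:  # e.g. activitySegment objects describing movement
            continue
        loc, dur = pv["location"], pv["duration"]
        visits.append(Visit(
            lat=loc["latitudeE7"] / 1e7,  # E7: degrees multiplied by 10^7
            lon=loc["longitudeE7"] / 1e7,
            arrival=_ms_to_utc(dur["startTimestampMs"]),
            departure=_ms_to_utc(dur["endTimestampMs"]),
        ))
    return visits
```

Calling `parse_place_visits(open(path).read())` on each monthly file and concatenating the results would give the list of stays from which the previous N (or M) days can be selected.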

Registering confirmed cases
A person confirmed to be infected with COVID-19 should provide his or her location data as a JSON file. An authorized person (most likely a health authority in charge of the system) then uploads the file to a cloud-hosted web service. The web service reads and parses the JSON data corresponding to the AOI of the previous N days by analyzing each placeVisit object in the file. The values are recorded in a relational database that stores only the places and times, not the identity of the confirmed case, to preserve the confidentiality of the data.
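A minimal sketch of the registration step, assuming visits have already been reduced to (latitude, longitude, arrival, departure) tuples with Unix-second timestamps. It uses Python's built-in sqlite3 module as a stand-in for the MySQL database mentioned in the results; the table name `confirmed_visits`, the function `register_confirmed_case` and the exact schema are hypothetical. The point illustrated is that the schema has no column that could hold the identity of the confirmed case.

```python
import sqlite3
from typing import Iterable, Tuple

# (latitude, longitude, arrival, departure); times in Unix seconds.
VisitRow = Tuple[float, float, int, int]

# Hypothetical schema: places and times only, no name, e-mail or other identity field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS confirmed_visits (
    id        INTEGER PRIMARY KEY,
    lat       REAL    NOT NULL,
    lon       REAL    NOT NULL,
    arrival   INTEGER NOT NULL,
    departure INTEGER NOT NULL
)"""


def register_confirmed_case(conn: sqlite3.Connection, visits: Iterable[VisitRow],
                            now: int, n_days: int) -> int:
    """Insert the visits that ended within the previous n_days; return how many were stored."""
    cutoff = now - n_days * 86400
    recent = [v for v in visits if v[3] >= cutoff]  # keep stays that ended after the cutoff
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO confirmed_visits (lat, lon, arrival, departure) VALUES (?, ?, ?, ?)",
            recent,
        )
    return len(recent)
```

Usage: create the table once with `conn.execute(SCHEMA)`, then call `register_confirmed_case(conn, rows, now=int(time.time()), n_days=N)` for each uploaded file.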
Finally, it is possible to make queries by entering data of suspected cases into the web service.
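One way such a query could pre-select candidate records by time is sketched below. It assumes a hypothetical table `confirmed_visits(lat, lon, arrival, departure)` with Unix-second timestamps and a survival window `survival_s` whose value is not given here. Each confirmed stay is extended by the survival window after departure, and the standard interval-overlap test is applied; spatial proximity would be checked separately on the returned rows.

```python
import sqlite3
from typing import List, Tuple


def candidate_matches(conn: sqlite3.Connection, at_s: int, dt_s: int,
                      survival_s: int) -> List[Tuple[float, float, int, int]]:
    """Return confirmed visits whose stay, extended by survival_s after departure,
    overlaps the suspected stay [at_s, dt_s]. Distance is not checked here."""
    # Two intervals [a1, b1] and [a2, b2] overlap iff a1 <= b2 and a2 <= b1.
    sql = (
        "SELECT lat, lon, arrival, departure FROM confirmed_visits "
        "WHERE arrival <= ? AND departure + ? >= ?"
    )
    return conn.execute(sql, (dt_s, survival_s, at_s)).fetchall()
```

Filtering by time in SQL keeps the number of rows that need a distance computation small.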

Computing the exposure risk score
The JSON data of a person who wants to evaluate his or her risk of exposure is processed in the same way, extracting all the placeVisit objects within the last M days. The proposed method is based on the hypothesis that a person (suspected case) has a certain risk of COVID-19 infection whenever he or she was at the same place as a confirmed case during the same time period (or within a given time window following the latter's departure).
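This section does not specify how "the same place" is decided. One plausible rule, sketched here purely as an assumption, is to treat two stays as being at the same place when their coordinates lie within a fixed radius of each other, using the haversine great-circle distance. The 50 m default radius and the function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_place(a: Tuple[float, float], b: Tuple[float, float], radius_m: float = 50.0) -> bool:
    """Hypothetical proximity rule: stays count as the same place within radius_m."""
    return haversine_m(a[0], a[1], b[0], b[1]) <= radius_m
```

A fixed radius is simple but sensitive to GPS error; the choice of radius would need to be calibrated against the accuracy of the recorded locations.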
Let $AT_S$, $DT_S$, $AT_C$ and $DT_C$ be the arrival and departure times of the suspected and confirmed cases, respectively. Figure 1 illustrates examples of suspected cases at high, medium and low risk of exposure, depending on their arrival and departure times at a specific place. If the periods of the suspected and confirmed cases overlap ($AT_S \le DT_C$ and $AT_C \le DT_S$), a high risk of infection is assigned. When the suspected case arrives after the confirmed case has departed ($AT_S > DT_C$), the survival time of the virus must be taken into account: if the arrival time falls within the window of virus survival on surfaces, a medium risk of infection exists [5]. If the suspected case has a depar-
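The two criteria stated in full above can be expressed as interval tests on $AT_S$, $DT_S$, $AT_C$ and $DT_C$. This sketch assumes all times and the survival window are in the same unit (e.g. seconds); the survival-window length is a parameter that must be supplied. Because the low-risk condition is not fully specified in this section, the sketch does not implement it and labels every remaining case `OUTSIDE`. The names `Exposure` and `classify` are hypothetical.

```python
from enum import Enum


class Exposure(Enum):
    HIGH = "high"        # the two stays overlap in time
    MEDIUM = "medium"    # suspected case arrived after departure, within the survival window
    OUTSIDE = "outside"  # neither condition holds (the low-risk rule is not reproduced)


def classify(at_s: float, dt_s: float, at_c: float, dt_c: float, survival: float) -> Exposure:
    """Classify one suspected stay against one confirmed stay at the same place."""
    if at_s <= dt_c and at_c <= dt_s:  # intervals [AT_S, DT_S] and [AT_C, DT_C] overlap
        return Exposure.HIGH
    if dt_c < at_s <= dt_c + survival:  # arrival within the virus survival window
        return Exposure.MEDIUM
    return Exposure.OUTSIDE
```

Note that a suspected case who left before the confirmed case arrived falls into `OUTSIDE`, since the virus could not yet have been deposited.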

RESULTS AND DISCUSSION
The present method was implemented in Python 3.7 using the Flask web framework [7]. The geolocation data employed was extracted from the JSON files downloaded from Google and stored in a MySQL database. The places where a suspected case was at risk of exposure are displayed on a map generated with folium [8]. The system was deployed in a virtual environment running Ubuntu 16.04.
Risk scores were assigned to each place at which person B was near person A within the time window (Table 1). When the suspected case arrived after the confirmed case had departed, but within the virus survival time, the score is less than one. A map showing two infection foci (Figure 2) identifies the places where person B was at risk of exposure (the workplace and the mall). The total risk score for person B is R = 11.59, which is considerably high in the current scheme.
Risk scores were likewise assigned to each place where person C was near person A within the time window (Table 2). As with person B, some of the place scores are less than one, for the same reason. The first two scores (k = 1 and k = 2) correspond to the same period for the confirmed case. The map for person C highlights only one place (the workplace), at which person C was exposed to the risk of infection (Figure 3).
For person D, a risk score was assigned for a single place and time, corresponding to the occasion when person A and person D were at the same party (Table 3).
The risk score is R = 1 because there was only one
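The scoring formula itself is not given in this section. The sketch below only reproduces the qualitative behaviour reported in the results, under stated assumptions: an overlapping stay scores 1, a later arrival within the survival window scores less than 1 (here a hypothetical linear decay with the gap between $DT_C$ and $AT_S$), and the total score R is assumed to be the sum of the per-place scores, which is consistent with R = 1 for a single overlapping encounter. It is not expected to reproduce the values in Tables 1 to 3, and `place_score` and `total_score` are hypothetical names.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (arrival, departure), e.g. in hours


def place_score(suspected: Interval, confirmed: Interval, survival: float) -> float:
    """Hypothetical per-place score r_k in [0, 1]."""
    at_s, dt_s = suspected
    at_c, dt_c = confirmed
    if at_s <= dt_c and at_c <= dt_s:
        return 1.0  # overlapping stays
    gap = at_s - dt_c
    if 0 < gap <= survival:
        return 1.0 - gap / survival  # assumed linear decay; always below 1
    return 0.0


def total_score(matches: Iterable[Tuple[Interval, Interval]], survival: float) -> float:
    """Assumed total risk R: sum of r_k over all matched (suspected, confirmed) stays."""
    return sum(place_score(s, c, survival) for s, c in matches)
```

For example, one overlapping stay plus one arrival two hours after the confirmed case left, with a 72-hour window, gives R = 1 + (1 - 2/72).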

Implementation issues
There are two important issues to be considered for the implementation of the current methodology.
First, tracking user locations produces highly sensitive data. Second, the possibility that false information is entered into the system must be addressed.
Consequently, the proposed method should be implemented by a central health authority (HA), which can guarantee the validity of the data entered into the database and the anonymity of the users. The procedure for logging data into the system is illustrated with a flow diagram (Figure 5). It is essential to emphasize that the computation of risk scores does not involve any information related to the owner of the device (e.g., a smartphone), such as his or her name, age, gender or address. Since public trust in the anonymity of the data collection procedure is crucial, the registration procedure must automatically discard any confidential information before files are uploaded to the database.
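A whitelist is one straightforward way to make the registration procedure discard confidential information automatically: everything except the coordinates and timestamps needed for scoring is dropped before upload, so place names, addresses and any other fields never reach the database. This sketch assumes the 2020-era Google export field names; the whitelist contents and the `sanitize` function are hypothetical.

```python
import json
from typing import Any, Dict

# Hypothetical whitelist: the only fields the risk computation needs.
KEEP_LOCATION = {"latitudeE7", "longitudeE7"}
KEEP_DURATION = {"startTimestampMs", "endTimestampMs"}


def sanitize(raw: str) -> Dict[str, Any]:
    """Keep only coordinates and times of placeVisit objects; drop everything else."""
    data = json.loads(raw)
    clean = []
    for obj in data.get("timelineObjects", []):
        pv = obj.get("placeVisit")
        if pv is None:
            continue  # movement segments are not needed for scoring
        clean.append({"placeVisit": {
            "location": {k: v for k, v in pv["location"].items() if k in KEEP_LOCATION},
            "duration": {k: v for k, v in pv["duration"].items() if k in KEEP_DURATION},
        }})
    return {"timelineObjects": clean}
```

A whitelist is safer than a blacklist here: any field added to the export format in the future is dropped by default rather than uploaded by accident.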
According to the proposed model, uploading data to the system requires a token that the HA provides to the confirmed case at the time of a positive diagnosis. The token will be generated independently of any patient identity information to ensure the anonymity of the data. If the confirmed case uses the Google Takeout service, he or she can download and review the data. The only factor that could possibly reveal the identity of the device owner is the location data itself. Therefore, the data collection procedure must include an option for the user to remove any sensitive location data (e.g., the place of residence). After making any necessary adjustments, the user uploads the data into the system.
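The sketch below illustrates two of these safeguards under stated assumptions: one-time upload tokens drawn from Python's `secrets` module, so that they carry no information about the patient, with the HA keeping only their SHA-256 digests; and a filter that removes visits within a user-chosen radius of locations the user marks as sensitive. The `TokenRegistry` class, the `drop_sensitive` function and the 200 m default radius are hypothetical.

```python
import hashlib
import math
import secrets
from typing import List, Sequence, Set, Tuple


class TokenRegistry:
    """Hypothetical HA-side store of one-time upload tokens, unlinked from identity."""

    def __init__(self) -> None:
        self._unused: Set[str] = set()  # SHA-256 digests only; raw tokens are not stored

    def issue(self) -> str:
        token = secrets.token_urlsafe(32)  # random; derived from nothing about the patient
        self._unused.add(hashlib.sha256(token.encode()).hexdigest())
        return token

    def redeem(self, token: str) -> bool:
        digest = hashlib.sha256(token.encode()).hexdigest()
        if digest in self._unused:
            self._unused.remove(digest)  # each token authorizes a single upload
            return True
        return False


def _haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def drop_sensitive(visits: Sequence[Tuple[float, ...]],
                   excluded: Sequence[Tuple[float, float]],
                   radius_m: float = 200.0) -> List[Tuple[float, ...]]:
    """Remove visits (lat, lon, ...) lying within radius_m of any user-excluded point."""
    return [v for v in visits
            if all(_haversine_m(v[0], v[1], p[0], p[1]) > radius_m for p in excluded)]
```

Storing only digests means that a leak of the HA's token table would not expose usable tokens, and single-use redemption limits the damage of a token being shared.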

CONCLUSIONS
Quarantine and social distancing are recommended measures for helping to contain the epidemic [2].
However, it may be complicated to impose quarantine