1.3. What are personal data?

Personal data are any information relating to an identified or identifiable natural person. A person is identifiable if they can be associated with the data. If the data cannot be associated with a person, or if they have been anonymised, i.e. processed in such a way that identification is no longer possible, they are not personal data and fall outside the scope of data protection law. Whether a researcher has to comply with data protection requirements therefore depends on whether the persons behind the data are identifiable.

A person is directly identifiable if the data contain, for example, their name, personal identification number or another unique identifier. Direct identification is based on the available data alone and requires no additional data or knowledge. Indirect identification means that the link between a person and their data is not directly evident but has to be established or inferred, for example, by combining several identifiers. Indirect identification may also be possible by combining different datasets. Whether such identification is possible therefore has to be assessed case by case.
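The idea of combining several identifiers can be illustrated with a simple check. The Python sketch below assumes a dataset held as a list of records with hypothetical fields `birth_year`, `postcode` and `occupation` (the names and values are invented for illustration). It lists the records whose combination of these values occurs only once in the dataset. A unique record is not automatically identifiable, and a non-unique record may still be identifiable with outside knowledge; the check only flags obvious candidates for closer assessment.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def unique_combinations(
    records: Sequence[Mapping[str, str]], quasi_identifiers: Iterable[str]
) -> list[int]:
    """Return the indices of records whose combination of the given
    (hypothetical) indirect identifiers occurs only once in the dataset."""
    keys = list(quasi_identifiers)
    # Build the combination of indirect identifiers for each record.
    combos = [tuple(record[key] for key in keys) for record in records]
    counts = Counter(combos)
    # A combination seen only once singles out exactly one record.
    return [i for i, combo in enumerate(combos) if counts[combo] == 1]


# Hypothetical survey records: no names, but several indirect identifiers.
records = [
    {"birth_year": "1985", "postcode": "50090", "occupation": "teacher"},
    {"birth_year": "1985", "postcode": "50090", "occupation": "teacher"},
    {"birth_year": "1962", "postcode": "50090", "occupation": "surgeon"},
    {"birth_year": "1990", "postcode": "10111", "occupation": "teacher"},
]

print(unique_combinations(records, ["birth_year", "postcode", "occupation"]))
# [2, 3]
```

Checking fewer fields changes the result: with `["postcode"]` alone, only the record with postcode `10111` (index 3) is unique. Which combinations matter in practice depends on what background information someone interested in identifying the respondents could plausibly obtain.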

When assessing the identifiability of data, it is not enough for a researcher to look only at the data being processed. It is necessary to think a few steps further: what else could be done with the data by someone interested in identifying or finding the people behind them? Extensive background information can make data attributable to a person even if they appear anonymous at first glance.

The assessment of identifiability must take into account all reasonable and readily available steps that could be taken to identify an individual. A person is not considered identifiable if identification would require an unreasonable amount of time, effort or resources. Reasonableness is assessed by weighing the resources required against the likelihood of identification. Note that the line beyond which a person is no longer identifiable is not clear-cut, and it will need to be reassessed over time in light of new technologies and identification methods.

Three types of personal data can generally be distinguished in research.

1. Survey respondent data are data collected from or about individuals that must be processed to achieve the objectives of the research; for example, recordings and transcripts of interviews, responses to surveys, observation and location data, results of experiments, health data, measurement results or other data relating to an individual.

2. Contact details and other organisational information are data relating to the participation of individuals in a survey; for example, the lists of respondents, their email addresses, telephone numbers, data concerning the location and time of the experiment, interview or another type of data collection, and written consent forms. These data are collected in the course of the research but are not strictly used to achieve the scientific objectives of the study. Nevertheless, they are personal data, and their processing must comply with data protection principles.

3. Data on researchers: various data on researchers may also be gathered during the study, such as researchers’ general data and contact details, data on their workload, salary and work-related travel, or information on who collected and analysed the data, when and how. Biographies submitted in research project proposals also contain personal data. Data on many researchers are publicly available in ETIS (the Estonian Research Information System) or on the research institution’s website.

The university’s data protection wiki includes a checklist covering all stages of data processing in research. Before starting a survey, it is useful to check whether all the conditions for the secure and appropriate processing of personal data have been met.

A natural person whose data are processed is called a data subject.

1.3.1. Special categories of personal data

In addition to ordinary personal data, Article 9 of the GDPR lists special categories of personal data:

data revealing racial or ethnic origin, political views, religious or philosophical beliefs or trade-union membership,
genetic data,
biometric data that are used to identify a natural person uniquely,
health data, or
data concerning a natural person’s sex life or sexual orientation.

As a rule, processing special categories of personal data is prohibited (Article 9(1) of the GDPR). Such data may only be processed on the basis of the data subject’s explicit consent or another exception provided in Article 9(2) of the GDPR. In addition, the processing must have a legal basis under Article 6 of the GDPR.

Examples of the use of special categories of personal data in research

Special educational needs. When researchers want to study students’ special educational needs, it may be difficult to determine whether these constitute special categories of personal data. They certainly do when the special educational needs arise from health reasons (for example, when linked to a medical diagnosis or disability). However, when the special need is a talent or a communication or learning disability, it may not be related to the child’s health; in that case, it is not a special category of personal data. Nevertheless, special educational needs can be considered more sensitive than ordinary personal data, especially where children are concerned.

The final assessment depends on the specific data processing and its purpose. If the aim is only to make generalisations and the researcher is not interested in the cause of the special need, it is not a special category of personal data. For example, if students A and B in a class are reported to have an (unspecified) special educational need but students C, D and E are not, the special need is recorded merely as a fact, and this is not considered a special category. However, if the research focuses on pupils’ performance in connection with a specific, explicitly identified special need (for example, how a speech impairment affects learning motivation or what kind of support pupils with a physical disability need), the research involves processing special categories of personal data.

If researchers cannot fully control the amount of data collected (for example, the researcher does not know what the interviewee answers or what is written in response to an open-ended question), they may end up collecting special categories of personal data even if the original purpose was to investigate special needs (such as talent) that do not fall under the special categories. It is therefore advisable, as a precautionary measure, to treat all special educational needs as special categories of personal data, especially if it cannot be excluded that the research will also analyse the causes of the special needs. If it is known that this will not be done, it should be clearly stated in the survey plan and the information given to the respondents. Researchers can also formulate data collection questions in a way that does not encourage anyone to share health data.

Health indices. Indices and other composite measures from which conclusions can be drawn about a person’s health are considered health data. They should therefore be treated as special categories of personal data, and the same applies to calculating such indices. For example, if the study aims to calculate a body mass index, associate it with a person and thereby obtain new information about that person’s health, the calculation of the index must be considered processing of a special category of personal data. Weight and height on their own do not belong to special categories of personal data.
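As an illustration, the standard body mass index formula shows how the index is derived from two values that are not, by themselves, special categories of personal data. Here m is body weight in kilograms and h is height in metres:

```latex
\mathrm{BMI} = \frac{m}{h^{2}}
```

For example, a person weighing 72 kg who is 1.80 m tall has a BMI of 72 / 1.80² ≈ 22.2. Once such a result is linked to an identifiable person and used to draw conclusions about their health, it conveys health information that neither input conveys alone.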

1.3.2. IP addresses

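In its judgment in Case C-582/14 (Breyer) of 19 October 2016, the European Court of Justice ruled that dynamic IP addresses, which change with each connection to the internet, can constitute personal data for a website operator that has legal means of obtaining, from the internet service provider, the additional data needed to identify the person. The judgment interpreted Directive 95/46/EC, the GDPR’s predecessor, which defined personal data in essentially the same terms as the GDPR: “any information relating to an identified or identifiable natural person; an identifiable natural person is one who can be identified, directly or indirectly”.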

The European Court of Justice notes that “a dynamic IP address does not constitute information relating to an ‘identified natural person’, since such an address does not directly reveal the identity of the natural person who owns the computer from which a website was accessed, or that of another person who might use that computer” (para. 38). However, an IP address makes it possible to identify a person indirectly. The European Court of Justice explains that all the information enabling the identification of the data subject does not have to be in the hands of one person (para. 43). This means that additional data held by the internet service provider may be requested, after which the person can be identified.

In addition, it must be assessed how reasonable and likely it is that an individual can be identified indirectly by combining the different data. According to the court, it is, for example, reasonable and likely that an internet service provider will transfer its customer’s data to a competent authority (for example, in the case of cyberattacks) to protect the rights of individuals or to comply with legal obligations. It is also possible for a person who knows the dynamic IP address of a potential infringer to take legal action before a court or other competent authority, in order to protect their rights or to comply with legal obligations; that authority can then request from the internet service provider the data needed to identify the infringer behind the IP address.

The means reasonably likely to be used therefore include the possibility that a person is identified indirectly by several institutions or persons acting together. Indirect identification is not reasonably likely if it is prohibited by law or practically impossible because it requires a disproportionate effort in terms of time, cost or manpower (para. 46). Consequently, researchers should be aware that if a respondent’s IP address is stored together with the data collected, for example, through an online survey or another online service, the respondent may be identifiable from the perspective of data protection law. Various survey platforms allow researchers to configure a survey so that the respondent’s IP address and other technical information that would facilitate identification are not collected (see 3.4.5).
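Where a survey export already contains IP addresses or similar technical data, one option is to remove those fields before the responses are stored with the research data. The Python sketch below assumes that responses are exported as dictionaries and that the technical fields are named `ip_address` and `user_agent`; these field names are hypothetical and differ between platforms. Removing the fields from the research copy does not delete them from the platform’s own records, so configuring the platform not to collect them remains the more reliable approach.

```python
from typing import Any, Mapping

# Hypothetical names of technical fields that a survey export might contain.
TECHNICAL_FIELDS = frozenset({"ip_address", "user_agent"})


def strip_technical_fields(
    response: Mapping[str, Any], fields: frozenset[str] = TECHNICAL_FIELDS
) -> dict[str, Any]:
    """Return a copy of a survey response without the listed technical
    fields; the original response object is left unchanged."""
    return {key: value for key, value in response.items() if key not in fields}


# Hypothetical exported response (198.51.100.23 is a documentation-only address).
response = {
    "ip_address": "198.51.100.23",
    "user_agent": "Mozilla/5.0",
    "q1": "agree",
    "q2": "weekly",
}

print(strip_technical_fields(response))
# {'q1': 'agree', 'q2': 'weekly'}
```

The answers themselves may still contain identifying details (for example, in open-ended responses), so removing technical fields does not by itself make the data anonymous.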
