# Entry

An entry is a collection of variables. An entry name is preceded by one or more pound signs #, in the form of:

# <entry name>
- <key 1>: <value 1>
- <key 2>: <value 2>
...
- <key n>: <value n>

The # Example lines in previous examples are headers of Level-1 Entries, as denoted by the single leading pound sign. Every REAM file starts with a Level-1 Entry and contains exactly one Level-1 Entry.

Entries are useful when describing an object with multiple attributes:

# Country
- name: Belgium
- capital: Brussels
- population: $11433256$
- euro zone: `TRUE`

Let's add some annotations.

# Country
- name: Belgium
  > short for the Kingdom of Belgium
- capital: Brussels
- population: $11433256$
  > data from 2019; retrieved from World Bank
- euro zone: `TRUE`
  > joined in 1999

Keys should be unique within an entry. The following code will raise an error:

# Country
- name: Belgium
- language: Dutch
- language: French
- language: German

Known Issue

The current parser does not check for duplicate keys, so technically this is still valid. This rule will be enforced in future versions.
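Until the rule is enforced, duplicate keys can be caught outside the parser. The following is a minimal Python sketch, not part of REAM itself: the function name `duplicate_keys` is hypothetical, and it assumes the input is a single entry without subentries in which every variable sits on a line starting with "- ".

```python
from collections import Counter


def duplicate_keys(entry_text: str) -> list[str]:
    """Return the keys that appear more than once in one entry (hypothetical helper)."""
    keys = []
    for line in entry_text.splitlines():
        # Variable lines start with "- " at column 0; headers ("# ...") and
        # indented annotations ("  > ...") are skipped.
        if line.startswith("- ") and ":" in line:
            keys.append(line[2:].split(":", 1)[0].strip())
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


example = """# Country
- name: Belgium
- language: Dutch
- language: French
- language: German
"""

print(duplicate_keys(example))  # ['language']
```

Because keys are scoped per entry, a file with subentries would need this check applied to each entry separately.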

# Subentry

Entries can be nested, and the level of the entry is denoted by the number of leading pound signs. So a Level-1 Entry takes the form of # <Level 1 Entry Name>, and a Level-2 Entry takes the form of ## <Level 2 Entry Name>, and so forth.

Examples:

# Country
- name: Belgium

## Language
- name: Dutch

## Language
- name: French

## Language
- name: German

The # Country entry has one variable, name, and three Level-2 child entries, ## Language. The three ## Language subentries are also known as terminal nodes, as they do not contain any subentry. When compiling the dataset, the parser looks for all terminal nodes in the REAM file and flattens the data structure. Thus the previous example produces a dataset with three rows (one for each terminal node) and two columns (one for each variable).

Note that variable keys are scoped, so ## Language is allowed to have a variable with the key name even though its parent entry # Country also contains a variable with the same key.
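The flattening described above can be sketched in Python. This is an illustration under stated assumptions, not the REAM parser's implementation: the function name `flatten` is hypothetical, and the column names of the form "Class.key" are an invented convention used here only to keep scoped keys apart. It also assumes that no class is nested inside an entry of the same class.

```python
def flatten(ream_text: str) -> list[dict[str, str]]:
    """Build one row per terminal node, including its ancestors' variables."""
    entries = []  # every entry, in file order
    stack = []    # indices of the entries that are currently open
    for line in ream_text.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            # A new header closes every open entry at the same or a deeper level.
            while stack and entries[stack[-1]]["level"] >= level:
                stack.pop()
            parent = stack[-1] if stack else None
            if parent is not None:
                entries[parent]["has_child"] = True
            entries.append({"level": level, "name": line[level:].strip(),
                            "vars": {}, "parent": parent, "has_child": False})
            stack.append(len(entries) - 1)
        elif line.startswith("- ") and ":" in line and stack:
            key, value = line[2:].split(":", 1)
            entries[stack[-1]]["vars"][key.strip()] = value.strip()

    rows = []
    for entry in entries:
        if entry["has_child"]:
            continue  # only terminal nodes produce rows
        # Walk from the terminal node up to the Level-1 Entry.
        chain, node = [], entry
        while node is not None:
            chain.append(node)
            node = entries[node["parent"]] if node["parent"] is not None else None
        row = {}
        for node in reversed(chain):  # outermost entry first
            for key, value in node["vars"].items():
                row[f"{node['name']}.{key}"] = value  # assumed column naming
        rows.append(row)
    return rows


example = """# Country
- name: Belgium

## Language
- name: Dutch

## Language
- name: French

## Language
- name: German
"""

rows = flatten(example)
print(len(rows))  # 3
print(rows[0])    # {'Country.name': 'Belgium', 'Language.name': 'Dutch'}
```

Each row repeats the parent's variables, which is why the three terminal nodes yield three rows with the same two columns.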

Entries must be nested in order. Level-2 Entries can only be nested in a Level-1 Entry, Level-3 Entries can only be nested in a Level-2 Entry, and so forth. Compare the datasets compiled from the following two examples with the previous one:

# Country
- name: Belgium

## Language
- name: Dutch
  > This is in a Level 2 Entry

### Language
- name: French
  > This is in a Level 3 Entry

### Language
- name: German
  > This is in a Level 3 Entry

# Country
- name: Belgium

## Language
- name: Dutch
  > This is in a Level 2 Entry

## Language
- name: French
  > This is in a Level 2 Entry

### Language
- name: German
  > This is in a Level 3 Entry

A visualization of the differences between the three schemas is as follows. The terminal nodes are colored yellow.

[Figure: tree diagrams of the three schemas, with terminal nodes colored yellow]
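The same comparison can be made from the header levels alone: an entry is a terminal node exactly when the next header is not deeper than it. The Python sketch below is illustrative only; the helper names `check_nesting` and `terminal_positions` are hypothetical, and each schema is reduced by hand to its list of header levels.

```python
def check_nesting(levels: list[int]) -> None:
    """Raise ValueError if entries are not nested in order."""
    if not levels or levels[0] != 1:
        raise ValueError("a file must start with a Level-1 Entry")
    for prev, cur in zip(levels, levels[1:]):
        if cur == 1:
            raise ValueError("a file contains exactly one Level-1 Entry")
        if cur > prev + 1:
            raise ValueError(f"Level-{cur} Entry cannot follow a Level-{prev} Entry")


def terminal_positions(levels: list[int]) -> list[int]:
    """Return the 0-based positions of headers that have no subentry."""
    terminals = []
    for i, level in enumerate(levels):
        # The last header is always terminal; otherwise compare with the next one.
        next_level = levels[i + 1] if i + 1 < len(levels) else 0
        if next_level <= level:
            terminals.append(i)
    return terminals


# Header levels of the three schemas, in file order.
schemas = {
    "three Level-2 languages": [1, 2, 2, 2],
    "Dutch with two Level-3 children": [1, 2, 3, 3],
    "French with one Level-3 child": [1, 2, 2, 3],
}

for label, levels in schemas.items():
    check_nesting(levels)
    print(label, terminal_positions(levels))
# three Level-2 languages [1, 2, 3]
# Dutch with two Level-3 children [2, 3]
# French with one Level-3 child [1, 3]
```

Under this reading, the first schema has three terminal nodes and the other two have two each, so the compiled datasets differ in their number of rows.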

A level can contain subentries of different classes:

# Country
- name: Belgium

## City
- name: Brussels

## Language
- name: Dutch

Also, entries of the same class need not have identical variables, nor the same variable order.

# Country
- name: Belgium

## Language
- name: Dutch
- size: $0.59$

## Language
- size: $0.4$
- name: French

## Language
- name: German

Observe that the order of the variables is preserved by default.

The datasets compiled by the last two examples are not too useful for analysis. To compile quality analysis-ready datasets, we should specify the schema of the datasets in the codebook.
