Frame2Package
Frame2Package is a helper library for converting pandas DataFrames to DDF packages.
f2p = Frame2Package()
f2p.add_data(data=df, concepts=concepts)
f2p.to_package('my-ddf-folder')
Quickstart
Installation
To install frame2package, run one of the following commands, depending on whether you use pipenv or pip:
$ pipenv install frame2package
$ pip install frame2package
Basic usage
import pandas as pd
import io
from frame2package import Frame2Package
# Load some sample data
data = """area,year,age,education,distribution
Bahamas,2000,15+,Total,1.0
Fiji,1970,80+,Total,1.0
Gabon,2025,20--64,Under 15,0.0
Brunei Darussalam,2045,All,Total,1.0
Thailand,1985,15+,Upper Secondary,0.07
"""
df = pd.read_csv(io.StringIO(data))
# Specify all the concepts in the dataset
# as per the DDF data format specification.
concepts = [
    {
        'concept': 'area',
        'concept_type': 'entity_domain'
    },
    {
        'concept': 'year',
        'concept_type': 'time'
    },
    {
        'concept': 'age',
        'concept_type': 'string'
    },
    {
        'concept': 'education',
        'concept_type': 'entity_domain'
    },
    {
        'concept': 'distribution',
        'concept_type': 'measure'
    },
]
# Initialize a Frame2Package object
f2p = Frame2Package()
# Add the data
f2p.add_data(data=df, concepts=concepts)
# Save the package
f2p.to_package('sample-dataset')
Usage
Data format
To create a DDF package, frame2package needs your data to be in the tidy format:
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
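If your data has one column per year (a common non-tidy layout), it can be reshaped with pandas before it is passed to frame2package. The following is a minimal sketch; the table, the column names and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per year, so not tidy.
wide = pd.DataFrame({
    'country': ['swe', 'nor'],
    '2000': [10.0, 20.0],
    '2001': [11.0, 21.0],
})

# Melt the year columns into a single "year" variable so that
# each row holds exactly one observation.
tidy = wide.melt(id_vars='country', var_name='year', value_name='gdp')
tidy['year'] = tidy['year'].astype(int)
tidy = tidy.sort_values(['country', 'year']).reset_index(drop=True)

print(tidy)
```

The resulting frame has the columns country, year and gdp, with one row per country and year.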
Concepts
In addition to the data itself, you need to declare what kind of variable each column is. In DDF these variables are called concepts. Each concept has a name and a type.
concepts = [
    {
        'concept': 'country',
        'concept_type': 'entity_domain'
    },
    {
        'concept': 'gdp',
        'concept_type': 'measure'
    }
]
You can add further fields to a concept, such as a description:
{
    'concept': 'country',
    'concept_type': 'entity_domain',
    'description': 'ISO 3 code of countries'
}
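Because every column should have a matching concept, it can help to compare the two before building the package. The sketch below uses a hypothetical helper, check_concepts, that is not part of frame2package; whether the library performs a similar check itself is not stated here.

```python
def check_concepts(columns, concepts):
    """Hypothetical helper, not part of frame2package.

    Returns (missing, unknown): columns that have no concept entry,
    and concept entries that match no column.
    """
    declared = [c['concept'] for c in concepts]
    missing = [col for col in columns if col not in declared]
    unknown = [name for name in declared if name not in columns]
    return missing, unknown


# Example column names, e.g. taken from list(df.columns)
columns = ['country', 'year', 'gdp']
concepts = [
    {'concept': 'country', 'concept_type': 'entity_domain'},
    {'concept': 'gdp', 'concept_type': 'measure'},
]

missing, unknown = check_concepts(columns, concepts)
print(missing)  # ['year']
print(unknown)  # []
```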
Disaggregation levels
Some of your variables may have “total” entries. For instance, a sex column may have the unique values male, female, and both. In such cases, frame2package creates separate datasets with and without the sex dimension. To enable this feature, specify which variables have “total” entries and what those entries are called in the data.
f2p = Frame2Package()
f2p.add_data(df, concepts, totals={'sex': 'both'})
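To make "with and without the sex dimension" concrete, the split can be pictured in plain pandas. This is only an illustration of the idea using made-up data, not a description of how frame2package implements it internally.

```python
import pandas as pd

# Made-up data with a "total" entry ('both') in the sex column.
df = pd.DataFrame({
    'country': ['swe', 'swe', 'swe'],
    'sex': ['male', 'female', 'both'],
    'population': [5, 5, 10],
})
totals = {'sex': 'both'}

is_total = df['sex'] == totals['sex']

# Dataset without the sex dimension: keep only the "total" rows
# and drop the now-redundant sex column.
by_country = df[is_total].drop(columns='sex').reset_index(drop=True)

# Dataset with the sex dimension: keep only the disaggregated rows.
by_country_sex = df[~is_total].reset_index(drop=True)

print(by_country)
print(by_country_sex)
```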
Entities
For each concept of type entity_domain, frame2package will generate an entity file ddf--entities--<entity-name>.csv listing all the unique values for that entity.
To attach additional information to these values, such as a description column, build a dataframe that contains all the unique values for the entity plus the extra columns, then pass it to .update_entity.
# Assuming entity with name "sex" and values "M", "F", "B"
# First copy the entity data
sex = f2p.entities['sex'].copy()
# Add a description column (the list is assigned by position,
# so it must follow the row order of the entity data)
sex['description'] = ['Male', 'Female', 'Both']
# Update the entity in the package
f2p.update_entity(name='sex', data=sex)
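If the row order of the entity data is not known, the descriptions can be assigned by value instead of by position. The sketch below uses a stand-in dataframe; that the entity dataframe has a column named after the entity ("sex") is an assumption, so check the actual columns of f2p.entities['sex'] first.

```python
import pandas as pd

# Stand-in for f2p.entities['sex'].copy(); the column name "sex"
# is an assumption made for this illustration.
sex = pd.DataFrame({'sex': ['F', 'B', 'M']})

# Map each entity value to its description, independent of row order.
labels = {'M': 'Male', 'F': 'Female', 'B': 'Both'}
sex['description'] = sex['sex'].map(labels)

print(sex)
```

The resulting dataframe can then be passed to update_entity as shown above.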
API Documentation
Documentation of Frame2Package’s public API.