Reframe Documentation

Reframe is a simple relational algebra implementation that lets you experiment with relational operators in Python. Its operators use an object-oriented notation: relation.operator(). In keeping with relational algebra, every operator returns a relation, which lets you chain operators together in a pipeline: relation.operator().operator().operator()
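
The chaining pattern can be illustrated without Reframe at all. The sketch below uses a hypothetical TinyRelation class (not part of Reframe) that stores rows as dictionaries. Its query method takes a Python callable, unlike Reframe's query, which takes a string. Because each method returns a new TinyRelation, calls can be strung together in a pipeline.

```python
class TinyRelation:
    """Hypothetical stand-in for a relation: a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def query(self, predicate):
        # Selection: keep the rows for which predicate(row) is true.
        return TinyRelation(r for r in self.rows if predicate(r))

    def rename(self, old, new):
        # Rename one attribute in every row, leaving the others untouched.
        return TinyRelation(
            {(new if k == old else k): v for k, v in r.items()}
            for r in self.rows
        )


countries = TinyRelation([
    {"name": "Benin", "continent": "Africa"},
    {"name": "Togo", "continent": "Africa"},
    {"name": "Tonga", "continent": "Oceania"},
])

# Every operator returns a TinyRelation, so calls chain left to right.
result = (
    countries
    .query(lambda r: r["continent"] == "Africa")
    .rename("name", "countryname")
)
print(result.rows)
```

The original countries object is not modified; each step builds a new relation from the previous one.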

class reframe.Relation(filepath=None, sep='|')

Create a Relation from a CSV file of data for use with relational operators.

This module is designed for educational purposes, specifically teaching and experimenting with relational algebra. To that end, you can create a relation from a data file, typically a CSV file. You can also specify a separator; a vertical bar, for example, may work better in some cases.

This module is built on top of pandas and in many cases is just a thin shell around it.

Parameters:
  • filepath – a string specifying a path to a csv file, OR a Pandas DataFrame to convert to a Relation
  • sep – specify a separator for the data file. default is |
cartesian_product(other)
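
No description is given for cartesian_product here. In relational algebra, the Cartesian product pairs every row of one relation with every row of the other. The following is a plain-Python sketch of that idea, not Reframe's implementation. The data is made up, and the two relations are assumed to share no column names; how Reframe handles clashing names is not covered here.

```python
from itertools import product

colors = [{"color": "red"}, {"color": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]

# Merge each (left, right) pair of rows into one wider row.
pairs = [{**left, **right} for left, right in product(colors, sizes)]

for row in pairs:
    print(row)

# 2 rows combined with 3 rows gives 2 * 3 = 6 rows.
print(len(pairs))
```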
extend(newcol, series)

Create a new attribute by combining or modifying one or more existing attributes

Parameters:
  • newcol – Name of the new column to create
  • series – An expression involving one or more other attributes
Returns:

Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.extend('gnpdiff',country.gnp - country.gnpold).project(['name','gnpdiff']).head(10)
                   name  gnpdiff
0           Afghanistan      NaN
1           Netherlands    10884
2  Netherlands Antilles      NaN
3               Albania      705
4               Algeria     3016
5        American Samoa      NaN
6               Andorra      NaN
7                Angola    -1336
8              Anguilla      NaN
9   Antigua and Barbuda       28
>>>
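
The NaN entries in the output above are what pandas arithmetic produces when an operand is missing, presumably because gnpold is absent for those countries. A minimal plain-Python sketch of the same row-wise computation follows, with None standing in for a missing value. The rows and the difference helper are illustrative and are not taken from country.csv or from Reframe.

```python
rows = [
    {"name": "A", "gnp": 500, "gnpold": 450},
    {"name": "B", "gnp": 120, "gnpold": None},  # old figure missing
    {"name": "C", "gnp": 300, "gnpold": 340},
]


def difference(new, old):
    # Propagate a missing operand instead of raising a TypeError.
    if new is None or old is None:
        return None
    return new - old


# "Extend" each row with a derived attribute, leaving the input untouched.
extended = [{**r, "gnpdiff": difference(r["gnp"], r["gnpold"])} for r in rows]

for r in extended:
    print(r["name"], r["gnpdiff"])
```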
groupby(cols)

Collapse a relation into one containing a single row per unique value of the given group-by attributes.

The groupby operator is always used in conjunction with an aggregate operator.

  • count
  • sum
  • mean
  • median
  • min
  • max
Parameters:cols – A list of columns to group on
Returns:A GroupWrap object for one of the aggregate operators to work on.
Example:

How many countries are in each continent?

>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).count('name')
       continent  count_name
0         Africa          58
1     Antarctica           5
2           Asia          51
3         Europe          46
5        Oceania          28
6  South America          14
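
Conceptually, grouping followed by an aggregate is a two-step process: partition the rows by the grouping attribute, then reduce each partition to a single row. The sketch below shows this in plain Python on three made-up rows. It is not how Reframe or pandas implements groupby, and it assumes no name is missing.

```python
from collections import defaultdict

rows = [
    {"name": "Benin", "continent": "Africa"},
    {"name": "Togo", "continent": "Africa"},
    {"name": "Tonga", "continent": "Oceania"},
]

# Step 1: partition the rows by the value of the grouping attribute.
groups = defaultdict(list)
for r in rows:
    groups[r["continent"]].append(r)

# Step 2: collapse each group to one row with an aggregate (here, a count).
counted = [
    {"continent": key, "count_name": len(members)}
    for key, members in sorted(groups.items())
]
print(counted)
```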
intersect(other)

Create a new relation that is the intersection of the two given relations

In order to compute the intersection the relations must be union compatible. That is they must have exactly the same columns. This may require some projecting and renaming.

Parameters:other – The relation to compute the intersection with.
Returns:
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Africa"').project(['name', 'region']).intersect(country.query('region == "Western Africa"').project(['name', 'region']))
             name          region
0           Benin  Western Africa
1    Burkina Faso  Western Africa
2          Gambia  Western Africa
3           Ghana  Western Africa
4          Guinea  Western Africa
5   Guinea-Bissau  Western Africa
6      Cape Verde  Western Africa
7         Liberia  Western Africa
8            Mali  Western Africa
9      Mauritania  Western Africa
10          Niger  Western Africa
11        Nigeria  Western Africa
12  Côte d'Ivoire  Western Africa
13   Saint Helena  Western Africa
14        Senegal  Western Africa
15   Sierra Leone  Western Africa
16           Togo  Western Africa
>>>
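
Union compatibility can be checked before the set operation is attempted. The sketch below models a relation as a tuple of column names plus a set of row tuples. The intersect function and the club data are hypothetical, and the check assumes that "same columns" means the same names in the same order.

```python
def intersect(cols_a, rows_a, cols_b, rows_b):
    # Union compatibility: both relations must have exactly the same columns.
    if tuple(cols_a) != tuple(cols_b):
        raise ValueError("relations are not union compatible")
    return set(rows_a) & set(rows_b)


COLS = ("name", "grade")
chess = {("Ana", 10), ("Bo", 11)}
robotics = {("Bo", 11), ("Cy", 12)}

common = intersect(COLS, chess, COLS, robotics)
print(common)

# Relations with different headings are rejected.
try:
    intersect(COLS, chess, ("name",), {("Ana",)})
except ValueError as err:
    message = str(err)
    print(message)
```

When the headings differ, projecting both relations onto the shared columns (and renaming where necessary) makes them compatible.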
minus(other)

Return a relation containing the rows that are in self but not in other.

Parameters:other
Returns:
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Africa"').minus(country.query('region == "Western Africa"')).project(['name','region','continent'])
                                      name           region continent
4                                  Algeria  Northern Africa    Africa
7                                   Angola   Central Africa    Africa
27                                Botswana  Southern Africa    Africa
34                                 Burundi   Eastern Africa    Africa
39                                Djibouti   Eastern Africa    Africa
43                                   Egypt  Northern Africa    Africa
45                                 Eritrea   Eastern Africa    Africa
47                            South Africa  Southern Africa    Africa
48                                Ethiopia   Eastern Africa    Africa
53                                   Gabon   Central Africa    Africa
87                                Cameroon   Central Africa    Africa
91                                   Kenya   Eastern Africa    Africa
92                Central African Republic   Central Africa    Africa
97                                 Comoros   Eastern Africa    Africa
98                                   Congo   Central Africa    Africa
99   Congo, The Democratic Republic of the   Central Africa    Africa
110                                Lesotho  Southern Africa    Africa
112                                Liberia   Western Africa    Africa
113                 Libyan Arab Jamahiriya  Northern Africa    Africa
117                         Western Sahara  Northern Africa    Africa
119                             Madagascar   Eastern Africa    Africa
121                                 Malawi   Eastern Africa    Africa
126                                Morocco  Northern Africa    Africa
130                              Mauritius   Eastern Africa    Africa
131                                Mayotte   Eastern Africa    Africa
138                             Mozambique   Eastern Africa    Africa
140                                Namibia  Southern Africa    Africa
162                      Equatorial Guinea   Central Africa    Africa
167                                Réunion   Eastern Africa    Africa
169                                 Rwanda   Eastern Africa    Africa
171                           Saint Helena   Western Africa    Africa
178                                 Zambia   Eastern Africa    Africa
181                  Sao Tome and Principe   Central Africa    Africa
184                             Seychelles   Eastern Africa    Africa
189                                Somalia   Eastern Africa    Africa
191                                  Sudan  Northern Africa    Africa
194                              Swaziland  Southern Africa    Africa
199                               Tanzania   Eastern Africa    Africa
206                                   Chad   Central Africa    Africa
208                                Tunisia  Northern Africa    Africa
213                                 Uganda   Eastern Africa    Africa
230                               Zimbabwe   Eastern Africa    Africa
234         British Indian Ocean Territory   Eastern Africa    Africa
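
Unlike union and intersection, set difference depends on the order of its operands. A small plain-Python sketch with made-up row tuples (not Reframe code) shows the asymmetry:

```python
left = {(1, "x"), (2, "y"), (3, "z")}
right = {(2, "y"), (4, "w")}

# Rows in left that do not appear in right.
left_minus_right = left - right

# Swapping the operands gives a different answer.
right_minus_left = right - left

print(sorted(left_minus_right))
print(sorted(right_minus_left))
```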
njoin(other)

Create a new relation that is the natural join of the two given relations.

Rows are matched on the columns the two relations have in common. When both relations have exactly the same columns, as in the example below, the result is the same as their intersection.

Parameters:other – The relation to join with.
Returns:
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Africa"').project(['name', 'region']).njoin(country.query('region == "Western Africa"').project(['name', 'region']))
             name          region
0           Benin  Western Africa
1    Burkina Faso  Western Africa
2          Gambia  Western Africa
3           Ghana  Western Africa
4          Guinea  Western Africa
5   Guinea-Bissau  Western Africa
6      Cape Verde  Western Africa
7         Liberia  Western Africa
8            Mali  Western Africa
9      Mauritania  Western Africa
10          Niger  Western Africa
11        Nigeria  Western Africa
12  Côte d'Ivoire  Western Africa
13   Saint Helena  Western Africa
14        Senegal  Western Africa
15   Sierra Leone  Western Africa
16           Togo  Western Africa
>>>
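
Assuming njoin performs a relational-algebra natural join, as its name suggests, it matches rows on every column the two relations share; when the columns are identical, as in the example above, that reduces to an intersection. The plain-Python sketch below joins two relations that have only one column in common. The natural_join function and the sample rows are illustrative and are not Reframe's implementation.

```python
def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    if not left or not right:
        return []
    shared = [c for c in left[0] if c in right[0]]
    joined = []
    for lrow in left:
        for rrow in right:
            # Keep the pair only if it agrees on all shared columns.
            if all(lrow[c] == rrow[c] for c in shared):
                joined.append({**lrow, **rrow})
    return joined


countries = [
    {"code": "NLD", "name": "Netherlands"},
    {"code": "TON", "name": "Tonga"},
]
languages = [
    {"code": "NLD", "language": "Dutch"},
    {"code": "TON", "language": "Tongan"},
    {"code": "TON", "language": "English"},
]

joined = natural_join(countries, languages)
for row in joined:
    print(row)
```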
project(cols)

Return a new Relation with only the specified columns.

Parameters:cols – a list of columns to project
Returns:a Relation with duplicate rows dropped
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.project(['region','continent','name']).head(10)
                      region      continent                  name
0  Southern and Central Asia           Asia           Afghanistan
1             Western Europe         Europe           Netherlands
2                  Caribbean  North America  Netherlands Antilles
3            Southern Europe         Europe               Albania
4            Northern Africa         Africa               Algeria
5                  Polynesia        Oceania        American Samoa
6            Southern Europe         Europe               Andorra
7             Central Africa         Africa                Angola
8                  Caribbean  North America              Anguilla
9                  Caribbean  North America   Antigua and Barbuda
>>>

Note

Relations have no duplicate rows, so projecting a single column creates a relation with all of the distinct values for that column
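
The effect described in the note can be sketched in plain Python. The project function below is a hypothetical helper, not Reframe's implementation; it keeps the requested columns and then removes duplicate rows while preserving first-seen order.

```python
rows = [
    {"name": "Benin", "continent": "Africa"},
    {"name": "Togo", "continent": "Africa"},
    {"name": "Tonga", "continent": "Oceania"},
]


def project(rows, cols):
    # dict.fromkeys keeps the first occurrence of each key, in order,
    # which drops duplicate rows after the columns are narrowed.
    distinct = dict.fromkeys(tuple(r[c] for c in cols) for r in rows)
    return [dict(zip(cols, values)) for values in distinct]


print(project(rows, ["continent"]))
```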

query(q)

Return a new relation containing the tuples that match the query condition.

Parameters:q – a query string
Returns:a Relation
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('continent == "Antarctica"').project(['code','name'])
    code                                          name
232  ATA                                    Antarctica
233  BVT                                 Bouvet Island
235  SGS  South Georgia and the South Sandwich Islands
236  HMD             Heard Island and McDonald Islands
237  ATF                   French Southern territories
>>>
rename(old, new)

Rename old attribute to new

Parameters:
  • old – string, name of old attribute
  • new – string, name to change old to
Returns:a Relation
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.project(['name']).rename('name','countryname').head()
            countryname
0           Afghanistan
1           Netherlands
2  Netherlands Antilles
3               Albania
4               Algeria
>>>
sort(*args, **kwargs)

Sort the relation on the given columns.

Parameters:
  • cols – A list of columns to sort on
  • ascending – Boolean, ascending=False implies a sort in reverse order
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.sort(['indepyear'], ascending=False).query('indepyear < 1200').project(['name','indepyear'])
               name  indepyear
159        Portugal       1143
29   United Kingdom       1066
180      San Marino        885
164          France        843
170          Sweden        836
200         Denmark        800
81            Japan       -660
48         Ethiopia      -1000
93            China      -1523
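
The cols and ascending parameters can be mimicked with Python's built-in sorted. The sort_rows helper below is a hypothetical sketch, not Reframe's code; the three rows reuse values from the output above.

```python
rows = [
    {"name": "Sweden", "indepyear": 836},
    {"name": "Portugal", "indepyear": 1143},
    {"name": "Japan", "indepyear": -660},
]


def sort_rows(rows, cols, ascending=True):
    # A tuple key lets ties on the first column fall through to the next.
    return sorted(
        rows,
        key=lambda r: tuple(r[c] for c in cols),
        reverse=not ascending,
    )


newest_first = sort_rows(rows, ["indepyear"], ascending=False)
print([r["name"] for r in newest_first])
```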
union(other)

Take two Relations with the same columns and put them together top to bottom

Parameters:other
Returns:
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.query('region == "Western Africa"').union(country.query('region == "Polynesia"')).project(['name','region'])
                  name          region
22               Benin  Western Africa
33        Burkina Faso  Western Africa
54              Gambia  Western Africa
56               Ghana  Western Africa
63              Guinea  Western Africa
64       Guinea-Bissau  Western Africa
89          Cape Verde  Western Africa
112            Liberia  Western Africa
124               Mali  Western Africa
129         Mauritania  Western Africa
144              Niger  Western Africa
145            Nigeria  Western Africa
149      Côte d'Ivoire  Western Africa
171       Saint Helena  Western Africa
183            Senegal  Western Africa
185       Sierra Leone  Western Africa
202               Togo  Western Africa
5       American Samoa       Polynesia
37        Cook Islands       Polynesia
146               Niue       Polynesia
157           Pitcairn       Polynesia
166   French Polynesia       Polynesia
179              Samoa       Polynesia
203            Tokelau       Polynesia
204              Tonga       Polynesia
212             Tuvalu       Polynesia
221  Wallis and Futuna       Polynesia
>>>
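
In set-based relational algebra, a row that appears in both inputs appears only once in the union. The plain-Python sketch below shows that behavior with a few rows taken from the output above. Whether Reframe's union removes such duplicates is not stated in this documentation, so treat this as an illustration of the algebra rather than of the library.

```python
western = {("Benin", "Western Africa"), ("Togo", "Western Africa")}
polynesia = {("Tonga", "Polynesia"), ("Tuvalu", "Polynesia")}

combined = western | polynesia
print(len(combined))

# A row already present in the result is not added a second time.
again = combined | {("Togo", "Western Africa")}
print(len(again))
```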
class reframe.GroupWrap(gbo, cols)

Wrapper for a DataFrameGroupBy object – invisible to end user

count(col)

Count the number of occurrences of a value in the column for a group.

Parameters:col
Returns:A Relation with the groupby column(s) and count for a single column
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).count('name')
       continent  count_name
0         Africa          58
1     Antarctica           5
2           Asia          51
3         Europe          46
5        Oceania          28
6  South America          14
>>>
filteragg(res, col)
max(col)

Find the maximum value of the column within each group.

Parameters:col
Returns:A Relation with the groupby column(s) and the maximum for a single column
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).max('gnp')
       continent  max_gnp
0         Africa   116729
1     Antarctica        0
2           Asia  3787042
3         Europe  2133367
4  North America  8510700
5        Oceania   351182
6  South America   776739
>>>
mean(col)

Compute the mean of the column's values within each group.

Parameters:col
Returns:A Relation with the groupby column(s) and the mean for a single column
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).mean('gnp')
       continent       mean_gnp
0         Africa   10006.465517
1     Antarctica       0.000000
2           Asia  150105.725490
3         Europe  206497.065217
4  North America  261854.789189
5        Oceania   14991.953571
6  South America  107991.000000
>>>
median(col)

Compute the median of the column's values within each group.

Parameters:col
Returns:A Relation with the groupby column(s) and the median for a single column
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).median('gnp')
       continent  median_gnp
0         Africa      2533.5
1     Antarctica         0.0
2           Asia     15706.0
3         Europe     20401.0
4  North America      2223.0
5        Oceania       123.0
6  South America     20300.5
>>>
min(col)

Find the minimum value of the column within each group.

Parameters:col
Returns:A Relation with the groupby column(s) and the minimum for a single column
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).min('lifeexpectancy')
       continent  min_lifeexpectancy
0         Africa                37.2
1     Antarctica                 NaN
2           Asia                45.9
3         Europe                64.5
4  North America                49.2
5        Oceania                59.8
6  South America                62.9
>>>
sum(col)

Compute the sum of the column's values within each group.

Parameters:col
Returns:A Relation with the groupby column(s) and the sum for a single column
Example:
>>> from reframe import Relation
>>> country = Relation('country.csv')
>>> country.groupby(['continent']).sum('surfacearea')
       continent  sum_surfacearea
0         Africa       30250377.0
1     Antarctica       13132101.0
2           Asia       31881008.0
3         Europe       23049133.9
4  North America       24214469.0
5        Oceania        8564294.0
6  South America       17864922.0
>>>
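
The aggregate operators differ only in the function applied to each group, and the result columns in the examples above follow an operator_column naming pattern (count_name, max_gnp, and so on). A plain-Python sketch of that idea follows. The AGGREGATES table, the aggregate function, and the numbers are all made up for illustration and are not Reframe's implementation.

```python
from statistics import mean, median

# Values of one column, already partitioned by group (made-up numbers).
groups = {"X": [10, 20, 60], "Y": [5, 15]}

AGGREGATES = {
    "sum": sum,
    "mean": mean,
    "median": median,
    "min": min,
    "max": max,
}


def aggregate(groups, name, col):
    # Apply the chosen function to each group and label the result column
    # as "<operator>_<column>".
    func = AGGREGATES[name]
    return [
        {"group": g, f"{name}_{col}": func(vals)}
        for g, vals in groups.items()
    ]


for name in AGGREGATES:
    print(aggregate(groups, name, "gnp"))
```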

Implementation Notes

Reframe is built on top of Pandas DataFrames. In many cases the relational operators are very thin layers over regular Pandas operators. In other cases more convoluted wrappers have been created to preserve the relation.operator() -> relation pattern.
