Puffbird - handle your puffy data!

Date: Jan 14, 2022 Version: 0.0.1

Useful links: Binary Installers | Source Repository | Issues & Ideas

puffbird is an open source, MIT-licensed library that extends pandas for handling “puffy” DataFrame objects, i.e. tables whose cells contain non-hashable, iterable objects. This documentation follows the style of the pandas documentation.

User Guide

New to puffbird? Check out the user guide. It contains installation instructions, an introduction to puffbird's main concepts and tutorials.

API reference

The reference guide contains a detailed description of the puffbird API. The reference describes how the methods work and which parameters can be used. It assumes that you have an understanding of the key concepts.

User Guide

Installation

Puffbird can be installed via pip from PyPI:

pip install puffbird

You can also clone the git repository and install the package from source.

Quick Start

The main functionality that puffbird adds to pandas is the ability to easily “explode” “puffy” tables:

In [1]: import pandas as pd

In [2]: import puffbird as pb

In [3]: df = pd.DataFrame({
   ...:     'a': [[1,2,3], [4,5,6,7], [3,4,5]],
   ...:     'b': [{'c':['asdf'], 'd':['ret']}, {'d':['r']}, {'c':['ff']}],
   ...:  })
   ...: 

In [4]: df
Out[4]: 
              a                              b
0     [1, 2, 3]  {'c': ['asdf'], 'd': ['ret']}
1  [4, 5, 6, 7]                   {'d': ['r']}
2     [3, 4, 5]                  {'c': ['ff']}

As you can see, this dataframe is “puffy”: its cells contain non-hashable objects that can be iterated over. To quickly create a long-format DataFrame, we can use the puffy_to_long function:

In [5]: long_df = pb.puffy_to_long(df)

In [6]: long_df
Out[6]: 
    index_level0  a_level0    a b_level0  b_level1     b
0              0         0  1.0        c         0  asdf
1              0         0  1.0        d         0   ret
2              0         1  2.0        c         0  asdf
3              0         1  2.0        d         0   ret
4              0         2  3.0        c         0  asdf
5              0         2  3.0        d         0   ret
6              1         0  4.0        d         0     r
7              1         1  5.0        d         0     r
8              1         2  6.0        d         0     r
9              1         3  7.0        d         0     r
10             2         0  3.0        c         0    ff
11             2         1  4.0        c         0    ff
12             2         2  5.0        c         0    ff
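
For comparison, plain pandas can already explode a single list column one level deep with Series.explode (available since pandas 0.25). The sketch below uses pandas only and is not how puffbird is implemented; it reproduces the index_level0, a_level0 and a columns for column a alone, whereas puffy_to_long also handles nested and mixed objects across several columns at once.

```python
import pandas as pd

df = pd.DataFrame({"a": [[1, 2, 3], [4, 5, 6, 7], [3, 4, 5]]})

# Series.explode repeats each index label once per list element
exploded = df["a"].explode()

long_a = exploded.reset_index().rename(columns={"index": "index_level0"})
# Position of each element inside its original list (cf. a_level0 above)
long_a.insert(1, "a_level0", long_a.groupby("index_level0").cumcount())
# explode leaves an object column; cast to float as in the output above
long_a["a"] = long_a["a"].astype(float)
print(long_a)
```

Combining several columns, as puffy_to_long does for a and b, would additionally require a cross product per original row.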

Tutorials

Philosophy

TODO

Creating long dataframes from puffy tables

This tutorial shows the basic features of the puffbird.puffy_to_long function.

[1]:
import numpy as np
import pandas as pd
import puffbird as pb

A weirdly complex table

First, we will create a puffy dataframe as an example:

[2]:
df = pd.DataFrame({
    # a repeats the same four strings three times
    # b contains numpy arrays of the same shape
    # c contains dictionaries with mixed value types (lists, scalars,
    #   tuples, strings and nested dictionaries)
    # d contains numpy arrays of different shapes
    # e contains pandas DataFrames with the same index format and mostly
    #   the same columns (one of them has an extra column c3)
    # f contains pandas DataFrames with different structures
    # g contains mixed data types
    # missing data (NaN and None) is also included
    'a': [
        'aa', 'bb', 'cc', 'dd',
        'aa', 'bb', 'cc', 'dd',
        'aa', 'bb', 'cc', 'dd'
    ],
    'b': [
        np.random.random((10, 5)),
        np.nan,
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5)),
        np.random.random((10, 5))
    ],
    'c': [
        {'dicta':[1,2,3], 'dictb':3, 'dictc':{'key1':1, 'key2':2}},
        {'dicta':[52,3], 'dictb':[3,4], 'dictc':{'key4':1, 'key2':2}},
        {'dicta':[12,67], 'dictb':(4,5), 'dictc':{'key3':1, 'key2':77}},
        {'dicta':[1,23], 'dictb':3, 'dictc':{'key1':55, 'key2':33}},
        {'dicta':123, 'dictb':'words', 'dictc':{'key1':4, 'key2':2}},
        {'dicta':[1,2,3], 'dictb':3, 'dictc':{'key1':1, 'key2':2}},
        {'dicta':[52,3], 'dictb':[3,4], 'dictc':{'key4':1, 'key2':2}},
        {'dicta':[12,67], 'dictb':(4,5), 'dictc':{'key3':1, 'key2':77}},
        {'dicta':[1,23], 'dictb':3, 'dictc':{'key1':55, 'key2':33}},
        {'dicta':123, 'dictb':'words', 'dictc':{'key1':4, 'key2':2}},
        {'dicta':[1,2,3], 'dictb':3, 'dictc':{'key1':1, 'key2':2}},
        {'dicta':[52,3], 'dictb':[3,4], 'dictc':{'key4':1, 'key2':2}},
    ],
    'd': [
        np.random.random((16, 5)),
        np.random.random((18, 5)),
        np.random.random((19, 5)),
        np.random.random((11, 5)),
        np.random.random((12, 5)),
        np.random.random((14, 5)),
        np.random.random((17, 5)),
        np.random.random((110, 5)),
        None,
        np.random.random((2, 5)),
        np.random.random((4, 5)),
        np.random.random((7, 5))
    ],
    'e': [
        pd.DataFrame(
            {'c1':[1,2,3], 'c2':[1,2,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[1,2,3,4], 'c2':[1,2,3,4]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[3,4,3], 'c2':[3,5,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        np.nan,
        pd.DataFrame(
            {'c1':[1,2,3], 'c2':[1,2,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[1,2,3,4], 'c2':[1,2,3,4]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[3,4,3], 'c2':[3,5,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        np.nan,
        pd.DataFrame(
            {'c1':[1,2,3], 'c2':[1,2,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[1,2,3,4], 'c2':[1,2,3,4], 'c3':[1,2,3,4]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[3,4,3], 'c2':[3,5,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        np.nan,
    ],
    'f': [
        pd.DataFrame(
            {'f1':[1,2,3], 'hh2':[1,2,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c'], ['f', 'f', 'f']],
                names=['f', 'b', 'e']
            )
        ),
        pd.DataFrame(
            {'hh1':[1,2,3,4], 'qq2':[1,2,3,4]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'q1':[3,4,3], 'qq2':[3,5,3], 'c3':[1,2,3], 'c4':[1,2,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c'], ['t', 't', 't']],
                names=['y', 'll', 'tt']
            )
        ),
        np.nan,
        pd.DataFrame(
            {'qq1':[1,2,3], 'rr2':[1,2,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            [[1,2,3,4], [1,2,3,4]],
            columns=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']],
                names=['rr', 'b']
            ),
            index=pd.MultiIndex.from_arrays(
                [[(1,2), (2,3)], ['a', 'b']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'cpp1':[3,4,3], 'c2':[3,5,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'rr']
            )
        ),
        np.nan,
        pd.DataFrame(
            {'sr1':[1,2,3,4], 'c2':[1,2,3,4]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'cpp1':[3,4,3], 'c2':[3,5,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['a', 'b']
            )
        ),
        pd.DataFrame(
            {'c1':[3,4,3], 'c2':[3,5,3]},
            index=pd.MultiIndex.from_arrays(
                [['a', 'b', 'c'], ['a', 'b', 'c']],
                names=['mm', 'b']
            )
        ),
        np.nan,
    ],
    'g': [
        'a', 'b', {'ff':'gg'}, {'a', 'b', 'c'},
        ('r',), pd.Series({'a':'b'}), 'a', 'b',
        1, 2, 3, 4
    ]
})

df
[2]:
a b c d e f g
0 aa [[0.9657556404566287, 0.6105982811179597, 0.71... {'dicta': [1, 2, 3], 'dictb': 3, 'dictc': {'ke... [[0.8332176695108217, 0.789958044060405, 0.294... c1 c2 a b a a 1 1 b b 2 ... f1 hh2 f b e a a f 1 1 b... a
1 bb NaN {'dicta': [52, 3], 'dictb': [3, 4], 'dictc': {... [[0.3132246893884286, 0.1335576065684959, 0.42... c1 c2 a b a a 1 1 b b 2 ... hh1 qq2 a b a a 1 1 b b ... b
2 cc [[0.2978295126291556, 0.2876008891935764, 0.85... {'dicta': [12, 67], 'dictb': (4, 5), 'dictc': ... [[0.06184828264219733, 0.18545094706008847, 0.... c1 c2 a b a a 3 3 b b 4 ... q1 qq2 c3 c4 y ll tt ... {'ff': 'gg'}
3 dd [[0.4963928696663038, 0.7239167468580807, 0.98... {'dicta': [1, 23], 'dictb': 3, 'dictc': {'key1... [[0.8993257213551965, 0.2031570590614975, 0.66... NaN NaN {a, c, b}
4 aa [[0.3031081932962224, 0.6450296792517578, 0.32... {'dicta': 123, 'dictb': 'words', 'dictc': {'ke... [[0.5945125943356216, 0.15692780064527623, 0.0... c1 c2 a b a a 1 1 b b 2 ... qq1 rr2 a b a a 1 1 b b ... (r,)
5 bb [[0.5167332905196971, 0.7313676911394652, 0.58... {'dicta': [1, 2, 3], 'dictb': 3, 'dictc': {'ke... [[0.6826944216406796, 0.06614703871524041, 0.4... c1 c2 a b a a 1 1 b b 2 ... rr a b c d b a b c d a ... a b dtype: object
6 cc [[0.708744333480539, 0.8321196509452965, 0.132... {'dicta': [52, 3], 'dictb': [3, 4], 'dictc': {... [[0.6154355882964945, 0.5022137513822291, 0.64... c1 c2 a b a a 3 3 b b 4 ... cpp1 c2 a rr a a 3 3 b... a
7 dd [[0.38385169069805636, 0.4351602576907261, 0.2... {'dicta': [12, 67], 'dictb': (4, 5), 'dictc': ... [[0.252611789308698, 0.04665515687154631, 0.04... NaN NaN b
8 aa [[0.8448759939899335, 0.3957149482929728, 0.70... {'dicta': [1, 23], 'dictb': 3, 'dictc': {'key1... None c1 c2 a b a a 1 1 b b 2 ... sr1 c2 a b a a 1 1 b b ... 1
9 bb [[0.19421217606406949, 0.7404434981173305, 0.5... {'dicta': 123, 'dictb': 'words', 'dictc': {'ke... [[0.7887921954495609, 0.07707935123200094, 0.5... c1 c2 c3 a b a a 1 1 ... cpp1 c2 a b a a 3 3 b b ... 2
10 cc [[0.5360890315477351, 0.11323478357643058, 0.7... {'dicta': [1, 2, 3], 'dictb': 3, 'dictc': {'ke... [[0.09228121163345293, 0.02214142758664739, 0.... c1 c2 a b a a 3 3 b b 4 ... c1 c2 mm b a a 3 3 b b ... 3
11 dd [[0.22093473502397243, 0.024389277765600403, 0... {'dicta': [52, 3], 'dictb': [3, 4], 'dictc': {... [[0.5195772094908402, 0.2494521739025667, 0.39... NaN NaN 4

This dataframe is quite daunting; it is what I like to call a puffy table.

Exploding the data in puffy tables

With puffy_to_long you can easily unravel this dataframe. Because this dataframe is so irregular, puffy_to_long may take a while:

[3]:
long_df = pb.puffy_to_long(df)
long_df.head()
[3]:
   index_level0   a  b_level0  b_level1         b  c_level0  c_level1  c  d_level0  d_level1         d  e_level0_a  e_level0_b  e_level0_2    e  f_level0_0  f_level0_1  f  g_level0  g
0             0  aa       0.0       0.0  0.965756     dicta         0  1       0.0       0.0  0.833218           a           a          c1  1.0         0.0           b  a       NaN  a
1             0  aa       0.0       0.0  0.965756     dicta         0  1       0.0       0.0  0.833218           a           a          c1  1.0         0.0           e  f       NaN  a
2             0  aa       0.0       0.0  0.965756     dicta         0  1       0.0       0.0  0.833218           a           a          c1  1.0         0.0           f  a       NaN  a
3             0  aa       0.0       0.0  0.965756     dicta         0  1       0.0       0.0  0.833218           a           a          c1  1.0         0.0          f1  1       NaN  a
4             0  aa       0.0       0.0  0.965756     dicta         0  1       0.0       0.0  0.833218           a           a          c1  1.0         0.0         hh2  1       NaN  a

Now we have a dataframe that contains only hashable elements. puffy_to_long iteratively explodes all cells and treats each column individually. For example, if a cell contains a two-dimensional numpy array, the column is exploded twice, and two new columns named [COLUMN_NAME]_level0 and [COLUMN_NAME]_level1 are added. For column b, these are b_level0 and b_level1; they hold the position of each value of b along the first and second axis of its original array. Let’s try puffy_to_long again, but only on column b.
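
To make the level columns concrete, here is a NumPy/pandas sketch (not puffbird's implementation) of what exploding a single two-dimensional cell amounts to: every element becomes one row, and its position along axis 0 and axis 1 goes into b_level0 and b_level1. The small 2 x 3 array is a made-up stand-in for one 10 x 5 cell of column b.

```python
import numpy as np
import pandas as pd

# Made-up 2 x 3 stand-in for one cell of column b
cell = np.array([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6]])

# np.indices returns, for each axis, the index of every element along that axis
level0, level1 = np.indices(cell.shape)

long_b = pd.DataFrame({
    "b_level0": level0.ravel(),  # position along axis 0
    "b_level1": level1.ravel(),  # position along axis 1
    "b": cell.ravel(),           # the value itself (row-major order)
})
print(long_b)
```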

Exploding numpy.array-containing columns
[4]:
long_df = pb.puffy_to_long(df, 'b')
long_df.head()
[4]:
   index_level0  b_level0  b_level1         b
0             0         0         0  0.965756
1             0         0         1  0.610598
2             0         0         2  0.710187
3             0         0         3  0.851220
4             0         0         4  0.264982

index_level0 holds the original index of the dataframe. If the dataframe had a pandas.MultiIndex, there would be one such column per index level instead of just one.

We can now treat the result as an ordinary dataframe and perform the operations we would normally perform, e.g. averaging b for each b_level0 position:

[5]:
long_df.groupby('b_level0')['b'].mean()
[5]:
b_level0
0    0.489353
1    0.538500
2    0.502751
3    0.490095
4    0.505218
5    0.501370
6    0.474048
7    0.573405
8    0.491318
9    0.516571
Name: b, dtype: float64
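
Other standard reshaping operations work as well. In the sketch below, the long frame is built by hand with random data and assumes the column layout of pb.puffy_to_long(df, 'b') shown above; it repeats the groupby aggregation and then uses pivot to recover one of the original 10 x 5 arrays from the long format.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-ins for two 10 x 5 cells of column b (rows 0 and 2 of df)
arrays = {0: rng.random((10, 5)), 2: rng.random((10, 5))}

# Build by hand a frame with the same layout as pb.puffy_to_long(df, 'b')
parts = []
for idx, arr in arrays.items():
    level0, level1 = np.indices(arr.shape)
    parts.append(pd.DataFrame({
        "index_level0": idx,
        "b_level0": level0.ravel(),
        "b_level1": level1.ravel(),
        "b": arr.ravel(),
    }))
long_df = pd.concat(parts, ignore_index=True)

# Aggregate: mean of b for each position along axis 0, over all rows
axis0_means = long_df.groupby("b_level0")["b"].mean()

# Reshape back: recover the original array of row 0
wide0 = long_df[long_df["index_level0"] == 0].pivot(
    index="b_level0", columns="b_level1", values="b"
)
print(wide0.shape)
```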

Let’s say we want to explode both column b and column d, which both contain numpy arrays, and align axis 1 of the arrays in b with axis 1 of the arrays in d, calling this shared axis aligned_axis. We can do this with a puffy_to_long keyword argument whose name becomes the name of the shared axis and whose value maps each column to the axis to align:

[6]:
long_df = pb.puffy_to_long(df, 'b', 'd', aligned_axis={'b':1, 'd':1})
long_df.head()
[6]:
   index_level0  b_level0  aligned_axis         b  d_level0         d
0             0       0.0             0  0.965756       0.0  0.833218
1             0       0.0             0  0.965756       1.0  0.060445
2             0       0.0             0  0.965756       2.0  0.620360
3             0       0.0             0  0.965756       3.0  0.610501
4             0       0.0             0  0.965756       4.0  0.671548

Now axis 1 of the b and d data is aligned, and the shared indices are stored in the aligned_axis column. Because both columns contain missing cells, rows originating from those cells have NaN in b_level0 and b, or in d_level0 and d. You can view these easily using standard pandas functionality:

[7]:
long_df.loc[long_df['b_level0'].isnull()].head()
[7]:
       index_level0  b_level0  aligned_axis    b  d_level0         d
10650             1       NaN             0  NaN       0.0  0.313225
10651             1       NaN             0  NaN       1.0  0.630630
10652             1       NaN             0  NaN       2.0  0.074653
10653             1       NaN             0  NaN       3.0  0.290296
10654             1       NaN             0  NaN       4.0  0.563310
[8]:
long_df.loc[long_df['d_level0'].isnull()].head()
[8]:
      index_level0  b_level0  aligned_axis         b  d_level0    d
9950             8       0.0             0  0.844876       NaN  NaN
9951             8       1.0             0  0.905452       NaN  NaN
9952             8       2.0             0  0.567938       NaN  NaN
9953             8       3.0             0  0.043224       NaN  NaN
9954             8       4.0             0  0.736017       NaN  NaN
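
Conceptually, aligning b and d pairs every value of b with every value of d that shares the same position along the aligned axis within the same original row. The pandas sketch below reproduces that pairing for a single row with an outer merge; it is an analogy under that assumption, not puffbird's implementation, and the helper to_long is hypothetical. With several rows, the merge keys would be ["index_level0", "aligned_axis"], and how="outer" keeps rows that have no partner on the other side, which is where the NaNs above come from.

```python
import numpy as np
import pandas as pd


def to_long(arr, name, aligned_axis):
    """Hypothetical helper: long format of a 2-D array with one axis renamed."""
    levels = [idx.ravel() for idx in np.indices(arr.shape)]
    other_axis = 1 - aligned_axis
    return pd.DataFrame({
        f"{name}_level0": levels[other_axis],  # the axis that is not aligned
        "aligned_axis": levels[aligned_axis],  # the shared axis
        name: arr.ravel(),
    })


b = np.arange(6.0).reshape(2, 3)    # 2 x 3
d = np.arange(12.0).reshape(4, 3)   # 4 x 3: same length along axis 1

long_b = to_long(b, "b", aligned_axis=1)
long_d = to_long(d, "d", aligned_axis=1)

# Every b value meets every d value at the same aligned position
aligned = long_b.merge(long_d, on="aligned_axis", how="outer")
print(aligned.head())
```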

Exploding pandas.DataFrame-containing columns

Let’s look at how puffy_to_long handles pandas.DataFrame objects, using column e:

[9]:
long_df = pb.puffy_to_long(df, 'e')
long_df.head()
[9]:
   index_level0  e_level0_a  e_level0_b  e_level0_2    e
0             0           a           a          c1  1.0
1             0           a           a          c2  1.0
2             0           b           b          c1  2.0
3             0           b           b          c2  2.0
4             0           c           c          c1  3.0

puffy_to_long handles a pandas.DataFrame within a single explosion iteration, unless the cells of that dataframe are themselves non-hashable. This is why all new columns contain level0. The first two new columns, e_level0_a and e_level0_b, correspond to the levels a and b of the pandas.MultiIndex defined in all dataframes within this column. e_level0_2 holds the column names of the dataframes, and e holds the value of each cell of each dataframe.
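
For a single cell, this is comparable to reshaping the dataframe with plain pandas: index levels become columns and the column labels become one more column. The sketch below uses DataFrame.melt on the first dataframe of column e and sorts the rows to match the order shown above; it is an analogy, not puffbird's code, and the e_level0_* names are simply copied from the output above.

```python
import pandas as pd

# The first dataframe in column e
inner = pd.DataFrame(
    {"c1": [1, 2, 3], "c2": [1, 2, 3]},
    index=pd.MultiIndex.from_arrays(
        [["a", "b", "c"], ["a", "b", "c"]], names=["a", "b"]
    ),
)

long_e = (
    inner.reset_index()  # index levels a and b become ordinary columns
    .melt(id_vars=["a", "b"], var_name="e_level0_2", value_name="e")  # one row per cell
    .sort_values(["a", "b", "e_level0_2"], ignore_index=True)  # row-major order
    .rename(columns={"a": "e_level0_a", "b": "e_level0_b"})
)
print(long_e)
```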

Let’s say we don’t want to unravel the dataframes in this way, but instead just concatenate them all together. We can use the expand_cols argument for this, which expects a list of names of columns that contain only pandas.DataFrame or pandas.Series objects:

[10]:
long_df = pb.puffy_to_long(df, 'e', expand_cols=['e'])
long_df.head()
[10]:
   index_level0  a  b  e_c1  e_c2  e_c3
0             0  a  a     1     1   NaN
1             0  b  b     2     2   NaN
2             0  c  c     3     3   NaN
3             1  a  a     1     1   NaN
4             1  b  b     2     2   NaN
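
The concatenation performed with expand_cols resembles pd.concat with one key per original row. The sketch below (plain pandas, an approximation rather than puffbird's implementation) uses three made-up cells, skips the missing one, turns the keys into index_level0 and prefixes the inner column names with the outer column name; columns missing from a dataframe are filled with NaN, as for e_c3 above.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_arrays([["a", "b", "c"], ["a", "b", "c"]], names=["a", "b"])
# Three made-up cells of a column like e: two dataframes and one missing value
cells = [
    pd.DataFrame({"c1": [1, 2, 3], "c2": [1, 2, 3]}, index=mi),
    np.nan,
    pd.DataFrame({"c1": [3, 4, 3], "c2": [3, 5, 3], "c3": [1, 2, 3]}, index=mi),
]

# Keep only real dataframes, keyed by their row position in the outer table
frames = {i: cell for i, cell in enumerate(cells) if isinstance(cell, pd.DataFrame)}

wide = (
    pd.concat(frames, names=["index_level0"])  # outer keys become index_level0
    .add_prefix("e_")                          # prefix columns with the column name
    .reset_index()                             # index levels become ordinary columns
)
print(wide)
```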

Here the columns of each dataframe are preserved, and the dataframes are simply concatenated with the index_level0 information kept. Now let’s also explode column a, whose name coincides with an index level of the dataframes within column e:

[11]:
long_df = pb.puffy_to_long(df, 'a', 'e', expand_cols=['e'])
long_df.head()
[11]:
   index_level0   a  a_e  b  e_c1  e_c2  e_c3
0             0  aa    a  a   1.0   1.0   NaN
1             0  aa    b  b   2.0   2.0   NaN
2             0  aa    c  c   3.0   3.0   NaN
3             1  bb    a  a   1.0   1.0   NaN
4             1  bb    b  b   2.0   2.0   NaN

Since column a already exists, the index level a of each dataframe within column e was renamed to a_e.

Column b is handled the same way:

[12]:
long_df = pb.puffy_to_long(df, 'b', 'e', expand_cols=['e'])
long_df.head()
[12]:
   index_level0  b_level0  b_level1         b  a  b_e  e_c1  e_c2  e_c3
0             0       0.0       0.0  0.965756  a    a   1.0   1.0   NaN
1             0       0.0       0.0  0.965756  b    b   2.0   2.0   NaN
2             0       0.0       0.0  0.965756  c    c   3.0   3.0   NaN
3             0       0.0       1.0  0.610598  a    a   1.0   1.0   NaN
4             0       0.0       1.0  0.610598  b    b   2.0   2.0   NaN

Columns containing less uniformly structured dataframes result in more complex long-format dataframes:

[13]:
long_df = pb.puffy_to_long(df, 'f')
long_df.head()
[13]:
   index_level0  f_level0_0  f_level0_1  f
0             0           0           b  a
1             0           0           e  f
2             0           0           f  a
3             0           0          f1  1
4             0           0         hh2  1
[14]:
long_df = pb.puffy_to_long(df, 'f', expand_cols=['f'])
long_df.head()
[14]:
   index_level0  level_1  f_f  f_b  f_e  f_f1  f_hh2  f_a  f_hh1  f_qq2  ...  f_('a', 'a')  f_('b', 'b')  f_('c', 'c')  f_('d', 'd')  f_rr  f_cpp1  f_c2  f_sr1  f_mm  f_c1
0             0        0    a    a    f   1.0    1.0  NaN    NaN    NaN  ...           NaN           NaN           NaN           NaN   NaN     NaN   NaN    NaN   NaN   NaN
1             0        1    b    b    f   2.0    2.0  NaN    NaN    NaN  ...           NaN           NaN           NaN           NaN   NaN     NaN   NaN    NaN   NaN   NaN
2             0        2    c    c    f   3.0    3.0  NaN    NaN    NaN  ...           NaN           NaN           NaN           NaN   NaN     NaN   NaN    NaN   NaN   NaN
3             1        0  NaN    a  NaN   NaN    NaN    a    1.0    1.0  ...           NaN           NaN           NaN           NaN   NaN     NaN   NaN    NaN   NaN   NaN
4             1        1  NaN    b  NaN   NaN    NaN    b    2.0    2.0  ...           NaN           NaN           NaN           NaN   NaN     NaN   NaN    NaN   NaN   NaN

5 rows × 28 columns

Exploding dictionaries

Column c contains dictionaries holding values of various data types. puffy_to_long iteratively explodes all objects within the dictionaries:

[15]:
long_df = pb.puffy_to_long(df, 'c')
long_df.head()
[15]:
   index_level0  c_level0  c_level1  c
0             0     dicta         0  1
1             0     dicta         1  2
2             0     dicta         2  3
3             0     dictb       NaN  3
4             0     dictc      key1  1

Since some values within the dictionaries can be exploded further while others cannot, a level column contains NaN wherever the explosion stopped early for a value in a given row. For example, dictb is the scalar 3 in row 0, so c_level1 is NaN there.
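
The NaN padding can be illustrated with a small recursive flattener in plain Python and pandas. The generator explode below is hypothetical and not part of puffbird: it expands dictionaries by key and lists or tuples by position, treats everything else (including strings) as a leaf, and pads shorter paths with NaN. For the first cell of column c it yields the same rows as shown above.

```python
import math

import pandas as pd


def explode(obj, path=()):
    """Hypothetical recursive flattener yielding (path, leaf) pairs."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from explode(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        for pos, value in enumerate(obj):
            yield from explode(value, path + (pos,))
    else:
        # numbers, strings, ... are not exploded further
        yield path, obj


cell = {"dicta": [1, 2, 3], "dictb": 3, "dictc": {"key1": 1, "key2": 2}}
pairs = list(explode(cell))
depth = max(len(path) for path, _ in pairs)

# Pad shorter paths with NaN where the explosion stopped early
rows = [path + (math.nan,) * (depth - len(path)) + (leaf,) for path, leaf in pairs]
long_c = pd.DataFrame(rows, columns=[f"c_level{i}" for i in range(depth)] + ["c"])
print(long_c)
```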

API Reference

puffy_to_long(table, *cols, **kwargs)

Transform the “puffy” table into a long-format DataFrame.

FrameEngine(table[, datacols, indexcols, …])

Class to handle and transform a pandas.DataFrame object.

CallableContainer(default_callable)

Container of callables that accept one argument.

Development

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at <https://github.com/gucky92/puffbird/issues>.

If you are reporting a bug, please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

puffbird could always use more documentation, whether as part of the official docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/gucky92/puffbird/issues.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up puffbird for local development.

  1. Fork the puffbird repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/puffbird.git
    
  3. Install your local copy into a virtualenv or a conda environment. Assuming you have virtualenvwrapper or conda installed and the environment is called puffbird, this is how you set up your fork for local development:

    $ mkvirtualenv puffbird      # with virtualenvwrapper
    $ conda activate puffbird    # or, with an existing conda environment
    $ cd puffbird/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 puffbird tests
    $ pytest                     # or: python setup.py test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.

  2. If the pull request adds functionality, the docs should be updated. Provide a tutorial in the form of an annotated .ipynb file and add it to docs/source/user_guide/tutorials.

  3. The pull request should work for Python 3.6, 3.7 and 3.8, and for PyPy. Check https://travis-ci.com/gucky92/puffbird/pull_requests and make sure that the tests pass for all supported Python versions.

Tips

To run a subset of tests:

$ pytest tests/test_frameengine.py

Deploying

TODO

Style Guide

puffbird follows the PEP8 standard and uses Black and Flake8 to ensure a consistent code format throughout the project.

Patterns

Using foo.__class__

puffbird uses ‘type(foo)’ instead of ‘foo.__class__’ because it makes the code more readable. For example:

Good:

foo = "bar"
type(foo)

Bad:

foo = "bar"
foo.__class__

String formatting

Concatenated strings
Using f-strings

puffbird uses f-string formatting instead of the ‘%’ and ‘.format()’ string formatters.

When a string is concatenated over several lines, only the lines containing values that need to be interpolated are prefixed with f.

For example:

Good:

foo = "old_function"
bar = "new_function"

my_warning_message = (
    f"Warning, {foo} is deprecated, "
    "please use the new and way better "
    f"{bar}"
)

Bad:

foo = "old_function"
bar = "new_function"

my_warning_message = (
    f"Warning, {foo} is deprecated, "
    f"please use the new and way better "
    f"{bar}"
)

White spaces

Put whitespace only at the end of a line, so that no line of the concatenated string starts with whitespace.

For example:

Good:

example_string = (
    "Some long concatenated string, "
    "with good placement of the "
    "whitespaces"
)

Bad:

example_string = (
    "Some long concatenated string,"
    " with bad placement of the"
    " whitespaces"
)

Representation function (aka ‘repr()’)

puffbird uses ‘repr()’ instead of ‘%r’ and ‘!r’.

‘repr()’ is only used when the value is not obviously a string.

For example:

Good:

value = str
f"Unknown received value, got: {repr(value)}"

Good:

value = str
f"Unknown received type, got: '{type(value).__name__}'"

Imports (aim for absolute)

In Python 3, absolute imports are recommended. Using absolute imports, doing something like import string will import the string module rather than string.py in the same directory. As much as possible, you should try to write out absolute imports that show the whole import chain from top-level puffbird.

Explicit relative imports are also supported in Python 3, but their use is not recommended. Implicit relative imports were removed in Python 3 and should never be used.

For example:

# preferred
from puffbird.frame import FrameEngine

# not preferred
from .frame import FrameEngine

# wrong
from frame import FrameEngine

Authors

Development Lead

Contributors

None yet. Why not be the first?

Release notes

This is the list of changes to puffbird between each release. For full details, see the commit logs.

Version 0.0.0

Version 0.0.0 (May 15, 2020)

Preliminary version of puffbird. It contains the main features, which include the puffy_to_long function and the FrameEngine object.

Contributors
  • Matthias Christenson (gucky92)