puffbird.puffy_to_long¶
-
puffbird.
puffy_to_long
(table, *cols, **kwargs)[source]¶ Transform the “puffy” table into a long-format
DataFrame
.- Parameters
- table
DataFrame
A table with “puffy” columns.
- colsstr
A selection of “data columns” to create the long dataframe with. If not given, the algorithm will use all “data columns”.
- iterablecallable or dict of callables, optional
This function is called on each cell for each “data column” to create a new
Series
object. If the “data columns” containsdict
,list
,int
,float
,array
,recarray
,DataFrame
, orSeries
object types than the default iterable will handle these appropriately. When passing a dictionary of iterables, the keys should correspond to values indatacols
(i.e. the “data columns” of the table). In this case, each column can have a custom iterable used. If a column’s iterable is not specified the default iterable is used.- max_depthint or dict of ints, optional
Maximum depth of expanding each cell, before the algorithm stops for each “data column”. If we set the max_depth to 3, for example, a “data column” consisting of 4-D
array
objects will result in aDataFrame
where the “data column” cells contain 1-Darray
objects. If the arrays were 3-D, it will result in a long dataframe with scalars in each cell. Defaults to 3.- dropnabool, optional
Drop rows in long-format
DataFrame
, where all “data columns” are NaNs.- condcallable or dict of callables, optional
This function should return True or False and accept a
Series
object as an argument. If True, the algorithm will stop “exploding” a “data column”. The default cond argument suffices for all non-hashable types, such aslist
orarray
objects. If you want to “explode” hashable types such astuple
objects, a custom cond callable has to be defined. However, it is recommended that hashable types are first converted into non-hashable types using a custom conversion function and thecol_apply
method.- expand_colslist-like, optional
Specify a list of “data columns” to apply the
expand_col
method instead of “exploding” the column in the table. If all cells within a “data column” contains similarly constructedDataFrame
orSeries
object types, theexpand_col
method can be used instead of “exploding” the “data column”. Default to None.- shared_axesdict, optional
Specify if two or more “data columns” share axes (i.e. “explosion” iterations). The keyword will correspond to what the column will be called in the long dataframe. Each argument is a dictionary where the keys correspond to the names of the “data columns”, which share an axis, and the value correspond to the depth/axis is shared for each “data column”. shared_axis argument is usually defined for “data columns” that contain
array
objects. For example, one “data column” may consists of one-dimensional timestamp arrays and another “data column” may consist of two-dimensional timeseries arrays where the first axis of the latter is shared with the zeroth axis of the former.
- table
- Returns
Notes
If you find yourself writing custom iterable and cond arguments and believe these may be of general use, please open an issue or start a pull request.
Examples
>>> import pandas as pd >>> import puffbird as pb >>> df = pd.DataFrame({ ... 'a': [[1,2,3], [4,5,6,7], [3,4,5]], ... 'b': [{'c':['asdf'], 'd':['ret']}, {'d':['r']}, {'c':['ff']}], ... }) >>> df a b 0 [1, 2, 3] {'c': ['asdf'], 'd': ['ret']} 1 [4, 5, 6, 7] {'d': ['r']} 2 [3, 4, 5] {'c': ['ff']}
Now we can use the
puffy_to_long
function to create a long-formatDataFrame
:>>> pb.puffy_to_long(df) index_level0 a_level0 a b_level0 b_level1 b 0 0 0 1.0 c 0 asdf 1 0 0 1.0 d 0 ret 2 0 1 2.0 c 0 asdf 3 0 1 2.0 d 0 ret 4 0 2 3.0 c 0 asdf 5 0 2 3.0 d 0 ret 6 1 0 4.0 d 0 r 7 1 1 5.0 d 0 r 8 1 2 6.0 d 0 r 9 1 3 7.0 d 0 r 10 2 0 3.0 c 0 ff 11 2 1 4.0 c 0 ff 12 2 2 5.0 c 0 ff