puffbird.FrameEngine.to_long
FrameEngine.to_long(*cols, iterable=CallableContainer(iter), max_depth=3, dropna=True, reindex=False, cond=is_hashable(series), expand_cols=None, **shared_axes)
Transform the “puffy” table into a long-format DataFrame.
- Parameters
- cols : str
A selection of “data columns” to create the long dataframe with. If not given, the algorithm will use all “data columns”.
- iterable : callable or dict of callables, optional
This function is called on each cell of each “data column” to create a new Series object. If the “data columns” contain dict, list, int, float, array, recarray, DataFrame, or Series object types, then the default iterable will handle these appropriately. When passing a dictionary of iterables, the keys should correspond to values in datacols (i.e. the “data columns” of the table). In this case, each column can have its own custom iterable. If a column’s iterable is not specified, the default iterable is used.
- max_depth : int or dict of ints, optional
Maximum depth to which each cell is expanded before the algorithm stops, for each “data column”. If max_depth is set to 3, for example, a “data column” consisting of 4-D array objects will result in a DataFrame where the “data column” cells contain 1-D array objects. If the arrays were 3-D, the result is a long dataframe with scalars in each cell. Defaults to 3.
- dropna : bool, optional
Drop rows in the long-format DataFrame where all “data columns” are NaN.
- cond : callable or dict of callables, optional
This function should accept a Series object as its argument and return True or False. If True, the algorithm will stop “exploding” a “data column”. The default cond argument suffices for all non-hashable types, such as list or array objects. If you want to “explode” hashable types such as tuple objects, a custom cond callable has to be defined. However, it is recommended that hashable types are first converted into non-hashable types using a custom conversion function and the col_apply method.
- expand_cols : list-like, optional
Specify a list of “data columns” to which the expand_col method is applied instead of “exploding” the column in the table. If all cells within a “data column” contain similarly constructed DataFrame or Series object types, the expand_col method can be used instead of “exploding” the “data column”. Defaults to None.
- shared_axes : dict, optional
Specify if two or more “data columns” share axes (i.e. “explosion” iterations). The keyword will correspond to what the column will be called in the long dataframe. Each argument is a dictionary where the keys correspond to the names of the “data columns” that share an axis, and the values correspond to the depth/axis that is shared for each “data column”. The shared_axes argument is usually defined for “data columns” that contain array objects. For example, one “data column” may consist of one-dimensional timestamp arrays and another “data column” may consist of two-dimensional timeseries arrays where the first axis of the latter is shared with the zeroth axis of the former.
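The timestamp/timeseries case described for shared_axes can be pictured with a small pure-Python sketch. This is not puffbird's implementation: the function `long_rows` and the names `time`, `signal`, `channel`, and `t_axis` are hypothetical and chosen only for illustration. It shows that a shared axis pairs values by position instead of combining every timestamp with every timeseries value.

```python
# Hypothetical contents of one table row (names are illustrative only).
timestamps = [0.0, 0.1, 0.2]          # 1-D: axis 0 is time
timeseries = [[1, 2, 3], [4, 5, 6]]   # 2-D: axis 0 is channel, axis 1 is time


def long_rows(timestamps, timeseries):
    """Pair each timeseries value with the timestamp at the same time index."""
    rows = []
    for channel, trace in enumerate(timeseries):   # axis 0 of timeseries
        for t_idx, value in enumerate(trace):      # axis 1, shared with timestamps axis 0
            rows.append({
                "channel": channel,
                "t_axis": t_idx,                   # single index column for the shared axis
                "time": timestamps[t_idx],
                "signal": value,
            })
    return rows


rows = long_rows(timestamps, timeseries)
print(len(rows))  # 6 rows instead of the 3 * 6 = 18 an unshared product would give
```

Following the parameter description, the matching call for such hypothetical columns would presumably take the form `engine.to_long(t_axis={'time': 0, 'signal': 1})`, where the keyword becomes the column name and the values give the shared depth of each “data column”.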
- Returns
DataFrame
The long-format DataFrame.
See also
Notes
If you find yourself writing custom iterable and cond arguments and believe these may be of general use, please open an issue or start a pull request.
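As a rough idea of what such custom arguments might look like, here is a minimal sketch in plain Python. All three functions are hypothetical and are not part of puffbird: `stop_if_hashable` only mimics the behaviour described for the default cond, `stop_if_scalar` is one possible custom cond that keeps “exploding” tuples, and `tuples_to_lists` is a conversion function of the kind the cond description recommends using with col_apply. They accept any iterable of cells, on the assumption that a Series iterates over its values.

```python
from collections.abc import Hashable


def stop_if_hashable(cells):
    """Return True (stop exploding) when every cell is hashable."""
    return all(isinstance(cell, Hashable) for cell in cells)


def stop_if_scalar(cells):
    """Hypothetical custom cond: stop only for hashable cells that are not tuples."""
    return all(
        isinstance(cell, Hashable) and not isinstance(cell, tuple)
        for cell in cells
    )


def tuples_to_lists(cell):
    """Hypothetical conversion function: recursively turn tuples into lists."""
    if isinstance(cell, tuple):
        return [tuples_to_lists(item) for item in cell]
    return cell


print(stop_if_hashable([(1, 2), (3,)]))  # tuples are hashable, so exploding would stop
print(stop_if_scalar([(1, 2), (3,)]))    # the custom cond lets exploding continue
print(tuples_to_lists((1, (2, 3))))      # nested tuples become nested lists
```

Converting the cells first keeps the default cond usable, which is why the conversion route is the recommended one.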
Examples
>>> import pandas as pd
>>> import puffbird as pb
>>> df = pd.DataFrame({
...     'a': [[1,2,3], [4,5,6,7], [3,4,5]],
...     'b': [{'c':['asdf'], 'd':['ret']}, {'d':['r']}, {'c':['ff']}],
... })
>>> df
              a                              b
0     [1, 2, 3]  {'c': ['asdf'], 'd': ['ret']}
1  [4, 5, 6, 7]                   {'d': ['r']}
2     [3, 4, 5]                  {'c': ['ff']}
>>> engine = pb.FrameEngine(df)
Now we can use the to_long method to create a long-format DataFrame:
>>> engine.to_long()
    index_level0  a_level0    a  b_level0  b_level1     b
 0             0         0  1.0         c         0  asdf
 1             0         0  1.0         d         0   ret
 2             0         1  2.0         c         0  asdf
 3             0         1  2.0         d         0   ret
 4             0         2  3.0         c         0  asdf
 5             0         2  3.0         d         0   ret
 6             1         0  4.0         d         0     r
 7             1         1  5.0         d         0     r
 8             1         2  6.0         d         0     r
 9             1         3  7.0         d         0     r
10             2         0  3.0         c         0    ff
11             2         1  4.0         c         0    ff
12             2         2  5.0         c         0    ff
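To make the row structure of the output above easier to follow, the sketch below rebuilds the same 13 rows in plain Python. It is an illustration under stated assumptions, not puffbird's algorithm: `explode` is a hypothetical helper that only handles dict and list cells, values stay integers rather than floats, and the per-row combination of columns is modelled as a Cartesian product because that is what the output is consistent with.

```python
from itertools import product


def explode(cell, max_depth=3, path=()):
    """Yield (path, value) pairs, descending into dicts and lists up to max_depth levels."""
    if len(path) < max_depth and isinstance(cell, dict):
        for key, item in cell.items():
            yield from explode(item, max_depth, path + (key,))
    elif len(path) < max_depth and isinstance(cell, list):
        for pos, item in enumerate(cell):
            yield from explode(item, max_depth, path + (pos,))
    else:
        # Either a scalar was reached or the depth limit stops the expansion.
        yield path, cell


table = {
    "a": [[1, 2, 3], [4, 5, 6, 7], [3, 4, 5]],
    "b": [{"c": ["asdf"], "d": ["ret"]}, {"d": ["r"]}, {"c": ["ff"]}],
}

rows = []
for index, (a_cell, b_cell) in enumerate(zip(table["a"], table["b"])):
    # Combine every exploded value of 'a' with every exploded value of 'b'.
    for (a_path, a_val), (b_path, b_val) in product(explode(a_cell), explode(b_cell)):
        rows.append((index, *a_path, a_val, *b_path, b_val))

print(len(rows))  # 13
print(rows[0])    # (0, 0, 1, 'c', 0, 'asdf')
```

Each tuple lists index_level0, a_level0, a, b_level0, b_level1, and b in that order. Lowering the limit shows the effect described for max_depth: `list(explode([[1, 2], [3, 4]], max_depth=1))` stops after one level and leaves the inner lists intact as cell values.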