Usage

Xndframes is meant to provide a set of pandas ExtensionDtype/ExtensionArray implementations backed by xnd.

This document describes how to use the methods and classes provided by xndframes.

We will assume that the following packages have been imported.

In [1]: import xndframes as xf

In [2]: import pandas as pd

Pandas Integration

So far, xndframes implements XndframesArray. XndframesArray satisfies the pandas extension array interface, which means that it can safely be stored inside a pandas Series or DataFrame.

In [3]: s = ["Pandas", "NumPy", "xnd", "SciPy", None, "CuPy", None, "Keras", "Numba"]

In [4]: packages = xf.XndframesArray(s)

In [5]: type(packages)
Out[5]: xndframes.base.XndframesArray

In [6]: print(packages.data)
xnd(['Pandas', 'NumPy', 'xnd', 'SciPy', None, 'CuPy', None, 'Keras', 'Numba'], type='9 * ?string')

In [7]: ser = pd.Series(packages)

In [8]: ser
Out[8]: 
0    Pandas
1     NumPy
2       xnd
3     SciPy
4      None
5      CuPy
6      None
7     Keras
8     Numba
dtype: xndframes[9 * ?string]

In [9]: vals = list(range(9))

In [10]: values = xf.XndframesArray(vals)

In [11]: ser2 = pd.Series(values)

In [12]: ser2
Out[12]: 
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
dtype: xndframes[9 * int64]

In [13]: df = pd.DataFrame({"packages": packages, "id": values})

In [14]: df.head()
Out[14]: 
  packages id
0   Pandas  0
1    NumPy  1
2      xnd  2
3    SciPy  3
4     None  4

In [15]: df
Out[15]: 
  packages id
0   Pandas  0
1    NumPy  1
2      xnd  2
3    SciPy  3
4     None  4
5     CuPy  5
6     None  6
7    Keras  7
8    Numba  8

In [16]: df.describe()
Out[16]: 
       packages  id
count         7   9
unique        7   9
top       NumPy   8
freq          1   1

Most pandas methods that make sense for the underlying data should work. The examples below call out a few points of interest.

In [17]: packages.shape
Out[17]: (9,)

In [18]: packages.unique()
Out[18]: <xndframes.base.XndframesArray at 0x7f0ee0262a90>

In [19]: packages.unique().data
Out[19]: xnd(['Pandas', 'NumPy', 'xnd', 'SciPy', None, 'CuPy', 'Keras', 'Numba'], type='8 * ?string')

In [20]: packages.isna()
Out[20]: array([False, False, False, False,  True, False,  True, False, False])

In [21]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 2 columns):
packages    7 non-null xndframes[9 * ?string]
id          9 non-null xndframes[9 * int64]
dtypes: xndframes[9 * ?string](1), xndframes[9 * int64](1)
memory usage: 224.0 bytes
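
As a rough illustration of what satisfying the extension array interface means, the sketch below checks an array against the public pandas base classes. It uses pandas' own nullable string array as a stand-in, an assumption made so the snippet runs without xndframes installed. The same checks would be expected to hold for XndframesArray, but that is not verified here.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype

# Stand-in (assumption): pandas' built-in nullable string array,
# used in place of xf.XndframesArray to keep the example self-contained.
arr = pd.array(["Pandas", None, "xnd"], dtype="string")

# An extension array subclasses ExtensionArray and carries an ExtensionDtype.
is_extension_array = isinstance(arr, ExtensionArray)
is_extension_dtype = isinstance(arr.dtype, ExtensionDtype)

# Because the interface is satisfied, pandas can wrap the array directly.
ser = pd.Series(arr)

# isna() is part of the interface and returns a boolean mask.
missing_mask = [bool(flag) for flag in arr.isna()]
```

Swapping the stand-in for an XndframesArray should leave the rest of the snippet unchanged if the interface is implemented as described above.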

Indexing

If your selection returns a scalar from a string column such as packages, you get back a Python string.

In [22]: ser[0]
Out[22]: 'Pandas'

In [23]: df.loc[2, "packages"]
Out[23]: 'xnd'
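
The contrast between scalar and non-scalar selection can be sketched with a plain pandas Series. This is a stand-in, not an xndframes-backed Series, and it assumes the two behave alike for positional indexing.

```python
import pandas as pd

# Stand-in data (assumption): a plain pandas Series of strings.
ser = pd.Series(["Pandas", "NumPy", "xnd"])

scalar = ser.iloc[0]     # a single position yields a single Python value
subset = ser.iloc[0:2]   # a slice yields another Series

scalar_is_str = isinstance(scalar, str)
subset_is_series = isinstance(subset, pd.Series)
```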

Missing Data

xnd uses None to represent missing values. Xndframes does the same.

In [24]: df.isna()
Out[24]: 
   packages     id
0     False  False
1     False  False
2     False  False
3     False  False
4      True  False
5     False  False
6      True  False
7     False  False
8     False  False

In [25]: df.dropna()
Out[25]: 
  packages id
0   Pandas  0
1    NumPy  1
2      xnd  2
3    SciPy  3
5     CuPy  5
7    Keras  7
8    Numba  8
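
Note that dropna keeps the original index labels, which is why the result above skips 4 and 6. A minimal sketch of counting missing values and renumbering the rows afterwards follows. It uses a plain pandas DataFrame as a stand-in for the xndframes-backed one, on the assumption that the two handle None the same way.

```python
import pandas as pd

# Stand-in data (assumption): plain pandas columns with None as the missing marker.
df = pd.DataFrame({"packages": ["Pandas", None, "xnd"], "id": [0, 1, 2]})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Drop rows with missing values; the surviving rows keep their old labels.
clean = df.dropna()

# Renumber the rows from zero, discarding the old labels.
renumbered = clean.reset_index(drop=True)
```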