Release v0.8.0 · h2oai/datatable

0.8.0 — 2019-01-04

Added

Method frame.to_tuples() converts a Frame into a list of tuples, each
tuple representing a single row (#1439).
Method frame.to_dict() converts the Frame into a dictionary where the
keys are column names and values are lists of elements in each column
(#1439).
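The two methods differ only in orientation: to_tuples() is row-major and to_dict() is column-major. The sketch below models a Frame as a plain dict of equal-length column lists. That model is a hypothetical stand-in, not datatable's internals, and the code only reproduces the two output shapes in pure Python.

```python
# Hypothetical column-store model: column name -> list of values.
# Not datatable's implementation; it only mirrors the output shapes.
columns = {"A": [1, 2, 3], "B": ["x", "y", None]}


def to_tuples(cols):
    """Row-major view: one tuple per row, values in column order."""
    return list(zip(*cols.values()))


def to_dict(cols):
    """Column-major view: column name -> list of that column's values."""
    return {name: list(values) for name, values in cols.items()}


rows = to_tuples(columns)   # [(1, 'x'), (2, 'y'), (3, None)]
by_col = to_dict(columns)   # {'A': [1, 2, 3], 'B': ['x', 'y', None]}
```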
Added methods frame.head(n) and frame.tail(n), which return the first/last
n rows of the Frame, respectively (#1307).
Frame objects can now be pickled using the standard Python pickle
interface (#1444). This also has the added benefit of reducing the risk
of deadlocks when using the multiprocessing module.
Added function repeat(frame, n) that creates a new Frame by row-binding
n copies of the frame (#1459).
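Row-binding n copies of a frame amounts to repeating each column's values n times, in order. Here is a minimal pure-Python sketch of that semantics, using a hypothetical dict-of-lists model of a Frame. The helper name `repeat_columns` is illustrative only and is not part of datatable.

```python
def repeat_columns(cols, n):
    """Hypothetical model of repeat(frame, n) on a dict of column lists.

    Row-binding n copies == concatenating each column with itself n times.
    This sketch rejects negative n.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return {name: values * n for name, values in cols.items()}


frame = {"A": [1, 2], "B": ["u", "v"]}
tripled = repeat_columns(frame, 3)  # {'A': [1, 2, 1, 2, 1, 2], 'B': [...]}
```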
Module datatable now exposes a C API, allowing other C/C++ libraries to interact
with datatable Frames natively (#1469). See "datatable/include/datatable.h"
for the description of the API functions. Thanks to Qiang Kou for testing
this functionality.
The column selector j in DT[i, j] can now be a list/iterator of booleans.
This list should have length DT.ncols, and the entries in this list will
indicate whether to select the corresponding column of the Frame or not
(#1503). This can be used to implement a simple column filter, for example:
```
del DT[:, (name.endswith("_tmp") for name in DT.names)]
```
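The boolean selector acts as a per-column mask. The following sketch models the selection rule in plain Python. `select_by_mask` is a hypothetical helper, and raising ValueError on a length mismatch is an assumption about how such a mismatch might be reported, not datatable's actual error.

```python
def select_by_mask(names, mask):
    """Return the column names whose mask entry is truthy.

    Hypothetical helper: mirrors the rule that a boolean j-selector must have
    exactly ncols entries, one per column, in column order.
    """
    mask = list(mask)  # accept any iterable, e.g. a generator expression
    if len(mask) != len(names):
        raise ValueError("boolean column selector must have length ncols")
    return [name for name, keep in zip(names, mask) if keep]


names = ["id", "price", "price_tmp", "qty_tmp"]
to_drop = select_by_mask(names, (n.endswith("_tmp") for n in names))
kept = [n for n in names if n not in to_drop]  # columns left after the del
```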
Added the ability to train an FTRL-Proximal (Follow The Regularized
Leader) online learning model on a data frame (#1389). The implementation
is multi-threaded for high performance.
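For readers unfamiliar with the algorithm, here is a minimal single-threaded sketch of the textbook per-coordinate FTRL-Proximal update for binary logistic regression, as described by McMahan et al. (2013). It is not datatable's implementation. The class `FtrlSketch`, its dense-feature input, and the hyperparameter names alpha, beta, lambda1 and lambda2 are assumptions chosen to follow the paper's notation.

```python
import math


class FtrlSketch:
    """Textbook FTRL-Proximal for binary logistic regression (dense inputs).

    Hypothetical illustration only; not datatable's FTRL implementation.
    """

    def __init__(self, nfeatures, alpha=0.1, beta=1.0, lambda1=0.0, lambda2=0.0):
        self.alpha, self.beta = alpha, beta
        self.lambda1, self.lambda2 = lambda1, lambda2
        self.z = [0.0] * nfeatures  # per-coordinate accumulated adjusted gradients
        self.n = [0.0] * nfeatures  # per-coordinate sums of squared gradients

    def _weights(self):
        """Derive the weights lazily from z and n (closed-form proximal step)."""
        w = []
        for z, n in zip(self.z, self.n):
            if abs(z) <= self.lambda1:
                w.append(0.0)  # L1 regularization zeroes out small coordinates
            else:
                denom = (self.beta + math.sqrt(n)) / self.alpha + self.lambda2
                w.append(-(z - math.copysign(self.lambda1, z)) / denom)
        return w

    def _prob(self, w, x):
        s = sum(wi * xi for wi, xi in zip(w, x))
        s = max(min(s, 35.0), -35.0)  # clamp the margin to keep exp() finite
        return 1.0 / (1.0 + math.exp(-s))

    def predict(self, x):
        """Probability that the label of x is 1."""
        return self._prob(self._weights(), x)

    def update(self, x, y):
        """One online step on a single example x with label y in {0, 1}."""
        w = self._weights()
        p = self._prob(w, x)
        for i, xi in enumerate(x):
            g = (p - y) * xi  # gradient of the log-loss w.r.t. w[i]
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p


# Toy usage: feature 0 is a constant bias, feature 1 separates the classes.
model = FtrlSketch(nfeatures=2, alpha=0.5)
for x, y in [([1.0, 1.0], 1), ([1.0, -1.0], 0)] * 200:
    model.update(x, y)
```

The L1 term is what makes FTRL-Proximal attractive for sparse models: any coordinate whose accumulated |z| stays at or below lambda1 gets a weight of exactly zero.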
Added functions log and log10 for computing the natural and base-10
logarithms of a column (#1558).
Sorting functionality is now integrated into the DT[i, j, ...] call via
the function sort(). If sorting is specified alongside a groupby, the
values will be sorted within each group (#1531).
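Sorting within groups is equivalent to ordering rows by the group key first and by the sort key second. The pure-Python sketch below models the resulting row order on hypothetical (group, value) pairs. It assumes that the groups themselves come out in ascending key order and says nothing about how datatable computes the result.

```python
# Hypothetical rows as (group, value) pairs.
rows = [("b", 3), ("a", 2), ("b", 1), ("a", 5), ("a", 1)]

# Sorting within groups == ordering by the group key first, then the sort key.
# Assumption: groups themselves are emitted in ascending key order.
result = sorted(rows, key=lambda r: (r[0], r[1]))
```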
A slice-valued i expression can now be combined with a by() operator
in DT[i, j, by()]. The result is that the slice i is applied to each
group produced by by(), before the j is evaluated (#1585).
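To make the "slice per group" semantics concrete, here is a pure-Python sketch that groups rows by a key, applies the same slice to every group, and concatenates the pieces. The helper `slice_per_group` is hypothetical, and emitting groups in key order is an assumption of the sketch.

```python
from itertools import groupby


def slice_per_group(rows, key, sl):
    """Apply slice `sl` to each group of rows sharing key(row).

    Hypothetical helper. Rows are sorted by key first (assumed group order);
    sorted() is stable, so rows keep their original order inside a group.
    """
    out = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        out.extend(list(group)[sl])
    return out


rows = [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("a", 3)]
first_per_group = slice_per_group(rows, key=lambda r: r[0], sl=slice(0, 1))
first_two = slice_per_group(rows, key=lambda r: r[0], sl=slice(0, 2))
```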
Implemented sorting in reverse direction, via sort(-col), where col is
any regular column selector such as f.A or f[column]. The - sign is
symbolic, no actual negation occurs. As such, this works even for string
columns (#792).
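Because a string cannot be negated numerically, a descending key has to be treated as a direction flag rather than as arithmetic, which is why the - sign is symbolic. As an illustration of the same idea in plain Python (not datatable code), a mixed ordering such as "name descending, then count ascending" can be built from stable sorts applied from the last key to the first:

```python
rows = [("pear", 2), ("apple", 1), ("pear", 1), ("fig", 3), ("apple", 2)]

# Goal: name descending, then count ascending.
# Python's sort is stable (also with reverse=True), so sort by the secondary
# key first, then by the primary key in reverse; no string negation needed.
ordered = sorted(rows, key=lambda r: r[1])
ordered = sorted(ordered, key=lambda r: r[0], reverse=True)
```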

Fixed

Fixed rendering of "view" Frames in a Jupyter notebook (#1448). This bug
caused the frame to display wrong data when viewed in a notebook.
Fixed crash when an int-column i selector is applied to a Frame which
already had another row filter applied (#1437).
Frame.copy() now retains the frame's key, if any (#1443).
Installation from source distribution now works as expected (#1451).
When a g.-column is used but there is no join frame, an appropriate
error message is now emitted (#1481).
The equality operators == / != can now be applied to string columns too
(#1491).
Function dt.split_into_nhot() now works correctly with view Frames (#1507).
DT.replace() now works correctly when the replacement list is [+inf] or
[1.7976931348623157e+308] (#1510).
FTRL algorithm now works correctly with view frames (#1502).
Partial column update (i.e. expression of the form DT[i, j] = R) now works
for string columns as well (#1523).
DT.replace() now throws an error if called with 0 or 1 argument (#1525).
Fixed crash when viewing a frame obtained by resizing a 0-row frame (#1527).
Function count() now returns the correct result within the DT[i, j]
expression with a non-trivial i (#1316).
Fixed groupby when it is applied to a Frame with view columns (#1542).
When replacing an empty set of columns, the replacement frame can now
also be empty (i.e. have shape [0 x 0]) (#1544).
Fixed join results when join is applied to a view frame (#1540).
Fixed Frame.replace() in view string columns (#1549).
A 0-row integer column can now be used as i in DT[i, j] (#1551).
A string column produced from a partial join now materializes correctly
(#1556).
Fixed an incorrect result during "true division" of integer columns, when one
of the values was negative and the other positive (#1562).
Frame.to_csv() no longer crashes on Unix when writing an empty frame
(#1565).
The build process on MacOS now ensures that libomp.dylib is properly
referenced via @rpath. This prevents installation problems caused by dynamic
dependencies being referenced by absolute paths, which are not valid
outside of the build machine (#1559).
Fixed crash when the RHS of assignment DT[i, j] = ... was a list of
expressions (#1539).
Fixed crash when an empty by() condition was used in DT[i, j, by] (#1572).
Expression DT[:, :, by(...)] no longer produces duplicates of columns used
in the by-clause (#1576).
Fixed incorrect results that, in certain circumstances, occurred when mixing
computed and plain columns under groupby (#1578).
Fixed an internal error which was occurring when multiple row filters were
applied to a Frame in sequence (#1592).
Fixed rbinding of frames if one of them is a negative step slice (#1594).
Fixed a crash that occurred with the latest pandas 0.24.0 (#1600).
Fixed invalid result when cbinding several 0-row frames (#1604).

Changed

The primary datatable expression DT[i, j, ...] is now evaluated entirely
in C++, improving performance and reliability.
Setting frame.nrows now always pads the Frame with NAs, even if the Frame
has only 1 row. Previously, changing .nrows on a 1-row Frame caused its
value to be repeated. Use repeat(frame, n) in order to expand the Frame
by copying its values.
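The difference between padding and repeating can be modelled in a few lines of plain Python. In this sketch a dict of column lists stands in for a Frame and None stands in for NA. The helper `set_nrows` is hypothetical, and truncation when shrinking is an assumption of the sketch rather than something stated above.

```python
def set_nrows(cols, nrows):
    """Hypothetical model of assigning frame.nrows = nrows.

    Growing pads every column with None (standing in for NA), regardless of
    the original row count; shrinking truncates (an assumption of this sketch).
    """
    out = {}
    for name, values in cols.items():
        if nrows <= len(values):
            out[name] = values[:nrows]
        else:
            out[name] = values + [None] * (nrows - len(values))
    return out


one_row = {"A": [7]}
padded = set_nrows(one_row, 3)                           # {'A': [7, None, None]}
repeated = {name: v * 3 for name, v in one_row.items()}  # {'A': [7, 7, 7]}
```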
Improved the performance of setting frame.nrows. Now if the frame has
multiple columns, a view will be created.
When no columns are selected in DT[i, j], the returned frame will now
have the same number of rows as if at least 1 column was selected. Previously
an empty [0 x 0] frame was returned.
Assigning a value to a column, DT[:, 'A'] = x, will attempt to preserve the
column's stype; if that is not possible, the column will be upcast within its
logical type.
It is no longer possible to assign a value of an incompatible logical type to
an existing column. For example, an assignment DT[:, 'A'] = 3 is now legal
only if column A is of integer or real type, but will raise an exception if A
is a boolean or string.
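The two rules above (preserve the stype if possible, otherwise upcast within the logical type, and reject incompatible logical types) can be sketched as a small decision function. Everything here is hypothetical: the function name, the ltype labels, and the integer stype ladder are illustrative, and only integer scalars are handled.

```python
# Hypothetical integer stype ladder, narrowest first, with bit widths.
INT_STYPES = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]


def fits(value, bits):
    """True if `value` fits a signed integer of the given width.

    This sketch ignores any value a real library may reserve for NA.
    """
    return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)


def stype_after_assign(ltype, stype, value):
    """Model of assigning an integer scalar to a whole column: DT[:, 'A'] = value.

    Hypothetical: ltype is 'int', 'real', 'bool' or 'str'; only int values are handled.
    """
    if ltype == "real":
        return stype  # an integer value can be stored in the existing float column
    if ltype != "int":
        raise TypeError(f"cannot assign an integer to a {ltype} column")
    names = [name for name, _ in INT_STYPES]
    # Keep the current stype if the value fits, otherwise upcast within the ints.
    for name, bits in INT_STYPES[names.index(stype):]:
        if fits(value, bits):
            return name
    raise OverflowError("value does not fit in the widest integer stype")


kept = stype_after_assign("int", "int8", 3)        # "int8": stype preserved
widened = stype_after_assign("int", "int8", 1000)  # "int16": upcast within ints
```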
Frame.rbind() method no longer has a return value. The method always updated
the frame in-place, so it was confusing to both update in-place and return the
original frame (#1610).
min() / max() over an empty or all-NA column now returns None instead of
+Inf / -Inf respectively (#1624).
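The old +Inf / -Inf results are the identity elements of min and max. The new behavior corresponds to "no value" instead. A pure-Python sketch of that reduction, with None standing in for NA (the helper names are illustrative, not datatable functions):

```python
def column_min(values):
    """Minimum of the non-NA values (None stands in for NA), or None if there are none."""
    return min((v for v in values if v is not None), default=None)


def column_max(values):
    """Maximum of the non-NA values, or None if there are none."""
    return max((v for v in values if v is not None), default=None)
```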

Deprecated

Frame methods .topython(), .topandas() and .tonumpy() are now
deprecated and will be removed in 0.9.0. Please use .to_list(),
.to_pandas() and .to_numpy() instead.
Calling a frame object DT(rows=i, select=j, groupby=g, join=z, sort=s) is
now deprecated. Use the expression DT[i, j, by(g), join(z), sort(s)]
instead, where symbols by(), join() and sort() can all be imported
from the datatable namespace (#1579).

Removed

Single-item Frame selectors are now prohibited: DT[col] is an error. In
the future this expression will be interpreted as a row selector instead.

Notes

datatable now uses integration with Codacy to keep track of code quality
and potential errors.
Internally, we now allow each Column in a Frame to have its own separate
RowIndex. This will improve performance, especially in join/cbind
operations. Applications that use datatable's C API may need to be
updated to account for this (#1188).
This release was prepared by:
- Pasha Stetsenko - core functionality improvements, bug fixes,
  refactoring;
- Oleksiy Kononenko - FTRL algo implementation, fixes in the Aggregator;
- Michael Frasco - documentation fixes;
- Michal Raška - build system maintenance.
Additional thanks to people who helped make datatable more stable by
discovering and reporting bugs that were fixed in this release:

Pasha Stetsenko (#1316, #1443, #1481, #1539, #1542, #1551, #1572, #1576,
#1578, #1592, #1594, #1602, #1604),
Arno Candel (#1437, #1491, #1510, #1525, #1549, #1556, #1562),
Michael Frasco (#1448),
Jonathan McKinney (#1451, #1565),
CarlosThinkBig (#1475),
Olivier (#1502),
Oleksiy Kononenko (#1507, #1600),
Nishant Kalonia (#1527, #1540),
Megan Kurka (#1544),
Joseph Granados (#1559).

Download links

Linux X86_64
- datatable-0.8.0-cp37-cp37m-linux_x86_64.whl (for Python 3.7)
- datatable-0.8.0-cp36-cp36m-linux_x86_64.whl (for Python 3.6)
- datatable-0.8.0-cp35-cp35m-linux_x86_64.whl (for Python 3.5)
PowerPC PPC64
- datatable-0.8.0-cp37-cp37m-linux_ppc64le.whl (for Python 3.7)
- datatable-0.8.0-cp36-cp36m-linux_ppc64le.whl (for Python 3.6)
- datatable-0.8.0-cp35-cp35m-linux_ppc64le.whl (for Python 3.5)
MacOSX
- datatable-0.8.0-cp37-cp37m-macosx_10_7_x86_64.whl (for Python 3.7)
- datatable-0.8.0-cp36-cp36m-macosx_10_7_x86_64.whl (for Python 3.6)
- datatable-0.8.0-cp35-cp35m-macosx_10_7_x86_64.whl (for Python 3.5)
Source Distribution
- datatable-0.8.0.tar.gz
