v0.8.0
0.8.0 — 2019-01-04
Added
- Method `frame.to_tuples()` converts a Frame into a list of tuples, each
  tuple representing a single row (#1439).
- Method `frame.to_dict()` converts the Frame into a dictionary where the
  keys are column names and the values are lists of elements in each column
  (#1439).
- Methods `frame.head(n)` and `frame.tail(n)` added, returning the first/last
  `n` rows of the Frame respectively (#1307).
- Frame objects can now be pickled using the standard Python `pickle`
  interface (#1444). This also has the added benefit of reducing the potential
  for a deadlock when using the `multiprocessing` module.
- Added function `repeat(frame, n)` that creates a new Frame by row-binding
  `n` copies of the `frame` (#1459).
- Module `datatable` now exposes a C API, to allow other C/C++ libraries to
  interact with datatable Frames natively (#1469). See
  "datatable/include/datatable.h" for the description of the API functions.
  Thanks Qiang Kou for testing this functionality.
- The column selector `j` in `DT[i, j]` can now be a list/iterator of booleans.
  This list should have length `DT.ncols`, and the entries in this list will
  indicate whether to select the corresponding column of the Frame or not
  (#1503). This can be used to implement a simple column filter, for example:
  `del DT[:, (name.endswith("_tmp") for name in DT.names)]`
- Added ability to train and fit an FTRL-Proximal (Follow The Regularized
  Leader) online learning algorithm on a data frame (#1389). The implementation
  is multi-threaded and has high performance.
- Added functions `log` and `log10` for computing the natural and base-10
  logarithms of a column (#1558).
- Sorting functionality is now integrated into the `DT[i, j, ...]` call via
  the function `sort()`. If sorting is specified alongside a groupby, the
  values will be sorted within each group (#1531).
- A slice-valued `i` expression can now be combined with a `by()` operator
  in `DT[i, j, by()]`. The result is that the slice `i` is applied to each
  group produced by `by()`, before the `j` is evaluated (#1585).
- Implemented sorting in reverse direction, via `sort(-col)`, where `col` is
  any regular column selector such as `f.A` or `f[column]`. The `-` sign is
  symbolic: no actual negation occurs. As such, this works even for string
  columns (#792).
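
The per-group behaviour of `sort()` and of a slice-valued `i` described above can be modelled in plain Python. This is only a sketch of the semantics, not datatable's implementation: the sample rows, the helper name `sort_and_slice_per_group`, the ordering of groups by key, and the choice to sort before slicing are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (group key, value).
rows = [("b", 3), ("a", 5), ("b", 1), ("a", 2), ("a", 9)]


def sort_and_slice_per_group(rows, rows_slice, reverse=False):
    """Group rows by key, sort the values inside each group, slice each group."""
    # Sort by key first so that itertools.groupby sees each key as one run.
    keyed = sorted(rows, key=itemgetter(0))
    result = []
    for _, group in groupby(keyed, key=itemgetter(0)):
        # reverse=True flips the ordering without negating any value,
        # which is why the same approach also works for strings.
        ordered = sorted(group, key=itemgetter(1), reverse=reverse)
        # The slice is applied to every group separately.
        result.extend(ordered[rows_slice])
    return result


# Two smallest values of each group.
first_two = sort_and_slice_per_group(rows, slice(0, 2))
# Largest value of each group, using the reversed ordering.
largest = sort_and_slice_per_group(rows, slice(0, 1), reverse=True)
```

In this model `first_two` keeps at most two rows per group and `largest` keeps one row per group, which mirrors the idea that the slice acts on each group rather than on the frame as a whole.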
Fixed
- Fixed rendering of "view" Frames in a Jupyter notebook (#1448). This bug
  caused the frame to display wrong data when viewed in a notebook.
- Fixed crash when an int-column `i` selector is applied to a Frame which
  already had another row filter applied (#1437).
- `Frame.copy()` now retains the frame's key, if any (#1443).
- Installation from source distribution now works as expected (#1451).
- When a `g.`-column is used but there is no join frame, an appropriate
  error message is now emitted (#1481).
- The equality operators `==` / `!=` can now be applied to string columns too
  (#1491).
- Function `dt.split_into_nhot()` now works correctly with view Frames (#1507).
- `DT.replace()` now works correctly when the replacement list is `[+inf]` or
  `[1.7976931348623157e+308]` (#1510).
- FTRL algorithm now works correctly with view frames (#1502).
- Partial column update (i.e. expression of the form `DT[i, j] = R`) now works
  for string columns as well (#1523).
- `DT.replace()` now throws an error if called with 0 or 1 argument (#1525).
- Fixed crash when viewing a frame obtained by resizing a 0-row frame (#1527).
- Function `count()` now returns the correct result within a `DT[i, j]`
  expression with non-trivial `i` (#1316).
- Fixed groupby when it is applied to a Frame with view columns (#1542).
- When replacing an empty set of columns, the replacement frame can now
  also be empty (i.e. have shape `[0 x 0]`) (#1544).
- Fixed join results when join is applied to a view frame (#1540).
- Fixed `Frame.replace()` in view string columns (#1549).
- A 0-row integer column can now be used as `i` in `DT[i, j]` (#1551).
- A string column produced from a partial join now materializes correctly
  (#1556).
- Fixed incorrect result during "true division" of integer columns, when one
  of the values was negative and the other positive (#1562).
- `Frame.to_csv()` no longer crashes on Unix when writing an empty frame
  (#1565).
- The build process on MacOS now ensures that the `libomp.dylib` is properly
  referenced via `@rpath`. This prevents installation problems caused by
  dynamic dependencies being referenced by absolute paths, which are not valid
  outside of the build machine (#1559).
- Fixed crash when the RHS of assignment `DT[i, j] = ...` was a list of
  expressions (#1539).
- Fixed crash when an empty `by()` condition was used in `DT[i, j, by]`
  (#1572).
- Expression `DT[:, :, by(...)]` no longer produces duplicates of columns used
  in the by-clause (#1576).
- Fixed an incorrect result that occurred in certain circumstances when mixing
  computed and plain columns under groupby (#1578).
- Fixed an internal error which was occurring when multiple row filters were
  applied to a Frame in sequence (#1592).
- Fixed rbinding of frames if one of them is a negative step slice (#1594).
- Fixed a crash that occurred with the latest `pandas` 0.24.0 (#1600).
- Fixed invalid result when cbinding several 0-row frames (#1604).
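
For the "true division" fix (#1562), the changelog does not say what the wrong results looked like. As a reference point only, the plain-Python sketch below shows what true division of mixed-sign integers is expected to produce, next to floor division and truncation toward zero, which disagree with each other exactly when the signs differ. The sample values are hypothetical.

```python
# Hypothetical integer column values covering all sign combinations.
numerators = [7, -7, 7, -7]
denominators = [2, 2, -2, -2]

pairs = list(zip(numerators, denominators))

# True division: the exact quotient as a float.
true_div = [a / b for a, b in pairs]

# Floor division: rounds toward negative infinity.
floor_div = [a // b for a, b in pairs]

# Truncation: rounds toward zero, as C integer division does.
trunc_div = [int(a / b) for a, b in pairs]
```

Only the mixed-sign pairs separate the three conventions: for `-7` and `2`, true division gives `-3.5`, floor division gives `-4`, and truncation gives `-3`.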
Changed
- The primary datatable expression `DT[i, j, ...]` is now evaluated entirely
  in C++, improving performance and reliability.
- Setting `frame.nrows` now always pads the Frame with NAs, even if the Frame
  has only 1 row. Previously changing `.nrows` on a 1-row Frame caused its
  value to be repeated. Use `frame.repeat()` in order to expand the Frame
  by copying its values.
- Improved the performance of setting `frame.nrows`. Now if the frame has
  multiple columns, a view will be created.
- When no columns are selected in `DT[i, j]`, the returned frame will now
  have the same number of rows as if at least 1 column was selected. Previously
  an empty `[0 x 0]` frame was returned.
- Assigning a value to a column `DT[:, 'A'] = x` will attempt to preserve the
  column's stype; if that is not possible, the column will be upcast within its
  logical type.
- It is no longer possible to assign a value of an incompatible logical type to
  an existing column. For example, an assignment `DT[:, 'A'] = 3` is now legal
  only if column A is of integer or real type, but will raise an exception if A
  is a boolean or string.
- `Frame.rbind()` method no longer has a return value. The method always updated
  the frame in-place, so it was confusing to both update in-place and return the
  original frame (#1610).
- `min()` / `max()` over an empty or all-NA column now return `None` instead of
  +Inf / -Inf respectively (#1624).
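
The new `min()` / `max()` behaviour can be modelled in plain Python as follows. This is a sketch of the described semantics, with `None` standing in for NA; the helper names are hypothetical. The explanation for the old +Inf / -Inf results (a reduction that starts from the identity element) is an assumption, not something the changelog states.

```python
import math


def na_min(values):
    """Minimum of the non-NA entries, or None when there are none."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def na_max(values):
    """Maximum of the non-NA entries, or None when there are none."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def folded_min(values):
    """Naive reduction that starts from +inf, the identity element of min."""
    result = math.inf
    for v in values:
        if v is not None and v < result:
            result = v
    return result
```

With no usable values, `folded_min` hands back its starting value `+inf`, while `na_min` reports the absence of a result explicitly, so callers can tell an empty column apart from one that genuinely contains infinity.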
Deprecated
- Frame methods `.topython()`, `.topandas()` and `.tonumpy()` are now
  deprecated and will be removed in 0.9.0. Please use `.to_list()`,
  `.to_pandas()` and `.to_numpy()` instead.
- Calling a frame object `DT(rows=i, select=j, groupby=g, join=z, sort=s)` is
  now deprecated. Use the expression `DT[i, j, by(g), join(z), sort(s)]`
  instead, where symbols `by()`, `join()` and `sort()` can all be imported
  from the `datatable` namespace (#1579).
Removed
- Single-item Frame selectors are now prohibited: `DT[col]` is an error. In
  the future this expression will be interpreted as a row selector instead.
Notes
- `datatable` is now integrated with Codacy to keep track of code quality and
  potential errors.
- Internally, we now allow each Column in a Frame to have its own separate
  RowIndex. This will improve the performance, especially in join/cbind
  operations. Applications that use `datatable`'s C API may need to be
  updated to account for this (#1188).
- This release was prepared by:
  - Pasha Stetsenko - core functionality improvements, bug fixes,
    refactoring;
  - Oleksiy Kononenko - FTRL algo implementation, fixes in the Aggregator;
  - Michael Frasco - documentation fixes;
  - Michal Raška - build system maintenance.
- Additional thanks to people who helped make `datatable` more stable by
  discovering and reporting bugs that were fixed in this release:
  Pasha Stetsenko (#1316, #1443, #1481, #1539, #1542, #1551, #1572, #1576,
  #1578, #1592, #1594, #1602, #1604),
  Arno Candel (#1437, #1491, #1510, #1525, #1549, #1556, #1562),
  Michael Frasco (#1448),
  Jonathan McKinney (#1451, #1565),
  CarlosThinkBig (#1475),
  Olivier (#1502),
  Oleksiy Kononenko (#1507, #1600),
  Nishant Kalonia (#1527, #1540),
  Megan Kurka (#1544),
  Joseph Granados (#1559).
Download links
- Linux X86_64
  - datatable-0.8.0-cp37-cp37m-linux_x86_64.whl (for Python 3.7)
  - datatable-0.8.0-cp36-cp36m-linux_x86_64.whl (for Python 3.6)
  - datatable-0.8.0-cp35-cp35m-linux_x86_64.whl (for Python 3.5)
- PowerPC PPC64
  - datatable-0.8.0-cp37-cp37m-linux_ppc64le.whl (for Python 3.7)
  - datatable-0.8.0-cp36-cp36m-linux_ppc64le.whl (for Python 3.6)
  - datatable-0.8.0-cp35-cp35m-linux_ppc64le.whl (for Python 3.5)
- MacOSX
  - datatable-0.8.0-cp37-cp37m-macosx_10_7_x86_64.whl (for Python 3.7)
  - datatable-0.8.0-cp36-cp36m-macosx_10_7_x86_64.whl (for Python 3.6)
  - datatable-0.8.0-cp35-cp35m-macosx_10_7_x86_64.whl (for Python 3.5)
- Source Distribution