v0.8.0

@h2o-ops h2o-ops released this 25 Jul 23:12
436e6ad

0.8.0 — 2019-01-04

Added

  • Method frame.to_tuples() converts a Frame into a list of tuples, each
    tuple representing a single row (#1439).

  • Method frame.to_dict() converts the Frame into a dictionary where the
    keys are column names and values are lists of elements in each column
    (#1439).

  • Methods frame.head(n) and frame.tail(n) added, returning the first/last
    n rows of the Frame respectively (#1307).

  • Frame objects can now be pickled using the standard Python pickle
    interface (#1444). This also has an added benefit of reducing the potential
    for a deadlock when using the multiprocessing module.

  • Added function repeat(frame, n) that creates a new Frame by row-binding
    n copies of the frame (#1459).

  • Module datatable now exposes a C API, allowing other C/C++ libraries to
    interact with datatable Frames natively (#1469). See
    "datatable/include/datatable.h" for the description of the API functions.
    Thanks to Qiang Kou for testing this functionality.

  • The column selector j in DT[i, j] can now be a list/iterator of booleans.
    This list should have length DT.ncols, and the entries in this list will
    indicate whether to select the corresponding column of the Frame or not
    (#1503). This can be used to implement a simple column filter, for example:

    del DT[:, (name.endswith("_tmp") for name in DT.names)]

  • Added ability to train and fit an FTRL-Proximal (Follow The Regularized
    Leader) online learning algorithm on a data frame (#1389). The implementation
    is multi-threaded and has high performance.

  • Added functions log and log10 for computing the natural and base-10
    logarithms of a column (#1558).

  • Sorting functionality is now integrated into the DT[i, j, ...] call via
    the function sort(). If sorting is specified alongside a groupby, the
    values will be sorted within each group (#1531).

  • A slice-valued i expression can now be combined with a by() operator
    in DT[i, j, by()]. The result is that the slice i is applied to each
    group produced by by(), before the j is evaluated (#1585).

  • Implemented sorting in reverse direction, via sort(-col), where col is
    any regular column selector such as f.A or f[column]. The - sign is
    symbolic, no actual negation occurs. As such, this works even for string
    columns (#792).
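The grouping-related additions above (a slice i combined with by(), sorting within groups, and reverse sorting) can be illustrated with a small plain-Python model. This is only a sketch of the described semantics: it does not import or call datatable, the helper names are hypothetical, and the ordering of groups by key is an assumption rather than something these notes state.

```python
# Plain-Python model of the grouping semantics described in the notes.
# None of these helpers are datatable functions; the names are hypothetical.

def split_into_groups(rows, key):
    """Return (group_value, rows_in_group) pairs."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    # Assumption: groups are emitted in ascending order of the key.
    return sorted(groups.items(), key=lambda item: item[0])

def slice_within_groups(rows, key, row_slice):
    """Model of DT[row_slice, :, by(key)]: the slice is applied per group."""
    result = []
    for _, members in split_into_groups(rows, key):
        result.extend(members[row_slice])
    return result

def sort_within_groups(rows, key, column, reverse=False):
    """Model of sort(column) or sort(-column) combined with by(key)."""
    result = []
    for _, members in split_into_groups(rows, key):
        # reverse=True flips the ordering without negating any value,
        # which is why it also works for string columns.
        result.extend(sorted(members, key=lambda r: r[column], reverse=reverse))
    return result

rows = [
    {"G": "b", "V": "pear"},
    {"G": "a", "V": "kiwi"},
    {"G": "b", "V": "apple"},
    {"G": "a", "V": "plum"},
    {"G": "a", "V": "fig"},
]

first_two = slice_within_groups(rows, "G", slice(0, 2))
descending = sort_within_groups(rows, "G", "V", reverse=True)
print([r["V"] for r in first_two])
print([r["V"] for r in descending])
```

In this model the slice keeps the first two rows of each group rather than the first two rows of the whole table, and the reverse sort orders strings from last to first inside each group.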

Fixed

  • Fixed rendering of "view" Frames in a Jupyter notebook (#1448). This bug
    caused the frame to display wrong data when viewed in a notebook.

  • Fixed crash when an int-column i selector is applied to a Frame which
    already had another row filter applied (#1437).

  • Frame.copy() now retains the frame's key, if any (#1443).

  • Installation from source distribution now works as expected (#1451).

  • When a g.-column is used but there is no join frame, an appropriate
    error message is now emitted (#1481).

  • The equality operators == / != can now be applied to string columns too
    (#1491).

  • Function dt.split_into_nhot() now works correctly with view Frames (#1507).

  • DT.replace() now works correctly when the replacement list is [+inf] or
    [1.7976931348623157e+308] (#1510).

  • FTRL algorithm now works correctly with view frames (#1502).

  • Partial column update (i.e. expression of the form DT[i, j] = R) now works
    for string columns as well (#1523).

  • DT.replace() now throws an error if called with 0 or 1 argument (#1525).

  • Fixed crash when viewing a frame obtained by resizing a 0-row frame (#1527).

  • Function count() now returns the correct result within the DT[i, j]
    expression with non-trivial i (#1316).

  • Fixed groupby when it is applied to a Frame with view columns (#1542).

  • When replacing an empty set of columns, the replacement frame can now
    also be empty (i.e. have shape [0 x 0]) (#1544).

  • Fixed join results when join is applied to a view frame (#1540).

  • Fixed Frame.replace() in view string columns (#1549).

  • A 0-row integer column can now be used as i in DT[i, j] (#1551).

  • A string column produced from a partial join now materializes correctly
    (#1556).

  • Fixed incorrect result during "true division" of integer columns, when one
    of the values was negative and the other positive (#1562).

  • Frame.to_csv() no longer crashes on Unix when writing an empty frame
    (#1565).

  • The build process on MacOS now ensures that the libomp.dylib is properly
    referenced via @rpath. This prevents installation problems caused by the
    dynamic dependencies referenced by their absolute paths which are not valid
    outside of the build machine (#1559).

  • Fixed crash when the RHS of assignment DT[i, j] = ... was a list of
    expressions (#1539).

  • Fixed crash when an empty by() condition was used in DT[i, j, by] (#1572).

  • Expression DT[:, :, by(...)] no longer produces duplicates of columns used
    in the by-clause (#1576).

  • Fixed incorrect results that occurred in certain circumstances when mixing
    computed and plain columns under groupby (#1578).

  • Fixed an internal error which was occurring when multiple row filters were
    applied to a Frame in sequence (#1592).

  • Fixed rbinding of frames if one of them is a negative step slice (#1594).

  • Fixed a crash that occurred with the latest pandas 0.24.0 (#1600).

  • Fixed invalid result when cbinding several 0-row frames (#1604).
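For the "true division" fix (#1562), the notes do not say what the wrong values were. The sketch below only shows what true division of integers is expected to produce for mixed-sign operands, using plain Python lists in place of columns. The helper name and the NA handling (None propagates) are assumptions, not datatable behavior confirmed by these notes.

```python
def true_divide(left, right):
    """Element-wise true division of two integer 'columns' (plain lists)."""
    out = []
    for a, b in zip(left, right):
        if a is None or b is None:
            out.append(None)  # assumption: an NA operand yields NA
        else:
            out.append(a / b)  # true division always yields a float
    return out

numerators = [7, -7, 7, -7]
denominators = [2, 2, -2, -2]

quotients = true_divide(numerators, denominators)
# Floor division, shown for contrast: it rounds toward negative infinity,
# so it differs from truncation exactly when the operands have mixed signs.
floored = [a // b for a, b in zip(numerators, denominators)]

print(quotients)
print(floored)
```

Mixed-sign pairs are the cases where floor division and truncating division disagree, which makes them a natural place for a division routine to go wrong.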

Changed

  • The primary datatable expression DT[i, j, ...] is now evaluated entirely
    in C++, improving performance and reliability.

  • Setting frame.nrows now always pads the Frame with NAs, even if the Frame
    has only 1 row. Previously, changing .nrows on a 1-row Frame caused its
    value to be repeated. Use frame.repeat() in order to expand the Frame
    by copying its values.

  • Improved the performance of setting frame.nrows. Now if the frame has
    multiple columns, a view will be created.

  • When no columns are selected in DT[i, j], the returned frame will now
    have the same number of rows as if at least 1 column were selected.
    Previously, an empty [0 x 0] frame was returned.

  • Assigning a value to a column DT[:, 'A'] = x will attempt to preserve the
    column's stype; if that is not possible, the column will be upcast within
    its logical type.

  • It is no longer possible to assign a value of an incompatible logical type to
    an existing column. For example, an assignment DT[:, 'A'] = 3 is now legal
    only if column A is of integer or real type, but will raise an exception if A
    is a boolean or string column.

  • Frame.rbind() method no longer has a return value. The method always updated
    the frame in-place, so it was confusing to both update in-place and return the
    original frame (#1610).

  • min() / max() over an empty or all-NA column now returns None instead of
    +Inf / -Inf respectively (#1624).
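Two of the behavior changes above, NA-padding when nrows grows and min() / max() returning None for empty or all-NA input, can be modeled in plain Python. This is a sketch of the described semantics only: None stands in for NA, the helper names are hypothetical, and skipping NA values when reducing is an assumption.

```python
def resize_rows(column, nrows):
    """Model of setting frame.nrows: truncate, or pad with NA (None)."""
    if nrows <= len(column):
        return column[:nrows]
    return column + [None] * (nrows - len(column))

def repeat_rows(column, n):
    """Model of expanding a column by copying its values n times."""
    return column * n

def na_min(column):
    """Minimum over non-NA values; None if there are none."""
    values = [v for v in column if v is not None]
    return min(values) if values else None

def na_max(column):
    """Maximum over non-NA values; None if there are none."""
    values = [v for v in column if v is not None]
    return max(values) if values else None

single = [5]
padded = resize_rows(single, 3)    # new behavior: pad with NA
repeated = repeat_rows(single, 3)  # explicit repetition is a separate step
print(padded, repeated)
print(na_min([]), na_max([None, None]), na_min([3, None, 1]))
```

The point of the first change is that growing a 1-row frame no longer silently copies its value; repetition has to be requested explicitly.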

Deprecated

  • Frame methods .topython(), .topandas() and .tonumpy() are now
    deprecated and will be removed in 0.9.0. Please use .to_list(),
    .to_pandas() and .to_numpy() instead.

  • Calling a frame object DT(rows=i, select=j, groupby=g, join=z, sort=s) is
    now deprecated. Use the expression DT[i, j, by(g), join(z), sort(s)]
    instead, where symbols by(), join() and sort() can all be imported
    from the datatable namespace (#1579).
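When migrating existing scripts away from the deprecated method names, a simple textual rewrite may be enough. The helper below is a hypothetical sketch, not part of datatable: it applies the three renames listed above with a regular expression, and it will also match identically named methods on unrelated objects, so the result should be reviewed by hand. It does not attempt to convert the deprecated DT(rows=..., select=...) call form.

```python
import re

# Old method name -> replacement, taken from the deprecation note above.
RENAMED_METHODS = {
    "topython": "to_list",
    "topandas": "to_pandas",
    "tonumpy": "to_numpy",
}

# Matches ".topython(", ".topandas(" or ".tonumpy(" as a method call.
_DEPRECATED_CALL = re.compile(r"\.(" + "|".join(RENAMED_METHODS) + r")\(")

def modernize(source):
    """Rewrite deprecated method calls in a string of Python source."""
    return _DEPRECATED_CALL.sub(
        lambda m: "." + RENAMED_METHODS[m.group(1)] + "(", source
    )

old = "arr = DT.tonumpy(); df = DT.topandas(); rows = DT.topython()"
new = modernize(old)
print(new)
```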

Removed

  • Single-item Frame selectors are now prohibited: DT[col] is an error. In
    the future this expression will be interpreted as a row selector instead.
