v0.7.0

h2o-ops released this 20 Nov 18:30

v0.7.0 — 2018-11-16

Added

  • Frame can now be created from a list/dict of numpy arrays.
  • Filters can now be used together with groupby expressions.
  • fread's verbose output now includes time spent opening the input file.
  • Added ability to read/write Jay files.
  • Frames can now be constructed via the keyword-args list of columns
    (i.e. Frame(A=..., B=...)).
  • Implemented logical operators "and" (&) and "or" (|) for the eager evaluator.
  • Implemented integer division // and modulo % operators.
  • A Frame can now have a key column (or columns).
  • Key column(s) are saved when the frame is saved into a Jay file.
  • A Frame can now be naturally-joined with a keyed Frame.
  • Columns can now be updated within join expressions.
  • The error message when selecting a column that does not exist in the Frame
    now refers to similarly-named columns in that Frame, if there are any. At
    most 3 possible columns are reported, and they are ordered from most likely
    to least likely (#1253).
  • Frame() constructor now accepts a list of tuples, which it treats as rows
    when creating the frame.
  • Frame() can now be constructed from a list of named tuples, which will
    be treated as rows and field names will be used as column names.
  • frame.copy() can now be used to create a copy of the Frame.
  • Frame() can now be constructed from a list of dictionaries, where each
    item in the list represents a single row.
  • Frame() can now be created from a datetime64 numpy array (#1274).
  • Groupby calculations are now parallel.
  • Frame.cbind() now accepts a list of frames as the argument.
  • Frame can now be sorted by multiple columns.
  • new function split_into_nhot() to split a string column into fragments
    and then convert them into a set of indicator variables ("n-hot encode").
  • ability to convert object columns into strings.
  • implemented Frame.replace() function.
  • function abs() to find the absolute value of elements in the frame.
  • improved handling of Excel files by fread:
    • sheet name can now be used as a path component in the file name,
      causing only that particular sheet to be parsed;
    • further, a cell range can be specified as a path component after the
      sheet name, forcing fread to consider only the provided cell range;
    • fread can now handle the situation when a spreadsheet has multiple
      separate tables in the same sheet. They will now be detected automatically
      and returned to the user as separate Frame objects (the name of each
      frame will contain the sheet name and cell range from where the data was
      extracted).
  • HTML rendering of Frames inside a Jupyter notebook.
  • set-theoretic functions: union, intersect, setdiff and symdiff.
  • support for multi-column keys.
  • ability to join Frames on multiple columns.
  • In Jupyter notebook columns now have visual indicators of their types.
    The logical types are color-coded, and the size of each element is
    given by the number of dots (#1428).

Changed

  • names argument in Frame() constructor can no longer be a string --
    use a list or tuple of strings instead.
  • Frame.resize() removed -- same functionality is available via
    assigning to Frame.nrows.
  • Frame.rename() removed -- .names setter can be used instead.
  • Frame([]) now creates a 0x0 Frame instead of 0x1.
  • Parameter inplace in Frame.cbind() removed (was deprecated).
    Instead of inplace=False use dt.cbind(...).
  • Frame.cbind() no longer returns anything (previously it returned self,
    which made it unclear whether it modified the target or returned
    a modified copy).
  • DT[i, j] now returns a python scalar value if i is integer, and j
    is integer/string. This is referred to as "explicit element selection".
    In the unlikely scenario when a single element needs to be returned as
    a frame, one can always write DT[i:i+1, j] or DT[[i], j].
  • The performance of explicit element selection improved by a factor of 200.
  • Building no longer requires an LLVM distribution.
  • DT[col] syntax has been deprecated and now emits a warning. This
    will be converted to an error in version 0.8.0, and will be interpreted
    as a row selector in 0.9.0.
  • default format for Frame.save() is now "jay".

Fixed

  • bug in dt.cbind() where the first Frame in the list was ignored.
  • bug with applying a cast expression to a view column.
  • occasional memory errors caused by a lack of available mmap handles.
  • memory leak in groupby operations.
  • names parameter in Frame constructor is now checked for correctness.
  • bug in fread with QR bump occurring out-of-sample.
  • import datatable now takes only 0.13s, down from 0.6s.
  • fread no longer wastes time reading the full input, if max_nrows option is used.
  • bug where max_nrows parameter was sometimes causing a segfault.
  • fread performance bug caused by memory-mapped file being accidentally
    copied into RAM.
  • rare crash in fread when resizing the number of rows.
  • saving view frames to csv.
  • crash when sorting string columns containing NA strings.
  • crash when applying a filter to a 0-rows frame.
  • if x is a Frame, then y = dt.Frame(x) now creates a shallow copy
    instead of a copy-by-reference.
  • upgraded dependency version for typesentry, the previous version was not
    compatible with Python 3.7.
  • rare crash when converting a string column from pandas DataFrame, when
    that column contains many non-ASCII characters.
  • f-column-selectors should no longer throw errors when stringified, and
    should produce only unique ids (#1241).
  • crash when saving a frame with many boolean columns into CSV (#1278).
  • incorrect .stypes/.ltypes property after calling cbind().
  • calculation of min/max values in internal rowindex upon row resizing.
  • frame.sort() with no arguments no longer produces an error.
  • f-expressions now do not crash when reused with a different Frame.
  • g-columns can be properly selected in a join (#1352).
  • writing to disk of columns > 2GB in size (#1387).
  • crash when sorting by multiple columns and the first column was
    of string type (#1401).
