Release v0.7.0 · h2oai/datatable

v0.7.0 — 2018-11-16

Added

Frame can now be created from a list/dict of numpy arrays.
Filters can now be used together with groupby expressions.
fread's verbose output now includes time spent opening the input file.
Added ability to read/write Jay files.
Frames can now be constructed via the keyword-args list of columns
(i.e. Frame(A=..., B=...)).
Implemented logical operators & ("and") and | ("or") for the eager evaluator.
Implemented integer division // and modulo % operators.
A Frame can now have a key column (or columns).
Key column(s) are saved when the frame is saved into a Jay file.
A Frame can now be naturally-joined with a keyed Frame.
Columns can now be updated within join expressions.
The error message when selecting a column that does not exist in the Frame
now refers to similarly-named columns in that Frame, if there are any. At
most 3 possible columns are reported, and they are ordered from most likely
to least likely (#1253).
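As a rough illustration of the "similarly-named columns" behaviour, the sketch below ranks candidate names by string similarity and keeps at most three. It uses Python's standard difflib; the helper name suggest_columns and the 0.6 similarity cutoff are assumptions made for this example, and datatable's own matching algorithm may differ.

```python
import difflib

def suggest_columns(requested, available, limit=3):
    """Return up to `limit` names from `available` that resemble `requested`.

    Results are ordered from most to least similar. This is a hypothetical
    helper for illustration, not datatable's internal code.
    """
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)

columns = ["price", "prices", "priced", "pride", "quantity"]
suggestions = suggest_columns("pricee", columns)
print(suggestions)
```

Four of the five names clear the cutoff here, so the cap of three is what trims the list; the unrelated name "quantity" is never suggested.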
Frame() constructor now accepts a list of tuples, which it treats as rows
when creating the frame.
Frame() can now be constructed from a list of named tuples, which will
be treated as rows and field names will be used as column names.
frame.copy() can now be used to create a copy of the Frame.
Frame() can now be constructed from a list of dictionaries, where each
item in the list represents a single row.
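The row-oriented inputs listed above (lists of tuples, named tuples, and dictionaries) all amount to transposing rows into columns. The pure-Python sketch below shows that transposition under a few assumptions that are not stated in these notes: the helper name rows_to_columns is hypothetical, plain tuples get placeholder names C0, C1, ..., and keys missing from a dictionary row are filled with None.

```python
from collections import namedtuple

def rows_to_columns(rows):
    """Transpose a non-empty list of row records into {column name: values}."""
    first = rows[0]
    if isinstance(first, dict):
        # Collect keys in first-seen order; absent keys become None (assumption).
        names = list(dict.fromkeys(key for row in rows for key in row))
        return {name: [row.get(name) for row in rows] for name in names}
    if hasattr(first, "_fields"):
        # Named tuples carry their own column names.
        names = list(first._fields)
    else:
        # Placeholder names for plain tuples (assumption).
        names = [f"C{i}" for i in range(len(first))]
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

Point = namedtuple("Point", ["x", "y"])
from_named = rows_to_columns([Point(1, 2), Point(3, 4)])
from_tuples = rows_to_columns([(1, "a"), (2, "b")])
from_dicts = rows_to_columns([{"a": 1}, {"a": 2, "b": 3}])
```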
Frame() can now be created from a datetime64 numpy array (#1274).
Groupby calculations are now parallel.
Frame.cbind() now accepts a list of frames as the argument.
Frame can now be sorted by multiple columns.
new function split_into_nhot() to split a string column into fragments
and then convert them into a set of indicator variables ("n-hot encode").
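To make the term "n-hot encode" concrete, here is a small pure-Python sketch of the idea: each string is split into fragments, and every distinct fragment becomes a 0/1 indicator column. The function name nhot_encode, the comma separator, and the treatment of missing values as all-zero rows are assumptions for this example; the notes do not describe split_into_nhot()'s parameters.

```python
def nhot_encode(values, sep=","):
    """Split each string on `sep` and build one 0/1 indicator column per fragment."""
    def fragments(value):
        if value is None:
            return set()  # missing value: no indicators set (assumption)
        return {part.strip() for part in value.split(sep) if part.strip()}

    per_row = [fragments(v) for v in values]
    labels = sorted(set().union(*per_row))
    return {label: [int(label in row) for row in per_row] for label in labels}

encoded = nhot_encode(["cat, dog", "dog", None, "bird,cat"])
for label, indicator in encoded.items():
    print(label, indicator)
```

Unlike one-hot encoding, a single row may set several indicators at once, as the first and last rows do here.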
ability to convert object columns into strings.
implemented Frame.replace() function.
function abs() to find the absolute value of elements in the frame.
improved handling of Excel files by fread:
- sheet name can now be used as a path component in the file name,
  causing only that particular sheet to be parsed;
- further, a cell range can be specified as a path component after the
  sheet name, forcing fread to consider only the provided cell range;
- fread can now handle the situation when a spreadsheet has multiple
  separate tables in the same sheet. They will now be detected automatically
  and returned to the user as separate Frame objects (the name of each
  frame will contain the sheet name and cell range from where the data was
  extracted).
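The notes above do not spell out the exact path syntax, so the following sketch assumes a form such as book.xlsx/SheetName/A1:D20 and shows how such a path could be split into its workbook, sheet, and cell-range components. The function split_excel_path and the regular expressions are hypothetical and are not part of fread.

```python
import re

# A1-style range such as "B2:F40" (assumed notation).
CELL_RANGE = re.compile(r"[A-Za-z]+[0-9]+:[A-Za-z]+[0-9]+")

def split_excel_path(path):
    """Split 'book.xlsx[/sheet[/range]]' into (workbook, sheet, cell_range)."""
    match = re.search(r"\.xlsx?(?=/|$)", path, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no Excel file name found in {path!r}")
    workbook = path[:match.end()]
    # Whatever follows the file name is treated as extra path components.
    parts = [p for p in path[match.end():].split("/") if p]
    sheet = parts[0] if parts else None
    cell_range = parts[1] if len(parts) > 1 else None
    if cell_range is not None and not CELL_RANGE.fullmatch(cell_range):
        raise ValueError(f"not a cell range: {cell_range!r}")
    return workbook, sheet, cell_range

print(split_excel_path("data/sales.xlsx/Q3/B2:F40"))
```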
HTML rendering of Frames inside a Jupyter notebook.
set-theoretic functions: union, intersect, setdiff and symdiff.
support for multi-column keys.
ability to join Frames on multiple columns.
In a Jupyter notebook, columns now have visual indicators of their types.
The logical types are color-coded, and the size of each element is
given by the number of dots (#1428).

Changed

names argument in Frame() constructor can no longer be a string --
use a list or tuple of strings instead.
Frame.resize() removed -- same functionality is available via
assigning to Frame.nrows.
Frame.rename() removed -- .names setter can be used instead.
Frame([]) now creates a 0x0 Frame instead of 0x1.
Parameter inplace in Frame.cbind() removed (was deprecated).
Instead of inplace=False use dt.cbind(...).
Frame.cbind() no longer returns anything (previously it returned self,
which made it unclear whether the method modifies the target or returns
a modified copy).
DT[i, j] now returns a python scalar value if i is integer, and j
is integer/string. This is referred to as "explicit element selection".
In the unlikely scenario when a single element needs to be returned as
a frame, one can always write DT[i:i+1, j] or DT[[i], j].
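The selection rule can be mimicked with a toy class. The sketch below is not datatable code: MiniFrame is a hypothetical stand-in that only implements the rule described here, returning a scalar when i is an integer and j is an integer or string, and a one-cell frame when i is a slice or a list.

```python
class MiniFrame:
    """Toy column store used only to illustrate the selection rule."""

    def __init__(self, **columns):
        self.names = list(columns)
        self.columns = [list(col) for col in columns.values()]

    @property
    def shape(self):
        nrows = len(self.columns[0]) if self.columns else 0
        return (nrows, len(self.columns))

    def __getitem__(self, item):
        i, j = item
        col = self.names.index(j) if isinstance(j, str) else j
        if isinstance(i, int):
            # Explicit element selection: return a plain Python value.
            return self.columns[col][i]
        rows = i if isinstance(i, list) else range(*i.indices(self.shape[0]))
        # Slice or list row selector: return a (smaller) frame.
        return MiniFrame(**{self.names[col]: [self.columns[col][r] for r in rows]})

DT = MiniFrame(A=[10, 20, 30], B=["x", "y", "z"])
i = 1
scalar = DT[i, "B"]               # plain value
as_frame_slice = DT[i:i + 1, "B"]  # 1x1 frame
as_frame_list = DT[[i], "B"]       # 1x1 frame
```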
The performance of explicit element selection improved by a factor of 200.
Building no longer requires an LLVM distribution.
DT[col] syntax has been deprecated and now emits a warning. This
will be converted to an error in version 0.8.0, and will be interpreted
as a row selector in 0.9.0.
default format for Frame.save() is now "jay".

Fixed

bug in dt.cbind() where the first Frame in the list was ignored.
bug with applying a cast expression to a view column.
occasional memory errors caused by a lack of available mmap handles.
memory leak in groupby operations.
names parameter in Frame constructor is now checked for correctness.
bug in fread with QR bump occurring out-of-sample.
import datatable now takes only 0.13s, down from 0.6s.
fread no longer wastes time reading the full input, if max_nrows option is used.
bug where max_nrows parameter was sometimes causing a segfault.
fread performance bug caused by memory-mapped file being accidentally
copied into RAM.
rare crash in fread when resizing the number of rows.
saving view frames to CSV.
crash when sorting string columns containing NA strings.
crash when applying a filter to a 0-rows frame.
if x is a Frame, then y = dt.Frame(x) now creates a shallow copy
instead of a copy-by-reference.
upgraded dependency version for typesentry, the previous version was not
compatible with Python 3.7.
rare crash when converting a string column from pandas DataFrame, when
that column contains many non-ASCII characters.
f-column-selectors should no longer throw errors and produce only unique
ids when stringified (#1241).
crash when saving a frame with many boolean columns into CSV (#1278).
incorrect .stypes/.ltypes property after calling cbind().
calculation of min/max values in internal rowindex upon row resizing.
frame.sort() with no arguments no longer produces an error.
f-expressions now do not crash when reused with a different Frame.
g-columns can be properly selected in a join (#1352).
writing to disk of columns > 2GB in size (#1387).
crash when sorting by multiple columns and the first column was
of string type (#1401).

Download links

Linux X86_64
- datatable-0.7.0-cp37-cp37m-linux_x86_64.whl (for Python 3.7)
- datatable-0.7.0-cp36-cp36m-linux_x86_64.whl (for Python 3.6)
- datatable-0.7.0-cp35-cp35m-linux_x86_64.whl (for Python 3.5)
PowerPC PPC64
- datatable-0.7.0-cp37-cp37m-linux_ppc64le.whl (for Python 3.7)
- datatable-0.7.0-cp36-cp36m-linux_ppc64le.whl (for Python 3.6)
- datatable-0.7.0-cp35-cp35m-linux_ppc64le.whl (for Python 3.5)
MacOSX
- datatable-0.7.0-cp37-cp37m-macosx_10_7_x86_64.whl (for Python 3.7)
- datatable-0.7.0-cp36-cp36m-macosx_10_7_x86_64.whl (for Python 3.6)
- datatable-0.7.0-cp35-cp35m-macosx_10_7_x86_64.whl (for Python 3.5)
Source Distribution
- datatable-0.7.0.tar.gz
