[RFC] Add result accuracy to transcendental unary ops #2592

Open
wants to merge 8 commits into
base: main
74 changes: 74 additions & 0 deletions rfcs/20241015-result-accuracy.md
@@ -0,0 +1,74 @@
# [RFC] Add result accuracy to transcendental unary ops

Status: In Review<br/>
Initial version: 10/15/2024<br/>
Last updated: 10/15/2024<br/>
Discussion thread:

## Overview
This RFC proposes adding a new attribute, `result_accuracy`, to the following transcendental unary ops: `exp`, `expm1`, `log`, `logp1`, `logistic`, and `tanh`.
`result_accuracy` lets the user select the implementation of these ops based on the accuracy they request. Implementation selection is restricted to F32 inputs.

## Background
A processing unit can offer multiple implementations of a transcendental op that differ in accuracy and performance.
By letting users select an implementation per op based on accuracy, we give them more control over the tradeoff between accuracy and performance and help ensure more consistent numerical behavior across devices.

### Caveats on targets
The proposed `result_accuracy` attribute can be supported only on targets that have multiple implementations of the ops.
For example, on XLA-CPU, the implementations of these ops depend on LLVM, and each target may have a different implementation with a different level of accuracy.
Thus, further analysis is needed to support this feature on CPUs. This analysis could be performed while building the compiler, assuming the compiler will be used on only a single type of CPU.

## Proposed Specification

### `result_accuracy`
We propose a new attribute, `result_accuracy`. Users can specify the worst-case numerical error they can tolerate in terms of absolute error, relative error, and ULP (unit in the last place) error. If they do not need a specific numerical bound, they can instead choose the implementation using `mode`.
`result_accuracy` can be any combination of the numerical tolerances `atol`, `rtol`, and `ulps`, or one of the enum values `HIGHEST`, `DEFAULT`, or `TOLERANCE`. `TOLERANCE` is the placeholder value that `mode` takes when numerical tolerances are used. When using numerical tolerances, at least one of `atol`, `rtol`, or `ulps` should be specified.

|Name   |Type                    |Constraints                        |
|-------|------------------------|-----------------------------------|
|`atol` |`APFloat::IEEEdouble()` |`atol >= 0`                        |
|`rtol` |`APFloat::IEEEdouble()` |`rtol >= 0`                        |
|`ulps` |`int64_t`               |`ulps >= 0`                        |
|`mode` |`EnumAttr`              |`HIGHEST`, `DEFAULT`, `TOLERANCE`  |



```
New Attribute:
#stablehlo.result_accuracy<atol, rtol, ulps, mode=ResultAccuracyModeAttr>

New Enum:
ResultAccuracyModeAttr ::= DEFAULT, HIGHEST, TOLERANCE
```

The fields take the following values in each case. When `mode` is not `TOLERANCE`, constraint (C1) requires all tolerances to be zero:

```
#stablehlo.result_accuracy<atol, rtol, ulps, mode=EnumAttr>

Case1: I want DEFAULT
#stablehlo.result_accuracy<atol=0, rtol=0, ulps=0, mode=DEFAULT>

Case2: I want HIGHEST
#stablehlo.result_accuracy<atol=0, rtol=0, ulps=0, mode=HIGHEST>

Case3: I want numerical tolerance X
#stablehlo.result_accuracy<atol=X, rtol=X, ulps=X, mode=TOLERANCE>

(C1) if mode != TOLERANCE: atol = rtol = ulps = 0
```
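The constraints above can be sketched as a small validation routine. This is an illustrative Python model only, not the StableHLO verifier: the class names `ResultAccuracy` and `ResultAccuracyMode` are hypothetical, and the sketch assumes that "specified" means "non-zero" and that an attribute with no fields set corresponds to Case 1 (`DEFAULT`).

```python
from dataclasses import dataclass
from enum import Enum


class ResultAccuracyMode(Enum):
    """Hypothetical mirror of the proposed ResultAccuracyModeAttr enum."""
    DEFAULT = "DEFAULT"
    HIGHEST = "HIGHEST"
    TOLERANCE = "TOLERANCE"


@dataclass(frozen=True)
class ResultAccuracy:
    """Hypothetical model of #stablehlo.result_accuracy<atol, rtol, ulps, mode>."""
    atol: float = 0.0
    rtol: float = 0.0
    ulps: int = 0
    mode: ResultAccuracyMode = ResultAccuracyMode.DEFAULT  # assumed default (Case 1)

    def __post_init__(self):
        # Table constraints: every tolerance is non-negative.
        if self.atol < 0 or self.rtol < 0 or self.ulps < 0:
            raise ValueError("atol, rtol and ulps must be >= 0")
        # Assumption: a tolerance counts as "specified" when it is non-zero.
        has_tolerance = self.atol > 0 or self.rtol > 0 or self.ulps > 0
        # (C1) if mode != TOLERANCE: atol = rtol = ulps = 0
        if self.mode is not ResultAccuracyMode.TOLERANCE and has_tolerance:
            raise ValueError("(C1) tolerances must be 0 unless mode is TOLERANCE")
        # With TOLERANCE, at least one of atol, rtol or ulps should be given.
        if self.mode is ResultAccuracyMode.TOLERANCE and not has_tolerance:
            raise ValueError("TOLERANCE requires at least one of atol, rtol, ulps")


# Case 2 (HIGHEST) and Case 3 (a numerical tolerance) from the listing above.
highest = ResultAccuracy(mode=ResultAccuracyMode.HIGHEST)
one_ulp = ResultAccuracy(ulps=1, mode=ResultAccuracyMode.TOLERANCE)
```

Under these assumptions, combining a non-zero tolerance with `HIGHEST` or `DEFAULT` is rejected, which matches (C1).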

The requested tolerances are compared against the numerical error of each implementation using the following inequality:

`abs(expected(x)-actual(x)) <= max(abs(expected(x))*max(rtol, ulps*epsilon), atol)` for all x, where `epsilon` is the machine epsilon.
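To make the inequality concrete, here is a minimal Python sketch that evaluates it for a single input. The helper names `error_bound` and `within_tolerance` are hypothetical, and the sketch assumes `epsilon` is the F32 machine epsilon defined as the gap between 1.0 and the next representable value (2^-23); the RFC does not say which definition it intends.

```python
# Assumed definition of machine epsilon for F32: the distance between 1.0
# and the next representable float32 value.
F32_EPSILON = 2.0 ** -23


def error_bound(expected, atol=0.0, rtol=0.0, ulps=0, epsilon=F32_EPSILON):
    """Right-hand side of the inequality for one expected value."""
    return max(abs(expected) * max(rtol, ulps * epsilon), atol)


def within_tolerance(expected, actual, atol=0.0, rtol=0.0, ulps=0,
                     epsilon=F32_EPSILON):
    """True if abs(expected - actual) stays within the requested bound."""
    return abs(expected - actual) <= error_bound(expected, atol, rtol, ulps, epsilon)


# With ulps=1 and expected == 1.0 the bound is one epsilon (about 1.19e-7),
# so an error of 1e-7 passes and an error of 2e-7 does not.
close_enough = within_tolerance(1.0, 1.0 + 1e-7, ulps=1)
too_far = within_tolerance(1.0, 1.0 + 2e-7, ulps=1)
```

Note that `atol` acts as a floor on the bound: near `expected(x) == 0` the relative terms vanish, so only a non-zero `atol` leaves any room for error there.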

The inequality is checked against the error of each implementation, and an implementation that satisfies it is selected. If multiple implementations satisfy the inequality, the fastest one is used. If no implementation can meet the requested tolerance, the compiler returns an error.
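The selection rule can be sketched as follows. This is a toy Python model under several assumptions: the candidate list, the relative cost figures, and the function names are all hypothetical, and the error of each candidate is estimated on a handful of sample points, whereas the RFC requires the inequality to hold for all x. A real compiler would presumably rely on precomputed worst-case error bounds instead.

```python
import math

F32_EPSILON = 2.0 ** -23  # assumed definition of machine epsilon for F32


def satisfies(expected, actual, atol, rtol, ulps):
    """The tolerance inequality from this RFC, for a single input."""
    bound = max(abs(expected) * max(rtol, ulps * F32_EPSILON), atol)
    return abs(expected - actual) <= bound


def select_implementation(candidates, reference, samples,
                          atol=0.0, rtol=0.0, ulps=0):
    """Return the name of the cheapest candidate meeting the tolerance.

    candidates: list of (name, function, relative_cost) tuples (hypothetical).
    Raises ValueError if no candidate qualifies, mirroring the compiler error.
    """
    for name, fn, _cost in sorted(candidates, key=lambda c: c[2]):
        if all(satisfies(reference(x), fn(x), atol, rtol, ulps) for x in samples):
            return name
    raise ValueError("no implementation meets the requested tolerance")


def exp_taylor4(x):
    """Toy 'fast' exp: Taylor series truncated after the x**4 term."""
    return sum(x ** k / math.factorial(k) for k in range(5))


# Hypothetical candidates and made-up relative costs, for illustration only.
CANDIDATES = [
    ("fast_taylor4", exp_taylor4, 1.0),
    ("accurate_libm", math.exp, 4.0),
]
SAMPLES = [i / 10 for i in range(-10, 11)]  # sample points in [-1, 1]

loose = select_implementation(CANDIDATES, math.exp, SAMPLES, atol=2e-2)
tight = select_implementation(CANDIDATES, math.exp, SAMPLES, ulps=1)
```

In this toy setup the loose absolute tolerance is met by the cheaper truncated series, while the 1-ULP request falls through to the more expensive candidate.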

### Unary Ops
The ops listed in the Overview will be extended to support this attribute. In the IR, the result accuracy can appear as follows:

```
stablehlo.exp %arg0, result_accuracy = <ulps = 1> : ...
stablehlo.tanh %arg0, result_accuracy = <mode = HIGHEST> : ...
stablehlo.logistic %arg0, result_accuracy = <atol = 1e-4, rtol = 1e-6> : ...
```