Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44575: [C#] Replace LINQ expression with for loop #44576

Merged

Conversation

georgevanburgh
Copy link
Contributor

@georgevanburgh georgevanburgh commented Oct 30, 2024

For code which repeatedly access columns by name, this LINQ expression can form part of the hot path. This PR replaces the LINQ with the equivalent for loop, and should preserve all existing behaviour (return -1 in the event of no match).

I ran a quick benchmark to validate the speedup

[MemoryDiagnoser]
public class ColumnIndexerBenchmark
{
    private readonly RecordBatch _batch;

    public ColumnIndexerBenchmark()
    {
        var builder = new Schema.Builder();
        builder
            .Field(new Field("A", Int32Type.Default, true))
            .Field(new Field("B", Int32Type.Default, true))
            .Field(new Field("C", Int32Type.Default, true))
            .Field(new Field("D", Int32Type.Default, true))
            .Field(new Field("E", Int32Type.Default, true))
            .Field(new Field("F", Int32Type.Default, true))
            .Field(new Field("G", Int32Type.Default, true))
            .Field(new Field("H", Int32Type.Default, true))
            .Field(new Field("I", Int32Type.Default, true))
            .Field(new Field("J", Int32Type.Default, true));
        var schema = builder.Build();
        _batch = new RecordBatch(schema, new IArrowArray[schema.FieldsList.Count], 0);
    }

    [Benchmark]
    public void GetColumnByIndex()
    {
        _batch.Column("H", StringComparer.Ordinal);
    }
}

Some numbers from my machine

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
13th Gen Intel Core i7-13800H, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.306
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
Method Mean Error StdDev Gen0 Allocated
GetColumnByIndexLinq 67.84 ns 1.178 ns 1.102 ns 0.0107 136 B
GetColumnByIndexForLoop 9.428 ns 0.1334 ns 0.1114 ns - -

In theory, we could achieve a greater speedup by maintaining a lookup of column names to ordinals. We already have several lookup structures inside Schema, but none of them provides access to ordinal values. However, the speedup from adding another mapping might not warrant adding yet another lookup structure to Schema.

If merged, will close #44575.

For code which repeatedly access columns by name, this LINQ expression
can currently form part of the hot path. Replace the LINQ with the
equivelent for loop.

Some numbers from my machine

| Method                  | Mean     | Error     | StdDev    | Gen0   | Allocated |
|------------------------ |---------:|----------:|----------:|-------:|----------:|
| GetColumnByIndexLinq    | 67.84 ns | 1.178 ns  | 1.102 ns  | 0.0107 |     136 B |
| GetColumnByIndexForLoop | 9.428 ns | 0.1334 ns | 0.1114 ns |      - |         - |
Copy link

⚠️ GitHub issue #44575 has been automatically assigned in GitHub to PR creator.

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great improvement!

@CurtHagenlocher CurtHagenlocher merged commit 5e60823 into apache:main Oct 30, 2024
10 checks passed
@CurtHagenlocher CurtHagenlocher removed the awaiting review Awaiting review label Oct 30, 2024
@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Oct 30, 2024
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 5e60823.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C#] Consider replacing LINQ expression when accessing columns by name
2 participants