
Improve lazy performance of Data.Text.Lazy.inits #572

Merged: 3 commits into master from inits on Mar 21, 2024
Conversation

@Lysxia (Contributor) commented Mar 14, 2024

Closes #562

The previous implementation, itself based on an earlier version of Data.List.inits, inherited the flaw that accessing the i-th element took quadratic time O(i²). This now takes linear time O(i) as expected.

The current version of Data.List.inits uses a banker's queue to obtain good performance when generating very long lists. For lazy text, consisting of a few big chunks, that benefit seems negligible. So I chose a simpler implementation.
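The idea can be shown as a minimal sketch on a simplified model: plain `String` chunks stand in for strict `Data.Text` chunks, and `initsLT` is an illustrative name, not the PR's actual code. Each prefix either ends inside the current chunk or keeps the current chunk whole and recurses, so accessing the i-th prefix forces only the chunks before it.

```haskell
import Data.List (inits)

-- Simplified model: a lazy text is a list of chunks
-- (Strings here stand in for strict Text chunks).
type Chunk = String
type LazyText = [Chunk]

-- For each chunk c, first emit the prefixes ending inside c,
-- then prepend c whole to every prefix of the remaining chunks.
-- Reaching the i-th prefix walks past O(i) structure, not O(i^2).
initsLT :: LazyText -> [LazyText]
initsLT = ([] :) . go
  where
    go :: LazyText -> [LazyText]
    go [] = []
    go (c : cs) =
      [ [p] | p <- drop 1 (inits c) ]  -- prefixes ending inside c
        ++ map (c :) (go cs)           -- longer prefixes keep c whole
```

For example, `map concat (initsLT ["ab", "cd"])` yields `["", "a", "ab", "abc", "abcd"]`, matching `Data.List.inits "abcd"`.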

Benchmarks included.

Quadratic growth before (the "2k" benchmarks take 4x the time of the "1k" benchmarks):

    Lazy.inits
      last 1k:      OK
        9.19 ms ± 475 μs
      last 2k:      OK
        46.7 ms ± 1.5 ms
      map-take1 1k: OK
        8.99 ms ± 798 μs
      map-take1 2k: OK
        47.0 ms ± 1.8 ms

Linear growth after (the "2k" benchmarks take twice the time of the "1k" benchmarks):

    Lazy.inits
      last 1k:      OK
        46.8 μs ± 3.7 μs
      last 2k:      OK
        94.6 μs ± 1.8 μs
      map-take1 1k: OK
        66.7 μs ± 688 ns
      map-take1 2k: OK
        132  μs ± 6.2 μs
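The two access patterns behind the benchmark names above can be written out directly. This is a sketch with illustrative function names; the PR's actual benchmark code may differ.

```haskell
import qualified Data.Text.Lazy as TL

-- "last": force the final, full-length prefix of inits.
lastOfInits :: TL.Text -> TL.Text
lastOfInits = last . TL.inits

-- "map-take1": look only at the first character of every prefix.
-- With the fix, each prefix's head is reachable without paying for
-- the whole prefix, so the traversal is linear overall.
takeOneOfEach :: TL.Text -> [TL.Text]
takeOneOfEach = map (TL.take 1) . TL.inits
```

For instance, `takeOneOfEach (TL.pack "abc")` produces the heads of `["", "a", "ab", "abc"]`, i.e. `["", "a", "a", "a"]`.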

@Lysxia force-pushed the inits branch 2 times, most recently from 8ccae28 to d3c20b2 on March 14, 2024 03:24
Review comment on src/Data/Text/Lazy.hs (outdated, resolved)
@Bodigrim (Contributor) left a comment:


Is our test coverage good enough to generate lazy Text with multiple chunks?

Review comment on benchmarks/haskell/Benchmarks/Micro.hs (outdated, resolved)
Quoted diff context:

          ++ L.map (Chunk t) (inits' ts)
    initsNE ts0 = Empty NE.:| inits' 0 ts0
      where
        inits' :: Int64 -- Number of previous chunks i
A contributor commented:

I think the number of chunks cannot exceed Int even on 32-bit machines.

@meooow25 (Contributor) commented Mar 17, 2024:

This is a lazy structure, so it can, given enough time. For instance, `initsNE (cycle "text")`.
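To illustrate the point with the lazy API itself (`infiniteText` is an illustrative name): `TL.cycle` repeats its non-empty argument forever, so the underlying chunk stream never ends and a chunk counter can, in principle, grow without bound.

```haskell
import qualified Data.Text.Lazy as TL

-- An infinite lazy Text: the chunk stream produced by TL.cycle
-- has no end, so no fixed-width counter can index all its chunks.
infiniteText :: TL.Text
infiniteText = TL.cycle (TL.pack "text")
```

Finite consumption still works fine: `TL.take 10 infiniteText` is `"texttextte"`.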

A contributor replied:

Well, if we want to be precise to that extent it should be Integer ;)

@Lysxia (Contributor, Author) commented Mar 16, 2024:

> Is our test coverage good enough to generate lazy Text with multiple chunks?

Yes, the Arbitrary TL.Text instance explicitly generates arbitrary chunks:

    arbitrary = (TL.fromChunks . map notEmpty . unSqrt) `fmap` arbitrary
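For context on why this produces genuinely multi-chunk values: `TL.fromChunks` preserves the non-empty chunk boundaries it is given (it only drops empty chunks, and never merges adjacent ones). A small check, with an illustrative name:

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Build a lazy Text from two explicit strict chunks; TL.toChunks
-- recovers the same boundaries, confirming the value really has
-- multiple chunks rather than one concatenated chunk.
multiChunk :: TL.Text
multiChunk = TL.fromChunks [T.pack "foo", T.pack "bar"]
```

Here `TL.toChunks multiChunk` is `[T.pack "foo", T.pack "bar"]`, not a single `"foobar"` chunk.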

@Bodigrim merged commit f8f747b into master on Mar 21, 2024 (51 checks passed).
@Bodigrim deleted the inits branch on March 21, 2024 23:10.
@Bodigrim (Contributor) commented:

Thanks a ton!

Linked issue this pull request closes: A better Data.Text.Lazy.inits