Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read sync input as IEnumerable<ImmutableArray<string>> #21

Open
airbreather opened this issue Dec 14, 2019 · 6 comments
Open

Read sync input as IEnumerable<ImmutableArray<string>> #21

airbreather opened this issue Dec 14, 2019 · 6 comments
Labels
enhancement New feature or request

Comments

@airbreather
Copy link
Owner

airbreather commented Dec 14, 2019

On CsvSyncInputBase:

public IEnumerable<ImmutableArray<string>> AsEnumerableFromUTF8()
{
    return AsEnumerableFromUTF8(UTF8FieldDecodingParameters.Default); // CsvReaderVisitorWithUTF8HeadersBase.DefaultMaxHeaderLength
}

public virtual IEnumerable<ImmutableArray<string>> AsEnumerableFromUTF8(UTF8FieldDecodingParameters fieldDecodingParameters)
{
    // ...
}
@airbreather airbreather added the enhancement New feature or request label Dec 14, 2019
@airbreather
Copy link
Owner Author

Base implementation probably needs to spin up a thread so that the visitor methods can block while our consumer processes the results.

That said, I think every subclass can do a lot better by reimplementing their input reading logic.

@airbreather
Copy link
Owner Author

Hmm, a more flexible API would be to have just one virtual method that returns something that can be configured like how the input classes work.

I kinda want to limit the default max field length and let callers control it.

@airbreather
Copy link
Owner Author

Then again, I've come down elsewhere on the side of, "if you have some encoding other than UTF-8, then most helpers won't work for you". So only one parameter is needed.

@airbreather
Copy link
Owner Author

airbreather commented Dec 15, 2019

Ugh. The ideal would be IEnumerable<IEnumerable<ReadOnlySpan<char>>> so that the consumer can choose whether or not to do a .ToString() themselves, at no meaningful extra cost on top of us allocating the string.

But of course, ReadOnlySpan<char> is a ref struct, so I can't do exactly that. ReadOnlyMemory<char> can get captured and stored, so I can't safely pool allocations there.

Custom ref structs with GetEnumerator() can't be used with LINQ, and this feature comes down on the "productivity" side of things rather than the "bleeding-edge performance" side of things.

And IEnumerable<IEnumerable<string>> is slightly less productive than IEnumerable<string[]>, but the latter suggests a parameter to control the max number of fields in a record.

Ugh.

@airbreather
Copy link
Owner Author

So only one parameter is needed.

Ugh. Of course that's not true: we also have DecoderFallback.

Ugh.

@airbreather airbreather changed the title Read sync input as IEnumerable<IEnumerable<string>> Read sync input as IEnumerable<ImmutableArray<string>> Dec 18, 2019
@airbreather
Copy link
Owner Author

IEnumerable<ImmutableArray<string>> might make more sense than IEnumerable<IEnumerable<string>>?

Someone might pull the next element from the outer sequence before fully consuming the previous element from the inner sequence (which is really easy to do with .AsParallel), so the latter approach would require the outer enumerable's state machine to ensure that the current inner enumerable's records get buffered before moving on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant