Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrays and objects in eq and neq processes #208

Closed
Ardweaden opened this issue Nov 5, 2020 · 15 comments · Fixed by #422
Closed

Arrays and objects in eq and neq processes #208

Ardweaden opened this issue Nov 5, 2020 · 15 comments · Fixed by #422
Labels
Milestone

Comments

@Ardweaden
Copy link
Contributor

Both eq and neq processes have the following expected behaviour:

If any operand is an array or object, the return value is false.

Wouldn't it make more sense to raise an exception in this case? I find it counterintuitive that the two processes, which are supposed to be complementary, return the same result.

@m-mohr
Copy link
Member

m-mohr commented Nov 6, 2020

Thank you for the feedback. I understand that this can be confusing or misleading. I will discuss this with our team, but it can be problematic to change this behavior as it would be a breaking change.

@soxofaan
Copy link
Member

Why is that array or object rule there in the first place? Is it to avoid defining how to traverse nested structures?

I agree that it would be better to allow a backend to throw some kind of error when it doesn't want to compare possibly nested structures. Note that this would only happen when both operands are array/object. You can easily do eq and neq comparison between for example an int and an array in a complementary way, without having to even start traversing the array.

@m-mohr
Copy link
Member

m-mohr commented Nov 12, 2020

We didn't want to dig into describing how arrays and objects have to be compared and traversed, yes. That's a rather complex topic and seems to be handled pretty differently across environments and as such the desire was to keep the process simple by just disallowing non-scalar values but at the same time not interrupting with errors if something "unexpected" occurs. I mean I would not expect an exception from an if in Python or JS...
I think eq itself is reasonable as it is, but not so sure about neq...

@GreatEmerald
Copy link
Member

From a user's perspective, I'd expect a missing value as a result in that case, for both processes.

@m-mohr
Copy link
Member

m-mohr commented Nov 12, 2020

From a programmer perspective, I'd find anything else than boolean values confusing. That's probably what influenced the current situation most. While I think eq([1], 123) => false makes sense, I agree with @Ardweaden that neq([1], 123) => false is strange and should maybe change to true, but not overly convinced, too.

@soxofaan
Copy link
Member

I mean I would not expect an exception from an if in Python or JS...

It's not the if that throws the error, it's the comparison, which is syntactic sugar for logic that can be non-trivial.
E.g. if you do if 3 > None: ... in python you get an exception TypeError: '>' not supported between instances of 'int' and 'NoneType'. I kind of prefer that the system is honest that it doesn't know to handle something instead of giving a fake answer.

I think eq itself is reasonable as it is,

That eq(x=[1,2], y=[1,2]) would evaluate to false is bit weird to me. (Note that temporal string handling does get "advanced" treatment: e.g. from the examples: eq(x = "00:00:00+00:00", y = "00:00:00Z" = true.)

I would prefer that it is optional for a backend to implement array/object traversal for comparison and just throws an error when traversal is too hard.
Note that even if the backend does not support traversal it can still give reasonable answers for eq([1], 123) or even eq([1], {"foo": "bar}) because it can detect inequality (based on data type) before it has to start traversing.

@soxofaan
Copy link
Member

soxofaan commented Nov 12, 2020

also note that inconsistent definition of equality operators (on arrays/objects) will be annoying for testing purposes like in #204

@m-mohr
Copy link
Member

m-mohr commented Nov 12, 2020

Interesting points, @soxofaan. I wasn't aware Python can actually throw errors for comparisons. I haven't seen that before. So then I guess the cleanest solution would be to go with an exception. That is a breaking change though and as such needs to go through PSC vote. Do we think we have enough good reasons for that? Would anybody be up to write the PSC proposal?

I would prefer that it is optional for a backend to implement array/object traversal for comparison and just throws an error when traversal is too hard.

But then of course you need to define in the process how that should be done, right? It's not a good idea to leave it up to implementation and then get different results across back-ends depending on how they prefer to compare. Just for reference, here's some cases one would need to think about in JS: https://github.com/denysdovhan/wtfjs#array-equality-is-a-monster ;-)

@GreatEmerald
Copy link
Member

From a programmer perspective, I'd find anything else than boolean values confusing. That's probably what influenced the current situation most. While I think eq([1], 123) => false makes sense, I agree with @Ardweaden that neq([1], 123) => false is strange and should maybe change to true, but not overly convinced, too.

Not from the R programmer perspective:

> 5 == list("a", "b")
[1] NA NA
Warning messages:
1: NAs introduced by coercion 
2: NAs introduced by coercion

@m-mohr
Copy link
Member

m-mohr commented Nov 12, 2020

So many insights into different programming languages at the moment, wow. So we have the issue that different programming languages do different things, and I'm obviously influenced by other languages than R and Python. So now the question is, what is the basis we want to go for in openEO? We can't suit all languages, it will be either exceptions, null (NA) or boolean.

@soxofaan
Copy link
Member

some cases one would need to think about in JS: https://github.com/denysdovhan/wtfjs#array-equality-is-a-monster

it is well know that equality checking in JS is not for the faint of heart 😄

it will be either exceptions, null (NA) or boolean.

about the null option: what does it mean?

  • result is always null when one of the operands is array/object, regardless of backend capabilities
  • result is null when backend implementation is undecided (e.g. type checking is supported while traversal is not; or first level traversal is supported, but deeper levels are not, ...)

The advantage of exceptions is that it more clearly communicates that it is a backend specific "limitation", and it allows a backend to improve their implementation incrementally without changing/breaking the working part of the API.

@m-mohr
Copy link
Member

m-mohr commented Nov 12, 2020

about the null option: what does it mean?

Open to discussion, but there are two things we need to decide on:

  1. Do we allow back-ends to implement array/object comparison? If yes, do we allow to "opt-out" via exceptions?
  2. What do eq and neq return for arrays/objects?
  • result is null when backend implementation is undecided (e.g. type checking is supported while traversal is not; or first level traversal is supported, but deeper levels are not, ...)

And it gets complicated. That's why it has not been considered yet, especially as it makes writing a test suite #204 much harder.

The advantage of exceptions is that it more clearly communicates that it is a backend specific "limitation", and it allows a backend to improve their implementation incrementally without changing/breaking the working part of the API.

Indeed, if we go the complex route, exceptions need to handle it, otherwise results are not comparable.

But: I think we should not make this process overly complex and just define array_equals if that is something that users need. And object support is very limited at the moment anyway (we can't even get a value from an object by key), so I don't see why we should allow comparing it deeply.

@soxofaan
Copy link
Member

And it gets complicated.
But: I think we should not make this process overly complex

I agree that a backend's implementation can be complicated (although I would not overestimate this), but that does not mean that the process spec would be complicated.

If we limit the scope to just arrays (support for objects is indeed not that useful at the moment), it's not that hard to describe how array equality (and inequality) is defined: arrays have same length and all pairwise corresponding array items are equal.

and just define array_equals if that is something that users need.

I wonder if defining a dedicated array_equal would reduce complexity. You still have to define the rules, but across different processes (traversal in array_equal and item quality in eq, unless the array items are arrays too ...). And if the need rises to also define an object equality process (e.g. object_equal) it becomes even more complicated.

I just think that the abstraction level of openeo as a programming language allows to work with a single eq process, without the need for C-style type-dependent variants. But that might be my Python background speaking, where the == operator does the right thing out of the box (unlike JS 😛 )

@m-mohr
Copy link
Member

m-mohr commented Nov 19, 2020

The advantage of separate processes is that you can easily detect whether something is supported or not. The other option would be to just remove array/object from the list of supported types in the parameter schemas.

I guess I'd be okay to implement it for arrays and define exceptions for objects, but this would need to go through PSC anyway.

@m-mohr m-mohr added this to the future milestone Dec 22, 2020
@m-mohr m-mohr modified the milestones: future, 2.0.0 Mar 8, 2023
@m-mohr
Copy link
Member

m-mohr commented Mar 14, 2023

Proposal: Restrict the schema to only allow scalar values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants