Default execution mode should be asynchronous. #413
@pvretano UNSAFE play? Disagreeing with this, for the reasons we discussed at length previously. If I recall correctly, the main reason we opted for synchronous by default is the fact that the client requests asynchronous execution by providing a `Prefer: respond-async` header. The RFC for the `Prefer:` header does not include a mechanism to indicate sync execution (there is only `respond-async`). Changing how this works would definitely be a major breaking change between the two versions, which I hope we avoid, or it will really be a transition nightmare. The OGC API - Processes async mode is significantly more complicated to implement from both the server and the client perspective. From the server side, it requires implementing a complex queuing mechanism, results storage and persistence, etc. Sync execution, on the other hand, is super simple: it is handling / making a POST request stating what the client wishes to execute and handling the response. It is much more similar to the typical OGC API GET requests. I believe it's actually easier to implement a relatively safe sync execution than the async equivalent. An async service can be implemented as an additional layer on top of a simple sync service. Async / batch processing is still way overrated in my opinion. The sync mode is the simple thing. I hope we can keep the default the simple way :)
@jerstlouis requirement 26, part C covers the case where NO Prefer header is specified AND the process can be executed either sync or async. In this specific case I think the better default is async. Otherwise you risk a long-running process timing out the HTTP connection.
@pvretano The reason we had requirement 26 C:
which is equivalent to 25 C in the approved/published 1.0 version is because it would otherwise be impossible for a client to tell the server "I want to execute this process synchronously". Omitting the Prefer: header is how the processing clients can currently do that with the approved standard. Without this, it would be impossible to write synchronous-only execution clients that can work with servers that also offer optional async support. Synchronous-only clients are much easier to write and get working, from past code sprints and testbed experience.
In my opinion, this is a non-issue. The server can decide how long it wants to keep processing something. At any point, both ends can decide to give up and interrupt the connection. This is actually a good thing in terms of avoiding hogging resources from the server standpoint, because the server actually knows whether there is an actual client still patiently waiting for a response, and can easily limit the number of active connections from a particular client (which is much simpler than managing a processing queue, and can be done with readily-available tools like an Apache server proxy). With async, a malicious client could just decide to queue up tons of processing requests and never care about them. Interesting related comment on Reddit:
If a process is inherently always going to take a long time and the server doesn't want to have a long time out, then it can choose to only offer the async option. If a process sometimes takes a small amount of time and sometimes a very long amount of time based on the execution request, then the server can estimate how long the processing would take, and refuse to process it synchronously in the case it takes too long, with a 400 error that says to the client:
I believe this is the best solution to address your concern. |
SWG meeting from 2024-05-27: We keep it as it is (synchronous), as otherwise there would be no way for a client to tell servers that synchronous execution is requested. |
@pvretano @jerstlouis @bpross-52n Sorry I could not be part of today's meeting, but I strongly disagree with this decision and do not think this issue should be closed. Previous iterations of the standard had a lot of considerations about letting the server decide what is appropriate between sync/async (at some point, even the order of the supported execution modes in the process description defined the default). Sure, the synchronous implementation might seem easier to implement from the server/client side when dealing with "simple" processing like NDVI or applying an AoI/ToI/RoI filter to a collection, but it makes absolutely NO sense when the processing is much longer, or when handling limited-resource/high-demand situations, since servers can close the HTTP connection due to timeouts or need to wait for the resource to become available in a queuing system. In such cases, the async operation is much more appropriate. I could argue in my typical use cases that async is easier to implement, since a monitoring Job is always desired... and leaving a connection open indefinitely is a misuse of resources, which most servers will avoid by defaulting to a timeout of roughly 5–60 s according to use case. Note that I am not advocating for async by default either. I believe this flexibility of execution mode is an advantage of the processes API. I would also like to remind that, according to […]. I also disagree with the statement:
If a server allows it, it is perfectly valid to indicate
Sometimes, the operations are not themselves long, but the resources to compute them are insufficient to meet the demand. Sometimes, there is simply no way to estimate that duration, as it depends on external factors the server does not have access to, such as a deployed process the server cannot know how long it will take to complete. In some cases, the server still wants to respect the submission order of the requests, and not depend only on luck for whether an execution request will succeed. Believing that async is used only for long-running jobs is a vast oversimplification of the use cases.
I would like to make sure OGC API - Processes does not evolve according to this kind of mentality. If the operation is "so simple" that it can be performed by a simple GET request for a one-off operation, maybe it is an indication that a dedicated endpoint for that operation should be something other than OGC API - Processes, since it does not really justify the whole Process Description overhead. IMO, involving a Process Description and potentially the creation of a Job to monitor the operation implies that it is "complicated enough" to need its own standalone definition rather than a generic OpenAPI endpoint schema. In many cases, these "complicated processes" could run with particular requirements that perfectly justify async, but they could also support sync execution if the required resources were available at the time the request was submitted. OGC API - Processes should never disregard these use cases, especially with the increasing demand for AI/ML algorithms involving ever more complex operations, varying demands, and resource requirements.
Thanks a lot for engaging in this discussion. I enjoy our thoughtful discussions, even though others might find them too long to read ;)
This is where I disagree, because I believe there is value in developers being able to quickly put together an OGC API - Processes client that supports only sync, for those servers/processes that do support sync only requests.
The difference is that the server is not obligated to respond sync in this case, unlike when the
The way we kind of avoided deviating from the RFC in Processes 1.0, is that when the client does not use the Prefer: header, the RFC is not involved, and the default (if the process supports sync execution) is synchronous execution.
The way I would suggest to address this is an explicit permission for the server to refuse synchronous execution (execution request submitted without a `Prefer: respond-async` header).
The idea with collection output (which is very much in line with the concept of GeoDataCubes) is that you can describe the workflow once, and the follow-on requests for a particular AoI/ToI/RoI are really just simple GET requests (e.g., using OGC API - Coverages, Tiles, DGGS, Features, Maps, EDR...). The description of the workflow can be of any level of complexity, and may combine several inputs and chains of other workflows/processes, whether local or distributed, so this does justify the use of OGC API - Processes. But the requests for partial results are very simple and potentially also very quick to process due to their limited ATRoI.
I hope we can prototype an example OpenAPI version of the process description as an alternative/complement to the OGC process description. Regardless of how complicated the process is, the inputs and outputs need to be described, and I think that is what both of those approaches do.
The need for job monitors is when we cannot avoid lengthy batch processes. |
Guys, you are killing me with these LONG, LONG comments! ;) This particular issue is about a client POSTing an execution request without an accompanying `Prefer` header. I was proposing that the server should respond ASYNC since, without knowing how long the process might take to run, ASYNC was the safer approach. @jerstlouis is proposing to leave it as is. @fmigneault, I think, is saying remove this requirement altogether and instead let the server ALWAYS decide the execution mode based on its internal knowledge of the process, the execution environment, the available resources, etc. The client can express a PREFERENCE via the `Prefer` header. So do I remove Req26? I'm leaning towards yes. This, of course, then begs the question ... do we need to bother with job control metadata in the process description at all?
@pvretano I really believe we need to keep the requirement as-is. I'm proposing to address @fmigneault's use case by adding a permission clarifying that the server MAY return a 503 if it's not able to handle the request synchronously right now due to limited resources, or a 413 if it's not able to because the client is asking to process too much synchronously, including a verbose hint that the client should re-submit the request with a `Prefer: respond-async` header.
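A minimal sketch of the permission proposed here, as a hypothetical server-side guard; the function name, thresholds, and messages are illustrative assumptions, not from the standard:

```python
# Hypothetical guard a server might run before executing a request
# synchronously. Thresholds and names are illustrative assumptions.

def refuse_sync_if_needed(estimated_seconds, active_sync_requests,
                          max_concurrent=5, sync_limit_seconds=30):
    """Return (HTTP status, detail) to refuse sync execution, or None to proceed.

    503 -> temporarily out of resources; 413 -> too much for sync execution.
    Both hint that the client should re-submit with Prefer: respond-async.
    """
    if active_sync_requests >= max_concurrent:
        return (503, "Server busy; re-submit with 'Prefer: respond-async' "
                     "or retry later.")
    if estimated_seconds > sync_limit_seconds:
        return (413, "Too much to process synchronously; re-submit with "
                     "'Prefer: respond-async'.")
    return None  # safe to execute synchronously
```

A server would call this before starting synchronous execution and fall through to normal processing when it returns `None`.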
@jerstlouis I think @fmigneault's proposal is simpler and pretty much what Req27 says right now. In the case where the process can run sync or async, the server picks and makes its decision known via the […]
As I said above:
This loses that, and breaks compatibility with 1.0.
I also still believe we should aim for as much compatibility with 1.0 as possible, given that so far everything is very, very close to full compatibility in terms of existing 1.0 clients being able to execute processes from 1.1/2.0 servers. |
@jerstlouis assuming we keep the job control options in the process description (see my previous question to the group) then you can still write such a simple OAProc client. This client simply needs to search for processes in the process list that can only run synchronously and ignore all the other ones. No? As for compatibility, nice to have but not 100% necessary since we are targeting 2.0 ... no?
Currently, Processes 1.0 also makes this possible for processes that support both sync & async; this change limits it to those that support only sync (and the server adding async support to those processes later will suddenly break those sync-only clients).
I was still hopeful that this could still be a 1.1 version, at least in terms of not breaking this existing compatibility, even if not reflected in the actual version number. |
Exactly as @pvretano described. Perfectly understood my thought process.
Is there any way in OGC API - Processes that I can enforce synchronous execution and, if it's not supported or too complex to run synchronously, it just returns an error? Similarly, is this possible for asynchronous execution? I'm thinking of two use cases here:
* Rapid web visualization via synchronous execution (e.g. for web mapping, which just doesn't work effectively with batch jobs).
* Creating large result sets, e.g. a result as a STAC catalog (I'd probably never want to parse that from a single HTTP response).
Two-part answer, depending on the version of the standard.
Before: Yes, completely, using `mode`.
Current: Not guaranteed, depending on the description. The same rule about mismatching […] applies. I have raised this concern on multiple occasions, across multiple issues. The only "real" way to enforce a mode currently, while respecting all standard revisions and HTTP simultaneously, is to only indicate a single mode in `jobControlOptions`.
To clarify, the "Before" version with "mode" that @fmigneault is referring to was pre-1.0, and the "Current" version is 1.0 (which already replaced the use of "mode" by the `Prefer` header). According to 1.0, when NOT including the `Prefer` header, the server SHALL respond synchronously if the process supports both modes. The reverse (both sync and async as an option for the process, with a `Prefer: respond-async` header) is only a preference that the server may choose not to honor.
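The 1.0 negotiation rules described above can be condensed into a small decision function. This is a sketch for illustration: the `jobControlOptions` values `sync-execute` and `async-execute` are from the standard, but the function itself is an assumption, not from any official implementation:

```python
# Sketch of OGC API - Processes 1.0 execution-mode negotiation:
# omitting the Prefer header REQUIRES sync when the process supports
# both modes (Req 26 C); respond-async is only a preference.

def negotiate_mode(job_control_options, prefer_async):
    supports_sync = "sync-execute" in job_control_options
    supports_async = "async-execute" in job_control_options
    if supports_sync and not supports_async:
        return "sync"    # only option the process offers
    if supports_async and not supports_sync:
        return "async"   # only option the process offers
    # Both supported: absence of Prefer: respond-async means synchronous.
    return "async" if prefer_async else "sync"
```

This captures why a sync-only client can simply omit the header, and why adding async support to a sync-only process does not change the no-header behavior under these rules.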
The issue is that the server has the option to ignore the client's preference, meaning that the behavior cannot be predetermined in all circumstances. This makes the implementation of async handling much harder for clients. It is fine to have the default be sync, since it is the simpler use case, but the async mode should be handled just as fairly and reliably, not be twice as hard to achieve. A "good reason" could be as simple as a server having a limited amount of resources to store jobs, for which it always tries to return the result synchronously when the execution is fast enough, to save space, but "is forced" to fall back to async when an input resource it must wait for cannot be ready in time for it to execute before the server timeout is reached. Since it could respond either way, the process description MUST indicate both modes in `jobControlOptions`. If a server was behaving this way, I could not blame the implementer for not following the standard, as they would technically be right. They would be allowed to ignore my preference. This makes it a bigger burden for my client integrating their server, as I must always be ready to deal with any possible outcome.
It sounds like the Prefer header is not the right solution for what is needed. Don't use it? Splitting the endpoints might be the better option, see #419 |
Yes, because it's only a recommendation, clients do technically need to be ready to handle sync responses as well, even when they submit a `Prefer: respond-async` header. Different end-points as @m-mohr suggested might have been a simpler solution side-stepping those problems, but possibly because some implementations also create jobs for synchronous execution, the SWG had not considered that at the time. Now I believe that the SWG mostly wants to avoid breaking changes as much as possible and finalize 1.1/2.0.
💯 agreed. This is a strong requirement. I only wish there was a way to "force" a certain mode for certain edge cases where it is critical that it is respected. In those few cases where sync/async is a "must" for whatever reason, I would actually prefer receiving an unprocessable request error or similar over generating a possibly long/heavy-resource execution that will not be handled accordingly. I've actually just come across this:
Hi,
See below a change proposal for asynchronous communication that we did in OGC Testbed 18 (Secure Asynchronous Catalog) : https://docs.ogc.org/per/22-018.html#toc28
I hope it helps.
OGC API-Notification Service
Propose Webhooks for an OGC API-Notification Service specification for other services besides Records which should work with the PubSub SWG.
8.1.3. Notification Content
The subscription prototype backend performs a periodic check as to whether the response corresponding to the resources-uri in the subscription object is identical or not to the original response. If not, then the same resources-uri is sent to the subscriber according to the delivery mechanism that was selected.
If the resources-uri corresponds to a single item (e.g., /items/{item-id}), then this means that the record content has changed. If the resources-uri corresponds to a search request (e.g., /items?…), then this means that the (first result page) of the response has changed.
A number of improvements are possible.
· In the case of a search request, changes should be detected that affect not only the first result page (e.g., first n results and information about total number of hits). A deletion followed by an insertion, keeping the total number of results identical, may not be detected.
· In case resources-uri corresponds to a search request, the client should have the option to receive the actual changes, instead of a repetition of the full set of results. This is particularly useful for very large collections or searches with many results. A mechanism to receive information about deleted records is required in particular when the notification mechanism would be applied for supporting incremental harvesting. A mechanism similar to the Atom deleted-entry Element ([RFC-6721](https://www.rfc-editor.org/rfc/rfc6721)) might be applied. The Open Archive Initiative Protocol for Metadata Harvesting ([OAI-PMH](https://docs.ogc.org/per/22-018.html#oai-pmh)), which is still widely used, has support for [deleted records](https://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords).
· When a resource-uri corresponds to a search result, e.g., “all records with datetime overlapping with date-1 and date-2”, it might be useful to indicate in a subscription that the dates are to be interpreted as relative with respect to the actual time the subscription check is executed, instead of interpreting dates always as absolute dates.
8.1.4. Asynchronous Patterns for OGC API Common
A significant topic for future activities should be exploring how asynchronous communications can be addressed at the OGC API Common level. In the context of other APIs, asynchronous communications must address a wider range of concerns than for OGC API Records, as shown in the following examples.
· A server might handle requests asynchronously for preparing the results (single response), or in response to certain events (multiple responses).
· A client should specify the preferred behavior for retrieving the delayed response(s) either actively (polling pattern) or passively (push pattern requiring a listener endpoint).
The next subsections propose patterns to address general OGC API concerns related to asynchronous communications.
· The asynchronous pattern for delayed response provides a simple solution to query (poll) a URL until the response is ready.
· The callback pattern for delayed response complements the asynchronous pattern for specifying an endpoint for receiving (push) the response(s).
· The OGC API Common Asynchronous requirement class is a draft that generalizes the API records asynchronous solution relying on jobs resources.
8.1.4.1. Asynchronous Pattern for Delayed Response
Implementations of OGC APIs typically use a communication protocol pattern based on a single socket pair for sending and receiving an HTTP(S) request / response. The handling of slowly resolved queries that are not ready within the regular HTTP time out (i.e., roughly 30 seconds) can be addressed using a simple communication pattern (not implying the subscription resources solution detailed earlier).
The proposed standard [RFC 7240](https://docs.ogc.org/per/22-018.html#rfc7240) “Prefer Header for HTTP” recommends using the `Prefer` header to indicate the server behavior preferred by the client. As already adopted by OGC API Processes, the `respond-async` preference indicates that the client prefers the server to respond asynchronously to a request. In addition, the value `respond-async, wait=10` is a hint to the server that the client is willing to wait a maximum of 10 seconds for the response following the traditional synchronous pattern.
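For illustration, a minimal tokenizer for the preference values mentioned above (`respond-async`, `wait=10`); this is a sketch, not a complete RFC 7240 implementation (it ignores parameters attached to individual preferences):

```python
# Minimal Prefer-header tokenizer for values like "respond-async, wait=10".
# A sketch only; full RFC 7240 parsing also handles per-preference parameters.

def parse_prefer(header):
    prefs = {}
    for token in header.split(","):
        name, _, value = token.strip().partition("=")
        if name:
            # Valueless tokens (e.g. respond-async) are stored as True.
            prefs[name.lower()] = value.strip('"') or True
    return prefs

# parse_prefer("respond-async, wait=10")
# -> {"respond-async": True, "wait": "10"}
```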
[RFC 7240](https://docs.ogc.org/per/22-018.html#rfc7240) mentions that a server honoring the “respond-async” preference should return a 202 (Accepted) response as per HTTP 1.1 (RFC 7231). However, the actual behavior is unspecified, and little guidance is provided by code 202 (“The representation sent with this response ought to describe the request’s current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled”). For the above reason, some Testbed participants also proposed considering the alternative code 303 (See Other), which is a way to redirect web applications to a new URI, particularly after an HTTP POST has been performed.
Independent of the response code, the server should return a Location header (and an optional Retry-After header) holding a link to the requested target resources. This link can be used to monitor the status of the previous request until the resources are available. The structure of the monitor response is custom, as it needs to fit the purpose. But, to ensure that the caller is not too proactive, the server may throttle the caller via 429 (Too Many Requests) with a Retry-After header.
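The client side of this polling pattern could look like the following sketch, where `fetch` stands in for an actual HTTP call returning a status, headers, and body (an assumption, not a specific library API):

```python
import time

# Sketch of a client polling the monitor URL from the Location header
# until the resource is ready, honoring Retry-After on 202 and 429.

def poll_until_ready(fetch, monitor_url, max_attempts=10):
    for _ in range(max_attempts):
        status, headers, body = fetch(monitor_url)
        if status == 200:
            return body                        # resource is available
        if status in (202, 429):               # still processing / throttled
            time.sleep(float(headers.get("Retry-After", 1)))
            continue
        raise RuntimeError(f"unexpected status {status}")
    raise TimeoutError("gave up polling for the delayed response")
```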
Figure 36 — Asynchronous Pattern for Delayed Response
8.1.4.2. Callback Pattern for Delayed Response
Complementary to the asynchronous pattern described above, the request might be extended to submit an endpoint URI for receiving callback messages that contain the delayed response of the server. Indeed, as adopted by OGC API Processes, OpenAPI 3.0 provides a callback (push-based) mechanism where a subscriber-URL is passed to the API in the request. Once the resources are available, the result response is sent to the specified URL.
OpenAPI supports specifying the placeholders of the callback URIs (potentially for a set of defined events) submitted in the request, and defining the schemas of the callback messages, which must be specified in the callbacks property of the related definition of the OpenAPI operation. An example is shown below.
callbacks:
completed:
'{$request.header.Prefer}/callbackURI':
post: # Method
requestBody: # Contents of the callback message
…
responses: # Expected responses
…
Figure 37 — Generic OpenAPI Definition of the callback
For expressing the callback endpoint (and options), Testbed participants highlighted one simple approach taking advantage of the Prefer header. The header might be extended with a callback token holding the (single) endpoint for callback messages. Also, in case of multiple results (based on particular events), the frequency preference can be provided in a schedule token holding a unix-cron value.
8.1.4.3. OGC API Common Asynchronous Requirement Class
The proposed approach for a generic asynchronous requirement class relies upon the typical use of HTTP code 202<https://restfulapi.net/http-status-202-accepted/>. A job resource is created to monitor the execution of the request. Note that the approach is very similar and reuses most concepts from the OGC API Processes.
Recommendation 21
Label
/rec/core/process-execute-honor-prefer
A
If a request is accompanied with the HTTP [Prefer](https://datatracker.ietf.org/doc/html/rfc7240#section-2) header asserting a [respond-async](https://tools.ietf.org/html/rfc7240#section-4.1) preference, then the server should honor that preference and respond asynchronously.
B
If a request is accompanied with the HTTP [Prefer](https://datatracker.ietf.org/doc/html/rfc7240#section-2) header asserting a [wait](https://tools.ietf.org/html/rfc7240#section-4.3) preference, then the server should honor that preference in the decision to execute the process asynchronously.
C
If a request is accompanied with the HTTP Prefer header, then in the response, servers should include the HTTP Preference-Applied response header as an indication as to which `Prefer` tokens were honored by the server.
Recommendation 22
Label
/req/async/response
A
If a request is executed asynchronously, the server should respond with an HTTP status code of 202. The server should return a Location header (and an optional Retry-After header) holding a link to the job monitoring the processing of the request.
Requirement 21
Label
/req/async/job
A
The server shall support the HTTP GET operation for retrieving a long-running asynchronous job at the path /jobs/{jobID}.
B
A successful execution of the operation shall be reported as a response with a HTTP status code 200. The content of that response shall be based upon the OpenAPI 3.0 schema jobStatus.yaml.
The jobStatus schema is illustrated on the class diagram below.
Figure 38 — JobStatus Schema
Recommendation 23
Label
/req/async/prefer-callback
A
If a request is accompanied with the HTTP [Prefer](https://datatracker.ietf.org/doc/html/rfc7240#section-2) header asserting a callback preference (endpoint URI), then the potential asynchronous response(s) should be pushed as a callback message delivered to the provided callback endpoint URI.
B
If a request is accompanied with the HTTP [Prefer](https://datatracker.ietf.org/doc/html/rfc7240#section-2) header asserting a callback preference and a schedule UNIX-cron value, then the potential asynchronous response(s) should be pushed with respect to the submitted schedule.
The resulting sequence diagram is provided below.
Figure 39 — Generic Asynchronous Job Sequence Diagram
The status values defined in OGC API Processes are clarified below in the context of sequential results updates (on purpose) managed by the asynchronous job.
Requirement 22
Label
/req/async/job-status
A
The status of a job shall be accepted if the asynchronous job request is valid and has been queued for execution.
B
The status of a job shall be running if the provided start time has been reached.
C
The status of a job shall be failed if the asynchronous job request is not valid or if the processing of the request raised an error that prevented its completion.
D
The status of a job shall be dismissed if the asynchronous job has been dismissed through a HTTP DELETE request.
E
The status of a job shall be successful if the asynchronous job has completed or the end time has been reached.
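The status lifecycle in Requirement 22 can be illustrated as a tiny transition table; the table below is inferred from the requirement text and is only a sketch:

```python
# Job status transitions inferred from Requirement 22 (a sketch):
# accepted -> running -> successful | failed, dismissal via HTTP DELETE,
# and failure possible already at validation time.

ALLOWED = {
    "accepted": {"running", "failed", "dismissed"},
    "running":  {"successful", "failed", "dismissed"},
}

def transition(current, new):
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal job transition: {current} -> {new}")
    return new
```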
@pvretano The issue is rather about the default handling when no `Prefer` header accompanies the request. In other words, if a process supports both sync/async, the following can happen: […]
The biggest issue is that […]. One way to address this, as mentioned in #413 (comment), is to use […].
Requirements or mentions of […]. My understanding of the requirements relating to the `Prefer` header is […]. Although this may seem redundant when executing a process supporting only async, a client expecting the process to always be executed async, even in the event that the server introduces new sync support for that same process, should always include the `Prefer: respond-async` header.
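The defensive client behavior suggested here (always send the preference, then verify what the server actually did) might be sketched as follows. The helper names are assumptions; the 201 (Created) response with a job for async execution and the Preference-Applied header come from the standard and RFC 7240:

```python
# Defensive client sketch: always request async explicitly, then detect
# which mode the server actually used. Helper names are illustrative.

def build_headers(want_async):
    return {"Prefer": "respond-async"} if want_async else {}

def executed_async(status, response_headers):
    # Processes 1.0 returns 201 (Created) with a job for async execution;
    # Preference-Applied, when present, is an additional confirmation.
    applied = response_headers.get("Preference-Applied", "")
    return status == 201 or "respond-async" in applied
```

Even with this, the client must still be prepared for a synchronous 200 response, since the preference remains only a recommendation.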
@jerstlouis The async execution might be preferred to avoid cases where, depending on input dimension, we are always treading that fine line between a closed connection or not. Using async, we would not have to worry about encountering that case no matter which input dimension is submitted. However, the process itself could very well work fine in sync for a given smaller input submitted in other situations. I believe always using […]
That is a separate requirement class that the server may or may not support though, right? This is the Callback requirement class? This is also a security vulnerability from the perspective of an open service accepting requests from anyone without authorization, since it can trigger the server making a request to any URL. And so is async in general compared to sync, where you may limit the number of connections from a particular client and not execute anything more from that client if it already has e.g. 5 processes waiting on sync exec requests. Whereas with async, a client may just queue thousands of requests one after the other. @pvretano We should probably add a mention in the Security Considerations about the Callback requirement class security vulnerabilities.
Increasing connection timeouts may be one way to address that :)
Even if it is defined in a separate requirement class, it is a valid use case. Since Core provides this as a valid mechanism, the standard must provide all necessary means to handle it without side effects from defaults. Maybe open a separate issue about the security concern to avoid diverging in this thread.
If anything, that is a bigger security concern. Great way to cause a server to DDoS. |
Will try implementing the suggested […]. I'm proposing "408 Request Timeout" for cases where […]. For the opposite case, where […]
Using the […], I would like to know if we all agree to force asynchronous execution in OGC API - Processes - Part 4: Job Management when starting a job using POST on the […] endpoint. If we agree on the previous point, I would like to propose adding the […].
IMO, it should behave just like […]. If the process indicates that it can only run synchronously, making async the default would cause it to always fail, and require explicitly adding the […]. That being said, I strongly believe that the core issue remains, as it has been mentioned many times, that Rec-25C somewhat contradicts what Rec-26C indicates in Execution mode. The server should have the option to decide its own default (aka the "auto" from the previous revision) when nothing was requested explicitly. That would allow openEO to use its own default (see below), and not force servers to run sync by default.
I believe openEO creates the job, but does not put it in queue until |
Requirements 26 item C says: "The server SHALL respond synchronously if, according to the job control options in the process description, the process can be executed in either mode."
I think this is the UNSAFE play. We should change this to ASYNCHRONOUS by default.