Add documentation for the HousekeepingInterval parameter and enforce validation for it. #3517
In my test Kubernetes environment, I observed that when cadvisor's HousekeepingInterval is set very high via --housekeeping-interval (for example, greater than 2 minutes), container data obtained through kubelet comes back as null.
Analyzing the cadvisor code, I found the cause: the RecentStats of each container contain only a single stat from cadvisor's timed_store. https://github.com/google/cadvisor/blob/master/manager/manager.go#L529
With only one sample, the container's CpuStats cannot be calculated, so stat.CpuInst is nil (https://github.com/google/cadvisor/blob/master/info/v2/conversion.go#L192, https://github.com/google/cadvisor/blob/master/info/v2/conversion.go#L234).
When kubelet aggregates metrics for all containers on a node, it filters the container data: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L97 . Because stat.CpuInst is nil, kubelet considers the container terminated (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L399 ), so 'containers' ends up null in the final result.
The RecentStats of containers in cadvisor's timed_store contain only one data point because the HousekeepingInterval set by --housekeeping-interval exceeds the expiration period of data in timed_store. In kubelet, this expiration period defaults to 2 minutes, so the memoryCache's maxAge is 2 minutes: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/cadvisor_linux.go#L55
As a result, each time housekeeping writes a new sample to timed_store, the previous sample has already expired and been deleted. https://github.com/google/cadvisor/blob/master/utils/timed_store.go#L78
Based on this analysis, I believe we should document the HousekeepingInterval parameter, which is set by --housekeeping-interval, and enforce validation for it.