Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage of pattern yyyy-ww does not change index every 7 days #50

Open
vesselinn opened this issue Jul 2, 2020 · 18 comments
Open

Usage of pattern yyyy-ww does not change index every 7 days #50

vesselinn opened this issue Jul 2, 2020 · 18 comments

Comments

@vesselinn
Copy link

Description
Hello,

For couple of months we are using your app to send log events to ELK,
but for some reason it is not changing weekly index.
What we see actually is that the index stay for more than 10 days and when it change the index it neither Sunday nor Monday.

Configuration
`

<Appenders>
	<Elasticsearch name="elasticsearchAsyncBatch">
		<RollingIndexName indexName="log4j" pattern="yyyy-ww" timeZone="UTC" separator="." />
		<ThresholdFilter level="ALL" onMatch="ACCEPT"/>
		<JacksonJsonLayout>
			<PooledItemSourceFactory poolName="itemPool" itemSizeInBytes="512" initialPoolSize="10000"
										monitored="true" monitorTaskInterval="10000" resizeTimeout="500">
				<!--<UnlimitedResizePolicy resizeFactor="0.6"/>-->
			</PooledItemSourceFactory>
			<JacksonMixIn mixInClass="come.project.elk.CustomLogEventJacksonJsonMixIn"
                          targetClass="org.apache.logging.log4j.core.LogEvent"/>
		</JacksonJsonLayout>
		<AsyncBatchDelivery batchSize="5000"
							deliveryInterval="3000">
			<IndexTemplate name="vegas" path="classpath:elasticLogIndexTemplate.json"/>
			<JestBufferedHttp serverUris="http://somehost:9200"
								connTimeout="500"
								readTimeout="30000"
								maxTotalConnection="40"
								defaultMaxTotalConnectionPerRoute="8"
								mappingType="_doc">
				<PooledItemSourceFactory poolName="batchPool" itemSizeInBytes="5120000" initialPoolSize="5"
											monitored="true" monitorTaskInterval="10000" resizeTimeout="500">
					<UnlimitedResizePolicy resizeFactor="0.70"/>
				</PooledItemSourceFactory>
			</JestBufferedHttp>
		</AsyncBatchDelivery>
	</Elasticsearch>
	
</Appenders>
  
<Loggers>
	<Root level="trace">
		<AppenderRef ref="elasticsearchAsyncBatch" />
	</Root>
</Loggers>
`

Runtime (please complete the following information):

  • log4j2-elasticsearch-jest-1.4.1.jar
  • log4j2-elasticsearch-core-1.4.1.jar
  • WebLogic Server 12c
  • JVM 8
  • OS: Red Hat 7.5

Additional context
Trying with daily or hourly pattern it is ok.

@rfoltyns
Copy link
Owner

rfoltyns commented Jul 2, 2020

Hi @vesselinn. I can't reproduce it. RollingIndexName (actually RollingIndexNameFormatter class) takes the time as-is from LogEvent.getTimeMillis(). I've just unit tested the weekly rollover. 2 events everyday: 1 daily, second one with random time every day just to try to 'confuse' the formatter, from January 2020, 3 years ahead. It rolled over exactly every 7 days, calculating index name correctly every day. Could you provide me with a test that can reproduce this behavior?

Are the logs in your indices have correct timeMillis values? Given that you can create an index pattern per index, you should be able to verify that documents are indexed for more than a week in a single index.

@vesselinn
Copy link
Author

Hi @rfoltyns. Let me give you examples from our kibana:

  • First log with index 2020-26:

Date: 2020-06-22 14:59:07.986
timeMillis: 1592827147987

  • Last log with index 2020-26:

Date: 2020-07-04 23:59:13.271
timeMillis: 1593896353271

  • First log with index 2020-27:

Date: 2020-07-05 00:00:13.572
timeMillis: 1593896413572

Thanks

@rfoltyns
Copy link
Owner

rfoltyns commented Jul 7, 2020

How consistently does that happen? Does it happen right after the deployment or after few weeks of JVM running? How many documents do you send to ES per day? Do you send logs immediately after they're produced or do you replay/re-process them? Do you have any post processing defined in ES? Pipelines? Do you have any index management defined in ES? Can you reproduce this in a test?

@vesselinn
Copy link
Author

Hi, When we noticed the problem we left one server (JVM) to run without restart for 3 weeks (you can see the result).
image
On this picture you can see how many logs we send to ELK. Actually we use java util logging to produce logs, but they are handled and transformed to log4j2 logs and send to ELK via elasticsearch asynch appender (shown in my first post). We don't have mechanism to re-process them. I've also unit tested the behavior of weekly rollover and it was running normally. Could it be a problem with WebLogic server as all our apps and the Handler are deployed on it? When we run WebLogic all log indexes are in correct format and week of the year, but they are not changing on time - like Sunday 00:00 or Monday 00:00. We don't have any simultaneous scenario for changing the index. Also we don't have post processing and pipelines defined in ES, only mapping template for types of the data.

@vesselinn
Copy link
Author

I've also found log4j2 ticket mentioned in org.apache.logging.log4j.core.appender.rolling.PatternProcessor.getNextTime() / log4j-core-2.11.1jar. Could it be something related to the issue?

@rfoltyns
Copy link
Owner

rfoltyns commented Jul 9, 2020

Given a fairly constant daily rate, sth apparently went wrong on 22nd June. Could send me the output of this query?

curl http://somehost:9200/_cat/indices/log4j.2020-*?v=&s=index

Ticket you mentioned describes issues on year change, but maybe it's worth looking into it.

BTW, If you're transforming the logs from java logging, you can actually use AsyncBatchDelivery directly, this way you'll have full control of the target index name (and maybe discover an issue with it a bit easier). Also, it might flesh out some timestamp issues.

@cuneytcalishkan
Copy link
Contributor

Hello @rfoltyns ,

We are also using this project and we've come across the same problem.

Here is a unit test to reproduce the issue and I guess I have found why the problem happens.
The nextRolloverTime is calculated at construction time of RollingIndexNameFormatter and if this is the only instance that is used for each message before logging, we end up with this error.
If we create a new instance each time, then the index is calculated correctly.
Please take a look at the unit test code below and let me know if I could help to solve this issue or if this test makes sense at all.

@Test
  public void testWeeklyRollingIndexPattern() {
    Long sundayBeforeMidnight = LocalDateTime.of(2020, 9, 20, 23, 59, 59)
                                             .atZone(ZoneId.of("GMT"))
                                             .toInstant().toEpochMilli();
    RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
    when(builder.getInitTimeInMillis()).thenReturn(sundayBeforeMidnight);
    builder.withIndexName("index");
    builder.withPattern("yyyy-ww");
    builder.withSeparator("-");
    builder.withTimeZone("GMT");
    RollingIndexNameFormatter formatter = builder.build();

    LogEvent logEvent = mock(LogEvent.class);
    when(logEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);

    String formattedIndexName = formatter.format(logEvent);
    Assert.assertEquals("index-2020-38", formattedIndexName);

    Long mondayAfterMidnight = LocalDateTime.of(2020, 9, 21, 0, 0, 1)
                                            .atZone(ZoneId.of("GMT"))
                                            .toInstant().toEpochMilli();

    when(builder.getInitTimeInMillis()).thenReturn(mondayAfterMidnight);
    LogEvent newWeekEvent = mock(LogEvent.class);
    when(newWeekEvent.getTimeMillis()).thenReturn(mondayAfterMidnight);

    String nextWeekIndex = formatter.format(newWeekEvent);
    Assert.assertEquals("index-2020-38", nextWeekIndex); // should have failed and produced index-2020-39

    formatter = builder.build(); // creating a new formatter instance

    nextWeekIndex = formatter.format(newWeekEvent);
    Assert.assertEquals("index-2020-39", nextWeekIndex);
  }

@rfoltyns
Copy link
Owner

Thank you @cuneytcalishkan!

Looks like a bug.

If you'd like to solve it, I'm more than happy to accept your help 👍 I haven't thought about any concrete solutions yet, but there are few things to have in mind:

  • we cannot create new instances as this particular algorithm it's specifically designed to reduce memory allocation as much as possible; current* and next* fields are there to ensure that in a typical scenario, calculations and allocations are made only "around" rollover time
  • rollover must be thread-safe, but it can't lock (that's why atm it falls back at RollingIndexNameFormatter:116)
  • formatter must return correct index name if logs from previous period arrived after rollover

@cuneytcalishkan
Copy link
Contributor

cuneytcalishkan commented Oct 12, 2020

Hello @rfoltyns,

So far I've managed to debug a little bit and understand what is going on behind the scenes of PatternProcessor that is created at initialization time, which actually calculates the next rollover time.

What I've found out is that, this depends on the system settings about the country/region where the first day of the week changes depending on the locale. e.g. United States, first day of the week is Sunday whereas for France it is Monday.

You can see in the below 2 unit tests which demonstrate this.

In order for returnsWeeklyRolloverIndexNameWithFrance to pass, you need to set the country setting of your OS to some location where the first day of the week is a Monday. With this setting, returnsWeeklyRolloverIndexNameWithUS will fail.
If you change the country setting to United States, then returnsWeeklyRolloverIndexNameWithUS will pass and the other one will fail.

    @Test
    public void returnsWeeklyRolloverIndexNameWithFrance() {
      final String timezone = "Europe/Paris";
      long loggerInitMillis = LocalDateTime.of(2020, 10, 8, 21, 48, 35)
                                           .atZone(ZoneId.of(timezone))
                                           .toInstant().toEpochMilli();
      RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
      when(builder.getInitTimeInMillis()).thenReturn(loggerInitMillis);
      builder.withIndexName("index");
      builder.withPattern("yyyy-ww");
      builder.withSeparator("-");
      builder.withTimeZone(timezone);
      RollingIndexNameFormatter formatter = builder.build();

      long sundayBeforeMidnight = LocalDateTime.of(2020, 10, 11, 23, 59, 59)
                                               .atZone(ZoneId.of(timezone))
                                               .toInstant().toEpochMilli();
      LogEvent thisWeekEvent = mock(LogEvent.class);
      when(thisWeekEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
      String formattedIndexName = formatter.format(thisWeekEvent);
      Assert.assertEquals("index-2020-41", formattedIndexName);

      long nextWeekMondayAfterMidnight = LocalDateTime.of(2020, 10, 12, 0, 0, 1)
                                                      .atZone(ZoneId.of(timezone))
                                                      .toInstant().toEpochMilli();
      LogEvent nextWeekEvent = mock(LogEvent.class);
      when(nextWeekEvent.getTimeMillis()).thenReturn(nextWeekMondayAfterMidnight);
      String nextWeekIndex = formatter.format(nextWeekEvent);
      Assert.assertEquals("index-2020-42", nextWeekIndex);
    }

    @Test
    public void returnsWeeklyRolloverIndexNameWithUS() {
      final String timezone = "Europe/Paris";
      long loggerInitMillis = LocalDateTime.of(2020, 10, 8, 21, 48, 35)
                                           .atZone(ZoneId.of(timezone))
                                           .toInstant().toEpochMilli();
      RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
      when(builder.getInitTimeInMillis()).thenReturn(loggerInitMillis);
      builder.withIndexName("index");
      builder.withPattern("yyyy-ww");
      builder.withSeparator("-");
      builder.withTimeZone(timezone);
      RollingIndexNameFormatter formatter = builder.build();

      long sundayBeforeMidnight = LocalDateTime.of(2020, 10, 10, 23, 59, 59)
                                               .atZone(ZoneId.of(timezone))
                                               .toInstant().toEpochMilli();
      LogEvent thisWeekEvent = mock(LogEvent.class);
      when(thisWeekEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
      String formattedIndexName = formatter.format(thisWeekEvent);
      Assert.assertEquals("index-2020-41", formattedIndexName);

      long nextWeekMondayAfterMidnight = LocalDateTime.of(2020, 10, 11, 0, 0, 1)
                                                      .atZone(ZoneId.of(timezone))
                                                      .toInstant().toEpochMilli();
      LogEvent nextWeekEvent = mock(LogEvent.class);
      when(nextWeekEvent.getTimeMillis()).thenReturn(nextWeekMondayAfterMidnight);
      String nextWeekIndex = formatter.format(nextWeekEvent);
      Assert.assertEquals("index-2020-42", nextWeekIndex);
    }

I am currently checking with our operation team about this country/region setting of the machines.
However, even though it is already week 42 for both US and Europe, we still have logs written with index week 41 as of now.

@rfoltyns
Copy link
Owner

Thank you @cuneytcalishkan. Great findings! 👍 I had a look at it as well and also found those Locale-based week ends. I've made a few tweaks during formatter initialization that gets your test to pass. Have a look here. I'll do some more testing today.

Also @vesselinn, YYYY-ww date pattern should be used in this case, not yyyy-ww. Otherwise you'll get incorrect behaviour at the end of the year, e.g.: 2020-01 instead of 2021-01 ~end of December 2020 (Java dates wizardry..). I'll add a note in the documentation.

It seems like the issue is caused by premature rollover. Initialising the formatter with nextRolloverTime = patternProcessor.getNextTime(...,0,..) was trying to force the rollover on first log. It was not catering for the case when first logs are arriving after factual rollover time causing it to rollover to far into the future.

Would you like to test the behaviour of 1.4.5-SNAPSHOT? I can push it out today. It should be available for download tomorrow.

@cuneytcalishkan
Copy link
Contributor

@rfoltyns that would be great to test it with these changes if possible. Thank you very much.

@rfoltyns
Copy link
Owner

Done. 1.4.5-SNAPSHOT - 1.4 branch with f5d18c3 - pushed to http://oss.sonatype.org/content/repositories/snapshots.

Add the repo to your pom.xml (or other build system equivalent) to get it:

<repositories>
    <repository>
        <id>oss.sonatype.org-snapshot</id>
        <url>http://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

@cuneytcalishkan
Copy link
Contributor

Hello @rfoltyns ,
I've checked the logs on our system today and unfortunately the solution didn't work. We still see last week's number in today's messages as 2020-42. The deployment was done on Friday and we would expect to see index with week number 43 on the logs of today's messages.
We need to investigate a little bit more I guess. I will try to check again with your changes and different test scenarios.

@rfoltyns
Copy link
Owner

Ok. If you're getting just a few events per day, could you create a test that reproduces it with exact timestamps?

@rfoltyns
Copy link
Owner

@cuneytcalishkan I've uploaded another snapshot with isses/50 to http://oss.sonatype.org/content/repositories/snapshots.

This one looks a bit more promising. Rolling formatter test gets green once -Duser.timezone=GMT JVM arg is set.

user.timezone must match the timezone configured by the builder as PatternProcessor takes it from the env variables and FastDateFormat from the builder params - that's not great..

@rfoltyns
Copy link
Owner

rfoltyns commented Nov 5, 2020

@cuneytcalishkan Did you have a chance to test the latest snapshot?

@cuneytcalishkan
Copy link
Contributor

Hello @rfoltyns, we actually deployed new version of the code with the latest snapshot and it seems to be even worse.
Before, with a restart or a new deployment we would have the correct index. However, the latest snapshot takes the last week's number as the index. We had 2 deployments this week on Monday and Wednesday and the index is still pointing to week 44.
I haven't had time to provide some unit tests to reproduce the issue, yet. I will try to do that as soon as possible.

@rfoltyns
Copy link
Owner

@cuneytcalishkan I got similar results in tests if -Duser.timezone=GMT was not set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants