SharePoint Distributed Cache Bug with AppFabric Pre-CU4

There is a known bug in the SharePoint 2013 and 2016 Distributed Cache when it runs on AppFabric versions earlier than CU4. The resulting issues can be fixed by applying the latest AppFabric cumulative update (CU) and enabling background garbage collection. We also recommend making the changes to the Distributed Cache and Security Token Service (STS) configurations outlined in our process below.

Because Distributed Cache stores logon tokens in a SharePoint environment, it is not surprising that a misconfigured cache can cause a wide variety of issues that are difficult to pinpoint. In our experience, the issue manifests itself in the following ways.

Most often, the problem shows up as the following errors in the SharePoint ULS logs:

Product: SharePoint Foundation
Category: DistributedCache
Level: Unexpected
Message:
Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out..
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48()
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)'.

Product: SharePoint Foundation
Category: DistributedCache
Level: Medium
Message:
Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.).
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48()
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)'.

Product: SharePoint Foundation
Category: DistributedCache
Level: Medium
Message:
Token Cache: Failed to get token from distributed cache for '0#.f|provider|username'.(This is expected during the process warm up or if data cache Initialization is getting done by some other thread).
Exception: 'Microsoft.SharePoint.DistributedCaching.SPDistributedCacheClientRequestTimeOutException: Communications with the cache cluster has experienced a delay past the timeout value,please increase the RequestTimeout of the client. ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out..
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48()
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)
--- End of inner exception stack trace ---
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)
at Microsoft.SharePoint.IdentityModel.SPDistributedSecurityTokenCache.GetObject(String key)
at Microsoft.SharePoint.IdentityModel.SPTokenCache.TryGetCachedToken(String cacheKey)'.

Product: SharePoint Server Search
Category: QueryCache
Level: Unexpected
Message:
SearchDistributedCache::PutAction() - Failed due to exception = 'Microsoft.Office.Server.DistributedCaching.SPDistributedCacheClusterDownException: Cache cluster is down, restart the cache cluster and Retry ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.).
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalPut(String key, Object value, DataCacheItemVersion oldVersion, TimeSpan timeout, DataCacheTag[] tags, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass25.<Put>b__24()
at Microsoft.ApplicationServer.Caching.DataCache.Put(String key, Object value, TimeSpan timeout)
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.Put(String key, Object value)
--- End of inner exception stack trace ---
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.Put(String key, Object value)
at Microsoft.Office.Server.Search.Query.SearchDistributedCache.PutAction(String key, Object value)'

Product: SharePoint Server Search
Category: QueryCache
Level: Unexpected
Message:
DistributedSearchResultsCache::Get() - Failed due to exception = 'Microsoft.Office.Server.DistributedCaching.SPDistributedCacheClusterDownException: Cache cluster is down, restart the cache cluster and Retry ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.).
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48()
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.GetObject(String key)
--- End of inner exception stack trace ---
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.GetObject(String key)
at Microsoft.Office.Server.Search.Query.SearchResultsDistributedCache.Get(String key)'
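
To gauge how often these errors occur before and after the fix, it can help to tally the AppFabric error codes in the log text. In the entries above, ERRCA0018/ES0001 is the request timeout and ERRCA0017/ES0006 is the "temporary failure" message. The following is a minimal Python sketch, assuming you already have the ULS log text available as a string; count_cache_errors is a hypothetical helper, not a SharePoint or AppFabric tool.

```python
import re
from collections import Counter

# Matches AppFabric codes such as "ErrorCode<ERRCA0017>:SubStatus<ES0006>".
# Case-insensitive so that lower-cased copies of the log text still match.
ERROR_PATTERN = re.compile(r"ErrorCode<(ERRCA\d+)>:SubStatus<(ES\d+)>", re.IGNORECASE)


def count_cache_errors(log_text: str) -> Counter:
    """Count (ErrorCode, SubStatus) pairs in ULS log text (hypothetical helper)."""
    return Counter(
        (code.upper(), sub.upper()) for code, sub in ERROR_PATTERN.findall(log_text)
    )


if __name__ == "__main__":
    # Illustrative excerpt modeled on the log entries above.
    sample = (
        "DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out..\n"
        "DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.\n"
        "DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out..\n"
    )
    for (code, sub), n in count_cache_errors(sample).most_common():
        print(f"{code}/{sub}: {n}")
```

Running the same count over logs captured before and after the changes below gives a rough, objective measure of whether the timeouts have actually stopped.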


Resolution

Step 1

Modify the Distributed Cache Logon Token Cache settings by using PowerShell to run the following commands:

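# Display the current Distributed Logon Token Cache client settings
Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache

# Load the settings and raise the buffer, timeout and connection limits
$DLTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$DLTC.MaxBufferPoolSize = "1073741824"
$DLTC.MaxBufferSize = "33554432"
$DLTC.RequestTimeout = "3000"
$DLTC.ChannelOpenTimeOut = "3000"
$DLTC.MaxConnectionsToServer = "100"

# Save the updated settings
Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $DLTC

# Restart the AppFabric Caching Service on this server so the changes take effect
Restart-Service -Name AppFabricCachingService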

These settings can only be changed with PowerShell, and the commands must be run on all the Distributed Cache servers in the cluster. In our experience, this virtually eliminates the cache timeout errors.
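
The raw numbers in Step 1 are easier to sanity-check in friendlier units. The short Python sketch below assumes the buffer sizes are in bytes and the timeouts are in milliseconds; DLTC_SETTINGS and describe are illustrative names only and are not part of SharePoint.

```python
# Step 1 values for the Distributed Logon Token Cache, copied from the commands above.
# Assumed units: buffer sizes in bytes, timeouts in milliseconds.
DLTC_SETTINGS = {
    "MaxBufferPoolSize": 1073741824,
    "MaxBufferSize": 33554432,
    "RequestTimeout": 3000,
    "ChannelOpenTimeOut": 3000,
    "MaxConnectionsToServer": 100,
}


def describe(name: str, value: int) -> str:
    """Render one setting in human-friendly units (hypothetical helper)."""
    if name.startswith("MaxBuffer"):
        return f"{name}: {value / 2**20:g} MiB"  # bytes -> mebibytes
    if name.endswith(("Timeout", "TimeOut")):
        return f"{name}: {value / 1000:g} s"  # milliseconds -> seconds
    return f"{name}: {value}"


for setting, raw in DLTC_SETTINGS.items():
    print(describe(setting, raw))
```

Under those assumptions, the buffer pool is 1024 MiB (1 GiB), the maximum buffer is 32 MiB, and both timeouts are 3 seconds.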


Step 2

Modify the Security Token Service cache settings by using PowerShell to run the following:

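# Load the Security Token Service configuration
$sts = Get-SPSecurityTokenServiceConfig

# Raise the service and logon token cache limits
$sts.MaxServiceTokenCacheItems = "1500"
$sts.MaxLogonTokenCacheItems = "1500"

# Save the changes
$sts.Update()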

These commands also need to be run on all the Distributed Cache servers.

Step 3

Download and install the latest AppFabric CU from Microsoft.

The CU needs to be installed on every server in the farm that hosts Distributed Cache.

Step 4

Modify the DistributedCacheService.exe.config file on all Distributed Cache servers to enable background garbage collection.

This file can be found in the Windows\System32\AppFabric folder. Add the following line to the file between the closing </configSections> tag and the opening <dataCacheConfig> tag.

<appSettings><add key="backgroundGC" value="true"/></appSettings>
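
If you would rather script this edit than make it by hand, here is a minimal Python sketch of the change (it is not the config-file script the author mentions working on). It assumes the root element is <configuration> with <configSections> and <dataCacheConfig> as direct children; enable_background_gc is a hypothetical helper name. It adds <appSettings> right after <configSections> if it is missing, and it is safe to run more than once.

```python
import xml.etree.ElementTree as ET


def enable_background_gc(config_xml: str) -> str:
    """Return the config text with <add key="backgroundGC" value="true"/> in <appSettings>.

    Hypothetical helper for the contents of DistributedCacheService.exe.config.
    """
    # Keep XML comments, which ElementTree would otherwise discard (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)

    app_settings = root.find("appSettings")
    if app_settings is None:
        # Place <appSettings> right after </configSections>, i.e. before <dataCacheConfig>.
        children = list(root)
        config_sections = root.find("configSections")
        index = children.index(config_sections) + 1 if config_sections is not None else 0
        app_settings = ET.Element("appSettings")
        root.insert(index, app_settings)

    # Update an existing backgroundGC entry, or add a new one.
    entry = app_settings.find("add[@key='backgroundGC']")
    if entry is None:
        entry = ET.SubElement(app_settings, "add", key="backgroundGC")
    entry.set("value", "true")

    return ET.tostring(root, encoding="unicode")
```

To use it, back up the original file, pass its text through enable_background_gc, and write the result back before restarting the services. ElementTree re-serializes the whole document, so whitespace and the XML declaration may change even though the settings themselves are preserved.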


Step 5

Restart the AppFabric Caching Service (the Windows service) and the Distributed Cache service in SharePoint, then run IISRESET on all machines.

I prefer to do a full farm reboot, both to be safe and to avoid restarting the services manually on each server. Depending on the number of servers in the environment, this can be a big time-saver.

These steps should fix the issues and clear up the errors filling up ULS.

I have created a couple of basic scripts for making the token cache changes, and I recommend saving the PowerShell commands from Step 1 and Step 2 as scripts to use in your own environments. They save a good bit of time compared to making every change manually.

I am also working on a script to make the changes to the Distributed Cache config file. If you are interested, comment below and I will add it to the instructions once it is complete.

Conclusion

It’s important to keep up with updates to all of the Microsoft technologies that SharePoint administration depends on.

Occasionally, Microsoft will find a bug in a previously released version of one of these technologies, and the sooner it is addressed, the better.

These bugs can be anywhere on the spectrum from simple functionality tweaks to serious security vulnerabilities.

Although this particular bug was resolved with AppFabric CU5, we highly recommend staying up to date with all current Microsoft updates.

At the time this article was published, the current AppFabric CU available is CU8. It can be downloaded here: https://www.microsoft.com/en-us/download/details.aspx?id=54440.

Following the instructions in this post and staying current with the AppFabric CUs should keep your Distributed Cache implementation running smoothly and stop the flood of errors that can show up in ULS and the event logs.
