SharePoint Distributed Cache Bug with AppFabric Pre-CU4

There is a known bug in the SharePoint 2013 and 2016 Distributed Cache when it runs on AppFabric versions earlier than CU4. The bug can be fixed by applying the latest AppFabric CU and enabling the background garbage collection feature. We also recommend making the Distributed Cache and Security Token Service (STS) configuration changes outlined in our process below.

Since Distributed Cache is involved in caching credentials in a SharePoint environment, it is not surprising that a misconfigured cache can cause a wide variety of issues that are difficult to pinpoint. In our experience, the issue manifests itself in the following ways.

 

Primarily, in the SharePoint ULS logs, we see the following errors:

Product: SharePoint Foundation
Category: DistributedCache
Level: Unexpected
Message:
Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:The request timed out..
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.b__48()
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)'.

Product: SharePoint Foundation
Category: DistributedCache
Level: Medium
Message:
Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.).
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.b__48()
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)'.

Product: SharePoint Foundation
Category: DistributedCache
Level: Medium
Message:
Token Cache: Failed to get token from distributed cache for '0#.f|provider|username'.(This is expected during the process warm up or if data cache Initialization is getting done by some other thread).
Exception: 'Microsoft.SharePoint.DistributedCaching.SPDistributedCacheClientRequestTimeOutException: Communications with the cache cluster has experienced a delay past the timeout value,please increase the RequestTimeout of the client. ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:The request timed out..
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.b__48()
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)
--- End of inner exception stack trace ---
at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)
at Microsoft.SharePoint.IdentityModel.SPDistributedSecurityTokenCache.GetObject(String key)
at Microsoft.SharePoint.IdentityModel.SPTokenCache.TryGetCachedToken(String cacheKey)'.

Product: SharePoint Server Search
Category: QueryCache
Level: Unexpected
Message:
SearchDistributedCache::PutAction() - Failed due to exception = 'Microsoft.Office.Server.DistributedCaching.SPDistributedCacheClusterDownException: Cache cluster is down, restart the cache cluster and Retry ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.).
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalPut(String key, Object value, DataCacheItemVersion oldVersion, TimeSpan timeout, DataCacheTag[] tags, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass25.b__24()
at Microsoft.ApplicationServer.Caching.DataCache.Put(String key, Object value, TimeSpan timeout)
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.Put(String key, Object value)
--- End of inner exception stack trace ---
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.Put(String key, Object value)
at Microsoft.Office.Server.Search.Query.SearchDistributedCache.PutAction(String key, Object value)'

Product: SharePoint Server Search
Category: QueryCache
Level: Unexpected
Message:
DistributedSearchResultsCache::Get() - Failed due to exception = 'Microsoft.Office.Server.DistributedCaching.SPDistributedCacheClusterDownException: Cache cluster is down, restart the cache cluster and Retry ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.).
Additional Information : The client was trying to communicate with the server : net.tcp://cacheserver.example.com:22233
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)
at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.b__48()
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.GetObject(String key)
--- End of inner exception stack trace ---
at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.GetObject(String key)
at Microsoft.Office.Server.Search.Query.SearchResultsDistributedCache.Get(String key)'
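
To get a rough count of how often these errors show up, a small helper can scan ULS log lines for the exception text quoted above. This Python sketch only does plain substring matching on each line; it assumes nothing about the ULS column layout, and opening the log files (including choosing the right text encoding) is left to the caller.

```python
from collections import Counter
from typing import Iterable

# Signatures copied from the ULS messages quoted above.
SIGNATURES = {
    "request timed out": "The request timed out",
    "temporary failure": "There is a temporary failure. Please retry later",
    "client request timeout": "SPDistributedCacheClientRequestTimeOutException",
    "cache cluster down": "SPDistributedCacheClusterDownException",
}


def count_cache_errors(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each Distributed Cache error signature."""
    counts: Counter = Counter()
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Pass it any iterable of lines, such as an open log file. A single line can contain more than one signature, so the result is a count per signature rather than per log entry.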

 

Resolution

Step 1

Modify the Distributed Cache Logon Token Cache settings by using PowerShell to run the following commands:

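# Review the current Distributed Logon Token Cache settings
Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache

# Load the settings object, change the values, and save it back
$DLTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$DLTC.maxBufferPoolSize = "1073741824"   # 1 GB, in bytes
$DLTC.maxBufferSize = "33554432"         # 32 MB, in bytes
$DLTC.requestTimeout = "3000"            # milliseconds
$DLTC.channelOpenTimeOut = "3000"        # milliseconds
$DLTC.MaxConnectionsToServer = "100"
Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $DLTC

# Restart the AppFabric Caching Service so the changes take effect
Restart-Service -Name AppFabricCachingService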

These settings can only be changed with PowerShell, and the commands must be run on every Distributed Cache server in the cluster. In our experience, this virtually eliminates the cache timeout errors shown above.
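
The raw numbers in Step 1 are easier to sanity-check in familiar units. This Python sketch only does the arithmetic: the buffer sizes are byte counts, and the two timeouts are treated as milliseconds, which is an assumption worth confirming against Microsoft's documentation for your SharePoint version.

```python
BYTES_PER_MIB = 1024 * 1024

# Values from Step 1.
MAX_BUFFER_POOL_SIZE = 1073741824  # bytes
MAX_BUFFER_SIZE = 33554432         # bytes
REQUEST_TIMEOUT = 3000             # assumed to be milliseconds
CHANNEL_OPEN_TIMEOUT = 3000        # assumed to be milliseconds


def bytes_to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return num_bytes / BYTES_PER_MIB


def ms_to_seconds(milliseconds: int) -> float:
    """Convert milliseconds to seconds."""
    return milliseconds / 1000


if __name__ == "__main__":
    print("MaxBufferPoolSize:", bytes_to_mib(MAX_BUFFER_POOL_SIZE), "MiB")
    print("MaxBufferSize:", bytes_to_mib(MAX_BUFFER_SIZE), "MiB")
    print("RequestTimeout:", ms_to_seconds(REQUEST_TIMEOUT), "s")
    print("ChannelOpenTimeOut:", ms_to_seconds(CHANNEL_OPEN_TIMEOUT), "s")
```

Running it prints 1024.0 MiB, 32.0 MiB, 3.0 s and 3.0 s: a 1 GiB buffer pool, a 32 MiB maximum buffer, and 3-second timeouts.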

 

Step 2

Modify the Security Token Service cache settings by using PowerShell to run the following:

$sts = Get-SPSecurityTokenServiceConfig
$sts.MaxServiceTokenCacheItems = "1500"
$sts.MaxLogonTokenCacheItems = "1500"
$sts.Update()

This will also need to be run on all the distributed cache servers.

 

Step 3

Download and install the latest AppFabric CU from Microsoft.

This will need to be installed on all the servers in the farm that are hosting Distributed Cache.
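
Because the CU has to be installed on every Distributed Cache host, it helps to track which hosts are still behind. This Python sketch assumes you already have a hypothetical inventory mapping each host name to the AppFabric CU number it has installed; how you gather that data is outside its scope.

```python
from typing import Dict, List


def hosts_below_cu(inventory: Dict[str, int], target_cu: int) -> List[str]:
    """Return the hosts whose installed AppFabric CU is older than target_cu.

    inventory is a hypothetical mapping of host name -> installed CU number.
    The result is sorted so it is stable from run to run.
    """
    return sorted(host for host, cu in inventory.items() if cu < target_cu)
```

For example, `hosts_below_cu({"SP-CACHE1": 8, "SP-CACHE2": 4}, target_cu=8)` returns `["SP-CACHE2"]` (the host names are made up).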

 

Step 4

Modify the DistributedCacheService.exe.config file on every Distributed Cache server to enable background garbage collection.

This file can be found in the Windows\System32\AppFabric folder. Add the following line to the config file between the tag and the tag.
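
A minimal sketch of this edit, assuming the setting in question is the `backgroundGC` application setting described in Microsoft's guidance for AppFabric CU5 (`<add key="backgroundGC" value="true"/>` inside an `<appSettings>` element placed directly after `<configSections>`), could patch the file's text with Python's standard `xml.etree.ElementTree`. The function name and placement logic are illustrative, not an official tool.

```python
import xml.etree.ElementTree as ET


def enable_background_gc(config_xml: str) -> str:
    """Return the config text with <add key="backgroundGC" value="true"/> under <appSettings>.

    Assumption: the setting lives in an <appSettings> element that is a direct child
    of <configuration>, placed right after <configSections> (which must come first).
    """
    # Keep comments that sit inside the root element when re-serializing.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")

    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.Element("appSettings")
        sections = root.find("configSections")
        position = list(root).index(sections) + 1 if sections is not None else 0
        root.insert(position, app_settings)

    # Update an existing backgroundGC entry, or add one if it is missing.
    for entry in app_settings.findall("add"):
        if entry.get("key") == "backgroundGC":
            entry.set("value", "true")
            break
    else:
        ET.SubElement(app_settings, "add", key="backgroundGC", value="true")

    return ET.tostring(root, encoding="unicode")
```

Back up the original file before writing the result over it. `ElementTree` drops the XML declaration and any comments outside the root element, and it rewrites self-closing tags as `<tag />`, so diff the output against the original first. Running the function twice produces the same text, so it is safe to re-run on a server that is already patched.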

 

Step 5

Restart the AppFabric Caching Windows service and the SharePoint Distributed Cache service, then run IISRESET on all machines.

I prefer to do a full farm reboot just to be safe and to avoid having to restart the services manually on each server.

Depending on the number of servers in the environment, this can be a big time-saver.

This should fix the issues and clear up the errors filling up ULS.

I have created a couple of basic scripts for making the token cache changes.

I would recommend saving the PowerShell commands in step 1 and step 2 as scripts to use in your own environments.

These will save a good bit of time compared to making all the changes manually.
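
One way to keep those scripts consistent is to generate them from a single list of values. This Python sketch only builds the script text for the Step 1 and Step 2 commands with the values used in this post; the file you save it to and how you copy it to each server are up to you. It passes `$DLTC` through the named `-DistributedCacheClientSettings` parameter rather than positionally.

```python
from typing import Mapping

# Values recommended in Steps 1 and 2 (kept as strings, as in the original commands).
DLTC_SETTINGS = {
    "MaxBufferPoolSize": "1073741824",
    "MaxBufferSize": "33554432",
    "RequestTimeout": "3000",
    "ChannelOpenTimeOut": "3000",
    "MaxConnectionsToServer": "100",
}
STS_SETTINGS = {
    "MaxServiceTokenCacheItems": "1500",
    "MaxLogonTokenCacheItems": "1500",
}


def build_token_cache_script(dltc: Mapping[str, str], sts: Mapping[str, str]) -> str:
    """Return the Step 1 and Step 2 PowerShell commands as one script."""
    lines = [
        "# Step 1: Distributed Logon Token Cache client settings",
        "$DLTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache",
    ]
    lines += [f'$DLTC.{name} = "{value}"' for name, value in dltc.items()]
    lines += [
        "Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache "
        "-DistributedCacheClientSettings $DLTC",
        "Restart-Service -Name AppFabricCachingService",
        "",
        "# Step 2: Security Token Service cache settings",
        "$sts = Get-SPSecurityTokenServiceConfig",
    ]
    lines += [f'$sts.{name} = "{value}"' for name, value in sts.items()]
    lines.append("$sts.Update()")
    return "\n".join(lines) + "\n"
```

Write the returned string to a `.ps1` file and run it from the SharePoint Management Shell on each server, as described in Steps 1 and 2.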

I am also working on a script to make the changes to the Distributed Cache config file.

If you are interested, comment below and I will add it to the instructions once complete.

 

Conclusion

It’s important to keep up with updates to all of the Microsoft technologies that SharePoint relies on, such as AppFabric.

Occasionally, Microsoft will find a bug in a previously released version of one of these technologies and the sooner it is addressed the better.

These bugs can be anywhere on the spectrum from simple functionality tweaks to serious security vulnerabilities.

Although this particular bug was resolved with AppFabric CU5, it is highly recommended to stay up to date with all of the current MS updates.

At the time this article was published, the latest available AppFabric CU was CU8. It can be downloaded here: https://www.microsoft.com/en-us/download/details.aspx?id=54440.

Following the instructions in this post and staying current with AppFabric CUs should keep your Distributed Cache implementation running smoothly and eliminate the flood of errors that can show up in ULS and the event logs.
