One thing I have been very passionate about is making secure network access deployments easier, which includes what we like to call serviceability. Serviceability is all about making a product easier to troubleshoot, easier to deploy and easier to use. Ultimately the goal is always customer success.
There is a distinct correlation between visibility and the success of any NAC project. If you are blind to what’s happening, and if you can’t easily get to the information that helps figure out what’s wrong, the experience can be very frustrating, and it gives the appearance of a poor deployment.
My goal in this post is to highlight many of the serviceability items Cisco has put into ISE that you may not be aware of. I’ll do my best not only to call out the feature or function that was added, but also to explain why it matters and which version it was added in.
Per Endpoint Debug (ISE 1.3+)
This is one of my favorite serviceability features that was added, and arguably one of the most usable. ISE is not just a single product; it is a solution with many moving parts, and each of those parts may have different logs that you or TAC may have to sift through. The Per Endpoint Debug feature was added in ISE 1.3, and it provides a single debug file for all components (RADIUS, Guest, Profiling, etc.) for a specific endpoint across its entire session, and across the entire deployment!
So, if an endpoint is getting profiled in the East-Coast DC and the West-Coast DC at the same time, all of that will still show up in the single, consolidated debug file. It prevents you from having to enable debug on the components themselves for all endpoints, and it focuses the debug instead. This is incredibly elegant, and it helps advanced admins and TAC engineers greatly reduce time to resolution when experiencing an issue.
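To make the idea concrete, here is a minimal Python sketch of what "one consolidated file per endpoint" means conceptually. It is not how ISE implements the feature. The record layout (`ts`, `node`, `component`, `mac`, `msg`) and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DebugRecord:
    """Hypothetical debug line emitted by one component on one node."""
    ts: float        # timestamp in seconds
    node: str        # e.g. a PSN in the East-Coast or West-Coast DC
    component: str   # e.g. "radius", "guest", "profiler"
    mac: str         # endpoint MAC address, in any common notation
    msg: str


def normalize_mac(mac: str) -> str:
    """Strip separators and uppercase, so different MAC notations compare equal."""
    return "".join(ch for ch in mac if ch.isalnum()).upper()


def consolidate_for_endpoint(records: Iterable[DebugRecord], mac: str) -> List[DebugRecord]:
    """Keep only the records for one endpoint, from every node and component,
    ordered by time so they read as a single debug file."""
    target = normalize_mac(mac)
    matching = [r for r in records if normalize_mac(r.mac) == target]
    return sorted(matching, key=lambda r: r.ts)
```

The point of the sketch is the filter: debug output is scoped to one endpoint rather than switched on per component for every endpoint.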
De-duplication and anomalous endpoint suppression (1.2+)
Many of you have also heard me rant about endpoint supplicants and how they behave. You may have read my post on why to use Wildcard/WildSAN certificates to alleviate the painful symptom of bad endpoint behavior. We’ve even added functionality to TEAP (RFC 7170) to help with that behavior by delivering the list of server certificates to trust down to the supplicant. I won’t rehash all that pain here; instead I will show you one of the things we did on the RADIUS server (ISE) side to avoid wasting log storage and scale on poorly behaving endpoints.
Prior to ISE 1.2, every authentication request would create a 12KB log record that needed to be stored. When bad endpoint behavior is causing millions of failed authentications a day, that is storing a LOT of log data.
Beginning in ISE 1.2, ISE suppresses anomalous clients by default, only storing a single record and then logging each time that same exact record was received. This saved a tremendous amount of processing and log storage, and it provides for higher scale.
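For a rough sense of scale, the arithmetic can be sketched in a few lines of Python. The 12 KB record size comes from the text above. The daily failure count is an assumed round number for illustration, and decimal units (1 GB = 1,000,000 KB) are assumed.

```python
RECORD_SIZE_KB = 12  # per-record size quoted above for releases before ISE 1.2


def daily_log_volume_gb(failed_auths_per_day: int) -> float:
    """Storage per day if every failed authentication is stored as a full record."""
    return failed_auths_per_day * RECORD_SIZE_KB / 1_000_000


# Assumed example: one million failed authentications in a day.
print(daily_log_volume_gb(1_000_000))  # 12.0
```

At one million failures a day, that is roughly 12 GB of log data daily from misbehaving endpoints alone, which is what suppression avoids.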
Examining the screen shot above:
- Detection Interval will flag misbehaving supplicants when they fail authentication more than once per interval.
- Reporting Interval sends the alarm from the PSN to the MnT every X minutes.
- Request Rejection Interval stops sending logs for repeat authentication failures for the same endpoint during the rejection interval (that is, it suppresses the logs). Note: A successful authentication will clear all flags.
- Reject Requests After Detection. Once the endpoint is in the rejection interval, any request with the same Calling-Station-ID (MAC address), NAD (NAS-IP-Address) and failure reason is sent an Access-Reject, and the counter is incremented by 1 with the timestamp recorded. That log is sent at the "Reporting Interval" listed above.
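The interplay of these settings can be sketched as a small state machine. This is a simplified Python model built only from the descriptions above, not ISE's actual implementation. The class and method names are hypothetical, time is a plain number of seconds, and the Reporting Interval is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (Calling-Station-ID, NAS-IP-Address, failure reason)


@dataclass
class EndpointState:
    last_failure: Optional[float] = None  # time of the most recent logged failure
    rejected_until: float = 0.0           # end of the current rejection window
    suppressed: List[float] = field(default_factory=list)  # timestamps of suppressed repeats


class FailureSuppressor:
    """Hypothetical model of anomalous-endpoint suppression."""

    def __init__(self, detection_interval: float, rejection_interval: float) -> None:
        self.detection_interval = detection_interval
        self.rejection_interval = rejection_interval
        self._states: Dict[Key, EndpointState] = {}

    def on_failure(self, mac: str, nas_ip: str, reason: str, now: float) -> bool:
        """Return True if the failure is logged in full, False if it is suppressed."""
        state = self._states.setdefault((mac, nas_ip, reason), EndpointState())
        if now < state.rejected_until:
            # Inside the rejection window: count the repeat and keep its
            # timestamp instead of storing another full record.
            state.suppressed.append(now)
            return False
        repeat = (state.last_failure is not None
                  and now - state.last_failure <= self.detection_interval)
        state.last_failure = now
        if repeat:
            # More than one failure within the detection interval: flag the endpoint.
            state.rejected_until = now + self.rejection_interval
        return True

    def on_success(self, mac: str) -> None:
        """A successful authentication clears all flags for the endpoint."""
        for key in [k for k in self._states if k[0] == mac]:
            del self._states[key]

    def suppressed_count(self, mac: str, nas_ip: str, reason: str) -> int:
        state = self._states.get((mac, nas_ip, reason))
        return len(state.suppressed) if state is not None else 0
```

In this model, the second failure inside the detection interval flags the endpoint. Every identical failure after that, until the rejection window ends or a success clears the flags, only bumps a counter.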
Below the horizontal line, you will notice the ability to de-duplicate successful authentications.
- Suppress Repeated Successful. Applies the de-duplication and suppresses the logs from MnT.
- Accounting Suppression Interval. Stops sending accounting logs for the same session during this configured interval.
- Long Processing Step Threshold Interval. Detects and logs NAS retransmission timeouts for authentication steps that exceed this threshold. This relates to the step latency that is visible in the Authentication Detail report.
Dashlet counters above Live Log (1.2+)
The de-duplication is a very nice and welcome change, but it did leave a few gaps to be addressed. Live Log is the first screen that one would use when troubleshooting a login problem. However, if the entries are not showing up in Live Log because they are being suppressed, it leaves the admin in a very bad position with no visibility into what’s going on.
So, we added key counters at the top of the Live Log screen to help provide visibility. You can see those counters in Figure 3 below.
The admin would see the Repeat Counter, Misconfigured Supplicant and RADIUS drops counters continue to go up. Click on one of the counters, and you’re brought to the list of items that are making the counters increment.
Key actions from Live-Log (1.3+)
Now you can see which endpoints are causing the counters to increment, i.e. which ones are being suppressed. When troubleshooting, you may need to bypass the suppression to ensure all logs come to the Live Log no matter what, but only for that endpoint. That way you aren’t disabling the de-duplication for the entire deployment and opening those floodgates. Instead it is applying to only the single endpoint.
Live Log was enhanced to include the ability to bypass suppression for one hour with a right click (ISE 1.3 – 2.0) and with the Actions target icon in ISE 2.1, as seen in Figure 4.
The ability to bypass the event suppression is not limited only to the context menu within Live Log. It also exists in the collection filters located at Administration > System > Logging > Collection Filters, as seen in Figure 5.
Live Log RegEx (1.3+)
In ISE 1.3 the ability to use negative filtering in the quick filter boxes was added. Beyond just negative filtering, it was actually a full RegEx capability, making it much easier to find what you really need within the Live Log. Figure 6 shows an example in version 2.0 and below. Figure 7 shows the new filtering in ISE 2.1, which provides a graphical way to leverage the advanced filters.
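As a general illustration of negative filtering with a regular expression, the Python snippet below uses a negative lookahead to exclude entries. The exact syntax that the ISE quick filter boxes accept is not covered in this post, so treat the pattern and the sample identities as assumptions rather than ISE-specific guidance.

```python
import re

# Hypothetical identities as they might appear in an Identity column.
identities = [
    "host/laptop01.example.com",
    "jdoe",
    "host/desktop7.example.com",
    "asmith",
]

# Negative filter: match only values that do NOT start with "host/".
not_machine = re.compile(r"^(?!host/).*$")

users_only = [i for i in identities if not_machine.match(i)]
print(users_only)  # ['jdoe', 'asmith']
```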
Tree View for Policy Match (1.3+)
When a policy is multi-tiered, it can be somewhat complex to quickly recognize the "path" that an authentication session takes through that policy. Tree View was added to Live Log and to the reports to show the Policy Set > Authentication Protocol Rule > ID Store Rule and the Policy Set > Authorization Rule that the session followed. This is illustrated in Figure 8.
Active Directory diagnostics (1.3+)
In ISE 1.3, the Active Directory connector was replaced with one that could support Multi-Forest, Multi-Join, domain white lists, and much more. One of the fantastic enhancements that doesn’t get enough credit is the Diagnostic Tool.
The built-in tool was designed to provide the ISE admin with every bit of information possible to help them diagnose problems. You may want to translate that to "provide enough detail to give the Active Directory team at your company irrefutable evidence if something may be AD’s fault and not ISE’s fault":
TCPDump from Central GUI (1.0+)
Since the first release of ISE, it was known that packet captures are tremendously important for effective troubleshooting. Instead of just including the TCPDump utility on each of the ISE Nodes, the call was made to centrally control it through the GUI. From that centralized location, you can configure a TCPDump to happen on any interface on any node in the entire deployment and download the result to your local machine—all through the GUI, as seen in Figure 10.
Detailed authentication report (1.0+)
Since version 1.0, ISE has had an incredible troubleshooting tool that has single-handedly been responsible for solving the vast majority of cases. Is it some magical portal? No. It’s just the detailed authentication report. Simply click the magnifying glass in Live Log. It provides an overview of the authentication, every detail available, and even includes every step that has occurred within ISE—from receiving the RADIUS Access-Request to the RADIUS response.
Additionally, when any step takes longer than normal, the report lists out the step latency. A senior member of TAC has been quoted as saying the inclusion of latency was the "best feature ever." 🙂 Figure 11a shows a snippet of a detailed authentication report, while Figure 11b shows the step latency in action.
Download logs from GUI (1.0+)
ISE nodes have very detailed log files in the underlying operating system. You have the ability to download those logs for any node in the deployment from the centralized GUI since 1.0. It’s been enhanced over time, but has always been there. If only it were in alphabetical order. 🙂
Portal Preview (1.3+)
When creating portals with ISE 1.3 or above, there’s a WYSIWYG portal customization page with an automatic preview of what the portal will look like in mobile size.
Portal test button (1.3+)
In addition to the portal preview, the portal test button opens an example of your saved portal configuration that allows you to test functionality without needing to actually connect to the portal. Figure 13 shows the Portal Test Link, while Figure 14 shows
Support information link on all end-user facing portals (1.3+)
Along with all the new portal enhancements that came with ISE 1.3, there is an option on most portals (Guest, Sponsor, BYOD, MDM, My Devices) to offer "support information." If configured, it will include information that aids a help desk if someone has issues. The information is something that end users may not know how to obtain otherwise (MAC Address, IP Address, Browser User Agent, Policy Server and Failure Codes), as seen in Figure 15a and Figure 15b.
Time range support bundles (1.3+)
Support bundles are another lifesaver for TAC cases. They create a single encrypted bundle of files, DB exports and configurations: basically everything TAC should need to help find the root cause of an issue. In ISE version 1.3, the ability to bind that bundle to a specified time range was added, as seen in Figure 16. This helps keep the file size down and ensures that only relevant logs are captured as part of the bundle.
Pre-defined smart-defaults and policies (2.0+)
Even back in 1.3, some smart-defaults were added, such as pre-built Identity Source Sequences that include all Active Directory join-points and pre-defining the MAB-continue for the MAB rules. Those were a very nice addition.
In 2.0, those pre-defined smart configurations continued. They include pre-built guest rules, pre-built defaults for BYOD registration and on-boarding, even pre-installed Native Supplicant Profiles and Certificate templates. I have personally gone from first login to ISE to fully functioning with BYOD on-boarding using TLS and certificates in fewer than 25 minutes start to finish.
Figure 17 illustrates the pre-built authorization rules for on-boarding and for accepting the EAP-TLS after the device was on-boarded.
Figure 18 shows the pre-built authorization result. Notice it uses an ACL named "ACL_WEBAUTH_REDIRECT". If your WLC uses a different ACL for redirection, change this value to match.
Figure 19 shows the pre-configured Native Supplicant Profile. It is pre-configured to use an SSID named "ISE." It’s also pre-configured to use a pre-built certificate template and the built-in certificate authority.
Before ISE 2.0, you would have to connect to the Internet and download the Network Supplicant Assistants (NSAs) for Mac and Windows. In version 2.0+, they are included in the install. Figure 20 shows the pre-installed NSA wizards.
ISE 1.4 and below required you to create the Client Provisioning Policies, one for each OS type. Beginning in ISE 2.0, they are pre-configured for all OSes using the pre-installed NSAs and the pre-configured NSPs. 🙂 Figure 21 shows these pre-built policies.
Overall, the time saved on BYOD on-boarding alone is more than two hours.