Troubleshooting IPsec

By richard_pracko
February 20, 2015 8:01 am

The IPsec protocol suite provides a secure way to transfer data over networks. Naturally, both tunnel endpoints must be configured correctly for the VPN to establish and pass traffic. However, the VPN does not always come up right away, and troubleshooting is then needed. This post provides a few options and tips for troubleshooting IPsec on SRX devices.

Typically IPsec uses the IKE protocol to establish VPN tunnels, because the other option, configuring everything (including encryption keys) manually, is highly error-prone. IKE messages are carried over UDP (port 500 or 4500), and the negotiation is divided into two phases: IKE phase 1 and phase 2. Phase 1 results in one bidirectional security association (tunnel), and phase 2 results in two unidirectional security associations. Phase 2 uses the phase 1 tunnel to transfer its messages, which means phase 1 has to complete successfully before phase 2 can even start.

Logically, phase 1 should be checked first. The following command displays the IKE phase 1 status:

inetzero-blog-troubleshooting-ipsec-image-01
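As a minimal sketch, the phase 1 check is typically run from Junos operational mode as shown here. The `user@srx>` prompt is a placeholder, and the `detail` variant is an optional extra, assuming more per-SA information is wanted.

```
# Operational mode; the hostname in the prompt is hypothetical
user@srx> show security ike security-associations
user@srx> show security ike security-associations detail
```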

The “UP” state means that IKE phase 1 completed successfully. If the command output is completely empty, or the desired SA is missing or in the “DOWN” state, phase 1 needs troubleshooting.

inetzero-blog-troubleshooting-ipsec-image-02

or

inetzero-blog-troubleshooting-ipsec-image-03-01

The SRX drops packets destined for itself unless they are permitted in the host-inbound-traffic configuration. This restriction applies to the IKE protocol too. Under specific circumstances, IKE messages can be allowed and processed even without IKE being listed in host-inbound-traffic: this happens when both tunnel endpoints are configured with “establish-tunnels immediately” and no NAT is performed. The routing engines then start sending IKE messages to each other. Outgoing packets originated on the RE are not subject to the host-inbound-traffic limitation, and the incoming IKE packets are treated as responses to them. However, this behavior is not consistent, and relying on it is unwise. Simply put: including IKE in the host-inbound-traffic configuration is the safest way to make sure IKE messages are allowed and processed. Remember that it has to be allowed on both devices!

The following command displays the host-inbound-traffic for the interface.

inetzero-blog-troubleshooting-ipsec-image-03
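A hedged sketch of checking and, if needed, adding IKE to host-inbound-traffic. The zone name `untrust` and the interface `ge-0/0/0.0` are assumptions for illustration; substitute the zone and external interface used for the VPN on your device.

```
# Zone "untrust" and interface ge-0/0/0.0 are hypothetical examples

# Operational mode: display the zone configuration, including host-inbound-traffic
user@srx> show configuration security zones security-zone untrust

# Configuration mode: permit IKE on the external interface, then commit
user@srx# set security zones security-zone untrust interfaces ge-0/0/0.0 host-inbound-traffic system-services ike
user@srx# commit
```

The same change is needed on the remote endpoint as well.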

Alternatively, you can check that IKE messages are being sent and received on the interface. Use the monitor traffic command with a filter that matches on the UDP protocol, on ports 500 and 4500, or on both:

inetzero-blog-troubleshooting-ipsec-image-04
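One possible form of this capture is sketched here, assuming the external interface is `ge-0/0/0.0` (a placeholder). The `matching` expression limits the capture to UDP ports 500 and 4500, and `no-resolve` is an optional addition that suppresses name lookups.

```
# Interface name is a hypothetical example
user@srx> monitor traffic interface ge-0/0/0.0 matching "udp port 500 or udp port 4500" no-resolve
```

Packets should be visible in both directions; traffic in only one direction suggests that the problem lies on the path or on the remote device.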

Once you have verified that IKE packets are being sent and received, the next step is to take a closer look at IKE phase 1 itself. Because your focus has been away from the IPsec configuration for a short time, it might be worth taking another look at the [edit security ike] configuration on both devices and comparing them. Do not spend too much time here: a minute or two at most. After that, the chances of finding an error just by looking at the configuration decrease rapidly. If no discrepancies are detected, it is time to turn on traceoptions for IKE phase 1.

inetzero-blog-troubleshooting-ipsec-image-05
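A minimal sketch of enabling phase 1 traceoptions from configuration mode. The file name `ike-debug` is an assumption; leave the `file` statement out to use the default log file. Tracing everything with `flag all` is one choice among several available flags.

```
# File name "ike-debug" is a hypothetical example
user@srx# set security ike traceoptions file ike-debug
user@srx# set security ike traceoptions flag all
user@srx# commit

# Once troubleshooting is finished, remove the tracing again
user@srx# delete security ike traceoptions
user@srx# commit
```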

Please note that when no file name is specified, the traceoptions entries go to the kmd log file by default.

Look for errors (the “err” string) in the trace file. In many problem situations the string “No proposal chosen” appears. Do not jump to a hasty conclusion; spend a few moments examining the lines around this error entry. Sometimes the real cause is something other than the indicated proposal mismatch, and the surrounding lines can contain useful information to pinpoint it.

Examples:

- Incorrect outgoing interface

inetzero-blog-troubleshooting-ipsec-image-06

- Pre-shared key mismatch (here the “No proposal chosen” text does not occur).

inetzero-blog-troubleshooting-ipsec-image-07

- Mode mismatch

inetzero-blog-troubleshooting-ipsec-image-08
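To search the trace file for the strings discussed above, something along these lines can be used from operational mode. The sketch assumes the default kmd log file; replace `kmd` with the custom file name if one was configured.

```
# Assumes the default log file "kmd"

# List only the lines that contain "err"
user@srx> show log kmd | match err

# Jump to the first occurrence and keep reading from there,
# so the surrounding lines stay visible
user@srx> show log kmd | find "No proposal chosen"
```

The `find` variant is useful here precisely because it does not hide the lines around the error entry.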

Once phase 1 completes successfully and the bidirectional tunnel is established, the next step is to check IKE phase 2. The command below does just that:

inetzero-blog-troubleshooting-ipsec-image-09
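As a sketch, the phase 2 check from Junos operational mode usually looks like this (the prompt is a placeholder, and the `detail` variant is an optional extra):

```
# Operational mode; the hostname in the prompt is hypothetical
user@srx> show security ipsec security-associations
user@srx> show security ipsec security-associations detail
```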

Please remember that the result of phase 2 is two unidirectional tunnels (two SAs). Empty command output, or output in which the desired two SAs are not listed, indicates a problem with phase 2.

inetzero-blog-troubleshooting-ipsec-image-10

Spending a short time double-checking the configuration on both nodes might be worth a try. Again, spend only a few moments. If you cannot determine the problem, enable traceoptions for phase 2. The option to define the file name is not available for traceoptions in phase 2. By default the trace messages go to the kmd log file, unless the traceoptions for phase 1 are active and have a custom file name defined. Search for error entries. Many of the error messages might indicate a proposal problem. Do not jump to conclusions right away; also examine the surrounding lines to see whether they indicate a different cause. It might happen that the only understandable message is “No proposal chosen”. That indicates either a proposal mismatch or a PFS mismatch. Please check both options, because it is hard to tell from the log entries which one it is.

inetzero-blog-troubleshooting-ipsec-image-11
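A minimal sketch of enabling phase 2 traceoptions from configuration mode. Tracing everything with `flag all` is an assumption made for simplicity; a narrower flag can be chosen instead.

```
# No file statement here: phase 2 tracing has no file name option
user@srx# set security ipsec traceoptions flag all
user@srx# commit

# Once troubleshooting is finished, remove the tracing again
user@srx# delete security ipsec traceoptions
user@srx# commit
```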

The proxy-id mismatch is easier to recognize. The traceoptions messages will be similar to the ones shown below.

inetzero-blog-troubleshooting-ipsec-image-12

This problem is typically experienced in pure policy-based setups or in mixed setups (policy-based VPN on one side and route-based VPN on the other). Route-based VPNs have the proxy-id values set to zeros by default, which means a proxy-id mismatch does not occur unless the proxy-ids are explicitly modified through configuration.
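For a route-based VPN facing a policy-based peer, the proxy-id can be set explicitly so that it mirrors what the peer proposes. This is only a sketch: the VPN name `VPN-TO-PEER` and the prefixes are hypothetical, and the local and remote values must be the mirror image of those on the other side.

```
# VPN name and prefixes are hypothetical examples
user@srx# set security ipsec vpn VPN-TO-PEER ike proxy-identity local 10.1.1.0/24
user@srx# set security ipsec vpn VPN-TO-PEER ike proxy-identity remote 10.2.2.0/24
user@srx# set security ipsec vpn VPN-TO-PEER ike proxy-identity service any
user@srx# commit
```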

Do not forget to check which security zone the st0.0 interface is associated with. Like any other interface, st0.0 belongs to the null zone (where all traffic is dropped) unless it is explicitly assigned to a zone. Association with an incorrect security zone creates problems as well, because the traffic will be evaluated against different policies than anticipated.

inetzero-blog-troubleshooting-ipsec-image-14
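One way to verify and correct the zone association is sketched here. The zone name `vpn` is an assumption; use whichever zone your security policies expect the tunnel traffic to arrive in.

```
# Zone name "vpn" is a hypothetical example

# Operational mode: find where (and whether) st0.0 is assigned to a zone
user@srx> show configuration security zones | display set | match st0

# Configuration mode: assign st0.0 to the intended zone
user@srx# set security zones security-zone vpn interfaces st0.0
user@srx# commit
```

If the first command returns nothing, st0.0 is not assigned to any zone.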

You might experience situations in which the VPN is flapping (e.g. the VPN comes up, goes down after some time, comes up again, goes down again, and so on). The most probable cause is routing. If the VPN has monitoring enabled and the flapping is still happening, it is most definitely a routing problem. Check that the route used to reach the IP address of the other tunnel end (defined under [edit security ike gateway]) does not point to st0.0.
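A quick way to run this check is a route lookup for the peer address. The address 203.0.113.2 is a documentation placeholder standing in for the gateway address configured under [edit security ike gateway].

```
# 203.0.113.2 is a placeholder for the remote gateway address
user@srx> show route 203.0.113.2
```

If the active route for the peer address has st0.0 as its outgoing interface, the IKE packets are being sent into the very tunnel they are supposed to maintain. A more specific route to the peer through the external interface is the usual remedy.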

Conclusion

A lot of information about troubleshooting IPsec VPNs on Junos devices is available on the internet, and many articles go really deep into the topic. This post focused on some of the most common options, such as CLI commands and traceoptions, together with a few example entries and their meaning. KB21899 (http://kb.juniper.net/InfoCenter/index?page=content&id=KB21899) might be a good starting point when looking for additional information and more comprehensive details.
