MIT Touchstone Project Planning


Goals:

Transition the IdPs to the Shibboleth 2.1.4 release.

Phase One: Transition core MIT IdPs (idp.mit.edu)


Hardware

Idp1 and idp2 are running on RHEL3 physical machines. NIST has also provided idp2-dev, another RHEL3 machine. Bob has been using foonalagoona, which is provided by OPS/AMIT and is not a RHEL3 machine. To complete the transition, new RHEL5 VMs will be requested from NIST. In addition to the OS upgrade, several other components will be upgraded: Apache from 2.0 to 2.2, Tomcat from 5.5 to 6, and Java from 5 to 6.
 

    1. 2 dev machines (increased from 1 to 2 at Mark's suggestion) (Received)
    2. 2 staging machines
    3. 2 production machines
    4. Configuration:
      1. Minimum RAM is 2GB; we're requesting 4GB.
      2. At least 10GB disk, 7200 RPM
      3. Switch Federation recommends that on a physical machine the CPU should have 4 cores, each running at 2GHz. It has been noted that IdPs tend to be CPU bound, not disk I/O or network bandwidth intensive.

 
Once the transition to the new IdPs has been completed the following physical machines will no longer be needed by Touchstone:

    1. Idp1
    2. Idp2
    3. Idp2-dev

 
Once the transition to the new IdPs has been completed the following virtual machines will no longer be needed by Touchstone:

    1. Idp1-staging.mit.edu
    2. Idp2-staging.mit.edu
    3. Foonalagoona.mit.edu (OPS/AMIT)

Develop login page(s) that support multiple mechanisms, without using Stanford WebAuth. (currently expect to complete by January 1st, 2010)

  1. Authentication mechanisms:
    1. Username/password (via the Internet2-provided JAAS Kerberos module) urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport (the MIT-specific UI still needs to be written)
    2. X.509 certificates (via Apache mod_ssl) urn:oasis:names:tc:SAML:2.0:ac:classes:SoftwarePKI
    3. Kerberos via http-spnego (via modgssapache) urn:oasis:names:tc:SAML:2.0:ac:classes:Kerberos
  2. Login handler module development (to support the MIT page with multiple mechanism choices, as well as the ability for SPs to specify a particular mechanism)
  3. Re-implement the iPhone support
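
The relationship between the three mechanisms and their authentication context classes can be summarized as a lookup table. The following is a minimal Python sketch for illustration only, not the login handler itself. The function name `choose_mechanisms` and the short mechanism labels are hypothetical; only the URNs come from the list above. It assumes that when an SP requests specific classes, only the matching mechanisms are offered, and that otherwise the page offers all three.

```python
# Authentication context class URIs taken from the mechanism list above.
# The short labels ("password", "certificate", "kerberos") are hypothetical.
AUTHN_CONTEXT_CLASSES = {
    "password": "urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport",
    "certificate": "urn:oasis:names:tc:SAML:2.0:ac:classes:SoftwarePKI",
    "kerberos": "urn:oasis:names:tc:SAML:2.0:ac:classes:Kerberos",
}


def choose_mechanisms(requested_classes=None):
    """Return the mechanism labels the login page should offer.

    requested_classes: authentication context class URIs requested by the
    SP, or None/empty if the SP expressed no preference.
    """
    if not requested_classes:
        # No preference: present the page with all mechanism choices.
        return list(AUTHN_CONTEXT_CLASSES)
    requested = set(requested_classes)
    # Keep only the mechanisms whose class URI the SP asked for.
    return [name for name, uri in AUTHN_CONTEXT_CLASSES.items() if uri in requested]
```

How an unrecognized class should be handled (the sketch returns an empty list) is a design decision for the real login handler.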

The login page that presents all three of these mechanisms will be written in JSP. The work estimate is 2 to 3 weeks to have a proof of concept page. Another 1 to 2 weeks are estimated to complete the work, for a total of 3 to 5 weeks, with fairly high confidence in 4 weeks.

New issue (12/15/2009)

The current "options" page is implemented as a Perl CGI. We suggest bringing this over as is to the new core IdPs. This should take less than one day of configuration and testing. After the transition has been completed it may make sense to convert it from Perl CGI to JSP to make things more consistent and remove some extraneous templating. That work will probably take an additional 2 to 3 days. No completion date is required.

High Availability plans

ShibHA is not available for the 2.x IdPs. The recommendation is to use Tomcat Terracotta. As of October 18th Bob had not started working with Terracotta. Bob estimates that he will need approximately 3 days to become familiar with Terracotta configuration issues.
 
Paul will request two test machines from NIST (RHEL5 VMs). These will start as test machines and become the new staging machines.
 
DNS round robin is being used today, and we plan to continue using it for the next phase of the project, despite Internet2's recommendation to use a hardware load balancer. We wish to avoid performing SSL termination at the F5, especially since usernames and passwords are being sent to the IdPs in some cases.
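
As an aid for checking which servers are currently in the round robin pool, the following minimal Python sketch lists the addresses published for a name. The function name `pool_addresses` and the injectable `resolver` parameter are hypothetical, added here so the function can be exercised without network access; by default it uses the standard library's `socket.getaddrinfo`.

```python
import socket


def pool_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses published for hostname."""
    results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6 entries.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

Calling `pool_addresses("idp.mit.edu")` before and after a pool change is one way to see which servers are being handed out. The result reflects whatever the local resolver has cached, so it may lag a recent DNS change.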

Cease to support artifact query

We will be removing the artifact query support from the MIT metadata.

  1. (completed, week of 10/19/2009) Bob will check to make sure that no SPs are using artifact query.
  2. (week of 10/26/2009) Perform a test to ensure that the removal does not break anything.
  3. Since EZProxy does not use the Internet2 Shibboleth distribution, and instead has a fully integrated SP distribution, additional testing will have to be coordinated with the MIT Libraries. (Paul to test 12/15 or 12/16)
  4. Need to determine when this change will be made. (Currently planned for the week of 12/22/2009)

Migrate from SP attribute query to IdP attribute push.

This means that the user’s attributes will be included with the authentication assertion that is returned to the SP in the initial POST transaction. This will eliminate one network round trip between the SP and the IdP.
 
Bob has confirmed that the existing SPs will accept the push without requiring any changes to be made to any of the existing SP configurations.  

Pre-IdP 2.1 deployment changes to the existing 1.3 IdPs

  1. Update the Apache server on the existing 1.3 IdPs to include the mod_rewrite module. This will require rebuilding Apache on RHEL3. Idp2-dev will be used. This is expected to take approximately one week, including the rebuilding of Apache, the installation on the test server, and testing the URL rewriting functionality. (Completed on one dev machine, reported on 12/15.)
  2. Idp2-dev will be used for testing this prior to deployment to either staging or production.

Alternate strategy: A second instance of the 1.3 IdP software will be installed on the existing IdPs. The new instances will support the new endpoint naming convention. This alternate will only be used if the URL rewriting strategy fails.

  1. Test phase:
    1. This will include the modified IdPs, with the new endpoints supported on the IdP staging servers.
    2. The initial testing phase will simply demonstrate that we can use mod_rewrite to avoid the need to run a second instance of the 1.3 IdPs.
    3. We will want some of our most heavily used SP customers to help with testing. We would like Stellar and Wikis staging and test instances to point to the core staging environment during this phase. - (Can we get AMIT to commit to this during the week of January 4th? Paul will raise this at the 12/15 ISDA Pipeline meeting)
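
The effect of the URL rewriting strategy can be modeled as a small mapping from new-style endpoint paths to the paths the existing 1.3 IdP already serves. The following Python sketch only illustrates the idea; it is not Apache configuration, and the actual rules belong in the Apache mod_rewrite setup. The specific paths on both sides are assumptions for illustration (this plan does not list the endpoint names), and the function name `rewrite_path` is hypothetical.

```python
import re

# Hypothetical mapping from new-style endpoint paths to the paths served by
# the existing 1.3 IdP. Both sides are assumptions for illustration only.
REWRITE_RULES = [
    (re.compile(r"^/idp/profile/Shibboleth/SSO$"), "/shibboleth-idp/SSO"),
    (re.compile(r"^/idp/profile/SAML1/SOAP/AttributeQuery$"), "/shibboleth-idp/AA"),
]


def rewrite_path(path):
    """Return the internal path for a request path.

    The first matching rule wins; paths that match no rule pass through
    unchanged, so existing 1.3 endpoints keep working.
    """
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return target
    return path
```

A table like this can also serve as a checklist during the test phase: each new-style endpoint should reach the same 1.3 handler as its old-style counterpart.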

IdP 2.x deployment issues

entityIDs (There is no good way to transition the InCommon metadata; we plan to leave this as is. Bob does need to test the configuration that supports the two different entityIDs for MIT. He estimates this will take one day. The testing should be completed by January 1st.)

There are currently two entityIDs or providerIDs that are used to describe the core MIT IdPs. Within InCommon our entityID is urn:mace:incommon:mit.edu. Within the campus federation our entityID is https://idp.mit.edu/shibboleth. A single entityID can be used in two different federations. However, when doing so it is important to keep the data identical in the two different metadata files.


https://spaces.internet2.edu/display/InCCollaborate/IdP+entityID+Shift+to+URLs+--+FAQ indicates that new IdPs should use a URL style entityID. However, it also suggests that existing URN style entityIDs should not be migrated. It points out, “Changing an entityID may cause service disruption and require changes at many partner SP sites.  It is usually more important for entityIDs to remain stable.”

We should strongly consider ignoring this recommendation and migrating to the URL form of entityID within the InCommon metadata. (Won't work.)

metadata updating 

Several of the MIT SPs do not currently follow our recommendations to update the MIT metadata on a regular basis. We need to get all SPs that are important to users and customers to update their metadata on a regular basis before the transition deployment can commence.

  1. Complete draft of message to all SP administrators (Paul)
  2. Send message to all SP administrators (Paul)
  3. Do we have a way of verifying that SP administrators have taken action?
  4. Draft message regarding IdP transition

On December 8th the metadata was updated with an expiration date of February 1, 2010.

Paul to send another reminder to stakeholders, and include information about how they can confirm they have the recent file. Send by COB 12/18/2009.
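
One way an SP administrator could confirm which file they hold is to read the expiration date from their local copy. The following is a minimal Python sketch, assuming the expiration is carried as a `validUntil` attribute on the root element of the metadata file and uses the form `YYYY-MM-DDTHH:MM:SSZ`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def metadata_valid_until(xml_text):
    """Return the root element's validUntil as an aware UTC datetime, or None."""
    root = ET.fromstring(xml_text)
    value = root.get("validUntil")
    if value is None:
        return None
    # Assumes the form YYYY-MM-DDTHH:MM:SSZ (no fractional seconds).
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(xml_text, now=None):
    """Return True if the metadata carries a validUntil that is not after `now`."""
    valid_until = metadata_valid_until(xml_text)
    if valid_until is None:
        # No expiration present: treat as not expired.
        return False
    now = now or datetime.now(timezone.utc)
    return now >= valid_until
```

An administrator holding the December 8th file would expect `metadata_valid_until` to report a date of February 1, 2010; an earlier date suggests a stale copy.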

deployment migration and temporary SSO disruption

We’re thinking about adding the new IdPs to the existing idp.mit.edu DNS round robin. It should then be possible to remove the 1.3 IdPs from the DNS pool without shutting them off. If necessary, they can then quickly be added back to the DNS pool, and the new IdPs can be withdrawn. We need to coordinate with Mark on what the DNS TTL should be and what procedures should be used to get a DNS change made in a timely manner.

It should be understood that there will be no state sharing between the 1.3 IdPs and the 2.x IdPs. To a certain extent this will affect SSO. For users that have configured their browsers to always use a certificate, or always use Kerberos, there should be no visible change in behavior while the 1.3 and 2.x servers are both running.

Users who don’t have the mechanism automatically selected, but always click on certificates or Kerberos, may be presented with the login screen twice during a browser session.

The same is true for people that use username and password. They should only end up being prompted for their username and password one extra time during a typical browser session. Many users will not see a change in behavior.

Once we have confidence in the new IdPs, the 1.3 IdPs will be removed from the DNS pool and taken out of service. How long the 1.3 and 2.x IdPs should be allowed to run concurrently is open to debate: perhaps only an hour, perhaps a couple of days. I expect that we will have a better idea once we have done this in the staging environment.

Should the DNS TTL be lowered while the 1.3 and 2.x IdPs are running concurrently?

New issue: communications to help staff and stakeholders

Paul will compose a message to help staff and stakeholders explaining the transition issues. It should describe the small change in the appearance of the IdP authentication page (including screen shots) and explain the SSO interruption. Compose after working out the transition issues with Mark and performing initial testing; it should be sent near the end of the week of January 11th.

Alternatives:

Bring up the new IdPs under a new DNS name, and add them to the MIT-metadata. As SPs take the new metadata, they will start using the new IdPs.


Note that this technique is not recommended by the Shib-user community or the Internet2 wiki. It can lead to a long transition time, and it is difficult to backtrack quickly if there are any problems. The best behaved SPs tend to update their metadata only once a day; many only update manually.


We could “throw the switch” and shut down the old IdPs and bring up the new IdPs during one scheduled short down time. This would require a short interruption of service. If there are problems with the 2.x IdPs it will mean there will be other interruptions of service.

Phase Two: Transition TouchstoneNetwork.net IdPs

All of the work to be done to migrate the core IdPs is also applicable to the TouchstoneNetwork IdPs, with the exception of the work required to eliminate Stanford WebAuth from the core IdPs. We expect that once the core IdPs have been transitioned, it will simply be a matter of applying nearly the same configurations to the new CAMS IdP machines and testing them. However, there is other work that is required. Today, CAMS uses a Tomcat realm for authentication. For the transition to Shibboleth 2.x, a new JAAS Tomcat module that interacts with the CAMS MySQL database needs to be written to support the username-password mechanism. The Internet2-provided JAAS Kerberos handler is not applicable in the CAMS situation.

In order to speed up the work we plan to drop the support for OpenID, at least temporarily. In the last several quarters we have seen only one user each quarter use this feature. It appears that in all cases it has been IS&T employees testing the feature. We do not believe that any customers are currently using the feature. We feel that migrating this feature to the new generation of IdPs will require several weeks of effort and this can be done post transition.

By temporarily dropping the OpenID support, we believe this phase of the transition can be done in approximately four weeks.

Phase Three: improve SP registration

SP registration currently requires a person to send mail to an RT queue for processing. The person has to understand what type of information is required. Someone (Bob) has to edit the MIT-metadata.xml file with the submitted information in order for the registration to become effective.

Phase Four: Misc

  1. InCommon is strongly advocating the use of inline certificates, i.e. inline in the metadata. This will mean that if an MIT SP uses a single certificate for the user-facing SSL and for Shibboleth, when the certificate has to be renewed, the system administrators will also have to register the new certificate in the MIT, and potentially the InCommon Federation, metadata. Any decisions regarding such a transition at MIT will occur after the initial transition to the next generation of IdPs.
  2. InCommon will now accept self-signed certificates, or even certificates issued by the MIT CA. Discussion of any such transitions will occur after we have transitioned the IdPs to 2.x.

Projected timeline

Week of October 26 (~3 days available)

  • Request new VMs from NIST
  • Remove the artifact query support from the MIT metadata.
  • Schedule CAMS restart with new Moira WS settings
  • Work on new login pages (elimination of Stanford WebAuth)
  • (2 days on Athena)

Week of November 2nd (3 days available)

  • Work on new login pages (elimination of Stanford WebAuth)
  • (1 day on Athena)
  • (Bob will take one day of vacation)

Week of November 9 (2 days available)

  • (Veterans Day)
  • Work on new login pages (elimination of Stanford WebAuth)
  • Work on Apache rebuild for RHEL3 (done)
  • (2 days on Athena)

Week of November 16 (3 days available)

  • Work on Apache rebuild for RHEL3 (done)
  • Install second instance of 1.3 IdP on idp2-dev.mit.edu
  • (2 days on Athena)

Week of November 23 (1 day available)

  • TH and Friday holidays
  • Wed travel?
  • (1 day on Athena)
  • Idp2-dev configuration and testing

Week of November 30 (2 days available)

  • Start Terracotta investigation (delayed)
  • (3 days on Athena)

Note: Bob plans to take two days of vacation in December; these are not yet scheduled or accounted for below.

Week of December 7 (0 days)

  • (5 days on Athena)

Week of December 14th

  • revisit schedule
  • meet with Mark to discuss DNS and F5 issues as they relate to transition plan

Week of December 21 (2 days available?)

  • Terracotta
  • (Athena)
  • (Holidays)

Week of December 28 (3 days available?)

  • Terracotta
  • (1 Day Athena)

Week of January 4th (5 days available)

  • Terracotta
  • Start testing (needs more detailed planning)
  • This week will consist of testing the revised 1.3 IdP configuration

Week of January 11th

  • Test the transition of the 1.3 and 2.x IdPs.
  • This will require the cooperation of Mark (or NIST) to perform the DNS changes as needed.
  • Send transition issue message to help staff and stakeholders

Deploy week of January 25th

Deploy to CAMS week of February 22

