Single login/IMSoP2

From Meta, a Wikimedia project coordination wiki
This is an archived proposal for the implementation of single user login. For more information, see the single login documentation page.


This page was intended to be a combination of, and expansion on, two different pages: the one now located at Single login/Kowey (talk) and Single signon transition (talk). However, the latter was restored without really clearing up the distinction between it and this, so the two pages now overlap again. Nonetheless, there are points here which are not currently discussed elsewhere.

Legal issues

To comply with the GFDL and other copyright law, the record of who did what on a given wiki must be accurate.

  • Any changes of name must be carried through to all references in the cur and old tables of the database.
  • Ideally, all textual references, such as signatures and other links to User: and User_talk: pages, should also be changed, although this is problematic:
    • a user's name may appear as plain text, where it could look like a real word; it may also appear in interwiki links (e.g. signing on meta:) and have external links pointing to the User: page.
    • a note on the user's User: page may be sufficient, although this is easier if nobody uses that name at all anymore. If such a notice appears on the page of another existing user, it should only be removable by the user whose name has changed, not the user who now controls that page.
    • This is particularly important with respect to reputation: misidentifying a user leads to associating them with somebody else's actions.

Username migration and merging

One of the biggest issues with converting from multiple logins to a unified system is how to deal with usernames that are currently in use on more than one project.

  • A "feasibility study" should be run to determine the scale of this issue: inspecting the current user tables in all Wikimedia databases to find:
    1. how many unique usernames have been chosen across all projects, and would need to be migrated in one way or another
    2. how many of these are used on more than one project
    3. how many of those have enough attributes in common to be likely candidates for auto-migration
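The three counts in the study could be computed along these lines. This is only a sketch: the input shape (a mapping from project name to a username/e-mail dictionary), the function name `feasibility_report`, and the use of matching e-mail addresses as the stand-in for "enough attributes in common" are all assumptions made for illustration, not a description of the real user tables.

```python
from collections import defaultdict


def feasibility_report(user_tables):
    """Summarise username overlap across projects.

    user_tables is a hypothetical structure: {project: {username: email or None}}.
    """
    # Invert the tables: username -> {project: email}
    accounts_by_name = defaultdict(dict)
    for project, users in user_tables.items():
        for name, email in users.items():
            accounts_by_name[name][project] = email

    # Usernames registered on more than one project
    shared = {n: a for n, a in accounts_by_name.items() if len(a) > 1}

    # Assumed criterion: every account has an e-mail set, and they all match
    candidates = [
        n for n, a in shared.items()
        if None not in a.values() and len(set(a.values())) == 1
    ]

    return {
        "unique_usernames": len(accounts_by_name),
        "on_multiple_projects": len(shared),
        "auto_migration_candidates": len(candidates),
    }
```

A real study would run equivalent queries directly against each project's user table; the in-memory version only shows which figures are being asked for.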

Auto-migration

Some, and possibly most, cases of duplicate usernames will be the same user who has accounts on more than one project. Ideally, as many of these as possible should be migrated with little or no human intervention.

When?

  • Unique usernames, that exist on only one project, can simply be copied into the unified database.
  • Matching passwords between accounts are required for completely automatic migration; without them, the user needs to specify which password to use for the merged account.
    • Is the same username plus the same password sufficient to assume the same user? How likely is it that this would occur by coincidence?
  • Matching e-mail addresses - where entered - are a pretty good sign that this is the same user.
  • IP addresses could also be used, but these are (a) not often known and (b) not unique to one person (dynamic allocation, proxies and firewalls, etc.).
  • Interwiki links exist on some people's user pages, and the existence of reciprocal links could be taken as proof. This would require fairly complex processing, but could perhaps be used as a "last resort".
  • Users could be urged, ahead of the transition, to alter all their logins to the same password (and e-mail address?); this would increase the number of accounts that could be automatically migrated.
  • Accounts under which no edits have been made could be treated as non-existent. Alternatively, such accounts could just be biased against in later name resolution procedures.
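One way the criteria above might combine into a decision is sketched below. The `Account` record, the `classify` function and the three outcome labels are hypothetical. Requiring both a matching password and a matching e-mail address is a deliberately conservative assumption, and comparing stored hashes only works if every project hashes passwords in the same way, which is also assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    # Hypothetical record; field names are illustrative only
    project: str
    username: str
    password_hash: str
    email: Optional[str]
    edit_count: int


def classify(accounts):
    """Return 'copy', 'auto' or 'manual' for accounts sharing one username."""
    # Accounts with no edits are treated as non-existent (one option listed above)
    active = [a for a in accounts if a.edit_count > 0]
    if len(active) <= 1:
        return "copy"  # zero or one active account: nothing to reconcile

    same_password = len({a.password_hash for a in active}) == 1
    emails = {a.email for a in active}
    same_email = None not in emails and len(emails) == 1

    # Assumption: both signals must agree before merging without human input
    if same_password and same_email:
        return "auto"
    return "manual"
```

Anything classified as "manual" would fall through to one of the name-conflict strategies.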

How?

  • All references to each of the user_ids being merged would have to be changed to the new merged user_id. However, no textual references would need to be altered, since the name would remain the same.
  • Information for the new login would have to be created algorithmically from multiple existing records
    • Passwords and e-mail addresses would all be the same in order to get this far
    • What about preference information?

Name conflicts

Where no auto-migration is possible, some strategy is needed for resolving the conflict between different users with the same name.

Automatic forced resolution

One option is that accounts could all be auto-migrated, with some being renamed to remove conflicts.

  • Would need to be preceded by publicity to increase scope of auto-migration.
  • Who gets the original name?
    • Some automatic decision could be made, but it is not obvious what the key criterion would be: number of edits? time of creation? having admin status on one or more projects? None of these gives an accurate picture of the level of name-recognition each user is likely to have, which is what is at stake.
    • The best option may be to rename all conflicting accounts.
  • Usernames might be something like User:Foo_wp_en (where the original name was User:Foo, and this is the account imported from the English Wikipedia). Alternatively, users could be given a chance to pick a new name, but this would hinder the automation of the process.
  • During the final transition, developers will need to use their judgement to avoid duplicates. It may be worth blocking new account creation for a day or so before the transition, to give time for reports to be run and conflicts resolved. The community can review these decisions later and ask for name changes as necessary.
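The suffix scheme could be generated roughly as follows. The function `forced_names`, its parameters, and the numeric fallback used when a suffixed name is itself already taken are assumptions for illustration; the proposal only suggests names of the form Foo_wp_en.

```python
def forced_names(username, projects, taken=frozenset(), keep_for=None):
    """Map each project to the global name its account would receive.

    taken:    names that already exist and must not be reused (hypothetical input)
    keep_for: project whose account keeps the original name, or None to
              rename every conflicting account
    """
    used = set(taken)
    result = {}
    for project in projects:
        if project == keep_for:
            name = username
        else:
            name = f"{username}_{project}"
            counter = 2
            # Assumed fallback: add a number if the suffixed name is in use
            while name in used:
                name = f"{username}_{project}_{counter}"
                counter += 1
        used.add(name)
        result[project] = name
    return result
```

Calling it with `keep_for=None` corresponds to the "rename all conflicting accounts" option.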

Pros:

  • Allows the entire set of users to be migrated at once, with no technical "transition period" or database duplication.

Cons:

  • Would require replacement of textual references to every account that has been split in this way, since no account would now exist with the original name.
    • Alternatively, some note or magic redirect could be placed at the old user-page pointing to the migrated account.
  • Could lead to false conflicts: anyone who didn't prepare their accounts for auto-migration would find that, far from having a "single login", they now had multiple logins, none of which bore their original name. This would lead to people requesting account re-mergers, meaning the textual references would have to be reverted to their original form.
    • It would also be difficult to alert these users to the problem when they tried to login, since their old login name would simply be invalid.
  • Impolite, particularly to users who are unaware this is going on. A nasty shock to users who go away for the month during which this change is implemented.

Resolution by discussion

Alternatively, a clearing-house could be provided where users from different communities with the same name could attempt to get in contact and decide who should relinquish the name.

  • Users whose name appeared to be used by different people elsewhere would be alerted, either on their User_talk page or via a message at login.
  • Users could prove that they held multiple accounts by providing the passwords for all of them on the same screen.
  • Facilities for manually migrating an account to a new name of choice could be made available to affected users.
  • A time-limit could be set for the other user(s) with the same name to respond, after which those who haven't responded have their accounts forcibly renamed, as above. This stops non-existent users being forced to give up their name because somebody hasn't turned up.
    • Again, how does the user with a changed name then log in?
  • We should give those who use their real name precedence over those using the same name as a pseudonym: copyright law gives pseudonymous works a shorter copyright term than works published under a real name, and it is convenient for a real name to be obviously a real name, with an obviously different copyright term.
  • Avoid breaking external links. If one has extensive links from outside, we should break the fewest links. Since the user page will belong to a different person, there is potential for harm to the reputation of the person with the external links, since those not from the project will be unaware of the name change situation.
    • Use the day this discussion began as the cut-off day if there are disputes about external links, so neither person can create more to influence the decision.

Pros:

  • Allows users to choose their own replacement name, or convince others to give up theirs, rather than simply being told it has already happened.
  • Reduces the number of textual replacements needed, since it provides more chances for users to point out that their accounts can be merged.
  • Provides more possibility for someone to keep the original name, since everyone involved can state that they accept any consequences of the decision.

Cons:

  • Requires that the two login systems be available at the same time, at least for a period and possibly indefinitely. This could be achieved using #Abstracted user table access.
  • Slow. This doesn't entirely matter, but some people do get fed up with having to discuss everything. Perhaps some arbitration would be needed to make sure the conversations were resolved in reasonable time.

How to rename an account

  • Ideally, every reference to the old name should be changed:
    • cur, old and all other database tables where the user name text is stored. old can be done after the down time, by creating a table recording which versions have been converted and using that to work through them all.
      • Changing article histories is a violation of the GFDL.
    • Signatures, if the signature is the default or is stored in the user details. This will be very slow for old; use the same approach of a table recording which versions have been changed.
      • Record the original signature in an HTML comment in case there is a problem.
      • Change which namespaces? All talk, Wikipedia namespace articles?
    • Textual references are even harder to find: is "John said..." referring to User:John, or just somebody called John; is "fluffy" referring to a user called "fluffy_dice"? These references may be impossible to change in practice, which could cause confusion.
  • Where there has been a name conflict, some notification should be placed on the user page at the wiki where the changed name resides. So if wp:fr: user A gets the name, and wp:en: user A had their name changed, wp:en: user page for name A should say that the old wp:en: user called A is now the global user A_wp_en. A_wp_en can choose to remove that notice when they find out about the change. A from fr should leave the notice there. This resolves two problems:
    • How A_wp_en can find their new name easily.
      • But they won't see this if they just try to log in, so a message saying "perhaps you are logging in with a username that has been changed in your absence" could be added to the password failure screen.
    • How those following a link to the user page can be informed that they may not be looking at the page of the person they are expecting, and where they may find that page now.

Other implementation considerations

I understand there is a patch already "out there somewhere" for doing the unification, but I've not seen it, so I can't tell if any of these issues have already been considered. I may also have misunderstood a few operational details, not being 100% familiar with the code. Please amend as appropriate. - IMSoP

Cookies

"Cookies", which are used to identify the fact that a user is logged in, have to be associated with a specific domain for security reasons. Thus a "single login" will not be able to log in a user to all projects, since they are at different domains.

  • Cookies can be language independent, because they can be set to (for instance) ".wikipedia.org" rather than "en.wikipedia.org". The user would still have to re-login to any other project, such as wiktionary or wikibooks.
  • It may be possible to create a "quick login" system using some central authentication domain (e.g. login.wikimedia.org) which would send a special authentication to the appropriate Special:Login (or another page) telling it to set the cookie without requesting a password.
    • Redirects with POSTDATA could be used to send the authentication info, which could then be checked in the database. Thus the user would click "login" and wait for their browser to load a couple of blank pages, and find themselves logged in to the other project. Still not ideal, but easier than manually logging in.
    • There may be JavaScript tricks which could be used, but these should probably be avoided if possible; they could be useful as a "last resort" convenience feature, if it didn't break normal logins.
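The "special authentication" passed from a central domain to a project could take many forms. The sketch below uses an HMAC-signed, short-lived token, which is a swapped-in technique rather than the database check described above. The shared secret, the token layout and the function names are all hypothetical. A token of this kind can be replayed until it expires, so a real system would probably still record used tokens in the database.

```python
import hashlib
import hmac
import time

# Hypothetical secret shared by the central login domain and every project
SECRET = b"shared-secret-known-to-all-projects"


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id, target_project, now=None):
    """Central domain: create a token to pass along in the redirect."""
    issued = int(time.time() if now is None else now)
    payload = f"{user_id}:{target_project}:{issued}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token, project, max_age=60, now=None):
    """Target project: return the user_id if the token is valid, else None."""
    current = int(time.time() if now is None else now)
    try:
        user_id, target, issued, signature = token.split(":")
        uid, issued_at = int(user_id), int(issued)
    except ValueError:
        return None  # wrong number of fields or non-numeric values
    expected = _sign(f"{user_id}:{target}:{issued}")
    # Constant-time comparison of the signatures
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    if target != project or not 0 <= current - issued_at <= max_age:
        return None
    return uid
```

On success the receiving project would set its own login cookie for the returned user_id without asking for a password.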

User rights

There is currently a user_rights field in the user table. If the user table were in a project-independent database, there would only be one set of rights.

  • The bot status should be applicable everywhere, since it is an objective fact about the account.
  • The sysop and bureaucrat statuses, however, should remain valid only on the project they were granted for, since they represent the decision to grant trust of a particular community.
    • New projects cannot be effectively run without at least one user with sysop status, so those setting up a project in a new language are often granted sysop status there; this should not provide a shortcut for gaining extra rights on other, larger, projects.
    • The single value "sysop" could therefore be replaced in the database by a per-project value, such as "sysop_wp_en" or "sysop_wikt_de". A project would then only obey the values which had the affix for that project.
  • Bans should also be considered. It is probably acceptable for these to be made global, although this may be controversial if some communities apply them more readily than others.
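The per-project affix could be interpreted as in the sketch below. The function name `effective_rights`, the `GLOBAL_RIGHTS` set, and the assumption that project keys always have the full form used in the examples ("wp_en", "wikt_de") are illustrative choices.

```python
# Assumption: rights stored without a project affix apply everywhere
GLOBAL_RIGHTS = {"bot"}


def effective_rights(stored_rights, project):
    """Return the rights that one project should obey for a user."""
    suffix = f"_{project}"
    rights = set()
    for right in stored_rights:
        if right in GLOBAL_RIGHTS:
            rights.add(right)
        elif right.endswith(suffix):
            # "sysop_wp_en" is seen as plain "sysop" on the English Wikipedia
            rights.add(right[: -len(suffix)])
        # Rights carrying another project's affix are ignored here
    return rights
```

With stored rights of "bot", "sysop_wp_en" and "bureaucrat_wikt_de", the English Wikipedia would see "bot" and "sysop", while the French Wikipedia would see only "bot".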

User_talk notification

The "you have new messages" message is currently managed in its own table: user_newtalk. Simply having one central record for each user will not be sufficient, since each user will have a User_talk page for each project.

  • The message could remain local to each project (still a local table, but with user_id pointing to the id in the central user DB) and only appear while viewing a page from that same project.
    • This may be confusing since we will be telling people they now have a single login, and they may not understand why this is not therefore centralised.
    • It is, however, the easiest solution, so is possible as an "interim" measure.
  • The message could be reworded to point to which project(s) the user has new messages in. The database would have to be similarly redesigned to store zero or more project names against the user's name, rather than simply a boolean value.
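The second option, storing zero or more project names per user instead of a boolean, might look like this in outline. The class `NewTalk`, its method names and the message wording are hypothetical, and an in-memory dictionary stands in for the redesigned table.

```python
from collections import defaultdict


class NewTalk:
    """Tracks which projects hold unread User_talk messages for each user."""

    def __init__(self):
        # user_id -> set of project names with unread messages
        self._pending = defaultdict(set)

    def mark(self, user_id, project):
        """Called when somebody edits the user's talk page on a project."""
        self._pending[user_id].add(project)

    def clear(self, user_id, project):
        """Called when the user views their talk page on that project."""
        self._pending[user_id].discard(project)

    def message(self, user_id):
        """Return the notice to display, or an empty string if none is due."""
        projects = sorted(self._pending.get(user_id, ()))
        if not projects:
            return ""
        return "You have new messages on: " + ", ".join(projects)
```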

Watchlists

One of the major requests has been and will be to unify watchlists between projects. This is probably not a priority, since they will not immediately break.

  • Complete unification would require significant redesign to efficiently store and retrieve a list of pages, and associated changes, from multiple DBs.
  • A compromise solution would be to add a set of links to each watchlist, such as "Your other watchlists: Wikipedia: en fr Wiktionary: en Meta". This could be easily achieved with a record in the user DB to store a list of projects in which the user had non-empty watchlists.
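The compromise link list could be built from the stored project names roughly as follows. The project key format ("wp_en", "wikt_en", "meta"), the `FAMILY_NAMES` table and the function name are assumptions, and the output is plain text rather than actual links.

```python
# Hypothetical mapping from project-family prefix to display name
FAMILY_NAMES = {"wp": "Wikipedia", "wikt": "Wiktionary", "meta": "Meta"}


def other_watchlists(projects, current):
    """Build the 'Your other watchlists' line for the project being viewed.

    projects: keys of projects where the user has a non-empty watchlist
    current:  key of the project whose watchlist is being displayed
    """
    groups = {}  # family prefix -> list of language codes, in input order
    for project in projects:
        if project == current:
            continue
        family, _, lang = project.partition("_")
        langs = groups.setdefault(family, [])
        if lang:
            langs.append(lang)

    parts = []
    for family, langs in groups.items():
        label = FAMILY_NAMES.get(family, family)
        parts.append(f"{label}: {' '.join(langs)}" if langs else label)

    if not parts:
        return ""
    return "Your other watchlists: " + " ".join(parts)
```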

Preferences

  • There are problems with certain skin/preference/browser combinations which are project/language-specific (see Help:Preferences#Skin); merging the user's preferences across projects means forcing them to make a one-size-fits-all decision.
  • Users may also want different preferences on more seldom-visited wikis, such as "always mark edits as minor" on a wiki where they normally only add interlanguage links.

Running old and new structures in parallel

It may be useful to continue running the old single-project user lists while the new system is becoming established. This could be done as a temporary measure, with a set deadline after which disputes would be forcibly resolved and all accounts migrated; or, it could be indefinite, with users encouraged to migrate their accounts, but given as much time as needed to do so.

  • Accounts in the unified DB could be given user_id values above the highest previously assigned on any project (in practice, probably the highest user_id on en.wikipedia at the time of switchover). Thus the software could know immediately that a user_id below this needed to be found in the old tables.
  • All existing (unmigrated) usernames would have to be "reserved", by creating dummy entries in the unified DB, so that they could be migrated in the future but not registered by anyone else for a new account.
    • Lookups on these dummy names would have to resolve to a per-project record, like low user_ids.
  • New accounts would always be created in the unified DB.
    • Users whose accounts were pending migration would therefore be unable to register their name on a new project; if the name was in dispute, they would have to create a new (global) username in order to log into any project they didn't previously have an account for. Hopefully, this will not often be a problem; of course, if somebody else uses the same username, this can already be true.
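The reservation scheme can be sketched as a small in-memory model. The class `UnifiedUserDB` and its methods are hypothetical, and the dictionaries stand in for whatever tables a real implementation would use.

```python
class UnifiedUserDB:
    """Toy model of a unified user DB that reserves unmigrated names."""

    def __init__(self, first_sso):
        # IDs start above the highest user_id assigned on any project
        self.next_id = first_sso
        self.accounts = {}  # username -> unified user_id
        self.reserved = {}  # username -> projects with unmigrated accounts

    def reserve(self, username, project):
        """Create or extend the dummy entry for an unmigrated account."""
        self.reserved.setdefault(username, set()).add(project)

    def register(self, username):
        """Create a new global account; reserved names are refused."""
        if username in self.accounts or username in self.reserved:
            raise ValueError(f"username {username!r} is not available")
        user_id = self.next_id
        self.next_id += 1
        self.accounts[username] = user_id
        return user_id

    def resolve(self, username, project):
        """Say where a login for this name on this project should look."""
        if username in self.accounts:
            return ("unified", self.accounts[username])
        if project in self.reserved.get(username, ()):
            return ("local", project)  # fall back to the per-project record
        return None
```

A login for a reserved name only succeeds on the projects where the old account exists, which matches the restriction described in the last bullet.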

Pros:

  • Allows for a more flexible transition approach: some users can have migrated while other, more complex, cases are still pending.
    • Creates a "discuss first, then act" system; otherwise, developers will be forced to "act first, then invite review" on "Changeover Day".
  • Minimises disruption to users: database downtime could be reduced by doing some of the migration while people were using the site.
  • The transition can begin sooner, since no delay is needed while users prepare their accounts - any manual account merging can be done after the initial changeover date.

Cons:

  • Requires a more complex software design, such as the one below.
  • As people get used to logins being global, they may be confused by the continued existence of different people with the same name.


Abstracted user table access

To avoid every part of the code having to have cases for migrated and non-migrated users, the user database could be accessed only through an abstracted class - either a redesign of User.php, or a special UserDatabase.php. This class would deal with selecting the correct source of info, and translate it such that other code could be naive of its source.

  • Some global variables could be set that enabled the access functions to know which project the request applies to.
  • For user_id ≥ $FIRST_SSO, the unified table would be used (where $FIRST_SSO is the first user_id assigned in the unified DB).
    • If project-specific values, e.g. user_rights, are requested, they would be filtered from the unified table and returned as though they applied globally (e.g. a user_rights value of "sysop_wp_en" would be returned as "sysop" for a request from the English Wikipedia, but as "" for any other project)
  • For user_id < $FIRST_SSO, the old individual project tables would be used (copied into the same actual database at merge-time); the calling code would not need to know that such an access had been made.
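The dispatch described above is sketched here in Python rather than in MediaWiki's own PHP. The class name echoes the proposed UserDatabase.php, but the constructor arguments, the row layout and the handling of global rights such as "bot" are assumptions made for illustration.

```python
class UserDatabase:
    """Single access point that hides where a user's record is stored."""

    GLOBAL_RIGHTS = {"bot"}  # assumption: these are stored without an affix

    def __init__(self, first_sso, unified, local_tables, project):
        self.first_sso = first_sso        # first user_id in the unified DB
        self.unified = unified            # {user_id: row}
        self.local_tables = local_tables  # {project: {user_id: row}}
        self.project = project            # project the request applies to

    def _local_rights(self, stored):
        """Present per-project rights as though they applied globally."""
        suffix = f"_{self.project}"
        rights = set()
        for right in stored:
            if right in self.GLOBAL_RIGHTS:
                rights.add(right)
            elif right.endswith(suffix):
                rights.add(right[: -len(suffix)])
        return rights

    def get_user(self, user_id):
        """Return {'name', 'rights'} for the user, or None if unknown."""
        if user_id >= self.first_sso:
            row = self.unified.get(user_id)
            if row is None:
                return None
            return {"name": row["name"],
                    "rights": self._local_rights(row["rights"])}
        # Unmigrated account: read this project's old table unchanged
        row = self.local_tables.get(self.project, {}).get(user_id)
        if row is None:
            return None
        return {"name": row["name"], "rights": set(row["rights"])}
```

Calling code only ever sees a name and a plain set of rights, so it cannot tell which table answered the request.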

Pros:

  • The rest of the code doesn't need to know about the single login system, whether an account has been "migrated" or not, or even whether the single-login system is in use.
    • This allows other people to use MediaWiki without a separate user database, since only one class would need to maintain support for such a setup. Without this, it will be a tough call how long to continue supporting such a configuration.
    • Anyone wanting to unify MediaWiki and some other database-driven system can have the User information fetched from wherever they like, or in whatever combination they like, by changing only this one class.
  • Assuming some code is going to be needed for dealing with the transition, this approach will actually be cleaner and more maintainable, since it isolates the code to one part of the software.

Cons:

  • It's still more complicated (especially to maintain in the long-term) than having a run-once-and-pray conversion system, and assuming only the unified DB will ever be used again.

Security

Local active users

There are also many users who don't follow more than one wiki (usually the wiki in their native language). If I follow only it.wiki, what happens if someone steals my password and vandalizes en.wiki? I can't easily notice it and change my password, since I don't look at en:Special:Recentchanges or similar. So, how could we deal with this?

In some ways, single login would make this less of a problem: right now, I could find out your username (on it.wikipedia) and register an account on en.wikipedia without needing any other information about you. If your username was recognisable, any vandalism I performed under that account could then be blamed on you. With a single login system, however, I would not be able to do this without first stealing your password - and as with any system, both the software and the users should always do everything they can to make that as hard as possible.