Research talk:Newly registered user

From Meta, a Wikimedia project coordination wiki

Work Log

Archive

Discussion

User names Versus Count

The outcome of the metric should be the number of registered users in a given project for the selected time range, rather than a list of user names. NRuiz (WMF) (talk) 15:33, 24 April 2014 (UTC)


Bots excluded?

Should bots be excluded from this measure? --Halfak (WMF) (talk) 17:26, 4 December 2013 (UTC)

I would say yes, definitely they should.--GlimmerPhoenix (talk) 18:48, 14 February 2014 (UTC)
Halfak (WMF), GlimmerPhoenix: I'd like to push back on this suggestion for the following reasons:
  1. new bot registrations should account for a negligible fraction of new user registrations in any given time period
  2. bot status is a property of a user that can change retrospectively. Until we have a different and dedicated process for bot registrations (there have been discussions on whether mandatory API keys are needed, for example), we're going to have to constantly update historical data to account for regular accounts switching to the bot group. I don't think this is worth the effort.
That said, bot identification is going to be a very high-priority item for the other categories of metrics; I just don't think it's critical for this specific metric (at least until we start seeing bulk bot registrations). DarTar (talk) 19:47, 18 March 2014 (UTC)

log_action='newusers'

At least in the German Wikipedia, the logging table has a 4th type of entry corresponding to log_type='newusers'. This 4th type of entry also has the same value ('newusers') in the log_action column. A count (on an old dump) returns more than 85K entries with this combination. Do we know which use case triggers this 4th type of entry? Thanks. --GlimmerPhoenix (talk) 18:48, 14 February 2014 (UTC)

GlimmerPhoenix: Aaron and I looked into this yesterday (sorry I missed your comments on the talk page) and it appears that this was the log_action associated with regular account creations for a short time window (September 2005 - April 2006) until the new log_action (create) was introduced. We started a page to document known anomalies in historical data stored in MediaWiki's database. Contributions are very welcome --DarTar (talk) 19:51, 18 March 2014 (UTC)
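A minimal sketch of how a count could treat the legacy log_action as a regular account creation, run against an in-memory SQLite table rather than the production MySQL database. The exact window bounds, the table contents, and the names `SELF_CREATED_SQL` and `self_created_count` are assumptions for illustration; only the column names and the approximate September 2005 to April 2006 window come from the discussion above.

```python
import sqlite3

# Toy stand-in for MediaWiki's logging table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_type TEXT, log_action TEXT, "
    "log_timestamp TEXT, log_title TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?)",
    [
        # Made-up rows, not real data.
        ("newusers", "newusers", "20051103120000", "LegacyUser"),
        ("newusers", "create", "20140102080000", "ModernUser"),
        ("newusers", "create2", "20140102090000", "CreatedByOther"),
        ("delete", "delete", "20140102100000", "Some_page"),
    ],
)

# Count 'create' entries plus legacy 'newusers' entries inside the window.
# The window bounds below are an assumed reading of "September 2005 - April 2006".
SELF_CREATED_SQL = """
SELECT COUNT(*)
FROM logging
WHERE log_type = 'newusers'
  AND (log_action = 'create'
       OR (log_action = 'newusers'
           AND log_timestamp BETWEEN '20050901000000' AND '20060430235959'))
"""

self_created_count = conn.execute(SELF_CREATED_SQL).fetchone()[0]
print(self_created_count)
```

Restricting the legacy action to its time window keeps any stray 'newusers' entries from other periods out of the count; whether that restriction is needed on a given wiki would have to be checked against its data.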

Sensitivity analysis

The current proposal makes a number of assumptions but does not yet present a sensitivity analysis. For example, we could analyze the impact of including or excluding attached users, bot registrations, and proxy registrations in the definition. DarTar (talk) 23:09, 18 March 2014 (UTC)
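One way such a sensitivity check could look for the bot criterion alone is sketched below, using an in-memory SQLite database as a stand-in. The tables are reduced to the columns needed, the rows are invented, and the helper `count_registrations` is hypothetical; it also assumes that log_user identifies the newly created account for 'create' entries and that bots can be found through the 'bot' group in user_groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE logging (log_type TEXT, log_action TEXT, log_user INTEGER);
    CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
    """
)
# Four invented self-created accounts; user 3 was later added to the bot group.
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?)",
    [("newusers", "create", uid) for uid in (1, 2, 3, 4)],
)
conn.execute("INSERT INTO user_groups VALUES (3, 'bot')")


def count_registrations(exclude_bots: bool) -> int:
    """Count self-created accounts, optionally dropping current bot-group members."""
    sql = (
        "SELECT COUNT(*) FROM logging "
        "WHERE log_type = 'newusers' AND log_action = 'create'"
    )
    if exclude_bots:
        sql += (
            " AND log_user NOT IN "
            "(SELECT ug_user FROM user_groups WHERE ug_group = 'bot')"
        )
    return conn.execute(sql).fetchone()[0]


with_bots = count_registrations(exclude_bots=False)
without_bots = count_registrations(exclude_bots=True)
# Share of the metric that depends on the bot criterion.
relative_change = (with_bots - without_bots) / with_bots
print(with_bots, without_bots, relative_change)
```

Because group membership reflects the current state, rerunning this later can change historical counts, which is the retrospective-update concern raised in the bots thread.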

Qs from the Analytics Developers

The output of the SQL includes usernames; however, we are interested in just a daily count. Is there a particular reason the code doesn't return counts? This applies to the other metrics as well. KLeduc (WMF) (talk) 21:30, 24 April 2014 (UTC)

Mostly because I didn't know that you guys wanted counts. Can you provide me with a spec of what you expect from each of the metric SQL statements so that I can fix the SQL once and be done? --Halfak (WMF) (talk) 20:19, 25 April 2014 (UTC)
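For illustration, a daily-count variant of the query might look like the sketch below. It runs on an in-memory SQLite table with invented rows; `DAILY_COUNT_SQL` and `daily_counts` are hypothetical names, and `substr` is the SQLite spelling (on MySQL, `LEFT(log_timestamp, 8)` would play the same role). This is not the metric's actual SQL, only an assumed shape for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_type TEXT, log_action TEXT, "
    "log_timestamp TEXT, log_title TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?)",
    [
        # Invented rows; timestamps use MediaWiki's YYYYMMDDHHMMSS format.
        ("newusers", "create", "20140424101500", "UserA"),
        ("newusers", "create", "20140424183000", "UserB"),
        ("newusers", "create", "20140425070000", "UserC"),
        ("newusers", "create2", "20140425080000", "UserD"),  # not self-created
        ("newusers", "create", "20140427000000", "UserE"),   # outside the range
    ],
)

# One row per day with the number of self-created accounts, no user names.
DAILY_COUNT_SQL = """
SELECT substr(log_timestamp, 1, 8) AS day, COUNT(*) AS registrations
FROM logging
WHERE log_type = 'newusers'
  AND log_action = 'create'
  AND log_timestamp >= ?
  AND log_timestamp < ?
GROUP BY day
ORDER BY day
"""

daily_counts = conn.execute(
    DAILY_COUNT_SQL, ("20140424000000", "20140426000000")
).fetchall()
print(daily_counts)
```

Days without any registrations produce no row in this form, so a consumer that needs a value for every day would have to fill in zeros itself.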

Difference between sample queries for wiki's logging table and Eventlogging

COUNT(*)-ing the lines emitted by the two Sample queries for “local”, I obtain a difference of ~0.7% (checked with enwiki, dewiki, elwiki) between the queries. In particular, since enwiki has a count of ~150K, it looks like the queries are measuring different things. What is causing this difference? --QChris (talk) 12:38, 30 April 2014 (UTC)

As far as I can tell (see my work), these are records that were dropped from EventLogging, but appear in the production database. --Halfak (WMF) (talk) 19:44, 19 May 2014 (UTC)
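A comparison of this kind can be sketched as a set difference over user IDs from the two sources. The function `compare_sources` and the ID sets below are hypothetical and synthetic; the numbers are chosen only so that the relative difference lands near the ~0.7% mentioned above.

```python
def compare_sources(logging_ids: set, eventlogging_ids: set):
    """Return IDs unique to each source and the relative difference in counts."""
    if not logging_ids:
        raise ValueError("logging_ids must not be empty")
    only_in_logging = logging_ids - eventlogging_ids
    only_in_eventlogging = eventlogging_ids - logging_ids
    relative_difference = (len(logging_ids) - len(eventlogging_ids)) / len(logging_ids)
    return only_in_logging, only_in_eventlogging, relative_difference


# Synthetic example: 1000 registrations in the logging table,
# 7 of which never made it into the EventLogging data.
logging_ids = set(range(1, 1001))
eventlogging_ids = logging_ids - {17, 250, 251, 600, 731, 902, 999}

missing, extra, rel_diff = compare_sources(logging_ids, eventlogging_ids)
print(len(missing), len(extra), rel_diff)
```

If the discrepancy is purely dropped events, the second set should be empty; a non-empty second set would suggest the two queries also differ in what they select.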

Selection criterion for self-created accounts

As there are users having a log entry with matching log_type = 'newusers' AND log_action = 'create' for the same username on more than one project (e.g.: enwiki and dewiki, both in 2014), is checking for log_type and log_action selective enough? (Or how could users create such log entries in two different projects?) --QChris (talk) 12:55, 30 April 2014 (UTC)

Hey QChris, I'm not sure what you are referring to here, but the logging table is project database specific. So, in other words, it's no concern that users may register with the same name on different wikis. If you could show an example of such an entry after the deployment of central auth, we should take a look at it with csteipp. --Halfak (WMF) (talk) 19:22, 19 May 2014 (UTC)
Hi Halfak, sorry for the vagueness. Let's have an example:
  SELECT * FROM enwiki.logging WHERE log_id = 54002013;
  SELECT * FROM dewiki.logging WHERE log_id = 58580076;
The first one is from enwiki, the second from dewiki. (Username etc is probably not secret, but I prefer to not paste concrete data in here.) --QChris (talk) 12:26, 21 May 2014 (UTC)
QChris, I think I see now. Since this measure is generated on a per-wiki basis it shouldn't be a problem. Looking through centralauth, this user did go through the regular account registration process on both dewiki and enwiki, but later associated the local accounts with their global account via password. This is an unusual case, so I don't expect that it will have substantial implications for this method of filtering R:Attached users. In fact, this user is "attached" the expected way on 42 different wikis (log_action = "create2"). However, in the future, we may choose to reference the centralauth database to look for evidence of post-registration attachment via password. --Halfak (WMF) (talk) 16:23, 23 May 2014 (UTC)
@CSteipp: see above – any clue why this happened? --DarTar (talk) 19:42, 23 May 2014 (UTC)