P(3|2) seems fairly conclusively down now:
That’s very interesting! If there’s a real phenomenon behind that, we’ll really need to understand it. I find myself basically not believing it; I suspect some weird confounding effect, such as a change in what “counts” as a notification or a session.
P(did session 1 | eventually collected 80% of an essay):
P(did session 3 | eventually collected 80% of an essay):
I’m having a lot of trouble squaring this with the data we saw earlier, and with the MAU/retention data. I'll need to do a much more careful analysis.
Quick take, after a bunch of exploratory analysis: there’s so much month-to-month variation in compliance that a natural experiment may not be able to isolate the effect, so it may be necessary to run an RCT.
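If we do run an RCT, one common way to assign arms is to hash each user ID with an experiment-specific salt, so assignment looks random but is reproducible. A minimal Python sketch; `EXPERIMENT_SALT`, `assign_arm`, and the 50/50 split are hypothetical choices, not anything we have in place:

```python
import hashlib

# Hypothetical experiment salt; changing it reshuffles every user.
EXPERIMENT_SALT = "new-user-journey-rct"


def assign_arm(user_id: str, salt: str = EXPERIMENT_SALT) -> str:
    """Deterministically map a user to 'control' or 'treatment' (about 50/50)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # roughly uniform in 0..99
    return "treatment" if bucket < 50 else "control"


# Sanity check: the split should be close to even over many users.
arms = [assign_arm(f"user-{i}") for i in range(10_000)]
print(arms.count("treatment"), arms.count("control"))
```

Because both arms would be enrolled over the same weeks, month-to-month swings in compliance would hit them equally, which is exactly what the before/after comparison can't guarantee.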
Here’s a month-cohort per-session compliance series:
If that trend continues, we’ll have cut a large amount of review time for readers.
P(3|2) is now at 84±1.7% (N=1778) -> 82±8.1% (N=84). We’ll get more samples soon: 145 new-experience readers have finished their second session, but so far we only have third-session data for a little more than half of them.
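N lags the number of readers who finished session 2 because a reader only enters the denominator once their outcome is known. A minimal Python sketch of that rule, assuming the same one-week cutoff the BigQuery query in this thread uses for session 2 (whether the 3|2 numbers use the same cutoff is an assumption); `SessionRecord` and its fields are hypothetical stand-ins for rows of `logs.compliance`:

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: mirrors the `hoursLate >= 24*7` cutoff in the BigQuery query.
LAPSE_THRESHOLD_HOURS = 24 * 7


@dataclass
class SessionRecord:
    """Hypothetical in-memory row: one reader's status for one session."""
    user_id: str
    studied_at: Optional[float]  # None if the reader hasn't done the session
    hours_late: float            # hours past when the session was due


def outcome(rec: SessionRecord) -> str:
    if rec.studied_at is not None:
        return "completed"
    if rec.hours_late >= LAPSE_THRESHOLD_HOURS:
        return "lapsed"   # a week late: counted as not complying
    return "pending"      # too early to tell: left out of N


def compliance(records):
    """Return (fraction, N), excluding pending readers from the denominator."""
    outcomes = [outcome(r) for r in records]
    n = sum(o != "pending" for o in outcomes)
    completed = sum(o == "completed" for o in outcomes)
    return (completed / n if n else float("nan")), n


readers = [
    SessionRecord("a", studied_at=1.0, hours_late=0.0),
    SessionRecord("b", studied_at=None, hours_late=200.0),
    SessionRecord("c", studied_at=None, hours_late=10.0),
]
print(compliance(readers))  # (0.5, 2): reader "c" is still pending
```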
We have a little more resolution on 2|1: 79±1.7% (N=2277) -> 86±5.2% (N=169), up from N=125 and an interval of 5.5% on 12/05.
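The ± figures look like the normal-approximation (Wald) half-width that the query in this thread computes as `CI95`. A minimal Python sketch that recomputes them from the rounded percentages (so the last digit can differ slightly from the query output); `wald_ci` is a hypothetical helper:

```python
import math

Z_95 = 1.96  # same constant as the CI95 column in the query


def wald_ci(fraction: float, n: int, z: float = Z_95):
    """Wald interval: fraction ± z * sqrt(fraction * (1 - fraction) / n)."""
    half_width = z * math.sqrt(fraction * (1 - fraction) / n)
    return fraction - half_width, fraction + half_width, half_width


for label, p, n in [("2|1 old", 0.79, 2277), ("2|1 new", 0.86, 169)]:
    lo, hi, hw = wald_ci(p, n)
    print(f"{label}: {p:.0%} ± {hw:.1%} (N={n}) -> [{lo:.3f}, {hi:.3f}]")
```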
Looking again at session 1 compliance:
To get a leading indicator of the impact on retention, let’s look at P(did session 2 | did session 1) for users who joined before vs. after we introduced the new user journey elements. There still isn’t enough N among readers with 80% of cards in their first session, so I’ll do this analysis across all readers.
The CI is pretty wide on the new cohort, but the intervals don’t overlap; that looks like a real shift.
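Non-overlapping 95% intervals do imply a significant difference, but the check is conservative: intervals can overlap even when the difference is significant. A sketch that runs both the overlap check and a pooled two-proportion z-test (a standard alternative, not something the original analysis used) on the 2|1 numbers. The proportions are the rounded percentages quoted above, so treat the output as approximate; the helper names are hypothetical:

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% interval, same formula as CI95 in the query."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


old, new = (0.79, 2277), (0.86, 169)  # P(2|1) before -> after
overlap = intervals_overlap(wald_interval(*old), wald_interval(*new))
z, p_value = two_proportion_z(*old, *new)
print(f"overlap={overlap}, z={z:.2f}, p={p_value:.3f}")
```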
There may be some influence of the teleportation essay here: maybe people who just read that essay are more likely to do a second session because they could review the whole thing in their first one? I could tell the story the other way pretty easily: many people who just read teleportation will have a huge interval between their first and second session, which is ample time to churn.
What about P(did session 3 | did session 2)? Do we have enough samples? Not quite: we went from 84% ± 1.7% (N=1774) to 88% ± 8.7% (N=52).
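A rough way to answer “do we have enough samples?” is to invert the interval formula. A minimal sketch, assuming the same normal approximation as the query. `n_for_half_width` and `n_per_arm` are hypothetical helpers, the ±3% target is an illustrative choice, and `n_per_arm` is the standard equal-arm formula, so it is more relevant to planning an RCT than to this lopsided before/after comparison:

```python
import math
from statistics import NormalDist


def n_for_half_width(p, half_width, confidence=0.95):
    """Sample size for a Wald interval of the given half-width at proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


def n_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Equal-arm sample size to detect p1 vs. p2 with a two-sided z-test."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(root ** 2 / (p1 - p2) ** 2)


print(n_for_half_width(0.88, 0.03))  # new-cohort N for a ±3% interval
print(n_per_arm(0.84, 0.88))         # per-arm N to detect 84% vs. 88%
```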
-- P(did session 2 | did session 1), old vs. new user-journey cohorts.
WITH
  -- Users who completed session 1.
  eligible AS (
    SELECT
      userID
    FROM
      `logs.compliance`
    WHERE
      sessionNumber = 1
      AND studyTimestamp IS NOT NULL),
  -- Cohort by registration time.
  conditions AS (
    SELECT
      userID,
      CASE
        WHEN timestamp >= TIMESTAMP("2019-11-12") THEN "new"
        WHEN timestamp < TIMESTAMP("2019-10-01") THEN "old"
        ELSE -- Throwing out users who would have seen some partial version of the new experience.
          NULL
      END AS bucket
    FROM
      `logs.registeredUsers`),
  -- Session 2 compliance. A user enters the denominator once they've done
  -- session 2 or are at least a week (24*7 hours) late for it.
  means AS (
    SELECT
      bucket,
      COUNTIF(studyTimestamp IS NOT NULL
        OR hoursLate >= 24*7) AS N,
      COUNTIF(studyTimestamp IS NOT NULL) / COUNTIF(studyTimestamp IS NOT NULL
        OR hoursLate >= 24*7) AS fraction
    FROM
      `logs.compliance`
    JOIN
      eligible
    USING
      (userID)
    JOIN
      conditions
    USING
      (userID)
    WHERE
      sessionNumber = 2
      AND bucket IS NOT NULL
    GROUP BY
      bucket),
  -- Normal-approximation (Wald) 95% interval half-width.
  cis AS (
    SELECT
      *,
      1.96 * SQRT((fraction * (1 - fraction)) / N) AS CI95
    FROM
      means)
SELECT
  *,
  fraction - CI95 AS lower,
  fraction + CI95 AS upper
FROM
  cis
ORDER BY
  bucket
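With N in the dozens and compliance near 90%, the Wald interval this query computes is known to be unreliable: it can undercover, and in general it can even extend past 100%. The Wilson score interval is a common replacement. This is a swapped-in technique, not what the query does. A minimal Python sketch comparing the two on the 88% (N=52) P(3|2) figure; `wilson_interval` and `wald_interval` are hypothetical helpers:

```python
import math


def wilson_interval(fraction, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    denom = 1 + z ** 2 / n
    center = (fraction + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(fraction * (1 - fraction) / n
                               + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


def wald_interval(fraction, n, z=1.96):
    """The query's CI95: fraction ± z * sqrt(fraction * (1 - fraction) / n)."""
    half_width = z * math.sqrt(fraction * (1 - fraction) / n)
    return fraction - half_width, fraction + half_width


print("wald:  ", wald_interval(0.88, 52))
print("wilson:", wilson_interval(0.88, 52))
```

The Wilson interval is asymmetric and pulled toward 50%, which matters most for the small new-cohort samples. The same expression could presumably be written into the `cis` CTE if we wanted the query to report it directly.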