2 Apr 2021 |
n8fr8 | We have a bunch of ideas there, and can help you sort these things out. The actual code implementation or log configuration isn't hard... moving away from "spy on and record everything, and then we'll figure it out later" is. | 17:00:31 |
simonft | Thanks! We're currently doing something similar to the matomo proxy, but instead nginx is sending IP-less logs to a script that sends them up to matomo. This gets us raw pageviews, but the rest of the data in matomo ends up being quite a bit less useful | 17:08:01 |
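A minimal sketch of the "nginx logs to a script to Matomo" pipeline, assuming the standard nginx `combined` log format and the Matomo Tracking HTTP API (`matomo.php` with `idsite`, `rec`, `apiv`, `url`, `urlref`, `ua`). The endpoint, site id, and site URL below are hypothetical placeholders. The address field is matched but never forwarded, and no `cip` parameter is sent, so by default Matomo only ever sees the forwarding script's own address.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# nginx "combined" format; the leading remote_addr field is matched but discarded.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

MATOMO_ENDPOINT = "https://matomo.example.org/matomo.php"  # hypothetical host
SITE_ID = 1  # hypothetical Matomo site id
SITE_BASE = "https://www.example.org"  # hypothetical public site URL


def to_tracking_url(line: str) -> Optional[str]:
    """Turn one access-log line into a Matomo tracking URL that carries no visitor IP."""
    m = COMBINED.match(line)
    if m is None or m["method"] != "GET" or not m["status"].startswith("2"):
        return None  # skip unparsable lines, non-GET requests and non-2xx responses
    params = {
        "idsite": SITE_ID,
        "rec": 1,
        "apiv": 1,
        "url": SITE_BASE + m["path"],
        "ua": m["ua"],
    }
    if m["referer"] != "-":
        params["urlref"] = m["referer"]
    # No "cip" parameter: the visitor's address is never sent to Matomo.
    return MATOMO_ENDPOINT + "?" + urlencode(params)
```

Because nothing per-visitor beyond the user agent is forwarded, Matomo's visitor-level reports (visits, returning visitors, locations) degrade, which matches the loss of usefulness described above.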
n8fr8 | Yes, we've had the same issue, with Matomo somewhat quickly losing its usefulness, at least in its default dashboard configuration. | 17:09:05 |
n8fr8 | Some of the work we are doing is related to custom events, so you can decide you want to understand use of a certain feature on the site, or interaction with aspects of a specific story or content | 17:09:41 |
n8fr8 | This is what F-Droid did for their app popularity contest, for instance, or to try and understand how often users are failing to install an app | 17:10:04 |
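A minimal sketch of custom-event counting that keeps only aggregate tallies, assuming events are described by a hypothetical `(category, action)` pair such as `("install", "failed")`. This is not the Clean Insights SDK API or F-Droid's implementation; it only illustrates the idea of recording "how often" without recording "who".

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EventTally:
    """Hypothetical per-day tally of feature events; no user, device or session ID is kept."""
    day: date
    counts: Counter = field(default_factory=Counter)

    def record(self, category: str, action: str) -> None:
        # Only the event type is counted; nothing identifies who triggered it.
        self.counts[(category, action)] += 1

    def report(self) -> dict:
        # Flatten to "category/action" -> count for sending or display.
        return {f"{c}/{a}": n for (c, a), n in sorted(self.counts.items())}
```

For example, recording two `("install", "failed")` events and one `("install", "success")` event yields a report of `{"install/failed": 2, "install/success": 1}`, which is enough to estimate a failure rate.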
simonft | Oh interesting. I think that would be useful to us. | 17:13:50 |
simonft | We've also been looking at self-hosting plausible. Last time I looked into it I had some concerns about backing up the visitor data without also backing up the rotating salt they're using to make it extremely difficult to brute force the hashes. They were thinking about ways to fix that though. | 17:14:00 |
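A minimal sketch of a rotating-salt visitor hash, modelled loosely on the scheme Plausible describes (a hash over a daily salt, the site domain, the IP address and the user agent). SHA-256 and the length-prefixing are arbitrary choices here, and `DailySalt` and `visitor_id` are hypothetical names. It shows why the backup concern matters: once a day's salt is discarded the hashes cannot be linked back to an IP, but if the salt is backed up next to the hashes, an attacker can simply enumerate the IPv4 space against common user agents.

```python
import hashlib
import secrets
from typing import Optional


class DailySalt:
    """Hypothetical salt holder: a fresh random salt per day, the previous one is dropped."""

    def __init__(self) -> None:
        self.day: Optional[str] = None
        self.salt = b""

    def for_day(self, day: str) -> bytes:
        if day != self.day:
            self.day = day
            self.salt = secrets.token_bytes(32)  # old salt is overwritten, never persisted
        return self.salt


def visitor_id(salt: bytes, domain: str, ip: str, user_agent: str) -> str:
    """Daily pseudonymous ID: stable within one salt period, unlinkable across periods."""
    h = hashlib.sha256()
    for part in (salt, domain.encode(), ip.encode(), user_agent.encode()):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguous concatenation
        h.update(part)
    return h.hexdigest()
```

Backing up only the aggregated counts, rather than the per-visitor hashes, sidesteps the problem, since there is then nothing left to brute force.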
simonft | And we've had on-and-off talks with Ian Goldberg about various possible differential privacy schemes if we want to get fancy. One of the things we're thinking about is how badly we want numbers of "unique" visitors, and how correct that number needs to be | 17:15:07 |
n8fr8 | Diff Privacy can start working well at a large scale, though as a way to get a specific count of visitors, yeah, it can be tricky. It could be useful to get a rough idea of typeface preference, or doing A/B testing of one layout versus another. | 17:20:41 |
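A minimal sketch of the textbook Laplace mechanism for a single count, shown as one example of a differential privacy scheme, not the one discussed with Ian Goldberg. It assumes each person changes the count by at most 1 (sensitivity 1), which is exactly where the "uniques" question bites: if one visitor can contribute many pageviews, the sensitivity and therefore the noise must grow. `noisy_count` is a hypothetical name.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query with sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # b = sensitivity / epsilon, with sensitivity 1 for a count
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise
```

The noise has a standard deviation of about 1.41/epsilon regardless of the count, so it is negligible relative to large totals but can swamp small ones. This is why differential privacy tends to work well at large scale and struggles with precise small counts.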
n8fr8 | I think it could be interesting to try to do some analysis of the raw page views to group them into sessions, and then estimate unique visitors from there. Overall, I will admit implementing Clean Insights (CI) for web visitors is not our strongest area, but we are putting more effort into it now, as it keeps coming up. Even those with apps want to be able to link them to web traffic, and have one system for both. | 17:23:09 |
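A minimal sketch of grouping raw page views into sessions using a 30-minute inactivity gap, a common but arbitrary convention. It assumes each view carries some coarse, hypothetical grouping key (for instance the user-agent string) since the logs contain no IPs. The result is only a rough proxy for visitors: many people sharing one key undercounts, and one person returning after a long gap overcounts.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def sessionize(
    views: Iterable[Tuple[str, datetime]],
    gap: timedelta = timedelta(minutes=30),
) -> List[List[datetime]]:
    """Split (key, timestamp) page views into sessions separated by more than `gap`."""
    by_key = defaultdict(list)
    for key, ts in views:
        by_key[key].append(ts)

    sessions: List[List[datetime]] = []
    for stamps in by_key.values():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > gap:
                sessions.append(current)  # inactivity gap exceeded: close the session
                current = [ts]
            else:
                current.append(ts)
        sessions.append(current)
    return sessions
```

The number of sessions (`len(sessionize(views))`) then serves as an upper-bound-style estimate of visits, from which a unique-visitor figure could be derived with further assumptions.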
gina_h | So glad you're here @simonft! | 19:13:04 |
simonft | glad to be here! and glad others are thinking about these problems | 19:37:32 |
3 Apr 2021 |
@eighthave:matrix.org | hi simonft, welcome! glad to have the publisher's perspective here. You might be interested in the comparison between the Clean Insights approach and differential privacy https://guardianproject.info/2021/03/02/new-insights-into-clean-analytics/ | 11:18:44 |
@eighthave:matrix.org | in other Clean Insights news, Apple is rejecting apps with SDKs that do not get user consent https://9to5mac.com/2021/04/01/app-store-now-rejecting-apps-using-third-party-sdks-that-collect-user-data-without-consent/ | 11:20:05 |
6 Apr 2021 |
simonft | _hc: thanks for that comparison! | 14:24:45 |
simonft | Do people here have thoughts on Prio? https://www.abetterinternet.org/prio/ | 14:25:07 |
@eighthave:matrix.org | Clean Insights does various kinds of aggregation like what is mentioned in that brief description. My first thought is that it sounds like we have the same goals | 14:30:15 |
@eighthave:matrix.org | they don't seem to have anything you can use yet | 14:30:29 |
@eighthave:matrix.org | it seems very focused on one specific part of the problem: the aggregation | 14:31:05 |
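A minimal sketch of the additive secret sharing at the core of Prio-style aggregation: each client splits its value into two random-looking shares sent to two non-colluding servers, each server sums what it receives, and only the combined totals reveal the aggregate. Real Prio also attaches zero-knowledge proofs (SNIPs) so servers can reject malformed shares; that part is omitted here. The modulus and function names are assumptions of this sketch, not the prio-server API.

```python
import secrets
from typing import List, Tuple

PRIME = 2**61 - 1  # a Mersenne prime used as the field modulus; arbitrary for this sketch


def share(value: int) -> Tuple[int, int]:
    """Split one client's value into two additive shares; either share alone is uniform."""
    r = secrets.randbelow(PRIME)
    return r, (value - r) % PRIME


def aggregate(shares: List[int]) -> int:
    """What each server computes independently: the sum of the shares it received."""
    return sum(shares) % PRIME


def combine(total_a: int, total_b: int) -> int:
    """Adding the two servers' totals reveals only the overall sum."""
    return (total_a + total_b) % PRIME
```

With five clients reporting `[1, 0, 1, 1, 0]` (for example "used feature X"), neither server learns any individual answer, yet `combine` returns 3.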
simonft | yeah. https://github.com/abetterinternet/prio-server is a thing but there's not a lot more details on how to actually run it | 14:31:45 |
simonft | they reached out to us but I don't have much info about why | 14:31:55 |
@eighthave:matrix.org | ocumentation
Integration Guide for Android and iOS Applications (Coming Soon)
How to Operate a Data Share Processing System (Coming Soon)
| 14:32:06 |
@eighthave:matrix.org | based on their paper, their use case is more like differential privacy, where you assume that you're aiming to collect PII | 14:33:41 |
@eighthave:matrix.org | we're trying to push analytics without gathering PII and see how far that can get us | 14:34:05 |
simonft | When you talk about PII, does that include some sort of unique device identifiers for the device/user? E.g. I could imagine estimating uniques, or say pagevisits per user, without collecting or storing things like name, username, location | 14:42:22 |
@eighthave:matrix.org | yes unique IDs are PII | 16:24:40 |
@eighthave:matrix.org | it's really hard to gather a lot of data tied to a pseudonym, then keep it anonymous | 16:25:10 |
@eighthave:matrix.org | unless the types of data gathered are really restricted | 16:25:27 |
@eighthave:matrix.org | I mean they are probably not considered PII by GDPR, unless things are deanonymized. | 16:26:14 |
@eighthave:matrix.org | avoiding tracking people makes a lot of other things much easier | 16:26:34 |