The chat logs are already published in the vault so I don't see how this is any different tbh
In-game chat dump from 631,711 replays and 23,929 players
You don't see a difference between browsing through chat logs manually and mass-profiling single users and publishing the results?
"Nerds have a really complicated relationship with change: Change is awesome when WE'RE the ones doing it. As soon as change is coming from outside of us it becomes untrustworthy and it threatens what we think of is the familiar."
- Benno Rice
I mean, it's just presenting public information in a more organized and readable way. It's the same difference as between looking at rating changes game by game in the replay vault and using a tool that shows you each user's rating changes every day, or other similar tools, e.g. Kazbek's tool that lets you see what maps a specific user plays. Yes, in theory, it can be used in a harmful way, idk, shaming someone for trash-talking people or swearing at them or something.
How would FAF not be liable for some information issue here?
They:
- have made replay information public for everyone
- have made a parser to allow you to, uh, parse this information and even included instructions on how to use it
No idea about Europe but in the US there is a liability doctrine that doesn't let you just give a person tools, say "don't do that bad thing with the tools" and then wash your hands when you put zero effort into making it difficult to actually do said bad thing.
The only thing FAF hasn't done is give you step-by-step instructions on how to download replays from the vault to then use the tool.
I mean I don't get the issue in the first place, do people have legal ownership over the words they write in game or something? Wouldn't this already make the replay vault a "legal liability" unless you requested consent before publishing any replay?
Also, "You are not allowed to analyze single person behaviors or do a social rating and publish this (or basically do anything that relates back to a single user)" isn't this essentially what moderation does? Don't report results get reported back to the person that made the report? That's a publication of the analysis of a singular person's behavior.
I don't mean to have caused any legal trouble here, I was just interested in some data analysis. I should probably have started with things other than text chat first and gauged the response, but text chat was the easiest for me to parse and make sense of.
From my perspective the information is already highly available publicly in the replay vault.
I do understand open source data can become sensitive when massed together.
Perhaps we need a disclaimer that replays and all the information they contain are publicly available, along with name history, rating history and anything else? It has always been very obvious to me that they are, but to others it may not be.
Legality aside, there are clear morality concerns. A LOT of miscellaneous personal information is public on the internet if you try hard enough to search for it, but collecting and publishing it on public forums is not really ok. The argument that "it's already public" only stands up if you delve into technicalities. And if we had some certain moderator still active, giving them this kind of idea would likely result in a mass mega ban or a big drama fest.
https://www.twitch.tv/petricpwnz
Scientifically proving that Blackheart is a weeb - https://imgur.com/a/J436c | https://clips.twitch.tv/AssiduousAverageOxMikeHogu
Everything is open assuming good will:
a) don't misuse the data
b) don't cause performance issues on the server
As long as everybody behaves we're good. If I see misuse I'll shut it down / make it unavailable to the public.
So far no lines were crossed, but I hope I made my point clear where the red lines are.
So @Nooby you did not cause any trouble yet. I just tried to proactively step in before things go in the wrong direction.
APOLOGIES THIS POST IS IN CODE FORMAT - it was the only way I could show my post while keeping the tabulation of the word & count tables.
Currently learning Python and decided to play with and analyse this dataset just out of curiosity.
It contains 20,405,216 words, spread across 23,929 files (representing that number of games) for a total data size of 112+MB
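For anyone who wants to try a similar count, here is a minimal sketch of how the totals could be produced. It is not necessarily how the numbers in this post were generated: the directory name, the `*.txt` pattern, the one-log-per-file layout and the tokenizing rule are all assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Word characters with optional inner apostrophes, so "t3" and "don't" stay whole.
# Assumption: this tokenizing rule is illustrative, not the one used for the post.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def count_words(directory, pattern="*.txt"):
    """Return (word counts, number of files read) for every matching file.

    Assumption: each chat log is a plain-text file inside `directory`.
    """
    counts = Counter()
    files = 0
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" keeps one badly encoded log from aborting the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(tokenize(text))
        files += 1
    return counts, files
```

With a hypothetical folder called `chat_dump`, `counts, files = count_words("chat_dump")` returns the per-word counts and the number of files read, and `sum(counts.values())` gives the total word count.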
I removed the 2,760,185 non-English words, as I am only able to speak one language. So that's all the Russian, German, etc. words removed.
So there are 17,645,031 English words remaining, let's look at these.
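The split can be sanity-checked: 17,645,031 + 2,760,185 = 20,405,216, which matches the total above. Below is a rough sketch of one way to do such a split and print a table. The vocabulary-set approach is an assumption (the post does not say how non-English words were identified), and `split_by_vocabulary` and `format_table` are hypothetical helper names.

```python
from collections import Counter


def split_by_vocabulary(counts, vocabulary):
    """Split word counts into (kept, removed) by membership in `vocabulary`.

    Assumption: `vocabulary` is a set you supply yourself, e.g. an English
    word list plus game slang such as "gg" or "t3".
    """
    kept, removed = Counter(), Counter()
    for word, n in counts.items():
        target = kept if word in vocabulary else removed
        target[word] = n
    return kept, removed


def format_table(counts, top=10):
    """Render the `top` most common words as a fixed-width WORD/COUNT table."""
    lines = [f"{'WORD':<11} {'COUNT':>7}", f"{'-' * 11} {'-' * 7}"]
    for word, n in counts.most_common(top):
        lines.append(f"{word:<11} {n:>7}")
    return "\n".join(lines)
```

Whatever filter is used, `sum(kept.values()) + sum(removed.values())` should equal the original total, which is a cheap check that no words were lost along the way.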
None of the following proves anything, I just thought it would be interesting to have a look.
What are the actual most commonly used words?
WORD          COUNT
----------- -------
to          1464477
sent        1305978
mass         743675
energy       667411
you          274816
me           263903
i            245113
give         223129
gg           179505
can          151937
the          131350
Nothing too surprising there. Let's look at some other word counts now.
Other words of note very commonly used:
WORD          COUNT
----------- -------
air           92538
units         83868
lol           66198
unit          60286
t3            51683
need          49956
dont          48234
why           30133
help          25714
How friendly are the games?
WORD          COUNT
----------- -------
pls           63084
gl            41712
hf            39946
plz           26900
ty            26191
nice          26087
please        26005
glhf          19296
thx           16672
sorry         15812
thanks         8638
sry            5498
How toxic are the games? Actually, not as much as I might have worried.
WORD          COUNT
----------- -------
fuck          21997
shit          19482
fucking       16963
frustrating   16911
fucked         6089
ffs            5988
damn           5585
idiot          5494
ass            3068
asshole         800
What about issues in the game?
WORD          COUNT
----------- -------
lag           13433
re            19892
kick          10327
afk            5564
lagging        5035
eject          4792
lags           4251
How often are the game enders mentioned?
WORD          COUNT
----------- -------
nuke          28959
mavor         13966
para           6724
paragon        3185
scathis        4679
yolo           4106
novax          1669
yolona         1296
salvation      1047
And the experimental units?
WORD          COUNT
----------- -------
spider         6160
monkey         4272
gc             4031
chicken        2145
fatboy         2340
czar           1874
mega           1650
fatty          1335
tempest        1188
ahwassa        1083
monkeylord      501
megalith        473
ythotha         449
ripper          384
atlantis        379
asswasher       348
colossus        312
soulripper       55
Which races get talked about most? Presumably due to asking for engineers to make Hives and Kennels:
WORD          COUNT
----------- -------
cybran        13614
uef           13065
aeon           8793
sera           5768
seraphim       1088 (much faster to just type sera!)
How are the commanders referred to?
WORD          COUNT
----------- -------
com           11785
acu           11555
Does playing FAF give you headaches? Because ibuprofen is mentioned 187 times.
@scout_more_often Don't apologize! This looks amazing dude! Could even make a graphic out of this data! A FAF Interesting Chat Facts sheet!
FAF Website Developer