6.2 Comparing Different Sources for WhatsApp Public Groups on Web
6.2.2 Dataset of Discovered Groups
With that process, we built a dataset⁹ with more than 270,000 distinct public WhatsApp groups from four disclosure sources, as described in Table 6.7. We observed that online repositories hold a relatively larger number of groups than social networks, although we must take into account that the social network groups cover only a short collection period, whereas the online repositories contribute their whole database since 2018. In line with previous results obtained exclusively on Twitter, we observe a significant number of WhatsApp public groups in Brazil, with thousands of groups available online in distinct sources. In addition, Table 6.7 provides the number of unique users who shared groups and the total number of groups still active, as verified in March 2022.
When we observe the intersection of groups between the sets in the Venn diagram of Figure 6.14, we notice only a small intersection between them. This indicates a considerable difference between group sources and a degree of independence between them, in which groups found on a specific social network are unlikely to appear elsewhere. This observation is relevant because work that looks for public groups must draw on different data sources so as not to miss a significant portion of the data due to bias in the origin of the groups.
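The overlap between sources can be quantified with plain set operations. The following is a minimal sketch, not the procedure used in this work: it assumes each source has already been reduced to a set of unique group identifiers, and the function name `pairwise_overlap`, the Jaccard ratio, and the toy identifiers are illustrative assumptions.

```python
from itertools import combinations


def pairwise_overlap(groups_by_source):
    """Return {(a, b): (shared, jaccard)} for every pair of sources.

    `groups_by_source` maps a source name to a set of group identifiers.
    The Jaccard ratio is an illustrative choice, not a metric from the text.
    """
    result = {}
    for a, b in combinations(sorted(groups_by_source), 2):
        set_a, set_b = groups_by_source[a], groups_by_source[b]
        shared = len(set_a & set_b)   # groups present in both sources
        union = len(set_a | set_b)    # groups present in at least one source
        result[(a, b)] = (shared, shared / union if union else 0.0)
    return result


# Made-up identifiers purely for illustration (not real data).
example = {
    "Twitter": {"g1", "g2", "g3"},
    "Facebook": {"g3", "g4"},
    "Grupos de Zap": {"g5", "g6"},
}
overlap = pairwise_overlap(example)
```

A small shared count relative to the union for every pair would correspond to the weak overlap seen in the Venn diagram.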
⁹ The public groups found in the repositories are available at: <https://doi.org/10.5281/zenodo.7017909>
Figure 6.14: Venn diagram of the intersection of groups discovered in each source.
Source: The Author.
With this data, we first analyzed the volume of group sharing, as well as the number of groups still active. Then, we carried out an extensive investigation of the topics discussed in the groups shared in the repositories and on social networks. To explore the categories of groups shared on social networks, we used a statistical model with latent variables to detail the topics in the groups.
There is a difference between the total content collected and the number of unique groups found in each source, because the invite link to a group can be shared more than once in each dataset. Figure 6.15 shows the cumulative distribution function (CDF) of the number of occurrences of each invite URL found. On social media, it is common for different publications to post the same group invite link, whether it is a single user spamming their own group or multiple users sharing a single popular group through the platform. About half of the groups are advertised multiple times on social networks, while in repositories more than 95% of groups were registered only once. This suggests that, unlike group repositories, where groups stand on a more horizontal and equal footing, social networks show wide variation in the visibility each group receives on the platform. In other words, a hierarchy exists in which some groups may become more popular simply because they are shared more.
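As an illustration of how such a curve can be obtained, the sketch below counts how often each invite link appears and accumulates the fraction of groups seen at most k times. It is a minimal example under stated assumptions: the function name `occurrence_cdf` and the toy input are hypothetical, and the input is assumed to be a flat list with one entry per collected post.

```python
from collections import Counter


def occurrence_cdf(invite_links):
    """Map each occurrence count k to the fraction of groups shared at most k times."""
    per_group = Counter(invite_links)         # link -> number of times it was seen
    histogram = Counter(per_group.values())   # k -> number of groups seen exactly k times
    total = len(per_group)                    # number of unique groups
    cdf, running = {}, 0
    for k in sorted(histogram):
        running += histogram[k]
        cdf[k] = running / total
    return cdf


# Made-up links purely for illustration: "a" is posted three times, "c" twice.
posts = ["a", "a", "a", "b", "c", "c", "d"]
cdf = occurrence_cdf(posts)
```

In this toy input, `cdf[1]` is the share of groups posted exactly once, which is the quantity reported as above 95% for the repositories and about half for the social networks.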
Figure 6.15: Volume of occurrences for each WhatsApp group invite link in each dataset. (a) Twitter and Facebook. (b) Online repositories (Grupos de Zap and Grupos de Whats). Both panels plot the CDF against the number of occurrences per group on a logarithmic axis.
Source: The Author.
Furthermore, in online repositories, we have the date of inclusion of each group. With that, we can see in Figure 6.16 the number of groups created over time. The website “Grupos de Zap” has had groups since 2017, while “Grupos de Whats” is more recent, with the first groups dating from 2019. Although the creation of groups on “Grupos de Zap” had remained stable, after 2020, together with the groups from “Grupos de Whats”, we see a growing interest in public WhatsApp groups, with weeks in which more than a thousand new groups were created on both platforms, further highlighting the importance that this ecosystem has gained among Brazilians. The sudden increase in the creation of groups on these sites also coincides with the emergence of the COVID-19 pandemic, a period when long-distance relationships became more common and remote means of communication, such as WhatsApp, came to represent an important part of our day-to-day lives.
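The weekly series described above can be derived from the inclusion dates alone. The following sketch is one possible way to do it, assuming each group's inclusion date is available as a `datetime.date`; the function name `weekly_counts`, the choice of Monday as the start of the week, and the sample dates are illustrative assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(inclusion_dates):
    """Count groups per week, keyed by the Monday that starts each week."""
    counts = Counter()
    for d in inclusion_dates:
        week_start = d - timedelta(days=d.weekday())  # weekday() is 0 on Monday
        counts[week_start] += 1
    return dict(sorted(counts.items()))


# Made-up dates purely for illustration (not real data).
dates = [date(2020, 3, 2), date(2020, 3, 4), date(2020, 3, 9)]
weekly = weekly_counts(dates)
```

Plotting these counts per repository on a logarithmic axis would give a chart of the same kind as Figure 6.16.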
In addition to creating and publishing the invitation link to their group, admins can also revoke the invitation, making the group inaccessible. Therefore, not all links remain active after a certain time. The “Active Links” column of Table 6.7 shows the percentage of invitations still valid, as verified in March 2022. In practice, we notice that most public WhatsApp groups are available only for a limited period.
On Twitter, just over half of the invitations had been revoked 3 months after collection, while on Facebook 30% of the groups were already inaccessible. In the repositories, an even smaller percentage of links remains valid, given that they have registered older groups. These results show the dynamic and ephemeral nature of public groups, in which new groups are created and others become inaccessible in a short period of time.
Figure 6.16: Number of groups registered weekly in the online repositories of public WhatsApp groups (series: Grupos de Zap and Grupos de Whats; y-axis: number of created groups, logarithmic scale; x-axis: dates from 01/03/18 to 01/11/21).
Source: The Author.