Total blocked users #17

Open
ghost opened this issue Jun 15, 2023 · 3 comments

Comments

@ghost

ghost commented Jun 15, 2023

The total number of monthly active users on the instances that the current instance is BlockIng or is Blocked By.

This would be useful for choosing an instance with few total blocked users, so that you don't feel the urge to use another account to check whether there are comments you aren't seeing. I think it's important because BlockIng or being Blocked By a large community is very different from BlockIng or being Blocked By a small one.

@maltfield
Owner

@JediMaster25 Do you know if it's possible to get this information from the API?

@maltfield
Owner

maltfield commented Jun 15, 2023

Oh, you don't mean counting the number of blocked users that are not on the BI or BB list (because instances can block users explicitly in addition to blocking instances explicitly). You mean counting the number of users that are on the BI and BB list.

Or do you mean the sum of [a] explicitly blocked users, [b] all active users on all of the instances on the BI list, and [c] all active users on all of the instances on the BB list?
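To make the difference between these readings concrete, here is a minimal sketch on made-up numbers. The domains, the user counts, and the helper names `users_on` and `total_blocked_users` are all hypothetical. Counting an instance that is on both the BI and BB lists only once is an assumption, not something settled in this thread.

```python
from typing import Dict, List, Set

# Hypothetical toy data: monthly active users per instance domain.
ACTIVE_USERS: Dict[str, int] = {
    "a.example": 100,
    "b.example": 20,
    "c.example": 5,
}


def users_on(domains: Set[str], active_users: Dict[str, int]) -> int:
    """Sum monthly active users over a set of domains (unknown domains count as 0)."""
    return sum(active_users.get(domain, 0) for domain in domains)


def total_blocked_users(
    explicitly_blocked_users: int,
    blocking: List[str],
    blocked_by: List[str],
    active_users: Dict[str, int],
) -> int:
    """Reading [a] + [b] + [c], with each instance counted at most once."""
    # The set union keeps an instance that is on both lists from being counted twice.
    return explicitly_blocked_users + users_on(
        set(blocking) | set(blocked_by), active_users
    )


# Users on the BI and BB lists only (no explicitly blocked users).
bi_and_bb_only = total_blocked_users(0, ["a.example"], ["a.example", "b.example"], ACTIVE_USERS)
# Same lists, plus 3 explicitly blocked users.
everything = total_blocked_users(3, ["a.example"], ["a.example", "b.example"], ACTIVE_USERS)
print(bi_and_bb_only, everything)
```

With these numbers the two readings differ only by the explicitly blocked users, so the choice matters mostly on instances that ban many individual accounts.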

@ghost
Author

ghost commented Jun 16, 2023

The code first defines a blocking_instances property that returns a list of instances from the blocked_domains. It does this by iterating through the blocked_domains and calling the get_instances_by_domain function for each domain. The get_domains function creates a set that contains unique domains by combining the blocking_instances and blocked_by lists and extracting the "domain" key from each instance. The resulting set is then converted back to a list.

The get_active_users_count function takes an instance as input and returns the value of the users_active_month property. The calculate_total_blocked_users function takes a list of domains and calculates the total number of blocked users by iterating through the combined list of blocking_instances and blocked_by, checking if the instance's domain is in the provided domains list, and summing the users_active_month property of the matching instances. Finally, the blocked_users property calls the get_domains function to get the unique domains and then calls the calculate_total_blocked_users function to calculate the total number of blocked users.
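The calculation described above can also be condensed into one standalone function, which may be easier to check against toy data than the full script. This is only a sketch: `make_instance`, `blocked_users`, and the example domains are hypothetical, and the records contain only the fields the script reads. It assumes a blocked domain the crawler has no record of contributes zero users.

```python
from typing import Dict, List


def make_instance(domain: str, users: int, blocked: List[str]) -> Dict:
    """Build a toy record with only the fields used below."""
    return {
        "domain": domain,
        "site_info": {
            "federated_instances": {"blocked": blocked},
            "site_view": {"counts": {"users_active_month": users}},
        },
    }


def blocked_list(instance: Dict) -> List[str]:
    """Return the instance's blocklist, treating missing or null values as empty."""
    federated = instance["site_info"]["federated_instances"] or {}
    return federated.get("blocked") or []


def blocked_users(data: Dict, domain: str) -> int:
    """Monthly active users on instances that `domain` blocks or is blocked by."""
    by_domain = {inst["domain"]: inst for inst in data["instance_details"]}
    blocking = set(blocked_list(by_domain[domain]))
    blocked_by = {
        inst["domain"]
        for inst in data["instance_details"]
        if domain in blocked_list(inst)
    }
    # The union counts a mutual block once; uncrawled domains are skipped.
    return sum(
        by_domain[d]["site_info"]["site_view"]["counts"]["users_active_month"]
        for d in blocking | blocked_by
        if d in by_domain
    )


data = {
    "instance_details": [
        make_instance("a.example", 100, ["b.example", "unknown.example"]),
        make_instance("b.example", 20, ["a.example"]),
        make_instance("c.example", 5, ["a.example"]),
    ]
}
print(blocked_users(data, "a.example"))
```

Here `a.example` and `b.example` block each other, so `b.example`'s users are counted once for `a.example`, not twice.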

#!/usr/bin/env python3
import csv
import json
import numpy
import pandas as pd
from typing import List, Dict

LEMMY_STATS_CRAWLER_FILEPATH = "lemmy-stats-crawler/lemmy-stats-crawler.json"
UPTIME_FILENAME = "uptime.json"
OUT_CSV = "awesome-lemmy-instances.csv"
UPTIME_UNKNOWN = "??"
MIN_USERS = 60
MAX_USERS = 1000
CSV_HEADER = "Instance,NU,NC,Fed,Adult,↓V,Users,BI,BB,BU,UT\n"
README_FILENAME = "README.md"
README = """
# Awesome Lemmy Instances

This repo was created to help users migrate from reddit to lemmy (a federated reddit alternative).

Because lemmy is federated (like email), there are many different websites where you can register your new lemmy account. In general, it doesn't matter too much which server you register with. Just like with email, you can interact with users on other servers (eg hotmail, aol, gmail, etc).

However, each server has their own local policies and configurations (for example, some lemmy instances disable the "downvote" button). The table below will help you compare each site to decide where to register your new lemmy account.

### Terms

 * Instance = A lemmy instance is a website that runs the lemmy software
 * Community = Each instance has many communities. In reddit, **communities were called subreddits**.
 * NSFW = Not Safe For Work

### Legend

 * **NU** "Yes" means that **New Users** can register accounts. "No" means that this instance is not accepting new account registrations at this time.
 * **NC** "Yes" means that you can create a **New Community**. "No" means that only admins can create new communities on this instance.
 * **Fed** "Yes" means that you can interact with other **federated** lemmy instances. "No" means that the instance is partially or fully siloed (you can only subscribe to communities on this one instance or other instances that are explicitly added to an allowlist)
 * **Adult** "Yes" means there's no **profanity filters** or blocking of **NSFW** content. "No" means that there are profanity filters or NSFW content is not allowed. Note: "Yes" does not mean all NSFW content is allowed. Each instance may block some types of NSFW content, such as pornography. Additionally, you can configure your account to hide NSFW content. 
 * **↓V** "Yes" means this instance **allows downvotes**. "No" means this instance has turned-off downvote functionality.
 * **Users** The **number of users** that have been active on this instance **this month**. If there are too few users, the admin may shut down the instance. If there are too many users, the instance may go offline due to load. Pick something in-between.
 * **BI** The number of instances that this instance is completely **BlockIng**. If this number is high, then users on this instance will be limited in what they can see on the lemmyverse.
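 * **BB** The number of instances that this instance is completely **Blocked By**. If this number is high, then users on this instance will be limited in what they can see on the lemmyverse.
 * **BU** The total number of **Blocked Users**: users active this month on the instances that this instance is BlockIng or is Blocked By. If this number is high, there may be many comments that users on this instance cannot see.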
 * **UT** Percent **UpTime** that the server has been online
"""
README_RECOMMENDED_INSTANCES = """
# Recommended Instances

Just **click on a random instance** from the below "recommended" instances.

Don't overthink this. **It doesn't matter which instance you use.** You'll still be able to interact with communities (subreddits) on all other instances, regardless of which instance your account lives on 🙂
"""
README_WHATS_NEXT = """
# What's next?

## Subscribe to ~~Subreddits~~ Communities

After you pick an instance and register an account, you'll want to subscribe to communities. You can subscribe to "local" communities on your instance, and (if you chose an instance that isn't siloed) you can also subscribe to "remote" communities on other instances.

To **find popular communities** across all lemmy instances in the fediverse, you can use the [Lemmy Community Browser](https://browse.feddit.de/) run by feddit.de.

 * https://browse.feddit.de/

<a href="https://tech.michaelaltfield.net/2023/06/11/lemmy-migration-find-subreddits-communities/"><img src="lemmy-migration-find-subreddits-communities.jpg" alt="How To Find Lemmy Communities" /></a>

For more information, see my guide on [How to Find Popular Lemmy Communities](https://tech.michaelaltfield.net/2023/06/11/lemmy-migration-find-subreddits-communities/)

## Other links

You may also want to check out the following websites for more information about Lemmy

 * [Official Lemmy Documentation](https://join-lemmy.org/docs/en/index.html)
 * [Intro to Lemmy Guide](https://tech.michaelaltfield.net/2023/06/11/lemmy-migration-find-subreddits-communities/) - How to create a lemmy account, find, and subscribe-to popular communities
 * [Lemmy Community Browser](https://browse.feddit.de/) - List of all communities across all lemmy instances, sorted by popularity
 * [Lemmy Map](https://lemmymap.feddit.de) - Data visualization of lemmy instances
 * [The Federation Info](https://the-federation.info/platform/73) - Another table comparing lemmy instances (with pretty charts)
 * [Federation Observer](https://lemmy.fediverse.observer/list) - Yet another table comparing lemmy instances
 * [FediDB](https://fedidb.org/software/lemmy) - Yet another site comparing lemmy instances (with pretty charts)
 * [Lemmy Sourcecode](https://github.com/LemmyNet/lemmy)
 * [Jerboa (Official Android Client)](https://f-droid.org/packages/com.jerboa/)
 * [Mlem (iOS Client)](https://testflight.apple.com/join/xQfmkJhc)
"""
README_ALL_INSTANCES = """
# All Lemmy Instances

Download table as <a href="https://raw.githubusercontent.com/maltfield/awesome-lemmy-instances/main/awesome-lemmy-instances.csv" target="_blank" download>awesome-lemmy-instances.csv</a> file

> ⓘ Note To view a wider version of the table, [click here](README.md).
"""


class LemmyInstance:
    # `data` is the whole crawler dump (a dict with an "instance_details" list).
    def __init__(self, instance_details: Dict, data: Dict):
        self.instance_details = instance_details
        self.data = data

    @staticmethod
    def sanitize_text(text: str) -> str:
        return text.replace("|", "").replace("\r", "").replace("\n", "")

    @property
    def federated_instances(self):
        return self.instance_details["site_info"]["federated_instances"]

    @property
    def blocking_instances(self) -> List[Dict]:
        federated_instances = self.federated_instances
        if federated_instances is None:
            return []

        blocked_domains = federated_instances.get("blocked", []) or []
        return [
            instance
            for domain in blocked_domains
            for instance in get_instances_by_domain(self.data, domain)
        ]

    def get_domains(self) -> List[str]:
        # Unique domains of every instance this instance blocks or is blocked by.
        return list(
            {
                instance["domain"]
                for instance in self.blocking_instances + self.blocked_by
            }
        )

    def get_active_users_count(self, instance: Dict) -> int:
        return instance["site_info"]["site_view"]["counts"]["users_active_month"]

    def calculate_total_blocked_users(self, domains: List[str]) -> int:
        # Key the instances by domain so that an instance appearing in both the
        # blocking and the blocked-by list is counted only once.
        unique_instances = {
            instance["domain"]: instance
            for instance in self.blocking_instances + self.blocked_by
            if instance["domain"] in domains
        }
        return sum(
            self.get_active_users_count(instance)
            for instance in unique_instances.values()
        )

    @property
    def blocked_users(self) -> int:
        domains = self.get_domains()
        return self.calculate_total_blocked_users(domains)

    @property
    def domain(self) -> str:
        return self.sanitize_text(self.instance_details["domain"])

    @property
    def name(self) -> str:
        return self.sanitize_text(
            self.instance_details["site_info"]["site_view"]["site"]["name"]
        )

    @property
    def federation_enabled(self) -> bool:
        return self.instance_details["site_info"]["site_view"]["local_site"][
            "federation_enabled"
        ]

    @property
    def federated_linked(self) -> List[str]:
        if self.federation_enabled:
            return self.instance_details["site_info"]["federated_instances"]["linked"]
        else:
            return None

    @property
    def federated_allowed(self) -> List[str]:
        if self.federation_enabled:
            return self.instance_details["site_info"]["federated_instances"]["allowed"]
        else:
            return None

    @property
    def federated_blocked(self) -> List[str]:
        if self.federation_enabled:
            return self.instance_details["site_info"]["federated_instances"]["blocked"]
        else:
            return None

    @property
    def registration_mode(self) -> str:
        return self.instance_details["site_info"]["site_view"]["local_site"][
            "registration_mode"
        ]

    @property
    def slur_filter(self) -> str:
        return self.instance_details["site_info"]["site_view"]["local_site"][
            "slur_filter_regex"
        ]

    @property
    def community_creation_admin_only(self) -> bool:
        return self.instance_details["site_info"]["site_view"]["local_site"][
            "community_creation_admin_only"
        ]

    @property
    def enable_downvotes(self) -> bool:
        return self.instance_details["site_info"]["site_view"]["local_site"][
            "enable_downvotes"
        ]

    @property
    def enable_nsfw(self) -> bool:
        return self.instance_details["site_info"]["site_view"]["local_site"][
            "enable_nsfw"
        ]

    @property
    def users_month(self) -> int:
        return self.instance_details["site_info"]["site_view"]["counts"][
            "users_active_month"
        ]

    @property
    def blocked_by(self) -> List[Dict]:
        return get_blocked_by_instances(
            self.data["instance_details"], self.instance_details["domain"]
        )

    @property
    def blocked_by_count(self) -> int:
        return len(self.blocked_by)

    @property
    def blocking_count(self) -> int:
        # blocking_instances always returns a list (possibly empty), never None.
        return len(self.blocking_instances)

    @property
    def adult(self) -> str:
        if self.slur_filter is not None or not self.enable_nsfw:
            return "No"
        else:
            return "Yes"


class InstanceFilter:
    def __init__(self, instances: List[Dict]):
        self.instances = instances

    def filter_by_criteria(self):
        self.instances = [
            instance
            for instance in self.instances
            if (
                instance["NU"] == "Yes"
                and instance["NC"] == "Yes"
                and instance["Fed"] == "Yes"
                and instance["Adult"] == "Yes"
            )
        ]
        return self

    def filter_by_users(self):
        self.instances = [
            instance
            for instance in self.instances
            if MIN_USERS < int(instance["Users"]) < MAX_USERS
        ]
        return self

    def filter_by_blocking(self, bi_avg: float, bb_avg: float):
        self.instances = [
            instance
            for instance in self.instances
            if int(instance["BI"]) < bi_avg and int(instance["BB"]) < bb_avg
        ]
        return self

    def filter_by_uptime(self):
        uptime_available = [
            instance for instance in self.instances if instance["UT"] != UPTIME_UNKNOWN
        ]

        if not uptime_available:
            return self

        for percent_uptime in reversed(range(100)):
            high_uptime_instances = [
                instance
                for instance in self.instances
                if instance["UT"][:-1].isdigit()
                and int(instance["UT"][:-1]) > percent_uptime
            ]

            if len(high_uptime_instances) > 1:
                self.instances = high_uptime_instances
                break

        return self


def load_json_data(filepath: str) -> Dict:
    with open(filepath) as json_data:
        return json.load(json_data)


def get_instances_by_domain(data: Dict, domain: str) -> List[Dict]:
    return [
        instance
        for instance in data["instance_details"]
        if instance["domain"] == domain
    ]


def get_instances(data: Dict) -> List[LemmyInstance]:
    instances = []
    for instance_details in data["instance_details"]:
        instance = LemmyInstance(instance_details, data)
        instances.append(instance)
    return instances


def get_blocked_by_instances(instance_details: List[Dict], domain: str) -> List[Dict]:
    return [
        instance
        for instance in instance_details
        if instance["site_info"]["federated_instances"] is not None
        and instance["site_info"]["federated_instances"]["blocked"] is not None
        and domain in instance["site_info"]["federated_instances"]["blocked"]
    ]


def filter_instances_by_criteria(all_instances: List[Dict]) -> List[Dict]:
    return [
        instance
        for instance in all_instances
        if (
            instance["NU"] == "Yes"
            and instance["NC"] == "Yes"
            and instance["Fed"] == "Yes"
            and instance["Adult"] == "Yes"
        )
    ]


def calculate_averages(all_instances: List[Dict]) -> tuple:
    bi_avg = average_instances(all_instances, "BI")
    bb_avg = average_instances(all_instances, "BB")
    return bi_avg, bb_avg


def average_instances(all_instances: List[Dict], key: str) -> float:
    instances_list = [
        int(instance[key]) for instance in all_instances if int(instance[key]) > 1
    ]
    return numpy.average(instances_list)


def filter_recommended_instances(all_instances: List[Dict]) -> List[Dict]:
    instance_filter = InstanceFilter(all_instances)
    recommended_instances = (
        instance_filter.filter_by_criteria().filter_by_users().instances
    )

    bi_avg, bb_avg = calculate_averages(all_instances)
    return (
        InstanceFilter(recommended_instances)
        .filter_by_blocking(bi_avg, bb_avg)
        .filter_by_uptime()
        .instances
    )


def generate_csv_row(instance: LemmyInstance, uptime_data: Dict) -> str:
    uptime = [
        x["uptime_alltime"]
        for x in uptime_data["data"]["nodes"]
        if x["domain"] == instance.domain
    ]
    uptime = UPTIME_UNKNOWN if not uptime else f"{round(float(uptime[0]))}%"

    row_values = [
        f"~[{instance.domain}](https://{instance.domain})~",
        "Yes" if instance.registration_mode != "closed" else "No",
        "Yes" if not instance.community_creation_admin_only else "No",
        "No" if not instance.federation_enabled or instance.federated_allowed is not None else "Yes",
        instance.adult,
        "Yes" if instance.enable_downvotes else "No",
        instance.users_month,
        instance.blocking_count,
        instance.blocked_by_count,
        instance.blocked_users,
        uptime,
    ]
    return ",".join(map(str, row_values)) + "\n"


def generate_csv_contents(instances: List[LemmyInstance], uptime_data: Dict) -> str:
    csv_contents = [CSV_HEADER]
    for instance in instances:
        csv_row = generate_csv_row(instance, uptime_data)
        csv_contents.append(csv_row)
    return "".join(csv_contents)


def create_csv_contents(instances: List[Dict]) -> str:
    csv_contents = CSV_HEADER
    for instance in instances:
        csv_contents += (
            ",".join(
                [
                    instance["Instance"],
                    instance["NU"],
                    instance["NC"],
                    instance["Fed"],
                    instance["Adult"],
                    instance["↓V"],
                    instance["Users"],
                    instance["BI"],
                    instance["BB"],
                    instance["BU"],
                    instance["UT"],
                ]
            )
            + "\n"
        )
    return csv_contents


def convert_csv_to_markdown_table(csv_file_path: str) -> str:
    df = pd.read_csv(csv_file_path)
    return "\n" + df.to_markdown(tablefmt="pipe", index=False) + "\n"


def write_csv_to_file(csv_contents: str, output_file: str) -> None:
    with open(output_file, "w") as csv_file:
        csv_file.write(csv_contents)


def read_instances(csv_file_path: str) -> List[Dict]:
    with open(csv_file_path) as csv_file:
        return [instance for instance in csv.DictReader(csv_file)]


def write_csv_file(file_name: str, csv_contents: str) -> None:
    with open(file_name, "w") as csv_file:
        csv_file.write(csv_contents)


def write_readme_file(file_name: str, readme_contents: str) -> None:
    with open(file_name, "w") as readme_file:
        readme_file.write(readme_contents)


def generate_instances_csv_contents() -> str:
    data = load_json_data(LEMMY_STATS_CRAWLER_FILEPATH)
    uptime_data = load_json_data(UPTIME_FILENAME)

    instances = get_instances(data)
    return generate_csv_contents(instances, uptime_data)


def write_recommended_instances_csv(recommended_instances, file_name):
    csv_contents = create_csv_contents(recommended_instances)
    write_csv_file(file_name, csv_contents)


def write_markdown(input_csv) -> None:
    readme_content = generate_readme(input_csv)
    write_readme_file(README_FILENAME, readme_content)


def generate_readme(input_csv) -> str:
    recommended_markdown_table = convert_csv_to_markdown_table(input_csv)
    markdown_table = convert_csv_to_markdown_table(OUT_CSV)
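    return (
        README
        + README_RECOMMENDED_INSTANCES
        + recommended_markdown_table
        + README_WHATS_NEXT
        # Heading and download link for the full table; defined above but previously unused.
        + README_ALL_INSTANCES
        + markdown_table
    )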


def main():
    csv_contents = generate_instances_csv_contents()
    write_csv_to_file(csv_contents, OUT_CSV)

    all_instances = read_instances(OUT_CSV)
    recommended_instances = filter_recommended_instances(all_instances)
    write_recommended_instances_csv(recommended_instances, "recommended-instances.csv")

    write_markdown("recommended-instances.csv")


if __name__ == "__main__":
    main()
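
Once the CSV has a BU column, picking instances with few total blocked users (the use case from the original request) could look roughly like this. It is a sketch only: the sample rows and the function name `fewest_blocked_users` are made up, and just the header line matches `CSV_HEADER` from the script.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample rows; only the header matches CSV_HEADER in the script.
SAMPLE_CSV = (
    "Instance,NU,NC,Fed,Adult,↓V,Users,BI,BB,BU,UT\n"
    "a.example,Yes,Yes,Yes,Yes,Yes,100,2,2,25,99%\n"
    "b.example,Yes,Yes,Yes,Yes,Yes,20,1,1,100,98%\n"
    "c.example,Yes,Yes,Yes,Yes,No,5,1,0,3,??\n"
)


def fewest_blocked_users(csv_text: str, limit: int) -> List[Dict[str, str]]:
    """Return up to `limit` rows with the smallest BU (total blocked users) value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: int(row["BU"]))[:limit]


for row in fewest_blocked_users(SAMPLE_CSV, 2):
    print(row["Instance"], row["BU"])
```

Sorting on BU alone ignores the instance's own size, so in practice it would probably be combined with the existing Users and uptime filters.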
