Scraper Guide

This project includes a Python-based scraper that extracts officer data from stfc.space. Follow this guide to run it locally and generate your own dataset.

Prerequisites

  • Python 3.10 or higher
  • Node.js (for the website)
  • A terminal or command prompt

Installation

# Clone the repository
git clone https://gitlab.com/your-repo/stfc-space-scraper.git
cd stfc-space-scraper

# Set up virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e ./scripts/stfc_scraper
playwright install chromium

Running the Scraper

Run the scraper with the following command:

stfc-scraper --output officers.csv

Options

Flag            Description                             Default
--output, -o    Output CSV path                         officers.csv
--no-headless   Run with a visible browser window       Headless
--delay, -d     Delay between requests (seconds)        1.0
--lang          Language for logs (en, es)              en
--page-start    Page to start scraping from             1
--page-end      Page to stop scraping at                14
--limit, -l     Limit the number of officers to scrape  None
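
To make the flags and their defaults concrete, here is a minimal sketch of how such a command-line interface could be declared with Python's standard argparse module. It is not the project's actual implementation. The function name build_parser and the attribute names are assumptions, and only the flags, short forms, and defaults listed above are taken from this guide.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="stfc-scraper")
    parser.add_argument("--output", "-o", default="officers.csv",
                        help="Output CSV path")
    # store_false means args.headless is True unless --no-headless is passed.
    parser.add_argument("--no-headless", dest="headless", action="store_false",
                        help="Run with a visible browser window")
    parser.add_argument("--delay", "-d", type=float, default=1.0,
                        help="Delay between requests (seconds)")
    parser.add_argument("--lang", choices=["en", "es"], default="en",
                        help="Language for logs")
    parser.add_argument("--page-start", type=int, default=1,
                        help="Page to start scraping from")
    parser.add_argument("--page-end", type=int, default=14,
                        help="Page to stop scraping at")
    parser.add_argument("--limit", "-l", type=int, default=None,
                        help="Limit the number of officers to scrape")
    return parser


if __name__ == "__main__":
    # Example: visible browser, slower requests, first three pages, ten officers.
    args = build_parser().parse_args(
        ["--no-headless", "-d", "2.5", "--page-end", "3", "-l", "10"]
    )
    print(args)
```

The equivalent shell invocation for the example at the bottom would be stfc-scraper --no-headless -d 2.5 --page-end 3 -l 10, assuming the short flags behave as listed in the table.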

How it Works

The scraper visits the officer list on stfc.space, collects the link to each officer's detail page, and then navigates to every profile in turn. It uses Playwright to load the pages and injected JavaScript to extract:

  • Basic info: Name, Rarity, Group, Faction, Avatar Image.
  • Abilities: Captain, Officer, and Below Decks abilities.
  • Stats: Attack, Defense, Health (Max Level).
  • Traits and Synergy Officers.
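
The flow described above can be sketched with Playwright's synchronous Python API. This is an illustrative outline under stated assumptions, not the scraper's real code. The list path, the CSS selectors, the injected JavaScript, and the names unique_links and scrape are all hypothetical placeholders, and pagination (--page-start, --page-end) is left out for brevity.

```python
import time
from urllib.parse import urljoin

# Assumed values for illustration only; the real paths and selectors may differ.
BASE_URL = "https://stfc.space"
LIST_PATH = "/officers"                   # hypothetical list page path
LINK_SELECTOR = "a[href*='/officers/']"   # hypothetical selector for detail links

# JavaScript injected into each profile page; the selectors are placeholders.
EXTRACT_JS = """
() => ({
    name: document.querySelector('h1')?.textContent?.trim() ?? null,
    avatar: document.querySelector('img')?.src ?? null,
})
"""


def unique_links(hrefs, base=BASE_URL, limit=None):
    """Resolve hrefs against the base URL, drop duplicates, keep order."""
    seen, out = set(), []
    for href in hrefs:
        url = urljoin(base, href)
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out if limit is None else out[:limit]


def scrape(delay=1.0, headless=True, limit=None):
    """Visit the list page, then each profile, and return the extracted records."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    records = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto(urljoin(BASE_URL, LIST_PATH))
        # Collect the href attribute of every detail link on the list page.
        hrefs = page.eval_on_selector_all(
            LINK_SELECTOR, "els => els.map(e => e.getAttribute('href'))"
        )
        for url in unique_links(hrefs, limit=limit):
            page.goto(url)
            records.append(page.evaluate(EXTRACT_JS))
            time.sleep(delay)  # pause between requests
        browser.close()
    return records
```

Keeping link handling in a small pure function such as unique_links makes that step easy to check without launching a browser. A real extractor would return many more fields (abilities, stats, traits, synergy officers) than the two placeholder fields shown here.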