Setting up your own product monitor and scraper
Introduction
If you've been following sneakers for a while, you've probably heard about "monitors" and "scrapers". These are tools that allow you to monitor products and be notified when they're available. In this blog post we'll be covering how each of these tools works and how you can set up your own. This is in no way a tutorial on how to code, but rather a guide to help you start building your own tools.
Key takeaways
- Setting up your own monitors and scrapers is not extremely difficult, as long as you have a basic understanding of programming.
- Using correct headers, rotating proxies, and delays will help you avoid being detected by websites that have low security standards.
- You can use your own laptop or computer to run your scripts while testing, then deploy them on a server to have them running 24/7.
What tools do you need?
- Python (you can use any language you want!)
- MongoDB (or any other database you prefer!)
- Discord
Monitoring products
What is a monitor?
A monitor is a script which checks a website for updates. It can be used to monitor a product page, a search results page, or any other page that you want to track. Monitors are used most often for tracking new product releases and restocks. Setting up your own monitor requires a few different tools, which we'll cover in the next section.
How does a monitor work?
Here's a simplified description of how a monitor works:
- Our script is connected to a database and a website.
- Every few seconds, the script checks the website for updates.
- Something updated? Save to database and notify user!
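In code, that loop boils down to something like the sketch below. This is a pseudocode-level sketch: check_website(), is_new(), save_update(), and notify_user() are placeholders, not the real functions we'll build later in this post.

```python
import time

def check_website():
    """Placeholder: fetch the latest data from the site you're watching."""
    return None

def is_new(update):
    """Placeholder: compare against what's already in your database."""
    return False

def save_update(update):
    """Placeholder: write the change to the database."""

def notify_user(update):
    """Placeholder: ping Discord (or any other channel)."""

# The entire idea of a monitor in a handful of lines
while True:
    update = check_website()        # ask the website for its latest data
    if update and is_new(update):   # did anything change since the last check?
        save_update(update)         # persist the change
        notify_user(update)         # tell someone about it
    time.sleep(5)                   # wait a few seconds before checking again
```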
Here are some examples of how a monitor can be used in the context of sneaker releases:
- When monitoring a product page, the script will check the page for product updates (such as price changes, stock updates, etc.). When the product is updated, the script will save the new data to the database and send a message to a Discord channel.
- When monitoring a search results page, the script will check the page for new products. When a new product is found, the script will save the new product to the database and send a message to a Discord channel.
The first example is a "restock monitor", alerting you when a product is back in stock. The second example is a "release monitor", alerting you when new products are availble to purchase. In this post we'll be building a simple release monitor alerting us whenever Shoe Palace releases new products online.
Scraping products
What is a scraper?
A scraper is a script which extracts data from a website. It can be used to extract images, information from product pages, or any other data that you want to extract. Scrapers are used most often for tracking new uploads (images, products, etc) before they're published for users. Setting up your own scraper requires a few different tools, which we'll cover in the next section.
How does a scraper work?
The theory behind a scraper is no different from that of a monitor. You have a script connected to a database. The script is constantly checking the website for new data. When new data is found, the script will save the new data to the database and send a message to a Discord channel.
Here are some examples of how a scraper can be used for sneaker releases:
- When scraping an image server, the script will check the server for new images. When a new image is found, the script will save the new image to the database and send a message to a Discord channel.
- When scraping product pages, the script will check the website for new products. When a new product is found, the script will save the new product to the database and send a message to a Discord channel.
The first example is a "image scraper", alerting you when a new image is uploaded to a server. The second example is a "product scraper", alerting you when a new product is uploaded to a website. In this post we'll be building a simple image scraper alerting us whenever a new JD Sports image is live.
The theory behind a monitor and a scraper is the same. The only difference is the type of data we're extracting. While monitoring is focused (you know which pages you'll monitor, or the product you want to check updates for), scraping is more general (you're constantly spraying and praying for new data, you never know what you'll find).
Building your own toolbox
Now that we've covered the basics of monitors and scrapers, let's build our own toolbox. We will be using Python as our language of choice, MongoDB as our database, and Discord to send notifications. I'm assuming you have a basic understanding of programming and will be focusing on the logic rather than the code. You can download the code used in this blog post here. If you want to run this code, you need to install the Python libraries using pip install -r requirements.txt. If you don't have pip installed, you'll most likely need to tackle that install process first before returning here.
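If you're building the project from scratch instead of downloading it, a minimal requirements.txt would simply list the three libraries covered in the next section. Treat this as an assumption; the downloadable code may pin specific versions:

```
requests
pymongo
discord-webhook
```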
Libraries? What libraries?
We use libraries in order to make our lives easier and not re-invent the wheel. Libraries are pre-written code that can be used to perform specific tasks. For example, the requests library allows us to make HTTP requests to a website. Another library we're using is pymongo, to connect to our MongoDB database. Last but not least, notifications will be sent using discord_webhook.
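To make that concrete, here's a tiny sketch showing each of the three libraries doing its job. The URL, database name, and webhook address are placeholders, not values from the real project:

```python
import requests
from pymongo import MongoClient
from discord_webhook import DiscordWebhook

# requests: fetch a page or JSON endpoint over HTTP
response = requests.get("https://example.com/products.json", timeout=10)
print(response.status_code)

# pymongo: connect to a local MongoDB instance and select a collection
client = MongoClient("mongodb://localhost:27017/")
collection = client["sneakers"]["products"]

# discord_webhook: send a simple message to a Discord channel via a webhook
webhook = DiscordWebhook(
    url="https://discord.com/api/webhooks/your-webhook-here",
    content="Hello from my monitor!",
)
webhook.execute()
```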
Keep in mind this post provides minimal examples. You can parse HTML using the beautifulsoup4 library, or use a headless browser like playwright to scrape websites, but we're not going to cover those here. This post is meant to get your gears turning - you can always learn more stuff on your own.
Building your own monitor
We will be building a monitor which checks Shoe Palace's website for new products. Shoe Palace uses Shopify (which still has a really useful JSON endpoint publicly available), therefore we don't have to worry about parsing HTML. Let's think about what the functions of our code should do:
| Function | Description |
| --- | --- |
| connect_to_mongodb() | Connects to our MongoDB database |
| fetch_products() | Fetches all products from a website |
| extract_product_data() | Parses product data for each item |
| save_to_database() | Saves product data to our database |
| send_discord_notification() | Sends notification via Discord webhook |
| main() | Logic and functions are called from here |
The logic of our code in relation to the functions should go like this:
- Connect to the database using connect_to_mongodb()
- While true, try to fetch products using fetch_products()
  - If successful, return product data
  - If unsuccessful, print an error message
- For each product, extract data using extract_product_data()
  - Return formatted product data
- For each product, check if it's new using save_to_database()
  - If the product is new, save it to the database
  - If the product is not new, skip over it
- For each new product, try to send an alert via Discord using send_discord_notification()
  - If successful, congratulate yourself!
  - If unsuccessful, print an error message
That's basically it! Now let's see what the main function looks like:
```python
def main():
    """Main monitor loop. Connects to MongoDB, fetches products,
    processes them, and sends notifications for new products."""
    # Connect to MongoDB
    print("Starting monitor...")
    db, collection = connect_to_mongodb()
    if collection is None:
        print("MongoDB connection failed. Exiting.")
        return
    print("Monitor running...")

    # Main loop
    while True:
        try:
            # Fetch products from the website
            products = fetch_products()
            if not products:
                time.sleep(DELAY_IN_SECONDS)
                continue

            # Parse and process each product
            new_products_count = 0
            for product in products:
                product_data = extract_product_data(product)

                # Check if the product is new and save it to the database
                is_new = save_to_database(collection, product_data)

                # If the product is new, send a notification via Discord
                if is_new:
                    new_products_count += 1
                    send_discord_notification(product_data)

            if new_products_count > 0:
                print(f"Found {new_products_count} new products")

            # Wait for the next iteration
            time.sleep(DELAY_IN_SECONDS)

        # Stop the loop using Ctrl+C
        except KeyboardInterrupt:
            print("Monitor stopped")
            break

        # Print the exception when an error occurs
        except Exception as e:
            print(f"Error: {e}")
            time.sleep(DELAY_IN_SECONDS)
```
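To give you an idea of what sits behind main(), here are hedged sketches of two of the helpers: fetching the Shopify JSON endpoint and the "is it new?" database check. The store URL, headers, and document fields are assumptions for illustration; the linked code is the reference.

```python
import requests

PRODUCTS_URL = "https://www.shoepalace.com/products.json"  # assumed Shopify products endpoint
HEADERS = {"User-Agent": "Mozilla/5.0"}  # a browser-like User-Agent helps avoid basic blocks

def fetch_products():
    """Fetch the product list from the Shopify JSON endpoint."""
    try:
        response = requests.get(PRODUCTS_URL, headers=HEADERS, timeout=10)
        response.raise_for_status()
        # Shopify's products.json returns {"products": [...]}
        return response.json().get("products", [])
    except requests.RequestException as e:
        print(f"Failed to fetch products: {e}")
        return []

def save_to_database(collection, product_data):
    """Insert the product if its ID hasn't been seen before. Returns True for new products."""
    if collection.find_one({"id": product_data["id"]}):
        return False
    collection.insert_one(product_data)
    return True
```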
If you want to check out what every function looks like, you can find the code here.
Building your own scraper
We will be building a scraper which checks JD Sports' website for new images. JD Sports uses a content delivery network (CDN) to serve images, and they use numerical IDs to identify each image. This makes it really easy to scrape by incrementing the ID in the URL and checking whether that image exists.
Our functions will be similar, here's a list of what they should do:
| Function | Description |
| --- | --- |
| generate_unique_string() | Generates unique cache bypass string |
| connect_to_mongodb() | Connects to our MongoDB database |
| fetch_image() | Fetches an image from the website |
| extract_image_data() | Parses image data for each image |
| save_to_database() | Saves image data to our database |
| send_discord_notification() | Sends notification via Discord webhook |
| main() | Logic and functions are called from here |
The new function you're seeing is generate_unique_string(). We're using it to bypass caching. When a website serves data from its cache, the content you see is not always the latest version. There are multiple ways to fix this issue, but the quickest one is to make the link unique each time you visit it.
```python
import random
import string

def generate_unique_string():
    """Generate a random unique string for CDN requests."""
    # Generate 20 random characters (letters and numbers)
    characters = string.ascii_uppercase + string.digits
    return ''.join(random.choice(characters) for _ in range(20))
```
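And here's a hedged sketch of how fetch_image() could use that string. The CDN URL pattern below is a placeholder, not JD Sports' real image path; the idea is simply "numerical ID in the path, unique string in the query":

```python
import requests

IMAGE_URL_TEMPLATE = "https://cdn.example.com/images/{image_id}.jpg"  # placeholder URL pattern
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_image(image_id):
    """Fetch one image by numerical ID, bypassing the CDN cache with a unique query string."""
    url = f"{IMAGE_URL_TEMPLATE.format(image_id=image_id)}?cb={generate_unique_string()}"
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code == 200:
            return response  # image exists
        return None  # nothing at this ID (yet)
    except requests.RequestException as e:
        print(f"Request failed for ID {image_id}: {e}")
        return None
```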
The logic of our code in relation to the functions should go like this:
- Connect to the database using connect_to_mongodb()
- While true, try to fetch images using fetch_image() and generate_unique_string()
  - If successful, return image data
  - If unsuccessful, print an error message
- For each image, extract data using extract_image_data()
  - Return formatted image data
- For each image, check if it's new using save_to_database()
  - If the image is new, save it to the database
  - If the image is not new, skip over it
- For each new image, try to send an alert via Discord using send_discord_notification()
  - If successful, congratulate yourself!
  - If unsuccessful, print an error message
Scraping products or images is not rocket science. Here's what the main function looks like:
```python
def main():
    """Main scraper loop. Connects to MongoDB, fetches images,
    processes them, and sends notifications for new images."""
    # Connect to MongoDB
    print("Starting scraper...")
    db, collection = connect_to_mongodb()
    if collection is None:
        print("MongoDB connection failed. Exiting.")
        return
    print("Scraper running...")

    # Main loop
    current_id = STARTING_ID
    while True:
        try:
            # Fetch image from the website
            response = fetch_image(current_id)
            if response:
                # Image exists, extract its data
                image_data = extract_image_data(current_id, response)

                # Try to save to the database and check if it's new
                is_new = save_to_database(collection, image_data)

                # If the image is new, send a notification via Discord
                if is_new:
                    send_discord_notification(image_data)
                    print(f"Found new image: ID {current_id}")
                else:
                    # Image exists but is already in the database
                    print(f"Image already exists: ID {current_id}")
            else:
                # Image fetch was unsuccessful
                print(f"Failed to fetch image: ID {current_id}")

            # Move to the next ID
            current_id += 1

            # Reset to the starting ID when we reach the end
            if current_id > ENDING_ID:
                current_id = STARTING_ID
                print(f"Completed range {STARTING_ID} - {ENDING_ID}, starting over...")

            # Wait between requests
            time.sleep(DELAY_IN_SECONDS)

        # Stop the loop using Ctrl+C
        except KeyboardInterrupt:
            print("Scraper stopped")
            break

        # Print the exception when an error occurs
        except Exception as e:
            print(f"Error: {e}")
            time.sleep(DELAY_IN_SECONDS)
```
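For completeness, here are hedged sketches of extract_image_data() and the image version of save_to_database(). The field names are assumptions; the linked code may structure its documents differently.

```python
from datetime import datetime, timezone

def extract_image_data(image_id, response):
    """Build a small document describing the image we just found."""
    return {
        "image_id": image_id,
        "url": response.url,
        "content_type": response.headers.get("Content-Type"),
        "found_at": datetime.now(timezone.utc),
    }

def save_to_database(collection, image_data):
    """Insert the image if its ID hasn't been seen before. Returns True if it's new."""
    if collection.find_one({"image_id": image_data["image_id"]}):
        return False
    collection.insert_one(image_data)
    return True
```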
If you want to check out what every function looks like, you can find the code here.
Conclusion
You've built your own monitor and scraper! Now you can expand your toolbox to include more complex features, such as proxy support, multi-threading, or keyword filtering. You should save your code somewhere safe before making any major changes; a popular option is GitHub, which gives you version control and collaboration.
Not sure what to do now? Here are some ideas to get you started:
- Add a proxy to your requests (see the sketch below).
- Add support for rotating proxies.
- Add multiple endpoints for your script to monitor.
- Add multi-threading to monitor the endpoints concurrently.
- Add keyword filtering, and send notifications only for specific products.
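As a starting point for the first two ideas, here's a minimal sketch of routing requests through a rotating proxy. The proxy addresses are placeholders; plug in your own provider's list:

```python
import random
import requests

# Placeholder proxies; replace with real ones from your provider
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]

def fetch_with_proxy(url):
    """Pick a random proxy for each request so a single IP doesn't get rate-limited."""
    proxy = random.choice(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
```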
The code referenced in this blog post was written by AI after prompting it with quotes from this blog post. The example code is not perfect in any way; it's just a proof of concept to help you get started. While you can use this as a starting point, you should also learn more about the tools you're using and how to use them properly.
Frequently asked questions
How do I run the code?
Install the dependencies with pip install -r requirements.txt. After installing the dependencies, run the code using python3 monitor.py or python3 scraper.py, depending on which script you want to run.