
FogFox
programming python bruteforce

Finding Public Download Links To Private Files

Posted 03-06-2021, 05:16 PM
Let us begin with the idea. We know that many people use file-hosting sites to upload their data: not only random dog pictures, but also malware or even things like paid e-books. So what can I do if I want to obtain these files? Most file-hosting services don't offer a search function on their site.

The two easiest options to solve this issue are brute-forcing and using search engines. For my examples, I am going to use Anonfiles.


Bruteforcing Anonfiles Download Links With Python

This method is far from efficient, but it is interesting to play around with.
First, let us take a look at an example link: https://anonfiles.com/Nc6f3a78qc/

It looks like we have 10 random characters, which can be uppercase letters, lowercase letters, or numbers. That is 62 possible characters in each of 10 positions, so there are 62^10 (roughly 8.4 * 10^17) possible IDs, and a random guess will only very rarely hit a live link.

We start with importing the needed modules:
Code
import requests
import random
import string

We will use requests to check the status code of each link, and a combination of random and string to generate our link list.

We create our own function called "link_generator". We define a default size of 10 and the set of characters we want to use. Then we pick a random character from that set, repeat this ten times, and join the picks together.
Code
def link_generator(size=10, chars=string.ascii_lowercase + string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

We need a function that loads a link and checks its status code.

First, we request the link with the requests module. We only fetch the response headers (a HEAD request instead of a full download), which speeds up our script. Then we check whether the status code of the response is 200.
If it is, we print the link.
Code
def load_url(url):
    r = requests.head(url)
    if r.status_code == 200:
        print(str(r.status_code) + ": " + url)

To use this function we need a list of links to check. Let us create a list of random links with our link_generator function. We define an empty list called "urls" and append the base URL with a random ID to it.
Code
urls = []

for i in range(100):
    urls.append('https://anonfiles.com/' + link_generator())

If we now call our "load_url" function we can start brute-forcing for links.
Code
for url in urls:
    load_url(url)

Let's clean it up a bit and add multi-threading to check the links faster. I will also read some variables from user input so it's easier to use from your command line. This is the finished code:
Code
import requests
import random
import string
from time import time
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = []
out = []
results = []

THREADS = input("Threads: ")
AMOUNT = input("Links to scrape: ")

def link_generator(size=10, chars=string.ascii_lowercase + string.ascii_uppercase + string.digits):
    # Build a random 10-character ID from letters and digits.
    return ''.join(random.choice(chars) for _ in range(size))

def load_url(url):
    # HEAD request only: we just need the status code, not the file itself.
    r = requests.head(url)
    if r.status_code == 200:
        print(str(r.status_code) + ": " + url)
        results.append(url)

# Build the list of candidate links.
for i in range(int(AMOUNT)):
    urls.append('https://anonfiles.com/' + link_generator())


start = time()
with ThreadPoolExecutor(max_workers=int(THREADS)) as executor:
    future_to_url = (executor.submit(load_url, url) for url in urls)

    for future in as_completed(future_to_url):
        out.append(1)
        print("Checked: " + str(len(out)) + " - " + "Results: " + str(len(results)), end="\r")

print(f'Time taken: {time() - start}')
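
If you want to keep the hits between runs, a minimal sketch (it just reuses the "results" list from the script above; the file name is only a placeholder) could append them to a text file:
Code
# Append every found link to a text file so the results survive between runs.
# 'found_links.txt' is just a placeholder name; 'results' comes from the script above.
with open('found_links.txt', 'a') as f:
    for url in results:
        f.write(url + '\n')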


Finding Anonfiles Download Links With Google

A more effective method is to search for links by using a search engine. In this example, we will use Google.

By just searching for the website we will find many useless links and no downloads. To solve this problem we use search operators. These are special characters and commands that extend the capabilities of our search.

For our needs, the "site" operator is perfect: site:anonfiles.com
If we enter this search term into Google we will find our first results.

This is pretty slow and we only get a few results per page, so let us modify our link a bit. The base URL for a Google search is:
https://www.google.de/search?

We add our keywords we want to search for:
https://www.google.de/search?q=site%3Aanonfiles.com

Last but not least, let's show 100 results per page:
https://www.google.de/search?q=site%3Aanonfiles.com&num=100
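
If you would rather build this URL in Python than by hand, a small sketch using urllib.parse from the standard library (the query and parameters are the same ones shown above) could look like this:
Code
from urllib.parse import urlencode

# Build the search URL from its parts instead of typing it out.
params = {'q': 'site:anonfiles.com', 'num': 100}
search_url = 'https://www.google.de/search?' + urlencode(params)
print(search_url)  # https://www.google.de/search?q=site%3Aanonfiles.com&num=100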

Now that we have our search URL prepared, let's use Python to scrape all the links from the search results.

We will use some modules to make our life easier. Let's import them first:
Code
import requests
from bs4 import BeautifulSoup
import re

First, create a function and call it whatever you like. We get our link from user input. This time we fetch not only the headers but the complete page content. We parse it with Beautiful Soup, using "lxml" as the parser.
Code
def googleScrape():
    url = input('Link: ')
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')

By inspecting the Google search page we can see that the best way to find these links is to read the "href" value of every <a> tag, the tag that defines a hyperlink.
Code
    a = soup.find_all('a')
    for i in a:
        href_complete = i.get('href')

If we print out our list of links we can see a small issue: they contain a lot of unnecessary information and are not clean links. With a regular expression search we can extract the actual URLs; anchors without an "href" value or without a full URL in them are simply skipped. We store the matched group in a temporary variable and split it at every "&". Finally, we print our results.
Code
        if href_complete is None:
            continue
        re_search = re.search(r"(?P<url>https?://[^\s]+)", href_complete)
        if re_search is None:
            continue
        temp = re_search.group(0)
        result = temp.split('&')[0]
        print(result)

Let's try the finished code:
Code
import requests
from bs4 import BeautifulSoup
import re

def googleScrape():
    url = input('Link: ')
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')

    # Collect every hyperlink on the results page.
    a = soup.find_all('a')
    for i in a:
        href_complete = i.get('href')
        if href_complete is None:
            # Skip anchors without an href attribute.
            continue

        re_search = re.search(r"(?P<url>https?://[^\s]+)", href_complete)
        if re_search is None:
            # Skip internal links that do not contain a full URL.
            continue
        temp = re_search.group(0)
        result = temp.split('&')[0]
        print(result)

googleScrape()
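
When the script asks for "Link:", paste the search URL we prepared earlier (https://www.google.de/search?q=site%3Aanonfiles.com&num=100) and it will print every outbound link it can extract from that results page.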

I was able to find some Indian movies and some Chinese manuals.


I hope you could understand the main theory behind this. Why not try it with some other download sites? Modify the code and try your luck!
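
As a starting point for other hosts, here is a minimal sketch of how the brute-forcer could be parameterized. The host name, ID length, and character set below are made-up placeholders, not any real site's scheme; you would have to look at a few example links first, just like we did for Anonfiles.
Code
import requests
import random
import string

def link_generator(size, chars):
    # Same idea as before: build a random ID of the requested length.
    return ''.join(random.choice(chars) for _ in range(size))

def check_host(base_url, id_length, chars, amount):
    # Try `amount` random IDs against a given file host.
    for _ in range(amount):
        url = base_url + link_generator(id_length, chars)
        if requests.head(url).status_code == 200:
            print(url)

# Hypothetical example values: replace with the real base URL and ID format
# of whatever host you are looking at.
check_host('https://example-filehost.com/', 8, string.ascii_lowercase + string.digits, 100)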
03-21-2021, 05:19 AM
Good stuff! Thanks for the share!
03-08-2021, 10:31 AM
Love the blog post <3.