Merge branch 'master' of github.com:benbusby/whoogle-search

README.md
Dee-Jay Logozzo 2020-08-28 22:15:29 +10:00
commit ed0c96823a
40 changed files with 1545 additions and 302 deletions

@ -1 +1,2 @@
.git/ .git/
venv/

@ -1,9 +1,9 @@
--- ---
name: Bug report name: Bug report
about: Create a bug report to help improve Whoogle about: Create a bug report to help fix an issue with Whoogle
title: "[BUG] " title: "[BUG] <brief bug description>"
labels: bug labels: bug
assignees: benbusby assignees: ''
--- ---
@ -17,11 +17,18 @@ Steps to reproduce the behavior:
3. Scroll down to '....' 3. Scroll down to '....'
4. See error 4. See error
**Expected behavior** **Deployment Method**
A clear and concise description of what you expected to happen. - [ ] Heroku (one-click deploy)
- [ ] Docker
- [ ] `run` executable
- [ ] pip/pipx
- [ ] Other: [describe setup]
**Version of Whoogle Search**
- [ ] Latest build from [source] (i.e. GitHub, Docker Hub, pip, etc)
- [ ] Version [version number]
- [ ] Not sure
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):** **Desktop (please complete the following information):**
- OS: [e.g. iOS] - OS: [e.g. iOS]

@ -0,0 +1,17 @@
---
name: Feature request
about: Suggest a feature that would improve Whoogle
title: "[FEATURE] <description of feature>"
labels: enhancement
assignees: ''
---
**Describe the feature you'd like to see added**
A short description of the feature, and what it would accomplish.
**Describe which parts of the project this would modify (front end/back end/configuration/etc)**
A short description of which aspects of Whoogle Search would need modification
**Additional context**
Add any other context or screenshots about the feature request here.

10
.github/ISSUE_TEMPLATE/question.md vendored Normal file

@ -0,0 +1,10 @@
---
name: Question
about: Ask a (simple) question about Whoogle
title: "[QUESTION] <question here>"
labels: question
assignees: ''
---
Type out your question here. Please make sure that this is a topic that isn't already covered in the README.

4
.gitignore vendored

@ -3,8 +3,12 @@ venv/
__pycache__/ __pycache__/
*.pyc *.pyc
*.pem *.pem
*.conf
config.json config.json
test/static test/static
flask_session/
app/static/config
app/static/custom_config
# pip stuff # pip stuff
build/ build/

@ -5,4 +5,11 @@ before_install:
install: install:
- pip install -r requirements.txt - pip install -r requirements.txt
script: script:
- ./whoogle-search test - "./run test"
deploy:
provider: pypi
user: __token__
password:
secure: WNEH2Gg84MZF/AZEberFDGPPWb4cYyHAeD/XV8En94QRSI9Aznz6qiDKOvV4eVgjMAIEW5uB3TL1LHf6KU+Hrg6SmhF7JquqP1gsBOCDNFPTljO+k2Hc53uDdSnhi/HLgY7cnFNX4lc2nNrbyxZxMHuSA2oNz/tosyNGBEeyU+JA5va7uX0albGsLiNjimO4aeau83fsI0Hn2eN6ag68pewUMXNxzpyTeO2bRcCd5d5iILs07jMVwFoC2j7W11oNqrVuSWAs8CPe4+kwvNvXWxljUGiBGppNZ7RAsKNLwi6U6kGGUTWjQm09rY/2JBpJ2WEGmIWGIrno75iiFRbjnRp3mnXPvtVTyWhh+hQIUd7bJOVKM34i9eHotYTrkMJObgW1gnRzvI9VYldtgL/iP/Isn2Pv2EeMX8V+C9/8pxv0jkQkZMnFhE6gGlzpz37zTl04B2J7xyV5znM35Lx2Pn3zxdcmdCvD3yT8I4MuBbKqq2/v4emYCfPfOmfwnS0BEVSqr9lbx4xfUZV76tcvLcj4n86DJbx77pA2Ch8FRprpOOBcf0WuqTbZp8c3mb8prFp2EupUknXu7+C2VQ6sqrnzNuDeTGm/nyjjRQ81rlvlD4tqkwsEGEDDO44FF2eUTc5D2MvoHs4cnz095FWjy63gn5IxUjhMi31b5tGRz2Q=
on:
tags: true

@ -1,9 +1,28 @@
FROM python:3 FROM python:3.8-slim
WORKDIR /usr/src/app WORKDIR /usr/src/app
RUN apt-get update && apt-get install -y build-essential libcurl4-openssl-dev libssl-dev
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
ARG config_dir=/config
RUN mkdir -p $config_dir
VOLUME $config_dir
ENV CONFIG_VOLUME=$config_dir
ARG username=''
ENV WHOOGLE_USER=$username
ARG password=''
ENV WHOOGLE_PASS=$password
ARG use_https=''
ENV HTTPS_ONLY=$use_https
ARG whoogle_port=5000
ENV EXPOSE_PORT=$whoogle_port
COPY . . COPY . .
RUN pip install --no-cache-dir -r requirements.txt EXPOSE $EXPOSE_PORT
RUN chmod +x ./whoogle-search
CMD ["./whoogle-search"] CMD ["./run"]
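
The new `ARG`/`ENV` pairs above expose the deployment options to the process inside the container. A minimal sketch of how those variables could be read on the Python side follows. The `ContainerSettings` type and `read_settings` helper are hypothetical names used only for illustration and do not appear in this commit; the defaults mirror the `ARG` defaults in the Dockerfile.

```python
import os
from typing import Mapping, NamedTuple


class ContainerSettings(NamedTuple):
    """Hypothetical holder for the values set by the Dockerfile's ENV lines."""
    config_dir: str
    username: str
    password: str
    https_only: bool
    port: int


def read_settings(env: Mapping[str, str] = os.environ) -> ContainerSettings:
    # Fallbacks match the ARG defaults declared in the Dockerfile.
    return ContainerSettings(
        config_dir=env.get('CONFIG_VOLUME', '/config'),
        username=env.get('WHOOGLE_USER', ''),
        password=env.get('WHOOGLE_PASS', ''),
        # Assumption: any non-empty value (e.g. use_https=1) turns the option on.
        https_only=bool(env.get('HTTPS_ONLY', '')),
        port=int(env.get('EXPOSE_PORT', '5000')),
    )


if __name__ == '__main__':
    print(read_settings({'HTTPS_ONLY': '1', 'EXPOSE_PORT': '8080'}))
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the real process environment.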

@ -1,3 +1,4 @@
graft app/static graft app/static
graft app/templates graft app/templates
include requirements.txt
global-exclude *.pyc global-exclude *.pyc

@ -4,6 +4,7 @@
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://travis-ci.com/benbusby/whoogle-search.svg?branch=master)](https://travis-ci.com/benbusby/whoogle-search) [![Build Status](https://travis-ci.com/benbusby/whoogle-search.svg?branch=master)](https://travis-ci.com/benbusby/whoogle-search)
[![codebeat badge](https://codebeat.co/badges/e96cada2-fb6f-4528-8285-7d72abd74e8d)](https://codebeat.co/projects/github-com-benbusby-shoogle-master) [![codebeat badge](https://codebeat.co/badges/e96cada2-fb6f-4528-8285-7d72abd74e8d)](https://codebeat.co/projects/github-com-benbusby-shoogle-master)
[![Docker Pulls](https://img.shields.io/docker/pulls/benbusby/whoogle-search)](https://hub.docker.com/r/benbusby/whoogle-search)
Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile. Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
@ -24,7 +25,8 @@ Contents
- No AMP links - No AMP links
- No URL tracking tags (i.e. utm=%s) - No URL tracking tags (i.e. utm=%s)
- No referrer header - No referrer header
- POST request search queries (when possible) - Autocomplete/search suggestions
- POST request search and suggestion queries (when possible)
- View images at full res without site redirect (currently mobile only) - View images at full res without site redirect (currently mobile only)
- Dark mode - Dark mode
- Randomly generated User Agent - Randomly generated User Agent
@ -47,7 +49,7 @@ If using Heroku Quick Deploy, **you can skip this section**.
There are a few different ways to begin using the app, depending on your preferences: There are a few different ways to begin using the app, depending on your preferences:
### A) [Heroku Quick Deploy](https://heroku.com/about) ### A) [Heroku Quick Deploy](https://heroku.com/about)
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/benbusby/whoogle-search) [![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/benbusby/whoogle-search/tree/heroku-app)
*Note: Requires a (free) Heroku account* *Note: Requires a (free) Heroku account*
@ -57,11 +59,11 @@ Provides:
- Downtime after periods of inactivity \([solution](https://github.com/benbusby/whoogle-search#prevent-downtime-heroku-only)\) - Downtime after periods of inactivity \([solution](https://github.com/benbusby/whoogle-search#prevent-downtime-heroku-only)\)
### B) [pipx](https://github.com/pipxproject/pipx#install-pipx) ### B) [pipx](https://github.com/pipxproject/pipx#install-pipx)
Persistent install: Persistent install:
`pipx install git+https://github.com/benbusby/whoogle-search.git` `pipx install git+https://github.com/benbusby/whoogle-search.git`
Sandboxed temporary instance: Sandboxed temporary instance:
`pipx run git+https://github.com/benbusby/whoogle-search.git whoogle-search` `pipx run git+https://github.com/benbusby/whoogle-search.git whoogle-search`
@ -71,14 +73,16 @@ Sandboxed temporary instance:
```bash ```bash
$ whoogle-search --help $ whoogle-search --help
usage: whoogle-search [-h] [--port <port number>] [--host <ip address>] [--debug] usage: whoogle-search [-h] [--port <port number>] [--host <ip address>] [--debug]
[--https-only]
Whoogle Search console runner Whoogle Search console runner
optional arguments: optional arguments:
-h, --help show this help message and exit -h, --help show this help message and exit
--port <port number> Specifies a port to run on (default 8888) --port <port number> Specifies a port to run on (default 5000)
--host <ip address> Specifies the host address to use (default 127.0.0.1) --host <ip address> Specifies the host address to use (default 127.0.0.1)
--debug Activates debug mode for the Flask server (default False) --debug Activates debug mode for the server (default False)
--https-only Enforces HTTPS redirects for all requests (default False)
``` ```
### D) Manual ### D) Manual
@ -90,7 +94,34 @@ cd whoogle-search
python3 -m venv venv python3 -m venv venv
source venv/bin/activate source venv/bin/activate
pip install -r requirements.txt pip install -r requirements.txt
./whoogle-search ./run
```
#### systemd Configuration
After building the virtual environment, you can add the following to `/lib/systemd/system/whoogle.service` to set up a Whoogle Search systemd service:
```
[Unit]
Description=Whoogle
[Service]
Type=simple
User=root
WorkingDirectory=<whoogle_directory>
ExecStart=<whoogle_directory>/venv/bin/python3 -um app --host 0.0.0.0 --port 5000
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=whoogle
[Install]
WantedBy=multi-user.target
```
Then,
```
sudo systemctl daemon-reload
sudo systemctl enable whoogle
sudo systemctl start whoogle
``` ```
### E) Manual (Docker) ### E) Manual (Docker)
@ -100,14 +131,30 @@ pip install -r requirements.txt
2. Clone and deploy the docker app using a method below: 2. Clone and deploy the docker app using a method below:
#### Docker CLI #### Docker CLI
Through Docker Hub:
```bash
docker pull benbusby/whoogle-search
docker run --publish 5000:5000 --detach --name whoogle-search benbusby/whoogle-search:latest
```
or with docker-compose:
```bash ```bash
git clone https://github.com/benbusby/whoogle-search.git git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search cd whoogle-search
docker build --tag whooglesearch:1.0 . docker-compose up
docker run --publish 8888:5000 --detach --name whooglesearch whooglesearch:1.0
``` ```
And kill with: `docker rm --force whooglesearch` or by building yourself:
```bash
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
docker build --tag whoogle-search:1.0 .
docker run --publish 5000:5000 --detach --name whoogle-search whoogle-search:1.0
```
And kill with: `docker rm --force whoogle-search`
#### Using [Heroku CLI](https://devcenter.heroku.com/articles/heroku-cli) #### Using [Heroku CLI](https://devcenter.heroku.com/articles/heroku-cli)
```bash ```bash
@ -139,6 +186,8 @@ To filter by a range of time, append ":past <time>" to the end of your search, w
## Extra Steps ## Extra Steps
### Set Whoogle as your primary search engine ### Set Whoogle as your primary search engine
*Note: If you're using a reverse proxy to run Whoogle Search, make sure the "Root URL" config option on the home page is set to your URL before going through these steps.*
Update browser settings: Update browser settings:
- Firefox (Desktop) - Firefox (Desktop)
- Navigate to your app's url, and click the 3 dot menu in the address bar. At the bottom, there should be an option to "Add Search Engine". Once you've clicked this, open your Firefox Preferences menu, click "Search" in the left menu, and use the available dropdown to select "Whoogle" from the list. - Navigate to your app's url, and click the 3 dot menu in the address bar. At the bottom, there should be an option to "Add Search Engine". Once you've clicked this, open your Firefox Preferences menu, click "Search" in the left menu, and use the available dropdown to select "Whoogle" from the list.
@ -161,6 +210,13 @@ Update browser settings:
- Select the 'Other' radio button - Select the 'Other' radio button
- Name: "Whoogle" - Name: "Whoogle"
- Search string to use: "http[s]://\<your whoogle url\>/search?q=%s" - Search string to use: "http[s]://\<your whoogle url\>/search?q=%s"
- [Alfred](https://www.alfredapp.com/) (Mac OS X)
1. Go to `Alfred Preferences` > `Features` > `Web Search` and click `Add Custom Search`. Then configure these settings
- Search URL: `https://\<your whoogle url\>/search?q={query}`
- Title: `Whoogle for '{query}'` (or whatever you want)
- Keyword: `whoogle`
2. Go to `Default Results` and click the `Setup fallback results` button. Click `+` and add Whoogle, then drag it to the top.
- Others (TODO) - Others (TODO)
### Customizing and Configuration ### Customizing and Configuration
@ -179,12 +235,24 @@ A good solution for this is to set up a simple cronjob on any device at your hom
For instance, adding `*/20 7-23 * * * curl https://<your heroku app name>.herokuapp.com > /home/<username>/whoogle-refresh` will fetch the home page of the app every 20 minutes between 7am and midnight, allowing for downtime from midnight to 7am. And again, this wouldn't be a hard limit - you'd still have plenty of remaining hours of uptime each month in case you were searching after this window has closed. For instance, adding `*/20 7-23 * * * curl https://<your heroku app name>.herokuapp.com > /home/<username>/whoogle-refresh` will fetch the home page of the app every 20 minutes between 7am and midnight, allowing for downtime from midnight to 7am. And again, this wouldn't be a hard limit - you'd still have plenty of remaining hours of uptime each month in case you were searching after this window has closed.
Since the instance is destroyed and rebuilt after inactivity, config settings will be reset once the app enters downtime. If you have configuration settings active that you'd like to keep between periods of downtime (like dark mode for example), you could instead add `*/20 7-23 * * * curl -d "dark=1" -X POST https://<your heroku app name>.herokuapp.com > /home/<username>/whoogle-refresh` to keep these settings more or less permanent, and still keep the app from entering downtime when you're using it. Since the instance is destroyed and rebuilt after inactivity, config settings will be reset once the app enters downtime. If you have configuration settings active that you'd like to keep between periods of downtime (like dark mode for example), you could instead add `*/20 7-23 * * * curl -d "dark=1" -X POST https://<your heroku app name>.herokuapp.com/config > /home/<username>/whoogle-refresh` to keep these settings more or less permanent, and still keep the app from entering downtime when you're using it.
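
To make the schedule in the cron examples concrete, here is a small sketch that expands the `*/20 7-23 * * *` expression into the times of day at which it fires. The `cron_fire_times` helper is hypothetical and only handles this one shape of expression (a minute step plus an inclusive hour range); it is not a general cron parser.

```python
from datetime import time
from typing import List


def cron_fire_times(minute_step: int = 20,
                    first_hour: int = 7,
                    last_hour: int = 23) -> List[time]:
    """Expand '*/<minute_step> <first_hour>-<last_hour> * * *' into times of day."""
    return [
        time(hour, minute)
        for hour in range(first_hour, last_hour + 1)   # cron hour ranges are inclusive
        for minute in range(0, 60, minute_step)         # */20 fires at :00, :20, :40
    ]


if __name__ == '__main__':
    times = cron_fire_times()
    # 17 hours (7 through 23) with 3 runs per hour
    print(len(times), times[0], times[-1])
```

With the defaults this gives 51 requests per day, the first at 07:00 and the last at 23:40.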
### HTTPS Enforcement
Only needed if your setup requires Flask to redirect to HTTPS on its own -- generally this is something that doesn't need to be handled by Whoogle Search.
Note: You should have your own domain name and [an https certificate](https://letsencrypt.org/getting-started/) in order for this to work properly.
- Heroku: Ensure that the `Root URL` configuration on the home page begins with `https://` and not `http://`
- Docker: Add `--build-arg use_https=1` to your run command
- Pip/Pipx: Add the `--https-only` flag to the end of the `whoogle-search` command
- Default `run` script: Modify the script locally to include the `--https-only` flag at the end of the python run command
Available config values are `near`, `nojs`, `dark` and `url`.
## FAQ ## FAQ
**What's the difference between this and [Searx](https://github.com/asciimoo/searx)?** **What's the difference between this and [Searx](https://github.com/asciimoo/searx)?**
Whoogle is intended to only ever be deployed to private instances by individuals of any background, with as little effort as possible. Prior knowledge of/experience with the command line or deploying applications is not necessary to deploy Whoogle, which isn't the case with Searx. As a result, Whoole is missing some features of Searx in order to be as easy to deploy as possible. Whoogle is intended to only ever be deployed to private instances by individuals of any background, with as little effort as possible. Prior knowledge of/experience with the command line or deploying applications is not necessary to deploy Whoogle, which isn't the case with Searx. As a result, Whoogle is missing some features of Searx in order to be as easy to deploy as possible.
Whoogle also only uses Google search results, not Bing/Qwant/etc, and uses the existing Google search UI to make the transition away from Google search as unnoticeable as possible. Whoogle also only uses Google search results, not Bing/Qwant/etc, and uses the existing Google search UI to make the transition away from Google search as unnoticeable as possible.

@ -1,8 +1,27 @@
from cryptography.fernet import Fernet from app.utils.misc import generate_user_keys
from flask import Flask from flask import Flask
from flask_session import Session
import os import os
app = Flask(__name__, static_folder=os.path.dirname(os.path.abspath(__file__)) + '/static') app = Flask(__name__, static_folder=os.path.dirname(os.path.abspath(__file__)) + '/static')
app.secret_key = Fernet.generate_key() app.user_elements = {}
app.default_key_set = generate_user_keys()
app.no_cookie_ips = []
app.config['SECRET_KEY'] = os.urandom(32)
app.config['SESSION_TYPE'] = 'filesystem'
app.config['VERSION_NUMBER'] = '0.2.0'
app.config['APP_ROOT'] = os.getenv('APP_ROOT', os.path.dirname(os.path.abspath(__file__)))
app.config['STATIC_FOLDER'] = os.getenv('STATIC_FOLDER', os.path.join(app.config['APP_ROOT'], 'static'))
app.config['CONFIG_PATH'] = os.getenv('CONFIG_VOLUME', os.path.join(app.config['STATIC_FOLDER'], 'config'))
app.config['DEFAULT_CONFIG'] = os.path.join(app.config['CONFIG_PATH'], 'config.json')
app.config['SESSION_FILE_DIR'] = os.path.join(app.config['CONFIG_PATH'], 'session')
if not os.path.exists(app.config['CONFIG_PATH']):
os.makedirs(app.config['CONFIG_PATH'])
if not os.path.exists(app.config['SESSION_FILE_DIR']):
os.makedirs(app.config['SESSION_FILE_DIR'])
Session(app)
from app import routes from app import routes
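
The path settings added above form a fallback chain: each value comes from an environment variable if set, otherwise it is derived from the previous one. The sketch below reproduces that chain outside Flask so the resolution order is easy to see. `resolve_paths` is a hypothetical helper, not part of this commit, and it uses `os.makedirs(..., exist_ok=True)` in place of the explicit `os.path.exists` checks.

```python
import os
import tempfile
from typing import Dict, Mapping


def resolve_paths(app_root: str, env: Mapping[str, str]) -> Dict[str, str]:
    # STATIC_FOLDER defaults to <app_root>/static
    static_folder = env.get('STATIC_FOLDER', os.path.join(app_root, 'static'))
    # CONFIG_VOLUME (set by the Dockerfile) overrides <static>/config
    config_path = env.get('CONFIG_VOLUME', os.path.join(static_folder, 'config'))
    paths = {
        'STATIC_FOLDER': static_folder,
        'CONFIG_PATH': config_path,
        'DEFAULT_CONFIG': os.path.join(config_path, 'config.json'),
        'SESSION_FILE_DIR': os.path.join(config_path, 'session'),
    }
    # Creating the session directory also creates CONFIG_PATH, its parent.
    os.makedirs(paths['SESSION_FILE_DIR'], exist_ok=True)
    return paths


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        print(resolve_paths(root, {})['DEFAULT_CONFIG'])
```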

3
app/__main__.py Normal file

@ -0,0 +1,3 @@
from .routes import run_app
run_app()

@ -1,5 +1,7 @@
from app.request import VALID_PARAMS from app.request import VALID_PARAMS
from app.utils.misc import BLACKLIST
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
from bs4.element import ResultSet
from cryptography.fernet import Fernet from cryptography.fernet import Fernet
import re import re
import urllib.parse as urlparse import urllib.parse as urlparse
@ -14,20 +16,63 @@ 
''' '''
def get_first_link(soup):
# Replace hrefs with only the intended destination (no "utm" type tags)
for a in soup.find_all('a', href=True):
# Return the first search result URL
if 'url?q=' in a['href']:
return filter_link_args(a['href'])
def filter_link_args(query_link):
parsed_link = urlparse.urlparse(query_link)
link_args = parse_qs(parsed_link.query)
safe_args = {}
if len(link_args) == 0 and len(parsed_link) > 0:
return query_link
for arg in link_args.keys():
if arg in SKIP_ARGS:
continue
safe_args[arg] = link_args[arg]
# Remove original link query and replace with filtered args
query_link = query_link.replace(parsed_link.query, '')
if len(safe_args) > 0:
query_link = query_link + urlparse.urlencode(safe_args, doseq=True)
else:
query_link = query_link.replace('?', '')
return query_link
def has_ad_content(element: str):
return element.upper() in (value.upper() for value in BLACKLIST) or 'ⓘ' in element
class Filter: class Filter:
def __init__(self, mobile=False, config=None, secret_key=''): def __init__(self, user_keys: dict, mobile=False, config=None):
if config is None: if config is None:
config = {} config = {}
self.near = config['near'] if 'near' in config else None self.near = config['near'] if 'near' in config else ''
self.dark = config['dark'] if 'dark' in config else False self.dark = config['dark'] if 'dark' in config else False
self.nojs = config['nojs'] if 'nojs' in config else False self.nojs = config['nojs'] if 'nojs' in config else False
self.new_tab = config['new_tab'] if 'new_tab' in config else False
self.mobile = mobile self.mobile = mobile
self.secret_key = secret_key self.user_keys = user_keys
self.main_divs = ResultSet('')
self._elements = 0
def __getitem__(self, name): def __getitem__(self, name):
return getattr(self, name) return getattr(self, name)
@property
def elements(self):
return self._elements
def reskin(self, page): def reskin(self, page):
# Aesthetic only re-skinning # Aesthetic only re-skinning
page = page.replace('>G<', '>Wh<') page = page.replace('>G<', '>Wh<')
@ -38,11 +83,31 @@ class Filter:
return page return page
def encrypt_path(self, msg, is_element=False):
# Encrypts path to avoid plaintext results in logs
if is_element:
# Element paths are tracked differently in order for the element key to be regenerated
# once all elements have been loaded
enc_path = Fernet(self.user_keys['element_key']).encrypt(msg.encode()).decode()
self._elements += 1
return enc_path
return Fernet(self.user_keys['text_key']).encrypt(msg.encode()).decode()
def clean(self, soup): def clean(self, soup):
self.remove_ads(soup) self.main_divs = soup.find('div', {'id': 'main'})
self.update_image_paths(soup) self.remove_ads()
self.fix_question_section()
self.update_styling(soup) self.update_styling(soup)
self.update_links(soup)
for img in [_ for _ in soup.find_all('img') if 'src' in _.attrs]:
self.update_element_src(img, 'image/png')
for audio in [_ for _ in soup.find_all('audio') if 'src' in _.attrs]:
self.update_element_src(audio, 'audio/mpeg')
for link in soup.find_all('a', href=True):
self.update_link(link)
input_form = soup.find('form') input_form = soup.find('form')
if input_form is not None: if input_form is not None:
@ -52,43 +117,54 @@ class Filter:
for script in soup('script'): for script in soup('script'):
script.decompose() script.decompose()
footer = soup.find('div', id='sfooter') # Update default footer and header
if footer is not None: footer = soup.find('footer')
footer.decompose() if footer:
# Remove divs that have multiple links beyond just page navigation
[_.decompose() for _ in footer.find_all('div', recursive=False) if len(_.find_all('a', href=True)) > 2]
header = soup.find('header')
if header:
header.decompose()
return soup return soup
def remove_ads(self, soup): def remove_ads(self):
main_divs = soup.find('div', {'id': 'main'}) if not self.main_divs:
if main_divs is None:
return return
result_divs = main_divs.find_all('div', recursive=False)
# Only ads/sponsored content use classes in the list of result divs for div in [_ for _ in self.main_divs.find_all('div', recursive=True)]:
ad_divs = [ad_div for ad_div in result_divs if 'class' in ad_div.attrs] has_ad = len([_ for _ in div.find_all('span', recursive=True) if has_ad_content(_.text)])
for div in ad_divs: _ = div.decompose() if has_ad else None
div.decompose()
def update_image_paths(self, soup): def fix_question_section(self):
for img in [_ for _ in soup.find_all('img') if 'src' in _.attrs]: if not self.main_divs:
img_src = img['src'] return
if img_src.startswith('//'):
img_src = 'https:' + img_src
elif img_src.startswith(GOOG_IMG):
# Special rebranding for image search results
if img_src.startswith(LOGO_URL):
img['src'] = '/static/img/logo.png'
img['height'] = 40
else:
img['src'] = BLANK_B64
continue question_divs = [_ for _ in self.main_divs.find_all('div', recursive=False) if len(_.find_all('h2')) > 0]
for question_div in question_divs:
questions = [_ for _ in question_div.find_all('div', recursive=True) if _.text.endswith('?')]
for question in questions:
question['style'] = 'padding: 10px; font-style: italic;'
enc_src = Fernet(self.secret_key).encrypt(img_src.encode()) def update_element_src(self, element, mime):
img['src'] = '/tmp?image_url=' + enc_src.decode() element_src = element['src']
# TODO: Non-mobile image results link to website instead of image if element_src.startswith('//'):
# if not self.mobile: element_src = 'https:' + element_src
# img.append(BeautifulSoup(FULL_RES_IMG.format(img_src), 'html.parser')) elif element_src.startswith(LOGO_URL):
# Re-brand with Whoogle logo
element['src'] = '/static/img/logo.png'
element['style'] = 'height:40px;width:162px'
return
elif element_src.startswith(GOOG_IMG):
element['src'] = BLANK_B64
return
element['src'] = '/element?url=' + self.encrypt_path(element_src, is_element=True) + \
'&type=' + urlparse.quote(mime)
# TODO: Non-mobile image results link to website instead of image
# if not self.mobile:
# img.append(BeautifulSoup(FULL_RES_IMG.format(element_src), 'html.parser'))
def update_styling(self, soup): def update_styling(self, soup):
# Remove unnecessary button(s) # Remove unnecessary button(s)
@ -114,65 +190,52 @@ class Filter:
# Set up dark mode if active # Set up dark mode if active
if self.dark: if self.dark:
soup.find('html')['style'] = 'scrollbar-color: #333 #111;' soup.find('html')['style'] = 'scrollbar-color: #333 #111;color:#fff !important;background:#000 !important'
for input_element in soup.findAll('input'): for input_element in soup.findAll('input'):
input_element['style'] = 'color:#fff;' input_element['style'] = 'color:#fff;background:#000;'
def update_links(self, soup): for span_element in soup.findAll('span'):
# Replace hrefs with only the intended destination (no "utm" type tags) span_element['style'] = 'color: white;'
for a in soup.find_all('a', href=True):
href = a['href'].replace('https://www.google.com', '')
if '/advanced_search' in href:
a.decompose()
continue
result_link = urlparse.urlparse(href) for href_element in soup.findAll('a'):
query_link = parse_qs(result_link.query)['q'][0] if '?q=' in href else '' href_element['style'] = 'color: white' if href_element['href'].startswith('/search') else ''
if '/search?q=' in href: def update_link(self, link):
enc_result = Fernet(self.secret_key).encrypt(query_link.encode()) # Replace href with only the intended destination (no "utm" type tags)
new_search = '/search?q=' + enc_result.decode() href = link['href'].replace('https://www.google.com', '')
if '/advanced_search' in href:
link.decompose()
return
elif self.new_tab:
link['target'] = '_blank'
query_params = parse_qs(urlparse.urlparse(href).query) result_link = urlparse.urlparse(href)
for param in VALID_PARAMS: query_link = parse_qs(result_link.query)['q'][0] if '?q=' in href else ''
param_val = query_params[param][0] if param in query_params else ''
new_search += '&' + param + '=' + param_val
a['href'] = new_search
elif 'url?q=' in href:
# Strip unneeded arguments
parsed_link = urlparse.urlparse(query_link)
link_args = parse_qs(parsed_link.query)
safe_args = {}
if len(link_args) == 0 and len(parsed_link) > 0: if query_link.startswith('/'):
a['href'] = query_link link['href'] = 'https://google.com' + query_link
continue elif '/search?q=' in href:
new_search = '/search?q=' + self.encrypt_path(query_link)
for arg in link_args.keys(): query_params = parse_qs(urlparse.urlparse(href).query)
if arg in SKIP_ARGS: for param in VALID_PARAMS:
continue param_val = query_params[param][0] if param in query_params else ''
new_search += '&' + param + '=' + param_val
link['href'] = new_search
elif 'url?q=' in href:
# Strip unneeded arguments
link['href'] = filter_link_args(query_link)
safe_args[arg] = link_args[arg] # Add no-js option
if self.nojs:
# Remove original link query and replace with filtered args gen_nojs(link)
query_link = query_link.replace(parsed_link.query, '') else:
if len(safe_args) > 0: link['href'] = href
query_link = query_link + urlparse.urlencode(safe_args, doseq=True)
else:
query_link = query_link.replace('?', '')
a['href'] = query_link
# Add no-js option
if self.nojs:
gen_nojs(soup, query_link, a)
else:
a['href'] = href
def gen_nojs(soup, link, sibling): def gen_nojs(sibling):
nojs_link = soup.new_tag('a') nojs_link = BeautifulSoup().new_tag('a')
nojs_link['href'] = '/window?location=' + link nojs_link['href'] = '/window?location=' + sibling['href']
nojs_link['style'] = 'display:block;width:100%;' nojs_link['style'] = 'display:block;width:100%;'
nojs_link.string = 'NoJS Link: ' + nojs_link['href'] nojs_link.string = 'NoJS Link: ' + nojs_link['href']
sibling.append(BeautifulSoup('<br><hr><br>', 'html.parser')) sibling.append(BeautifulSoup('<br><hr><br>', 'html.parser'))
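
To illustrate what the extracted `filter_link_args` helper does to a result URL, here is a standalone sketch of the same logic using only the standard library. `SKIP_ARGS` is defined elsewhere in the module and is not shown in this diff, so the two values below are placeholders for illustration only.

```python
import urllib.parse as urlparse
from urllib.parse import parse_qs

# Assumption: placeholder values; the real SKIP_ARGS list is not part of this diff.
SKIP_ARGS = ['ref_src', 'utm']


def filter_link_args(query_link: str) -> str:
    parsed_link = urlparse.urlparse(query_link)
    link_args = parse_qs(parsed_link.query)

    # Nothing to filter when the link carries no query arguments
    if not link_args:
        return query_link

    # Keep every argument whose name is not an exact match in SKIP_ARGS
    safe_args = {k: v for k, v in link_args.items() if k not in SKIP_ARGS}

    # Drop the original query string, then re-append only the safe arguments
    query_link = query_link.replace(parsed_link.query, '')
    if safe_args:
        return query_link + urlparse.urlencode(safe_args, doseq=True)
    return query_link.replace('?', '')


if __name__ == '__main__':
    print(filter_link_args('https://example.com/page?utm=abc&id=5'))
    print(filter_link_args('https://example.com/page?utm=abc'))
```

Because the comparison is by exact key name, a parameter such as `utm_source` would only be removed if it were listed in `SKIP_ARGS` itself.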

0
app/models/__init__.py Normal file

323
app/models/config.py Normal file

@ -0,0 +1,323 @@
class Config:
# Derived from here:
# https://sites.google.com/site/tomihasa/google-language-codes#searchlanguage
LANGUAGES = [
{'name': 'English', 'value': 'lang_en'},
{'name': 'Afrikaans', 'value': 'lang_af'},
{'name': 'Arabic', 'value': 'lang_ar'},
{'name': 'Armenian', 'value': 'lang_hy'},
{'name': 'Belarusian', 'value': 'lang_be'},
{'name': 'Bulgarian', 'value': 'lang_bg'},
{'name': 'Catalan', 'value': 'lang_ca'},
{'name': 'Chinese (Simplified)', 'value': 'lang_zh-CN'},
{'name': 'Chinese (Traditional)', 'value': 'lang_zh-TW'},
{'name': 'Croatian', 'value': 'lang_hr'},
{'name': 'Czech', 'value': 'lang_cs'},
{'name': 'Danish', 'value': 'lang_da'},
{'name': 'Dutch', 'value': 'lang_nl'},
{'name': 'Esperanto', 'value': 'lang_eo'},
{'name': 'Estonian', 'value': 'lang_et'},
{'name': 'Filipino', 'value': 'lang_tl'},
{'name': 'Finnish', 'value': 'lang_fi'},
{'name': 'French', 'value': 'lang_fr'},
{'name': 'German', 'value': 'lang_de'},
{'name': 'Greek', 'value': 'lang_el'},
{'name': 'Hebrew', 'value': 'lang_iw'},
{'name': 'Hindi', 'value': 'lang_hi'},
{'name': 'Hungarian', 'value': 'lang_hu'},
{'name': 'Icelandic', 'value': 'lang_is'},
{'name': 'Indonesian', 'value': 'lang_id'},
{'name': 'Italian', 'value': 'lang_it'},
{'name': 'Japanese', 'value': 'lang_ja'},
{'name': 'Korean', 'value': 'lang_ko'},
{'name': 'Latvian', 'value': 'lang_lv'},
{'name': 'Lithuanian', 'value': 'lang_lt'},
{'name': 'Norwegian', 'value': 'lang_no'},
{'name': 'Persian', 'value': 'lang_fa'},
{'name': 'Polish', 'value': 'lang_pl'},
{'name': 'Portuguese', 'value': 'lang_pt'},
{'name': 'Romanian', 'value': 'lang_ro'},
{'name': 'Russian', 'value': 'lang_ru'},
{'name': 'Serbian', 'value': 'lang_sr'},
{'name': 'Slovak', 'value': 'lang_sk'},
{'name': 'Slovenian', 'value': 'lang_sl'},
{'name': 'Spanish', 'value': 'lang_es'},
{'name': 'Swahili', 'value': 'lang_sw'},
{'name': 'Swedish', 'value': 'lang_sv'},
{'name': 'Thai', 'value': 'lang_th'},
{'name': 'Turkish', 'value': 'lang_tr'},
{'name': 'Ukrainian', 'value': 'lang_uk'},
{'name': 'Vietnamese', 'value': 'lang_vi'},
]
COUNTRIES = [
{'name': 'Default (use server location)', 'value': ''},
{'name': 'Afghanistan', 'value': 'countryAF'},
{'name': 'Albania', 'value': 'countryAL'},
{'name': 'Algeria', 'value': 'countryDZ'},
{'name': 'American Samoa', 'value': 'countryAS'},
{'name': 'Andorra', 'value': 'countryAD'},
{'name': 'Angola', 'value': 'countryAO'},
{'name': 'Anguilla', 'value': 'countryAI'},
{'name': 'Antarctica', 'value': 'countryAQ'},
{'name': 'Antigua and Barbuda', 'value': 'countryAG'},
{'name': 'Argentina', 'value': 'countryAR'},
{'name': 'Armenia', 'value': 'countryAM'},
{'name': 'Aruba', 'value': 'countryAW'},
{'name': 'Australia', 'value': 'countryAU'},
{'name': 'Austria', 'value': 'countryAT'},
{'name': 'Azerbaijan', 'value': 'countryAZ'},
{'name': 'Bahamas', 'value': 'countryBS'},
{'name': 'Bahrain', 'value': 'countryBH'},
{'name': 'Bangladesh', 'value': 'countryBD'},
{'name': 'Barbados', 'value': 'countryBB'},
{'name': 'Belarus', 'value': 'countryBY'},
{'name': 'Belgium', 'value': 'countryBE'},
{'name': 'Belize', 'value': 'countryBZ'},
{'name': 'Benin', 'value': 'countryBJ'},
{'name': 'Bermuda', 'value': 'countryBM'},
{'name': 'Bhutan', 'value': 'countryBT'},
{'name': 'Bolivia', 'value': 'countryBO'},
{'name': 'Bosnia and Herzegovina', 'value': 'countryBA'},
{'name': 'Botswana', 'value': 'countryBW'},
{'name': 'Bouvet Island', 'value': 'countryBV'},
{'name': 'Brazil', 'value': 'countryBR'},
{'name': 'British Indian Ocean Territory', 'value': 'countryIO'},
{'name': 'Brunei Darussalam', 'value': 'countryBN'},
{'name': 'Bulgaria', 'value': 'countryBG'},
{'name': 'Burkina Faso', 'value': 'countryBF'},
{'name': 'Burundi', 'value': 'countryBI'},
{'name': 'Cambodia', 'value': 'countryKH'},
{'name': 'Cameroon', 'value': 'countryCM'},
{'name': 'Canada', 'value': 'countryCA'},
{'name': 'Cape Verde', 'value': 'countryCV'},
{'name': 'Cayman Islands', 'value': 'countryKY'},
{'name': 'Central African Republic', 'value': 'countryCF'},
{'name': 'Chad', 'value': 'countryTD'},
{'name': 'Chile', 'value': 'countryCL'},
{'name': 'China', 'value': 'countryCN'},
{'name': 'Christmas Island', 'value': 'countryCX'},
{'name': 'Cocos (Keeling) Islands', 'value': 'countryCC'},
{'name': 'Colombia', 'value': 'countryCO'},
{'name': 'Comoros', 'value': 'countryKM'},
{'name': 'Congo', 'value': 'countryCG'},
{'name': 'Congo, Democratic Republic of the', 'value': 'countryCD'},
{'name': 'Cook Islands', 'value': 'countryCK'},
{'name': 'Costa Rica', 'value': 'countryCR'},
{'name': 'Cote D\'ivoire', 'value': 'countryCI'},
{'name': 'Croatia (Hrvatska)', 'value': 'countryHR'},
{'name': 'Cuba', 'value': 'countryCU'},
{'name': 'Cyprus', 'value': 'countryCY'},
{'name': 'Czech Republic', 'value': 'countryCZ'},
{'name': 'Denmark', 'value': 'countryDK'},
{'name': 'Djibouti', 'value': 'countryDJ'},
{'name': 'Dominica', 'value': 'countryDM'},
{'name': 'Dominican Republic', 'value': 'countryDO'},
{'name': 'East Timor', 'value': 'countryTP'},
{'name': 'Ecuador', 'value': 'countryEC'},
{'name': 'Egypt', 'value': 'countryEG'},
{'name': 'El Salvador', 'value': 'countrySV'},
{'name': 'Equatorial Guinea', 'value': 'countryGQ'},
{'name': 'Eritrea', 'value': 'countryER'},
{'name': 'Estonia', 'value': 'countryEE'},
{'name': 'Ethiopia', 'value': 'countryET'},
{'name': 'European Union', 'value': 'countryEU'},
{'name': 'Falkland Islands (Malvinas)', 'value': 'countryFK'},
{'name': 'Faroe Islands', 'value': 'countryFO'},
{'name': 'Fiji', 'value': 'countryFJ'},
{'name': 'Finland', 'value': 'countryFI'},
{'name': 'France', 'value': 'countryFR'},
{'name': 'France, Metropolitan', 'value': 'countryFX'},
{'name': 'French Guiana', 'value': 'countryGF'},
{'name': 'French Polynesia', 'value': 'countryPF'},
{'name': 'French Southern Territories', 'value': 'countryTF'},
{'name': 'Gabon', 'value': 'countryGA'},
{'name': 'Gambia', 'value': 'countryGM'},
{'name': 'Georgia', 'value': 'countryGE'},
{'name': 'Germany', 'value': 'countryDE'},
{'name': 'Ghana', 'value': 'countryGH'},
{'name': 'Gibraltar', 'value': 'countryGI'},
{'name': 'Greece', 'value': 'countryGR'},
{'name': 'Greenland', 'value': 'countryGL'},
{'name': 'Grenada', 'value': 'countryGD'},
{'name': 'Guadeloupe', 'value': 'countryGP'},
{'name': 'Guam', 'value': 'countryGU'},
{'name': 'Guatemala', 'value': 'countryGT'},
{'name': 'Guinea', 'value': 'countryGN'},
{'name': 'Guinea-Bissau', 'value': 'countryGW'},
{'name': 'Guyana', 'value': 'countryGY'},
{'name': 'Haiti', 'value': 'countryHT'},
{'name': 'Heard Island and Mcdonald Islands', 'value': 'countryHM'},
{'name': 'Holy See (Vatican City State)', 'value': 'countryVA'},
{'name': 'Honduras', 'value': 'countryHN'},
{'name': 'Hong Kong', 'value': 'countryHK'},
{'name': 'Hungary', 'value': 'countryHU'},
{'name': 'Iceland', 'value': 'countryIS'},
{'name': 'India', 'value': 'countryIN'},
{'name': 'Indonesia', 'value': 'countryID'},
{'name': 'Iran, Islamic Republic of', 'value': 'countryIR'},
{'name': 'Iraq', 'value': 'countryIQ'},
{'name': 'Ireland', 'value': 'countryIE'},
{'name': 'Israel', 'value': 'countryIL'},
{'name': 'Italy', 'value': 'countryIT'},
{'name': 'Jamaica', 'value': 'countryJM'},
{'name': 'Japan', 'value': 'countryJP'},
{'name': 'Jordan', 'value': 'countryJO'},
{'name': 'Kazakhstan', 'value': 'countryKZ'},
{'name': 'Kenya', 'value': 'countryKE'},
{'name': 'Kiribati', 'value': 'countryKI'},
{'name': 'Korea, Democratic People\'s Republic of', 'value': 'countryKP'},
{'name': 'Korea, Republic of', 'value': 'countryKR'},
{'name': 'Kuwait', 'value': 'countryKW'},
{'name': 'Kyrgyzstan', 'value': 'countryKG'},
{'name': 'Lao People\'s Democratic Republic', 'value': 'countryLA'},
{'name': 'Latvia', 'value': 'countryLV'},
{'name': 'Lebanon', 'value': 'countryLB'},
{'name': 'Lesotho', 'value': 'countryLS'},
{'name': 'Liberia', 'value': 'countryLR'},
{'name': 'Libyan Arab Jamahiriya', 'value': 'countryLY'},
{'name': 'Liechtenstein', 'value': 'countryLI'},
{'name': 'Lithuania', 'value': 'countryLT'},
{'name': 'Luxembourg', 'value': 'countryLU'},
{'name': 'Macao', 'value': 'countryMO'},
{'name': 'Macedonia, the Former Yugoslav Republic of', 'value': 'countryMK'},
{'name': 'Madagascar', 'value': 'countryMG'},
{'name': 'Malawi', 'value': 'countryMW'},
{'name': 'Malaysia', 'value': 'countryMY'},
{'name': 'Maldives', 'value': 'countryMV'},
{'name': 'Mali', 'value': 'countryML'},
{'name': 'Malta', 'value': 'countryMT'},
{'name': 'Marshall Islands', 'value': 'countryMH'},
{'name': 'Martinique', 'value': 'countryMQ'},
{'name': 'Mauritania', 'value': 'countryMR'},
{'name': 'Mauritius', 'value': 'countryMU'},
{'name': 'Mayotte', 'value': 'countryYT'},
{'name': 'Mexico', 'value': 'countryMX'},
{'name': 'Micronesia, Federated States of', 'value': 'countryFM'},
{'name': 'Moldova, Republic of', 'value': 'countryMD'},
{'name': 'Monaco', 'value': 'countryMC'},
{'name': 'Mongolia', 'value': 'countryMN'},
{'name': 'Montserrat', 'value': 'countryMS'},
{'name': 'Morocco', 'value': 'countryMA'},
{'name': 'Mozambique', 'value': 'countryMZ'},
{'name': 'Myanmar', 'value': 'countryMM'},
{'name': 'Namibia', 'value': 'countryNA'},
{'name': 'Nauru', 'value': 'countryNR'},
{'name': 'Nepal', 'value': 'countryNP'},
{'name': 'Netherlands', 'value': 'countryNL'},
{'name': 'Netherlands Antilles', 'value': 'countryAN'},
{'name': 'New Caledonia', 'value': 'countryNC'},
{'name': 'New Zealand', 'value': 'countryNZ'},
{'name': 'Nicaragua', 'value': 'countryNI'},
{'name': 'Niger', 'value': 'countryNE'},
{'name': 'Nigeria', 'value': 'countryNG'},
{'name': 'Niue', 'value': 'countryNU'},
{'name': 'Norfolk Island', 'value': 'countryNF'},
{'name': 'Northern Mariana Islands', 'value': 'countryMP'},
{'name': 'Norway', 'value': 'countryNO'},
{'name': 'Oman', 'value': 'countryOM'},
{'name': 'Pakistan', 'value': 'countryPK'},
{'name': 'Palau', 'value': 'countryPW'},
{'name': 'Palestinian Territory', 'value': 'countryPS'},
{'name': 'Panama', 'value': 'countryPA'},
{'name': 'Papua New Guinea', 'value': 'countryPG'},
{'name': 'Paraguay', 'value': 'countryPY'},
{'name': 'Peru', 'value': 'countryPE'},
{'name': 'Philippines', 'value': 'countryPH'},
{'name': 'Pitcairn', 'value': 'countryPN'},
{'name': 'Poland', 'value': 'countryPL'},
{'name': 'Portugal', 'value': 'countryPT'},
{'name': 'Puerto Rico', 'value': 'countryPR'},
{'name': 'Qatar', 'value': 'countryQA'},
{'name': 'Reunion', 'value': 'countryRE'},
{'name': 'Romania', 'value': 'countryRO'},
{'name': 'Russian Federation', 'value': 'countryRU'},
{'name': 'Rwanda', 'value': 'countryRW'},
{'name': 'Saint Helena', 'value': 'countrySH'},
{'name': 'Saint Kitts and Nevis', 'value': 'countryKN'},
{'name': 'Saint Lucia', 'value': 'countryLC'},
{'name': 'Saint Pierre and Miquelon', 'value': 'countryPM'},
{'name': 'Saint Vincent and the Grenadines', 'value': 'countryVC'},
{'name': 'Samoa', 'value': 'countryWS'},
{'name': 'San Marino', 'value': 'countrySM'},
{'name': 'Sao Tome and Principe', 'value': 'countryST'},
{'name': 'Saudi Arabia', 'value': 'countrySA'},
{'name': 'Senegal', 'value': 'countrySN'},
{'name': 'Serbia and Montenegro', 'value': 'countryCS'},
{'name': 'Seychelles', 'value': 'countrySC'},
{'name': 'Sierra Leone', 'value': 'countrySL'},
{'name': 'Singapore', 'value': 'countrySG'},
{'name': 'Slovakia', 'value': 'countrySK'},
{'name': 'Slovenia', 'value': 'countrySI'},
{'name': 'Solomon Islands', 'value': 'countrySB'},
{'name': 'Somalia', 'value': 'countrySO'},
{'name': 'South Africa', 'value': 'countryZA'},
{'name': 'South Georgia and the South Sandwich Islands', 'value': 'countryGS'},
{'name': 'Spain', 'value': 'countryES'},
{'name': 'Sri Lanka', 'value': 'countryLK'},
{'name': 'Sudan', 'value': 'countrySD'},
{'name': 'Suriname', 'value': 'countrySR'},
{'name': 'Svalbard and Jan Mayen', 'value': 'countrySJ'},
{'name': 'Swaziland', 'value': 'countrySZ'},
{'name': 'Sweden', 'value': 'countrySE'},
{'name': 'Switzerland', 'value': 'countryCH'},
{'name': 'Syrian Arab Republic', 'value': 'countrySY'},
{'name': 'Taiwan, Province of China', 'value': 'countryTW'},
{'name': 'Tajikistan', 'value': 'countryTJ'},
{'name': 'Tanzania, United Republic of', 'value': 'countryTZ'},
{'name': 'Thailand', 'value': 'countryTH'},
{'name': 'Togo', 'value': 'countryTG'},
{'name': 'Tokelau', 'value': 'countryTK'},
{'name': 'Tonga', 'value': 'countryTO'},
{'name': 'Trinidad and Tobago', 'value': 'countryTT'},
{'name': 'Tunisia', 'value': 'countryTN'},
{'name': 'Turkey', 'value': 'countryTR'},
{'name': 'Turkmenistan', 'value': 'countryTM'},
{'name': 'Turks and Caicos Islands', 'value': 'countryTC'},
{'name': 'Tuvalu', 'value': 'countryTV'},
{'name': 'Uganda', 'value': 'countryUG'},
{'name': 'Ukraine', 'value': 'countryUA'},
{'name': 'United Arab Emirates', 'value': 'countryAE'},
{'name': 'United Kingdom', 'value': 'countryUK'},
{'name': 'United States', 'value': 'countryUS'},
{'name': 'United States Minor Outlying Islands', 'value': 'countryUM'},
{'name': 'Uruguay', 'value': 'countryUY'},
{'name': 'Uzbekistan', 'value': 'countryUZ'},
{'name': 'Vanuatu', 'value': 'countryVU'},
{'name': 'Venezuela', 'value': 'countryVE'},
{'name': 'Vietnam', 'value': 'countryVN'},
{'name': 'Virgin Islands, British', 'value': 'countryVG'},
{'name': 'Virgin Islands, U.S.', 'value': 'countryVI'},
{'name': 'Wallis and Futuna', 'value': 'countryWF'},
{'name': 'Western Sahara', 'value': 'countryEH'},
{'name': 'Yemen', 'value': 'countryYE'},
{'name': 'Yugoslavia', 'value': 'countryYU'},
{'name': 'Zambia', 'value': 'countryZM'},
{'name': 'Zimbabwe', 'value': 'countryZW'}
]
def __init__(self, **kwargs):
self.url = ''
self.lang = 'lang_en'
self.ctry = ''
self.safe = False
self.dark = False
self.nojs = False
self.near = ''
self.new_tab = False
self.get_only = False
for key, value in kwargs.items():
setattr(self, key, value)
def __getitem__(self, name):
return getattr(self, name)
def __setitem__(self, name, value):
return setattr(self, name, value)
def __delitem__(self, name):
return delattr(self, name)
def __contains__(self, name):
return hasattr(self, name)
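
The dunder methods above let a config object be read and written like a dictionary while still being a plain attribute-backed object. A minimal sketch of that pattern follows; the class is a reduced stand-in with only three fields (an assumption made for brevity), not the full `Config` from this commit.

```python
class Config:
    """Reduced stand-in for the Config class in the diff (only a few fields)."""

    def __init__(self, **kwargs):
        # Defaults first, then any keyword overrides
        self.lang = 'lang_en'
        self.ctry = ''
        self.dark = False
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __getitem__(self, name):
        # config['lang'] reads the attribute of the same name
        return getattr(self, name)

    def __setitem__(self, name, value):
        # config['ctry'] = ... writes the attribute of the same name
        return setattr(self, name, value)

    def __contains__(self, name):
        # 'dark' in config checks whether the attribute exists
        return hasattr(self, name)


config = Config(lang='lang_de', dark=True)
config['ctry'] = 'countryDE'
```

Because the values live in ordinary attributes, `config.__dict__` can still be serialized directly, which is what the `/config` route appears to rely on.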

View File

@ -1,38 +1,49 @@
from app import rhyme from lxml import etree
from io import BytesIO import random
import pycurl import requests
from requests import Response
import urllib.parse as urlparse import urllib.parse as urlparse
# Base search url # Core Google search URLs
SEARCH_URL = 'https://www.google.com/search?gbv=1&q=' SEARCH_URL = 'https://www.google.com/search?gbv=1&q='
AUTOCOMPLETE_URL = 'https://suggestqueries.google.com/complete/search?client=toolbar&'
MOBILE_UA = '{}/5.0 (Android 0; Mobile; rv:54.0) Gecko/54.0 {}/59.0' MOBILE_UA = '{}/5.0 (Android 0; Mobile; rv:54.0) Gecko/54.0 {}/59.0'
DESKTOP_UA = '{}/5.0 (X11; {} x86_64; rv:75.0) Gecko/20100101 {}/75.0' DESKTOP_UA = '{}/5.0 (X11; {} x86_64; rv:75.0) Gecko/20100101 {}/75.0'
# Valid query params # Valid query params
VALID_PARAMS = ['tbs', 'tbm', 'start', 'near'] VALID_PARAMS = ['tbs', 'tbm', 'start', 'near', 'source']
def gen_user_agent(normal_ua): def gen_user_agent(is_mobile):
is_mobile = 'Android' in normal_ua or 'iPhone' in normal_ua mozilla = random.choice(['Moo', 'Woah', 'Bro', 'Slow']) + 'zilla'
firefox = random.choice(['Choir', 'Squier', 'Higher', 'Wire']) + 'fox'
mozilla = rhyme.get_rhyme('Mo') + rhyme.get_rhyme('zilla') linux = random.choice(['Win', 'Sin', 'Gin', 'Fin', 'Kin']) + 'ux'
firefox = rhyme.get_rhyme('Fire') + rhyme.get_rhyme('fox')
linux = rhyme.get_rhyme('Lin') + 'ux'
if is_mobile: if is_mobile:
return MOBILE_UA.format(mozilla, firefox) return MOBILE_UA.format(mozilla, firefox)
else:
return DESKTOP_UA.format(mozilla, linux, firefox) return DESKTOP_UA.format(mozilla, linux, firefox)
def gen_query(query, args, near_city=None): def gen_query(query, args, config, near_city=None):
param_dict = {key: '' for key in VALID_PARAMS} param_dict = {key: '' for key in VALID_PARAMS}
# Use :past(hour/day/week/month/year) if available # Use :past(hour/day/week/month/year) if available
# example search "new restaurants :past month" # example search "new restaurants :past month"
if ':past' in query: sub_lang = ''
if ':past' in query and 'tbs' not in args:
time_range = str.strip(query.split(':past', 1)[-1]) time_range = str.strip(query.split(':past', 1)[-1])
param_dict['tbs'] = '&tbs=qdr:' + str.lower(time_range[0]) param_dict['tbs'] = '&tbs=' + ('qdr:' + str.lower(time_range[0]))
elif 'tbs' in args:
result_tbs = args.get('tbs')
param_dict['tbs'] = '&tbs=' + result_tbs
# Occasionally the 'tbs' param provided by Google also contains a field for 'lr', but formatted
# strangely. This is an (admittedly not very elegant) workaround for that.
# Ex/ &tbs=qdr:h,lr:lang_1pl --> the lr param needs to be extracted and have the "1" digit removed in this case
sub_lang = [_ for _ in result_tbs.split(',') if 'lr:' in _]
sub_lang = sub_lang[0][sub_lang[0].find('lr:') + 3:len(sub_lang[0])] if len(sub_lang) > 0 else ''
# Ensure search query is parsable # Ensure search query is parsable
query = urlparse.quote(query) query = urlparse.quote(query)
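
The `lr` handling in `gen_query` can be hard to follow inline. The sketch below isolates it into a hypothetical helper, `extract_result_lang` (a name that does not appear in the commit), combining the extraction step with the digit removal the diff applies when building the `lr` parameter.

```python
def extract_result_lang(tbs):
    """Return the 'lr' value embedded in a tbs string with digits removed, or ''."""
    # tbs fields are comma separated, e.g. 'qdr:h,lr:lang_1pl'
    lr_fields = [field for field in tbs.split(',') if 'lr:' in field]
    if not lr_fields:
        return ''

    # Keep everything after the 'lr:' prefix of the first matching field
    raw_lang = lr_fields[0][lr_fields[0].find('lr:') + 3:]

    # Drop the stray digit(s), so 'lang_1pl' becomes 'lang_pl'
    return ''.join(ch for ch in raw_lang if not ch.isdigit())
```

For the example in the comment, `extract_result_lang('qdr:h,lr:lang_1pl')` yields `'lang_pl'`, and a `tbs` value without an `lr` field yields an empty string.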
@ -46,11 +57,23 @@ def gen_query(query, args, near_city=None):
param_dict['start'] = '&start=' + args.get('start') param_dict['start'] = '&start=' + args.get('start')
# Search for results near a particular city, if available # Search for results near a particular city, if available
if near_city is not None: if near_city:
param_dict['near'] = '&near=' + urlparse.quote(near_city) param_dict['near'] = '&near=' + urlparse.quote(near_city)
# Set language for results (lr) if source isn't set, otherwise use the result
# language param provided by google (but with the strange digit(s) removed)
if 'source' in args:
param_dict['source'] = '&source=' + args.get('source')
param_dict['lr'] = ('&lr=' + ''.join([_ for _ in sub_lang if not _.isdigit()])) if sub_lang else ''
else:
param_dict['lr'] = '&lr=' + config.lang
param_dict['cr'] = ('&cr=' + config.ctry) if config.ctry else ''
param_dict['hl'] = '&hl=' + config.lang.replace('lang_', '')
param_dict['safe'] = '&safe=' + ('active' if config.safe else 'off')
for val in param_dict.values(): for val in param_dict.values():
if not val or val is None: if not val:
continue continue
query += val query += val
@ -58,26 +81,27 @@ def gen_query(query, args, near_city=None):
class Request: class Request:
def __init__(self, normal_ua): def __init__(self, normal_ua, language='lang_en'):
self.modified_user_agent = gen_user_agent(normal_ua) self.language = language
self.mobile = 'Android' in normal_ua or 'iPhone' in normal_ua
self.modified_user_agent = gen_user_agent(self.mobile)
def __getitem__(self, name): def __getitem__(self, name):
return getattr(self, name) return getattr(self, name)
def send(self, base_url=SEARCH_URL, query='', return_bytes=False): def autocomplete(self, query):
response_header = [] ac_query = dict(hl=self.language, q=query)
response = self.send(base_url=AUTOCOMPLETE_URL, query=urlparse.urlencode(ac_query)).text
b_obj = BytesIO() if response:
crl = pycurl.Curl() dom = etree.fromstring(response)
crl.setopt(crl.URL, base_url + query) return dom.xpath('//suggestion/@data')
crl.setopt(crl.USERAGENT, self.modified_user_agent)
crl.setopt(crl.WRITEDATA, b_obj)
crl.setopt(crl.HEADERFUNCTION, response_header.append)
crl.setopt(pycurl.FOLLOWLOCATION, 1)
crl.perform()
crl.close()
if return_bytes: return []
return b_obj.getvalue()
else: def send(self, base_url=SEARCH_URL, query='') -> Response:
return b_obj.getvalue().decode('unicode-escape', 'ignore') headers = {
'User-Agent': self.modified_user_agent
}
return requests.get(base_url + query, headers=headers)


@ -1,25 +0,0 @@
import itertools
from Phyme import Phyme
import random
import sys
import time
random.seed(time.time())
ph = Phyme()
def get_rhyme(word):
# Get all rhymes and merge to one list (normally separated by syllable count)
rhymes = ph.get_perfect_rhymes(word)
rhyme_vals = list(itertools.chain.from_iterable(list(rhymes.values())))
# Pick a random rhyme and strip out any non alpha characters
rhymed_word = rhyme_vals[random.randint(0, len(rhyme_vals) - 1)]
rhymed_word = ''.join(letter for letter in rhymed_word if letter.isalpha())
return rhymed_word.capitalize()
if __name__ == '__main__':
print(get_rhyme(sys.argv[1]))


@ -1,91 +1,207 @@
from app import app
from app.filter import Filter
from app.request import Request, gen_query
import argparse import argparse
from bs4 import BeautifulSoup import base64
from cryptography.fernet import Fernet, InvalidToken
from flask import g, make_response, request, redirect, render_template, send_file
import io import io
import json import json
import os import os
import pickle
import urllib.parse as urlparse import urllib.parse as urlparse
import uuid
from functools import wraps
app.config['APP_ROOT'] = os.getenv('APP_ROOT', os.path.dirname(os.path.abspath(__file__))) import waitress
app.config['STATIC_FOLDER'] = os.getenv('STATIC_FOLDER', os.path.join(app.config['APP_ROOT'], 'static')) from flask import jsonify, make_response, request, redirect, render_template, send_file, session
from requests import exceptions
CONFIG_PATH = app.config['STATIC_FOLDER'] + '/config.json' from app import app
from app.models.config import Config
from app.request import Request
from app.utils.misc import valid_user_session
from app.utils.routing_utils import *
def auth_required(f):
@wraps(f)
def decorated(*args, **kwargs):
auth = request.authorization
# Skip if username/password not set
whoogle_user = os.getenv('WHOOGLE_USER', '')
whoogle_pass = os.getenv('WHOOGLE_PASS', '')
if (not whoogle_user or not whoogle_pass) or \
(auth and whoogle_user == auth.username and whoogle_pass == auth.password):
return f(*args, **kwargs)
else:
return make_response('Not logged in', 401, {'WWW-Authenticate': 'Basic realm="Login Required"'})
return decorated
@app.before_request @app.before_request
def before_request_func(): def before_request_func():
g.user_request = Request(request.headers.get('User-Agent')) g.request_params = request.args if request.method == 'GET' else request.form
g.user_config = json.load(open(CONFIG_PATH)) if os.path.exists(CONFIG_PATH) else {} g.cookies_disabled = False
# Generate session values for user if unavailable
if not valid_user_session(session):
session['config'] = json.load(open(app.config['DEFAULT_CONFIG'])) \
if os.path.exists(app.config['DEFAULT_CONFIG']) else {'url': request.url_root}
session['uuid'] = str(uuid.uuid4())
session['fernet_keys'] = generate_user_keys(True)
# Flag cookies as possibly disabled in order to prevent against
# unnecessary session directory expansion
g.cookies_disabled = True
if session['uuid'] not in app.user_elements:
app.user_elements.update({session['uuid']: 0})
# Always redirect to https if HTTPS_ONLY is set (otherwise default to False)
https_only = os.getenv('HTTPS_ONLY', False)
if https_only and request.url.startswith('http://'):
return redirect(request.url.replace('http://', 'https://', 1), code=308)
g.user_config = Config(**session['config'])
if not g.user_config.url:
g.user_config.url = request.url_root.replace('http://', 'https://') if https_only else request.url_root
g.user_request = Request(request.headers.get('User-Agent'), language=g.user_config.lang)
g.app_location = g.user_config.url
@app.after_request
def after_request_func(response):
if app.user_elements[session['uuid']] <= 0 and '/element' in request.url:
# Regenerate element key if all elements have been served to user
session['fernet_keys']['element_key'] = '' if not g.cookies_disabled else app.default_key_set['element_key']
app.user_elements[session['uuid']] = 0
# Check if address consistently has cookies blocked, in which case start removing session
# files after creation.
# Note: This is primarily done to prevent overpopulation of session directories, since browsers that
# block cookies will still trigger Flask's session creation routine with every request.
if g.cookies_disabled and request.remote_addr not in app.no_cookie_ips:
app.no_cookie_ips.append(request.remote_addr)
elif g.cookies_disabled and request.remote_addr in app.no_cookie_ips:
session_list = list(session.keys())
for key in session_list:
session.pop(key)
return response
@app.errorhandler(404) @app.errorhandler(404)
def unknown_page(e): def unknown_page(e):
return redirect('/') return redirect(g.app_location)
@app.route('/', methods=['GET']) @app.route('/', methods=['GET'])
@auth_required
def index(): def index():
bg = '#000' if 'dark' in g.user_config and g.user_config['dark'] else '#fff' # Reset keys
return render_template('index.html', bg=bg, ua=g.user_request.modified_user_agent) session['fernet_keys'] = generate_user_keys(g.cookies_disabled)
return render_template('index.html',
languages=Config.LANGUAGES,
countries=Config.COUNTRIES,
config=g.user_config,
version_number=app.config['VERSION_NUMBER'])
@app.route('/opensearch.xml', methods=['GET']) @app.route('/opensearch.xml', methods=['GET'])
@auth_required
def opensearch(): def opensearch():
url_root = request.url_root opensearch_url = g.app_location
if url_root.endswith('/'): if opensearch_url.endswith('/'):
url_root = url_root[:-1] opensearch_url = opensearch_url[:-1]
template = render_template('opensearch.xml', main_url=url_root) template = render_template('opensearch.xml',
main_url=opensearch_url,
request_type='get' if g.user_config.get_only else 'post')
response = make_response(template) response = make_response(template)
response.headers['Content-Type'] = 'application/xml' response.headers['Content-Type'] = 'application/xml'
return response return response
@app.route('/autocomplete', methods=['GET', 'POST'])
def autocomplete():
q = g.request_params.get('q')
if not q and not request.data:
return jsonify({'?': []})
elif request.data:
q = urlparse.unquote_plus(request.data.decode('utf-8').replace('q=', ''))
return jsonify([q, g.user_request.autocomplete(q)])
@app.route('/search', methods=['GET', 'POST']) @app.route('/search', methods=['GET', 'POST'])
@auth_required
def search(): def search():
request_params = request.args if request.method == 'GET' else request.form # Reset element counter
q = request_params.get('q') app.user_elements[session['uuid']] = 0
if q is None or len(q) == 0: search_util = RoutingUtils(request, g.user_config, session, cookies_disabled=g.cookies_disabled)
query = search_util.new_search_query()
# Redirect to home if invalid/blank search
if not query:
return redirect('/') return redirect('/')
else:
# Attempt to decrypt if this is an internal link
try:
q = Fernet(app.secret_key).decrypt(q.encode()).decode()
except InvalidToken:
pass
user_agent = request.headers.get('User-Agent') # Generate response and number of external elements from the page
mobile = 'Android' in user_agent or 'iPhone' in user_agent response, elements = search_util.generate_response()
if search_util.feeling_lucky:
return redirect(response, code=303)
content_filter = Filter(mobile, g.user_config, secret_key=app.secret_key) # Keep count of external elements to fetch before element key can be regenerated
full_query = gen_query(q, request_params, content_filter.near) app.user_elements[session['uuid']] = elements
get_body = g.user_request.send(query=full_query)
results = content_filter.reskin(get_body) return render_template(
formatted_results = content_filter.clean(BeautifulSoup(results, 'html.parser')) 'display.html',
query=urlparse.unquote(query),
return render_template('display.html', query=urlparse.unquote(q), response=formatted_results) search_type=search_util.search_type,
dark_mode=g.user_config.dark,
response=response,
version_number=app.config['VERSION_NUMBER'],
search_header=render_template(
'header.html',
dark_mode=g.user_config.dark,
query=urlparse.unquote(query),
search_type=search_util.search_type,
mobile=g.user_request.mobile) if 'isch' not in search_util.search_type else '')
@app.route('/config', methods=['GET', 'POST']) @app.route('/config', methods=['GET', 'POST', 'PUT'])
@auth_required
def config(): def config():
if request.method == 'GET': if request.method == 'GET':
return json.dumps(g.user_config) return json.dumps(g.user_config.__dict__)
elif request.method == 'PUT':
if 'name' in request.args:
config_pkl = os.path.join(app.config['CONFIG_PATH'], request.args.get('name'))
session['config'] = pickle.load(open(config_pkl, 'rb')) if os.path.exists(config_pkl) else session['config']
return json.dumps(session['config'])
else:
return json.dumps({})
else: else:
config_data = request.form.to_dict() config_data = request.form.to_dict()
with open(app.config['STATIC_FOLDER'] + '/config.json', 'w') as config_file: if 'url' not in config_data or not config_data['url']:
config_file.write(json.dumps(config_data, indent=4)) config_data['url'] = g.user_config.url
config_file.close()
return redirect('/') # Save config by name to allow a user to easily load later
if 'name' in request.args:
pickle.dump(config_data, open(os.path.join(app.config['CONFIG_PATH'], request.args.get('name')), 'wb'))
# Overwrite default config if user has cookies disabled
if g.cookies_disabled:
open(app.config['DEFAULT_CONFIG'], 'w').write(json.dumps(config_data, indent=4))
session['config'] = config_data
return redirect(config_data['url'])
@app.route('/url', methods=['GET']) @app.route('/url', methods=['GET'])
@auth_required
def url(): def url():
if 'url' in request.args: if 'url' in request.args:
return redirect(request.args.get('url')) return redirect(request.args.get('url'))
@ -98,30 +214,37 @@ def url():
@app.route('/imgres') @app.route('/imgres')
@auth_required
def imgres(): def imgres():
return redirect(request.args.get('imgurl')) return redirect(request.args.get('imgurl'))
@app.route('/tmp') @app.route('/element')
def tmp(): @auth_required
cipher_suite = Fernet(app.secret_key) def element():
img_url = cipher_suite.decrypt(request.args.get('image_url').encode()).decode() cipher_suite = Fernet(session['fernet_keys']['element_key'])
file_data = g.user_request.send(base_url=img_url, return_bytes=True) src_url = cipher_suite.decrypt(request.args.get('url').encode()).decode()
tmp_mem = io.BytesIO() src_type = request.args.get('type')
tmp_mem.write(file_data)
tmp_mem.seek(0)
return send_file( try:
tmp_mem, file_data = g.user_request.send(base_url=src_url).content
as_attachment=True, app.user_elements[session['uuid']] -= 1
attachment_filename='tmp.png', tmp_mem = io.BytesIO()
mimetype='image/png' tmp_mem.write(file_data)
) tmp_mem.seek(0)
return send_file(tmp_mem, mimetype=src_type)
except exceptions.RequestException:
pass
empty_gif = base64.b64decode('R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==')
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
@app.route('/window') @app.route('/window')
@auth_required
def window(): def window():
get_body = g.user_request.send(base_url=request.args.get('location')) get_body = g.user_request.send(base_url=request.args.get('location')).text
get_body = get_body.replace('src="/', 'src="' + request.args.get('location') + '"') get_body = get_body.replace('src="/', 'src="' + request.args.get('location') + '"')
get_body = get_body.replace('href="/', 'href="' + request.args.get('location') + '"') get_body = get_body.replace('href="/', 'href="' + request.args.get('location') + '"')
@ -138,12 +261,26 @@ def window():
def run_app(): def run_app():
parser = argparse.ArgumentParser(description='Whoogle Search console runner') parser = argparse.ArgumentParser(description='Whoogle Search console runner')
parser.add_argument('--port', default=8888, metavar='<port number>', parser.add_argument('--port', default=5000, metavar='<port number>',
help='Specifies a port to run on (default 8888)') help='Specifies a port to run on (default 5000)')
parser.add_argument('--host', default='127.0.0.1', metavar='<ip address>', parser.add_argument('--host', default='127.0.0.1', metavar='<ip address>',
help='Specifies the host address to use (default 127.0.0.1)') help='Specifies the host address to use (default 127.0.0.1)')
parser.add_argument('--debug', default=False, action='store_true', parser.add_argument('--debug', default=False, action='store_true',
help='Activates debug mode for the Flask server (default False)') help='Activates debug mode for the server (default False)')
parser.add_argument('--https-only', default=False, action='store_true',
help='Enforces HTTPS redirects for all requests')
parser.add_argument('--userpass', default='', metavar='<username:password>',
help='Sets a username/password basic auth combo (default None)')
args = parser.parse_args() args = parser.parse_args()
app.run(host=args.host, port=args.port, debug=args.debug) if args.userpass:
user_pass = args.userpass.split(':')
os.environ['WHOOGLE_USER'] = user_pass[0]
os.environ['WHOOGLE_PASS'] = user_pass[1]
os.environ['HTTPS_ONLY'] = '1' if args.https_only else ''
if args.debug:
app.run(host=args.host, port=args.port, debug=args.debug)
else:
waitress.serve(app, listen="{}:{}".format(args.host, args.port))
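
The `--userpass` handling above splits on every colon and indexes the first two pieces, so a value without a colon would raise an `IndexError` and a password containing a colon would be cut short. A minimal sketch of a stricter parse is given below; `parse_userpass` is a hypothetical helper, not part of this commit.

```python
def parse_userpass(value):
    """Split 'username:password' on the first colon only."""
    # partition() keeps any further colons inside the password
    username, sep, password = value.partition(':')
    if not sep or not username or not password:
        raise ValueError('expected <username:password>')
    return username, password
```

With this helper, `parse_userpass('admin:a:b')` keeps `'a:b'` as the password, while a value such as `'admin'` fails with a clear error instead of an index error.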

55
app/static/css/header.css Normal file

@ -0,0 +1,55 @@
header {
font-family: Roboto,HelveticaNeue,Arial,sans-serif;
font-size: 14px;
line-height: 20px;
color: #3C4043;
word-wrap: break-word;
}
.logo-link, .logo-letter {
text-decoration: none !important;
letter-spacing: -1px;
text-align: center;
border-radius: 2px 0 0 0;
}
.mobile-logo {
font: 22px/36px Futura, Arial, sans-serif;
padding-left: 5px;
}
.logo-div {
letter-spacing: -1px;
text-align: center;
font: 22pt Futura, Arial, sans-serif;
padding: 10px 0 5px 0;
height: 37px;
font-smoothing: antialiased;
}
.search-div {
border-radius: 8px 8px 0 0;
box-shadow: 0 1px 6px rgba(32, 33, 36, 0.18);
margin-top: 10px;
}
.search-form {
height: 39px;
display: flex;
width: 100%;
}
.search-input {
background: none;
margin: 2px 4px 2px 8px;
display: block;
font-size: 16px;
padding: 0 0 0 8px;
flex: 1;
height: 35px;
outline: none;
border: none;
width: 100%;
-webkit-tap-highlight-color: rgba(0,0,0,0);
overflow: hidden;
}


@ -1,3 +1,7 @@
body {
font-family: Avenir, Helvetica, Arial, sans-serif;
}
.logo { .logo {
width: 80%; width: 80%;
display: block; display: block;
@ -113,3 +117,15 @@ button::-moz-focus-inner {
-webkit-box-decoration-break: clone; -webkit-box-decoration-break: clone;
box-decoration-break: clone; box-decoration-break: clone;
} }
.hidden {
display: none;
}
footer {
position: fixed;
bottom: 0%;
text-align: center;
width: 100%;
z-index: -1;
}


@ -0,0 +1,35 @@
.autocomplete {
position: relative;
display: inline-block;
width: 100%;
}
.autocomplete-items {
position: absolute;
border: 1px solid #685e79;
border-bottom: none;
border-top: none;
z-index: 99;
/*position the autocomplete items to be the same width as the container:*/
top: 100%;
left: 0;
right: 0;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
color: #fff;
background-color: #000;
border-bottom: 1px solid #242424;
}
.autocomplete-items div:hover {
background-color: #404040;
}
.autocomplete-active {
background-color: #685e79 !important;
color: #ffffff;
}

34
app/static/css/search.css Normal file

@ -0,0 +1,34 @@
.autocomplete {
position: relative;
display: inline-block;
width: 100%;
}
.autocomplete-items {
position: absolute;
border: 1px solid #d4d4d4;
border-bottom: none;
border-top: none;
z-index: 99;
/*position the autocomplete items to be the same width as the container:*/
top: 100%;
left: 0;
right: 0;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
background-color: #fff;
border-bottom: 1px solid #d4d4d4;
}
.autocomplete-items div:hover {
background-color: #e9e9e9;
}
.autocomplete-active {
background-color: #685e79 !important;
color: #ffffff;
}


@ -0,0 +1,98 @@
const handleUserInput = searchBar => {
let xhrRequest = new XMLHttpRequest();
xhrRequest.open("POST", "/autocomplete");
xhrRequest.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
xhrRequest.onload = function() {
if (xhrRequest.readyState === 4 && xhrRequest.status !== 200) {
// Do nothing if failed to fetch autocomplete results
return;
}
// Fill autocomplete with fetched results
let autocompleteResults = JSON.parse(xhrRequest.responseText);
autocomplete(searchBar, autocompleteResults[1]);
};
xhrRequest.send('q=' + encodeURIComponent(searchBar.value));
};
const autocomplete = (searchInput, autocompleteResults) => {
let currentFocus;
searchInput.addEventListener("input", function () {
let autocompleteList, autocompleteItem, i, val = this.value;
closeAllLists();
if (!val || !autocompleteResults) {
return false;
}
currentFocus = -1;
autocompleteList = document.createElement("div");
autocompleteList.setAttribute("id", this.id + "-autocomplete-list");
autocompleteList.setAttribute("class", "autocomplete-items");
this.parentNode.appendChild(autocompleteList);
for (i = 0; i < autocompleteResults.length; i++) {
if (autocompleteResults[i].substr(0, val.length).toUpperCase() === val.toUpperCase()) {
autocompleteItem = document.createElement("div");
autocompleteItem.innerHTML = "<strong>" + autocompleteResults[i].substr(0, val.length) + "</strong>";
autocompleteItem.innerHTML += autocompleteResults[i].substr(val.length);
autocompleteItem.innerHTML += "<input type=\"hidden\" value=\"" + autocompleteResults[i] + "\">";
autocompleteItem.addEventListener("click", function () {
searchInput.value = this.getElementsByTagName("input")[0].value;
closeAllLists();
document.getElementById("search-form").submit();
});
autocompleteList.appendChild(autocompleteItem);
}
}
});
searchInput.addEventListener("keydown", function (e) {
let suggestion = document.getElementById(this.id + "-autocomplete-list");
if (suggestion) suggestion = suggestion.getElementsByTagName("div");
if (e.keyCode === 40) { // down
currentFocus++;
addActive(suggestion);
} else if (e.keyCode === 38) { //up
currentFocus--;
addActive(suggestion);
} else if (e.keyCode === 13) { // enter
e.preventDefault();
if (currentFocus > -1) {
if (suggestion) suggestion[currentFocus].click();
}
}
});
const addActive = suggestion => {
if (!suggestion || !suggestion.length) return false;
removeActive(suggestion);
// Wrap around before indexing so the bounds checks below are reachable
if (currentFocus >= suggestion.length) currentFocus = 0;
if (currentFocus < 0) currentFocus = (suggestion.length - 1);
if (!suggestion[currentFocus]) return false;
suggestion[currentFocus].classList.add("autocomplete-active");
};
const removeActive = suggestion => {
for (let i = 0; i < suggestion.length; i++) {
suggestion[i].classList.remove("autocomplete-active");
}
};
const closeAllLists = el => {
let suggestions = document.getElementsByClassName("autocomplete-items");
for (let i = 0; i < suggestions.length; i++) {
if (el !== suggestions[i] && el !== searchInput) {
suggestions[i].parentNode.removeChild(suggestions[i]);
}
}
};
// Close lists and search when user selects a suggestion
document.addEventListener("click", function (e) {
closeAllLists(e.target);
});
};
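
The prefix filter used by `autocomplete` above can be sketched outside the browser. This is a minimal Python sketch, not the project's code: it assumes the `/autocomplete` response is a JSON array shaped like `[query, [suggestion, ...]]`, which is what the `autocompleteResults[1]` access suggests, and `filter_suggestions` is a hypothetical helper name.

```python
import json


def filter_suggestions(response_text: str, typed: str) -> list:
    """Return the suggestions that start with the typed text, ignoring case."""
    # Assumed payload shape: [query, [suggestion, ...]], mirroring the
    # autocompleteResults[1] access in the JavaScript above.
    payload = json.loads(response_text)
    suggestions = payload[1] if len(payload) > 1 else []
    if not typed:
        # Mirrors the early return when the search bar is empty
        return []
    prefix = typed.upper()
    return [s for s in suggestions if s[:len(typed)].upper() == prefix]


sample = json.dumps(["gre", ["green eggs and ham", "Greek yogurt", "agree"]])
print(filter_suggestions(sample, "gre"))
```

Entries that contain the typed text only in the middle, such as "agree" here, are dropped, which matches the `substr(0, val.length)` comparison in the listener.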

View File

@ -11,11 +11,22 @@ const setupSearchLayout = () => {
if (event.keyCode === 13) { if (event.keyCode === 13) {
event.preventDefault(); event.preventDefault();
searchBtn.click(); searchBtn.click();
} else {
handleUserInput(searchBar);
} }
}); });
} };
const fillConfigValues = () => {
// Establish all config value elements
const near = document.getElementById("config-near");
const noJS = document.getElementById("config-nojs");
const dark = document.getElementById("config-dark");
const safe = document.getElementById("config-safe");
const url = document.getElementById("config-url");
const newTab = document.getElementById("config-new-tab");
const getOnly = document.getElementById("config-get-only");
const fillConfigValues = (near, nojs, dark) => {
// Request existing config info // Request existing config info
let xhrGET = new XMLHttpRequest(); let xhrGET = new XMLHttpRequest();
xhrGET.open("GET", "/config"); xhrGET.open("GET", "/config");
@ -29,23 +40,18 @@ const fillConfigValues = (near, nojs, dark) => {
let configSettings = JSON.parse(xhrGET.responseText); let configSettings = JSON.parse(xhrGET.responseText);
near.value = configSettings["near"] ? configSettings["near"] : ""; near.value = configSettings["near"] ? configSettings["near"] : "";
near.addEventListener("keyup", function() { noJS.checked = !!configSettings["nojs"];
configSettings["near"] = near.value;
});
nojs.checked = !!configSettings["nojs"];
nojs.addEventListener("change", function() {
configSettings["nojs"] = nojs.checked ? 1 : 0;
});
dark.checked = !!configSettings["dark"]; dark.checked = !!configSettings["dark"];
dark.addEventListener("change", function() { safe.checked = !!configSettings["safe"];
configSettings["dark"] = dark.checked ? 1 : 0; getOnly.checked = !!configSettings["get_only"];
}); newTab.checked = !!configSettings["new_tab"];
// Addresses the issue of incorrect URL being used behind reverse proxy
url.value = configSettings["url"] ? configSettings["url"] : "";
}; };
xhrGET.send(); xhrGET.send();
} };
const setupConfigLayout = () => { const setupConfigLayout = () => {
// Setup whoogle config // Setup whoogle config
@ -62,12 +68,43 @@ const setupConfigLayout = () => {
content.classList.toggle("open"); content.classList.toggle("open");
}); });
const near = document.getElementById("config-near"); fillConfigValues();
const noJS = document.getElementById("config-nojs"); };
const dark = document.getElementById("config-dark");
fillConfigValues(near, noJS, dark); const loadConfig = event => {
} event.preventDefault();
let config = prompt("Enter name of config:");
if (!config) {
alert("Must specify a name for the config to load");
return;
}
let xhrPUT = new XMLHttpRequest();
xhrPUT.open("PUT", "/config?name=" + encodeURIComponent(config) + ".conf");
xhrPUT.onload = function() {
if (xhrPUT.readyState === 4 && xhrPUT.status !== 200) {
alert("Error loading Whoogle config");
return;
}
location.reload(true);
};
xhrPUT.send();
};
const saveConfig = event => {
event.preventDefault();
let config = prompt("Enter name for this config:");
if (!config) {
alert("Must specify a name for the config to save");
return;
}
let configForm = document.getElementById("config-form");
configForm.action = '/config?name=' + encodeURIComponent(config) + ".conf";
configForm.submit();
};
document.addEventListener("DOMContentLoaded", function() { document.addEventListener("DOMContentLoaded", function() {
setTimeout(function() { setTimeout(function() {

View File

@ -5,9 +5,19 @@
<link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Whoogle Search"> <link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Whoogle Search">
<meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="referrer" content="no-referrer"> <meta name="referrer" content="no-referrer">
<script type="text/javascript" src="/static/js/autocomplete.js"></script>
<link rel="stylesheet" href="/static/css/{{ 'search-dark' if dark_mode else 'search' }}.css">
<link rel="stylesheet" href="/static/css/header.css">
<title>{{ query }} - Whoogle Search</title> <title>{{ query }} - Whoogle Search</title>
</head> </head>
<body> <body>
{{ response|safe }} {{ search_header|safe }}
{{ response|safe }}
</body> </body>
<footer>
<p style="color: {{ '#fff' if dark_mode else '#000' }};">
Whoogle Search v{{ version_number }} ||
<a style="color: #685e79" href="https://github.com/benbusby/whoogle-search">View on GitHub</a>
</p>
</footer>
</html> </html>

63
app/templates/header.html Normal file
View File

@ -0,0 +1,63 @@
{% if mobile %}
<header>
<div class="bz1lBb">
<form class="Pg70bf" id="search-form" method="POST">
<a class="logo-link mobile-logo"
href="/"
style="display:flex; justify-content:center; align-items:center; color:#685e79; font-size:18px; ">
<span class="V6gwVd">Wh</span><span class="iWkuvd">o</span><span class="cDrQ7">o</span><span
class="V6gwVd">g</span><span class="ntlR9">l</span><span
class="iWkuvd tJ3Myc">e</span>
</a>
<div class="H0PQec" style="width: 100%;">
<div class="sbc esbc autocomplete">
<input id="search-bar" autocapitalize="none" autocomplete="off" class="noHIxc" name="q"
style="background-color: {{ '#000' if dark_mode else '#fff' }};
color: {{ '#685e79' if dark_mode else '#000' }};
border: {{ '1px solid #685e79' if dark_mode else '' }}"
spellcheck="false" type="text" value="{{ query }}">
<input name="tbm" value="{{ search_type }}" style="display: none">
<div class="sc"></div>
</div>
</div>
</form>
</div>
</header>
{% else %}
<header>
<div class="logo-div">
<a class="logo-link" href="/">
<span class="V6gwVd logo-letter">Wh</span><span class="iWkuvd logo-letter">o</span><span
class="cDrQ7 logo-letter">o</span><span class="V6gwVd logo-letter">g</span><span
class="ntlR9 logo-letter">l</span><span class="iWkuvd tJ3Myc logo-letter">e</span>
</a>
</div>
<div class="search-div">
<form id="search-form" class="search-form" method="POST">
<div class="autocomplete" style="width: 100%; flex: 1">
<div style="width: 100%; display: flex">
<input id="search-bar" autocapitalize="none" autocomplete="off" class="noHIxc" name="q"
spellcheck="false" type="text" value="{{ query }}"
style="background-color: {{ '#000' if dark_mode else '#fff' }};
color: {{ '#685e79' if dark_mode else '#000' }};
border: {{ '1px solid #685e79' if dark_mode else '' }}">
<input name="tbm" value="{{ search_type }}" style="display: none">
<div class="sc"></div>
</div>
</div>
</form>
</div>
</header>
{% endif %}
<script>
const searchBar = document.getElementById("search-bar");
searchBar.addEventListener("keyup", function (event) {
if (event.keyCode !== 13) {
handleUserInput(searchBar);
} else {
document.getElementById("search-form").submit();
}
});
</script>

View File

@ -17,18 +17,22 @@
<meta name="referrer" content="no-referrer"> <meta name="referrer" content="no-referrer">
<meta name="msapplication-TileColor" content="#ffffff"> <meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="/static/img/favicon/ms-icon-144x144.png"> <meta name="msapplication-TileImage" content="/static/img/favicon/ms-icon-144x144.png">
<script type="text/javascript" src="/static/js/autocomplete.js"></script>
<script type="text/javascript" src="/static/js/controller.js"></script> <script type="text/javascript" src="/static/js/controller.js"></script>
<link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Whoogle Search"> <link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Whoogle Search">
<meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="/static/css/{{ 'search-dark' if config.dark else 'search' }}.css">
<link rel="stylesheet" href="/static/css/main.css"> <link rel="stylesheet" href="/static/css/main.css">
<title>Whoogle Search</title> <title>Whoogle Search</title>
</head> </head>
<body id="main" style="display: none; background-color: {{ bg }}"> <body id="main" style="display: none; background-color: {{ '#000' if config.dark else '#fff' }}">
<div class="search-container"> <div class="search-container">
<img class="logo" src="/static/img/logo.png"> <img class="logo" src="/static/img/logo.png">
<form action="/search" method="post"> <form id="search-form" action="/search" method="{{ 'get' if config.get_only else 'post' }}">
<div class="search-fields"> <div class="search-fields">
<input type="text" name="q" id="search-bar"> <div class="autocomplete">
<input type="text" name="q" id="search-bar" autofocus="autofocus">
</div>
<input type="submit" id="search-submit" value="Search"> <input type="submit" id="search-submit" value="Search">
</div> </div>
</form> </form>
@ -36,10 +40,32 @@
<button id="config-collapsible" class="collapsible">Configuration</button> <button id="config-collapsible" class="collapsible">Configuration</button>
<div class="content"> <div class="content">
<div class="config-fields"> <div class="config-fields">
<form action="/config" method="post"> <form id="config-form" action="/config" method="post">
<div class="config-div"> <div class="config-div">
<!-- TODO: Add option to regenerate user agent? --> <label for="config-ctry">Country: </label>
<span class="ua-span">User Agent: {{ ua }}</span> <select name="ctry" id="config-ctry">
{% for ctry in countries %}
<option value="{{ ctry.value }}"
{% if ctry.value in config.ctry %}
selected
{% endif %}>
{{ ctry.name }}
</option>
{% endfor %}
</select>
</div>
<div class="config-div">
<label for="config-lang">Language: </label>
<select name="lang" id="config-lang">
{% for lang in languages %}
<option value="{{ lang.value }}"
{% if lang.value in config.lang %}
selected
{% endif %}>
{{ lang.name }}
</option>
{% endfor %}
</select>
</div> </div>
<div class="config-div"> <div class="config-div">
<label for="config-near">Near: </label> <label for="config-near">Near: </label>
@ -54,12 +80,35 @@
<input type="checkbox" name="dark" id="config-dark"> <input type="checkbox" name="dark" id="config-dark">
</div> </div>
<div class="config-div"> <div class="config-div">
<input type="submit" id="config-submit" value="Save"> <label for="config-safe">Safe Search: </label>
<input type="checkbox" name="safe" id="config-safe">
</div>
<div class="config-div">
<label for="config-new-tab">Open Links in New Tab: </label>
<input type="checkbox" name="new_tab" id="config-new-tab">
</div>
<div class="config-div">
<label for="config-get-only">GET Requests Only: </label>
<input type="checkbox" name="get_only" id="config-get-only">
</div>
<div class="config-div">
<label for="config-url">Root URL: </label>
<input type="text" name="url" id="config-url" value="">
</div>
<div class="config-div">
<input type="submit" id="config-load" onclick="loadConfig(event)" value="Load">&nbsp;
<input type="submit" id="config-submit" value="Apply">&nbsp;
<input type="submit" id="config-submit" onclick="saveConfig(event)" value="Save As...">
</div> </div>
</form> </form>
</div> </div>
</div> </div>
</div> </div>
<footer>
<p style="color: {{ '#fff' if config.dark else '#000' }};">
Whoogle Search v{{ version_number }} ||
<a style="color: #685e79" href="https://github.com/benbusby/whoogle-search">View on GitHub</a>
</p>
</footer>
</body> </body>
</html> </html>

View File

@ -4,10 +4,12 @@
<Description>Whoogle: A lightweight, deployable Google search proxy for desktop/mobile that removes Javascript, AMP links, and ads</Description> <Description>Whoogle: A lightweight, deployable Google search proxy for desktop/mobile that removes Javascript, AMP links, and ads</Description>
<InputEncoding>UTF-8</InputEncoding> <InputEncoding>UTF-8</InputEncoding>
<Image width="32" height="32" type="image/x-icon">/static/img/favicon/favicon-32x32.png</Image> <Image width="32" height="32" type="image/x-icon">/static/img/favicon/favicon-32x32.png</Image>
<Url type="text/html" method="post" template="{{ main_url }}/search"> <Url type="text/html" method="{{ request_type }}" template="{{ main_url }}/search">
<Param name="q" value="{searchTerms}"/>
</Url>
<Url type="application/x-suggestions+json" method="{{ request_type }}" template="{{ main_url }}/autocomplete">
<Param name="q" value="{searchTerms}"/> <Param name="q" value="{searchTerms}"/>
</Url> </Url>
<Url type="application/x-suggestions+json" template="{{ main_url }}/search"/>
<moz:SearchForm>{{ main_url }}/search</moz:SearchForm> <moz:SearchForm>{{ main_url }}/search</moz:SearchForm>
</OpenSearchDescription> </OpenSearchDescription>

0
app/utils/__init__.py Normal file
View File

29
app/utils/misc.py Normal file
View File

@ -0,0 +1,29 @@
from cryptography.fernet import Fernet
from flask import current_app as app

REQUIRED_SESSION_VALUES = ['uuid', 'config', 'fernet_keys']

BLACKLIST = [
    'ad', 'anuncio', 'annuncio', 'annonce', 'Anzeige', '广告', '廣告', 'Reklama', 'Реклама', 'Anunț', '광고',
    'annons', 'Annonse', 'Iklan', '広告', 'Augl.', 'Mainos', 'Advertentie', 'إعلان', 'Գովազդ', 'विज्ञापन', 'Reklam',
    'آگهی', 'Reklāma', 'Reklaam', 'Διαφήμιση', 'מודעה', 'Hirdetés'
]


def generate_user_keys(cookies_disabled=False) -> dict:
    if cookies_disabled:
        return app.default_key_set

    # Generate/regenerate unique key per user
    return {
        'element_key': Fernet.generate_key(),
        'text_key': Fernet.generate_key()
    }


def valid_user_session(session):
    # The session is valid only if every required value is present
    for value in REQUIRED_SESSION_VALUES:
        if value not in session:
            return False

    return True
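
The session check above can be exercised with plain dictionaries, since it relies only on membership tests. The following is a minimal sketch under that assumption; `missing_session_values` is a hypothetical helper added for illustration and is not part of the commit.

```python
REQUIRED_SESSION_VALUES = ['uuid', 'config', 'fernet_keys']


def missing_session_values(session) -> list:
    """Hypothetical helper: list the required keys absent from the session."""
    return [value for value in REQUIRED_SESSION_VALUES if value not in session]


def valid_user_session(session) -> bool:
    # Same rule as the function above: every required value must be present
    return not missing_session_values(session)


# A session missing 'uuid' is rejected; a complete one is accepted
partial = {'fernet_keys': '', 'config': {}}
complete = {'uuid': 'test', 'fernet_keys': {}, 'config': {}}
print(missing_session_values(partial), valid_user_session(complete))
```

Note that only the presence of each key is checked, so empty values such as `''` or `{}` still count as valid.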

View File

@ -0,0 +1,72 @@
from app.filter import Filter, get_first_link
from app.utils.misc import generate_user_keys
from app.request import gen_query
from bs4 import BeautifulSoup
from cryptography.fernet import Fernet, InvalidToken
from flask import g
from typing import Any, Tuple


class RoutingUtils:
    def __init__(self, request, config, session, cookies_disabled=False):
        self.request_params = request.args if request.method == 'GET' else request.form
        self.user_agent = request.headers.get('User-Agent')
        self.feeling_lucky = False
        self.config = config
        self.session = session
        self.query = ''
        self.cookies_disabled = cookies_disabled
        self.search_type = self.request_params.get('tbm') if 'tbm' in self.request_params else ''

    def __getitem__(self, name):
        return getattr(self, name)

    def __setitem__(self, name, value):
        return setattr(self, name, value)

    def __delitem__(self, name):
        return delattr(self, name)

    def __contains__(self, name):
        return hasattr(self, name)

    def new_search_query(self) -> str:
        # Generate a new element key each time a new search is performed
        self.session['fernet_keys']['element_key'] = generate_user_keys(
            cookies_disabled=self.cookies_disabled)['element_key']

        q = self.request_params.get('q')

        if q is None or len(q) == 0:
            return ''
        else:
            # Attempt to decrypt if this is an internal link
            try:
                q = Fernet(self.session['fernet_keys']['text_key']).decrypt(q.encode()).decode()
            except InvalidToken:
                pass

        # Reset text key
        self.session['fernet_keys']['text_key'] = generate_user_keys(
            cookies_disabled=self.cookies_disabled)['text_key']

        # Format depending on whether or not the query is a "feeling lucky" query
        self.feeling_lucky = q.startswith('! ')
        self.query = q[2:] if self.feeling_lucky else q
        return self.query

    def generate_response(self) -> Tuple[Any, int]:
        mobile = 'Android' in self.user_agent or 'iPhone' in self.user_agent

        content_filter = Filter(self.session['fernet_keys'], mobile=mobile, config=self.config)
        full_query = gen_query(self.query, self.request_params, self.config, content_filter.near)
        get_body = g.user_request.send(query=full_query).text

        # Produce cleanable html soup from response
        html_soup = BeautifulSoup(content_filter.reskin(get_body), 'html.parser')

        if self.feeling_lucky:
            return get_first_link(html_soup), 1
        else:
            formatted_results = content_filter.clean(html_soup)
            return formatted_results, content_filter.elements
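
The "feeling lucky" handling in `new_search_query` comes down to a prefix test on the query string. Here is a minimal standalone sketch of that step; `split_lucky_prefix` is a hypothetical name used only for illustration, and the session and Fernet handling are deliberately left out.

```python
from typing import Tuple


def split_lucky_prefix(q: str) -> Tuple[bool, str]:
    """Return (feeling_lucky, query) using the same '! ' prefix rule."""
    feeling_lucky = q.startswith('! ')
    # Drop the two-character prefix only when it is present
    return feeling_lucky, (q[2:] if feeling_lucky else q)


print(split_lucky_prefix('! test'))
print(split_lucky_prefix('!test'))
```

The space after the exclamation mark is part of the prefix, so a query such as `!test` is treated as an ordinary search.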

9
docker-compose.yml Normal file
View File

@ -0,0 +1,9 @@
version: "3"

services:
  whoogle-search:
    image: benbusby/whoogle-search
    container_name: whoogle-search
    ports:
      - 5000:5000
    restart: unless-stopped

View File

@ -4,15 +4,17 @@ cffi==1.13.2
Click==7.0 Click==7.0
cryptography==2.8 cryptography==2.8
Flask==1.1.1 Flask==1.1.1
Flask-Session==0.3.2
itsdangerous==1.1.0 itsdangerous==1.1.0
Jinja2==2.10.3 Jinja2==2.10.3
lxml==4.5.1
MarkupSafe==1.1.1 MarkupSafe==1.1.1
Phyme==0.0.9
pycparser==2.19 pycparser==2.19
pycurl==7.43.0.4
pyOpenSSL==19.1.0 pyOpenSSL==19.1.0
pytest==5.4.1 pytest==5.4.1
python-dateutil==2.8.1 python-dateutil==2.8.1
requests==2.23.0
six==1.14.0 six==1.14.0
soupsieve==1.9.5 soupsieve==1.9.5
Werkzeug==0.16.0 Werkzeug==0.16.0
waitress==1.4.3

24
run Executable file
View File

@ -0,0 +1,24 @@
#!/bin/bash
# Usage:
# ./run # Runs the full web app
# ./run test # Runs the testing suite
set -euo pipefail
SCRIPT_DIR="$(builtin cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
# Set directory to serve static content from
SUBDIR="${1:-app}"
export APP_ROOT="$SCRIPT_DIR/$SUBDIR"
export STATIC_FOLDER="$APP_ROOT/static"
mkdir -p "$STATIC_FOLDER"
# Check for regular vs test run
if [[ "$SUBDIR" == "test" ]]; then
pytest -sv
else
python3 -um app \
--host "${ADDRESS:-0.0.0.0}" \
--port "${PORT:-"${EXPOSE_PORT:-5000}"}"
fi

View File

@ -8,11 +8,10 @@ setuptools.setup(
author='Ben Busby', author='Ben Busby',
author_email='benbusby@protonmail.com', author_email='benbusby@protonmail.com',
name='whoogle-search', name='whoogle-search',
version='0.1.0', version='0.2.0',
scripts=['whoogle-search'],
include_package_data=True, include_package_data=True,
install_requires=requirements, install_requires=requirements,
description='Self-hosted, ad-free, privacy-respecting alternative to Google search', description='Self-hosted, ad-free, privacy-respecting Google metasearch engine',
long_description=long_description, long_description=long_description,
long_description_content_type='text/markdown', long_description_content_type='text/markdown',
url='https://github.com/benbusby/whoogle-search', url='https://github.com/benbusby/whoogle-search',

View File

@ -1,8 +1,13 @@
from app import app from app import app
from app.utils.misc import generate_user_keys
import pytest import pytest
@pytest.fixture @pytest.fixture
def client(): def client():
client = app.test_client() with app.test_client() as client:
yield client with client.session_transaction() as session:
session['uuid'] = 'test'
session['fernet_keys'] = generate_user_keys()
session['config'] = {}
yield client

12
test/test_autocomplete.py Normal file
View File

@ -0,0 +1,12 @@
def test_autocomplete_get(client):
    rv = client.get('/autocomplete?q=green+eggs+and')
    assert rv._status_code == 200
    assert len(rv.data) >= 1
    assert b'green eggs and ham' in rv.data


def test_autocomplete_post(client):
    rv = client.post('/autocomplete', data=dict(q='the+cat+in+the'))
    assert rv._status_code == 200
    assert len(rv.data) >= 1
    assert b'the cat in the hat' in rv.data

33
test/test_misc.py Normal file
View File

@ -0,0 +1,33 @@
from app.utils.misc import generate_user_keys, valid_user_session


def test_generate_user_keys():
    keys = generate_user_keys()
    assert 'text_key' in keys
    assert 'element_key' in keys
    assert keys['text_key'] not in keys['element_key']


def test_valid_session(client):
    assert not valid_user_session({'fernet_keys': '', 'config': {}})
    with client.session_transaction() as session:
        assert valid_user_session(session)


def test_request_key_generation(client):
    rv = client.get('/')
    cookie = rv.headers['Set-Cookie']

    rv = client.get('/search?q=test+1', headers={'Cookie': cookie})
    assert rv._status_code == 200

    with client.session_transaction() as session:
        assert valid_user_session(session)
        text_key = session['fernet_keys']['text_key']

    rv = client.get('/search?q=test+2', headers={'Cookie': cookie})
    assert rv._status_code == 200

    with client.session_transaction() as session:
        assert valid_user_session(session)
        assert text_key not in session['fernet_keys']['text_key']

View File

@ -1,13 +1,13 @@
from bs4 import BeautifulSoup from bs4 import BeautifulSoup
from cryptography.fernet import Fernet
from app.filter import Filter from app.filter import Filter
from app.utils.misc import generate_user_keys
from datetime import datetime from datetime import datetime
from dateutil.parser import * from dateutil.parser import *
def get_search_results(data): def get_search_results(data):
secret_key = Fernet.generate_key() secret_key = generate_user_keys()
soup = Filter(secret_key=secret_key).clean(BeautifulSoup(data, 'html.parser')) soup = Filter(user_keys=secret_key).clean(BeautifulSoup(data, 'html.parser'))
main_divs = soup.find('div', {'id': 'main'}) main_divs = soup.find('div', {'id': 'main'})
assert len(main_divs) > 1 assert len(main_divs) > 1
@ -62,6 +62,6 @@ def test_recent_results(client):
try: try:
date = parse(date_span) date = parse(date_span)
assert (current_date - date).days <= num_days assert (current_date - date).days <= (num_days + 5) # Date can have a little bit of wiggle room
except ParserError: except ParserError:
assert ' ago' in date_span pass

View File

@ -1,10 +1,13 @@
from app.models.config import Config
import json import json
import random import random
demo_config = { demo_config = {
'near': random.choice(['Seattle', 'New York', 'San Francisco']), 'near': random.choice(['Seattle', 'New York', 'San Francisco']),
'dark_mode': str(random.getrandbits(1)), 'dark_mode': str(random.getrandbits(1)),
'nojs': str(random.getrandbits(1)) 'nojs': str(random.getrandbits(1)),
'lang': random.choice(Config.LANGUAGES)['value'],
'ctry': random.choice(Config.COUNTRIES)['value']
} }
@ -18,6 +21,11 @@ def test_search(client):
assert rv._status_code == 200 assert rv._status_code == 200
def test_feeling_lucky(client):
rv = client.get('/search?q=!%20test')
assert rv._status_code == 303
def test_config(client): def test_config(client):
rv = client.post('/config', data=demo_config) rv = client.post('/config', data=demo_config)
assert rv._status_code == 302 assert rv._status_code == 302

View File

@ -1,27 +0,0 @@
#!/bin/bash
# Usage:
# ./whoogle-search # Runs the full web app
# ./whoogle-search test # Runs the testing suite
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd -P)"
# Set default port if unavailable
if [[ -z "${PORT}" ]]; then
PORT=5000
fi
# Set directory to serve static content from
[[ ! -z $1 ]] && SUBDIR="$1" || SUBDIR="app"
export APP_ROOT=$SCRIPT_DIR/$SUBDIR
export STATIC_FOLDER=$APP_ROOT/static
mkdir -p $STATIC_FOLDER
pkill flask
# Check for regular vs test run
if [[ $SUBDIR == "test" ]]; then
pytest -sv
else
flask run --host="0.0.0.0" --port=$PORT
fi