Merge branch 'master' of github.com:benbusby/whoogle-search

README.md
Dee-Jay Logozzo 2020-08-28 22:15:29 +10:00
commit ed0c96823a
40 changed files with 1545 additions and 302 deletions


@ -1 +1,2 @@
.git/
venv/


@ -1,9 +1,9 @@
---
name: Bug report
about: Create a bug report to help improve Whoogle
title: "[BUG] "
about: Create a bug report to help fix an issue with Whoogle
title: "[BUG] <brief bug description>"
labels: bug
assignees: benbusby
assignees: ''
---
@ -17,11 +17,18 @@ Steps to reproduce the behavior:
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Deployment Method**
- [ ] Heroku (one-click deploy)
- [ ] Docker
- [ ] `run` executable
- [ ] pip/pipx
- [ ] Other: [describe setup]
**Version of Whoogle Search**
- [ ] Latest build from [source] (i.e. GitHub, Docker Hub, pip, etc)
- [ ] Version [version number]
- [ ] Not sure
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]


@ -0,0 +1,17 @@
---
name: Feature request
about: Suggest a feature that would improve Whoogle
title: "[FEATURE] <description of feature>"
labels: enhancement
assignees: ''
---
**Describe the feature you'd like to see added**
A short description of the feature, and what it would accomplish.
**Describe which parts of the project this would modify (front end/back end/configuration/etc)**
A short description of which aspects of Whoogle Search would need modification.
**Additional context**
Add any other context or screenshots about the feature request here.

10
.github/ISSUE_TEMPLATE/question.md vendored Normal file

@ -0,0 +1,10 @@
---
name: Question
about: Ask a (simple) question about Whoogle
title: "[QUESTION] <question here>"
labels: question
assignees: ''
---
Type out your question here. Please make sure that this is a topic that isn't already covered in the README.

4
.gitignore vendored

@ -3,8 +3,12 @@ venv/
__pycache__/
*.pyc
*.pem
*.conf
config.json
test/static
flask_session/
app/static/config
app/static/custom_config
# pip stuff
build/


@ -5,4 +5,11 @@ before_install:
install:
- pip install -r requirements.txt
script:
- ./whoogle-search test
- "./run test"
deploy:
provider: pypi
user: __token__
password:
secure: WNEH2Gg84MZF/AZEberFDGPPWb4cYyHAeD/XV8En94QRSI9Aznz6qiDKOvV4eVgjMAIEW5uB3TL1LHf6KU+Hrg6SmhF7JquqP1gsBOCDNFPTljO+k2Hc53uDdSnhi/HLgY7cnFNX4lc2nNrbyxZxMHuSA2oNz/tosyNGBEeyU+JA5va7uX0albGsLiNjimO4aeau83fsI0Hn2eN6ag68pewUMXNxzpyTeO2bRcCd5d5iILs07jMVwFoC2j7W11oNqrVuSWAs8CPe4+kwvNvXWxljUGiBGppNZ7RAsKNLwi6U6kGGUTWjQm09rY/2JBpJ2WEGmIWGIrno75iiFRbjnRp3mnXPvtVTyWhh+hQIUd7bJOVKM34i9eHotYTrkMJObgW1gnRzvI9VYldtgL/iP/Isn2Pv2EeMX8V+C9/8pxv0jkQkZMnFhE6gGlzpz37zTl04B2J7xyV5znM35Lx2Pn3zxdcmdCvD3yT8I4MuBbKqq2/v4emYCfPfOmfwnS0BEVSqr9lbx4xfUZV76tcvLcj4n86DJbx77pA2Ch8FRprpOOBcf0WuqTbZp8c3mb8prFp2EupUknXu7+C2VQ6sqrnzNuDeTGm/nyjjRQ81rlvlD4tqkwsEGEDDO44FF2eUTc5D2MvoHs4cnz095FWjy63gn5IxUjhMi31b5tGRz2Q=
on:
tags: true


@ -1,9 +1,28 @@
FROM python:3
FROM python:3.8-slim
WORKDIR /usr/src/app
RUN apt-get update && apt-get install -y build-essential libcurl4-openssl-dev libssl-dev
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
ARG config_dir=/config
RUN mkdir -p $config_dir
VOLUME $config_dir
ENV CONFIG_VOLUME=$config_dir
ARG username=''
ENV WHOOGLE_USER=$username
ARG password=''
ENV WHOOGLE_PASS=$password
ARG use_https=''
ENV HTTPS_ONLY=$use_https
ARG whoogle_port=5000
ENV EXPOSE_PORT=$whoogle_port
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
RUN chmod +x ./whoogle-search
EXPOSE $EXPOSE_PORT
CMD ["./whoogle-search"]
CMD ["./run"]


@ -1,3 +1,4 @@
graft app/static
graft app/templates
include requirements.txt
global-exclude *.pyc


@ -4,6 +4,7 @@
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://travis-ci.com/benbusby/whoogle-search.svg?branch=master)](https://travis-ci.com/benbusby/whoogle-search)
[![codebeat badge](https://codebeat.co/badges/e96cada2-fb6f-4528-8285-7d72abd74e8d)](https://codebeat.co/projects/github-com-benbusby-shoogle-master)
[![Docker Pulls](https://img.shields.io/docker/pulls/benbusby/whoogle-search)](https://hub.docker.com/r/benbusby/whoogle-search)
Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
@ -24,7 +25,8 @@ Contents
- No AMP links
- No URL tracking tags (i.e. utm=%s)
- No referrer header
- POST request search queries (when possible)
- Autocomplete/search suggestions
- POST request search and suggestion queries (when possible)
- View images at full res without site redirect (currently mobile only)
- Dark mode
- Randomly generated User Agent
@ -47,7 +49,7 @@ If using Heroku Quick Deploy, **you can skip this section**.
There are a few different ways to begin using the app, depending on your preferences:
### A) [Heroku Quick Deploy](https://heroku.com/about)
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/benbusby/whoogle-search)
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/benbusby/whoogle-search/tree/heroku-app)
*Note: Requires a (free) Heroku account*
@ -57,11 +59,11 @@ Provides:
- Downtime after periods of inactivity \([solution](https://github.com/benbusby/whoogle-search#prevent-downtime-heroku-only)\)
### B) [pipx](https://github.com/pipxproject/pipx#install-pipx)
Persistent install:
Persistent install:
`pipx install git+https://github.com/benbusby/whoogle-search.git`
Sandboxed temporary instance:
Sandboxed temporary instance:
`pipx run git+https://github.com/benbusby/whoogle-search.git whoogle-search`
@ -71,14 +73,16 @@ Sandboxed temporary instance:
```bash
$ whoogle-search --help
usage: whoogle-search [-h] [--port <port number>] [--host <ip address>] [--debug]
[--https-only]
Whoogle Search console runner
optional arguments:
-h, --help show this help message and exit
--port <port number> Specifies a port to run on (default 8888)
--port <port number> Specifies a port to run on (default 5000)
--host <ip address> Specifies the host address to use (default 127.0.0.1)
--debug Activates debug mode for the Flask server (default False)
--debug Activates debug mode for the server (default False)
--https-only Enforces HTTPS redirects for all requests (default False)
```
### D) Manual
@ -90,7 +94,34 @@ cd whoogle-search
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./whoogle-search
./run
```
#### systemd Configuration
After building the virtual environment, you can add the following to `/lib/systemd/system/whoogle.service` to set up a Whoogle Search systemd service:
```
[Unit]
Description=Whoogle
[Service]
Type=simple
User=root
WorkingDirectory=<whoogle_directory>
ExecStart=<whoogle_directory>/venv/bin/python3 -um app --host 0.0.0.0 --port 5000
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=whoogle
[Install]
WantedBy=multi-user.target
```
Then,
```
sudo systemctl daemon-reload
sudo systemctl enable whoogle
sudo systemctl start whoogle
```
### E) Manual (Docker)
@ -100,14 +131,30 @@ pip install -r requirements.txt
2. Clone and deploy the docker app using a method below:
#### Docker CLI
Through Docker Hub:
```bash
docker pull benbusby/whoogle-search
docker run --publish 5000:5000 --detach --name whoogle-search benbusby/whoogle-search:latest
```
or with docker-compose:
```bash
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
docker build --tag whooglesearch:1.0 .
docker run --publish 8888:5000 --detach --name whooglesearch whooglesearch:1.0
docker-compose up
```
And kill with: `docker rm --force whooglesearch`
or by building yourself:
```bash
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
docker build --tag whoogle-search:1.0 .
docker run --publish 5000:5000 --detach --name whoogle-search whoogle-search:1.0
```
And kill with: `docker rm --force whoogle-search`
#### Using [Heroku CLI](https://devcenter.heroku.com/articles/heroku-cli)
```bash
@ -139,6 +186,8 @@ To filter by a range of time, append ":past <time>" to the end of your search, w
## Extra Steps
### Set Whoogle as your primary search engine
*Note: If you're using a reverse proxy to run Whoogle Search, make sure the "Root URL" config option on the home page is set to your URL before going through these steps.*
Update browser settings:
- Firefox (Desktop)
- Navigate to your app's url, and click the 3 dot menu in the address bar. At the bottom, there should be an option to "Add Search Engine". Once you've clicked this, open your Firefox Preferences menu, click "Search" in the left menu, and use the available dropdown to select "Whoogle" from the list.
@ -161,6 +210,13 @@ Update browser settings:
- Select the 'Other' radio button
- Name: "Whoogle"
- Search string to use: "http[s]://\<your whoogle url\>/search?q=%s"
- [Alfred](https://www.alfredapp.com/) (Mac OS X)
1. Go to `Alfred Preferences` > `Features` > `Web Search` and click `Add Custom Search`. Then configure these settings:
- Search URL: `https://<your whoogle url>/search?q={query}`
- Title: `Whoogle for '{query}'` (or whatever you want)
- Keyword: `whoogle`
2. Go to `Default Results` and click the `Setup fallback results` button. Click `+` and add Whoogle, then drag it to the top.
- Others (TODO)
### Customizing and Configuration
@ -179,12 +235,24 @@ A good solution for this is to set up a simple cronjob on any device at your hom
For instance, adding `*/20 7-23 * * * curl https://<your heroku app name>.herokuapp.com > /home/<username>/whoogle-refresh` will fetch the home page of the app every 20 minutes between 7am and midnight, allowing for downtime from midnight to 7am. And again, this wouldn't be a hard limit - you'd still have plenty of remaining hours of uptime each month in case you were searching after this window has closed.
Since the instance is destroyed and rebuilt after inactivity, config settings will be reset once the app enters downtime. If you have configuration settings active that you'd like to keep between periods of downtime (like dark mode for example), you could instead add `*/20 7-23 * * * curl -d "dark=1" -X POST https://<your heroku app name>.herokuapp.com > /home/<username>/whoogle-refresh` to keep these settings more or less permanent, and still keep the app from entering downtime when you're using it.
Since the instance is destroyed and rebuilt after inactivity, config settings will be reset once the app enters downtime. If you have configuration settings active that you'd like to keep between periods of downtime (like dark mode for example), you could instead add `*/20 7-23 * * * curl -d "dark=1" -X POST https://<your heroku app name>.herokuapp.com/config > /home/<username>/whoogle-refresh` to keep these settings more or less permanent, and still keep the app from entering downtime when you're using it.
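The same keep-alive request can also be built from Python instead of `curl`. This is only a sketch: the `build_keepalive_request` and `send_keepalive` helpers and the example app URL are hypothetical, while the `/config` endpoint and the `dark=1` form field come from the cron line above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_keepalive_request(base_url: str, settings: dict) -> Request:
    """Build a POST to <base_url>/config that re-applies the given settings."""
    body = urlencode(settings).encode()
    return Request(base_url.rstrip('/') + '/config', data=body, method='POST')


def send_keepalive(base_url: str, settings: dict) -> int:
    """Send the request and return the HTTP status code (performs network I/O)."""
    with urlopen(build_keepalive_request(base_url, settings), timeout=10) as resp:
        return resp.status


# Example (placeholder app name, not executed here):
# send_keepalive('https://<your heroku app name>.herokuapp.com', {'dark': 1})
```

Scheduling this script with the same `*/20 7-23 * * *` cron expression would keep the behavior equivalent to the `curl` variant.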
### HTTPS Enforcement
Only needed if your setup requires Flask to redirect to HTTPS on its own -- generally this is something that doesn't need to be handled by Whoogle Search.
Note: You should have your own domain name and [an https certificate](https://letsencrypt.org/getting-started/) in order for this to work properly.
- Heroku: Ensure that the `Root URL` configuration on the home page begins with `https://` and not `http://`
- Docker: Add `--build-arg use_https=1` to your `docker build` command
- Pip/Pipx: Add the `--https-only` flag to the end of the `whoogle-search` command
- Default `run` script: Modify the script locally to include the `--https-only` flag at the end of the python run command
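For reference, the URL rewrite behind an HTTPS redirect can be sketched in a few lines. This is an illustration under stated assumptions, not Whoogle's actual handler: the `https_redirect_target` function is hypothetical and covers only the scheme swap, not the Flask request hook that would call it.

```python
from urllib.parse import urlsplit


def https_redirect_target(url: str):
    """Return the https:// equivalent of a plain-http URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.scheme != 'http':
        return None
    # Only the scheme changes; host, path, query and fragment are preserved
    return parts._replace(scheme='https').geturl()
```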
Available config values are `near`, `nojs`, `dark` and `url`.
## FAQ
**What's the difference between this and [Searx](https://github.com/asciimoo/searx)?**
Whoogle is intended to only ever be deployed to private instances by individuals of any background, with as little effort as possible. Prior knowledge of/experience with the command line or deploying applications is not necessary to deploy Whoogle, which isn't the case with Searx. As a result, Whoole is missing some features of Searx in order to be as easy to deploy as possible.
Whoogle is intended to only ever be deployed to private instances by individuals of any background, with as little effort as possible. Prior knowledge of/experience with the command line or deploying applications is not necessary to deploy Whoogle, which isn't the case with Searx. As a result, Whoogle is missing some features of Searx in order to be as easy to deploy as possible.
Whoogle also only uses Google search results, not Bing/Qwant/etc, and uses the existing Google search UI to make the transition away from Google search as unnoticeable as possible.


@ -1,8 +1,27 @@
from cryptography.fernet import Fernet
from app.utils.misc import generate_user_keys
from flask import Flask
from flask_session import Session
import os
app = Flask(__name__, static_folder=os.path.dirname(os.path.abspath(__file__)) + '/static')
app.secret_key = Fernet.generate_key()
app.user_elements = {}
app.default_key_set = generate_user_keys()
app.no_cookie_ips = []
app.config['SECRET_KEY'] = os.urandom(32)
app.config['SESSION_TYPE'] = 'filesystem'
app.config['VERSION_NUMBER'] = '0.2.0'
app.config['APP_ROOT'] = os.getenv('APP_ROOT', os.path.dirname(os.path.abspath(__file__)))
app.config['STATIC_FOLDER'] = os.getenv('STATIC_FOLDER', os.path.join(app.config['APP_ROOT'], 'static'))
app.config['CONFIG_PATH'] = os.getenv('CONFIG_VOLUME', os.path.join(app.config['STATIC_FOLDER'], 'config'))
app.config['DEFAULT_CONFIG'] = os.path.join(app.config['CONFIG_PATH'], 'config.json')
app.config['SESSION_FILE_DIR'] = os.path.join(app.config['CONFIG_PATH'], 'session')
if not os.path.exists(app.config['CONFIG_PATH']):
os.makedirs(app.config['CONFIG_PATH'])
if not os.path.exists(app.config['SESSION_FILE_DIR']):
os.makedirs(app.config['SESSION_FILE_DIR'])
Session(app)
from app import routes
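The path settings above all follow one pattern: an environment variable wins if set, otherwise a default is derived from the app root. A minimal sketch of that lookup order, kept separate from Flask so it can be tested in isolation; the `resolve_config_paths` helper and its `env` parameter are hypothetical and not part of this commit:

```python
import os


def resolve_config_paths(app_root: str, env=None) -> dict:
    """Resolve config locations: environment variable first, then a default under app_root."""
    env = os.environ if env is None else env
    static_folder = env.get('STATIC_FOLDER', os.path.join(app_root, 'static'))
    config_path = env.get('CONFIG_VOLUME', os.path.join(static_folder, 'config'))
    return {
        'STATIC_FOLDER': static_folder,
        'CONFIG_PATH': config_path,
        'DEFAULT_CONFIG': os.path.join(config_path, 'config.json'),
        'SESSION_FILE_DIR': os.path.join(config_path, 'session'),
    }
```

With the Dockerfile in this commit, `CONFIG_VOLUME` is set to `/config`, so both `config.json` and the session directory would resolve under the mounted volume rather than under `app/static`.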

3
app/__main__.py Normal file

@ -0,0 +1,3 @@
from .routes import run_app
run_app()


@ -1,5 +1,7 @@
from app.request import VALID_PARAMS
from app.utils.misc import BLACKLIST
from bs4 import BeautifulSoup
from bs4.element import ResultSet
from cryptography.fernet import Fernet
import re
import urllib.parse as urlparse
@ -14,20 +16,63 @@ 
'''
def get_first_link(soup):
# Replace hrefs with only the intended destination (no "utm" type tags)
for a in soup.find_all('a', href=True):
# Return the first search result URL
if 'url?q=' in a['href']:
return filter_link_args(a['href'])
def filter_link_args(query_link):
parsed_link = urlparse.urlparse(query_link)
link_args = parse_qs(parsed_link.query)
safe_args = {}
if len(link_args) == 0 and len(parsed_link) > 0:
return query_link
for arg in link_args.keys():
if arg in SKIP_ARGS:
continue
safe_args[arg] = link_args[arg]
# Remove original link query and replace with filtered args
query_link = query_link.replace(parsed_link.query, '')
if len(safe_args) > 0:
query_link = query_link + urlparse.urlencode(safe_args, doseq=True)
else:
query_link = query_link.replace('?', '')
return query_link
def has_ad_content(element: str):
return element.upper() in (value.upper() for value in BLACKLIST) or 'ⓘ' in element
class Filter:
def __init__(self, mobile=False, config=None, secret_key=''):
def __init__(self, user_keys: dict, mobile=False, config=None):
if config is None:
config = {}
self.near = config['near'] if 'near' in config else None
self.near = config['near'] if 'near' in config else ''
self.dark = config['dark'] if 'dark' in config else False
self.nojs = config['nojs'] if 'nojs' in config else False
self.new_tab = config['new_tab'] if 'new_tab' in config else False
self.mobile = mobile
self.secret_key = secret_key
self.user_keys = user_keys
self.main_divs = ResultSet('')
self._elements = 0
def __getitem__(self, name):
return getattr(self, name)
@property
def elements(self):
return self._elements
def reskin(self, page):
# Aesthetic only re-skinning
page = page.replace('>G<', '>Wh<')
@ -38,11 +83,31 @@ class Filter:
return page
def encrypt_path(self, msg, is_element=False):
# Encrypts path to avoid plaintext results in logs
if is_element:
# Element paths are tracked differently in order for the element key to be regenerated
# once all elements have been loaded
enc_path = Fernet(self.user_keys['element_key']).encrypt(msg.encode()).decode()
self._elements += 1
return enc_path
return Fernet(self.user_keys['text_key']).encrypt(msg.encode()).decode()
def clean(self, soup):
self.remove_ads(soup)
self.update_image_paths(soup)
self.main_divs = soup.find('div', {'id': 'main'})
self.remove_ads()
self.fix_question_section()
self.update_styling(soup)
self.update_links(soup)
for img in [_ for _ in soup.find_all('img') if 'src' in _.attrs]:
self.update_element_src(img, 'image/png')
for audio in [_ for _ in soup.find_all('audio') if 'src' in _.attrs]:
self.update_element_src(audio, 'audio/mpeg')
for link in soup.find_all('a', href=True):
self.update_link(link)
input_form = soup.find('form')
if input_form is not None:
@ -52,43 +117,54 @@ class Filter:
for script in soup('script'):
script.decompose()
footer = soup.find('div', id='sfooter')
if footer is not None:
footer.decompose()
# Update default footer and header
footer = soup.find('footer')
if footer:
# Remove divs that have multiple links beyond just page navigation
[_.decompose() for _ in footer.find_all('div', recursive=False) if len(_.find_all('a', href=True)) > 2]
header = soup.find('header')
if header:
header.decompose()
return soup
def remove_ads(self, soup):
main_divs = soup.find('div', {'id': 'main'})
if main_divs is None:
def remove_ads(self):
if not self.main_divs:
return
result_divs = main_divs.find_all('div', recursive=False)
# Only ads/sponsored content use classes in the list of result divs
ad_divs = [ad_div for ad_div in result_divs if 'class' in ad_div.attrs]
for div in ad_divs:
div.decompose()
for div in [_ for _ in self.main_divs.find_all('div', recursive=True)]:
has_ad = len([_ for _ in div.find_all('span', recursive=True) if has_ad_content(_.text)])
_ = div.decompose() if has_ad else None
def update_image_paths(self, soup):
for img in [_ for _ in soup.find_all('img') if 'src' in _.attrs]:
img_src = img['src']
if img_src.startswith('//'):
img_src = 'https:' + img_src
elif img_src.startswith(GOOG_IMG):
# Special rebranding for image search results
if img_src.startswith(LOGO_URL):
img['src'] = '/static/img/logo.png'
img['height'] = 40
else:
img['src'] = BLANK_B64
def fix_question_section(self):
if not self.main_divs:
return
continue
question_divs = [_ for _ in self.main_divs.find_all('div', recursive=False) if len(_.find_all('h2')) > 0]
for question_div in question_divs:
questions = [_ for _ in question_div.find_all('div', recursive=True) if _.text.endswith('?')]
for question in questions:
question['style'] = 'padding: 10px; font-style: italic;'
enc_src = Fernet(self.secret_key).encrypt(img_src.encode())
img['src'] = '/tmp?image_url=' + enc_src.decode()
# TODO: Non-mobile image results link to website instead of image
# if not self.mobile:
# img.append(BeautifulSoup(FULL_RES_IMG.format(img_src), 'html.parser'))
def update_element_src(self, element, mime):
element_src = element['src']
if element_src.startswith('//'):
element_src = 'https:' + element_src
elif element_src.startswith(LOGO_URL):
# Re-brand with Whoogle logo
element['src'] = '/static/img/logo.png'
element['style'] = 'height:40px;width:162px'
return
elif element_src.startswith(GOOG_IMG):
element['src'] = BLANK_B64
return
element['src'] = '/element?url=' + self.encrypt_path(element_src, is_element=True) + \
'&type=' + urlparse.quote(mime)
# TODO: Non-mobile image results link to website instead of image
# if not self.mobile:
# img.append(BeautifulSoup(FULL_RES_IMG.format(element_src), 'html.parser'))
def update_styling(self, soup):
# Remove unnecessary button(s)
@ -114,65 +190,52 @@ class Filter:
# Set up dark mode if active
if self.dark:
soup.find('html')['style'] = 'scrollbar-color: #333 #111;'
soup.find('html')['style'] = 'scrollbar-color: #333 #111;color:#fff !important;background:#000 !important'
for input_element in soup.findAll('input'):
input_element['style'] = 'color:#fff;'
input_element['style'] = 'color:#fff;background:#000;'
def update_links(self, soup):
# Replace hrefs with only the intended destination (no "utm" type tags)
for a in soup.find_all('a', href=True):
href = a['href'].replace('https://www.google.com', '')
if '/advanced_search' in href:
a.decompose()
continue
for span_element in soup.findAll('span'):
span_element['style'] = 'color: white;'
result_link = urlparse.urlparse(href)
query_link = parse_qs(result_link.query)['q'][0] if '?q=' in href else ''
for href_element in soup.findAll('a'):
href_element['style'] = 'color: white' if href_element.get('href', '').startswith('/search') else ''
if '/search?q=' in href:
enc_result = Fernet(self.secret_key).encrypt(query_link.encode())
new_search = '/search?q=' + enc_result.decode()
def update_link(self, link):
# Replace href with only the intended destination (no "utm" type tags)
href = link['href'].replace('https://www.google.com', '')
if '/advanced_search' in href:
link.decompose()
return
elif self.new_tab:
link['target'] = '_blank'
query_params = parse_qs(urlparse.urlparse(href).query)
for param in VALID_PARAMS:
param_val = query_params[param][0] if param in query_params else ''
new_search += '&' + param + '=' + param_val
a['href'] = new_search
elif 'url?q=' in href:
# Strip unneeded arguments
parsed_link = urlparse.urlparse(query_link)
link_args = parse_qs(parsed_link.query)
safe_args = {}
result_link = urlparse.urlparse(href)
query_link = parse_qs(result_link.query)['q'][0] if '?q=' in href else ''
if len(link_args) == 0 and len(parsed_link) > 0:
a['href'] = query_link
continue
if query_link.startswith('/'):
link['href'] = 'https://google.com' + query_link
elif '/search?q=' in href:
new_search = '/search?q=' + self.encrypt_path(query_link)
for arg in link_args.keys():
if arg in SKIP_ARGS:
continue
query_params = parse_qs(urlparse.urlparse(href).query)
for param in VALID_PARAMS:
param_val = query_params[param][0] if param in query_params else ''
new_search += '&' + param + '=' + param_val
link['href'] = new_search
elif 'url?q=' in href:
# Strip unneeded arguments
link['href'] = filter_link_args(query_link)
safe_args[arg] = link_args[arg]
# Remove original link query and replace with filtered args
query_link = query_link.replace(parsed_link.query, '')
if len(safe_args) > 0:
query_link = query_link + urlparse.urlencode(safe_args, doseq=True)
else:
query_link = query_link.replace('?', '')
a['href'] = query_link
# Add no-js option
if self.nojs:
gen_nojs(soup, query_link, a)
else:
a['href'] = href
# Add no-js option
if self.nojs:
gen_nojs(link)
else:
link['href'] = href
def gen_nojs(soup, link, sibling):
nojs_link = soup.new_tag('a')
nojs_link['href'] = '/window?location=' + link
def gen_nojs(sibling):
nojs_link = BeautifulSoup().new_tag('a')
nojs_link['href'] = '/window?location=' + sibling['href']
nojs_link['style'] = 'display:block;width:100%;'
nojs_link.string = 'NoJS Link: ' + nojs_link['href']
sibling.append(BeautifulSoup('<br><hr><br>', 'html.parser'))
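The `filter_link_args` helper extracted in this file strips unwanted query arguments by string replacement on the original link. One possible alternative, shown here only as a sketch, rebuilds the URL from its parsed parts. The `strip_tracking_args` name is hypothetical, and the `SKIP_ARGS` value below is an assumption, since its definition falls outside the lines shown in this diff.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed value for illustration; the real SKIP_ARGS is defined outside the lines shown
SKIP_ARGS = ['ref_src', 'utm']


def strip_tracking_args(link: str, skip_args=SKIP_ARGS) -> str:
    """Return link with every query argument named in skip_args removed."""
    parts = urlsplit(link)
    kept = {k: v for k, v in parse_qs(parts.query).items() if k not in skip_args}
    # An empty query string makes geturl() drop the trailing '?'
    return parts._replace(query=urlencode(kept, doseq=True)).geturl()
```

Rebuilding from parsed components avoids accidental replacements when the query text also appears elsewhere in the URL.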

0
app/models/__init__.py Normal file

323
app/models/config.py Normal file

@ -0,0 +1,323 @@
class Config:
# Derived from here:
# https://sites.google.com/site/tomihasa/google-language-codes#searchlanguage
LANGUAGES = [
{'name': 'English', 'value': 'lang_en'},
{'name': 'Afrikaans', 'value': 'lang_af'},
{'name': 'Arabic', 'value': 'lang_ar'},
{'name': 'Armenian', 'value': 'lang_hy'},
{'name': 'Belarusian', 'value': 'lang_be'},
{'name': 'Bulgarian', 'value': 'lang_bg'},
{'name': 'Catalan', 'value': 'lang_ca'},
{'name': 'Chinese (Simplified)', 'value': 'lang_zh-CN'},
{'name': 'Chinese (Traditional)', 'value': 'lang_zh-TW'},
{'name': 'Croatian', 'value': 'lang_hr'},
{'name': 'Czech', 'value': 'lang_cs'},
{'name': 'Danish', 'value': 'lang_da'},
{'name': 'Dutch', 'value': 'lang_nl'},
{'name': 'Esperanto', 'value': 'lang_eo'},
{'name': 'Estonian', 'value': 'lang_et'},
{'name': 'Filipino', 'value': 'lang_tl'},
{'name': 'Finnish', 'value': 'lang_fi'},
{'name': 'French', 'value': 'lang_fr'},
{'name': 'German', 'value': 'lang_de'},
{'name': 'Greek', 'value': 'lang_el'},
{'name': 'Hebrew', 'value': 'lang_iw'},
{'name': 'Hindi', 'value': 'lang_hi'},
{'name': 'Hungarian', 'value': 'lang_hu'},
{'name': 'Icelandic', 'value': 'lang_is'},
{'name': 'Indonesian', 'value': 'lang_id'},
{'name': 'Italian', 'value': 'lang_it'},
{'name': 'Japanese', 'value': 'lang_ja'},
{'name': 'Korean', 'value': 'lang_ko'},
{'name': 'Latvian', 'value': 'lang_lv'},
{'name': 'Lithuanian', 'value': 'lang_lt'},
{'name': 'Norwegian', 'value': 'lang_no'},
{'name': 'Persian', 'value': 'lang_fa'},
{'name': 'Polish', 'value': 'lang_pl'},
{'name': 'Portuguese', 'value': 'lang_pt'},
{'name': 'Romanian', 'value': 'lang_ro'},
{'name': 'Russian', 'value': 'lang_ru'},
{'name': 'Serbian', 'value': 'lang_sr'},
{'name': 'Slovak', 'value': 'lang_sk'},
{'name': 'Slovenian', 'value': 'lang_sl'},
{'name': 'Spanish', 'value': 'lang_es'},
{'name': 'Swahili', 'value': 'lang_sw'},
{'name': 'Swedish', 'value': 'lang_sv'},
{'name': 'Thai', 'value': 'lang_th'},
{'name': 'Turkish', 'value': 'lang_tr'},
{'name': 'Ukrainian', 'value': 'lang_uk'},
{'name': 'Vietnamese', 'value': 'lang_vi'},
]
COUNTRIES = [
{'name': 'Default (use server location)', 'value': ''},
{'name': 'Afghanistan', 'value': 'countryAF'},
{'name': 'Albania', 'value': 'countryAL'},
{'name': 'Algeria', 'value': 'countryDZ'},
{'name': 'American Samoa', 'value': 'countryAS'},
{'name': 'Andorra', 'value': 'countryAD'},
{'name': 'Angola', 'value': 'countryAO'},
{'name': 'Anguilla', 'value': 'countryAI'},
{'name': 'Antarctica', 'value': 'countryAQ'},
{'name': 'Antigua and Barbuda', 'value': 'countryAG'},
{'name': 'Argentina', 'value': 'countryAR'},
{'name': 'Armenia', 'value': 'countryAM'},
{'name': 'Aruba', 'value': 'countryAW'},
{'name': 'Australia', 'value': 'countryAU'},
{'name': 'Austria', 'value': 'countryAT'},
{'name': 'Azerbaijan', 'value': 'countryAZ'},
{'name': 'Bahamas', 'value': 'countryBS'},
{'name': 'Bahrain', 'value': 'countryBH'},
{'name': 'Bangladesh', 'value': 'countryBD'},
{'name': 'Barbados', 'value': 'countryBB'},
{'name': 'Belarus', 'value': 'countryBY'},
{'name': 'Belgium', 'value': 'countryBE'},
{'name': 'Belize', 'value': 'countryBZ'},
{'name': 'Benin', 'value': 'countryBJ'},
{'name': 'Bermuda', 'value': 'countryBM'},
{'name': 'Bhutan', 'value': 'countryBT'},
{'name': 'Bolivia', 'value': 'countryBO'},
{'name': 'Bosnia and Herzegovina', 'value': 'countryBA'},
{'name': 'Botswana', 'value': 'countryBW'},
{'name': 'Bouvet Island', 'value': 'countryBV'},
{'name': 'Brazil', 'value': 'countryBR'},
{'name': 'British Indian Ocean Territory', 'value': 'countryIO'},
{'name': 'Brunei Darussalam', 'value': 'countryBN'},
{'name': 'Bulgaria', 'value': 'countryBG'},
{'name': 'Burkina Faso', 'value': 'countryBF'},
{'name': 'Burundi', 'value': 'countryBI'},
{'name': 'Cambodia', 'value': 'countryKH'},
{'name': 'Cameroon', 'value': 'countryCM'},
{'name': 'Canada', 'value': 'countryCA'},
{'name': 'Cape Verde', 'value': 'countryCV'},
{'name': 'Cayman Islands', 'value': 'countryKY'},
{'name': 'Central African Republic', 'value': 'countryCF'},
{'name': 'Chad', 'value': 'countryTD'},
{'name': 'Chile', 'value': 'countryCL'},
{'name': 'China', 'value': 'countryCN'},
{'name': 'Christmas Island', 'value': 'countryCX'},
{'name': 'Cocos (Keeling) Islands', 'value': 'countryCC'},
{'name': 'Colombia', 'value': 'countryCO'},
{'name': 'Comoros', 'value': 'countryKM'},
{'name': 'Congo', 'value': 'countryCG'},
{'name': 'Congo, Democratic Republic of the', 'value': 'countryCD'},
{'name': 'Cook Islands', 'value': 'countryCK'},
{'name': 'Costa Rica', 'value': 'countryCR'},
{'name': 'Cote D\'ivoire', 'value': 'countryCI'},
{'name': 'Croatia (Hrvatska)', 'value': 'countryHR'},
{'name': 'Cuba', 'value': 'countryCU'},
{'name': 'Cyprus', 'value': 'countryCY'},
{'name': 'Czech Republic', 'value': 'countryCZ'},
{'name': 'Denmark', 'value': 'countryDK'},
{'name': 'Djibouti', 'value': 'countryDJ'},
{'name': 'Dominica', 'value': 'countryDM'},
{'name': 'Dominican Republic', 'value': 'countryDO'},
{'name': 'East Timor', 'value': 'countryTP'},
{'name': 'Ecuador', 'value': 'countryEC'},
{'name': 'Egypt', 'value': 'countryEG'},
{'name': 'El Salvador', 'value': 'countrySV'},
{'name': 'Equatorial Guinea', 'value': 'countryGQ'},
{'name': 'Eritrea', 'value': 'countryER'},
{'name': 'Estonia', 'value': 'countryEE'},
{'name': 'Ethiopia', 'value': 'countryET'},
{'name': 'European Union', 'value': 'countryEU'},
{'name': 'Falkland Islands (Malvinas)', 'value': 'countryFK'},
{'name': 'Faroe Islands', 'value': 'countryFO'},
{'name': 'Fiji', 'value': 'countryFJ'},
{'name': 'Finland', 'value': 'countryFI'},
{'name': 'France', 'value': 'countryFR'},
{'name': 'France, Metropolitan', 'value': 'countryFX'},
{'name': 'French Guiana', 'value': 'countryGF'},
{'name': 'French Polynesia', 'value': 'countryPF'},
{'name': 'French Southern Territories', 'value': 'countryTF'},
{'name': 'Gabon', 'value': 'countryGA'},
{'name': 'Gambia', 'value': 'countryGM'},
{'name': 'Georgia', 'value': 'countryGE'},
{'name': 'Germany', 'value': 'countryDE'},
{'name': 'Ghana', 'value': 'countryGH'},
{'name': 'Gibraltar', 'value': 'countryGI'},
{'name': 'Greece', 'value': 'countryGR'},
{'name': 'Greenland', 'value': 'countryGL'},
{'name': 'Grenada', 'value': 'countryGD'},
{'name': 'Guadeloupe', 'value': 'countryGP'},
{'name': 'Guam', 'value': 'countryGU'},
{'name': 'Guatemala', 'value': 'countryGT'},
{'name': 'Guinea', 'value': 'countryGN'},
{'name': 'Guinea-Bissau', 'value': 'countryGW'},
{'name': 'Guyana', 'value': 'countryGY'},
{'name': 'Haiti', 'value': 'countryHT'},
{'name': 'Heard Island and Mcdonald Islands', 'value': 'countryHM'},
{'name': 'Holy See (Vatican City State)', 'value': 'countryVA'},
{'name': 'Honduras', 'value': 'countryHN'},
{'name': 'Hong Kong', 'value': 'countryHK'},
{'name': 'Hungary', 'value': 'countryHU'},
{'name': 'Iceland', 'value': 'countryIS'},
{'name': 'India', 'value': 'countryIN'},
{'name': 'Indonesia', 'value': 'countryID'},
{'name': 'Iran, Islamic Republic of', 'value': 'countryIR'},
{'name': 'Iraq', 'value': 'countryIQ'},
{'name': 'Ireland', 'value': 'countryIE'},
{'name': 'Israel', 'value': 'countryIL'},
{'name': 'Italy', 'value': 'countryIT'},
{'name': 'Jamaica', 'value': 'countryJM'},
{'name': 'Japan', 'value': 'countryJP'},
{'name': 'Jordan', 'value': 'countryJO'},
{'name': 'Kazakhstan', 'value': 'countryKZ'},
{'name': 'Kenya', 'value': 'countryKE'},
{'name': 'Kiribati', 'value': 'countryKI'},
{'name': 'Korea, Democratic People\'s Republic of', 'value': 'countryKP'},
{'name': 'Korea, Republic of', 'value': 'countryKR'},
{'name': 'Kuwait', 'value': 'countryKW'},
{'name': 'Kyrgyzstan', 'value': 'countryKG'},
{'name': 'Lao People\'s Democratic Republic', 'value': 'countryLA'},
{'name': 'Latvia', 'value': 'countryLV'},
{'name': 'Lebanon', 'value': 'countryLB'},
{'name': 'Lesotho', 'value': 'countryLS'},
{'name': 'Liberia', 'value': 'countryLR'},
{'name': 'Libyan Arab Jamahiriya', 'value': 'countryLY'},
{'name': 'Liechtenstein', 'value': 'countryLI'},
{'name': 'Lithuania', 'value': 'countryLT'},
{'name': 'Luxembourg', 'value': 'countryLU'},
{'name': 'Macao', 'value': 'countryMO'},
{'name': 'Macedonia, the Former Yugoslav Republic of', 'value': 'countryMK'},
{'name': 'Madagascar', 'value': 'countryMG'},
{'name': 'Malawi', 'value': 'countryMW'},
{'name': 'Malaysia', 'value': 'countryMY'},
{'name': 'Maldives', 'value': 'countryMV'},
{'name': 'Mali', 'value': 'countryML'},
{'name': 'Malta', 'value': 'countryMT'},
{'name': 'Marshall Islands', 'value': 'countryMH'},
{'name': 'Martinique', 'value': 'countryMQ'},
{'name': 'Mauritania', 'value': 'countryMR'},
{'name': 'Mauritius', 'value': 'countryMU'},
{'name': 'Mayotte', 'value': 'countryYT'},
{'name': 'Mexico', 'value': 'countryMX'},
{'name': 'Micronesia, Federated States of', 'value': 'countryFM'},
{'name': 'Moldova, Republic of', 'value': 'countryMD'},
{'name': 'Monaco', 'value': 'countryMC'},
{'name': 'Mongolia', 'value': 'countryMN'},
{'name': 'Montserrat', 'value': 'countryMS'},
{'name': 'Morocco', 'value': 'countryMA'},
{'name': 'Mozambique', 'value': 'countryMZ'},
{'name': 'Myanmar', 'value': 'countryMM'},
{'name': 'Namibia', 'value': 'countryNA'},
{'name': 'Nauru', 'value': 'countryNR'},
{'name': 'Nepal', 'value': 'countryNP'},
{'name': 'Netherlands', 'value': 'countryNL'},
{'name': 'Netherlands Antilles', 'value': 'countryAN'},
{'name': 'New Caledonia', 'value': 'countryNC'},
{'name': 'New Zealand', 'value': 'countryNZ'},
{'name': 'Nicaragua', 'value': 'countryNI'},
{'name': 'Niger', 'value': 'countryNE'},
{'name': 'Nigeria', 'value': 'countryNG'},
{'name': 'Niue', 'value': 'countryNU'},
{'name': 'Norfolk Island', 'value': 'countryNF'},
{'name': 'Northern Mariana Islands', 'value': 'countryMP'},
{'name': 'Norway', 'value': 'countryNO'},
{'name': 'Oman', 'value': 'countryOM'},
{'name': 'Pakistan', 'value': 'countryPK'},
{'name': 'Palau', 'value': 'countryPW'},
{'name': 'Palestinian Territory', 'value': 'countryPS'},
{'name': 'Panama', 'value': 'countryPA'},
{'name': 'Papua New Guinea', 'value': 'countryPG'},
{'name': 'Paraguay', 'value': 'countryPY'},
{'name': 'Peru', 'value': 'countryPE'},
{'name': 'Philippines', 'value': 'countryPH'},
{'name': 'Pitcairn', 'value': 'countryPN'},
{'name': 'Poland', 'value': 'countryPL'},
{'name': 'Portugal', 'value': 'countryPT'},
{'name': 'Puerto Rico', 'value': 'countryPR'},
{'name': 'Qatar', 'value': 'countryQA'},
{'name': 'Reunion', 'value': 'countryRE'},
{'name': 'Romania', 'value': 'countryRO'},
{'name': 'Russian Federation', 'value': 'countryRU'},
{'name': 'Rwanda', 'value': 'countryRW'},
{'name': 'Saint Helena', 'value': 'countrySH'},
{'name': 'Saint Kitts and Nevis', 'value': 'countryKN'},
{'name': 'Saint Lucia', 'value': 'countryLC'},
{'name': 'Saint Pierre and Miquelon', 'value': 'countryPM'},
{'name': 'Saint Vincent and the Grenadines', 'value': 'countryVC'},
{'name': 'Samoa', 'value': 'countryWS'},
{'name': 'San Marino', 'value': 'countrySM'},
{'name': 'Sao Tome and Principe', 'value': 'countryST'},
{'name': 'Saudi Arabia', 'value': 'countrySA'},
{'name': 'Senegal', 'value': 'countrySN'},
{'name': 'Serbia and Montenegro', 'value': 'countryCS'},
{'name': 'Seychelles', 'value': 'countrySC'},
{'name': 'Sierra Leone', 'value': 'countrySL'},
{'name': 'Singapore', 'value': 'countrySG'},
{'name': 'Slovakia', 'value': 'countrySK'},
{'name': 'Slovenia', 'value': 'countrySI'},
{'name': 'Solomon Islands', 'value': 'countrySB'},
{'name': 'Somalia', 'value': 'countrySO'},
{'name': 'South Africa', 'value': 'countryZA'},
{'name': 'South Georgia and the South Sandwich Islands', 'value': 'countryGS'},
{'name': 'Spain', 'value': 'countryES'},
{'name': 'Sri Lanka', 'value': 'countryLK'},
{'name': 'Sudan', 'value': 'countrySD'},
{'name': 'Suriname', 'value': 'countrySR'},
{'name': 'Svalbard and Jan Mayen', 'value': 'countrySJ'},
{'name': 'Swaziland', 'value': 'countrySZ'},
{'name': 'Sweden', 'value': 'countrySE'},
{'name': 'Switzerland', 'value': 'countryCH'},
{'name': 'Syrian Arab Republic', 'value': 'countrySY'},
{'name': 'Taiwan, Province of China', 'value': 'countryTW'},
{'name': 'Tajikistan', 'value': 'countryTJ'},
{'name': 'Tanzania, United Republic of', 'value': 'countryTZ'},
{'name': 'Thailand', 'value': 'countryTH'},
{'name': 'Togo', 'value': 'countryTG'},
{'name': 'Tokelau', 'value': 'countryTK'},
{'name': 'Tonga', 'value': 'countryTO'},
{'name': 'Trinidad and Tobago', 'value': 'countryTT'},
{'name': 'Tunisia', 'value': 'countryTN'},
{'name': 'Turkey', 'value': 'countryTR'},
{'name': 'Turkmenistan', 'value': 'countryTM'},
{'name': 'Turks and Caicos Islands', 'value': 'countryTC'},
{'name': 'Tuvalu', 'value': 'countryTV'},
{'name': 'Uganda', 'value': 'countryUG'},
{'name': 'Ukraine', 'value': 'countryUA'},
{'name': 'United Arab Emirates', 'value': 'countryAE'},
{'name': 'United Kingdom', 'value': 'countryUK'},
{'name': 'United States', 'value': 'countryUS'},
{'name': 'United States Minor Outlying Islands', 'value': 'countryUM'},
{'name': 'Uruguay', 'value': 'countryUY'},
{'name': 'Uzbekistan', 'value': 'countryUZ'},
{'name': 'Vanuatu', 'value': 'countryVU'},
{'name': 'Venezuela', 'value': 'countryVE'},
{'name': 'Vietnam', 'value': 'countryVN'},
{'name': 'Virgin Islands, British', 'value': 'countryVG'},
{'name': 'Virgin Islands, U.S.', 'value': 'countryVI'},
{'name': 'Wallis and Futuna', 'value': 'countryWF'},
{'name': 'Western Sahara', 'value': 'countryEH'},
{'name': 'Yemen', 'value': 'countryYE'},
{'name': 'Yugoslavia', 'value': 'countryYU'},
{'name': 'Zambia', 'value': 'countryZM'},
{'name': 'Zimbabwe', 'value': 'countryZW'}
]
def __init__(self, **kwargs):
self.url = ''
self.lang = 'lang_en'
self.ctry = ''
self.safe = False
self.dark = False
self.nojs = False
self.near = ''
self.new_tab = False
self.get_only = False
for key, value in kwargs.items():
setattr(self, key, value)
def __getitem__(self, name):
return getattr(self, name)
def __setitem__(self, name, value):
return setattr(self, name, value)
def __delitem__(self, name):
return delattr(self, name)
def __contains__(self, name):
return hasattr(self, name)
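
The `__getitem__`, `__setitem__`, `__delitem__` and `__contains__` methods above let a config object be read and written with dict-style syntax as well as attribute access. A minimal sketch of that behaviour, using a hypothetical trimmed-down `MiniConfig` class (not part of the commit) that keeps only three of the defaults:

```python
class MiniConfig:
    """Hypothetical stand-in for the Config model; only three defaults kept."""

    def __init__(self, **kwargs):
        self.lang = 'lang_en'
        self.ctry = ''
        self.safe = False
        # Any keyword argument overrides (or adds) an attribute
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __getitem__(self, name):
        return getattr(self, name)

    def __setitem__(self, name, value):
        return setattr(self, name, value)

    def __delitem__(self, name):
        return delattr(self, name)

    def __contains__(self, name):
        return hasattr(self, name)


config = MiniConfig(ctry='countryNZ', safe=True)
config['lang'] = 'lang_fr'          # same effect as config.lang = 'lang_fr'
has_country = 'ctry' in config      # membership test goes through hasattr
```

Note that an unknown key raises `AttributeError` rather than `KeyError`, because the lookup is delegated to `getattr`.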


@ -1,38 +1,49 @@
from app import rhyme
from io import BytesIO
import pycurl
from lxml import etree
import random
import requests
from requests import Response
import urllib.parse as urlparse
# Base search url
# Core Google search URLs
SEARCH_URL = 'https://www.google.com/search?gbv=1&q='
AUTOCOMPLETE_URL = 'https://suggestqueries.google.com/complete/search?client=toolbar&'
MOBILE_UA = '{}/5.0 (Android 0; Mobile; rv:54.0) Gecko/54.0 {}/59.0'
DESKTOP_UA = '{}/5.0 (X11; {} x86_64; rv:75.0) Gecko/20100101 {}/75.0'
# Valid query params
VALID_PARAMS = ['tbs', 'tbm', 'start', 'near']
VALID_PARAMS = ['tbs', 'tbm', 'start', 'near', 'source']
def gen_user_agent(normal_ua):
is_mobile = 'Android' in normal_ua or 'iPhone' in normal_ua
mozilla = rhyme.get_rhyme('Mo') + rhyme.get_rhyme('zilla')
firefox = rhyme.get_rhyme('Fire') + rhyme.get_rhyme('fox')
linux = rhyme.get_rhyme('Lin') + 'ux'
def gen_user_agent(is_mobile):
mozilla = random.choice(['Moo', 'Woah', 'Bro', 'Slow']) + 'zilla'
firefox = random.choice(['Choir', 'Squier', 'Higher', 'Wire']) + 'fox'
linux = random.choice(['Win', 'Sin', 'Gin', 'Fin', 'Kin']) + 'ux'
if is_mobile:
return MOBILE_UA.format(mozilla, firefox)
else:
return DESKTOP_UA.format(mozilla, linux, firefox)
return DESKTOP_UA.format(mozilla, linux, firefox)
def gen_query(query, args, near_city=None):
def gen_query(query, args, config, near_city=None):
param_dict = {key: '' for key in VALID_PARAMS}
# Use :past(hour/day/week/month/year) if available
# example search "new restaurants :past month"
if ':past' in query:
sub_lang = ''
if ':past' in query and 'tbs' not in args:
time_range = str.strip(query.split(':past', 1)[-1])
param_dict['tbs'] = '&tbs=qdr:' + str.lower(time_range[0])
param_dict['tbs'] = '&tbs=' + ('qdr:' + str.lower(time_range[0]))
elif 'tbs' in args:
result_tbs = args.get('tbs')
param_dict['tbs'] = '&tbs=' + result_tbs
# Occasionally the 'tbs' param provided by Google also contains a field for 'lr', but formatted
# strangely. This is an (admittedly not very elegant) workaround for that.
# E.g. &tbs=qdr:h,lr:lang_1pl --> the lr param needs to be extracted and have the "1" digit removed
sub_lang = [_ for _ in result_tbs.split(',') if 'lr:' in _]
sub_lang = sub_lang[0][sub_lang[0].find('lr:') + 3:len(sub_lang[0])] if len(sub_lang) > 0 else ''
# Ensure search query is parsable
query = urlparse.quote(query)
@ -46,11 +57,23 @@ def gen_query(query, args, near_city=None):
param_dict['start'] = '&start=' + args.get('start')
# Search for results near a particular city, if available
if near_city is not None:
if near_city:
param_dict['near'] = '&near=' + urlparse.quote(near_city)
# Set language for results (lr) if source isn't set, otherwise use the result
# language param provided by google (but with the strange digit(s) removed)
if 'source' in args:
param_dict['source'] = '&source=' + args.get('source')
param_dict['lr'] = ('&lr=' + ''.join([_ for _ in sub_lang if not _.isdigit()])) if sub_lang else ''
else:
param_dict['lr'] = '&lr=' + config.lang
param_dict['cr'] = ('&cr=' + config.ctry) if config.ctry else ''
param_dict['hl'] = '&hl=' + config.lang.replace('lang_', '')
param_dict['safe'] = '&safe=' + ('active' if config.safe else 'off')
for val in param_dict.values():
if not val or val is None:
if not val:
continue
query += val
@ -58,26 +81,27 @@ def gen_query(query, args, near_city=None):
class Request:
def __init__(self, normal_ua):
self.modified_user_agent = gen_user_agent(normal_ua)
def __init__(self, normal_ua, language='lang_en'):
self.language = language
self.mobile = 'Android' in normal_ua or 'iPhone' in normal_ua
self.modified_user_agent = gen_user_agent(self.mobile)
def __getitem__(self, name):
return getattr(self, name)
def send(self, base_url=SEARCH_URL, query='', return_bytes=False):
response_header = []
def autocomplete(self, query):
ac_query = dict(hl=self.language, q=query)
response = self.send(base_url=AUTOCOMPLETE_URL, query=urlparse.urlencode(ac_query)).text
b_obj = BytesIO()
crl = pycurl.Curl()
crl.setopt(crl.URL, base_url + query)
crl.setopt(crl.USERAGENT, self.modified_user_agent)
crl.setopt(crl.WRITEDATA, b_obj)
crl.setopt(crl.HEADERFUNCTION, response_header.append)
crl.setopt(pycurl.FOLLOWLOCATION, 1)
crl.perform()
crl.close()
if response:
dom = etree.fromstring(response)
return dom.xpath('//suggestion/@data')
if return_bytes:
return b_obj.getvalue()
else:
return b_obj.getvalue().decode('unicode-escape', 'ignore')
return []
def send(self, base_url=SEARCH_URL, query='') -> Response:
headers = {
'User-Agent': self.modified_user_agent
}
return requests.get(base_url + query, headers=headers)
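
The `lr` handling in `gen_query` is easier to follow in isolation. The sketch below pulls the same two steps (extract the `lr:` field from `tbs`, then drop any digits) into a hypothetical helper named `extract_result_lang`; the helper name is an assumption for illustration and does not exist in the commit.

```python
def extract_result_lang(result_tbs):
    """Return the language code embedded in a tbs value, digits removed.

    Mirrors the logic in gen_query: find the comma-separated field that
    contains 'lr:', take everything after that prefix, strip digits.
    """
    fields = [part for part in result_tbs.split(',') if 'lr:' in part]
    if not fields:
        return ''
    raw_lang = fields[0][fields[0].find('lr:') + 3:]
    return ''.join(ch for ch in raw_lang if not ch.isdigit())


with_lang = extract_result_lang('qdr:h,lr:lang_1pl')
without_lang = extract_result_lang('qdr:d')
```

Using the example from the code comment, `qdr:h,lr:lang_1pl` yields `lang_pl`, and a `tbs` value with no `lr:` field yields an empty string, so no `&lr=` parameter is appended.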


@ -1,25 +0,0 @@
import itertools
from Phyme import Phyme
import random
import sys
import time
random.seed(time.time())
ph = Phyme()
def get_rhyme(word):
# Get all rhymes and merge to one list (normally separated by syllable count)
rhymes = ph.get_perfect_rhymes(word)
rhyme_vals = list(itertools.chain.from_iterable(list(rhymes.values())))
# Pick a random rhyme and strip out any non alpha characters
rhymed_word = rhyme_vals[random.randint(0, len(rhyme_vals) - 1)]
rhymed_word = ''.join(letter for letter in rhymed_word if letter.isalpha())
return rhymed_word.capitalize()
if __name__ == '__main__':
print(get_rhyme(sys.argv[1]))


@ -1,91 +1,207 @@
from app import app
from app.filter import Filter
from app.request import Request, gen_query
import argparse
from bs4 import BeautifulSoup
from cryptography.fernet import Fernet, InvalidToken
from flask import g, make_response, request, redirect, render_template, send_file
import base64
import io
import json
import os
import pickle
import urllib.parse as urlparse
import uuid
from functools import wraps
app.config['APP_ROOT'] = os.getenv('APP_ROOT', os.path.dirname(os.path.abspath(__file__)))
app.config['STATIC_FOLDER'] = os.getenv('STATIC_FOLDER', os.path.join(app.config['APP_ROOT'], 'static'))
import waitress
from flask import jsonify, make_response, request, redirect, render_template, send_file, session
from requests import exceptions
CONFIG_PATH = app.config['STATIC_FOLDER'] + '/config.json'
from app import app
from app.models.config import Config
from app.request import Request
from app.utils.misc import valid_user_session
from app.utils.routing_utils import *
def auth_required(f):
@wraps(f)
def decorated(*args, **kwargs):
auth = request.authorization
# Skip if username/password not set
whoogle_user = os.getenv('WHOOGLE_USER', '')
whoogle_pass = os.getenv('WHOOGLE_PASS', '')
if (not whoogle_user or not whoogle_pass) or \
(auth and whoogle_user == auth.username and whoogle_pass == auth.password):
return f(*args, **kwargs)
else:
return make_response('Not logged in', 401, {'WWW-Authenticate': 'Basic realm="Login Required"'})
return decorated
@app.before_request
def before_request_func():
g.user_request = Request(request.headers.get('User-Agent'))
g.user_config = json.load(open(CONFIG_PATH)) if os.path.exists(CONFIG_PATH) else {}
g.request_params = request.args if request.method == 'GET' else request.form
g.cookies_disabled = False
# Generate session values for user if unavailable
if not valid_user_session(session):
session['config'] = json.load(open(app.config['DEFAULT_CONFIG'])) \
if os.path.exists(app.config['DEFAULT_CONFIG']) else {'url': request.url_root}
session['uuid'] = str(uuid.uuid4())
session['fernet_keys'] = generate_user_keys(True)
# Flag cookies as possibly disabled in order to guard against
# unnecessary growth of the session directory
g.cookies_disabled = True
if session['uuid'] not in app.user_elements:
app.user_elements.update({session['uuid']: 0})
# Always redirect to https if HTTPS_ONLY is set (otherwise default to False)
https_only = os.getenv('HTTPS_ONLY', False)
if https_only and request.url.startswith('http://'):
return redirect(request.url.replace('http://', 'https://', 1), code=308)
g.user_config = Config(**session['config'])
if not g.user_config.url:
g.user_config.url = request.url_root.replace('http://', 'https://') if https_only else request.url_root
g.user_request = Request(request.headers.get('User-Agent'), language=g.user_config.lang)
g.app_location = g.user_config.url
@app.after_request
def after_request_func(response):
if app.user_elements[session['uuid']] <= 0 and '/element' in request.url:
# Regenerate element key if all elements have been served to user
session['fernet_keys']['element_key'] = '' if not g.cookies_disabled else app.default_key_set['element_key']
app.user_elements[session['uuid']] = 0
# Check if address consistently has cookies blocked, in which case start removing session
# files after creation.
# Note: This is primarily done to prevent overpopulation of session directories, since browsers that
# block cookies will still trigger Flask's session creation routine with every request.
if g.cookies_disabled and request.remote_addr not in app.no_cookie_ips:
app.no_cookie_ips.append(request.remote_addr)
elif g.cookies_disabled and request.remote_addr in app.no_cookie_ips:
session_list = list(session.keys())
for key in session_list:
session.pop(key)
return response
@app.errorhandler(404)
def unknown_page(e):
return redirect('/')
return redirect(g.app_location)
@app.route('/', methods=['GET'])
@auth_required
def index():
bg = '#000' if 'dark' in g.user_config and g.user_config['dark'] else '#fff'
return render_template('index.html', bg=bg, ua=g.user_request.modified_user_agent)
# Reset keys
session['fernet_keys'] = generate_user_keys(g.cookies_disabled)
return render_template('index.html',
languages=Config.LANGUAGES,
countries=Config.COUNTRIES,
config=g.user_config,
version_number=app.config['VERSION_NUMBER'])
@app.route('/opensearch.xml', methods=['GET'])
@auth_required
def opensearch():
url_root = request.url_root
if url_root.endswith('/'):
url_root = url_root[:-1]
opensearch_url = g.app_location
if opensearch_url.endswith('/'):
opensearch_url = opensearch_url[:-1]
template = render_template('opensearch.xml', main_url=url_root)
template = render_template('opensearch.xml',
main_url=opensearch_url,
request_type='get' if g.user_config.get_only else 'post')
response = make_response(template)
response.headers['Content-Type'] = 'application/xml'
return response
@app.route('/autocomplete', methods=['GET', 'POST'])
def autocomplete():
q = g.request_params.get('q')
if not q and not request.data:
return jsonify({'?': []})
elif request.data:
q = urlparse.unquote_plus(request.data.decode('utf-8').replace('q=', ''))
return jsonify([q, g.user_request.autocomplete(q)])
@app.route('/search', methods=['GET', 'POST'])
@auth_required
def search():
request_params = request.args if request.method == 'GET' else request.form
q = request_params.get('q')
# Reset element counter
app.user_elements[session['uuid']] = 0
if q is None or len(q) == 0:
search_util = RoutingUtils(request, g.user_config, session, cookies_disabled=g.cookies_disabled)
query = search_util.new_search_query()
# Redirect to home if invalid/blank search
if not query:
return redirect('/')
else:
# Attempt to decrypt if this is an internal link
try:
q = Fernet(app.secret_key).decrypt(q.encode()).decode()
except InvalidToken:
pass
user_agent = request.headers.get('User-Agent')
mobile = 'Android' in user_agent or 'iPhone' in user_agent
# Generate response and number of external elements from the page
response, elements = search_util.generate_response()
if search_util.feeling_lucky:
return redirect(response, code=303)
content_filter = Filter(mobile, g.user_config, secret_key=app.secret_key)
full_query = gen_query(q, request_params, content_filter.near)
get_body = g.user_request.send(query=full_query)
# Keep count of external elements to fetch before element key can be regenerated
app.user_elements[session['uuid']] = elements
results = content_filter.reskin(get_body)
formatted_results = content_filter.clean(BeautifulSoup(results, 'html.parser'))
return render_template('display.html', query=urlparse.unquote(q), response=formatted_results)
return render_template(
'display.html',
query=urlparse.unquote(query),
search_type=search_util.search_type,
dark_mode=g.user_config.dark,
response=response,
version_number=app.config['VERSION_NUMBER'],
search_header=render_template(
'header.html',
dark_mode=g.user_config.dark,
query=urlparse.unquote(query),
search_type=search_util.search_type,
mobile=g.user_request.mobile) if 'isch' not in search_util.search_type else '')
@app.route('/config', methods=['GET', 'POST'])
@app.route('/config', methods=['GET', 'POST', 'PUT'])
@auth_required
def config():
if request.method == 'GET':
return json.dumps(g.user_config)
return json.dumps(g.user_config.__dict__)
elif request.method == 'PUT':
if 'name' in request.args:
config_pkl = os.path.join(app.config['CONFIG_PATH'], request.args.get('name'))
session['config'] = pickle.load(open(config_pkl, 'rb')) if os.path.exists(config_pkl) else session['config']
return json.dumps(session['config'])
else:
return json.dumps({})
else:
config_data = request.form.to_dict()
with open(app.config['STATIC_FOLDER'] + '/config.json', 'w') as config_file:
config_file.write(json.dumps(config_data, indent=4))
config_file.close()
if 'url' not in config_data or not config_data['url']:
config_data['url'] = g.user_config.url
return redirect('/')
# Save config by name to allow a user to easily load later
if 'name' in request.args:
pickle.dump(config_data, open(os.path.join(app.config['CONFIG_PATH'], request.args.get('name')), 'wb'))
# Overwrite default config if user has cookies disabled
if g.cookies_disabled:
open(app.config['DEFAULT_CONFIG'], 'w').write(json.dumps(config_data, indent=4))
session['config'] = config_data
return redirect(config_data['url'])
@app.route('/url', methods=['GET'])
@auth_required
def url():
if 'url' in request.args:
return redirect(request.args.get('url'))
@ -98,30 +214,37 @@ def url():
@app.route('/imgres')
@auth_required
def imgres():
return redirect(request.args.get('imgurl'))
@app.route('/tmp')
def tmp():
cipher_suite = Fernet(app.secret_key)
img_url = cipher_suite.decrypt(request.args.get('image_url').encode()).decode()
file_data = g.user_request.send(base_url=img_url, return_bytes=True)
tmp_mem = io.BytesIO()
tmp_mem.write(file_data)
tmp_mem.seek(0)
@app.route('/element')
@auth_required
def element():
cipher_suite = Fernet(session['fernet_keys']['element_key'])
src_url = cipher_suite.decrypt(request.args.get('url').encode()).decode()
src_type = request.args.get('type')
return send_file(
tmp_mem,
as_attachment=True,
attachment_filename='tmp.png',
mimetype='image/png'
)
try:
file_data = g.user_request.send(base_url=src_url).content
app.user_elements[session['uuid']] -= 1
tmp_mem = io.BytesIO()
tmp_mem.write(file_data)
tmp_mem.seek(0)
return send_file(tmp_mem, mimetype=src_type)
except exceptions.RequestException:
pass
empty_gif = base64.b64decode('R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==')
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
@app.route('/window')
@auth_required
def window():
get_body = g.user_request.send(base_url=request.args.get('location'))
get_body = g.user_request.send(base_url=request.args.get('location')).text
get_body = get_body.replace('src="/', 'src="' + request.args.get('location') + '"')
get_body = get_body.replace('href="/', 'href="' + request.args.get('location') + '"')
@ -138,12 +261,26 @@ def window():
def run_app():
parser = argparse.ArgumentParser(description='Whoogle Search console runner')
parser.add_argument('--port', default=8888, metavar='<port number>',
help='Specifies a port to run on (default 8888)')
parser.add_argument('--port', default=5000, metavar='<port number>',
help='Specifies a port to run on (default 5000)')
parser.add_argument('--host', default='127.0.0.1', metavar='<ip address>',
help='Specifies the host address to use (default 127.0.0.1)')
parser.add_argument('--debug', default=False, action='store_true',
help='Activates debug mode for the Flask server (default False)')
help='Activates debug mode for the server (default False)')
parser.add_argument('--https-only', default=False, action='store_true',
help='Enforces HTTPS redirects for all requests')
parser.add_argument('--userpass', default='', metavar='<username:password>',
help='Sets a username/password basic auth combo (default None)')
args = parser.parse_args()
app.run(host=args.host, port=args.port, debug=args.debug)
if args.userpass:
user_pass = args.userpass.split(':')
os.environ['WHOOGLE_USER'] = user_pass[0]
os.environ['WHOOGLE_PASS'] = user_pass[1]
os.environ['HTTPS_ONLY'] = '1' if args.https_only else ''
if args.debug:
app.run(host=args.host, port=args.port, debug=args.debug)
else:
waitress.serve(app, listen="{}:{}".format(args.host, args.port))
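
One detail of `run_app` worth noting: `args.userpass.split(':')` raises `IndexError` when the value has no colon, and silently truncates a password that itself contains one. A possible hardening, offered only as a sketch and not as what the commit does, is to split on the first colon with `str.partition`. The helper name `parse_userpass` is hypothetical.

```python
def parse_userpass(userpass):
    """Split '<username:password>' on the first colon only.

    Raises ValueError instead of IndexError when either part is missing,
    and keeps any further colons as part of the password.
    """
    user, sep, password = userpass.partition(':')
    if not sep or not user or not password:
        raise ValueError('expected <username:password>')
    return user, password


user, password = parse_userpass('admin:pa:ss')
```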

app/static/css/header.css Normal file

@ -0,0 +1,55 @@
header {
font-family: Roboto,HelveticaNeue,Arial,sans-serif;
font-size: 14px;
line-height: 20px;
color: #3C4043;
word-wrap: break-word;
}
.logo-link, .logo-letter {
text-decoration: none !important;
letter-spacing: -1px;
text-align: center;
border-radius: 2px 0 0 0;
}
.mobile-logo {
font: 22px/36px Futura, Arial, sans-serif;
padding-left: 5px;
}
.logo-div {
letter-spacing: -1px;
text-align: center;
font: 22pt Futura, Arial, sans-serif;
padding: 10px 0 5px 0;
height: 37px;
font-smoothing: antialiased;
}
.search-div {
border-radius: 8px 8px 0 0;
box-shadow: 0 1px 6px rgba(32, 33, 36, 0.18);
margin-top: 10px;
}
.search-form {
height: 39px;
display: flex;
width: 100%;
}
.search-input {
background: none;
margin: 2px 4px 2px 8px;
display: block;
font-size: 16px;
padding: 0 0 0 8px;
flex: 1;
height: 35px;
outline: none;
border: none;
width: 100%;
-webkit-tap-highlight-color: rgba(0,0,0,0);
overflow: hidden;
}


@ -1,3 +1,7 @@
body {
font-family: Avenir, Helvetica, Arial, sans-serif;
}
.logo {
width: 80%;
display: block;
@ -113,3 +117,15 @@ button::-moz-focus-inner {
-webkit-box-decoration-break: clone;
box-decoration-break: clone;
}
.hidden {
display: none;
}
footer {
position: fixed;
bottom: 0%;
text-align: center;
width: 100%;
z-index: -1;
}


@ -0,0 +1,35 @@
.autocomplete {
position: relative;
display: inline-block;
width: 100%;
}
.autocomplete-items {
position: absolute;
border: 1px solid #685e79;
border-bottom: none;
border-top: none;
z-index: 99;
/*position the autocomplete items to be the same width as the container:*/
top: 100%;
left: 0;
right: 0;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
color: #fff;
background-color: #000;
border-bottom: 1px solid #242424;
}
.autocomplete-items div:hover {
background-color: #404040;
}
.autocomplete-active {
background-color: #685e79 !important;
color: #ffffff;
}

app/static/css/search.css Normal file

@ -0,0 +1,34 @@
.autocomplete {
position: relative;
display: inline-block;
width: 100%;
}
.autocomplete-items {
position: absolute;
border: 1px solid #d4d4d4;
border-bottom: none;
border-top: none;
z-index: 99;
/*position the autocomplete items to be the same width as the container:*/
top: 100%;
left: 0;
right: 0;
}
.autocomplete-items div {
padding: 10px;
cursor: pointer;
background-color: #fff;
border-bottom: 1px solid #d4d4d4;
}
.autocomplete-items div:hover {
background-color: #e9e9e9;
}
.autocomplete-active {
background-color: #685e79 !important;
color: #ffffff;
}


@ -0,0 +1,98 @@
const handleUserInput = searchBar => {
let xhrRequest = new XMLHttpRequest();
xhrRequest.open("POST", "/autocomplete");
xhrRequest.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
xhrRequest.onload = function() {
if (xhrRequest.readyState === 4 && xhrRequest.status !== 200) {
// Do nothing if failed to fetch autocomplete results
return;
}
// Fill autocomplete with fetched results
let autocompleteResults = JSON.parse(xhrRequest.responseText);
autocomplete(searchBar, autocompleteResults[1]);
};
xhrRequest.send('q=' + searchBar.value);
};
const autocomplete = (searchInput, autocompleteResults) => {
let currentFocus;
searchInput.addEventListener("input", function () {
let autocompleteList, autocompleteItem, i, val = this.value;
closeAllLists();
if (!val || !autocompleteResults) {
return false;
}
currentFocus = -1;
autocompleteList = document.createElement("div");
autocompleteList.setAttribute("id", this.id + "-autocomplete-list");
autocompleteList.setAttribute("class", "autocomplete-items");
this.parentNode.appendChild(autocompleteList);
for (i = 0; i < autocompleteResults.length; i++) {
if (autocompleteResults[i].substr(0, val.length).toUpperCase() === val.toUpperCase()) {
autocompleteItem = document.createElement("div");
autocompleteItem.innerHTML = "<strong>" + autocompleteResults[i].substr(0, val.length) + "</strong>";
autocompleteItem.innerHTML += autocompleteResults[i].substr(val.length);
autocompleteItem.innerHTML += "<input type=\"hidden\" value=\"" + autocompleteResults[i] + "\">";
autocompleteItem.addEventListener("click", function () {
searchInput.value = this.getElementsByTagName("input")[0].value;
closeAllLists();
document.getElementById("search-form").submit();
});
autocompleteList.appendChild(autocompleteItem);
}
}
});
searchInput.addEventListener("keydown", function (e) {
let suggestion = document.getElementById(this.id + "-autocomplete-list");
if (suggestion) suggestion = suggestion.getElementsByTagName("div");
if (e.keyCode === 40) { // down
currentFocus++;
addActive(suggestion);
} else if (e.keyCode === 38) { //up
currentFocus--;
addActive(suggestion);
} else if (e.keyCode === 13) { // enter
e.preventDefault();
if (currentFocus > -1) {
if (suggestion) suggestion[currentFocus].click();
}
}
});
const addActive = suggestion => {
if (!suggestion || !suggestion[currentFocus]) return false;
removeActive(suggestion);
if (currentFocus >= suggestion.length) currentFocus = 0;
if (currentFocus < 0) currentFocus = (suggestion.length - 1);
suggestion[currentFocus].classList.add("autocomplete-active");
};
const removeActive = suggestion => {
for (let i = 0; i < suggestion.length; i++) {
suggestion[i].classList.remove("autocomplete-active");
}
};
const closeAllLists = el => {
let suggestions = document.getElementsByClassName("autocomplete-items");
for (let i = 0; i < suggestions.length; i++) {
if (el !== suggestions[i] && el !== searchInput) {
suggestions[i].parentNode.removeChild(suggestions[i]);
}
}
};
// Close any open suggestion lists when the user clicks anywhere in the document
document.addEventListener("click", function (e) {
closeAllLists(e.target);
});
};
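
The script above reads `autocompleteResults[1]`, which matches the `[query, suggestions]` array returned by the `/autocomplete` route; on the server, the suggestions come from the XPath `//suggestion/@data` applied to an XML response. The sketch below reproduces that shape with the standard library's `xml.etree.ElementTree` instead of `lxml`. The sample XML and the helper names `parse_suggestions` and `autocomplete_payload` are assumptions for illustration; the real upstream response may differ.

```python
import xml.etree.ElementTree as ET

# Assumed sample of the suggestion XML; not captured from a real request
SAMPLE_XML = (
    '<toplevel>'
    '<CompleteSuggestion><suggestion data="whoogle search"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="whoogle docker"/></CompleteSuggestion>'
    '</toplevel>'
)


def parse_suggestions(xml_text):
    """Collect the data attribute of every <suggestion> element."""
    if not xml_text:
        return []
    root = ET.fromstring(xml_text)
    return [node.get('data') for node in root.iter('suggestion')]


def autocomplete_payload(query, xml_text):
    """Build the two-element list that the front end indexes with [1]."""
    return [query, parse_suggestions(xml_text)]


payload = autocomplete_payload('whoogle', SAMPLE_XML)
```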


@ -11,11 +11,22 @@ const setupSearchLayout = () => {
if (event.keyCode === 13) {
event.preventDefault();
searchBtn.click();
} else {
handleUserInput(searchBar);
}
});
}
};
const fillConfigValues = () => {
// Establish all config value elements
const near = document.getElementById("config-near");
const noJS = document.getElementById("config-nojs");
const dark = document.getElementById("config-dark");
const safe = document.getElementById("config-safe");
const url = document.getElementById("config-url");
const newTab = document.getElementById("config-new-tab");
const getOnly = document.getElementById("config-get-only");
const fillConfigValues = (near, nojs, dark) => {
// Request existing config info
let xhrGET = new XMLHttpRequest();
xhrGET.open("GET", "/config");
@ -29,23 +40,18 @@ const fillConfigValues = (near, nojs, dark) => {
let configSettings = JSON.parse(xhrGET.responseText);
near.value = configSettings["near"] ? configSettings["near"] : "";
near.addEventListener("keyup", function() {
configSettings["near"] = near.value;
});
nojs.checked = !!configSettings["nojs"];
nojs.addEventListener("change", function() {
configSettings["nojs"] = nojs.checked ? 1 : 0;
});
noJS.checked = !!configSettings["nojs"];
dark.checked = !!configSettings["dark"];
dark.addEventListener("change", function() {
configSettings["dark"] = dark.checked ? 1 : 0;
});
safe.checked = !!configSettings["safe"];
getOnly.checked = !!configSettings["get_only"];
newTab.checked = !!configSettings["new_tab"];
// Addresses the issue of incorrect URL being used behind reverse proxy
url.value = configSettings["url"] ? configSettings["url"] : "";
};
xhrGET.send();
}
};
const setupConfigLayout = () => {
// Setup whoogle config
@ -62,12 +68,43 @@ const setupConfigLayout = () => {
content.classList.toggle("open");
});
const near = document.getElementById("config-near");
const noJS = document.getElementById("config-nojs");
const dark = document.getElementById("config-dark");
fillConfigValues();
};
fillConfigValues(near, noJS, dark);
}
const loadConfig = event => {
event.preventDefault();
let config = prompt("Enter name of config:");
if (!config) {
alert("Must specify a name for the config to load");
return;
}
let xhrPUT = new XMLHttpRequest();
xhrPUT.open("PUT", "/config?name=" + config + ".conf");
xhrPUT.onload = function() {
if (xhrPUT.readyState === 4 && xhrPUT.status !== 200) {
alert("Error loading Whoogle config");
return;
}
location.reload(true);
};
xhrPUT.send();
};
const saveConfig = event => {
event.preventDefault();
let config = prompt("Enter name for this config:");
if (!config) {
alert("Must specify a name for the config to save");
return;
}
let configForm = document.getElementById("config-form");
configForm.action = '/config?name=' + config + ".conf";
configForm.submit();
};
document.addEventListener("DOMContentLoaded", function() {
setTimeout(function() {


@ -5,9 +5,19 @@
<link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Whoogle Search">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="referrer" content="no-referrer">
<script type="text/javascript" src="/static/js/autocomplete.js"></script>
<link rel="stylesheet" href="/static/css/{{ 'search-dark' if dark_mode else 'search' }}.css">
<link rel="stylesheet" href="/static/css/header.css">
<title>{{ query }} - Whoogle Search</title>
</head>
<body>
{{ response|safe }}
{{ search_header|safe }}
{{ response|safe }}
</body>
<footer>
<p style="color: {{ '#fff' if dark_mode else '#000' }};">
Whoogle Search v{{ version_number }} ||
<a style="color: #685e79" href="https://github.com/benbusby/whoogle-search">View on GitHub</a>
</p>
</footer>
</html>

app/templates/header.html Normal file

@ -0,0 +1,63 @@
{% if mobile %}
<header>
<div class="bz1lBb">
<form class="Pg70bf" id="search-form" method="POST">
<a class="logo-link mobile-logo"
href="/"
style="display:flex; justify-content:center; align-items:center; color:#685e79; font-size:18px; ">
<span class="V6gwVd">Wh</span><span class="iWkuvd">o</span><span class="cDrQ7">o</span><span
class="V6gwVd">g</span><span class="ntlR9">l</span><span
class="iWkuvd tJ3Myc">e</span>
</a>
<div class="H0PQec" style="width: 100%;">
<div class="sbc esbc autocomplete">
<input id="search-bar" autocapitalize="none" autocomplete="off" class="noHIxc" name="q"
style="background-color: {{ '#000' if dark_mode else '#fff' }};
color: {{ '#685e79' if dark_mode else '#000' }};
border: {{ '1px solid #685e79' if dark_mode else '' }}"
spellcheck="false" type="text" value="{{ query }}">
<input name="tbm" value="{{ search_type }}" style="display: none">
<div class="sc"></div>
</div>
</div>
</form>
</div>
</header>
{% else %}
<header>
<div class="logo-div">
<a class="logo-link" href="/">
<span class="V6gwVd logo-letter">Wh</span><span class="iWkuvd logo-letter">o</span><span
class="cDrQ7 logo-letter">o</span><span class="V6gwVd logo-letter">g</span><span
class="ntlR9 logo-letter">l</span><span class="iWkuvd tJ3Myc logo-letter">e</span>
</a>
</div>
<div class="search-div">
<form id="search-form" class="search-form" method="POST">
<div class="autocomplete" style="width: 100%; flex: 1">
<div style="width: 100%; display: flex">
<input id="search-bar" autocapitalize="none" autocomplete="off" class="noHIxc" name="q"
spellcheck="false" type="text" value="{{ query }}"
style="background-color: {{ '#000' if dark_mode else '#fff' }};
color: {{ '#685e79' if dark_mode else '#000' }};
border: {{ '1px solid #685e79' if dark_mode else '' }}">
<input name="tbm" value="{{ search_type }}" style="display: none">
<div class="sc"></div>
</div>
</div>
</form>
</div>
</header>
{% endif %}
<script>
const searchBar = document.getElementById("search-bar");
searchBar.addEventListener("keyup", function (event) {
if (event.keyCode !== 13) {
handleUserInput(searchBar);
} else {
document.getElementById("search-form").submit();
}
});
</script>


@ -17,18 +17,22 @@
<meta name="referrer" content="no-referrer">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="/static/img/favicon/ms-icon-144x144.png">
<script type="text/javascript" src="/static/js/autocomplete.js"></script>
<script type="text/javascript" src="/static/js/controller.js"></script>
<link rel="search" href="/opensearch.xml" type="application/opensearchdescription+xml" title="Whoogle Search">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="/static/css/{{ 'search-dark' if config.dark else 'search' }}.css">
<link rel="stylesheet" href="/static/css/main.css">
<title>Whoogle Search</title>
</head>
<body id="main" style="display: none; background-color: {{ bg }}">
<body id="main" style="display: none; background-color: {{ '#000' if config.dark else '#fff' }}">
<div class="search-container">
<img class="logo" src="/static/img/logo.png">
<form action="/search" method="post">
<form id="search-form" action="/search" method="{{ 'get' if config.get_only else 'post' }}">
<div class="search-fields">
<input type="text" name="q" id="search-bar">
<div class="autocomplete">
<input type="text" name="q" id="search-bar" autofocus="autofocus">
</div>
<input type="submit" id="search-submit" value="Search">
</div>
</form>
@ -36,10 +40,32 @@
<button id="config-collapsible" class="collapsible">Configuration</button>
<div class="content">
<div class="config-fields">
<form action="/config" method="post">
<form id="config-form" action="/config" method="post">
<div class="config-div">
<!-- TODO: Add option to regenerate user agent? -->
<span class="ua-span">User Agent: {{ ua }}</span>
<label for="config-ctry">Country: </label>
<select name="ctry" id="config-ctry">
{% for ctry in countries %}
<option value="{{ ctry.value }}"
{% if ctry.value in config.ctry %}
selected
{% endif %}>
{{ ctry.name }}
</option>
{% endfor %}
</select>
</div>
<div class="config-div">
<label for="config-lang">Language: </label>
<select name="lang" id="config-lang">
{% for lang in languages %}
<option value="{{ lang.value }}"
{% if lang.value in config.lang %}
selected
{% endif %}>
{{ lang.name }}
</option>
{% endfor %}
</select>
</div>
<div class="config-div">
<label for="config-near">Near: </label>
@ -54,12 +80,35 @@
<input type="checkbox" name="dark" id="config-dark">
</div>
<div class="config-div">
<input type="submit" id="config-submit" value="Save">
<label for="config-safe">Safe Search: </label>
<input type="checkbox" name="safe" id="config-safe">
</div>
<div class="config-div">
<label for="config-new-tab">Open Links in New Tab: </label>
<input type="checkbox" name="new_tab" id="config-new-tab">
</div>
<div class="config-div">
<label for="config-get-only">GET Requests Only: </label>
<input type="checkbox" name="get_only" id="config-get-only">
</div>
<div class="config-div">
<label for="config-url">Root URL: </label>
<input type="text" name="url" id="config-url" value="">
</div>
<div class="config-div">
<input type="submit" id="config-load" onclick="loadConfig(event)" value="Load">&nbsp;
<input type="submit" id="config-submit" value="Apply">&nbsp;
<input type="submit" id="config-submit" onclick="saveConfig(event)" value="Save As...">
</div>
</form>
</div>
</div>
</div>
<footer>
<p style="color: {{ '#fff' if config.dark else '#000' }};">
Whoogle Search v{{ version_number }} ||
<a style="color: #685e79" href="https://github.com/benbusby/whoogle-search">View on GitHub</a>
</p>
</footer>
</body>
</html>


@ -4,10 +4,12 @@
<Description>Whoogle: A lightweight, deployable Google search proxy for desktop/mobile that removes Javascript, AMP links, and ads</Description>
<InputEncoding>UTF-8</InputEncoding>
<Image width="32" height="32" type="image/x-icon">/static/img/favicon/favicon-32x32.png</Image>
<Url type="text/html" method="post" template="{{ main_url }}/search">
<Url type="text/html" method="{{ request_type }}" template="{{ main_url }}/search">
<Param name="q" value="{searchTerms}"/>
</Url>
<Url type="application/x-suggestions+json" method="{{ request_type }}" template="{{ main_url }}/autocomplete">
<Param name="q" value="{searchTerms}"/>
</Url>
<Url type="application/x-suggestions+json" template="{{ main_url }}/search"/>
<moz:SearchForm>{{ main_url }}/search</moz:SearchForm>
</OpenSearchDescription>

0
app/utils/__init__.py Normal file

29
app/utils/misc.py Normal file

@ -0,0 +1,29 @@
from cryptography.fernet import Fernet
from flask import current_app as app
REQUIRED_SESSION_VALUES = ['uuid', 'config', 'fernet_keys']
BLACKLIST = [
'ad', 'anuncio', 'annuncio', 'annonce', 'Anzeige', '广告', '廣告', 'Reklama', 'Реклама', 'Anunț', '광고',
'annons', 'Annonse', 'Iklan', '広告', 'Augl.', 'Mainos', 'Advertentie', 'إعلان', 'Գովազդ', 'विज्ञापन', 'Reklam',
'آگهی', 'Reklāma', 'Reklaam', 'Διαφήμιση', 'מודעה', 'Hirdetés'
]
def generate_user_keys(cookies_disabled=False) -> dict:
if cookies_disabled:
return app.default_key_set
# Generate/regenerate unique key per user
return {
'element_key': Fernet.generate_key(),
'text_key': Fernet.generate_key()
}
def valid_user_session(session):
# A session is valid only if every required value is present
for value in REQUIRED_SESSION_VALUES:
if value not in session:
return False
return True
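
The session check in `app/utils/misc.py` above can be exercised without Flask. What follows is a minimal standalone sketch, not the project's code: it assumes a plain `dict` stands in for the Flask session, and `has_required_values` is a hypothetical name used only for illustration.

```python
# Standalone sketch of the session check; a plain dict stands in for the Flask session.
REQUIRED_SESSION_VALUES = ['uuid', 'config', 'fernet_keys']


def has_required_values(session) -> bool:
    """Return True only if every required key is present (hypothetical helper)."""
    return all(value in session for value in REQUIRED_SESSION_VALUES)


# 'uuid' is missing here, so the check fails, as in the first assertion of test_valid_session.
incomplete = {'fernet_keys': '', 'config': {}}
complete = {'uuid': 'test', 'fernet_keys': {}, 'config': {}}
print(has_required_values(incomplete), has_required_values(complete))
```

The check only looks at key presence, so an empty `config` dict still counts as valid, which is what the test fixture relies on.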


@ -0,0 +1,72 @@
from app.filter import Filter, get_first_link
from app.utils.misc import generate_user_keys
from app.request import gen_query
from bs4 import BeautifulSoup
from cryptography.fernet import Fernet, InvalidToken
from flask import g
from typing import Any, Tuple
class RoutingUtils:
def __init__(self, request, config, session, cookies_disabled=False):
self.request_params = request.args if request.method == 'GET' else request.form
self.user_agent = request.headers.get('User-Agent')
self.feeling_lucky = False
self.config = config
self.session = session
self.query = ''
self.cookies_disabled = cookies_disabled
self.search_type = self.request_params.get('tbm', '')
def __getitem__(self, name):
return getattr(self, name)
def __setitem__(self, name, value):
return setattr(self, name, value)
def __delitem__(self, name):
return delattr(self, name)
def __contains__(self, name):
return hasattr(self, name)
def new_search_query(self) -> str:
# Generate a new element key each time a new search is performed
self.session['fernet_keys']['element_key'] = generate_user_keys(
cookies_disabled=self.cookies_disabled)['element_key']
q = self.request_params.get('q')
if not q:
return ''
else:
# Attempt to decrypt if this is an internal link
try:
q = Fernet(self.session['fernet_keys']['text_key']).decrypt(q.encode()).decode()
except InvalidToken:
pass
# Reset text key
self.session['fernet_keys']['text_key'] = generate_user_keys(
cookies_disabled=self.cookies_disabled)['text_key']
# Format depending on whether or not the query is a "feeling lucky" query
self.feeling_lucky = q.startswith('! ')
self.query = q[2:] if self.feeling_lucky else q
return self.query
def generate_response(self) -> Tuple[Any, int]:
mobile = 'Android' in self.user_agent or 'iPhone' in self.user_agent
content_filter = Filter(self.session['fernet_keys'], mobile=mobile, config=self.config)
full_query = gen_query(self.query, self.request_params, self.config, content_filter.near)
get_body = g.user_request.send(query=full_query).text
# Produce cleanable html soup from response
html_soup = BeautifulSoup(content_filter.reskin(get_body), 'html.parser')
if self.feeling_lucky:
return get_first_link(html_soup), 1
else:
formatted_results = content_filter.clean(html_soup)
return formatted_results, content_filter.elements
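
The "feeling lucky" handling in `new_search_query` above comes down to a prefix check on the decoded query. The sketch below isolates that step under stated assumptions: `split_lucky_prefix` is a hypothetical helper, not a function in the project, and it leaves out the Fernet decryption and key rotation that the real method performs first.

```python
from typing import Tuple


def split_lucky_prefix(q: str) -> Tuple[bool, str]:
    """Hypothetical helper: detect the '! ' prefix and strip it from the query."""
    feeling_lucky = q.startswith('! ')
    return feeling_lucky, (q[2:] if feeling_lucky else q)


print(split_lucky_prefix('! test'))  # lucky query, prefix removed
print(split_lucky_prefix('test'))    # regular query, returned unchanged
```

The prefix includes the trailing space, so a query such as `!test` is treated as a regular search.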

9
docker-compose.yml Normal file

@ -0,0 +1,9 @@
version: "3"
services:
whoogle-search:
image: benbusby/whoogle-search
container_name: whoogle-search
ports:
- 5000:5000
restart: unless-stopped


@ -4,15 +4,17 @@ cffi==1.13.2
Click==7.0
cryptography==2.8
Flask==1.1.1
Flask-Session==0.3.2
itsdangerous==1.1.0
Jinja2==2.10.3
lxml==4.5.1
MarkupSafe==1.1.1
Phyme==0.0.9
pycparser==2.19
pycurl==7.43.0.4
pyOpenSSL==19.1.0
pytest==5.4.1
python-dateutil==2.8.1
requests==2.23.0
six==1.14.0
soupsieve==1.9.5
Werkzeug==0.16.0
waitress==1.4.3

24
run Executable file

@ -0,0 +1,24 @@
#!/bin/bash
# Usage:
# ./run # Runs the full web app
# ./run test # Runs the testing suite
set -euo pipefail
SCRIPT_DIR="$(builtin cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
# Set directory to serve static content from
SUBDIR="${1:-app}"
export APP_ROOT="$SCRIPT_DIR/$SUBDIR"
export STATIC_FOLDER="$APP_ROOT/static"
mkdir -p "$STATIC_FOLDER"
# Check for regular vs test run
if [[ "$SUBDIR" == "test" ]]; then
pytest -sv
else
python3 -um app \
--host "${ADDRESS:-0.0.0.0}" \
--port "${PORT:-"${EXPOSE_PORT:-5000}"}"
fi
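
The nested default in the `run` script, `${PORT:-"${EXPOSE_PORT:-5000}"}`, sets a precedence order: `PORT`, then `EXPOSE_PORT`, then `5000`. The Python sketch below restates that order for readers less familiar with Bash parameter expansion. `resolve_port` is a hypothetical name, and nothing here is part of the project.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str]) -> str:
    """Hypothetical helper mirroring ${PORT:-"${EXPOSE_PORT:-5000}"}.

    Bash's ':-' falls through when a variable is unset OR empty, which is
    what chaining with 'or' reproduces here.
    """
    return env.get('PORT') or env.get('EXPOSE_PORT') or '5000'


print(resolve_port(os.environ))
```

An empty `PORT` falls through to `EXPOSE_PORT`, the same as an unset one.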


@ -8,11 +8,10 @@ setuptools.setup(
author='Ben Busby',
author_email='benbusby@protonmail.com',
name='whoogle-search',
version='0.1.0',
scripts=['whoogle-search'],
version='0.2.0',
include_package_data=True,
install_requires=requirements,
description='Self-hosted, ad-free, privacy-respecting alternative to Google search',
description='Self-hosted, ad-free, privacy-respecting Google metasearch engine',
long_description=long_description,
long_description_content_type='text/markdown',
url='https://github.com/benbusby/whoogle-search',


@ -1,8 +1,13 @@
from app import app
from app.utils.misc import generate_user_keys
import pytest
@pytest.fixture
def client():
client = app.test_client()
yield client
with app.test_client() as client:
with client.session_transaction() as session:
session['uuid'] = 'test'
session['fernet_keys'] = generate_user_keys()
session['config'] = {}
yield client

12
test/test_autocomplete.py Normal file

@ -0,0 +1,12 @@
def test_autocomplete_get(client):
rv = client.get('/autocomplete?q=green+eggs+and')
assert rv._status_code == 200
assert len(rv.data) >= 1
assert b'green eggs and ham' in rv.data
def test_autocomplete_post(client):
rv = client.post('/autocomplete', data=dict(q='the+cat+in+the'))
assert rv._status_code == 200
assert len(rv.data) >= 1
assert b'the cat in the hat' in rv.data

33
test/test_misc.py Normal file

@ -0,0 +1,33 @@
from app.utils.misc import generate_user_keys, valid_user_session
def test_generate_user_keys():
keys = generate_user_keys()
assert 'text_key' in keys
assert 'element_key' in keys
assert keys['text_key'] != keys['element_key']
def test_valid_session(client):
assert not valid_user_session({'fernet_keys': '', 'config': {}})
with client.session_transaction() as session:
assert valid_user_session(session)
def test_request_key_generation(client):
rv = client.get('/')
cookie = rv.headers['Set-Cookie']
rv = client.get('/search?q=test+1', headers={'Cookie': cookie})
assert rv._status_code == 200
with client.session_transaction() as session:
assert valid_user_session(session)
text_key = session['fernet_keys']['text_key']
rv = client.get('/search?q=test+2', headers={'Cookie': cookie})
assert rv._status_code == 200
with client.session_transaction() as session:
assert valid_user_session(session)
assert text_key != session['fernet_keys']['text_key']


@ -1,13 +1,13 @@
from bs4 import BeautifulSoup
from cryptography.fernet import Fernet
from app.filter import Filter
from app.utils.misc import generate_user_keys
from datetime import datetime
from dateutil.parser import *
def get_search_results(data):
secret_key = Fernet.generate_key()
soup = Filter(secret_key=secret_key).clean(BeautifulSoup(data, 'html.parser'))
secret_key = generate_user_keys()
soup = Filter(user_keys=secret_key).clean(BeautifulSoup(data, 'html.parser'))
main_divs = soup.find('div', {'id': 'main'})
assert len(main_divs) > 1
@ -62,6 +62,6 @@ def test_recent_results(client):
try:
date = parse(date_span)
assert (current_date - date).days <= num_days
assert (current_date - date).days <= (num_days + 5) # Date can have a little bit of wiggle room
except ParserError:
assert ' ago' in date_span
pass


@ -1,10 +1,13 @@
from app.models.config import Config
import json
import random
demo_config = {
'near': random.choice(['Seattle', 'New York', 'San Francisco']),
'dark_mode': str(random.getrandbits(1)),
'nojs': str(random.getrandbits(1))
'nojs': str(random.getrandbits(1)),
'lang': random.choice(Config.LANGUAGES)['value'],
'ctry': random.choice(Config.COUNTRIES)['value']
}
@ -18,6 +21,11 @@ def test_search(client):
assert rv._status_code == 200
def test_feeling_lucky(client):
rv = client.get('/search?q=!%20test')
assert rv._status_code == 303
def test_config(client):
rv = client.post('/config', data=demo_config)
assert rv._status_code == 302


@ -1,27 +0,0 @@
#!/bin/bash
# Usage:
# ./whoogle-search # Runs the full web app
# ./whoogle-search test # Runs the testing suite
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd -P)"
# Set default port if unavailable
if [[ -z "${PORT}" ]]; then
PORT=5000
fi
# Set directory to serve static content from
[[ ! -z $1 ]] && SUBDIR="$1" || SUBDIR="app"
export APP_ROOT=$SCRIPT_DIR/$SUBDIR
export STATIC_FOLDER=$APP_ROOT/static
mkdir -p $STATIC_FOLDER
pkill flask
# Check for regular vs test run
if [[ $SUBDIR == "test" ]]; then
pytest -sv
else
flask run --host="0.0.0.0" --port=$PORT
fi