Commit Graph

50 Commits

Author SHA1 Message Date
João
219fc58401
Fix handling of bangs (#851)
Changed the implementation to work if the bang is at anyplace in the query.

Added a check to not spend time looking for an operator if a "!" is not present
in the query.

No longer allowed to have the bang at the "!" char at the end, since this may
cause some conflicts like the issue cited before, where the ! is after a word
in the query, which is natural in most languages.
2022-09-30 14:39:13 -06:00
João
74503d542e
Encode config params in URL (#842)
Adds support for encoding (and optionally encrypting) user config values as
a single string that can be passed to any endpoint with the "preferences" url
param.

Co-authored-by: Ben Busby <contact@benbusby.com>
2022-09-22 14:14:56 -06:00
Joao A. Candido Ramos
d05ec08abf
Remove wildcard imports (#791) 2022-06-24 10:51:15 -06:00
Ben Busby
ef98d85dc5
Ensure searches with a leading slash are treated as queries
A user reported a bug where searches with a leading slash (in this case:
"/e/OS apps" were interpreted as a Google specific link when clicking
the next page of results.

This was due to the behavior that Google's search results exhibit, where
internal links for pages like support.google.com are delivered with
params like "?q=/support" rather than a direct link. This fixes that
scenario by checking the "q" param value against the user's original
query to ensure they don't match before assuming that the result is
intended as a redirect.

Fixes #776
2022-06-03 14:03:57 -06:00
Ben Busby
b2c524bc3e
Update test for bang searches without a query
The new behavior for bang searches is to redirect to the proper result
site, rather than redirecting to the Whoogle home page.
2022-04-20 14:58:39 -06:00
Ben Busby
0048c2f9aa
Update remaining alternative frontends to use Farside
Wikipedia, imgur, and translate alternatives were all still using
hardcoded URLs when replaced with their respective alternative frontend.
This updates them to use farside instead.
2022-03-21 10:08:52 -06:00
Ben Busby
69f845a047
Add test for empty bang behavior
Also fix pep8 issue
2022-03-01 12:13:40 -07:00
Ben Busby
d33e8241dc
Fix "my ip" search regression
Removes dependency on class names for creating the "my ip" info card in
the results list for searches pertaining to the user's public IP.

Adds test to prevent this from happening again.

Note to anyone reading this and looking to contribute: please avoid
using hardcoded class names at all costs. This approach of
creating/removing content just results in issues if/when Google decides
to introduce/remove class names from the result page.

Fixes #657
2022-02-14 11:40:11 -07:00
Ben Busby
119437a07c
Fix test for blocking site from results
Previously the logic for testing site blocking was essentially "assert
blocked_site not part of result_site". This caused test failures, since
site blocking does not extend to subdomains for the blocked site. The
reversed logic makes more sense with what the test was trying to
accomplish.
2021-12-19 11:22:47 -07:00
Ben Busby
634d179568
Use farside.link for frontend alternatives in results (#560)
* Integrate Farside into Whoogle

When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.

For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.

* Expand conversion of config<->url params

Config settings can now be translated to and from URL params using a
predetermined set of "safe" keys (i.e. config settings that easily
translate to URL params).

* Allow jumping instances via Farside when ratelimited

When instances are ratelimited (when a captcha is returned instead of
the user's search results) the user can now hop to a new instance via
Farside, a new backend service that redirects users to working instances
of a particular frontend. In this case, it presents a user with a
Farside link to a new Whoogle (or Searx) instance instead, so that the
user can resume their search.

For the generated Farside->Whoogle link, the generated link includes the
user's current Whoogle configuration settings as URL params, to ensure a
more seamless transition between instances. This doesn't translate to
the Farside->Searx link, but potentially could with some changes.

Closes #554

Closes #559
2021-12-08 17:27:33 -07:00
Ben Busby
10a15e06e1
Fix incorrect request type for image searches
Previously had hardcoded POST requests for all requests that didn't use
the header template (which currently is only the image tab).

Also refactored how the Filter class works. It now requires a valid
Config model to be provided, which is then set up as a class var that
the filtering functions can use as needed, rather than setting specific
values from the config as individual values (which was confusing and
sloppy).

Fixes #561
2021-12-06 21:39:50 -07:00
Ben Busby
e06ff85579
Improve public instance session management (#480)
This introduces a new approach to handling user sessions, which should
allow for users to set more reliable config settings on public instances.

Previously, when a user with cookies disabled would update their config,
this would modify the app's default config file, which would in turn
cause new users to inherit these settings when visiting the app for the
first time and cause users to inherit these settings when their current
session cookie expired (which was after 30 days by default I believe).
There was also some half-baked logic for determining on the backend
whether or not a user had cookies disabled, which lead to some issues
with out of control session file creation by Flask.

Now, when a user visits the site, their initial request is forwarded to
a session/<session id> endpoint, and during that subsequent request
their current session id is matched against the one found in the url. If
the ids match, the user has cookies enabled. If not, their original
request is modified with a 'cookies_disabled' query param that tells
Flask not to bother trying to set up a new session for that user, and
instead just use the app's fallback Fernet key for encryption and the
default config.

Since attempting to create a session for a user with cookies disabled
creates a new session file, there is now also a clean-up routine included
in the new session decorator, which will remove all sessions that don't
include a valid key in the dict. NOTE!!! This means that current user
sessions on public instances will be cleared once this update is merged
in. In the long run that's a good thing though, since this will allow session
mgmt to be a lot more reliable overall for users regardless of their cookie
preference.

Individual user sessions still use a unique Fernet key for encrypting queries,
but users with cookies disabled will use the default app key for encryption
and decryption.

Sessions are also now (semi)permanent and have a lifetime of 1 year.
2021-11-17 19:35:30 -07:00
Vansh Comar
f04c7c5557
Support DDG style bangs with bang at the end (#503)
DDG style bang searches can now have the bang (!) at the end of
the search (i.e. "bologna w!" will now redirect to wikipedia just like
"bologna !w" would)
2021-10-28 12:39:33 -06:00
Ben Busby
6763c2e99d
Remove test for deprecated feature
Setting config using the URL is a feature that is being deprecated in
the next release, so the test for confirming its functionality has been
removed.
2021-10-26 15:04:21 -06:00
BlissOWL
f12b0e62c5
Make bang searches case insensitive (#438)
Bang searches now ignore the capitalization of the operator

Co-authored-by: Ben Busby <noreply+git@benbusby.com>
2021-09-27 19:39:58 -06:00
Ben Busby
bcb1d8ecc9
Add lingva translation support in search (#360)
* Add support for Lingva translations in results

Searches that contain the word "translate" and are normal search queries
(i.e. not news/images/video/etc) now create an iframe to a Lingva url to
translate the user's search using their configured search language.

The Lingva url can be configured using the WHOOGLE_ALT_TL env var, or
will fall back to the official Lingva instance url (lingva.ml).

For more info, visit https://github.com/TheDavidDelta/lingva-translate

* Add basic test for lingva results

* Allow user specified lingva instances through csp frame-src

* Fix pep8 issue
2021-06-15 10:14:42 -04:00
Ben Busby
4649d96dda
Support basic localization (#325)
* Replace hardcoded strings using translation json file

This introduces a new "translations.json" file under app/static/settings
that is loaded on app init and uses the user config value for interface
language to determine the appropriate strings to use in Whoogle-specific
elements of the UI (primarily only on the home page).

* Verify interface lang can be used for localization

Check the configured interface language against the available
localization dict before attempting to use, otherwise fall back to
english.

Also expanded language names in the languages json file.

* Add test for validating translation language keys

Also adds Spanish translation to json (the only non-English language I
can add and reasonably validate on my own).

* Validate all translations against original keyset, update readme

Readme has been updated to include basic contributing guidelines for
both code and translations.
2021-05-24 17:03:02 -04:00
Ben Busby
c8da53d4b0
Block websites from search results via user config (#304)
* Block websites in search results via user config

Adds a new config field "Block" to specify a comma separated list of
websites to block in search results. This is applied for all searches.

* Add test for blocking sites from search results

* Document WHOOGLE_CONFIG_BLOCK usage

* Strip '-site:' filters from query in header template

The 'behind the scenes' site filter applied for blocked sites was
appearing in the query field when navigating between search categories
(all -> images -> news, etc). This prevents the filter from appearing in
all except "images", since the image category uses a separate header.
This should eventually be addressed when the image page can begin using
the standard whoogle header, but until then, the filter will still
appear for image searches.
2021-05-07 11:45:53 -04:00
Angel Mario
d6d7110e22
Add option to disable changing config from client (#295)
* Add option to disable changing of configuration

Introduces a test to ensure the correct response code is found when
attempting to update the config when disabled, and ensure default config
is unchanged when posting a new config dict.

Attempting to update the config using the API when disabled now returns
a 403 code + redirect.

Co-authored-by: Ben Busby <benbusby@protonmail.com>
2021-04-27 10:36:03 -04:00
Ben Busby
baa7a87efb
Fix incorrect config bool env var casting
Config boolean environment variables need to be cast to ints, since
they are set or unset using 0 and 1. Previously they were interpreted as
(pseudocode) read_var(name, default=False), which meant that setting
CONFIG_VAR=0 would enable that variable since Python reads environment
variables as strings, and '0' is truthy. This updates the previous logic
to (still pseudocode) int(read_var(name, default='0')).

Fixes #279
2021-04-12 16:40:59 -04:00
Ben Busby
df0b7afa50 Switch to single Fernet key per session
This moves away from the previous (messy) approach of using two separate
keys for decrypting text and element URLs separately and regenerating
them for new searches. The current implementation of sessions is not very
reliable, which lead to keys being regenerated too soon, which would
break page navigation. Until that can be addressed, the single
key per session approach should work a lot better.

Fixes #250

Fixes #90
2021-04-05 11:00:56 -04:00
Ben Busby
62a9b9e949 Allow user-defined CSS/theming (#227)
* Add custom CSS field to config

This allows users to set/customize an instance's theme and appearance to
their liking. The config CSS field is prepopulated with all default CSS
variable values to allow quick editing.

Note that this can be somewhat of a "footgun" if someone updates the
CSS to hide all fields/search/etc. Should probably add some sort of
bandaid "admin" feature for public instances to employ until the whole
cookie/session issue is investigated further.

* Symlink all app static files to test dir

* Refactor app/misc/*.json -> app/static/settings/*.json

The country/language json files are used for user config settings, so
the "misc" name didn't really make sense. Also moved these to the static
folder to make testing easier.

* Fix light theme variables in dark theme css

* Minor style tweaking
2021-04-05 11:00:56 -04:00
Ben Busby
f8dfc78539 Improve naming of *_utils files, update fn/class doc
The app/utils/*_utils weren't named very well, and all have been updated
to have more accurate names.

Function and class documention for the utils have been updated as well,
as part of the effort to improve overall documentation for the project.
2021-04-05 11:00:56 -04:00
Ben Busby
0a6575d219
Hotfix: Move language/country json to app dir
Pip installs of whoogle search were missing access to the misc/ folder,
which previously contained the language and country json files. These
have been moved to app/misc, and the previous root level misc/ was
renamed to config/ (since it now only contains the tor config files).

Bump to 0.3.1.
2021-02-07 18:55:27 -05:00
Ben Busby
6e7ec9918a
Move language/country settings to app config
Moves the language and country dicts from the config model to json files
that are loaded during app init and stored in the app config dict. This
substantially improves the readability of the config model and allows
for much more sensible loading of the language/country options.
2020-12-17 16:42:05 -05:00
Ben Busby
375f4ee9fd
PEP-8: Fix formatting issues, add CI workflow (#161)
Enforces PEP-8 formatting for all python code

Adds a github action build for checking pep8 formatting using pycodestyle
2020-12-17 16:06:47 -05:00
Ben Busby
5b5c2588ed
Fix nojs lxml constructor
The BeautifulSoup constructur in gen_nojs needed to explicitly set
features='lxml' to silence a warning from the library.

Also temporarily disabled the site alts test since the results are too
unreliable. This should be moved to a unit test instead.
2020-12-11 19:21:32 -05:00
Ben Busby
6c429e6dd1
Allow setting site alts using environment vars (#155)
* Add ability to configure site alts w/ env vars

Site alternatives (i.e. twitter.com -> nitter.net) can now be configured
using environment variables:

WHOOGLE_ALT_TW='nitter.net' # twitter alt
WHOOGLE_ALT_YT='invidio.us' # youtube alt
WHOOGLE_ALT_IG='bibliogram.art/u' # instagram alt

Updated testing to confirm results have been modified.

* Add site alt vars to docker settings and readme
2020-12-05 17:01:21 -05:00
Ben Busby
72cbc342af Add ability to set temp config in search query
Dark mode, country, interface language, and search language configs
can now be set in the search query by appending each option as a
url parameter.

Supported args are: 'dark', 'lang_search', 'lang_interface', and 'ctry'

Ex: /search?q=%s&dark=1&lang_search=lang_en...

These config settings persist across page navigation and switching
result type, but will be reset if the main search bar is used.

See #144
2020-11-11 00:40:49 -05:00
Ben Busby
ae05e8ff8b Finished basic implementation of DDG bang feature
Initialization of the app now includes generation of a ddg-bang json
file, which is used for all bang style searches afterwards.

Also added search suggestion handling for bang json lookup. Queries
beginning with "!" now reference the bang json file to pull all keys
that match.

Updated test suite to include basic tests for bang functionality.

Updated gitignore to exclude bang subdir.
2020-10-10 15:55:14 -04:00
Ben Busby
975ece8cd0
Privacy respecting alternatives in results view (#106)
Full implementation of social media alt redirects (twitter/youtube/instagram -> nitter/invidious/bibliogram) depending on configuration.

Verbatim search and option to ignore search autocorrect are now supported as well.

Also cleaned up the javascript side of whoogle config so that it now
uses arrays of available fields for parsing config values instead of manually assigning each
one to a variable.

This doesn't include support for Google Maps -> Open Street Maps, that
seems a bit more involved than the social media redirects were, so it
should likely be a separate effort.
2020-07-26 11:53:59 -06:00
Ben Busby
4577c11d4c Merge branch 'develop' of github.com:benbusby/whoogle-search into develop 2020-06-28 10:54:23 -06:00
Ben Busby
6ef7ab663a Small update to results time period test
Updated to ensure a child span element is available before running a
test to verify the correct time range for the result. Need to come up
with a better way of ensuring uniform results across multiple tests,
since otherwise periodic changes in the returned results can cause tests
to fail.
2020-06-28 10:52:53 -06:00
Joao A. Candido Ramos
bf4bf1ff2c
Split interface and results language config (#89)
Adding support to choose separately the language of search and the one for the interface (allowing a default givent by google).

Co-authored-by: Joao <ramos.joao@protonmail.com>
2020-06-27 14:23:17 -06:00
Ben Busby
6ec65f8754 Reworked pytest client fixture to support new session mgmt 2020-06-05 16:09:04 -06:00
Ben Busby
32e837a5e0 Refactored whoogle session mgmt
Now allows a fallback "default" session to be used if a user's browser
is blocking cookies
2020-06-05 15:24:44 -06:00
Ben Busby
b6fb4723f9
Project refactor (#85)
* Major refactor of requests and session management

- Switches from pycurl to requests library
  - Allows for less janky decoding, especially with non-latin character
  sets
- Adds session level management of user configs
  - Allows for each session to set its own config (people are probably
  going to complain about this, though not sure if it'll be the same
  number of people who are upset that their friends/family have to share
  their config)
- Updates key gen/regen to more aggressively swap out keys after each
request

* Added ability to save/load configs by name

- New PUT method for config allows changing config with specified name
- New methods in js controller to handle loading/saving of configs

* Result formatting and removal of unused elements

- Fixed question section formatting from results page (added appropriate
padding and made questions styled as italic)
- Removed user agent display from main config settings

* Minor change to button label

* Fixed issue with "de-pickling" of flask session

Having a gitignore-everything ("*") file within a flask session folder seems to cause a
weird bug where the state of the app becomes unusable from continuously
trying to prune files listed in the gitignore (and it can't prune '*').

* Switched to pickling saved configs

* Updated ad/sponsored content filter and conf naming

Configs are now named with a .conf extension to allow for easier manual
cleanup/modification of named config files

Sponsored content now removed by basic string matching of span content

* Version bump to 0.2.0

* Fixed request.send return style
2020-06-02 12:54:47 -06:00
Ben Busby
21012f5265
Feature: autocomplete/search suggestions (#72)
Basic autocomplete/search suggestion functionality added

* Adds new GET and POST routes for '/autocomplete' that accept a string query and returns an array of suggestions

* Adds new autoscript.js file for handling queries on the main page and results view

* Updated requests class to include autocomplete method

* Updated opensearch template to handle search suggestions

* Added header template to allow for autocomplete on results view

* Updated readme to mention autocomplete feature
2020-05-24 14:03:11 -06:00
Ben Busby
09c53b52af
Feature: country and safe search config options (#71)
* Added country and safe search config options

* Updated handling of parser error in results test

* Improved handling of default country

* Added 1px empty gif fallback as a replacement for images that fail to load
2020-05-23 14:27:23 -06:00
Ben Busby
b15368ac28 Updated recent results test w/ +5 day tolerance 2020-05-20 11:07:01 -06:00
Paul Rothrock
0e39b8f97b
Added "I'm feeling lucky" function (#46)
* Putting '! ' at the beginning of the query now redirects to the first search result

Signed-off-by: Paul Rothrock <paul@movetoiceland.com>

* Moved get_first_url outside of filter class

Signed-off-by: Paul Rothrock <paul@movetoiceland.com>
2020-05-18 10:28:23 -06:00
Ben Busby
708769f682 Minor styling refactor, updated app name 2020-05-04 18:00:43 -06:00
Ben Busby
0a3da5cea4 Updated js controller and config api route
Controller was refactored to be a bit less monolithic.

Config route was updated to accept an html form data POST rather than
just a json object.
2020-04-28 20:50:12 -06:00
Ben Busby
1cbe394e6f Updated tests, fixed a few bugs
Added opensearch routes test and individual tests for searching via GET
and POST separately.

Fixed incorrect assignment in gen_query.
2020-04-28 18:59:33 -06:00
Ben Busby
dd077954bf Fixed search results test
For datetime spans in time-filtered search results, anything less than 7
characters or more than 15 can be guaranteed to not be properly
formatted dates (either "mm dd yyyy" or "xx days/months/weeks ago")
2020-04-26 18:11:02 -06:00
Ben Busby
9c7b4c1444 Fixed bad test assertion
Was previously checking for non-inclusive max number of days (i.e.
filtering by past month would return a failed test if the result was
from exactly 31 days ago)
2020-04-24 18:10:57 -06:00
Ben Busby
024552f2df Minor refactor of filter class, updated tests, fixed html/css, added ua to config 2020-04-16 10:01:02 -06:00
Ben Busby
0d17cc8ef6 Modified result length test 2020-04-15 17:54:38 -06:00
Ben Busby
ab59682cad Small var fix in testing search results 2020-04-15 17:48:49 -06:00
Ben Busby
b5b6e64177 Added testing and ci build, refactored filter class, refactored project structure 2020-04-15 17:41:53 -06:00