CleanLinks Pulling Result URL from 'referrer=' Value #147

Open
ghost opened this issue Sep 3, 2016 · 2 comments

Comments

ghost commented Sep 3, 2016

CleanLinks changes

https://login.xmarks.com/?referrer=https%3A%2F%2Flastpass.com%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.com&append=1

to

https://lastpass.com/features_joinpremiumxmarks2.php?lpuser=lastpass%40mailinator.com

I would like CleanLinks to change the URL to

https://login.xmarks.com/

Though I would settle for

https://login.xmarks.com/?append=1
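For reference, the result the reporter is asking for can be reproduced with Python's standard urllib.parse. This is a minimal sketch, not CleanLinks code. It assumes a removal pattern is matched against the whole parameter name (the hypothetical REMOVE pattern below reuses the default (?:ref|aff)\w*), and it re-encodes the kept parameters with urlencode, which may change how their values are escaped.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical removal pattern; a parameter is dropped if its *name* fully matches.
REMOVE = re.compile(r"(?:ref|aff)\w*")

def strip_params(url: str, pattern: re.Pattern = REMOVE) -> str:
    """Drop query parameters whose name fully matches `pattern`."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not pattern.fullmatch(k)]
    # Rebuild the URL with only the surviving parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))

url = ("https://login.xmarks.com/?referrer=https%3A%2F%2Flastpass.com"
       "%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.com"
       "&append=1")
print(strip_params(url))  # https://login.xmarks.com/?append=1
```

If append were removed as well, the query would be empty and urlunsplit would omit the "?", giving the reporter's first preference, https://login.xmarks.com/.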

For the patterns in the Remove From Links field, I think you use '=' as an implied terminator. So

(?:ref|aff)\w*

should match

referrer=
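The reporter's reading of the default pattern can be checked with Python's re module. The sketch assumes, as the reporter guesses, that a pattern must start a parameter name (right after "?" or "&") and be followed by "="; how CleanLinks actually anchors its patterns is not confirmed in this thread.

```python
import re

# Default-style pattern from the Remove From Links field.
user_pattern = r"(?:ref|aff)\w*"
# Assumption: the pattern starts a parameter name and "=" is the implied terminator.
param_re = re.compile(r"[?&](" + user_pattern + r")=")

query = "?referrer=https%3A%2F%2Flastpass.com&append=1"
m = param_re.search(query)
print(m.group(1) if m else None)            # referrer
print(bool(param_re.search("?append=1")))   # False
```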

But I tried to provide a more specific pattern for

referrer=

To the end of the Remove From Links defaults, I added

|(?:ref(?:er(?:r?er)?)?)\w*

I added my pattern to the end so I would not change any of the patterns you provide as defaults. I included 'ref' in my pattern so that my pattern could stand on its own and still match 'ref'.
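A quick check with Python's re (using fullmatch against the parameter name, which is an assumption about how the field is applied) shows that the custom pattern accepts every spelling it was written for. It also shows that, because of the trailing \w*, it accepts exactly the same names as the simpler ref\w*, so on its own it cannot behave any differently from the default ref branch.

```python
import re

custom = re.compile(r"(?:ref(?:er(?:r?er)?)?)\w*")
simple = re.compile(r"ref\w*")

names = ["ref", "refer", "referer", "referrer", "reference", "append"]
for name in names:
    # Compare the two patterns name by name.
    print(f"{name:10} custom={bool(custom.fullmatch(name))} "
          f"simple={bool(simple.fullmatch(name))}")
```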

With my pattern at the end, CleanLinks still changed the URL to

https://lastpass.com/features_joinpremiumxmarks2.php?lpuser=lastpass%40mailinator.com

I then removed the other patterns, leaving my pattern as the only one in Remove From Links. CleanLinks still changed the URL to

https://lastpass.com/features_joinpremiumxmarks2.php?lpuser=lastpass%40mailinator.com

I then inserted

referrer

as the only pattern in Remove From Links. No Regular Expression special characters to clutter up the works. CleanLinks still changed the URL to

https://lastpass.com/features_joinpremiumxmarks2.php?lpuser=lastpass%40mailinator.com

What is going on?

geokis commented Sep 4, 2016

CL has an algorithm to detect nested and encoded links.
In your case, CL detects the URL-encoded URL:

https%3A%2F%2Flastpass.com%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.com

and drops the rest.

The algorithm has a higher priority than the user patterns.
The only part you can clean is lpuser=lastpass%40mailinator.com, by adding lpuser as a pattern.
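The behaviour described above can be illustrated with a small Python sketch. This is an assumption about how such a detector might work, not CleanLinks' actual algorithm: it decodes each query value once and returns the first one that parses as an absolute http(s) URL. For the URL in the report, one round of decoding yields exactly the result the reporter saw, with %2540 becoming %40.

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qsl

def find_nested_url(url: str) -> Optional[str]:
    """Hypothetical detector: first query value that is itself an http(s) URL."""
    # parse_qsl percent-decodes each name and value exactly once.
    for _name, value in parse_qsl(urlsplit(url).query):
        inner = urlsplit(value)
        if inner.scheme in ("http", "https") and inner.netloc:
            return value
    return None

url = ("https://login.xmarks.com/?referrer=https%3A%2F%2Flastpass.com"
       "%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.com"
       "&append=1")
print(find_nested_url(url))
# https://lastpass.com/features_joinpremiumxmarks2.php?lpuser=lastpass%40mailinator.com
```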

ghost (issue author) commented Sep 11, 2016

"CL has an algorithm to detect nested and encoded links."

No kidding. And for URLs like

https://login.xmarks.com/?referrer=https%3A%2%2Flastpass.com%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.om&append=1

the algorithm is wrong.
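The thread does not say what CleanLinks produced for this second URL, but a short Python check shows why it is a poor candidate for extraction: %2%2F is not a valid escape sequence, so after one round of decoding the "//" that should introduce the host is gone. The validation rule below (scheme http or https plus a non-empty host) is an assumed rule a nested-link detector could apply before extracting; it is not taken from CleanLinks.

```python
from urllib.parse import urlsplit, parse_qsl

bad = ("https://login.xmarks.com/?referrer=https%3A%2%2Flastpass.com"
       "%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.om"
       "&append=1")

# One round of percent-decoding, as a query parser does.
candidate = dict(parse_qsl(urlsplit(bad).query))["referrer"]
# The invalid "%2" survives literally, so there is no "//host" part.
inner = urlsplit(candidate)
print(candidate)
# https:%2/lastpass.com/features_joinpremiumxmarks2.php?lpuser=lastpass%40mailinator.om
print(repr(inner.scheme), repr(inner.netloc))  # 'https' ''
# Assumed validation rule: a real nested link needs http(s) and a host.
is_valid_link = inner.scheme in ("http", "https") and bool(inner.netloc)
print(is_valid_link)  # False
```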

"The algorithm has a higher priority than the user patterns."

That is a separate problem, and a design problem at that. It is not part of the incorrect extraction of the target URL by the internal algorithm.
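The priority question can be made concrete with a Python sketch of the opposite ordering: user removal patterns first, nested-link detection second. This is a hypothetical design for illustration, not how CleanLinks behaves; the USER_PATTERN value and the clean function are assumptions, and the pattern is matched against whole parameter names.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical "Remove From Links" entry, matched against whole parameter names.
USER_PATTERN = re.compile(r"referrer")

def clean(url: str, user_pattern: re.Pattern = USER_PATTERN) -> str:
    """Apply user removal patterns first, nested-link detection second."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Step 1: drop every parameter the user asked to remove.
    params = [(k, v) for k, v in params if not user_pattern.fullmatch(k)]
    # Step 2: look for a nested http(s) link only among the surviving values.
    for _k, v in params:
        inner = urlsplit(v)
        if inner.scheme in ("http", "https") and inner.netloc:
            return v
    # No nested link left: rebuild the outer URL without the removed parameters.
    return urlunsplit(parts._replace(query=urlencode(params)))

url = ("https://login.xmarks.com/?referrer=https%3A%2F%2Flastpass.com"
       "%2Ffeatures_joinpremiumxmarks2.php%3Flpuser%3Dlastpass%2540mailinator.com"
       "&append=1")
print(clean(url))  # https://login.xmarks.com/?append=1
```

With a pattern that matches nothing, such as re.compile(r"(?!)"), step 1 removes nothing and step 2 returns the nested lastpass.com URL, which is the behaviour reported in this issue.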

Leave the filter priority problem for another Issue. Fix the internal algorithm.
