Merge pull request #53 from dotnettools/blackwidow
Blackwidow
javidsho authored Nov 19, 2021
2 parents 7c4a333 + 8a133ab commit d46c2c4
Showing 145 changed files with 5,648 additions and 184 deletions.
35 changes: 27 additions & 8 deletions README.md
@@ -6,9 +6,10 @@
[![NuGet download count](https://img.shields.io/nuget/dt/SharpGrabber)](https://www.nuget.org/packages/SharpGrabber)

This repository contains multiple related projects:
- `SharpGrabber` is a *.NET Standard* library for scraping top media providers and grabbing high quality video, audio and information.
- `SharpGrabber.Converter` is a *.NET Standard* library based on `ffmpeg` shared libraries to join audio and video streams. This is particularly useful when grabbing high quality *YouTube* media that might be separated into audio and video files. It is also used for merging HLS stream segments.
- `SharpGrabber.Desktop` A cross-platform desktop application which utilizes both mentioned libraries to expose their functionality to desktop end-users.
- <a href="#how-to-use">`SharpGrabber`</a> is a *.NET Standard* library for scraping top media providers and grabbing high quality video, audio and information.
- <a href="#how-to-use">`SharpGrabber.Converter`</a> is a *.NET Standard* library based on `ffmpeg` shared libraries to join audio and video streams. This is particularly useful when grabbing high quality *YouTube* media that might be separated into audio and video files. It is also used for merging HLS stream segments.
- <a href="#introducing-blackwidow">`SharpGrabber.BlackWidow`</a> is a *.NET Standard* library for grabbing with JavaScript, which has many advantages over using scattered NuGet packages.
- <a href="#sharpgrabberdesktop">`SharpGrabber.Desktop`</a> is a cross-platform desktop application that uses all three libraries mentioned above to expose their functionality to desktop end users.

# How to Use
**⭐ Please give a star if you find this project useful!**
@@ -24,7 +25,7 @@ This repository contains multiple related projects:
The `SharpGrabber` package defines abstractions only. The actual grabbers have their own packages and should be installed separately.

### <a href="https://www.nuget.org/packages/SharpGrabber/">SharpGrabber</a> - Core Package
Install-Package SharpGrabber -Version 2.0.2
Install-Package SharpGrabber -Version 2.1

### <a href="https://www.nuget.org/packages/SharpGrabber.Converter/">SharpGrabber.Converter</a>
It's an optional package to work with media files. Using this package, you can easily concatenate video segments, or mux audio and video channels.
@@ -95,9 +96,10 @@ The good news is no functionality has been removed, so with a minor refactoring,
I strongly recommend that you upgrade, v2 has a much cleaner structure and code.

</details>

## SharpGrabber.Desktop 3.3
- It uses every package mentioned above and supports all of the mentioned providers!
## SharpGrabber.Desktop
### Version 3.3
- Grabs from every source supported by official grabbers.
- Displays information and downloads video, audio, images, etc.
- Merges YouTube separated audio and video streams into complete media files. It can join HLS segments as well!

@@ -111,12 +113,29 @@ Requirements of the cross-platform desktop application to run and operate correc

<img src="./assets/SharpGrabberDesktop-ScreenShot-3.3.png" alt="SharpGrabber.Desktop Application" />

# Introducing BlackWidow
<img src="./assets/blackwidow-logo-text-sm.png" alt="SharpGrabber" height="92" />

BlackWidow executes scripts written specifically for grabbing, rather than relying on .NET assemblies.
- **Always Up-to-date:** The scripts are kept up-to-date at runtime, so the functionality of the host application won't break as the sources change - at least not for long!
- **ECMAScript Support:** Supports JavaScript/ECMAScript out of the box.
- **Easy Maintenance:** *JavaScript* is darn easy to write and understand! This helps contributors quickly write new grabbers or fix existing ones.
- **Secure:** The scripts are executed in a sandbox environment, and they only have access to what the BlackWidow API exposes to them.
- **Highly Customizable:** Almost everything is open for extension or replacement. Make new script interpreters, custom grabber repositories, or roll out your own interpreter APIs.

<a href="blackwidow">Read more + Documentation</a>

## Contribution
You are most welcome to contribute!
- Support for more media providers such as *DailyMotion*, *Instagram*, *Facebook*, *Twitch* etc.
- Authentication mechanisms for grabbers e.g. Instagram Login
- Support for more media providers such as *DailyMotion*, *Facebook*, *Twitch* etc.
- Accelerate downloads in the desktop app (like a download manager)

## Disclaimer
The SharpGrabber library, BlackWidow, and the other projects and libraries provided in this repository are developed for educational purposes.
Since it's illegal to extract copyrighted data, you should make sure your usage of the tools provided here complies with copyright laws.
Contributors to these tools are not responsible for any copyright infringement that may result from their use.

## License
Copyright &copy; 2021 Javid Shoaei and other contributors<br />

17 changes: 17 additions & 0 deletions SharpGrabber.sln
@@ -24,6 +24,12 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SharpGrabber.Hls", "src\Sha
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SharpGrabber.Instagram", "src\SharpGrabber.Instagram\SharpGrabber.Instagram.csproj", "{094B729B-9871-4A2C-9228-9AAEE66F135D}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SharpGrabber.BlackWidow", "src\SharpGrabber.BlackWidow\SharpGrabber.BlackWidow.csproj", "{9F3A8C86-8F28-4F54-B8A6-DBB49DDB5171}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SharpGrabber.BlackWidow.Tests", "tests\SharpGrabber.BlackWidow.Tests\SharpGrabber.BlackWidow.Tests.csproj", "{4CB41014-D036-4090-B6FA-4CFB01D82C3A}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Tests", "Tests", "{ADFEEE61-D79B-4F91-A192-F6A2E949673C}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -62,10 +68,21 @@ Global
{094B729B-9871-4A2C-9228-9AAEE66F135D}.Debug|Any CPU.Build.0 = Debug|Any CPU
{094B729B-9871-4A2C-9228-9AAEE66F135D}.Release|Any CPU.ActiveCfg = Release|Any CPU
{094B729B-9871-4A2C-9228-9AAEE66F135D}.Release|Any CPU.Build.0 = Release|Any CPU
{9F3A8C86-8F28-4F54-B8A6-DBB49DDB5171}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{9F3A8C86-8F28-4F54-B8A6-DBB49DDB5171}.Debug|Any CPU.Build.0 = Debug|Any CPU
{9F3A8C86-8F28-4F54-B8A6-DBB49DDB5171}.Release|Any CPU.ActiveCfg = Release|Any CPU
{9F3A8C86-8F28-4F54-B8A6-DBB49DDB5171}.Release|Any CPU.Build.0 = Release|Any CPU
{4CB41014-D036-4090-B6FA-4CFB01D82C3A}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{4CB41014-D036-4090-B6FA-4CFB01D82C3A}.Debug|Any CPU.Build.0 = Debug|Any CPU
{4CB41014-D036-4090-B6FA-4CFB01D82C3A}.Release|Any CPU.ActiveCfg = Release|Any CPU
{4CB41014-D036-4090-B6FA-4CFB01D82C3A}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{4CB41014-D036-4090-B6FA-4CFB01D82C3A} = {ADFEEE61-D79B-4F91-A192-F6A2E949673C}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {0003E70E-C9A2-459C-A6A0-540449AC7A87}
EndGlobalSection
Binary file added assets/blackwidow-logo-text-sm.png
Binary file added assets/blackwidow-logo-text.png
Binary file added assets/blackwidow-logo-text.psd
Binary file not shown.
Binary file added assets/blackwidow-logo.png
Binary file added assets/blackwidow-logo.psd
Binary file not shown.
27 changes: 27 additions & 0 deletions blackwidow/README.md
@@ -0,0 +1,27 @@
<img src="../assets/blackwidow-logo-text.png" alt="SharpGrabber" height="128" />

# BlackWidow

BlackWidow is a .NET library based on SharpGrabber. Rather than relying on .NET assemblies, BlackWidow executes scripts written specifically for grabbing.

## Why use BlackWidow?
BlackWidow gives you the following advantages over the traditional NuGet package approach:

- **Always Up-to-date:** The scripts are kept up-to-date at runtime, so the functionality of the host application won't break as the sources change - at least not for long!
- **ECMAScript Support:** Supports JavaScript/ECMAScript out of the box.
- **Easy Maintenance:** *JavaScript* is darn easy to write and understand! This helps contributors quickly write new grabbers or fix existing ones.
- **Secure:** The scripts are executed in a sandbox environment, and they only have access to what the BlackWidow API exposes to them.
- **Highly Customizable:** Almost everything is open for extension or replacement. Make new script interpreters, custom grabber repositories, or roll out your own interpreter APIs.

## How does it work?

BlackWidow keeps a local collection of scripts, called the local repository.
Each script is interpreted as an object implementing `IGrabber`.
To keep the scripts up-to-date, a remote repository is constantly monitored as the single source of truth.
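
As an illustration of the update step, the check for stale scripts could look roughly like the sketch below. It is not BlackWidow's actual synchronization code, which this README does not show: `compareVersions` and `findScriptsToUpdate` are hypothetical names, the descriptors only borrow the `id` and `version` fields from `feed.json`, `example.com` is a made-up entry, and versions are assumed to be dotted numbers such as `1.0`.

```javascript
// Hypothetical helper: compares dotted numeric version strings such as "1.0" and "1.10".
// Returns 1 if a is newer, -1 if b is newer, 0 if they are equal.
function compareVersions(a, b) {
    const pa = a.split('.').map(Number)
    const pb = b.split('.').map(Number)
    const len = Math.max(pa.length, pb.length)
    for (let i = 0; i < len; i++) {
        // Missing segments count as 0, so "1" and "1.0" compare equal.
        const diff = (pa[i] || 0) - (pb[i] || 0)
        if (diff !== 0)
            return Math.sign(diff)
    }
    return 0
}

// Hypothetical helper: returns the remote descriptors that are missing
// from the local repository or newer than the local copy.
function findScriptsToUpdate(localScripts, remoteScripts) {
    const localById = new Map(localScripts.map(s => [s.id, s]))
    return remoteScripts.filter(remote => {
        const local = localById.get(remote.id)
        return !local || compareVersions(remote.version, local.version) > 0
    })
}

// Made-up repository contents for illustration.
const local = [{ id: 'vimeo.com', version: '1.0' }]
const remote = [
    { id: 'vimeo.com', version: '1.1' },
    { id: 'example.com', version: '1.0' }
]

const pending = findScriptsToUpdate(local, remote)
console.log(pending.map(s => s.id))
```

Comparing versions segment by segment as numbers, rather than as strings, keeps `1.10` newer than `1.9`.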

*TODO:* <a href="https://github.com/dotnettools/SharpGrabber">Read the Documentation</a>

# Installation
*WIP*

<a href="https://github.com/dotnettools/SharpGrabber">&lt;- Back to Home Page</a>
22 changes: 22 additions & 0 deletions blackwidow/repo/feed.json
@@ -0,0 +1,22 @@
{
    "scripts": [
        {
            "id": "vimeo.com",
            "name": "Vimeo",
            "version": "1.0",
            "type": "JavaScript",
            "apiVersion": 1,
            "supportedRegularExpressions": [ "^https?://(www\\.|player\\.)?vimeo\\.com/(video/)?([0-9]+)" ],
            "file": "scripts/vimeo.js"
        },
        {
            "id": "pornhub.com",
            "name": "PornHub",
            "version": "1.0",
            "type": "JavaScript",
            "apiVersion": 1,
            "supportedRegularExpressions": [ "^(https?:\\/\\/)?(www\\.)?pornhub\\.com\\/([^\\/]+)viewkey=(\\w+).*$" ],
            "file": "scripts/pornhub.js"
        }
    ]
}
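
A host could use `supportedRegularExpressions` to pick the script responsible for a URL. The sketch below shows one way to do that lookup, under the assumption that the patterns are also valid JavaScript regular expressions. `findScriptForUrl` is a hypothetical helper, not part of the BlackWidow API, and the feed is trimmed to a single entry.

```javascript
// Trimmed copy of the feed above, keeping only the fields the lookup needs.
const feed = {
    scripts: [
        {
            id: 'vimeo.com',
            supportedRegularExpressions: ['^https?://(www\\.|player\\.)?vimeo\\.com/(video/)?([0-9]+)'],
            file: 'scripts/vimeo.js'
        }
    ]
}

// Hypothetical helper: returns the first script descriptor with a pattern
// that matches the URL, or undefined when no script claims it.
function findScriptForUrl(feed, url) {
    return feed.scripts.find(script =>
        script.supportedRegularExpressions.some(pattern => new RegExp(pattern).test(url)))
}

const hit = findScriptForUrl(feed, 'https://player.vimeo.com/video/76979871')
const miss = findScriptForUrl(feed, 'https://example.com/76979871')
console.log(hit ? hit.file : 'no script', miss ? miss.file : 'no script')
```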
115 changes: 115 additions & 0 deletions blackwidow/repo/scripts/pornhub.js
@@ -0,0 +1,115 @@
const urlMatcher = /^(https?:\/\/)?(www\.)?pornhub\.com\/([^\/]+)viewkey=(\w+).*$/i
const flashVarsFinder = /^\s*(var|let)\s+(flashvars[\w_]+)\s+=/mi

// Returns the view key captured from a supported URL, or undefined if the URL does not match.
const getViewId = uri => {
    const match = urlMatcher.exec(uri)
    if (!match)
        return undefined
    return match[4]
}

// Builds the canonical page URL for a view key.
const getStdUrl = viewId => {
    return `https://www.pornhub.com/view_video.php?viewkey=${viewId}`
}

// Finds the inline script that declares the flashvars variable and evaluates it to obtain the object.
const parseFlashVarsScript = doc => {
    let source
    let varName
    doc.selectAll('script').forEach(elem => {
        const match = flashVarsFinder.exec(elem.innerText)
        if (match) {
            source = elem.innerText
            varName = match[2]
        }
    })
    if (!source)
        throw new GrabException('Could not extract flashVars.')

    const flashVars = new Function('let playerObjList = {};' + source + ';return ' + varName + ';')()
    if (!flashVars)
        throw new GrabException('Could not extract flashVars.')
    return flashVars
}

const updateResult = (result, vars) => {
    const parseBool = str => typeof str === 'boolean' ? str : new Function('return ' + str)();

    if (parseBool(vars.video_unavailable))
        throw new GrabException('This video is unavailable.')
    if (parseBool(vars.video_unavailable_country))
        throw new GrabException('This video is unavailable in your country.')

    const duration = vars.video_duration * 1000 // milliseconds

    result.title = vars.video_title

    result.grab('info', {
        length: duration
    })

    result.grab('image', {
        resourceUri: vars.image_url,
        type: 'primary'
    })

    vars.mediaDefinitions.forEach(def => {
        if (!def.quality || def.remote || !def.videoUrl)
            return

        if (def.format === 'hls') {
            // grab HLS stream
            if (Array.isArray(def.quality)) {
                result.grab('hlsStreamReference', {
                    resourceUri: def.videoUrl,
                    playlistType: 'master',
                    resolution: def.quality.join(',')
                })
            } else {
                result.grab('hlsStreamReference', {
                    resourceUri: def.videoUrl,
                    playlistType: 'stream',
                    resolution: def.quality
                })
            }
        } else {
            // grab mp4 video
            result.grab('media', {
                resourceUri: def.videoUrl,
                format: {
                    mime: 'video/mp4',
                    extension: 'mp4',
                    channels: 'both',
                    length: duration,
                    container: 'mp4',
                    resolution: def.quality,
                    formatTitle: 'MP4 ' + def.quality,
                }
            })
        }
    })
}

grabber.supports = uri => {
    return getViewId(uri) !== undefined
}

grabber.grab = (request, result) => {
    // init
    const viewId = getViewId(request.url)
    if (!viewId)
        return false

    // download page
    const url = getStdUrl(viewId)
    const response = http.client.get({
        url
    })
    response.assertSuccess()

    // parse response HTML
    const doc = html.parse(response.bodyText)
    const flashVars = parseFlashVarsScript(doc)
    updateResult(result, flashVars)

    return true
}
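
The extraction in `parseFlashVarsScript` can be tried outside BlackWidow with plain JavaScript. This sketch reuses `flashVarsFinder` and the `new Function` wrapper from the script above; the inline script body and the `flashvars_123` variable name are invented for illustration, and real pages may differ.

```javascript
const flashVarsFinder = /^\s*(var|let)\s+(flashvars[\w_]+)\s+=/mi

// Made-up stand-in for the text of an inline <script> element.
const source = [
    'var flashvars_123 = {',
    '    video_title: "Demo clip",',
    '    video_duration: "12"',
    '};'
].join('\n')

// Capture group 2 holds the name of the declared variable.
const match = flashVarsFinder.exec(source)
const varName = match[2]

// Evaluate the script body inside a function and return the declared variable.
// playerObjList is pre-declared exactly as the original script does.
const flashVars = new Function('let playerObjList = {};' + source + ';return ' + varName + ';')()

const durationMs = flashVars.video_duration * 1000 // seconds to milliseconds
console.log(varName, flashVars.video_title, durationMs)
```

Because `new Function` executes whatever the page's script contains, this approach leans on the sandbox the scripts run in.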
83 changes: 83 additions & 0 deletions blackwidow/repo/scripts/vimeo.js
@@ -0,0 +1,83 @@
const urlRegex = /^https?:\/\/(www\.|player\.)?vimeo\.com\/(video\/)?([0-9]+)/

function getVideoId(url) {
    const match = urlRegex.exec(url)
    return match ? match[3] : undefined
}

function getConfigUrl(videoId) {
    return 'https://player.vimeo.com/video/{0}/config'.replace('{0}', videoId)
}

function fetchConfig(videoId) {
    const url = getConfigUrl(videoId)
    const response = http.client.get({
        url,
        expectText: true
    })
    response.assertSuccess()
    return JSON.parse(response.bodyText)
}

function setGrabResult(result, config) {
    if (!config.request.files)
        throw new GrabException('Video is unavailable.')

    // add info
    result.title = config.video.title
    result.grab('info', {
        author: config.video.owner?.name,
        length: config.video.duration * 1000, // milliseconds
    })

    // add images; numeric keys are widths, any other key is treated as the primary image
    if (config.video.thumbs) {
        for (const key in config.video.thumbs) {
            const isBase = Number.isNaN(Number(key))
            const size = isBase ? undefined : {
                width: Number(key),
                height: Number(key) * 0.5625 // assumes a 16:9 aspect ratio
            }
            result.grab('image', {
                resourceUri: config.video.thumbs[key],
                type: isBase ? 'primary' : 'thumbnail',
                size
            })
        }
    }

    // add media
    config.request.files.progressive.forEach(file => {
        const fileMime = file.mime || 'video/mp4'
        const fileExt = mime.getExtension(fileMime)
        const containerName = fileExt.toUpperCase()
        result.grab('media', {
            resourceUri: file.url,
            channels: 'both',
            container: containerName,
            resolution: file.quality,
            formatTitle: containerName + ' ' + file.quality,
            pixelWidth: file.width,
            pixelHeight: file.height,
            format: {
                mime: fileMime,
                extension: fileExt
            }
        })
    })
}

grabber.supports = url => Boolean(getVideoId(url))

grabber.grab = (request, result) => {
    const videoId = getVideoId(request.url)
    if (!videoId)
        return false

    const config = fetchConfig(videoId)
    if (!config)
        throw new GrabException('Failed to fetch video config.')

    setGrabResult(result, config)
    return true
}
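
The two pure pieces of this script, URL matching and thumbnail classification, can be exercised without the BlackWidow host. In the sketch below `getVideoId` is copied from the script, while `describeThumbs` is a hypothetical extraction of the thumbnail loop in `setGrabResult` that converts numeric keys to numbers. The thumbnail URLs are placeholders, and the 0.5625 factor (a 16:9 ratio) is the script's own assumption.

```javascript
const urlRegex = /^https?:\/\/(www\.|player\.)?vimeo\.com\/(video\/)?([0-9]+)/

// Copied from the script: capture group 3 is the numeric video ID.
function getVideoId(url) {
    const match = urlRegex.exec(url)
    return match ? match[3] : undefined
}

// Hypothetical helper mirroring the thumbnail loop: numeric keys are widths,
// any other key (for example "base") marks the primary image.
function describeThumbs(thumbs) {
    return Object.keys(thumbs).map(key => {
        const isBase = Number.isNaN(Number(key))
        return {
            resourceUri: thumbs[key],
            type: isBase ? 'primary' : 'thumbnail',
            size: isBase ? undefined : { width: Number(key), height: Number(key) * 0.5625 }
        }
    })
}

// Placeholder data for illustration.
const images = describeThumbs({
    '640': 'https://img.example/640.jpg',
    'base': 'https://img.example/base.jpg'
})

console.log(getVideoId('https://player.vimeo.com/video/76979871'))
console.log(images)
```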