In my tests, it turns out that the file name of the cache file is not exactly the SHA-1 of the URL, but it is the SHA-1 of the URL prefixed by something I believe the source code in mozilla/gecko-dev calls "storageID", and a colon is inserted between the prefix and the URL.
At least on my machine, this storageID starts with O^partitionKey= and continues with the URI-encoded form of (https,<domain>), followed by a literal comma. The domain is derived from the URL (skipping subdomain prefixes), and the URI encoding follows RFC 3986, i.e. it also encodes the parentheses.
Example: https://www.mozilla.org/en-US/ would be turned into O^partitionKey=%28https%2Cmozilla.org%29,:https://www.mozilla.org/en-US/ and the SHA-1 of that string gives the file name 1B859BB06A1B06364AF4788A20B4E37A32557DD8.
If this works elsewhere, too, I believe it would be a nice feature for FirefoxCache2 to determine the cache file name automatically from the URL.
(Note: If you use multi-account containers, like I do, things seem to get a bit more complicated because the prefix includes something like userContextId=2& after the O^ and before the partitionKey= part.)
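To make the container case concrete, here is a rough sketch of how such a prefix might be assembled. The name buildStorageId and its parameters are hypothetical, and the placement of userContextId simply follows the note above; it has not been checked against the Gecko source.

```javascript
// Hypothetical helper: builds the storage ID prefix described above.
// Assumption: userContextId (if any) goes right after "O^", before partitionKey=.
const buildStorageId = (scheme, site, userContextId) => {
  // encodeURIComponent() leaves parentheses alone, so escape them by hand.
  const encoded = encodeURIComponent(`(${scheme},${site})`)
    .replace(/[()]/g, (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`)
  const container = userContextId === undefined ? '' : `userContextId=${userContextId}&`
  return `O^${container}partitionKey=${encoded},`
}

console.log(buildStorageId('https', 'mozilla.org'))
console.log(buildStorageId('https', 'mozilla.org', 2))
```

The cache key would then be this prefix, a colon, and the URL.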
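Since my Python-fu is very, very rusty, let me describe the proposed algorithm in a node.js snippet. It relies on a custom encodeRFC3986URIComponent() function, because the built-in encodeURIComponent() does not encode parentheses.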
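A minimal sketch of that algorithm, under the assumptions stated earlier: no container is in use, and the site is taken to be the last two labels of the host name (which will be wrong for suffixes such as co.uk). The helper name buildCacheKey is made up for illustration.

```javascript
const crypto = require('crypto')

// encodeURIComponent() leaves ! ' ( ) * unescaped; escape those as well.
const encodeRFC3986URIComponent = (str) =>
  encodeURIComponent(str).replace(/[!'()*]/g,
    (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`)

// Builds "<storageID>:<url>".
// Assumption: the site is simply the last two labels of the host name.
const buildCacheKey = (url) => {
  const { protocol, hostname } = new URL(url)
  const scheme = protocol.slice(0, -1) // "https:" -> "https"
  const site = hostname.split('.').slice(-2).join('.')
  const partitionKey = encodeRFC3986URIComponent(`(${scheme},${site})`)
  return `O^partitionKey=${partitionKey},:${url}`
}

// The cache file name is the upper-case hex SHA-1 of that key.
const getCacheFileName = (url) =>
  crypto.createHash('sha1').update(buildCacheKey(url)).digest('hex').toUpperCase()

console.log(buildCacheKey('https://www.mozilla.org/en-US/'))
console.log(getCacheFileName('https://www.mozilla.org/en-US/'))
```

Keeping the key construction separate from the hashing makes it easy to compare the key against the one stored inside an existing cache entry.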
Here is a not completely polished node.js script that Works For Me™️:
    // This script takes a file from
    // $LOCALAPPDATA/Mozilla/Firefox/Profiles/*/cache2/entries/ and parses part of
    // its contents, in particular the "key" (which consists of a storage ID, a
    // colon, and the actual URL).
    //
    // Alternatively, it takes a URL and determines the file name from it.

    (async () => {
      const fs = require('fs/promises')
      const crypto = require('crypto')

      const firefoxProfilesPath = `${process.env.LOCALAPPDATA}/Mozilla/Firefox/Profiles`
      let firefoxProfile

      // Pick the most recently modified profile directory.
      const getLatestFirefoxProfile = async () => {
        let latestMTime = 0
        for (const dir of await fs.readdir(firefoxProfilesPath)) {
          const stats = await fs.stat(`${firefoxProfilesPath}/${dir}`)
          if (stats.isDirectory() && stats.mtimeMs > latestMTime) {
            firefoxProfile = dir
            latestMTime = stats.mtimeMs
          }
        }
      }
      await getLatestFirefoxProfile()

      // encodeURIComponent() leaves ! ' ( ) * unescaped; escape those as well.
      const encodeRFC3986URIComponent = (str) =>
        encodeURIComponent(str).replace(/[!'()*]/g, c => `%${c.charCodeAt(0).toString(16).toUpperCase()}`)

      // File name = upper-case hex SHA-1 of "<storageID>:<url>".
      const getCacheFileName = url => {
        const match = url.match(/^(https?):\/\/([^/]+\.)?([^./]+\.[^./]+)(\/|$)/)
        if (!match) throw new Error(`Unhandled URL: ${url}`)
        const sha1 = crypto.createHash('sha1')
        sha1.update(`O^partitionKey=${encodeRFC3986URIComponent(`(${match[1]},${match[3]})`)},:${url}`)
        return sha1.digest('hex').toUpperCase()
      }
      const getCacheFilePath = url =>
        `${firefoxProfilesPath}/${firefoxProfile}/cache2/entries/${getCacheFileName(url)}`

      // Accept either a URL (http or https, matching the regex above) or a file path.
      const arg = process.argv[2]
      const path = /^https?:\/\//.test(arg) ? getCacheFilePath(arg) : arg
      const handle = await fs.open(path)
      const stat = await handle.stat()

      // Read one big-endian 32-bit unsigned integer at the given file offset.
      const uint32 = new DataView(new ArrayBuffer(4))
      uint32.read = async (offset) =>
        (await handle.read(uint32, 0, 4, offset)).buffer.getUint32(0)

      // The last four bytes hold the size of the cached data; the metadata
      // follows the data, after 4 bytes plus 2 bytes per 256 KiB chunk.
      const realSize = await uint32.read(stat.size - 4)
      const chunkCount = ((realSize + (1 << 18) - 1) >> 18)
      const header = realSize + 4 + 2 * chunkCount
      const version = await uint32.read(header)
      const fetchCount = await uint32.read(header + 4)
      const lastFetch = await uint32.read(header + 8)
      const mtime = await uint32.read(header + 12)
      const frecency = await uint32.read(header + 16)
      const expirationTime = await uint32.read(header + 20)
      const keySize = await uint32.read(header + 24)
      const flags = await uint32.read(header + 28)
      const key = (await handle.read(Buffer.alloc(keySize), 0, keySize, header + 32)).buffer.toString('utf-8')
      await handle.close()

      console.log(`size: ${stat.size}, real size: ${realSize}, chunk count: ${chunkCount}, version: ${version}, fetchCount: ${fetchCount}, lastFetch: ${new Date(lastFetch * 1000)}/${lastFetch}, mtime: ${new Date(mtime * 1000)}, expires: ${new Date(expirationTime * 1000)}, keySize: ${keySize}, flags: ${flags}, key: '${key}'`)
    })().catch(console.log)