
[Original Code] Downloading books from Science Reading (科学文库) with PowerShell

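Downloads books from Science Reading (科学文库) as PNG page images, which you can convert to PDF yourself, removing the online-reading restriction.
Science Reading website: https://book.sciencereading.cn/shop/main/Login/shopFrame.do
Book URL format: https://book.sciencereading.cn/s ... E053020B0A0A1666000
If this infringes any rights, notify me and I will remove it. Please be careful about sharing it further.
An updated version that organizes the download by table of contents is in post #7.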
cls
<#
    Science Reading website: https://book.sciencereading.cn/shop/main/Login/shopFrame.do
    Book URL format: https://book.sciencereading.cn/shop/book/Booksimple/show.do?id=B970CE8AEE3531D1DE053020B0A0A1666000
#>
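# Book URL
$book_url = Read-Host -Prompt 'Enter the book URL'
# ParsedHtml (used below) is only available in Windows PowerShell 5.1, not in PowerShell 6+
$resp = Invoke-WebRequest -Uri $book_url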
# Book title
$book_name = $null
$book_name = $resp.ParsedHtml.querySelector('.book_detail_title').innerText.Trim()
if([string]::IsNullOrWhiteSpace($book_name)){
    Write-Host 'Failed to parse the book title' -ForegroundColor Red
    pause
    exit
}
# Replace characters that are invalid in Windows file names with spaces
$book_name = $book_name -replace '[<>?*:|/\\"]',' '
# Book ID (the value after the last '=' in the URL)
$book_id = $book_url -split '=' | Select-Object -Last 1
# Server IP and port
$server_ip = '159.226.241.32'
$server_port = 81
# Default user ID (sent together with userName=Guest below)
$default_user = '825ae171eb514934b1ed2374976f4a9f'
# Document number (returned by the file/add API)
$doc_num_api = 'https://wkobwp.sciencereading.cn/api/file/add'
$resp = Invoke-WebRequest -UseBasicParsing -Method Post -Uri $doc_num_api -Headers @{
    'accessToken' = 'accessToken'
    'Content-Type' = 'application/x-www-form-urlencoded; charset=UTF-8'
} -Body (
    'params=%7B%22params%22%3A%7B%22userName%22%3A%22Guest%22%2C%22userId%22%3A%22{0}%22%2C%22file%22%3A%22http%3A%2F%2F{1}%3A{2}%2F{3}.pdf%22%7D%7D&type=http' -f $default_user,$server_ip,$server_port,$book_id
)
$book_number = ($resp.Content -split '"')[3]
# Get the total number of page images (poll until the manifest reports PageCount)
$img_count = $null
while($null -eq $img_count){
    $resp = Invoke-WebRequest -UseBasicParsing -Uri ('https://wkobwp.sciencereading.cn/asserts/{0}/manifest?language=zh-CN' -f $book_number)
    $json = [System.Text.UTF8Encoding]::UTF8.GetString($resp.Content) | ConvertFrom-Json
    $json = $json.docinfo | ConvertFrom-Json
    $img_count = $json.PageCount
    if($null -eq $img_count){ Start-Sleep -Seconds 1 }
}
# Download every page image
[void][System.IO.Directory]::CreateDirectory($book_name)
$img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true';
0..($img_count-1) | foreach {
    # Capture the page index up front; inside a catch block $_ is the error record
    $page = $_
    $url = $img_url -f $book_number,$page
    $png = '.\{0}\{0}-{1:000}.png' -f $book_name,$page
    while($true){
        Write-Host ('{0}/{1}' -f ($page+1),$img_count) -ForegroundColor Yellow
        try{
            $resp = Invoke-WebRequest -UseBasicParsing -Uri $url
            [System.IO.File]::WriteAllBytes($png,$resp.Content)
            [System.IO.Path]::GetFileName($png)
            break
        } catch {
            # Wait one second, then retry the same page
            Start-Sleep -Seconds 1
        }
    }
}
Write-Host 'All downloads complete' -ForegroundColor Green
pause
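The download loop above retries a failed page forever with a fixed one-second pause. Below is a minimal sketch of a bounded alternative, swapping in exponential backoff and assuming you would rather give up on a page after a few attempts. `Invoke-WithRetry` and its parameters are hypothetical names, not part of the original script.

```powershell
# Hypothetical helper (not in the original script): runs a script block and
# retries it with exponential backoff. Rethrows the last error once
# MaxAttempts attempts have all failed.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 5,
        [double]$InitialDelaySeconds = 1
    )
    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Success: hand the script block's output back to the caller
            return & $Action
        } catch {
            # Out of attempts: rethrow the current error
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning ("Attempt {0} failed: {1}; retrying in {2}s" -f $attempt, $_.Exception.Message, $delay)
            Start-Sleep -Milliseconds ([int]($delay * 1000))
            # Double the wait before the next attempt
            $delay *= 2
        }
    }
}
```

Inside the download loop, the `while`/`try`/`catch` could then shrink to `$resp = Invoke-WithRetry -Action { Invoke-WebRequest -UseBasicParsing -Uri $url }`, with a failure after the last attempt surfacing as an ordinary error instead of an endless loop.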
$img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true';

Increase the 100 for sharper images. I've heard there are 9 levels.
The resolution has nine levels in total: 50, 75, 100, 125, 150, 200, 400, 800, 1000.
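If you make the resolution configurable, it may help to snap a requested value to one of the levels listed above before putting it in the URL. This is a small sketch that assumes those nine levels (reported second-hand in this thread, not verified) are the only values the server accepts. `Get-NearestZoomLevel` is a hypothetical helper.

```powershell
# Zoom levels as reported in this thread (unverified assumption).
$ZoomLevels = 50, 75, 100, 125, 150, 200, 400, 800, 1000

# Hypothetical helper: returns the listed level closest to $Requested.
# On a tie the lower level wins, because the list is scanned in ascending
# order and only a strictly smaller distance replaces the current best.
function Get-NearestZoomLevel([int]$Requested) {
    $best = $ZoomLevels[0]
    foreach ($level in $ZoomLevels) {
        if ([math]::Abs($level - $Requested) -lt [math]::Abs($best - $Requested)) {
            $best = $level
        }
    }
    return $best
}
```

For example, `Get-NearestZoomLevel 120` gives 125, and `Get-NearestZoomLevel 300` gives 200 because 200 and 400 are equally close.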

Thanks for sharing!
I'm a beginner and hope to learn a lot from you.


Thanks for sharing. Any idea how many GB a complete download takes?
My goal: learning batch scripting.


$img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true';

Increase the 100 for sharper images. I've heard there are 9 levels.
The resolution has nine levels in total: 50, 75, 100, 125, 150, 200, 400, 800, 1000.


Reply to #3 hlzj88


    Only download what you actually need; scraping like this is unethical to begin with.


Reply to #4 pd1


    Thanks for the tip; I've added it to the first post.


Here is an update that organizes the download by table of contents:
cls
<#
    Science Reading website: https://book.sciencereading.cn/shop/main/Login/shopFrame.do
    Book URL format: https://book.sciencereading.cn/shop/book/Booksimple/show.do?id=B970CE8AEE3531D1DE053020B0A0A1666000
#>
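# Book URL
$book_url = Read-Host -Prompt 'Enter the book URL'
# ParsedHtml (used below) is only available in Windows PowerShell 5.1, not in PowerShell 6+
$resp = Invoke-WebRequest -Uri $book_url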
# Book title
$book_name = $null
$book_name = $resp.ParsedHtml.querySelector('.book_detail_title').innerText.Trim()
if([string]::IsNullOrWhiteSpace($book_name)){
    Write-Host 'Failed to parse the book title' -ForegroundColor Red
    pause
    exit
}
# Replace characters that are invalid in Windows file names with spaces
$book_name = $book_name -replace '[<>?*:|/\\"]',' '
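# Table of contents (the zNodes JSON array embedded in the page)
$menus = $null
if($resp.Content -match '(?s)var zNodes=(\[{.*}\]);'){
    $menus = $Matches[1] | ConvertFrom-Json
    # Sort entries by start page (the number after the last '=' in each entry's url)
    $menus = $menus | Sort-Object { $arr=$_.url -split '=';return ([int]$arr[$arr.Length-1]) }
}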
# Book ID (the value after the last '=' in the URL)
$book_id = $book_url -split '=' | Select-Object -Last 1
# Server IP and port
$server_ip = '159.226.241.32'
$server_port = 81
# Default user ID (sent together with userName=Guest below)
$default_user = '825ae171eb514934b1ed2374976f4a9f'
# Document number (returned by the file/add API)
$doc_num_api = 'https://wkobwp.sciencereading.cn/api/file/add'
$resp = Invoke-WebRequest -UseBasicParsing -Method Post -Uri $doc_num_api -Headers @{
    'accessToken' = 'accessToken'
    'Content-Type' = 'application/x-www-form-urlencoded; charset=UTF-8'
} -Body (
    'params=%7B%22params%22%3A%7B%22userName%22%3A%22Guest%22%2C%22userId%22%3A%22{0}%22%2C%22file%22%3A%22http%3A%2F%2F{1}%3A{2}%2F{3}.pdf%22%7D%7D&type=http' -f $default_user,$server_ip,$server_port,$book_id
)
$book_number = ($resp.Content -split '"')[3]
# Get the total number of page images (poll until the manifest reports PageCount)
$img_count = $null
while($null -eq $img_count){
    $resp = Invoke-WebRequest -UseBasicParsing -Uri ('https://wkobwp.sciencereading.cn/asserts/{0}/manifest?language=zh-CN' -f $book_number)
    $json = [System.Text.UTF8Encoding]::UTF8.GetString($resp.Content) | ConvertFrom-Json
    $json = $json.docinfo | ConvertFrom-Json
    $img_count = $json.PageCount
    if($null -eq $img_count){ Start-Sleep -Seconds 1 }
}
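# Download helpers
# Get-FullDir: build a TOC entry's directory path by walking up its parent chain (pId '0' marks a top-level entry)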
function Get-FullDir($menus,$menu){
    if($menu.pId -eq '0') {
        return ('.\{0}\{1}' -f $book_name,$menu.name)
    }
    $p_menu = $menus | Where-Object { $_.id -eq $menu.pId }
    $p_dir = Get-FullDir -menus $menus -menu $p_menu
    return ('{0}\{1}' -f $p_dir,$menu.name)
}
# Download-Image: save pages $start_page..$end_page (1-based, inclusive) into $save_dir
function Download-Image($start_page,$end_page,$save_dir){
    [void][System.IO.Directory]::CreateDirectory($save_dir)
    $img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true'
    for($i=$start_page; $i -le $end_page; $i++){
        # The image API takes a 0-based page index
        $url = $img_url -f $book_number,($i-1)
        $png = '{0}\{1}-{2:000}.png' -f $save_dir,[System.IO.Path]::GetFileName($save_dir),$i
        $png
        while($true){
            Write-Host ('{0}/{1}' -f $i,$img_count) -ForegroundColor Yellow
            try{
                $resp = Invoke-WebRequest -UseBasicParsing -Uri $url
                [System.IO.File]::WriteAllBytes($png,$resp.Content)
                [System.IO.Path]::GetFileName($png)
                break
            } catch {
                Start-Sleep -Seconds 1  
            }
        }
    }
}
if($null -ne $menus){
    # Create the book's root directory
    [void][System.IO.Directory]::CreateDirectory($book_name)
    $last_dir = '.\' + $book_name + '\Cover'
    # Walk through the TOC entries
    $last_page = 1
    $menus | foreach {
        # Directory for the current entry
        $cur_dir = Get-FullDir -menus $menus -menu $_
        $cur_dir = $cur_dir -replace '\:|\?|\*|\"|\<|\>|\|','' -replace "'",''
        [void][System.IO.Directory]::CreateDirectory($cur_dir)
        # Start page of the current entry
        $arr = $_.url -split '='
        $cur_page = [int]$arr[$arr.length-1]
        # Download the previous entry's pages (its start page through this entry's start page)
        Write-Host $last_dir
        Write-Host ('pages: [{0}-{1}]' -f $last_page,$cur_page) -ForegroundColor Yellow
        Download-Image -start_page $last_page -end_page $cur_page -save_dir $last_dir
        # The current entry becomes the previous one; note that the boundary
        # page ($cur_page) ends up saved in both directories
        $last_page = $cur_page
        $last_dir = $cur_dir
        '-------------'
    }
    }
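    # Download all remaining pages into the last entry's directory
    Write-Host $last_dir
    Write-Host ('pages: [{0}-{1}]' -f $last_page,$img_count) -ForegroundColor Yellow
    Download-Image -start_page $last_page -end_page $img_count -save_dir $last_dir
} else {
    # No table of contents found: download every page into the book's root directory
    Download-Image -start_page 1 -end_page $img_count -save_dir ('.\' + $book_name)
}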
Write-Host 'All downloads complete' -ForegroundColor Green
pause
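The TOC version assigns pages to directories by pairing each entry's start page with the next one. Pages 1 through the first entry's start page go to the cover directory. Each entry then runs from its own start page to the next entry's start page (inclusive, so the boundary page is saved under both). The last entry runs to the final page. Below is a sketch of that pairing on its own, which can be handy for previewing the directory layout before downloading anything. `Get-PageRanges` is a hypothetical helper that takes the already-sorted start pages as plain integers.

```powershell
# Hypothetical helper mirroring the TOC script's page assignment.
# $StartPages: 1-based start page of each TOC entry, sorted ascending.
# $PageCount:  total number of pages in the book.
# Emits one object per directory: Index 0 is the cover range and
# Index k (k >= 1) is the range for the k-th TOC entry.
function Get-PageRanges([int[]]$StartPages, [int]$PageCount) {
    $ranges = New-Object System.Collections.Generic.List[object]
    $last = 1
    for ($k = 0; $k -lt $StartPages.Count; $k++) {
        # As in the script, a range ends ON the next entry's start page (inclusive)
        $ranges.Add([pscustomobject]@{ Index = $k; Start = $last; End = $StartPages[$k] })
        $last = $StartPages[$k]
    }
    # Remaining pages go to the last entry (or cover the whole book if there are no entries)
    $ranges.Add([pscustomobject]@{ Index = $StartPages.Count; Start = $last; End = $PageCount })
    $ranges
}
```

For a book of 20 pages with entries starting on pages 5 and 12, this yields [1-5], [5-12] and [12-20]. To avoid duplicating boundary pages, end each range at `$StartPages[$k] - 1` instead. A range whose End falls below its Start is then simply empty, which the `for` loop in `Download-Image` already handles by downloading nothing.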


6666666666666666666666666666666

