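Update: 2016-12-07
The kankindle.com site has been redesigned, so this version of the script can no longer download anything. A new version is in the works…
I wrote a Python script that downloads all the ebooks from 看kindle (kankindle.com). It automatically downloads every ebook listed on the 13 pages of the home section and saves them to the ebook directory, checking first whether a book has already been downloaded. The script was last updated on 2016-04-21.
(You probably shouldn't download everything just because you can: you may never read most of it, and you can just as easily download a book whenever you actually want to read it.)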
```python
#!/usr/bin/env python
# coding=utf-8
#www.503error.com
#20160421
# Python 2 script (uses urllib2 and print statements).
from bs4 import BeautifulSoup
import urllib2
import re
import os


def download(url):
    """Fetch a book's detail page and save its .mobi file under ebook/."""
    print 'starting download %s' % url
    response = urllib2.urlopen(url, timeout=30)
    html_data = response.read()
    soup = BeautifulSoup(html_data, 'html.parser')
    print 'start to analyse---------------'
    # The download link sits inside the element with class "yanshi_xiazai".
    title_soup = soup.find_all(class_='yanshi_xiazai')
    name_soup = soup.find_all('h1')
    tag_a = title_soup[0].a.attrs['href']
    # The book title is the text of the first <h1>.
    link_name = str(name_soup[0]).replace("<h1>", "").replace("</h1>", "")
    filename = "ebook/" + link_name + ".mobi"
    print 'filename is :%s' % filename
    print "downloading with urllib2 %s" % tag_a
    if os.path.exists(filename):
        print 'already downloaded, ignore'
    else:
        try:
            # Create the target directory on first use.
            if not os.path.isdir("ebook"):
                os.makedirs("ebook")
            f = urllib2.urlopen(tag_a, timeout=60)
            data = f.read()
            with open(filename, "wb") as code:
                code.write(data)
        except Exception as e:
            print e


def get_all_link(url):
    """Download every book linked from one listing page."""
    print 'Starting to get the list'
    response = urllib2.urlopen(url, timeout=30)
    html_data = response.read()
    soup = BeautifulSoup(html_data, 'html.parser')
    link_soup = soup.find_all('a')
    for each_link in link_soup:
        # Book detail links contain "view"; skip anchors without an href.
        href = each_link.get('href')
        if href and re.search('view', str(each_link)):
            print each_link
            print href
            download(href)


if __name__ == '__main__':
    for page in range(1, 13):
        url = "http://kankindle.com/simple/page/3" + str(page)
        url = url.strip()
        print url
        get_all_link(url)
```
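The script above targets Python 2, and `urllib2` no longer exists in Python 3. As a rough sketch only, the "skip the book if it was already downloaded" step might look like this in Python 3. The names `fetch_bytes` and `save_if_missing`, and the injectable `fetch` parameter, are hypothetical and not part of the original script; they are introduced here so the logic can be tried without touching the network.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=60):
    """Return the raw response body for url (Python 3 stand-in for urllib2.urlopen)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_if_missing(url, filename, fetch=fetch_bytes):
    """Download url to filename unless that file already exists.

    Returns True if a download happened, False if it was skipped.
    `fetch` is a hypothetical hook so a fake downloader can be passed in.
    """
    if os.path.exists(filename):
        # Same check as the original script: never download a book twice.
        return False
    directory = os.path.dirname(filename)
    if directory:
        # Create the target directory (e.g. "ebook") on first use.
        os.makedirs(directory, exist_ok=True)
    data = fetch(url)
    with open(filename, "wb") as out:
        out.write(data)
    return True
```

Calling `save_if_missing(link, "ebook/" + title + ".mobi")` twice with the same arguments would download the file the first time and skip it the second time, which is what lets the script be re-run safely after an interruption.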