24 Nov 2009
Mass download from Pixiv?
Recently a friend of mine on the 2ch IRC asked me to write a short Python snippet that fetches files from Pixiv, an imageboard. I have experience with web parsing and with other imageboards like Danbooru, so I accepted.
Update (Jan 13): Someone named Nadaka posted an upgraded version of my code on his/her blog, which is cool, so here is a link for you to check it out: Here!
After a few hours of parsing and studying the page's XHTML tree, this is my result. It is a base for downloading one or more files. If any of you edit it to add more functionality, I'd love to know.
PS: this is Python 2 code, and you need mechanize and BeautifulSoup 3 (the `BeautifulSoup` module) installed before using it.
```python
# Python 2 (2.6+) script: log in to Pixiv with mechanize, then save the
# full-size image(s) of each requested illustration into the current directory.
import os
import re

import mechanize
from mechanize import Browser
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

#-----------Settings
username = 'xxxx'
password = 'xxxx'
url = 'http://www.pixiv.net/'

# Full-size page links look like member_illust.php?mode=big&illust_id=123.
# As in the first version, any 2+ letter parameter name is accepted before "=big".
bigrg = re.compile(r'member_illust\.php\?[a-z]{2,}=big&illust_id=\d+', re.IGNORECASE)

#-----------Browser setup
br = Browser()
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(True)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
                                'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

#-----------Login (assumes the login form is the first form on the front page)
print "Start login \t\t",
br.open(url)
br.select_form(nr=0)
br["pixiv_id"] = username
br["pass"] = password
br.submit()
print "DONE!"

#-----------Download: save every full-size image linked from an illustration's page
def download(illust_id):
    print "Getting file id: %s\t\t" % illust_id,
    br.open("http://www.pixiv.net/member_illust.php?mode=medium&illust_id=%s" % illust_id)
    # Copy the matching links first: following a link replaces the current page.
    links = list(br.links(url_regex=bigrg))
    if not links:
        print "FAILED! (no full-size link)"
        return
    for n, link in enumerate(links):
        try:
            viewPage = br.follow_link(link)
            # The first <img> on the full-size page is the picture itself.
            imgFile = BeautifulSoup(viewPage.read())('img')[0]['src']
            fileName = os.path.basename(imgFile)
            # Fetch the image with the full-size page as Referer.
            req = mechanize.Request(imgFile)
            req.add_header("Referer", viewPage.geturl())
            response = br.open(req)
            with open(fileName, "wb") as saving:
                saving.write(response.read())
            print "DONE!"
        except (IOError, IndexError):
            # Network errors are IOError subclasses in Python 2; IndexError = no <img>.
            print "FAILED! (link %d)" % n

download(7133932)
download(7667)
```
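If you want to poke at the scraping logic without logging in, the three steps the script performs (find the "big" links on the medium page, take the first `<img>` on the full-size page, name the file after the image URL) can be sketched as pure functions in Python 3 with only the standard library. This is a sketch, not my script: `html.parser` stands in for BeautifulSoup, the helper names (`TagCollector`, `big_links`, `first_image`, `file_name`) are made up for illustration, and Pixiv's real markup (then and now) will differ from whatever you feed it.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Same pattern as the script's bigrg: member_illust.php?<word>=big&illust_id=<digits>
BIG_LINK = re.compile(r"member_illust\.php\?[a-z]{2,}=big&illust_id=\d+", re.IGNORECASE)


class TagCollector(HTMLParser):
    """Records every <a href> and <img src> in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.img_srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser unescapes attribute values, so "&amp;" in an href arrives as "&".
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.img_srcs.append(attrs["src"])


def collect(html):
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser


def big_links(page_url, html):
    """Absolute URLs of every full-size ("big") link on a medium-size page."""
    return [urljoin(page_url, h) for h in collect(html).hrefs if BIG_LINK.search(h)]


def first_image(page_url, html):
    """Absolute URL of the first <img> on the page, or None if there is none."""
    srcs = collect(html).img_srcs
    return urljoin(page_url, srcs[0]) if srcs else None


def file_name(image_url):
    """Last path segment of the image URL, ignoring any query string."""
    return posixpath.basename(urlsplit(image_url).path)
```

Unlike `os.path.basename(imgFile)` in the script, `file_name` drops a query string if the image URL has one. With bs4 installed, `BeautifulSoup(html, "html.parser").find("img")["src"]` does the same first-image lookup as `first_image`.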
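The one non-obvious step is fetching the image with the full-size page's URL as Referer. My guess (an assumption, not something I verified) is that Pixiv's image server refused requests that didn't look like they came from a Pixiv page. Here is the same idea in Python 3's `urllib.request`, as a sketch: `image_request` and `save_image` are hypothetical helper names, and the login step is left out.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) "
              "Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1")


def image_request(image_url, view_page_url):
    """Build (but don't send) a GET for the image, sending the view page as Referer."""
    return urllib.request.Request(
        image_url,
        headers={"Referer": view_page_url, "User-Agent": USER_AGENT},
    )


def save_image(request, file_name, opener=None):
    """Fetch the request and write the body to file_name; returns bytes written.

    Pass an opener that already carries your logged-in session's cookies;
    the default opener has none.
    """
    opener = opener or urllib.request.build_opener()
    with opener.open(request) as response, open(file_name, "wb") as out:
        return out.write(response.read())
```

For a session, an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` keeps cookies between requests. The form fields from 2009 (`pixiv_id`, `pass`) almost certainly no longer apply, so I haven't sketched the login.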