DucDigital
for ( $girl = 1; $girl < $required; $girl++ ) { echo "I love DucDigital"; }

24 Nov 09

Massive download from Pixiv?

Recently a friend of mine on the 2ch IRC asked me to write a short Python snippet to download files from Pixiv, an image board. Since I have experience with web parsing and with other image boards such as Danbooru, I accepted his request.

Update (Jan 13): someone named Nadaka posted an upgraded version of my code on his/her blog, which is cool, so here is a link where you can check it out: Here!

After a few hours of parsing and studying the page's XHTML tree, this is my result. It is a base for downloading one or more files. If any of you edit it to add more functionality, I'd love to know :)
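To give an idea of what the parsing boils down to, here is a minimal sketch of the key step the script performs with parser('img')[0]['src']: take the first <img> on a view page and derive a local file name from its src. It is written for Python 3 with only the standard library's html.parser instead of BeautifulSoup, and the helper names (FirstImgSrc, image_url_and_name), the sample markup and the image host are made up for illustration; they are not Pixiv's actual page.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class FirstImgSrc(HTMLParser):
    """Record the src attribute of the first <img> tag fed to the parser.

    Hypothetical stdlib stand-in for BeautifulSoup's parser('img')[0]['src'].
    """

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only the first <img>.
        # Self-closing <img ... /> tags also arrive here via handle_startendtag.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def image_url_and_name(html):
    """Return (image URL, local file name) for the first <img> in html."""
    parser = FirstImgSrc()
    parser.feed(html)
    parser.close()
    if parser.src is None:
        raise ValueError("no <img> tag with a src attribute found")
    # Take the basename of the URL *path* so a query string such as "?1"
    # does not end up in the saved file name.
    return parser.src, os.path.basename(urlparse(parser.src).path)


# Made-up markup standing in for a "big" view page.
sample = ('<html><body><a href="#">'
          '<img src="http://img.example.com/img/someone/123.jpg?1" alt="" />'
          '</a></body></html>')
print(image_url_and_name(sample))
# -> ('http://img.example.com/img/someone/123.jpg?1', '123.jpg')
```

The script itself calls os.path.basename on the whole image URL; splitting off the path first, as above, is a small design choice that only matters if the image URL carries a query string.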

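PS: you need to install mechanize and BeautifulSoup (version 3, the one imported as "from BeautifulSoup import BeautifulSoup") before using the script below. It is written for Python 2.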

import os
import re
import mechanize
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

#-----------Initialize info
username = 'xxxx'
password = 'xxxx'
url = 'http://www.pixiv.net/'

br = Browser()

# Links to the "big" view of an illustration look like
#   member_illust.php?mode=big&illust_id=7133932
# and the "medium" view uses mode=medium instead.
bigrg = re.compile(r'member_illust\.php\?[a-z]{2,}=big&illust_id=\d+', re.IGNORECASE)
medrg = re.compile(r'member_illust\.php\?[a-z]{2,}=medium&illust_id=\d+', re.IGNORECASE)
#-----------End info

br.set_handle_equiv(True)
#br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(True)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

#-----------Clear Screen
# for i in range(60):
#     print
#-----------End clear screen

#-----------Login Steps
print "Start login \t\t",
br.open(url)
br.select_form(nr=0)        # the login form is the first form on the front page
br["pixiv_id"] = username
br["pass"] = password
br.submit()  # LOGIN
print "DONE!"
#-----------End login

#-----------Start Download, input: illustration ID, returns nothing, saves to disk
def download(illust_id):
    print "Getting file id: " + str(illust_id) + "\t\t",
    br.open("http://www.pixiv.net/member_illust.php?mode=medium&illust_id=" + str(illust_id))
    # Collect the matching links first, because following a link
    # changes the browser's current page.
    links = list(br.links(url_regex=bigrg))
    if not links:
        print "FAILED! (no big image link found)"
        return
    saved = 0
    for link in links:
        try:
            viewPage = br.follow_link(link)
            parser = BeautifulSoup(viewPage.read())
            imgFile = parser('img')[0]['src']   # first <img> on the big view page
            fileName = os.path.basename(imgFile)
            # Send the view page as Referer; the image server may refuse
            # requests without it.
            req = mechanize.Request(imgFile)
            req.add_header("Referer", viewPage.geturl())
            response = br.open(req)
            with open(fileName, "wb") as saving:
                saving.write(response.read())
            saved += 1
        except (IOError, IndexError, KeyError) as e:
            # IOError also covers network errors (URLError/HTTPError)
            print "FAILED! " + str(e)
    print "DONE! (%d file(s) saved)" % saved
#-----------End Download

download(7133932)
download(7667)
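The part of the script that decides what to download is the url_regex filter: on the medium page it keeps only the links pointing at the "big" view. Here is a small, network-free sketch of that matching in Python 3 using only the re module. BIG_LINK matches the same URLs as the script's bigrg pattern, but written as one raw string; the named group id and the helpers medium_page_url and big_link_ids are hypothetical additions for illustration, and the sample hrefs are made up.

```python
import re

# Matches the same URLs as the script's bigrg; the named group "id"
# is an illustrative addition for pulling out the illustration ID.
BIG_LINK = re.compile(r"member_illust\.php\?[a-z]{2,}=big&illust_id=(?P<id>\d+)",
                      re.IGNORECASE)

# The page the script opens first for a given illustration ID.
MEDIUM_URL = "http://www.pixiv.net/member_illust.php?mode=medium&illust_id={}"


def medium_page_url(illust_id):
    """Build the medium-view URL for an illustration ID (int or digit string)."""
    return MEDIUM_URL.format(int(illust_id))


def big_link_ids(hrefs):
    """Return the illustration IDs of the hrefs that point at the big view, in order."""
    ids = []
    for href in hrefs:
        match = BIG_LINK.search(href)
        if match:
            ids.append(match.group("id"))
    return ids


# Made-up hrefs as they might appear on a medium page.
hrefs = [
    "member_illust.php?mode=medium&illust_id=7133932",
    "member_illust.php?mode=big&illust_id=7133932",
    "member.php?id=12345",
]
print(medium_page_url(7133932))
# -> http://www.pixiv.net/member_illust.php?mode=medium&illust_id=7133932
print(big_link_ids(hrefs))
# -> ['7133932']
```

Checking the pattern offline like this is a cheap way to confirm it still matches when the site's URLs change, before pointing mechanize at the live pages.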