Talk:List of Wikipedias by sample of articles/Source code (original)
From Meta, a Wikimedia project coordination wiki
Modified source
Suggestions:
- include .encode('cp437','replace') whenever printing to console to avoid errors
- optimize by caching English pages
- remove interwiki text for article length calculation
- weight text length
- color code score
--MarsRover 11:05, 2 December 2007 (UTC)
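Two of the suggestions above (console-safe printing and dropping interwiki text before measuring length) can be sketched in a few lines. This is only an illustration in Python 3, not code from the original program: the helper names `console_safe` and `strip_interwiki` and the interwiki pattern are assumptions made for the example. In Python 3 `str.encode` returns bytes, so the sketch decodes again before printing.

```python
import re

# Assumed pattern: a lowercase prefix such as "de" or "zh-min-nan" followed by
# a colon. It would also catch lowercase namespace links like [[category:...]],
# so a real run would want a whitelist of language codes instead.
INTERWIKI_RE = re.compile(r'\[\[[a-z\-]{2,12}:[^\]\|\n]+\]\]\n?')


def console_safe(text, encoding='cp437'):
    """Replace characters the console code page cannot show with '?'."""
    return text.encode(encoding, 'replace').decode(encoding)


def strip_interwiki(wikitext):
    """Remove interwiki links so they do not inflate the article length."""
    return INTERWIKI_RE.sub('', wikitext)


if __name__ == '__main__':
    page = ('Berlin is a city.\n'
            '[[de:Berlin]]\n'
            '[[zh-min-nan:Berlin]]\n'
            '[[Category:Cities]]\n')
    body = strip_interwiki(page)
    print(console_safe(body))
    print(len(page), len(body))
```

The length of `body`, rather than of `page`, would then feed the article length calculation.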
Modifying source
I was looking at modifying this program for my own use, namely directing it towards a different page (for example, Vital Articles / Extended, a specific wikiproject's topic list, or a specific topic outline's list). Who would be the right person to ask about doing this? Almafeta 05:50, 1 October 2009 (UTC)
- Smeira is the original author, but he has been missing for a couple of years. I could probably help. I've been working on code to create an extended article list (see below). It may need some tweaking for your needs, but it can read from the lists you mentioned. --MarsRover 07:36, 1 October 2009 (UTC)
- I've been working on that (apparently my installation of Python had... issues), and finally have it working with the groups I'm interested in. Thank you. =)
- Also, it's too bad Smeira's gone... it occurs to me that the original was probably the most significant piece of code ever written in Volapük. Almafeta 16:49, 26 October 2009 (UTC)
GetExtendedArticleList.py
# -*- coding: utf_8 -*-
# Python 2 script; expects the pywikipedia framework in ./pywikipedia
import sys
sys.path.append('./pywikipedia')
import wikipedia
import pagegenerators
import re

# A list entry: bullet or number markers, optional bold/italic quotes,
# a [[link]], and an optional second [[link]] in parentheses.
entry_re = re.compile(r"([\*|#]+)(\s*)('*)\[\[([^\]]+)\]\](\s*)\(?(\[\[([^\]]+)\]\])?\)?")
# A link target: optional wiki prefix, page name, optional |alias.
link_re = re.compile(r'(:?([a-z\-]+):)?([^\]\|:]+)(\|([^\]]+))?')

def parseEntry(line):
    m = entry_re.search(line)
    if m:
        return {'name':m.group(4),'sibling':m.group(7),'indent':len(m.group(1)),'span':m.span()}

def parseLink(link, wiki_name):
    m = link_re.search(link)
    if m:
        if m.group(2):
            linkWiki = m.group(2)
        else:
            linkWiki = wiki_name
        return {'wiki':linkWiki,'name':m.group(3),'alias':m.group(5)}

def findAll(text, parseFunction):
    # Apply parseFunction repeatedly, each time to the text after the last match.
    return_list = []
    pos = 0
    item = parseFunction(text)
    while item:
        pos = pos + item['span'][1]
        item['pos'] = pos
        del item['span']
        return_list.append(item)
        item = parseFunction(text[pos:])
    return return_list

def getArticle(wiki_name, wiki_family, article_name):
    print "reading %s" % (article_name)
    wiki = wikipedia.Site(wiki_name, wiki_family)
    page = wikipedia.Page(wiki, article_name)
    article_text = page.get(get_redirect=False)
    return {'text':article_text}

def getArticleList(wiki_name, wiki_family, article_name):
    article = getArticle(wiki_name, wiki_family, article_name)['text']
    arts = findAll(article, parseEntry)
    for art in arts:
        art['link'] = parseLink(art['name'], wiki_name)
    return arts

print "working..."
lists = {}
lists[':en:Wikipedia:Vital articles/Expanded/People'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/People')
lists[':en:Wikipedia:Vital articles/Expanded/History'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/History')
lists[':en:Wikipedia:Vital articles/Expanded/Geography'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Geography')
lists[':en:Wikipedia:Vital articles/Expanded/Arts'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Arts')
lists[':en:Wikipedia:Vital articles/Expanded/Philosophy and religion'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Philosophy and religion')
lists[':en:Wikipedia:Vital articles/Expanded/Everyday life'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Everyday life')
lists[':en:Wikipedia:Vital articles/Expanded/Society and social sciences'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Society and social sciences')
lists[':en:Wikipedia:Vital articles/Expanded/Health and medicine'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Health and medicine')
lists[':en:Wikipedia:Vital articles/Expanded/Science'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Science')
lists[':en:Wikipedia:Vital articles/Expanded/Technology'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Technology')
lists[':en:Wikipedia:Vital articles/Expanded/Mathematics'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Mathematics')
lists[':en:Wikipedia:Vital articles/Expanded/Measurement'] = getArticleList('en', 'wikipedia','Wikipedia:Vital articles/Expanded/Measurement')
lists[':m:List of articles every Wikipedia should have/Version 1.1'] = getArticleList('meta','meta', 'List of articles every Wikipedia should have/Version 1.1')
lists[':en:Films considered the greatest ever'] = getArticleList('en', 'wikipedia','Films considered the greatest ever')
lists[':en:Outline of biology'] = getArticleList('en', 'wikipedia','Outline of biology')

# Merge all lists, skipping names already seen (case-insensitive).
print "merge lists..."
fullList = []
for l in lists:
    for i in lists[l]:
        ok = True
        for fli in fullList:
            if fli.lower() == i['link']['name'].lower():
                ok = False
                break
        if ok:
            fullList.append(i['link']['name'])
print len(fullList)

print "sorting..."
sortedFullList = sorted(fullList, lambda a,b: cmp(a.lower(),b.lower()))
for i in sortedFullList:
    print i
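To see what the two regular expressions in the script extract, the parsing and merging steps can be tried on a small hand-written sample without pywikipedia. The sketch below is a Python 3 illustration under stated assumptions: the sample wikitext is invented, `re.finditer` stands in for the script's `findAll` loop, and the set-based `merge_names` is a substitute for the nested-loop merge, not the script's own method. The function names `parse_entries`, `parse_link` and `merge_names` are hypothetical.

```python
import re

# Same patterns as in GetExtendedArticleList.py.
entry_re = re.compile(r"([\*|#]+)(\s*)('*)\[\[([^\]]+)\]\](\s*)\(?(\[\[([^\]]+)\]\])?\)?")
link_re = re.compile(r'(:?([a-z\-]+):)?([^\]\|:]+)(\|([^\]]+))?')


def parse_entries(text):
    """Return one dict per list entry found in the wikitext."""
    return [{'name': m.group(4),
             'sibling': m.group(7),
             'indent': len(m.group(1))}
            for m in entry_re.finditer(text)]


def parse_link(link, wiki_name):
    """Split a link target into wiki prefix, page name and alias."""
    m = link_re.search(link)
    if m:
        return {'wiki': m.group(2) or wiki_name,
                'name': m.group(3),
                'alias': m.group(5)}
    return None


def merge_names(lists):
    """Merge name lists, keeping the first spelling of each name (case-insensitive)."""
    seen = set()
    merged = []
    for names in lists.values():
        for name in names:
            key = name.lower()
            if key not in seen:
                seen.add(key)
                merged.append(name)
    return sorted(merged, key=str.lower)


# Invented sample, not taken from any real list page.
SAMPLE = ("== Philosophers ==\n"
          "* [[Aristotle]]\n"
          "** [[Plato|Platon]] ([[Socrates]])\n"
          "# [[:de:Berlin]]\n")

if __name__ == '__main__':
    for entry in parse_entries(SAMPLE):
        print(entry, parse_link(entry['name'], 'en'))
    print(merge_names({'a': ['Plato', 'aristotle'], 'b': ['plato', 'Zeno']}))
```

Tracking lowercase keys in a set avoids rescanning the merged list for every name, which may matter once the combined lists reach several thousand entries.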