Botopedia/Chatlog 11 September 2005
Session Start: Mon Sep 12 01:43:27 2005
Session Ident: #wikidatadiscussion
[01:43] * Now talking in #wikidatadiscussion
[01:43] * kornbluth.freenode.net sets mode: +ns
[01:43] * waerth has joined #wikidatadiscussion
[01:43] * Ucucha has joined #wikidatadiscussion
[01:43] * CyeZ sets mode: +oo Ucucha waerth
[01:43] <Ucucha> hi :)
[01:43] <CyeZ> hi hi
[01:44] * daniel-fpc has joined #wikidatadiscussion
[01:44] <CyeZ> I'll keep hanging around here so I can read back what was discussed. But I'm going to get some sleep first now.
[01:44] <CyeZ> it's already almost 2 o'clock here
[01:44] <waerth> sleep well CyeZ
[01:44] <Ucucha> good night CyeZ
[01:44] <CyeZ> night
[01:44] * CyeZ is now known as CyeZzZz
[01:45] * Ausir has joined #wikidatadiscussion
[01:45] <Ucucha> hi Ausir
[01:45] * waerth goes to study the toilet for 10 minutes
[01:45] * Datrio has joined #wikidatadiscussion
[01:45] <Ucucha> hi Datrio
[01:45] * tsca has joined #wikidatadiscussion
[01:45] <Datrio> hey, hey
[01:46] * elian has joined #wikidatadiscussion
[01:46] * TOR_CNR has joined #wikidatadiscussion
[01:46] * yannf has joined #wikidatadiscussion
[01:47] <yannf> hi all
[01:47] <Ausir> hi
[01:47] <tsca> hi
[01:47] <Ucucha> hi tsca, elian, TOR_CNR, yannf :)
[01:47] <yannf> hi Ucucha
[01:47] <TOR_CNR> hello
[01:48] <yannf> Ucucha, where are you from ? which project are you working on ?
[01:48] <Ucucha> nl.wikipedia and wikispecies a bit
[01:48] <yannf> ok
[01:48] <Ucucha> I think wikidata will be interesting for taxonomic information
[01:48] <yannf> yes, sure
[01:49] <TOR_CNR> isn't wikispecies doing that?
[01:49] <Ausir> well, it's not actually a discussion about wikidata
[01:50] <Ausir> (the Wikidata, the new software project)
[01:50] <elian> can someone tell us what it's about then?
[01:50] <Ausir> the channel name isn't very fortunate - it's about an international project for generating articles about towns of the world in wikipedias from statistical data
[01:51] <Ausir> many wikipedias already do it, but it's not very coordinated and they rarely share the data
[01:51] <Ausir> well, I suppose it will eventually be integrated into wikidata
[01:51] <Ausir> and just taken from the common database by all wikipedias
[01:51] <waerth> I know the name might not be optimally choosen
[01:51] <waerth> it is what I came up with in a flash
[01:51] <waerth> sorry
[01:51] <Ausir> #wikitowns , maybe?
[01:52] <waerth> want to move everyone there?
[01:52] <Ausir> well, not really
[01:52] <Ausir> since we're already here anyway
[01:52] <tsca> who cares how the channel is called
[01:52] <Ausir> tsca: the people who thought it's about wikidata :)
[01:52] <tsca> oh
[01:53] * waerth changes topic to 'the name of this channel is unfortunately choosen. this is a discussion about an international proct for generating articles about towns of the world in wikipedias from statistical data'
[01:54] * waerth changes topic to 'The name of this channel is unfortunately choosen, the discussion is not about the wikidata project. This is a discussion about an international proct for generating articles about towns of the world in wikipedias from statistical data'
[01:54] * waerth changes topic to 'The name of this channel is unfortunately choosen, the discussion is not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data'
[01:54] <tsca> is this a one-time conference or is this channel here to stay?
[01:54] <waerth> that should be correct
[01:54] <waerth> one time channe;
[01:54] <waerth> l
[01:54] <waerth> as far as i am concerned
[01:54] <waerth> others might feel differently ;)
[01:55] <Ausir> well, this channel could be useful in the future as well
[01:55] <waerth> yes
[01:56] * WikiWichtel has joined #wikidatadiscussion
[01:56] <waerth> but I am afraid there is a nameconvention rule for wikipediachannels ausir
[01:56] <waerth> at least someone told me that once
[01:56] <Ausir> waerth: towns.wikipedia ?
[01:56] * WikiWichtel has left #wikidatadiscussion
[01:57] <waerth> something like that ausir
[01:57] <tsca> wikicities :-)
[01:57] <waerth> no ;)
[01:57] <waerth> it is a shame dannyisme isn't here yet
[02:00] <tsca> who chairs this meeting?
[02:00] <waerth> well I started it
[02:01] <waerth> so if no-one objects ;)
[02:01] <waerth> I have it one minute before one here
[02:02] <waerth> which makes it almost 20.00 cet
[02:02] <waerth> so there is a few issies in my opinion
[02:02] <waerth> where to get the data from
[02:02] <waerth> what data to consider reliable
[02:03] <Ausir> waerth: mostly data from government websites
[02:03] <waerth> where to store it in the projects (meta?) untill a software solution is found
[02:03] <waerth> yes ausir lets start on the first point ;)
[02:03] <Ausir> Polish Regioset website is not made by the government, IIRC, but also quite reliable, though
[02:04] <waerth> ok
[02:04] * Anthere has joined #wikidatadiscussion
[02:04] <Ausir> waerth: well, we could always send the data to commons in open office file format, since commons already accepts those
[02:04] <Ucucha> hi Anthere
[02:04] <waerth> yes
[02:04] <Ausir> hi Anthere
[02:05] <waerth> it should be in a file format that is easily readable by robots
[02:05] <waerth> as they would mainly use it
[02:05] <elian> XML
[02:05] <elian> ?
[02:05] <Ausir> well, just a text file, probably, but with sxv extension, so that it's uploadable to commons
[02:05] <waerth> you mean with ; ?
[02:05] <elian> why not put the data on commons?
[02:05] <elian> and let the bots gather the data and create the articles
[02:05] <waerth> comma seperated files right ?
[02:05] <Ausir> it'd be good if we made a standardized format for those files
[02:06] <elian> seems better for updates
[02:06] <tsca> why upload when you can give links to where the data is on the Web?
[02:06] <waerth> because we would want the data with us
[02:06] <Ausir> tsca: but it's not always available in English, or even available at all
[02:06] <tsca> the collection of data can be (c), can't it?
[02:06] <Ausir> tsca: and the files could be compiled from various sources
[02:06] <Ausir> true
[02:06] <waerth> they are if you implement it as is
[02:07] <waerth> if you take it and put it into an article it is not
[02:07] <waerth> it are plain facts
[02:07] <waerth> otherwise all our country/municipality etc articles are copyrightviolations
[02:08] <tsca> so, no Commons.
[02:08] <tsca> just a collection of links
[02:08] <waerth> anyway I wanted to start the discussion with what kind of data do we want to use ?
[02:08] <Ausir> tsca: well, sometimes it's a pain in the ass to download and parse the files info into a decent format
[02:09] <Ucucha> should we only discuss towns here or also other data?
[02:09] <waerth> for some countries like the netherlands data from all villages up to the smallest ones is available online for free
[02:09] <Ausir> so if we had them already in a format that can be used easily by the bots, it'd be easier
[02:09] <waerth> for other countries you have to pay
[02:09] <Anthere> hi
[02:09] <Ausir> Ucucha: like what kind of data?
[02:09] <Ausir> waerth: well, it depends on the country
[02:09] <Ucucha> biological
[02:09] <Ucucha> there are databases about insects, for example
[02:09] <tsca> yeah sure, we can just as well generate xml files, aside from generating the articles
[02:09] <Ucucha> maybe about planetoids or so
[02:10] <Ausir> Ucucha: true
[02:10] <Ausir> we'll be generating articles about all Polish MPs out of the data from the parliament website at pl: :)
[02:10] <waerth> ok but lets focus on one area for now
[02:10] <waerth> otherwise it gets to splintered
[02:10] <Ausir> but let's focus on the towns firs
[02:11] <waerth> I feel that at meta we could start a page where we collect links per country
[02:11] <Ausir> yeah
[02:11] <waerth> were we can find the official data
[02:11] <waerth> with the emphasis on official
[02:11] * dittaeva has joined #wikidatadiscussion
[02:11] <Ucucha> hi dittaeva
[02:12] <waerth> and we would put notes nect to it per country
[02:12] <Ausir> waerth: or from other reliable institutions, not necessarily government
[02:12] <dittaeva> hi
[02:12] <waerth> ausir I am reluctant to take other sources
[02:12] <waerth> because you would need a good verification where they got their data from
[02:12] <dittaeva> are we early (I'm missing Eloquence)?
[02:13] <dittaeva> is there a log?
[02:13] * dittaeva changes topic to 'The discussion is not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data'
[02:14] <Ucucha> dittaeva: I'm logging now
[02:14] <tsca> I can publish the log later
[02:14] * dittaeva changes topic to 'This channel is currently not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data'
[02:14] <tsca> ok
[02:15] <dittaeva> yeah, did I miss much?
[02:15] <Ausir> waerth: well, Regioset is a Polish database of regional information, and it's not official, but rather reliable - anyway, we should have references for every bit of data exactly to what source it comes from
[02:18] * lode has joined #wikidatadiscussion
[02:19] <tsca> we need some focus in this discussion, it's going nowhere :-)
[02:19] <Ausir> and their database is based on official sources anyway, it's just those sources are not available on-line... so I'd say official sources should be preferred
[02:19] <Ausir> but not necessarily only official - just verifiable
[02:21] <dittaeva> is there a page on meta for the project?
[02:21] <Ausir> where's dannyisme? :(
[02:21] <Ausir> dittaeva: we're going to create one
[02:21] * gpvos has joined #wikidatadiscussion
[02:22] <Ausir> we're just thinking now about what should the project be like...
[02:22] <tsca> my suggestion is:
[02:23] <tsca> bot operators who create series of aricles, create xml files at the same time (for other bot operators) and put the files online
[02:23] <tsca> then the files are announced on some ml.
[02:23] <tsca> that is all...
[02:23] <Ausir> tsca: or the meta project page
[02:24] <Anthere> here is a message from waerth
[02:24] <Anthere> though he appears to be there, he is not
[02:24] <Anthere> he was disconnected
[02:25] <tsca> let's agree on the title of the page so that we can add it to our watchlists
[02:25] <Anthere> he phoned his internet company and they said they were doing maintenance tonight
[02:25] <Anthere> he just called me to tell me
[02:25] <Ucucha> that's a big pity :(
[02:25] <Anthere> so, he apologies very much, but he wont be with you
[02:25] <Anthere> could someone at least log the discussion for him ?
[02:25] <Ausir> well, discussing it is sort of pointless anyway
[02:26] <Ucucha> he was just announced to be the chair
[02:26] <Ausir> someone should {{be bold}} and just create the project page:)
[02:26] <Ucucha> someone willing to be the new chair?
[02:26] <Ausir> and then we'll discuss it and improve it
[02:27] <elian> Wikitowns?
[02:27] <Datrio> Wikibots
[02:27] <elian> no
[02:27] <Ausir> no
[02:27] <elian> bots are just the tools
[02:27] <Ucucha> not wikitowns
[02:27] <Datrio> well, it would be the best to eventually get the databases for everything
[02:27] <Datrio> not only towns
[02:27] <Ucucha> we shouldn't limit it to towns
[02:27] <Datrio> so not wikitowns
[02:27] <Ausir> Bot-generated articles
[02:27] <Datrio> heh
[02:27] <Datrio> Botopedia
[02:27] <elian> Botbase?
[02:27] <Ucucha> yeah
[02:27] <elian> Botopedia
[02:27] <elian> *lol*
[02:28] <Datrio> "A free encyclopedia that YOU can edit. If you're a bot, that is."
[02:28] <elian> my boyfriend wanted a LanguageBot.php for his bot, but brion refused
[02:28] <Ausir> [[Database sharing project]]
[02:29] <elian> Ausir: too boring
[02:29] <Ausir> elian: but more descriptive :)
[02:29] <Ucucha> Botbase seems a good idea :)
[02:29] <Ausir> and anti-botters are going to protest anyway :)
[02:30] <Ausir> just like they did ever since rambot doubled en: overnight
[02:30] <tsca> call it [[asdrfgu3a,c a45:"qad]], a good bot name, and create some redirs
[02:31] <lode> when it is on a different db/wp you loose the fact that people who see those stubs edit it to something more
[02:31] <tsca> OK, what can you expect on that page:
[02:31] <tsca> 1. ~8000 Italian municipalities
[02:31] <Datrio> [[$bot->new("Pedia")]]
[02:31] <Ausir> a list of all such projects
[02:32] <Ausir> in all wikipedias
[02:32] <tsca> 2. Swedish/Norwegian/Danish municupalites
[02:32] <Ausir> who to contact etc.
[02:32] <tsca> 3. French (?towns)
[02:32] <Ausir> yeah, towns
[02:32] <tsca> what else?
[02:32] <Ausir> and villages
[02:32] <Ucucha> 4. Fish
[02:32] <Ausir> tsca: Rambot
[02:32] <Ucucha> from FishBase, possibly
[02:32] <tsca> Ucucha: Fish are problematic
[02:32] * waerth has quit IRC (No route to host)
[02:32] <tsca> they have local names
[02:33] <Ucucha> yes, but many just have scientific names ;)
[02:33] <tsca> unless one finds a digital dictionary/glossary...
[02:33] <Ausir> tsca: and of course, Polish towns/communes/counties/voivodships
[02:33] * Kristof has joined #wikidatadiscussion
[02:33] <Ucucha> hi Kristof
[02:33] <Kristof> hi all
[02:33] <Datrio> don't forget Pokemons from en
[02:33] <Ucucha> tsca: probably, most species don't have common names at all
[02:33] <Ausir> and Czech disctricts
[02:34] <Ausir> Ucucha: but you can never be sure, at least for many languages
[02:34] <Ausir> Ucucha: towns are easier
[02:35] <Ucucha> they are
[02:35] <Ucucha> but we don't have to do easy things only
[02:35] <Ausir> but your other proposal, asteroids, was interesting :)
[02:35] <Ausir> maybe we could find a good NASA database for generating those?
[02:39] <Ucucha> http://nssdc.gsfc.nasa.gov/planetary/factsheet/asteroidfact.html
[02:39] <Ucucha> a few
[02:40] * henna has joined #wikidatadiscussion
[02:42] * lode has left #wikidatadiscussion
[02:42] <yannf> i think that, quite often, the data need processing before being useable by bots
[02:43] <yannf> so it's better doing that only once
[02:43] <yannf> not for every languages
[02:44] <tsca> sure
[02:44] <tsca> we should work out some standard format
[02:44] <yannf> yes, that's an important part of this project
[02:46] * GerardM has joined #wikidatadiscussion
[02:47] <yannf> i can help working on some conversion tools
[02:48] <yannf> and defining a common format
[02:48] * TOR_CNR has left #wikidatadiscussion
[02:49] * henna is now known as hennaNoInternet
[02:49] <tsca> yes, please publish the format proposal on meta so that we can discuss it
[02:51] <yannf> actually, i am not a bot expert
[02:53] <yannf> but formating data to a defined format interests me
[02:54] <yannf> i think something like CSV would be appropriate
[02:54] <yannf> csv = comma separated values
[02:57] <tsca> either this or xml
[02:57] <tsca> but field names must be standardised
[02:58] <tsca> so that we don't need to modify out bots all the tile
[02:58] <tsca> *time
[02:58] <dittaeva> csv would perhaps be easier for people to get/make
[02:59] <dittaeva> but we are planning to put the data on commons once wikidata is ready and works for the purpose, right?
[03:00] <yannf> i would propose upload the files to commons
[03:00] <tsca> why not meta?
[03:01] <yannf> either is ok for me
[03:01] <tsca> it's just like the logfiles for interwiki bot operators; no need to put them on commons
[03:02] <yannf> ok
[03:05] <yannf> so first we need a list of sources
[03:05] <tsca> yeah
[03:06] <yannf> waerth already made one
[03:07] <dittaeva> so where is it?
[03:07] <dittaeva> :-)
[03:07] <yannf> http://nl.wikipedia.org/wiki/Gebruiker:Waerth/Handigelinksvooriedereen
[03:15] * tsca is now known as tsca_away
[03:33] * Ucucha has quit IRC ("Chatzilla 0.9.68.5 [Firefox 1.0.6/20050717]")
[04:05] * Kristof has quit IRC (Read error: 110 (Connection timed out))
[04:06] * dittaeva has quit IRC (Read error: 110 (Connection timed out))
[04:06] * dittaeva has joined #wikidatadiscussion
[04:07] <dittaeva> http://meta.wikimedia.org/wiki/Botopedia
[04:07] * dittaeva changes topic to 'This channel is currently not about the wikidata project. This is a discussion about an international project for generating articles about towns of the world in wikipedias from statistical data. http://meta.wikimedia.org/wiki/Botopedia'
[04:29] * dittaev1 has joined #wikidatadiscussion
[04:32] * dittaeva has quit IRC (Read error: 110 (Connection timed out))
[04:44] * gpvos has left #wikidatadiscussion
[04:53] * tsca_away is now known as tsca
[05:04] * Datrio has quit IRC
[05:28] * dittaev1 is now known as dittaeva
[05:32] <Ausir> dittaeva: I edited it a bit
[05:32] <Ausir> dittaeva: I made a list of existing projects like that
[05:33] * tsca has quit IRC (Read error: 110 (Connection timed out))
[05:34] <dittaeva> nice
[05:37] <dittaeva> Ausir: you're the one who has been operating the bot at the polish wikipedia? Where did you get the norwegian communes data?
[05:38] <Ausir> tsca is operating the bot
[05:38] <Ausir> I'm mostly working on the data - but tsca got the scandinavian data himself
[05:38] <dittaeva> ok, too bad he went off zzzzz (I suppose)
[05:38] <Ausir> from the norwegian authorities website
[05:39] <Ausir> some statistical office
[05:39] <Ausir> from here: http://www.ssb.no/ and here: http://www.kommunenokkelen.no/
[05:39] <Ausir> see: http://pl.wikipedia.org/wiki/Finn%C3%B8y
[05:40] <Ausir> a sample article
[05:40] <dittaeva> thanks!
[05:40] <Ausir> one has statistics about population etc.
[05:41] <Ausir> and the other has addresses and stuff like names of the mayors etc.
[05:41] <dittaeva> cool
[05:41] <dittaeva> real cool
[05:41] <dittaeva> awesome
[05:41] <dittaeva> :-)
[05:42] <Ausir> we now have more data on Norwegian towns than no: :P
[05:42] <dittaeva> you should probably have appended them all with the equivalent of "kommune" in polish cause most names also refer to things that are not kommunes
[05:43] <Ausir> like what?
[05:43] <dittaeva> and there's a lot more at ssb.no
[05:43] <Ausir> well, the equivalent of kommune in Polish is gmina
[05:43] <Ausir> dittaeva: what else is there?
[05:44] <Ausir> oh, I can see there's a lot of it there
[05:44] <Ausir> although I don't think much of it is needed in a wikipedia article
[05:46] <dittaeva> yeah I just looked at it, you'd probably have to work a lot with the framework to make articles from the rest of the data
[05:46] <dittaeva> f. ex. [[pl:Leikanger]] is not only a kommune it is also a "township"
[05:47] <dittaeva> and Vik is a kommune and a lot of other places
[05:48] <Ausir> well, at pl: it says in Polish "Leikanger is a town and a commune in Norway..."
[05:48] <dittaeva> ok, :-)
[05:49] <Ausir> although if we had enough content or stats for the towns themselves, we could have separate articles for "Leikanger commune" and "Leikanger", I suppose
[05:50] <Ausir> like we have for Polish towns and communes
[05:50] <dittaeva> hm, it might be in there somewhere
[05:50] <dittaeva> at least there is inhabitant statistics for them, but I suppose you need something more
[05:51] <Ausir> well, we don't need to separate it for now :)
[05:52] <dittaeva> does tsca have the same username on wikipedia too, should I contact him on pl.?
[05:53] <Ausir> yes
[05:54] <Ausir> he's also active as sv: and da:
[05:54] <Ausir> *at
[05:54] <Ausir> he lives in Denmark
[05:55] <dittaeva> cool
[05:59] <Ausir> I suppose it's fairly easy to translate articles from one Scandinavian Wikipedia to another?
[06:00] <dittaeva> yes, its regularily done
[06:04] <Ausir> tsca is online at #pl.wikipedia
[06:20] * dittaeva has quit IRC (Read error: 110 (Connection timed out))
[07:00] * GerardM has quit IRC ("Chatzilla 0.9.68a [Firefox 1.0.6/20050716]")
[07:00] * Ausir has quit IRC (Read error: 104 (Connection reset by peer))
[07:00] * Ausir has joined #wikidatadiscussion
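
Note on the common format discussed at [02:54]-[02:58]: the idea of a CSV file with standardised field names, so that bots do not have to be modified for every data set, could look roughly like the sketch below. Nothing in it comes from the log itself. The field names (name, country, population, source), the sample row and the stub sentence are all assumptions made up for illustration. The source column reflects the point made at [02:15] that every bit of data should carry a reference to where it came from.

```python
import csv
import io

# Hypothetical standard field names. The log only says that field names
# "must be standardised"; it does not say what they should be.
REQUIRED_FIELDS = ("name", "country", "population", "source")


def read_towns(text):
    """Parse CSV text into a list of row dicts, rejecting files whose
    header lacks any of the agreed field names."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [field for field in REQUIRED_FIELDS if field not in header]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return list(reader)


def make_stub(row):
    """Render one row as a one-sentence stub that cites its source."""
    return "{} is a town in {} with {} inhabitants. (Source: {})".format(
        row["name"], row["country"], row["population"], row["source"]
    )


# Made-up sample data, not taken from any real statistics office.
SAMPLE = (
    "name,country,population,source\n"
    "Exampletown,Exampleland,1234,https://example.org/stats\n"
)

if __name__ == "__main__":
    for town in read_towns(SAMPLE):
        print(make_stub(town))
```

Because the header is checked once on load, a bot operator for any language only has to swap the sentence in make_stub; the parsing side stays the same for every data set that follows the agreed field names.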