Wikipedia database lock incident

From ProgWiki

Time

  • Started 2008-10-12 21:18 (UTC+8, Taipei time)
  • Ended before 2008-10-13 02:10 (UTC+8, Taipei time)
  • Lasted about 5 hours
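
The IRC log below gives its estimates in UTC ("16:00 UTC", "18:00 UTC") and quotes a Nagios line carrying the epoch timestamp 1223769600, while the times above are in Taipei time. The following is a minimal sketch of the conversion using only Python's standard library; the variable names are illustrative and not part of the original page.

```python
from datetime import datetime, timedelta, timezone

# Taipei time is a fixed UTC+8 offset (no daylight saving time in 2008).
TAIPEI = timezone(timedelta(hours=8))

# Start and end of the lock as reported above, in Taipei time.
start = datetime(2008, 10, 12, 21, 18, tzinfo=TAIPEI)
end = datetime(2008, 10, 13, 2, 10, tzinfo=TAIPEI)

# The same instants in UTC, for comparison with the IRC log.
start_utc = start.astimezone(timezone.utc)  # 2008-10-12 13:18 UTC
end_utc = end.astimezone(timezone.utc)      # 2008-10-12 18:10 UTC

# Duration of the lock: 4 hours 52 minutes, i.e. "about 5 hours".
duration = end - start

# The epoch timestamp from the Nagios line quoted in the log.
nagios_utc = datetime.fromtimestamp(1223769600, tz=timezone.utc)
nagios_taipei = nagios_utc.astimezone(TAIPEI)  # 2008-10-12 08:00 Taipei time

print(start_utc, end_utc, duration, nagios_utc, nagios_taipei, sep="\n")
```

Read this way, the "18:00 UTC" estimate in the log corresponds to 02:00 Taipei time on 13 October, which matches the end time given above.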

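Symptoms

  • When editing an article, one of the following messages appeared:
  1. "Warning: The database has been locked for maintenance, so you will not be able to save your edits right now. You may wish to copy the relevant text into a text file and save it, then try editing again later." (zh.wikipedia.org, translated from the Chinese interface message)
  2. or "Database locked" (commons.wikimedia.org)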

Affected wikis

bgwiki
bgwiktionary
commonswiki
cswiki
dewiki
enwikiquote
enwiktionary
eowiki
fiwiki
idwiki
itwiki
nlwiki
nowiki
plwiki
ptwiki
svwiki
thwiki
trwiki
zhwiki

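Cause

  1. According to the IRC log, a bot stuck in a tight loop trying to patrol articles on enwiktionary was flooding the DB error logs, and the disk of the s2 master (ixia) filled up. (The bot on commons.wikimedia.org mentioned in the log was only doing read actions.) TimStarling describes the full disk as a recurring issue with MySQL 4.0.
  2. Was someone also upgrading MySQL at the same time? The IRC log below does not say so. What TimStarling actually asks for is two preventive measures:
     • a script which checks which files are in use, adds an appropriate safety margin, then runs a PURGE MASTER LOGS query to delete any logs before that
     • a patch for mysql that makes transactions block when they try to write to the binlog but the disk is full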
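
The first measure TimStarling asks for in the log (check which binlog files are in use, add a safety margin, then run PURGE MASTER LOGS) can be sketched as below. This is only an illustration under stated assumptions, not the script Wikimedia actually deployed: the function names and the `ixia-bin.NNN` file names are hypothetical, the list of master logs is assumed to come from `SHOW MASTER LOGS`, and the file each replica is reading is assumed to come from the `Master_Log_File` column of `SHOW SLAVE STATUS`. The sketch only decides what to purge and builds the statement; it does not connect to a database.

```python
from typing import Iterable, Optional, Sequence


def purge_target(master_logs: Sequence[str],
                 replica_files: Iterable[str],
                 safety_margin: int = 2) -> Optional[str]:
    """Return the oldest binlog file to KEEP, or None if nothing is safe to purge.

    master_logs   -- binlog file names on the master, oldest first (assumed input)
    replica_files -- the binlog file each replica is currently reading (assumed input)
    safety_margin -- number of extra files to keep before the oldest one in use
    """
    index = {name: i for i, name in enumerate(master_logs)}
    in_use = [index.get(name) for name in replica_files]
    # Refuse to purge without replica information or with an unknown file name.
    if not in_use or None in in_use:
        return None
    keep_from = min(in_use) - safety_margin
    if keep_from <= 0:
        return None  # nothing older than the margin exists
    return master_logs[keep_from]


def purge_statement(target: Optional[str]) -> Optional[str]:
    """Build the PURGE statement; logs older than `target` are deleted, `target` is kept."""
    if target is None:
        return None
    if "'" in target or "\\" in target:
        raise ValueError("unexpected character in binlog file name")
    return "PURGE MASTER LOGS TO '%s'" % target


# Hypothetical example: ten binlogs, two replicas reading files 008 and 006.
logs = ["ixia-bin.%03d" % n for n in range(1, 11)]
print(purge_statement(purge_target(logs, ["ixia-bin.008", "ixia-bin.006"])))
```

With a margin of two files, the slowest replica (on file 006) leads to keeping everything from file 004 onward. Returning `None` whenever the replica state is missing or unrecognised is a deliberately conservative choice, since purging a log that a replica still needs would break replication. The log also mentions the MySQL variable `expire_logs_days` as an alternative, which according to TimStarling was only added in 4.1.0.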

IRC Log

#wikimedia-tech
	-->|	YOU (p1ayer) have joined #wikimedia-tech
	=-=	Topic for #wikimedia-tech is “Wikimedia servers administration | Disk full on ixia | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	=-=	Topic for #wikimedia-tech was set by TimStarling on 2008-10-12 09:38:34 PM
	<Platonides>	strange that talk namespaces were accessing commons and other not :P
	<Platonides>	some wiki-gnome...
	-->|	mik____ (n=chatzill@p4FCF7439.dip.t-dialin.net) has joined #wikimedia-tech
	-->|	Muscari (n=chatzill@p57B0520B.dip.t-dialin.net) has joined #wikimedia-tech
	-->|	buecherwuermlein (n=buecherw@wikipedia/buecherwuermlein) has joined #wikimedia-tech
	|<--	__aib has left irc.freenode.net (Connection timed out)
	-->|	Gnu1742 (n=chatzill@wikipedia/gnu1742) has joined #wikimedia-tech
	-->|	Euku (n=Euku@wikipedia/Euku) has joined #wikimedia-tech
	<siebrand>	now that many of you cannot edit your favourite WMF wikis, please take a minute to update your language's localisation at http://translatewiki.net.
	<TimStarling>	yes, especially all those messages that were recently added
	<TimStarling>	get in before I revert them ;)
	* siebrand	bites TimStarling.
	<p1ayer>	DataBase Lock? at zh.wikipedia.org , start-time 21:18 , ex, see http://zh.wikipedia.org/w/index.php?title=MySQL&action=edit
	* lin4h	/exit
	|<--	lin4h has left irc.freenode.net ("KVIrc 3.4.0 Virgo http://www.kvirc.net/")
	<Platonides>	siebrand, specially for updating mediawiki:readonlwarning ;)
	<Platonides>	http://commons.wikimedia.org/w/index.php?title=Image:Ramslogo2.jpeg&action=edit&uselang=en
	<siebrand>	Platonides: I assume that's of type 'ignored'?
	-->|	Lijealso (i=Lijealso@wikipedia/Lijealso) has joined #wikimedia-tech
	<Platonides>	sorry?
	-->|	olei_ (n=olei@89.246.161.167) has joined #wikimedia-tech
	<Mike_lifeguard>	Platonides: I was gonna fix that, but... :D
	<Platonides>	I know, you can't fix the readonly message because it's read only xD
	=-=	TimStarling has changed the topic to “Locked wikis: http://noc.wikimedia.org/conf/s2.dblist | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<siebrand>	Platonides: It's localised for 127 languages. Not that bad.
	<Rjd0060>	the db lock template isn't rendering properly though
	<Rjd0060>	(on commons at least)
	* Platonides	proposes moving this conversation to #wikimedia-commons
	<TimStarling>	looks like some bot is in a tight loop trying to patrol articles on enwiktionary
	<HardDisk_WP>	hrhr
	<TimStarling>	flooding the DB error logs
	<Platonides>	block it?
	<TimStarling>	too hard
	<TimStarling>	I'll just leave it
	-->|	Aeternus (n=Massimil@wikimedia/massimiliano-lincetto) has joined #wikimedia-tech
	* DarkoNeko	stops his itnerwiki bot ._.
	-->|	Herr_X (n=chatzill@m83-188-205-116.cust.tele2.se) has joined #wikimedia-tech
	<Herr_X>	what happens?
	<Herr_X>	on wp
	<Platonides>	I have a bot running on commons, but it's just doing read actions...
	<Platonides>	Herr_X, see topic
	-->|	NjaelkiesL (n=chatzill@84-217-71-174.tn.glocalnet.net) has joined #wikimedia-tech
	<Platonides>	s2 master run out of disk space
	|<--	Tommy6_ has left irc.freenode.net ("Leaving...")
	<TimStarling>	someone make me a master log purge script that runs from cron.d and installs via a debian task package
	<TimStarling>	if you want to stop this kind of thing from happening
	<TimStarling>	seems like a simple enough thing...
	<TimStarling>	oh yeah, and a safe shutdown patch for mysql
	<TimStarling>	block on binlog write
	<TimStarling>	that'd be good too
	-->|	Myrrdin (n=Myrrdin@ip-90-186-18-186.web.vodafone.de) has joined #wikimedia-tech
	=-=	Merlissimo is now known as Guest3816
	<Platonides>	a script which just deletes old files?
	=-=	Myrrdin is now known as Merlissimo
	-->|	Herr_X_ (n=chatzill@m83-188-233-160.cust.tele2.se) has joined #wikimedia-tech
	<TimStarling>	no, a script which checks which files are in use, adds an appropriate safety margin, then runs a PURGE MASTER LOGS query to delete any logs before that
	<TimStarling>	and a patch for mysql that makes transactions block when they try to write to the binlog but the disk is full
	|<--	Wegge has left irc.freenode.net (Remote closed the connection)
	<Platonides>	there's probably a reason they don't block on that
	|<--	Jan_eissfeldt has left irc.freenode.net ("http://www.mibbit.com ajax IRC Client")
	|<--	hwaimi has left irc.freenode.net (Read error: 113 (No route to host))
	<MinuteElectron>	Platonides: why
	* Platonides	expects that mysql does check return values
	<Platonides>	MinuteElectron, no idea why they might want to do that
	<Platonides>	the point of replication is precisely to keep copies of the db
	<TimStarling>	do you think, at 2am on a sunday night, I care what the reason is?
	<TimStarling>	I just want it to block
	<TimStarling>	I don't care what it breaks
	|<--	werdan7 has left irc.freenode.net (Connection timed out)
	* Platonides	realises that it's 2am for Tim
	-->|	Flo_1 (i=593307ec@Wikipedia/Flo-1) has joined #wikimedia-tech
	-->|	GluonBall (n=chatzill@xdsl-92-252-55-174.dip.osnanet.de) has joined #wikimedia-tech
	-->|	NauarchLysander (n=christop@dslb-084-063-245-105.pools.arcor-ip.net) has joined #wikimedia-tech
	<MinuteElectron>	Platonides: and if the disk is full it cannot write to the binlog and therefore replication will fail
	<Platonides>	why not use mysql variable expire_logs_days ?
	-->|	Vito (n=Vituzzu@unaffiliated/vito) has joined #wikimedia-tech
	<Platonides>	MinuteElectron, exactly
	<Platonides>	why keep allowing writes at the master?
	<MinuteElectron>	well
	<NauarchLysander>	How is it possible that there is a database problem in the German Wikipedia because the disks are full? Something like this is predictable and appropriate measures are taken.
	<MinuteElectron>	ok
	<TimStarling>	Platonides: added in 4.1.0, for one thing
	<Platonides>	or at least make it configurable
	<Platonides>	are we still on 4.0?
	* Platonides	doesn't remember the minor version
	<TimStarling>	yes
	<TimStarling>	NauarchLysander: sucks doesn't it?
	<NauarchLysander>	TimStarling: Yes, really. If it would not impede my work I would not mind and think it funny that the Wikimedia tech people are now busy plugging in new hard disks...
	<TimStarling>	nobody is plugging in new hard disks, it's not that sort of disk full
	<NauarchLysander>	TimStarling: Did I get something wrong? Which disks are full?
	<TimStarling>	this is a recurring issue with mysql 4.0 that we've had a number of times before
	<TimStarling>	having nagios monitor disk space was meant to fix it
	|<--	Herr_X has left irc.freenode.net (Read error: 110 (Connection timed out))
	|<--	KameraadPjotr has left irc.freenode.net (Read error: 110 (Connection timed out))
	<TimStarling>	but apparently nobody noticed the warning and it never went critical
	<Platonides>	the only recent nagios space warning i saw, was for adler a week ago
	|<--	Aeternus has left irc.freenode.net ("«Goodbye cruel world. I'm leaving you today. Goodbye, goodbye, goodbye...»")
	<NauarchLysander>	TimStarling: Sucks. If these things happened before, people at Wikimedia should learn from it!
	<TimStarling>	hmm
	<Platonides>	but i'm not up 24/7 :P
	<TimStarling>	maybe it did go critical, the logs don't go back that far...
	<effeietsanders>	NauarchLysander: lets first get things fixed, right? :P
	<effeietsanders>	and maybe do a Postmortem afterwards :P
	-->|	Church_of_emacs (n=Church_o@wikipedia/Church-of-emacs) has joined #wikimedia-tech
	<TimStarling>	anyone have IRC logs? did it log a critical notice on IRC?
	<NauarchLysander>	effeietsanders: ok, no problem. But, after all, I'm not holding them from doing their work, I'm just saying they apparently did not learn from past mistakes, that's all... ;)
	<TimStarling>	[1223769600] CURRENT SERVICE STATE: ixia;Disk space;CRITICAL;HARD;3;DISK CRITICAL - free space: /a 655 MB (0% inode=99%):
	<henna>	TimStarling: time?
	<TimStarling>	ok, so it was critical
	<henna>	ah you've found logs :)
	<TimStarling>	well, that shows that it was in the critical state
	<TimStarling>	so there would have been an alert
	<TimStarling>	<effeietsanders> NauarchLysander: lets first get things fixed, right? :P
	<TimStarling>	I'm just waiting for it to copy
	<TimStarling>	rsync says ETA 30 minutes
	<effeietsanders>	aaah, ok :)
	<NauarchLysander>	k
	<effeietsanders>	TimStarling: hmm, you're in down under right? :P
	<TimStarling>	yes
	|<--	Guest3816 has left irc.freenode.net (Connection timed out)
	<effeietsanders>	must be like very late to you :o /me hands cup of coffee
	<Schildkroete>	someone should update this: Sorry, emergency maintenance, should be back around 16:00 UTC
	<effeietsanders>	Schildkroete: in 30min it is 16UTC
	<effeietsanders>	remember you're in UTC+2
	<--|	Vito has left #wikimedia-tech
	<Schildkroete>	oh I'm sorry
	<nefesfgehd>	it's still daylight's saving time in germany
	<nefesfgehd>	until the last sunday in october, IIRC
	-->|	tombom (i=tombom@82.26.207.174) has joined #wikimedia-tech
	<Platonides>	1223769600 = Sunday October 10/12/08 02:00:00
	<NauarchLysander>	That's quite annoying. Always thinking about which time we have here in Germany in comparison to other places where they do not have switching times.
	<effeietsanders>	the nice thing is, UTC is no place, it is a stable timezone :)
	<nefesfgehd>	in china it's the opposite annoyance
	<NauarchLysander>	Yes, perhaps I should paste a UTC clock on my desktop.
	<TimStarling>	I'm not really sure if we need more technology here
	<MwpnlBot>	NauarchLysander: it's quite easy. Daylight saving=UTC+2, winter=UTC+1 :P
	<TimStarling>	I think maybe we just need procedures
	<nefesfgehd>	we need more testing
	<TimStarling>	not sure what you mean by that
	<henna>	TimStarling: my logs are probably useless, they're ignoring nagios-wm
	<Platonides>	would it be a big annoyance updating mysql to add expire_logs_days ?
	<Platonides>	Either by updating to mysql4.1 or to a custom build (i think they already are) which backports that option
	-->|	PatriciaR (n=chatzill@wikimedia/PatriciaR) has joined #wikimedia-tech
	<nefesfgehd>	TimStarling: it was supposed to be a generic, politician-like, know-nothing answer
	<nefesfgehd>	supposedly funny, as if i'd said "we need more vespene gas"
	<nefesfgehd>	this whole db lock thing is getting to me
	<PatriciaR>	is there an estimated time to finish this maintenance and if yes, could that be on the topic? :)
	-->|	MZMcBride (i=mzmcbrid@wikipedia/MZMcBride) has joined #wikimedia-tech
	<dbenzhuser>	16:00 UTC ... whatever that is for you. Just about 30 min from now.
	-->|	cs97009 (i=4c1fd3cf@gateway/web/ajax/mibbit.com/x-c32095e5d6a4560d) has joined #wikimedia-tech
	<PatriciaR>	dbenzhuser: thanks :)
	|<--	PatriciaR has left irc.freenode.net (Client Quit)
	<TimStarling>	serious question:
	<TimStarling>	what would you all have done if I wasn't here and dealing with this?
	<TimStarling>	the site would be mostly unusable by now
	<jeblad>	Imagine 2000 people screaming
	<MZMcBride>	s/wasn't/weren't
	<henna>	TimStarling: not sure how possible, but at least have nagios send sms etc to a designated on-duty-person?
	<nefesfgehd>	i imagine there would be an angry mob with pitchforks in front of jimbo's apartment by now
	<DarkoNeko>	( jeblad ) Imagine 2000 people screaming -> only 2000 ? :o
	<jeblad>	The one editing
	<dbenzhuser>	Isn't it a good feeling to be needed? (except for the time ...)
	<TimStarling>	would anyone have tried to contact a sysadmin by phone?
	<jeblad>	..phone home?
	<MZMcBride>	Probably not. Do you usually like random Internet strangers calling you, Tim?
	<DarkoNeko>	their phone number is public ? :o
	<effeietsanders>	TimStarling: hmm, good question, i dont think i have any phone numbers :P
	<MZMcBride>	And most of the people in here and on the site wouldn't know the difference between a real issue and fake one.
	* jeblad	imagined TimStarling & co as humans connected to the Matrix
	<Platonides>	it's usually said 'call the office'
	<effeietsanders>	i prolly would have tried to poke any staff member
	<effeietsanders>	or domas :P
	<Platonides>	but i don't think there's anyone at the office on Sunday
	<effeietsanders>	(read: cary or domas)
	=-=	Church_of_emacs is now known as Church_of_away
	-->|	Lockal (n=lockal@wikipedia/Lockal) has joined #wikimedia-tech
	<jeblad>	Strange thing, right now journalists are writing stories about problems with Wikipedia
	-->|	EncycloPetey (i=47caff88@gateway/web/ajax/mibbit.com/x-56c113c357290ac9) has joined #wikimedia-tech
	<effeietsanders>	but yeah, it might be a good idea to work out a system to deal with it :)
	<effeietsanders>	jeblad: ?
	<jeblad>	And that it probably will be the end of information as we know it..
	<effeietsanders>	link?
	-->|	Yerul (n=Vlad@wikipedia/Yerul) has joined #wikimedia-tech
	<jeblad>	and they forget the whole bank crisis..
	<EncycloPetey>	Has anyone else noticed that the "You have new messages" isn't going away on Wiktionary? I thought maybe someone had posted again to my page, but that hasn't happened.
	<EncycloPetey>	I don't know whether this is happening just there on on other WM projects as well.
	<EncycloPetey>	**er, on the English Wiktionary, I suppose I should say. **
	<Rjd0060>	You'll also notice you can't edit the wiki, EncycloPetey
	<effeietsanders>	dont have it on nlwiki, EncycloPetey
	<Rjd0060>	They're working on some issues
	<Rjd0060>	It's fine :)
	<atglenn>	I supposed as en wikt is in the list of locked dbs
	<atglenn>	that would be it, yeah.
	<Mike_lifeguard>	The db is locked, and the trigger for the banner is in the db, so it's not gonna change
	<effeietsanders>	and not on enwikt either
	<Platonides>	EncycloPetey, isn't enwiktionary readonly?
	<EncycloPetey>	I've never seen that happen before... How long has it been locked down, and what is the prognosis for unlocking?
	<Wuzur|rhn>	at 16 UTC it should be unlocked
	<Rjd0060>	*around 16:00
	<EncycloPetey>	Both the English and French Wiktionaries are closing in on one million entries, so locking right now for either could lead to unpleasantness.
	<Rjd0060>	not at ;)
	<MZMcBride>	EncycloPetey: The French Wiktionary isn't locked.
	<EncycloPetey>	I had noticed earlier this week that the French wiktionary was not gaining new articles as they had been in previous days.
	<Platonides>	it's a conspiracy so the French arrive easrlier ;)
	<EncycloPetey>	For three days only 100 new entries...very uncharacteristic for them.
	<EncycloPetey>	But 4000 new entries in the last 24 hours
	<Mike_lifeguard>	what server has the db for global groups stuff?
	<jeblad>	As a side note; some time back I worked for a power company and we had just shut off the current to about 4000 households. Then someone called and asked "Hey, my coffe isn't ready, can you turn the power back on?" So we tried to carefully explain the situation, and the caller waited ten seconds, thinking, then he replied "so you turn the power back on?"
	<jeblad>	Perhaps it should be possible to get a message out to the readers when there are a problem with the databases.
	<Platonides>	they will see a message when trying to edit
	<henna>	jeblad: why, reading isn't affected
	<MZMcBride>	How is one to create panic if every page doesn't have a large blinking banner?
	<EncycloPetey>	But some folks will see odd behavior even if they don't edit...like perpetual "new messages".
	<henna>	EncycloPetey: small number off ppl that wil have that that won't edit
	<Platonides>	then, when visiting your talk and not able to remove the "new messages" banner, an alert should be shown
	<jeblad>	The problem is, when something locks down the site for whattever reason they should get something more usable than just "The site experiences technical difficulties".
	<EncycloPetey>	Henna, so confusing a "small" number of people is OK?
	<Platonides>	jeblad, you can customize the message
	<Platonides>	there is a line set when locking
	<henna>	EncycloPetey: depending on the amount of work involved in unconfusing them while at the same time not confusing others who won't notice anyways, it might be Ok to confu them
	<henna>	EncycloPetey: if it's very important to you, write the code and/or convince somebody it's necessary
	<EncycloPetey>	I've tries convincing people that much bigger problems were worth doing something about, but nobody listens.
	<jeblad>	I think it is some line where you give sufficient information that you customers will be satisfied
	<nagios-wm>	Disk space on ixia is OK: DISK OK
	<EncycloPetey>	English Wiktionary hasn't had an XML dump since June.
	<EncycloPetey>	This is at odds with not wanting sites to run live mirrors, yet wanting to provide free content.
	<dbenzhuser>	DISK OK sounds good ..
	<EncycloPetey>	Since June, English Wiktionary has grown by more than 12%, counting only *new* entries. So anything based off the June XML dump is nearly worthless at this point.
	<jeromyu>	Still "Sorry, emergency maintenance, should be back around 16:00 UTC" on zh.wikipedia :P
	<MZMcBride>	EncycloPetey: One thing at a time, eh? :-)
	<Alphos>	jeromyu : "should" :p
	<EncycloPetey>	We can't run statistical updates or cleanup.
	<Alphos>	"around" :p
	<MZMcBride>	The dumps had issues related to hardware that, as far as I know, are now resolved.
	<EncycloPetey>	MZM, but since June it's been "no things at a time" for us. So, why haven't we had an XML dump?
	<MZMcBride>	Read what I just wrote?
	<nagios-wm>	MySQL on ixia is OK: TCP OK - 0.000 second response time on port 3306
	<EncycloPetey>	Yes, and it doesn't explain why no dump has yet occurred.
	<Platonides>	EncycloPetey, new xml dumps are on the way
	<EncycloPetey>	Part of the problem is that we haven't gotten feedback on our site from *anyone* .
	<EncycloPetey>	There is no means (as in the current situation) for disseminating the information to the people affected.
	<EncycloPetey>	...or if there is a means, then no one is using it.
	<nagios-wm>	MySQL on db8 is OK: TCP OK - 0.000 second response time on port 3306
	<jeromyu>	just a question, zh.wikipedia is also on s2?
	<Platonides>	jeromyu, see list on topic
	<Platonides>	yes, it is
	<jeromyu>	thx
	<str4nd>	(Can't contact the database server: No working slave server: Unknown error)
	<str4nd>	(Can't contact the database server: Too many connections (10.0.0.241))
	<TimStarling>	yeah, it might be a bit overloaded for a while
	<TimStarling>	I'll transfer some load to db13
	<str4nd>	Yeah.. I can belive that
	<jeblad>	Nice work, I even got ime to talk to other people! :D
	<Platonides>	xD
	<jeblad>	I think the tech people do a great job
	<jeblad>	Thing will fail from time to time
	<jeblad>	But most of the time it gets fixed very fast
	<Platonides>	we were lucky TimStarling was here
	<Platonides>	instead of being sleeping
	<Platonides>	3am for him
	<str4nd>	is upload.wikimedia.org going to work soon also?
	<str4nd>	yay
	<Platonides>	str4nd, isn't it working?
	* EncycloPetey	needs food...badly
	<TimStarling>	dewiki might have to stay r/o if db8 can't handle the load
	<str4nd>	Platonides: It wasn't
	<Platonides>	hmm, No working slave server..
	<Platonides>	wait until there is less load on db8
	<TimStarling>	yeah, you all need to stop using the servers now, they need a break
	<niabot>	Tell that our discussion hungry army of writers XD
	* str4nd	gives coffee and biscuits to servers
	<atglenn>	food is a good idea
	<juliano>	SNAFU, right?
	<boivie>	Is anyone here? I need a developer,
	<Platonides>	boivie, read the topic
	<str4nd>	:DD
	<Platonides>	i doubt your need is really urgent
	<boivie>	it is not
	<boivie>	wow, You've got svwiki working again
	<boivie>	thanks!
	<str4nd>	don't lie to me!
	<TimStarling>	no, it's still screwed
	<buecherwuermlein>	:S
	<boivie>	But now it's readable at least.
	<niabot>	auf die 10 folgt die 11 000 XD
	<TimStarling>	trust me, it's screwed
	<TimStarling>	ok *now* it's readable
	<TimStarling>	maybe we can get dewiki up
	<TimStarling>	the others will have to stay r/o
	<TimStarling>	no promises
	<jeblad>	any estimates?
	<TimStarling>	it'll take another hour and a half to do another copy
	<TimStarling>	give me a minute
	<TimStarling>	not ready for that yet
	<Kanonkas>	how long is commons.wiki going to be locked? :/
	<Mike_lifeguard>	let the man work
	<siebrand>	Kanonkas: it is being worked on. It'll be ready when it is ready. Current estimate is 90 minutes, +/- 1 day.
	<Kanonkas>	O.O
	* Kanonkas	moves to en.wp
	<siebrand>	Kanonkas: alternatively you can also work on the 'nn' localisation on http://translatewiki.net.
	<MwpnlBot>	could someone change "Sorry, emergency maintenance, should be back around 16:00 UTC" to a more accurate message? "Due to unplanned maintenance editing is probited for at least the coming hours" or something.
	<jeromyu>	unplanned maintenance......................... :p
	<MwpnlBot>	*prohibited
	=-=	gribeco has changed the topic to “Unplanned maintenance on S2 cluster -- Locked wikis: http://noc.wikimedia.org/conf/s2.dblist, read-only until 18:00 UTC at least”
	<gribeco>	better ?
	=-=	Platonides has changed the topic to “Unplanned maintenance on S2 cluster -- Locked wikis: http://noc.wikimedia.org/conf/s2.dblist, read-only until 18:40 UTC at least”
	=-=	Platonides has changed the topic to “Unplanned maintenance on S2 cluster -- Locked wikis: http://noc.wikimedia.org/conf/s2.dblist, read-only until 18:00 UTC at least”
	<MwpnlBot>	gribeco: I ment the notice on the Wikipedia's :-)
	<gribeco>	MwpnlBot: ah, I haven't seen that one
	<jeblad>	The message comes from the database, right?
	<MinuteElectron>	no
	<MwpnlBot>	no
	<MinuteElectron>	from initializesettings.php
	<jeblad>	And the database is locked?
	<gribeco>	the sitenotice is in the db, no?
	<MwpnlBot>	Thanks to whoever changed it :-)
	<str4nd>	"new estimate ~18:30 UTC"
	<nagios-wm>	Apache on srv161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
	<nagios-wm>	MySQL on db8 is CRITICAL: Connection refused
	<Calandrella>	What caused the error which caused the locking?
	<Mike_lifeguard>	ran out of disc space on s2
	<Calandrella>	Mike_lifeguard: thanks
	<nagios-wm>	MySQL status on db16 is CRITICAL: CRITICAL: Running threads = 403 (75): Connected threads = 446 (1000)
	<CWii>	TimStarling, Good luck on fixing the dbs :)
	<str4nd>	kind of offtopic, but http://img.4chan.org/b/src/1223831234999.jpg (safe for work)
	<nagios-wm>	Host thistle is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
	<nagios-wm>	MySQL on thistle is OK: TCP OK - 0.000 second response time on port 3306
	<TimStarling>	welcome thistle
	<CWii>	TimStarling, New server?
	<Platonides>	not new
	<Platonides>	but perhaps it was down
	<CWii>	Oh :)
	<CWii>	Yeah, it's not new.
	<CWii>	According to wikitech.
	<HardDisk_WP>	str4nd, LOL
	<nagios-wm>	Disk space on thistle is CRITICAL: Connection refused by host
	<CWii>	Unless it was upgraded it only has 438 GB of HDD space
	<str4nd>	disk space refused by host :)
	<HardDisk_WP>	str4nd, you happen to have a backup of the image? it's 404 now :X
	<str4nd>	HardDisk_WP: ye \o
	<str4nd>	http://mirror.kapsi.fi/1223831234999.jpg
	<wmrwiki>	大家好
	<wmrwiki>	大家好
	<jeromyu>	halo?
	<str4nd>	wmrwiki: Could you write in English?
	<CWii>	o.0
	<jeromyu>	he said halo to everybody
	<wmrwiki>	oh
	<Tubarao>	as in xbox? ;)
	<CWii>	:)
	<jeromyu>	:D
	<HardDisk_WP>	str4nd, thx :D
	<str4nd>	HardDisk_WP: np :)
	<Calandrella>	Is enwp working=
	<Calandrella>	How?
	<Calandrella>	Why?
	<str4nd>	That's in s1. Only s2 is locked.
	<nagios-wm>	Host will is DOWN: CRITICAL - Host Unreachable (208.80.152.184)
	<nagios-wm>	Disk space on thistle is OK: DISK OK
	<Dammit>	disk space ok? :o
	<nagios-wm>	Host will is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms
	<malafaya>	@replag
	<malafaya>	whats this s2 maintenance? did something break?
	<CWii>	malafaya, Disk on one of the db servers got full
	* CWii	thinks
	<CWii>	Please correct me if I'm wrong!
	<malafaya>	thats a nasty one
	<CWii>	Ya.
	<CWii>	Tim is working on it atm.
	<nagios-wm>	MySQL on db15 is OK: TCP OK - 0.004 second response time on port 3306
	<Philip_zhwp>	All your bug are belong to wikimedia. - -
	<CWii>	:)
	<jeromyu>	:d
	<jeromyu>	:D
	<nagios-wm>	MySQL on db8 is OK: TCP OK - 0.000 second response time on port 3306
	<CWii>	"16:54 Tim: copied mysqld binaries from db11 to db15 and thistle. Plan for thistle is to use it for s2a." --WikiTech, Server admin log
	* CWii	is keeping the people informed
	<nagios-wm>	MySQL status on db8 is OK: OK:
	<TimStarling>	db8/db15 warming up
	* CWii	gets the blankets
	=-=	Mike_lifeguard has changed the topic to “Unplanned maintenance on S2 cluster -- Locked wikis: http://noc.wikimedia.org/conf/s2.dblist - read-only until 18:00 UTC at least”
	<Mike_lifeguard>	so you can click the link :)
	<TimStarling>	I didn't say "at least"
	<CWii>	:)
	<TimStarling>	I still have 12 minutes
	<Mike_lifeguard>	I only changed punctuation
	<nagios-wm>	MySQL status on db16 is OK: OK:
	<_mary_kate_>	who trashed the topic?
	<HardDisk_WP>	Not me
	<CWii>	Not me
	<HardDisk_WP>	*lol*
	<CWii>	:)
	<HardDisk_WP>	hi _mary_kate_ btw
	<Mike_lifeguard>	gribeco, looks like
	=-=	_mary_kate_ has changed the topic to “Unplanned s2 maintenance until 18:00 at least (locked wikis: http://noc.wikimedia.org/conf/s2.dblist) | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	=-=	_mary_kate_ has changed the topic to “Unplanned s2 maintenance until 18:00 UTC at least (locked wikis: http://noc.wikimedia.org/conf/s2.dblist) | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<gribeco>	_mary_kate_: yep, we were getting the usual rush of questions, and the topic wasn't dire enough
	<nagios-wm>	Apache on srv187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
	<CWii>	:O
	<gribeco>	"Wikimedia meltdown -- awaiting emergency funds from G7 meeting"
	<str4nd>	t minus 7 minutes!
	<Philip_zhwp>	your guys..
	<CWii>	It looks like Apache on srv187 went down. There isn't any CPU or network usage according to Ganglia
	<CWii>	But that's not the most of our problems :()
	<malafaya>	it's 18:00 UTC and still no en.wikt. :)
	<PhilipXD>	18:00 UTC
	<PhilipXD>	2:00 CST
	<CWii>	malafaya, My friend. Patenice.
	<malafaya>	CWii, no problem. i was just teasing Tim cause he didn't say "at least" :D
	<CWii>	:D
	<LA2>	on #wikipedia-sv the channel topic says 20.30 (18.30 UTC)
	<malafaya>	thats what it says when you try to edit en.wikt too
	<Mike_lifeguard>	someone padded the estimate... good boy
	* CWii	in yer Ganglia, watchin yer servers
	=-=	PhilipXD has changed the topic to “Unplanned s2 maintenance until ??:00 UTC at least (locked wikis: http://noc.wikimedia.org/conf/s2.dblist) | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<PhilipXD>	..
	<PhilipXD>	just a test..
	<CWii>	..
	<JulianC93>	...
	<CWii>	PhilipXD, This isn't the place to test
	<CWii>	Fix it please.
	=-=	PhilipXD has changed the topic to “Unplanned s2 maintenance until 18:00 UTC at least (locked wikis: http://noc.wikimedia.org/conf/s2.dblist) | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<PhilipXD>	XD
	=-=	CWii has changed the topic to “Unplanned s2 maintenance until 18:30 UTC at least (locked wikis: http://noc.wikimedia.org/conf/s2.dblist) | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<CWii>	You have thiry minutes!
	<CWii>	Or it's FREE!
	<gregor>	what about a bot, that change the topic on every channel related to wikipedia? :S
	<CWii>	Why?
	<gribeco>	gregor: bad idea
	<TimStarling>	ok, going to r/w on s2 (except dewiki)
	<gribeco>	\o/
	<CWii>	Yay!
	<malafaya>	and what about a bot that adds an extra 30 minutes 5 minutes before deadline? :)
	=-=	CWii has changed the topic to “Unplanned s2 maintenance, Read write coming back shortly (locked wikis: http://noc.wikimedia.org/conf/s2.dblist) | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	* JulianC93	picks up a very large trout and slaps hurricanes around a bit with it
	* JulianC93	sends a Category 1 hurricane in hurricanes's direction
	* JulianC93	sends a Category 2 hurricane in hurricanes's direction
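	<PhilipXD>	Database error
	<PhilipXD>	A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
	<PhilipXD>	(SQL query hidden)
	<PhilipXD>	from within function "User::invalidateCache". MySQL returned error "1223: Can't execute the query because you have a conflicting read lock (10.0.0.231)".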
	* JulianC93	sends a Category 3 hurricane in hurricanes's direction
	* JulianC93	sends a Category 4 hurricane in hurricanes's direction
	* JulianC93	sends a Category 5 hurricane in hurricanes's direction
	* JulianC93	whacks hurricanes with an adminship mop
	<JulianC93>	wtf, sorry
	<CWii>	lol
	<CWii>	DB error
	<malafaya>	:S
	<CWii>	Yay!
	* JulianC93	was setting ChatZilla preferences, and it went crazy
	<TimStarling>	should be right now
	<HardDisk_WP>	JulianC93, you never can beat what I did. I accidentally busted in here the WHOLE wikimedia error message in ALL languages!
	<[MarkW]>	TimStarling: where can we send the 'thank you for working on sunday'-notes to?
	<nagios-wm>	MySQL status on ixia is OK: OK:
	<HardDisk_WP>	[MarkW], I guess he has Paypal somewhere
	<TimStarling>	http://donate.wikimedia.org/
	=-=	CWii has changed the topic to “Read write back except on dewiki | Wikimedia servers administration | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<CWii>	TimStarling, Or is it back on dewiki too?
	<PhilipXD>	http://zh.wikipedia.org/w/index.php?title=Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%8A%80%E6%9C%AF&action=edit&section=new
	<PhilipXD>	Wikipedia has a problem
	<CWii>	fine for me
	<[MarkW]>	Why does everything has to be about money stacks? :P
	<HardDisk_WP>	CWii, no, dewiki still busted
	<CWii>	Okay, noted in topic
	<TimStarling>	PhilipXD: probably just a temporary overload
	=-=	YOU are now known as p1ayer_
	=-=	YOU are now known as p1ayer
	<MZMcBride>	Tim: Nice work today. :-)
	<malafaya>	de.wiki is still locked
	<CWii>	TimStarling, Yes. You really made a difference :)
	<CWii>	malafaya, We know.
	<Thehelpfulone>	why am i getting: A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
	<malafaya>	ok
	<Thehelpfulone>	(SQL query hidden)
	<Thehelpfulone>	from within function “User::addToDatabase”. MySQL returned error “1062: Duplicate entry 'Thehelpfulone' for key 2 (10.0.0.231)”.
	<Thehelpfulone>	CWii: ?
	<Wutsje>	Thank you, Tim.
	<SterkeBak>	Tim good work :-)
	<str4nd>	Funny, moving and new accounts are enabled in dewiki. :)
	<Thehelpfulone>	TimStarling: why do i get: A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
	<Thehelpfulone>	(SQL query hidden)
	<Thehelpfulone>	from within function “User::addToDatabase”. MySQL returned error “1062: Duplicate entry 'Thehelpfulone' for key 2 (10.0.0.231)”.?
	<HardDisk_WP>	Thehelpfulone, what are you trying to do?
	<Thehelpfulone>	HardDisk_WP: nvm, it's working now
	<HardDisk_WP>	k
	<PhilipXD>	http://zh.wikipedia.org/w/index.php?title=Special:RecentChanges&variant=zh-cn
	<PhilipXD>	still 13:31
	<Thehelpfulone>	or not :S
	<Thehelpfulone>	HardDisk_WP: http://nl.wikipedia.org/wiki/
	<Thehelpfulone>	i get: Database error
	<Thehelpfulone>	A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
	<Thehelpfulone>	(SQL query hidden)
	<Thehelpfulone>	from within function “User::addToDatabase”. MySQL returned error “1062: Duplicate entry 'Thehelpfulone' for key 2 (10.0.0.231)”.
	<Calandrella>	What? "Sorry! This site is experiencing technical difficulties.!
	<Calandrella>	I thought everything was solved...
	<Thehelpfulone>	yeah :P
	<TimStarling>	everything's very overloaded
	<Calandrella>	Yeah, it works!
	<Beau_>	is this planned?: db host="db15" lag="1019"
	<PhilipXD>	Special:RecentChanges, lag lag lag lag ...
	<Gnu1742_>	can someone estimate when dewp will be up agein?
	<CWii>	Gnu1742_, Not at this time.
	<Gnu1742_>	Ok... thank you
	<TimStarling>	dewiki can probably go r/w now
	<ChrisiPK>	weeeee!
	<Calandrella>	yes yes!
	<Calandrella>	TimStarling: Good work!
	<DaBPunkt>	Must it? It's so lovely quite at teh moment ;)
	<ChrisiPK>	DaBPunkt, the word is quiet :P
	<Liesel73>	DaBPunkt Time to work
	* Calandrella	imagines locking, but with the access to edit from administrators...
	<DaBPunkt>	ChrisiPK: Damn, I even looked it up and then still typed it wrong :(
	<Calandrella>	How wonderful not needing to care about vandalism for a while...
	=-=	str4nd has changed the topic to “Wikimedia servers administration | Up | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<Calandrella>	Or, rather, the time could be used to remove old not yet removed vandalism
	<ChrisiPK>	hehe DaBPunkt, you work towards a line like that for weeks and then you can't even use it to make a good impression ^^
	<malafaya>	#wikimedia-stewards
	<TimStarling>	db15 is non-google and slow, chronically lagged by about 40 minutes and rising
	<TimStarling>	db8 is fluctuating around 30s lag
	<TimStarling>	and ixia is doing 100% of dewiki's read load
	<TimStarling>	I've got db13 (non-replicating) doing some query groups
	<TimStarling>	it's not enough, and I certainly can't run the site on one less, for another copy
	<TimStarling>	I could split s2
	<TimStarling>	that might help
	<gregor>	what is non-google?
	<TimStarling>	google has a patch set for mysql 4.0
	<TimStarling>	look it up on google
	<Danny_B>	my sgml validator (firefox extension) says there's invalid char on http://cs.wiktionary.org/wiki/Speci%C3%A1ln%C3%AD:Recentchanges but does not say where and i am unable to find it. however i usualy get this error on my own pages when i accidentaly submit utf files with BOM or having 00h char in them
	<Danny_B>	if anybody has an idea how to find, where the invalid char is, it would be great
	<LA2>	tim, how are you doing? do you get enough sleep?
	<str4nd>	2008-09-28T23:24 « nagios-wm» Disk space on ixia is CRITICAL: DISK CRITICAL - free space: /a 16442 MB (3% inode=99%):
	<TimStarling>	kind of tired, been up all night now
	<Calandrella>	TimStarling: I think you should rest
	<Danny_B>	go have a rest
	<Calandrella>	Your health is a lot more important than WP
	<Danny_B>	we will guard here
	* Calandrella	can't help,but I think that TimStarling anyway should rest
	<LA2>	a broken server is one thing, but we can't afford a broken Tim
	<TimStarling>	lack of sleep never killed anyone
	<TimStarling>	well, almost never
	<LA2>	(famous last words)
	<VasilievVV>	o_O
	<Simetrical>	I bet lots of people got killed in car accidents and such due to lack of sleep.
	<manuel>	I think we could find sysadmins who got killed by their users after accidental deletion due to lack of sleep.
	<siebrand>	per http://nl.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb db13 replication lag is still growing, preventing bot edits. Is that on purpose?
	<Simetrical>	Of course, sysadmins like to increase replag deliberately just to annoy bot operators.
	<TimStarling>	db13 isn't replicating
	<nagios-wm>	MySQL on db15 is CRITICAL: Connection refused
	<siebrand>	Simetrical: I'm not annoyed. It could be that load should be kept as low as possible. Was simply asking... Didn't know it was a sensitive subject...
	<Simetrical>	Replag is, generally speaking, never intentional.
	<Simetrical>	Well, probably in some odd cases it is.
	<nagios-wm>	Apache on srv75 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
	<Simetrical>	But it can be safely assumed in the general case that it's not deliberate, and will be fixed when possible.
	<Simetrical>	Assuming the sysadmins know about it.
	<nagios-wm>	MySQL on thistle is CRITICAL: Connection refused
	<Tubarao>	TimStarling: maybe you know, but history doesn't seem to be working at nowiki
	<Platonides>	Tubarao, link?
	* Platonides	doesn't see anything strange on nowiki history pages
	<TimStarling>	RCL will be mostly broken though
	<TimStarling>	who's in the US?
	<Platonides>	sorry for my ignorance, what's RCL?
	<TimStarling>	recent changes linked
	<Tubarao>	well..
	<Platonides>	which are those sent to db13
	<Platonides>	so they will be lagged
	<Tubarao>	api seems to be good.. maybe it's just some cache problem..
	<Tubarao>	http://no.wikipedia.org/w/index.php?title=Wikipedia:Badwords&action=history vs http://no.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Wikipedia:Badwords&rvlimit=10&rvprop=timestamp|ids|user|comment&meta=userinfo&rvdir=older
	<nagios-wm>	Apache on srv75 is OK: OK - HTTP/1.1 301 Moved Permanently - 0.062 second response time
	<Platonides>	i see the same content
	<Platonides>	db15 joined to the pool
	<nagios-wm>	MySQL on db15 is OK: TCP OK - 0.001 second response time on port 3306
	<Platonides>	seems it will take it hard work to reduce its lag
	<TimStarling>	hey, brion!
	<TimStarling>	what's wrong with your phone?
	<brion>	yo
	<Platonides>	brion, take the cluster work
	<Platonides>	and let tim sleep
	<brion>	shitty reception
	<Platonides>	s2 wasn't too healthy :P
	<brion>	:(
	* brion	pokes at server admin log
	<TimStarling>	press refresh, one last edit from me
	<TimStarling>	db15 is catching up nicely now
	<TimStarling>	maybe it can go into rotation when it's done
	<LA2>	morning brion, wish I could say "good" morning
	<LA2>	the German word for database is Datenbank, so maybe this is related to the financial crisis
	<Platonides>	xDD
	<brion>	ok, i'll poke back after lunch
	<brion-lunch>	things seem reasonably stable for the meantime
	<Platonides>	have a rest, tim
	<Klara>	i have a problem with Special:RelatedChangesLinked in the de.wp. first i saw the edits which where made after editing was possible again. but after 15 minutes or so they disappeared from the RCL-page. until now i see only edits which were made before the downtime although i know there exist newer edits.
	<Platonides>	Klara, that page goes to a lagged server
	<Simetrical>	[081012 15:45:26] <TimStarling> RCL will be mostly broken though
	<Klara>	ok, thank you...
	<Klara>	i guess it's not clear when this lag will be fixed?
	<Platonides>	when db13 catches up
	<Klara>	ok, so i guess it's not clear when db13 catches up ;)
	* Platonides	sees that db15 is not longer lagged :)
	<Platonides>	Klara, you're right ;)
	<cs97009>	Are bot edits disabled or so on it, cs, nl, pl wikis? My bot waits on server lag.. Hoewver, if I look at the "Special:Contributions" page, I don't notice the delay message. Plus I tried editing page on cs.wiki using my bot id and the edit succeeded
	<cs97009>	any help would be appreciated.. Thanks !
	<Platonides>	cs97009m a slave is lagged
	<Platonides>	it's probably that
	=-=	brion-lunch has changed the topic to “You may see old or slightly broken results on some special pages on German and some other Wikipedias -- DB resync in progress | Wikimedia servers administration | Up | 50k | MediaWiki issues: #mediawiki | Wikimedia Foundation: #wikimedia | Wikimedia Toolserver: #wikimedia-toolserver | Don't ask "Is anyone here?" or say "I need a developer," just ask your question | Lots of text? http://rafb.net/paste/”
	<cs97009>	Sorry I got disconneced before..
	<cs97009>	any advise on my bot issue?
	<Beau_>	there is an option in pywikipediabot to ignore replag
	<cs97009>	thanks, my question is whether there is still any problem with DBs.. If so, I can just wait..
	<Simetrical>	As the topic says, there's a DB resync happening.
	<Simetrical>	I'm guessing it will be fixed sometime in the near future.
	<MZMcBride>	cs97009: The API can tell you the replag.
	<cs97009>	ok get it.. :)
	<MZMcBride>	http://nl.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb :-)
	<cs97009>	thanks, that helps !
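
Aside on the replag query MZMcBride links above: a minimal Python sketch of how a bot might read such a response and decide whether to wait. The sample XML is illustrative only, shaped after the `db host="db15" lag="1019"` line quoted earlier in the log; the wrapper elements and the 5-second threshold are assumptions, not something stated in the channel.

```python
import xml.etree.ElementTree as ET

# Illustrative sample, not real server output. Only the <db host=... lag=...>
# shape is taken from the log; the surrounding elements are assumed.
SAMPLE = """<api><query><dbrepllag>
  <db host="db8" lag="30" />
  <db host="db15" lag="1019" />
</dbrepllag></query></api>"""


def parse_replag(xml_text):
    """Return {host: lag_in_seconds} for every <db> element found."""
    root = ET.fromstring(xml_text)
    # iter("db") walks the whole tree, so the exact wrapper layout does not matter.
    return {db.get("host"): int(db.get("lag")) for db in root.iter("db")}


def should_wait(lags, max_lag=5):
    """True if any server is lagged by more than max_lag seconds (assumed threshold)."""
    return any(lag > max_lag for lag in lags.values())


lags = parse_replag(SAMPLE)
worst = max(lags, key=lags.get)
print(worst, lags[worst], should_wait(lags))
```

A bot that checks this before each edit would pause while a slave such as db15 is far behind, which matches the "waits on server lag" behaviour cs97009 describes.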
	<DaBPunkt>	hello Duesentrieb
	<Duesentrieb>	morning DaB.
	<Thogo>	hiya Duesentrieb ;)
	<Duesentrieb>	hiya
	<Duesentrieb>	barely home and already online again
	<Duesentrieb>	but I'm going to bed soon
	<Thogo>	yeah, me too. Have to work again tomorrow (and report to the boss...)
	<Thogo>	(at least Uncle Max paid for my ticket)
	<domas>	"Hard drives are pretty cheap, installing a second drive to take better
	<domas>	advantage of a dual-CPU system wouldn't be too difficult."
	<Thogo>	*lol*
	<LA2>	domas, if you get the person to send the money for "a second drive" ($100 maybe), they would have donated more than the average person
	<brion>	http://leuksman.com/log/2008/10/12/database-borkage/
	<MZMcBride>	Hahaha.
	<MZMcBride>	Nice image.
	<brion>	:)
	<Platonides>	brion, what would take updating to mysql4.1 or backporting expire_logs_days and adding it to our custom builds?
	<brion>	i think it'll be my new 'server news' icon ;)
	<brion>	Platonides: domas is still evaluating 5; 4.1 would be at least as painful
	<brion>	we can expire log files with a cronjob pretty easily as well
	<brion>	it's just a matter of actually doing it
	<Platonides>	tim talked about a script to automatically check which files are in use and purge master, but existing a variable on next versions...
	<Platonides>	a minor version as painful as a major? :s
	<brion>	4.0 to 4.1 is a huge change
	<brion>	charset support, subqueries, lots of different behavior
	<brion>	the .1 is deceptive :)
	<Platonides>	:(
	<Platonides>	I find 'look at which files are in use' hackish
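
Aside on the log-purging script discussed above: a rough Python sketch of the "check which files are in use, add a safety margin, then purge" idea. It only builds the statement and runs nothing. The function name, the file names and the margin of 2 are hypothetical; in practice the inputs would presumably come from `SHOW MASTER LOGS` on the master and the `Master_Log_File` field of `SHOW SLAVE STATUS` on each slave.

```python
def purge_statement(master_logs, logs_in_use, safety_margin=2):
    """Build a PURGE MASTER LOGS statement, or return None if nothing is safe to purge.

    master_logs:   binlog file names, oldest first.
    logs_in_use:   file names the slaves are still reading.
    safety_margin: extra files to keep before the oldest one in use (assumed value).
    """
    if not logs_in_use:
        return None  # no slave information: refuse to purge anything
    # index() raises ValueError for an unknown file name, which is the safe failure.
    oldest_in_use = min(master_logs.index(name) for name in logs_in_use)
    keep_from = oldest_in_use - safety_margin
    if keep_from <= 0:
        return None  # nothing older than the margin
    # PURGE MASTER LOGS TO 'x' removes the logs before x and keeps x itself.
    return "PURGE MASTER LOGS TO '%s'" % master_logs[keep_from]


# Hypothetical file names for illustration.
logs = ["ixia-bin.%03d" % i for i in range(1, 11)]
stmt = purge_statement(logs, ["ixia-bin.008", "ixia-bin.009"])
print(stmt)
```

Run from cron, as brion suggests, something along these lines would keep the master's disk from filling while never deleting a file a slave still needs.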
	<Bdka>	*eek* nice image, brion ;-)
	<MZMcBride>	brion: Not to pile on, but... we've got 1,976 images that have some sort of issue according to http://noc.wikimedia.org/~tstarling/image-check/2008-10-10 . Is there a plan to fix those? :-)
	<brion>	MZMcBride: moment
	<brion>	ok i think we're going to copy db15 (now up to date) to db13, so we'll have them both in service soonish
	<brion>	then at some point _mary_kate_ will need to copy one of them to resync toolserver
	<brion>	ok kind of confused
	<Mbimmler>	could anyone have a look at the technical problem ("database query syntax error") described in #2008101210016311 ?
	<brion>	ok, switched db15 in in place of db13
	<brion>	and gave it some general load on s2
	<brion>	db15 is getting a little laggy
	<brion>	let's see if it catches up shortly ...
	<brion>	taking db15 back out of general rotation, leaving it for rc/contribs/etc
	<CWii>	brion, Having fun? ;)
	<brion>	not really
	<CWii>	I would see why :P
	<brion>	ok just taking db15 out entirely
	<CWii>	Hehe,
	* CWii	loves reading the server log :P
	<Bensin>	Is there a planned date for implementaion of http://www.mediawiki.org/wiki/Extension:Multi-Category_Search ?
	<Bryan>	bah, why doesn't the whole world just use either \r\n or \n
	<brion>	Mbimmler: link?
	<Mbimmler>	brion: second
	<Mbimmler>	brion: https://secure.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom&TicketID=2061910&ArticleID=2444446&QueueID=114
	<brion>	Mbimmler: that'd be the login page, not the upload page
	<Mbimmler>	*I* know (he doesn't...)
	<brion>	tell him try tomorrow, the dbs are a little funky today
	<Mbimmler>	alright, will do
	<Mbimmler>	thanks
	<Mbimmler>	(eh, by the way, because I'm too lazy to check: If he clicks on Upload while not logged in, is he by any chance rerouted to the Userlogin page, which might clarify why he was talking of upload?)
	<brion>	not unless somebody did that in JS
	<domas>	I enabled db15 with proper load
	<brion>	whee
	<brion>	thx
	<CWii>	heh
	<brion>	ok, trying to put db15 back in for categories use now that domas fixed the index
	<domas>	I did not
	<domas>	:)
	<brion>	!
	<brion>	uh-oh
	<brion>	then.... i shouldn't?
	<brion>	or?
	* brion	is confuzzled
	<domas>	dunno
	<domas>	it may not collapse though
	<domas>	it is heated up by now
	<Bryan>	all fun happens on sunday, eh?
	<brion>	so...
	<domas>	the trick that used to be before was that mediawiki had that concurrency limiting
	<brion>	does something need to be applied or not?
	<CWii>	Bryan, seems that way.
	<domas>	lemmie try one thing
	<Bryan>	actually, it is monday already here. bugmonday!
	<domas>	I'll remove db15 from load for a while
	<brion>	ok
	<brion>	(do the group loads too -- i put it in for cats)
	<brion>	Bryan: not only sunday, but monday is a us holiday so we technically get it off too :)
	<domas>	damn, wrong server
	<domas>	:D
	<brion>	:P
	<domas>	db13 has right one
	<domas>	as intended
	<Paintman>	Hi
	<domas>	why oh why did Tim not take slave as new master image
	<domas>	that would've resolved so many issues %)
	<brion>	db13's borked tho
	<brion>	:(
	<Paintman>	why most of the dumps are aborted?
	<_mary_kate_>	because 10 minutes of changes would have been lost?
	<domas>	brion: that was the "wrong server" part
	<domas>	_mary_kate_: yes
	<brion>	Paintman: because they stopped two months ago
	<brion>	and when the system was restarted last week it marked them as incomplete
	<Paintman>	ok
	<domas>	[root@zwinger php]# for i in $(mysql -h db15 -e 'show databases'); do mysql -h db15 -e 'alter table categorylinks drop key cl_sortkey, add key cl_sortkey (cl_to,cl_sortkey);' $i; done
	<domas>	ERROR 1049 (00000): Unknown database 'Database'
	<domas>	hehe
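
Aside on the error above: `Unknown database 'Database'` appears because the output of `show databases` starts with a column header, which the loop treats as a database name. The mysql client's `-N` (`--skip-column-names`) option suppresses that header. A small Python sketch of the same filtering, with a hypothetical helper name and illustrative sample output:

```python
def database_names(show_databases_output):
    """Return database names from batch-mode `show databases` output, minus the header."""
    lines = [line.strip() for line in show_databases_output.splitlines()]
    lines = [line for line in lines if line]
    # In batch mode the first row is the column header "Database".
    if lines and lines[0] == "Database":
        lines = lines[1:]
    return lines


# Illustrative output; the database names are taken from the affected-wiki list.
sample_output = "Database\ndewiki\nnlwiki\nzhwiki\n"
names = database_names(sample_output)
print(names)
```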
	<Bryan>	hmm no bugmonday for me; need to sleep
	<domas>	ummm, I think I know which problem we may be hitting - there's an XFS regression
	<brion>	o_O
	<brion>	oh dear
	<domas>	and I know how to fix it!
	<CWii>	:D
	<brion>	yay!
	<domas>	just needs remounting of filesystem after big file copyin
	<domas>	well, thats workaround
	<brion>	heh
	<domas>	other way is remove O_DIRECT
	<domas>	db15 is indeed very slow
	<domas>	atm
	<domas>	but thats filesystem issue probably
	<brion>	still after the xfs remount?
	<brion>	or did you do that yet?
	<domas>	no
	<domas>	I didn't do that yet
	<domas>	I want the alter table to finish
	<domas>	at least for that table
	<brion>	k
	<domas>	hah
	<domas>	computers hate me
	<domas>	file system is XFS
	<domas>	in oprofile it shows 'jfs' being high up
	<brion>	...
	<domas>	well, not too high
	<domas>	but still, 1%
	<Werdna>	hey all
	<Werdna>	no bugmonday for me either. I'm on for 5 minutes as a study break :(
	<domas>	I intentionally crashed db15 :(
	<domas>	:)
	<domas>	as in, kill -9
	<Werdna>	yay!
	* MrZ-man	may be on for bugmonday when its actually monday for him
	<domas>	damn 1G of transaction logs
	<Werdna>	I had something to bug brion about, but I forgot it :/
	<brion>	:P
	<Werdna>	exams start friday :/
	<CWii>	Werdna, Good luck.
	<Werdna>	thanks..
	<Werdna>	In 3.5 weeks, I'll be completely done.
	<Werdna>	3 weeks thursday.
	<brion>	woohoo
	<MrZ-man>	next weekend I might poke at the GIS extension to see what's usable in it
	<brion>	woot
	<domas>	ok, recovery isn't that slow, apparently
	<domas>	15% in last 3 minutes
	<domas>	:)
	<brion>	yay
	<domas>	I wonder, if that regression was jfs caused
	<domas>	lol
	<domas>	samples % image name app name symbol name
	<domas>	193349 48.1072 mysqld mysqld buf_flush_insert_sorted_into_flush_list
	<domas>	41817 10.4045 mysqld mysqld recv_apply_hashed_log_recs
	<domas>	38216 9.5085 mysqld mysqld buf_calc_page_new_checksum
	<domas>	does it actually traverse flush list? :)
	* Platonides	filled 308G trying to extract a enwiki history dump
	<domas>	Platonides: lol
	<Platonides>	(the partition got full)
	[INFO]	Conference Mode has been disabled for this view; joins, leaves, quits and nickname changes will be shown.
	|<--	siebrand has left irc.freenode.net (Read error: 60 (Operation timed out))
	<nagios-wm>	Apache on srv183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
	<CWii>	:O
	<CWii>	There goes another!
	* CWii	counts 3 8 CPU apaches that have dead httpd
	<MZMcBride>	Is it "play sysadmin for a day" day?
	<CWii>	MZMcBride, it's mostly boredom
	<domas>	mark will love it
	|<--	WaRpAtH has left irc.freenode.net (Read error: 113 (No route to host))
	=-=	Pathoschild is now known as slavie|away
	|<--	para has left irc.freenode.net (Read error: 60 (Operation timed out))
	=-=	Alexfusco5 is now known as Alexfusco5|Away
	|<--	Dammit has left irc.freenode.net ("I KILL YOU!!! http://www.youtube.com/watch?v=1uwOL4rB-go")
	-->|	hegesippe3 (n=hegesipp@wikimedia/Hegesippe-Cormier) has joined #wikimedia-tech
	=-=	hegesippe3 is now known as hegesippe
	|<--	hegesippe2 has left irc.freenode.net (Nick collision from services.)
	=-=	dungodung is now known as dungodung|sleep
	-->|	Soxred (n=[X]@wikipedia/Soxred93) has joined #wikimedia-tech
	|<--	Alexfusco5|Away has left irc.freenode.net (Nick collision from services.)
	-->|	Alexfusco5|Away (n=Alex@wikipedia/Alexfusco5) has joined #wikimedia-tech
	|<--	Haley has left irc.freenode.net (Remote closed the connection)
	=-=	Alexfusco5|Away is now known as Alexfusco5
	<nagios-wm>	Apache on srv183 is OK: OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time
	|<--	Alexfusco5 has left irc.freenode.net (Read error: 104 (Connection reset by peer))
	|<--	Platonides has left irc.freenode.net ()
	|<--	emgent has left irc.freenode.net (Read error: 104 (Connection reset by peer))
	<--|	Nemo_bis has left #wikimedia-tech
	<brion>	updating just CodeReview to test a change
	|<--	hausgeist has left irc.freenode.net (Remote closed the connection)
	-->|	hausgeist (n=fnhg@sandportal.de) has joined #wikimedia-tech
	<nagios-wm>	Apache on srv187 is OK: OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time
	<nagios-wm>	Apache on srv161 is OK: OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time
	|<--	Soxred has left irc.freenode.net ("The purpose of life is not to live forever, but to make something that will.")
	|<--	hausgeist has left irc.freenode.net (Client Quit)
	-->|	hausgeist (n=fnhg@sandportal.de) has joined #wikimedia-tech
	<--|	hausgeist has left #wikimedia-tech
	-->|	emgent (n=emgent@ubuntu/member/emgent) has joined #wikimedia-tech
	|<--	AzaTht has left irc.freenode.net ("Ex-Chat")
	|<--	Luna-San has left irc.freenode.net (Success)
	|<--	worby has left irc.freenode.net ("rebooting")
	-->|	ST47 (n=st47@wikipedia/ST47) has joined #wikimedia-tech
	-->|	hausgeist (n=fnhg@sandportal.de) has joined #wikimedia-tech
	|<--	ST47_ has left irc.freenode.net (Connection timed out)
	-->|	aib (n=aib@pdpc/supporter/basic/aib) has joined #wikimedia-tech
	=-=	niabot is now known as niabot_asleep
	|<--	ChrisiPK has left irc.freenode.net (Read error: 110 (Connection timed out))
	-->|	Tommy6 (n=Tommy6@wikia/Tommy6) has joined #wikimedia-tech
	|<--	Az1568_ has left irc.freenode.net ("Leaving")
	=-=	slavie|away is now known as Pathoschild
	|<--	__aib has left irc.freenode.net (Connection timed out)
	-->|	tawker (n=ahuman@wikipedia/Tawker) has joined #wikimedia-tech
	|<--	JeLuF has left irc.freenode.net (Read error: 104 (Connection reset by peer))
	-->|	JeLuF (i=jf@mormo.org) has joined #wikimedia-tech
	|<--	DaBPunkt has left irc.freenode.net (Read error: 101 (Network is unreachable))
	<brion>	bah
	<CWii>	humbug?
	-->|	presroi (n=mathiass@gtng-4db0467b.pool.einsundeins.de) has joined #wikimedia-tech
	|<--	kodoma has left irc.freenode.net (Read error: 60 (Operation timed out))
	|<--	presroi__ has left irc.freenode.net (Read error: 110 (Connection timed out))
	-->|	WaRpAtH (i=VRS@wikimedia/Cometstyles) has joined #wikimedia-tech