◆ The MTA and GTFS

2 May 2011 • Geekery

Just today I found about GTFS, a standardized data format from Google for describing public transit schedules. I subsequently discovered that the MTA has a GTFS feed.

Back when I didn’t have a car, I used the MTA daily and had a set of scripts to extract their schedule data from the website and import it into trainsched on my Palm Pilot. I asked them for more structured data at one point but was rebuffed, so I stuck with my scripts that had to be updated every time they changed their schedule pages.

I don’t have a Palm Pilot any more (and trainsched doesn’t even run on the PalmOS emulator on my WebOS phone), but I thought it would be nice to update my old scripts to use the new GTFS feeds. This should even work with other GTFS feeds; there’s nothing really MTA-specific going on.

The updated scripts are here: GTFS to trainsched converter.

◆ Car Engine Advice Sought

1 Jan 2010 • General

I’ve been having intermittent problems with my car that have now stumped two mechanics, so I’m turning to the Internet for advice. I have a 1996 Jeep Grand Cherokee, with Selec-Track four-wheel-drive and a 6 cylinder engine.

The problem I’m having is that the engine will sometimes just stall (the motive power goes away and the engine RPMs drop to zero). It only seems to happen right after starting the car (so the engine hasn’t warmed up yet), and it’s much more common in wet weather (rain, snow, heavy fog). Often times, I can feel the engine losing power and can prevent a stall by stepping on the gas pedal. If the engine actually stalls, I have to wait at least a minute to restart it; if I try restarting it immediately, I hear the chug-chug-chug of the starter, but the engine won’t start running.

I’ve had a few times when the stalling seemed to hit as I was braking, just as the automatic transmission was downshifting from second to first gear. I’ve had the car sall at times apparently unrelated to shifting, though.

This morning, it was particularly bad. I barely got out of the parking lot at home (after several stop/wait/restart cycles) before the engine stopped being useful—it would only run for 30–45 seconds before stalling, and I couldn’t prevent the stall. I had the car towed to a mechanic, but by the time they took a look at it, the engine started and ran normally. They couldn’t find anything wrong with the engine, and there were no codes from the onboard diagnostics.

The coolant system has a slow leak (my regular mechanic tracked it to the water pump, I think), but I’ve had that for over a year now and I doubt it’s related. In the time since the problem first manifested, I’ve had a system tune-up (including replacing the spark plug wiring, apparently a common contributor to problems such as these), had the oil changed, had the transmission fluid changed, and had a (presumably unrelated) badly-leaking coolant hose replaced.

Any suggestions on what to look for to fix this will be gladly accepted. I’ll be cross-posting this to several social media sites; responses there are fine, or just send an email to pip@aperiodic.net

Update 2010-01-02: I’ve gotten a number of suggestions for problem areas that could be causing this problem. The most-suggested item was the fuel filter, which I haven’t replaced since I got the car five years ago, so I’m taking care of that to start with. I’ll see how well the car behaves with a new filter. Other suggestions have included the idle valve, O2 sensor, or fuel pump. If replacing the fuel filter doesn’t work, I’ll check out some of the other possibilities.

Update from late 2010: The problem turned out to be the engine computer. It was failing to set appropriate engine parameters under various circumstances, particularly when it was wet. It took a lot of trial an error to get to this diagnosis, and my mechanic said this was the first he’d ever seen this particular problem.

◆ Kiva

2 Dec 2009 • General

I’ve been contributing to Kiva for a number of months now, and am sufficiently appreciative of them to want to tell others about them. The short version is that Kiva is a distributed, peer-to-peer microlending non-profit with the goal of alleviating global poverty.

§ Microlending

Microfinance is the practice of providing financial services to people in low-income parts of the world who are not generally considered profitable-enough customers to warrant service from traditional banking institutions.

Microlending (or microcredit) is a subset of microfinance that provides loans to low-income people, typically entrepreneurs in low-income area of the world, who do not have access to loans from traditional banking sources. Sometimes there simply are no banks in the area, but more typically the entrepreneurs don’t make enough money to qualify for loans, don’t have sufficient collateral, or don’t have a sufficient credit history. Modern microlending started in the 1970s with the Bangladesh-based Grameen Bank.

The advantages of microlending over traditional aid programs are in its benefits to developing economies: it encourages growth of local businesses, puts more money into local economies, encourages local employment opportunities, and gives entrepreneurs a foothold to establish their businesses and a credit record. In some areas, it provides an alternative to tribal moneylenders, who charge high interest rates and respond to repayment failures with physical violence. In short, traditional aid addresses people’s basic needs: food, shelter, clothing, education, sustainable agriculture; while microlending works on the next step of growing local economies.

§ Kiva’s contribution

Kiva was created as the first distributed, peer-to-peer microlending organization in the world. It’s peer-to-peer because all Kiva does is list loan applicants on their website; it’s up to the users of the website to supply the money to fund the loans. As the loans are repaid, those repayments go to the people who supplied the money. Kiva is distributed because the funding for any given loan is made up of contributions from many different Kiva users. (And as the loan is repaid, each contributor is repaid according to his or her portion of the overall loan.)

The actual process is a little more involved. Kiva doesn’t make loans directly. Instead, they partner with microfinance institutions (MFIs) in other countries. The MFIs accept loan applications, evaluate the applicants, disburse loans, and collect payments. During this process, the MFIs tell Kiva about the loans in batches, Kiva lists the loans, collects the contributions from Kiva users, and sends the money to the MFIs. When the MFIs collect money, they send the payments (again, in batches) to Kiva, which distributes the payments to the loan contributors. The MFIs often fund the loans before telling Kiva about them (to avoid the delays in the funding process and get the money to the applicant quickly), and then backfill the accounting once the loan is actually funded. In some cases, a loan will expire on Kiva without being fully funded; when that happens, any contributors will have their contributions refunded, and if the MFI already disbursed the loan, they will have to find another source of funding for it.

§ Loan repayment

When payments are received on a loan, the money is made available to contributors as “Kiva Credits”. If you have Kiva credits, you can apply them to a different loan, donate them to Kiva itself, or transfer them into a PayPal account.

§ Defaults

Because most loans are made in developing areas of the world, if someone defaults on their loan, there is often no workable legal recourse for requiring repayment and the people who contributed to the loan are not repaid. The MFIs do research on the applicants before granting loans, and do a generally good job of selecting people who will repay their loans. Kiva has an overall default rate of about 2%. (For comparison, in 2004, the federal Small Business Administration, which makes loans to small businesses in the US, had a 2.4% default rate, which has hit 11.9% with the recession.)

This does mean that any loan made through Kiva carries the risk of losing the money contributed.

§ Interest

Kiva does not collect any interest on its loans. Consequently, Kiva users will only get back exactly what they put into the loan in the first place, unless the loan defaults. This means that anyone contributing to Kiva loans over a long term can expect to lose some money to defaults. Because Kiva is operated as a non-profit organization with the goal of alleviating poverty, they don’t feel it is appropriate to add interest payments to the loans they fund.

The MFIs that Kiva partners with do charge interest, however. The interest that they charge pays for their staff and administration costs. Typically, they charge a small percentage of the loan in interest, plus a flat fee to cover the effort necessary for any loan, regardless of its size. This includes things like interviewing applicants and researching their financial history and dependability. Because the loan amounts are small, that fee often adds a noticeable percentage to the loan, leading to interest rates that seem surprisingly high. The average interest rate on Kiva is about 35%, but I’ve seen rates as high as 50%.

The MFI interest rates are another reason that Kiva doesn’t charge interest themselves; they don’t want to add any more to the borrower’s repayment load.

For each of its partners, Kiva shows both the average interest rate for that MFI and the median interest rate for other MFIs in the same region. People worried about the high rates are encouraged to check up on the MFI before committing to a loan.

§ Kiva’s Funding

Since Kiva doesn’t collect interest and they apply all of a contributor’s money to the loans themselves, they rely on donations to keep running. After you contribute to a loan, they ask for a donation and suggest an amount equal to 15% of your loan contribution. You have the option of choosing a different percentage (including 0%) or specifying a particular dollar amount. You can also give the repayments on your loans to Kiva as a donation, either manually (by donating your Kiva credits) or automatically (by specifying that all repayments should be automatically donated).

Kiva is a 501(c)3 nonprofit, so all donations are tax-deductible, although loans are not. (As mentioned above, loans are not even guaranteed to maintain value.)

§ Criticism

I feel it would be unfair to talk about Kiva without acknowledging that some people have criticisms of the microlending process.

The most common criticism about microlending in general is the high interest rates. Kiva does as well as it can in this area, recognizing that MFIs need to charge enough to fund their activities, but also being as transparent as possible about what is being charged. Kiva also periodically reviews their partners’ claims and practices. (They have closed relations with at least one partner because of fraud—not interest related, in this case; the partner was claiming much higher amounts on its loans than it was actually disbursing.)

Conversely, some people argue that Kiva should themselves be charging interest. I can’t find the article now, but I read a piece a while back that argued that when microfinance funding organizations like Kiva pay interest, contributors change from a “charity” mindset to an “investment” mindset and end up putting more money into system. Personally, I think that the charity mindset is a better fit here. If the goal is to alleviate poverty, then helping people establish businesses with as few impediments (including additional interest) as possible is a better approach than one that is beholden to the best return on investments.

There are also claims that microlending doesn’t help all that much. There was an article in the Boston Globe a few months ago titled “Small change: Billions of dollars and a Nobel Prize later, it looks like ‘microlending’ doesn’t actually do much to fight poverty”. Despite the inflammatory subtitle, the article is reasonably balanced. It refers to two unpublished papers about studies that concluded that microlending is not as effective as many people think. For one thing, I’d want to wait until the papers are published; peer review is likely to shake out any methodological problems in the studies. But the article points out that the studies found that microlending did make a difference for a lot of businesses and helped them either establish themselves or grow. It’s not a panacea—if for no other reason than that the eligibility requirements for “who can take out a loan and be counted on to repay it if there’s little government enforcement to rely on” mean that only a small part of the population will be considered in the first place—but it is a step in the direction of helping establish functioning local economies in poor regions. The article provides a number of pro-microfinance perspectives, most of which boil down to, “The studies show that microlending isn’t magic, but it’s a useful tool—as long as you understand that it isn’t magic.”

Kiva in particular was the target of some criticism recently when they began offering loans to entrepreneurs in the US. Many people felt that people in the US are either well-served by the standard banking system or have access to many government programs and would not get much additional benefit from a microloan. I tend to regard this as a place where it’s best to give people the ability to make their own decisions. Kiva felt that there was benefit to offering loans in the US, and those who agree with them can now participate in those loans. Those who disagree have no obligation to fund US loans, and there are plenty of available loans in poorer countries. (There was a time when Kiva funded every loan it offered and sometimes ran out of new loans to fund. These days, there’s more demand, and loans do periodically expire unfunded.)

§ Conclusion

I think that Kiva provides an important service to developing areas of the world, and I put a portion of my charitable giving into the site each month. Personally, I don’t plan to take money out of it; what I put in goes either towards funding more loans or donations to keep Kiva running. Other people have different financial approaches, of course.

Kiva also isn’t the only place I donate. In addition to local giving, I also contribute to more traditional aid organizations, because microlending is only effective if people are already healthy, educated, and empowered. For giving along more traditional lines, I recommend Heifer International, which mostly focuses on sustainable agriculture in developing regions.

§ Competition

Kiva was the first peer-to-peer microlending website, but many others have followed in its footsteps, and some of its competitors take different approaches to microlending. If you don’t like Kiva, there might be another organization that’s a better fit for you.

The Microfinance Gateway is a large database of microfinance organizations, including aggregators like Kiva. It contains links to many more organizations than I’m going to link to here.
~~United Prosperity~~ doesn’t directly give contributed money to borrowers; it uses that money as collateral for traditional bank loans. They feel that this approach is better at integrating people into the traditional banking system.
The Rotarian Action Group for Microcredit works to assist Rotary groups with setting up their own microlending operations.
~~MicroPlace~~ collects interest on its microloans, so contributors earn a return on their contributions.
Prosper is a more general peer-to-peer lending broker. Contributors can use it either as a general investment fund or a broker for choosing individual loans to fund. Anyone with a decent credit score can request a loan, which is bid on by lenders.

§ My Kiva History

If you want, you can look at my lender page on Kiva. So far, I’ve contributed to 11 loans:

Ibrahim (last name withheld for privacy) in Lebanon: Owns a sewing factory. Currently 58% repaid on a $3,000 loan.
Neang Soknoeun in Cambodia: Weaves and sells groceries. Currently 33% repaid on a $700 loan.
Khosiyat Kuldasheva in Tajikistan: Sells foodstuffs at a local market. Currently 37% repaid on a $3,000 loan.
Harriet Namugabo’s group in Uganda: A 5-person group who are collectively responsible for the loan. The leader sells medicines. Currently 66% repaid on a $875 loan.
Lilane (last name withheld for privacy) in Lebanon: Owns a clothing store. Currently 24% repaid on a $1,000 loan.
Emélido Oporta Alvarado in Costa Rica: Raises cattle. Currently 0% repaid on a $1,175 loan. (His payment schedule is one payment every six months. The first one is due next April.)
Efrain (last name withheld for privacy) in Peru: Grows and sells potatoes and raises cattle. Currently 0% repaid on a $1,200 loan. (He doesn’t have a monthly payment schedule; the entirety of the loan is due next March.)
Felix Alberto Tórrez Machado in Nicaragua: Runs a cybercafe. Currently 13% repaid on a $975 loan.
Anonymous in Iraq: Runs an auto repair shop. Currently 0% repaid on a $3,000 loan. (First payment is due in January. Kiva also notes that government regulations on foreign loans are murky and they might not be able to receive the money from the borrower’s payments. I figured it was good to support a business in Iraq, though, so I accepted the risk.)
Nuevo Amanecer Group in Bolivia: a 13-person group who would be collectively responsible for the loan, with various jobs. Loan expired before being fully funded and my contribution was refunded.
Gaibullodjon Sobitov in Tajikistan: Processes and sells dried apricots. Currently 70% raised on a $1,500 loan, with 2 days before the loan expires.

◆ Betsy McCaughey on the Obama Stimulus Plan: A Rebuttal

12 Feb 2009 • General

Betsy McCaughy has written an opinion piece for bloomberg.com titled Ruin Your Health With the Obama Stimulus Plan. In it, she characterizes the stimulus package (as passed by the House) as containing sinister language that will allow the government to control and reduce the quality of health care available to the American public. I took a look at the sections of the bill that she has issues with, and I found that her conclusions are gross distortions at best. More or less everything she points to in the bill is focused on providing a healthcare information technology infrastructure to allow doctors and hospitals easy access to patients’ complete medical information—nothing more.

(Since I wrote my analysis, the people at FactCheck.org have done their own analysis, which is more in depth than mine. The conclusions are similar—McCaughey is misunderstanding a lot of the bill—although they allow that some parts of the bill don’t necessarily forbid her worst-case scenario. (They also note that much of what she worries about has been in place already, under a program created by George W Bush: the bill just turns an executive order into established law.))

She also makes a lot of comparisons to Tom Daschle’s book Critical: What We Can Do About the Health-Care Crisis, drawing sinister inferences from the comparisons. I haven’t read the book, so I can’t judge how right or wrong she is about its contents, but the contents of the actual bill are not what she says they are.

To begin with, she claims that the government will be overseeing what your doctor does and dictating treatments:

One new bureaucracy, the National Coordinator of Health Information Technology, will monitor treatments to make sure your doctor is doing what the federal government deems appropriate and cost effective. The goal is to reduce costs and “guide” your doctor’s decisions (442, 446).

The actual text of the bill is as follows:

                           page 441
19      “(b) PURPOSE.--The National Coordinator shall per-
20 form the duties under subsection (c) in a manner con-
21 sistent with the development of a nationwide health infor-
22 mation technology infrastructure that allows for the elec-
23 tronic use and exchange of information and that--

                           page 442
 1            “(1) ensures that each patient's health informa-
 2      tion is secure and protected, in accordance with ap-
 3      plicable law;
 4            “(2) improves health care quality, reduces med-
 5      ical errors, reduces health disparities, and advances
 6      the delivery of patient-centered medical care;
 7            “(3) reduces health care costs resulting from
 8      inefficiency, medical errors, inappropriate care, du-
 9      plicative care, and incomplete information;
10            “(4) provides appropriate information to help
11      guide medical decisions at the time and place of
12      care;
13            “(5) ensures the inclusion of meaningful public
14      input in such development of such infrastructure;
15            “(6) improves the coordination of care and in-
16      formation among hospitals, laboratories, physician
17      offices, and other entities through an effective infra-
18      structure for the secure and authorized exchange of
19      health care information;
20            “(7) improves public health activities and facili-
21      tates the early identification and rapid response to
22      public health threats and emergencies, including bio-
23      terror events and infectious disease outbreaks;
24            “(8) facilitates health and clinical research and
25      health care quality;

                           page 443
 1            “(9) promotes prevention of chronic diseases;
 2            “(10) promotes a more effective marketplace,
 3      greater competition, greater systems analysis, in-
 4      creased consumer choice, and improved outcomes in
 5      health care services; and
 6            “(11) improves efforts to reduce health dispari-
 7      ties.

The whole thing is talking about making patients’ medical information available by electronic means to hospitals, doctors, and insurance companies while maintaing the patients’ privacy. Clause (4), in particular, refers to doctors being able to have all of the information they need when making decisions. It does not imply anything about the government dictating doctors’ decisions to them.

She goes on to talk about hospitals and doctors not being “meaningful users” of the system, and she seems to imply that this will result in doctors abandoning treatments that aren’t government-sanctioned:

Hospitals and doctors that are not “meaningful users” of the new system will face penalties. “Meaningful user” isn’t defined in the bill. … What penalties will deter your doctor from going beyond the electronically delivered protocols when your condition is atypical or you need an experimental treatment?

Here’s what the bill has to say about “meaningful users”:

                           page 511
14     “(o) INCENTIVES    FOR  ADOPTION   AND  MEANINGFUL
15 USE OF CERTIFIED EHR TECHNOLOGY.--
16           “(1) INCENTIVE  PAYMENTS.--
17                 “(A) IN GENERAL.--Subject    to the suc-
18           ceeding subparagraphs of this paragraph, with
19           respect to covered professional services fur-
20           nished by an eligible professional during a pay-
21           ment year (as defined in subparagraph (E)), if
22           the eligible professional is a meaningful EHR
23           user (as determined under paragraph (2)) for
24           the reporting period with respect to such year,
25           in addition to the amount otherwise paid under

                           page 512
 1           this part, there also shall be paid to the eligible
 2           professional (or to an employer or facility in the
 3           cases described in clause (A) of section
 4           1842(b)(6)), from the Federal Supplementary
 5           Medical Insurance Trust Fund established
 6           under section 1841 an amount equal to 75 per-
 7           cent of the Secretary's estimate (based on
 8           claims submitted not later than 2 months after
 9           the end of the payment year) of the allowed
10           charges under this part for all such covered
11           professional services furnished by the eligible
12           professional during such year. 

                           page 540
 6     “(3) MEANINGFUL      EHR USER.--
 7           “(A) IN    GENERAL.--For purposes of para-
 8     graph (1), an eligible hospital shall be treated
 9     as a meaningful EHR user for a reporting pe-
10     riod for a payment year (or, for purposes of
11     subsection (b)(3)(B)(ix), for a reporting period
12     under such subsection for a fiscal year) if each
13     of the following requirements are met: 
14                 “(i) MEANINGFUL  USE OF CERTIFIED
15           EHR TECHNOLOGY.--The       eligible hospital
16           demonstrates to the satisfaction of the Sec-
17           retary, in accordance with subparagraph
18           (C)(i), that during such period the hospital
19           is using certified EHR technology in a
20           meaningful manner. 
21                 “(ii) INFORMATION  EXCHANGE.--The
22           eligible hospital demonstrates to the satis-
23           faction of the Secretary, in accordance
24           with subparagraph (C)(i), that during such
25           period such certified EHR technology is

                           page 541
 1           connected in a manner that provides, in
 2           accordance with law and standards appli-
 3           cable to the exchange of information, for
 4           the electronic exchange of health informa-
 5           tion to improve the quality of health care,
 6           such as promoting care coordination. 
 7                 “(iii) REPORTING     ON   MEASURES
 8           USING      EHR.--Subject  to subparagraph
 9           (B)(ii) and using such certified EHR tech-
10           nology, the eligible hospital submits infor-
11           mation for such period, in a form and
12           manner specified by the Secretary, on such
13           clinical quality measures and such other
14           measures as selected by the Secretary
15           under subparagraph (B)(i).

“Meaningful EHR users” will be given financial incentives to use the electronic system. If a doctor or hospital is not a “meaningful EHR user”, that means only that they’re not making use of the electronic medical information infrastructure. It has nothing to do with what treatments they offer. Furthermore, non-“meaningful users” don’t face penalties; they just don’t have the government helping them pay for the upfront cost of implementing the electronic system.

Next, she claims that the bill creates a department to prevent people from using new medical treatments because they’re more expensive:

In his book, Daschle proposed an appointed body with vast powers to make the “tough” decisions elected politicians won’t make. The stimulus bill does that, and calls it the Federal Coordinating Council for Comparative Effectiveness Research (190-192). The goal, Daschle’s book explained, is to slow the development and use of new medications and technologies because they are driving up costs.

Again, the bill:

                           page 190
 7 SEC. 9201. FEDERAL COORDINATING COUNCIL FOR COM-
 8                 PARATIVE EFFECTIVENESS RESEARCH. 
 9        (a) ESTABLISHMENT.--There is hereby established a
10 Federal Coordinating Council for Comparative Effective-
11 ness Research (in this section referred to as the “Coun-
12 cil”). 
13        (b) PURPOSE; DUTIES.--The Council shall--
14             (1) assist the offices and agencies of the Fed-
15        eral Government, including the Departments of
16        Health and Human Services, Veterans Affairs, and
17        Defense, and other Federal departments or agencies,
18        to coordinate the conduct or support of comparative
19        effectiveness and related health services research;
20        and
21             (2) advise the President and Congress on--
22                  (A) strategies with respect to the infra-
23             structure needs of comparative effectiveness re-
24             search within the Federal Government;

                           page 191
 1                  (B) appropriate organizational expendi-
 2             tures for comparative effectiveness research by
 3             relevant Federal departments and agencies; and
 4                  (C) opportunities to assure optimum co-
 5             ordination of comparative effectiveness and re-
 6             lated health services research conducted or sup-
 7             ported by relevant Federal departments and
 8             agencies, with the goal of reducing duplicative
 9             efforts and encouraging coordinated and com-
10             plementary use of resources.

The supposedly sinister Federal Coordinating Council for Comparative Effectiveness Research seems to be more like the Department of Homeland Security for existing federal health care systems. Programs like Medicaid, veteran’s benefits, and federal employee health insurance already make estimations on the effectiveness of various treatments, just like every private health insurance company. The new council would share information across the federal programs that already exist.

Following that, she claims that the bill would change the standards for approved Medicare treatments, implying that fewer treatments will be approved:

Medicare now pays for treatments deemed safe and effective. The stimulus bill would change that and apply a cost- effectiveness standard set by the Federal Council (464).

Page 464 contains references to implementing things specified under section 3004, which describes the process for adopting and implementing the previously-mentioned technology infrastructure. I can find no mention of Medicare. It seems very much like this claim is completely fabricated.

Every complaint that the author raises, everything that she claims will affect what care doctors are willing to give, seems to only concern the healthcare information technology infrastructure that Obama has been talking about implementing for months. I see no places where the government will be dictating anything to doctors, aside from providing financial incentives (not punishments) to work with the new infrastructure. None of the author’s claims about the government cutting off healthcare seem at all founded in fact. At best, she’s consistently misreading the text of the bill. I think it more likely that she has other reasons not to want the bill to pass and is distorting the reality of the situation to steer people in the direction she wants them to go.

This post is available under a CC0 Waiver. Feel free to copy it anywhere.

◆ Backup Implementation

2 Jan 2009 • Geekery

I previously considered different backup schemes. Writing that entry crystallized my basic ideas about what I was going to do about the backups. I then proceeded to implement them. This entry is a detailed description of what I did.

§ Backup Overview

I ended up buying two 80 GB hard drives and a ThermalTake BlacX external enclosure. The overall plan is to do backups to one of the drives on a daily, automated basis, and the other on a periodic, maybe monthly basis. Most of the time, the periodic backup drive will live at my office and will serve as my offsite backup.

I want to have a backup history in the same way that a tape rotation scheme would give me. That way, if I don’t notice there’s something wrong with a file for a time, there’s still a good chance I can retrieve it. I also want things stored space-efficiently, so backing up unchanged files doesn’t take up additional space. This is accomplished pretty easily with rsync; I do one full backup, and then subsequent backups use rsync’s --link-dest option pointing to the most recent complete backup; any files that haven’t changed are simply hardlinked together, so the two directory entries point to the same physical location on the disk.

For the backup history, I decided to go with a variant of the Towers of Hanoi backup rotation. Instead of using a preset number of pegs, as I would have to do if I were using physical tapes, I can just calculate the numbers on the fly, effectively behaving as if I had an infinite number of tapes. This rotation gives me roughly exponential backoff for my history; I can look at backups from yesterday, two days ago, four days ago, eight days ago, and so on.

Finally, I decided to encrypt the drives. This lets me cart them around with confidence that if they get lost or stolen, anyone who ends up with them won’t have my personal files. I used LUKS to encrypt the disks, and used both a file on my computer and a password as keys. The file makes it easier to mount the drives without manual intervention (important if my desktop reboots), while the password lets me get at the data if the key file isn’t available (important if the main disk dies and I need my backups).

§ Backup Procedure

# Set up fstab.

The first thing I did was to set up /etc/fstab for the disks. Since I only have one enclosure, I have to swap disks, so only one will ever be connected at the same time. Thus, I mount whichever’s in at the moment on /backups. Likewise, I will associate each one with the dm-crypt name of “backups”, so the device will be /dev/mapper/backups. Thus, I added the following line to /etc/fstab:

/dev/mapper/backups  /backups  auto  defaults  0  0

# Set up `rsync` filters.

I’m using rsync to do the backups, but in some cases I don’t want everything in the source directories to be backed up. Thus, I tell rsync to look at a filter file for each directory so it knows what to skip and what to keep. rsync will go through the filters for each file or directory it considers, and will take the first action that matches. If nothing matched, the file is copied. If a directory is ignored, none of the files in that directory are considered at all, so I had to include a few explicit directory chains.

In /var, I only want to back up a few things, so the final pattern ignores anything that isn’t explicitly included.

+ /backups
+ /backups/**
+ /lib
+ /lib/bzr
+ /lib/bzr/**
+ /lib/svn
+ /lib/svn/**
- *

For my home directory, I include everything, with a few exceptions. For instance, most of my music directory can be reripped from CD if needed, so I don’t need to take up space backing up those files. On the other hand, I have some files that I either purchased online or downloaded and wouldn’t be able to easily replace if they were lost, so I do back them up. Here’s an excerpt from my home filter file:

+ /movies/Star_Wars_Holiday_Special
+ /movies/Star_Wars_Holiday_Special/**
- /movies

+ /music
+ /music/Bonerama
+ /music/Bonerama/Bringing_It_Home
+ /music/Bonerama/Bringing_It_Home/**
+ /music/Jonathan_Coulton
+ /music/Jonathan_Coulton/Thing_a_Week_Three
+ /music/Jonathan_Coulton/Thing_a_Week_Three/03-Code_Monkey*
+ /music/Nine_Inch_Nails
+ /music/Nine_Inch_Nails/Ghosts_I-IV
+ /music/Nine_Inch_Nails/Ghosts_I-IV/**
+ /music/Nine_Inch_Nails/The_Slip
+ /music/Nine_Inch_Nails/The_Slip/**
+ /music/Obelix
+ /music/Obelix/**
+ /music/Solo_String_Project
+ /music/Solo_String_Project/**
- /music/**

- /tmp

# Initialize disks.

I wrote a script to initialize the disks for me: init-backup-disk. It takes two parameters: the name of the device for the backup disk, and the file to use as a key for the partition. If the key file doesn’t exist, it will be created.

After a few sanity checks, the script starts doing things. It starts by checking that the disk is good with badblocks. If it encounters any errors, it stops there and the drive needs to be sent in for warranty replacement. Following that, it goes into the most time-consuming part of the initialization: writing random data to the disk. (badblocks already wrote random data, but its PRNG is somewhat simplistic; /dev/urandom is a much better source of pseudo-random numbers.) Without this step, it would be obvious which bits of the disk had encrypted data on them. I use pv to give a progress meter and time estimate. On my computer, badblocks took a little over two hours and /dev/urandom took about eight hours for each 80GB disk.

# Check and randomize disk. 
badblocks -b 512 -s -w -t random -v $disk || exit 2
</dev/urandom pv -s $(fdisk -l $disk |
    perl -nle 'm{^Disk '${disk}': [0-9.]+ [KMGT]B, (\d+) bytes$} and print $1') |
  dd bs=512 conv=sync,noerror of=$disk

The next step is to format the encrypted partition. I use sfdisk to create a single partition that spans the entire drive, followed by cryptsetup to do the format. I explicitly specify the cipher in order to use ESSIV, which makes certain attacks more difficult. The --batch-mode option keeps it from asking for confirmation before writing. The second call to sfdisk just tells the kernel to reread the disk’s partitions so it will pick up the UUID that cryptsetup created.

# Add and format the LUKS partition. 
echo , | sfdisk --Linux $disk
cryptsetup luksFormat --cipher aes-cbc-essiv:sha256 --batch-mode ${disk}1 $keyfile
sfdisk -R $disk; sleep 5

Next, I open, format, and mount the partition. JFS is the filesystem that’s been nicest to me, of all the journaling filesystems I’ve tried. (In the future, it might be nice to use ZFS for the backup disks—I’d get better detection of disk errors, at least—but I don’t think it would play entirely well with running over an encrypted volume, and they haven’t integrated encryption into ZFS yet, as far as I can tell.)

# Open LUKS partition, format and mount the encrypted volume. 
cryptsetup --key-file $keyfile luksOpen ${disk}1 backups
mkfs -t jfs -q /dev/mapper/backups
mount /backups

Now I run the initial backup. Each backup version is just a separate numbered directory in the partition, so the first one is ‘1’. I’m backing up /etc, some of /var, my and my wife’s home directories, and any databases I have. My wife’s stuff isn’t directly backed up here because she’s on a different computer; I have to initiate her backup from there. The script, in its first requirement for user interaction, will wait until I tell it that’s done.

# Do the initial backup. 
mkdir -vp /backups/1/{etc,var,pip,postgresql,mysql,rivana}
chmod a-r /backups/1
chown rivana /backups/1/rivana
chown postgres /backups/1/postgresql
rsync -avP --filter 'merge /etc/backups/etc-filter' /etc/ /backups/1/etc/
rsync -avP --filter 'merge /etc/backups/var-filter' /var/ /backups/1/var/
rsync -avP --filter 'merge /etc/backups/pip-filter' /home/pip/ /backups/1/pip/
su -c 'pg_dumpall -v >/backups/1/postgresql/dump' postgres
mysqldump -v --all-databases >/backups/1/mysql/dump
echo -n "Hit enter when rivana is backed up.  "
read foo

Now that the backup is done, the script unmounts and deactivates the partition.

# Deactivate the encrypted volume. 
umount /backups
cryptsetup luksClose backups

And I get prompted for the password that will unlock the partition if the key file isn’t available.

# Add user password. 
cryptsetup --key-file $keyfile --verify-passphrase luksAddKey ${disk}1

Finally, the script displays the UUID for the partition, which is needed for later use.

# Display the partition's UUID. 
echo -n 'UUID: '
cryptsetup luksUUID ${disk}1

# Set up crypttab.

Debian, at least, has an /etc/crypttab file that lists encrypted partitions to be enabled at boot time. I put the onsite backup disk in there so it’ll be automatically mounted if the computer reboots. This plus a backup cronjob make the backup process completely automated.

backups  /dev/disk/by-uuid/<onsite UUID>  <key file>  luks

# Do local backups.

I have a simple script to do the daily backups: perform-backup. It’s basically the same as the initial backup, but with the --link-dest option as I mentioned previously.

last_num=$(ls -t /backups | head -1)
((num=$last_num+1))

mkdir -p /backups/$num/{etc,var,pip,postgresql,mysql,rivana}
chown rivana /backups/$num/rivana
chmod a-r /backups/$num
rsync -a --filter 'merge /etc/backups/etc-filter' --link-dest=/backups/$last_num/etc /etc/ /backups/$num/etc/
rsync -a --filter 'merge /etc/backups/var-filter' --link-dest=/backups/$last_num/var /var/ /backups/$num/var/
rsync -a --filter 'merge /etc/backups/pip-filter' --link-dest=/backups/$last_num/pip /home/pip/ /backups/$num/pip/
chown postgres /backups/$num/postgresql
su -c "pg_dumpall >/backups/$num/postgresql/dump" postgres
mysqldump --all-databases >/backups/$num/mysql/dump

# Do Becca’s backup

My wife has her own computer but, fortunately, simpler backup requirements. I have ssh public key authentication set up so she can ssh to my computer without a password, which makes the backups work properly in an automated fashion.

The initial backup is a simple rsync one-liner.

rsync -avP ~/ mithrandir:/backups/1/rivana/

Subsequent backups are a simple script:

#!/bin/sh

num=$(ssh mithrandir ls -t /backups | head -1)
last_num=$(ssh mithrandir ls -t /backups | head -2 | tail -1)

rsync -a --link-dest=/backups/${last_num}/rivana ${HOME}/ mithrandir:/backups/${num}/rivana/

# Backup rotation.

The Towers of Hanoi rotation is effected by a script that goes through and deletes any directories that don’t need to be there: purge-backups. I won’t quote it here because I don’t think it’s all that interesting. It just finds the largest power of two less than or equal to the current number and then works its way down from that to enumerate all of the directories, deleting everything else.

# Offsite backups.

Every so often I’ll bring the offsite disk home and back up to it. The script for that is insert-offsite-backup. It unmounts the onsite disk, waits for me to insert the offsite disk, runs a backup, unmounts the offsite disk, waits for me to reinsert the onsite disk, then remounts that disk. It needs to be told what my offsite UUID is, but it picks up all the other settings from /etc/crypttab.

The backup takes about half an hour, so I have ample time to manually run the backup script on Becca’s computer.

Pip! Gold

Articles

◆ The MTA and GTFS

◆ Car Engine Advice Sought

◆ Kiva

§ Microlending

§ Kiva’s contribution

§ Loan repayment

§ Defaults

§ Interest

§ Kiva’s Funding

§ Criticism

§ Conclusion

§ Competition

§ My Kiva History

◆ Betsy McCaughey on the Obama Stimulus Plan: A Rebuttal

◆ Backup Implementation

§ Backup Overview

§ Backup Procedure

# Set up fstab.

# Set up `rsync` filters.

# Initialize disks.

# Set up crypttab.

# Do local backups.

# Do Becca’s backup

# Backup rotation.

# Offsite backups.

Articles

§ Microlending

§ Kiva’s contribution

§ Loan repayment

§ Defaults

§ Interest

§ Kiva’s Funding

§ Criticism

§ Conclusion

§ Competition

§ My Kiva History

§ Backup Overview

§ Backup Procedure

# Set up fstab.

# Set up rsync filters.

# Initialize disks.

# Set up crypttab.

# Do local backups.

# Do Becca’s backup

# Backup rotation.

# Offsite backups.

# Set up `rsync` filters.