Kiva

I’ve been contributing to Kiva for a number of months now, and am sufficiently appreciative of them to want to tell others about them.  The short version is that Kiva is a distributed, peer-to-peer microlending non-profit with the goal of alleviating global poverty.

§ Microlending

Microfinance is the practice of providing financial services to people in low-income parts of the world who are not generally considered profitable-enough customers to warrant service from traditional banking institutions.

Microlending (or microcredit) is a subset of microfinance that provides loans to low-income people, typically entrepreneurs in low-income areas of the world, who do not have access to loans from traditional banking sources.  Sometimes there simply are no banks in the area, but more typically the entrepreneurs don’t make enough money to qualify for loans, don’t have sufficient collateral, or don’t have a sufficient credit history.  Modern microlending started in the 1970s with the Bangladesh-based Grameen Bank.

The advantages of microlending over traditional aid programs are in its benefits to developing economies: it encourages growth of local businesses, puts more money into local economies, encourages local employment opportunities, and gives entrepreneurs a foothold to establish their businesses and a credit record.  In some areas, it provides an alternative to tribal moneylenders, who charge high interest rates and respond to repayment failures with physical violence.  In short, traditional aid addresses people’s basic needs: food, shelter, clothing, education, sustainable agriculture; while microlending works on the next step of growing local economies.

§ Kiva’s contribution

Kiva was created as the first distributed, peer-to-peer microlending organization in the world.  It’s peer-to-peer because all Kiva does is list loan applicants on their website; it’s up to the users of the website to supply the money to fund the loans.  As the loans are repaid, those repayments go to the people who supplied the money.  Kiva is distributed because the funding for any given loan is made up of contributions from many different Kiva users.  (And as the loan is repaid, each contributor is repaid according to his or her portion of the overall loan.)

The actual process is a little more involved.  Kiva doesn’t make loans directly.  Instead, they partner with microfinance institutions (MFIs) in other countries.  The MFIs accept loan applications, evaluate the applicants, disburse loans, and collect payments.  During this process, the MFIs tell Kiva about the loans in batches; Kiva lists the loans, collects the contributions from Kiva users, and sends the money to the MFIs.  When the MFIs collect repayments, they send the payments (again, in batches) to Kiva, which distributes the payments to the loan contributors.  The MFIs often fund the loans before telling Kiva about them (to avoid the delays in the funding process and get the money to the applicant quickly), and then backfill the accounting once the loan is actually funded.  In some cases, a loan will expire on Kiva without being fully funded; when that happens, any contributors will have their contributions refunded, and if the MFI already disbursed the loan, it will have to find another source of funding for it.

§ Loan repayment

When payments are received on a loan, the money is made available to contributors as “Kiva Credits”.  If you have Kiva credits, you can apply them to a different loan, donate them to Kiva itself, or transfer them into a PayPal account.

§ Defaults

Because most loans are made in developing areas of the world, if someone defaults on their loan, there is often no workable legal recourse for requiring repayment, and the people who contributed to the loan are not repaid.  The MFIs do research on the applicants before granting loans, and do a generally good job of selecting people who will repay their loans.  Kiva has an overall default rate of about 2%.  (For comparison, the federal Small Business Administration, which makes loans to small businesses in the US, had a default rate of 2.4% in 2004; with the recession, that rate has risen to 11.9%.)

This does mean that any loan made through Kiva carries the risk of losing the money contributed.

§ Interest

Kiva does not collect any interest on its loans.  Consequently, Kiva users will only get back exactly what they put into the loan in the first place, unless the loan defaults.  This means that anyone contributing to Kiva loans over a long term can expect to lose some money to defaults.  Because Kiva is operated as a non-profit organization with the goal of alleviating poverty, they don’t feel it is appropriate to add interest payments to the loans they fund.

The MFIs that Kiva partners with do charge interest, however.  The interest that they charge pays for their staff and administration costs.  Typically, they charge a small percentage of the loan in interest, plus a flat fee to cover the effort necessary for any loan, regardless of its size.  This includes things like interviewing applicants and researching their financial history and dependability.  Because the loan amounts are small, that fee often adds a noticeable percentage to the loan, leading to interest rates that seem surprisingly high.  The average interest rate on Kiva is about 35%, but I’ve seen rates as high as 50%.

The MFI interest rates are another reason that Kiva doesn’t charge interest themselves; they don’t want to add any more to the borrower’s repayment load.

For each of its partners, Kiva shows both the average interest rate for that MFI and the median interest rate for other MFIs in the same region.  People worried about the high rates are encouraged to check up on the MFI before committing to a loan.

§ Kiva’s Funding

Since Kiva doesn’t collect interest and they apply all of a contributor’s money to the loans themselves, they rely on donations to keep running.  After you contribute to a loan, they ask for a donation and suggest an amount equal to 15% of your loan contribution.  You have the option of choosing a different percentage (including 0%) or specifying a particular dollar amount.  You can also give the repayments on your loans to Kiva as a donation, either manually (by donating your Kiva credits) or automatically (by specifying that all repayments should be automatically donated).

Kiva is a 501(c)(3) nonprofit, so all donations are tax-deductible, although loans are not.  (As mentioned above, loans are not even guaranteed to maintain their value.)

§ Criticism

I feel it would be unfair to talk about Kiva without acknowledging that some people have criticisms of the microlending process.

The most common criticism of microlending in general is the high interest rates.  Kiva does as well as it can in this area, recognizing that MFIs need to charge enough to fund their activities, but also being as transparent as possible about what is being charged.  Kiva also periodically reviews their partners’ claims and practices.  (They have ended their relationship with at least one partner because of fraud—not interest-related, in this case; the partner was claiming much higher amounts on its loans than it was actually disbursing.)

Conversely, some people argue that Kiva should themselves be charging interest.  I can’t find the article now, but I read a piece a while back that argued that when microfinance funding organizations like Kiva pay interest, contributors change from a “charity” mindset to an “investment” mindset and end up putting more money into the system.  Personally, I think that the charity mindset is a better fit here.  If the goal is to alleviate poverty, then helping people establish businesses with as few impediments (including additional interest) as possible is a better approach than one that is beholden to getting the best return on investment.

There are also claims that microlending doesn’t help all that much.  There was an article in the Boston Globe a few months ago titled “Small change: Billions of dollars and a Nobel Prize later, it looks like ‘microlending’ doesn’t actually do much to fight poverty”.  Despite the inflammatory subtitle, the article is reasonably balanced.  It refers to two unpublished papers about studies that concluded that microlending is not as effective as many people think.  For one thing, I’d want to wait until the papers are published; peer review is likely to shake out any methodological problems in the studies.  But the article points out that the studies found that microlending did make a difference for a lot of businesses and helped them either establish themselves or grow.  It’s not a panacea—if for no other reason than that the eligibility requirements for “who can take out a loan and be counted on to repay it if there’s little government enforcement to rely on” mean that only a small part of the population will be considered in the first place—but it is a step in the direction of helping establish functioning local economies in poor regions.  The article provides a number of pro-microfinance perspectives, most of which boil down to, “The studies show that microlending isn’t magic, but it’s a useful tool—as long as you understand that it isn’t magic.”

Kiva in particular was the target of some criticism recently when they began offering loans to entrepreneurs in the US.  Many people felt that people in the US are either well-served by the standard banking system or have access to many government programs and would not get much additional benefit from a microloan.  I tend to regard this as a place where it’s best to give people the ability to make their own decisions.  Kiva felt that there was benefit to offering loans in the US, and those who agree with them can now participate in those loans.  Those who disagree have no obligation to fund US loans, and there are plenty of available loans in poorer countries.  (There was a time when Kiva funded every loan it offered and sometimes ran out of new loans to fund.  These days, there’s more demand, and loans do periodically expire unfunded.)

§ Conclusion

I think that Kiva provides an important service to developing areas of the world, and I put a portion of my charitable giving into the site each month.  Personally, I don’t plan to take money out of it; what I put in goes either towards funding more loans or donations to keep Kiva running.  Other people have different financial approaches, of course.

Kiva also isn’t the only place I donate.  In addition to local giving, I also contribute to more traditional aid organizations, because microlending is only effective if people are already healthy, educated, and empowered.  For giving along more traditional lines, I recommend Heifer International, which mostly focuses on sustainable agriculture in developing regions.

§ Competition

Kiva was the first peer-to-peer microlending website, but many others have followed in its footsteps, and some of its competitors take different approaches to microlending.  If you don’t like Kiva, there might be another organization that’s a better fit for you.

  • The Microfinance Gateway is a large database of microfinance organizations, including aggregators like Kiva.  It contains links to many more organizations than I’m going to link to here.
  • United Prosperity doesn’t directly give contributed money to borrowers; it uses that money as collateral for traditional bank loans.  They feel that this approach is better at integrating people into the traditional banking system.
  • The Rotarian Action Group for Microcredit works to assist Rotary groups with setting up their own microlending operations.
  • MicroPlace collects interest on its microloans, so contributors earn a return on their contributions.
  • Prosper is a more general peer-to-peer lending broker.  Contributors can use it either as a general investment fund or a broker for choosing individual loans to fund.  Anyone with a decent credit score can request a loan, which is bid on by lenders.

§ My Kiva History

If you want, you can look at my lender page on Kiva.  So far, I’ve contributed to 11 loans:

  • Ibrahim (last name withheld for privacy) in Lebanon:  Owns a sewing factory.  Currently 58% repaid on a $3,000 loan.
  • Neang Soknoeun in Cambodia: Weaves and sells groceries.  Currently 33% repaid on a $700 loan.
  • Khosiyat Kuldasheva in Tajikistan: Sells foodstuffs at a local market.  Currently 37% repaid on a $3,000 loan.
  • Harriet Namugabo’s group in Uganda: A 5-person group who are collectively responsible for the loan.  The leader sells medicines.  Currently 66% repaid on an $875 loan.
  • Lilane (last name withheld for privacy) in Lebanon: Owns a clothing store.  Currently 24% repaid on a $1,000 loan.
  • Emélido Oporta Alvarado in Costa Rica: Raises cattle.  Currently 0% repaid on a $1,175 loan.  (His payment schedule is one payment every six months.  The first one is due next April.)
  • Efrain (last name withheld for privacy) in Peru: Grows and sells potatoes and raises cattle.  Currently 0% repaid on a $1,200 loan.  (He doesn’t have a monthly payment schedule; the entirety of the loan is due next March.)
  • Felix Alberto Tórrez Machado in Nicaragua: Runs a cybercafe.  Currently 13% repaid on a $975 loan.
  • Anonymous in Iraq: Runs an auto repair shop.  Currently 0% repaid on a $3,000 loan.  (First payment is due in January.  Kiva also notes that government regulations on foreign loans are murky and they might not be able to receive the money from the borrower’s payments.  I figured it was good to support a business in Iraq, though, so I accepted the risk.)
  • Nuevo Amanecer Group in Bolivia: a 13-person group who would be collectively responsible for the loan, with various jobs.  Loan expired before being fully funded and my contribution was refunded.
  • Gaibullodjon Sobitov in Tajikistan: Processes and sells dried apricots.  Currently 70% raised on a $1,500 loan, with 2 days before the loan expires.



Betsy McCaughey on the Obama Stimulus Plan: A Rebuttal

Betsy McCaughey has written an opinion piece for bloomberg.com titled Ruin Your Health With the Obama Stimulus Plan.  In it, she characterizes the stimulus package (as passed by the House) as containing sinister language that will allow the government to control and reduce the quality of health care available to the American public.  I took a look at the sections of the bill that she has issues with, and I found that her conclusions are gross distortions at best.  More or less everything she points to in the bill is focused on providing a healthcare information technology infrastructure to allow doctors and hospitals easy access to patients’ complete medical information—nothing more.

(Since I wrote my analysis, the people at FactCheck.org have done their own analysis, which is more in depth than mine.  The conclusions are similar—McCaughey is misunderstanding a lot of the bill—although they allow that some parts of the bill don’t necessarily forbid her worst-case scenario.  (They also note that much of what she worries about has been in place already, under a program created by George W Bush: the bill just turns an executive order into established law.))

She also makes a lot of comparisons to Tom Daschle’s book Critical: What We Can Do About the Health-Care Crisis, drawing sinister inferences from the comparisons.  I haven’t read the book, so I can’t judge how right or wrong she is about its contents, but the contents of the actual bill are not what she says they are.

To begin with, she claims that the government will be overseeing what your doctor does and dictating treatments:

One new bureaucracy, the National Coordinator of Health Information Technology, will monitor treatments to make sure your doctor is doing what the federal government deems appropriate and cost effective. The goal is to reduce costs and “guide” your doctor’s decisions (442, 446).

The actual text of the bill is as follows:

                           page 441
19      “(b) PURPOSE.--The National Coordinator shall per-
20 form the duties under subsection (c) in a manner con-
21 sistent with the development of a nationwide health infor-
22 mation technology infrastructure that allows for the elec-
23 tronic use and exchange of information and that--

                           page 442
 1            “(1) ensures that each patient's health informa-
 2      tion is secure and protected, in accordance with ap-
 3      plicable law;
 4            “(2) improves health care quality, reduces med-
 5      ical errors, reduces health disparities, and advances
 6      the delivery of patient-centered medical care;
 7            “(3) reduces health care costs resulting from
 8      inefficiency, medical errors, inappropriate care, du-
 9      plicative care, and incomplete information;
10            “(4) provides appropriate information to help
11      guide medical decisions at the time and place of
12      care;
13            “(5) ensures the inclusion of meaningful public
14      input in such development of such infrastructure;
15            “(6) improves the coordination of care and in-
16      formation among hospitals, laboratories, physician
17      offices, and other entities through an effective infra-
18      structure for the secure and authorized exchange of
19      health care information;
20            “(7) improves public health activities and facili-
21      tates the early identification and rapid response to
22      public health threats and emergencies, including bio-
23      terror events and infectious disease outbreaks;
24            “(8) facilitates health and clinical research and
25      health care quality;

                           page 443
 1            “(9) promotes prevention of chronic diseases;
 2            “(10) promotes a more effective marketplace,
 3      greater competition, greater systems analysis, in-
 4      creased consumer choice, and improved outcomes in
 5      health care services; and
 6            “(11) improves efforts to reduce health dispari-
 7      ties. 

The whole thing is talking about making patients’ medical information available by electronic means to hospitals, doctors, and insurance companies while maintaining the patients’ privacy.  Clause (4), in particular, refers to doctors being able to have all of the information they need when making decisions.  It does not imply anything about the government dictating doctors’ decisions to them.

She goes on to talk about hospitals and doctors not being “meaningful users” of the system, and she seems to imply that this will result in doctors abandoning treatments that aren’t government-sanctioned:

Hospitals and doctors that are not “meaningful users” of the new system will face penalties.  “Meaningful user” isn’t defined in the bill. … What penalties will deter your doctor from going beyond the electronically delivered protocols when your condition is atypical or you need an experimental treatment?

Here’s what the bill has to say about “meaningful users”:

                           page 511
14     “(o) INCENTIVES    FOR  ADOPTION   AND  MEANINGFUL
15 USE OF CERTIFIED EHR TECHNOLOGY.--
16           “(1) INCENTIVE  PAYMENTS.--
17                 “(A) IN GENERAL.--Subject    to the suc-
18           ceeding subparagraphs of this paragraph, with
19           respect to covered professional services fur-
20           nished by an eligible professional during a pay-
21           ment year (as defined in subparagraph (E)), if
22           the eligible professional is a meaningful EHR
23           user (as determined under paragraph (2)) for
24           the reporting period with respect to such year,
25           in addition to the amount otherwise paid under

                           page 512
 1           this part, there also shall be paid to the eligible
 2           professional (or to an employer or facility in the
 3           cases described in clause (A) of section
 4           1842(b)(6)), from the Federal Supplementary
 5           Medical Insurance Trust Fund established
 6           under section 1841 an amount equal to 75 per-
 7           cent of the Secretary's estimate (based on
 8           claims submitted not later than 2 months after
 9           the end of the payment year) of the allowed
10           charges under this part for all such covered
11           professional services furnished by the eligible
12           professional during such year. 

                           page 540
 6     “(3) MEANINGFUL      EHR USER.--
 7           “(A) IN    GENERAL.--For purposes of para-
 8     graph (1), an eligible hospital shall be treated
 9     as a meaningful EHR user for a reporting pe-
10     riod for a payment year (or, for purposes of
11     subsection (b)(3)(B)(ix), for a reporting period
12     under such subsection for a fiscal year) if each
13     of the following requirements are met: 
14                 “(i) MEANINGFUL  USE OF CERTIFIED
15           EHR TECHNOLOGY.--The       eligible hospital
16           demonstrates to the satisfaction of the Sec-
17           retary, in accordance with subparagraph
18           (C)(i), that during such period the hospital
19           is using certified EHR technology in a
20           meaningful manner. 
21                 “(ii) INFORMATION  EXCHANGE.--The
22           eligible hospital demonstrates to the satis-
23           faction of the Secretary, in accordance
24           with subparagraph (C)(i), that during such
25           period such certified EHR technology is

                           page 541
 1           connected in a manner that provides, in
 2           accordance with law and standards appli-
 3           cable to the exchange of information, for
 4           the electronic exchange of health informa-
 5           tion to improve the quality of health care,
 6           such as promoting care coordination. 
 7                 “(iii) REPORTING     ON   MEASURES
 8           USING      EHR.--Subject  to subparagraph
 9           (B)(ii) and using such certified EHR tech-
10           nology, the eligible hospital submits infor-
11           mation for such period, in a form and
12           manner specified by the Secretary, on such
13           clinical quality measures and such other
14           measures as selected by the Secretary
15           under subparagraph (B)(i). 

“Meaningful EHR users” will be given financial incentives to use the electronic system.  If a doctor or hospital is not a “meaningful EHR user”, that means only that they’re not making use of the electronic medical information infrastructure.  It has nothing to do with what treatments they offer.  Furthermore, non-“meaningful users” don’t face penalties; they just don’t have the government helping them pay for the upfront cost of implementing the electronic system.

Next, she claims that the bill creates a department to prevent people from using new medical treatments because they’re more expensive:

In his book, Daschle proposed an appointed body with vast powers to make the “tough” decisions elected politicians won’t make.  The stimulus bill does that, and calls it the Federal Coordinating Council for Comparative Effectiveness Research (190-192).  The goal, Daschle’s book explained, is to slow the development and use of new medications and technologies because they are driving up costs.

Again, the bill:

                           page 190
 7 SEC. 9201. FEDERAL COORDINATING COUNCIL FOR COM-
 8                 PARATIVE EFFECTIVENESS RESEARCH. 
 9        (a) ESTABLISHMENT.--There is hereby established a
10 Federal Coordinating Council for Comparative Effective-
11 ness Research (in this section referred to as the “Coun-
12 cil”). 
13        (b) PURPOSE; DUTIES.--The Council shall--
14             (1) assist the offices and agencies of the Fed-
15        eral Government, including the Departments of
16        Health and Human Services, Veterans Affairs, and
17        Defense, and other Federal departments or agencies,
18        to coordinate the conduct or support of comparative
19        effectiveness and related health services research;
20        and
21             (2) advise the President and Congress on--
22                  (A) strategies with respect to the infra-
23             structure needs of comparative effectiveness re-
24             search within the Federal Government;

                           page 191
 1                  (B) appropriate organizational expendi-
 2             tures for comparative effectiveness research by
 3             relevant Federal departments and agencies; and
 4                  (C) opportunities to assure optimum co-
 5             ordination of comparative effectiveness and re-
 6             lated health services research conducted or sup-
 7             ported by relevant Federal departments and
 8             agencies, with the goal of reducing duplicative
 9             efforts and encouraging coordinated and com-
10             plementary use of resources. 

The supposedly sinister Federal Coordinating Council for Comparative Effectiveness Research seems to be more like the Department of Homeland Security for existing federal health care systems.  Programs like Medicaid, veterans’ benefits, and federal employee health insurance already make determinations about the effectiveness of various treatments, just as every private health insurance company does.  The new council would coordinate and share that information across the federal programs that already exist.

Following that, she claims that the bill would change the standards for approved Medicare treatments, implying that fewer treatments will be approved:

Medicare now pays for treatments deemed safe and effective. The stimulus bill would change that and apply a cost-effectiveness standard set by the Federal Council (464).

Page 464 contains references to implementing things specified under section 3004, which describes the process for adopting and implementing the previously-mentioned technology infrastructure.  I can find no mention of Medicare.  It seems very much like this claim is completely fabricated.

Every complaint that the author raises, everything that she claims will affect what care doctors are willing to give, seems to only concern the healthcare information technology infrastructure that Obama has been talking about implementing for months.  I see no places where the government will be dictating anything to doctors, aside from providing financial incentives (not punishments) to work with the new infrastructure.  None of the author’s claims about the government cutting off healthcare seem at all founded in fact.  At best, she’s consistently misreading the text of the bill.  I think it more likely that she has other reasons not to want the bill to pass and is distorting the reality of the situation to steer people in the direction she wants them to go.

This post is available under a CC0 Waiver.  Feel free to copy it anywhere.


Backup Implementation

I previously considered different backup schemes.  Writing that entry crystallized my basic ideas about what I was going to do about the backups.  I then proceeded to implement them.  This entry is a detailed description of what I did.

§ Backup Overview

I ended up buying two 80 GB hard drives and a ThermalTake BlacX external enclosure.  The overall plan is to do backups to one of the drives on a daily, automated basis, and the other on a periodic, maybe monthly basis.  Most of the time, the periodic backup drive will live at my office and will serve as my offsite backup.

I want to have a backup history in the same way that a tape rotation scheme would give me.  That way, if I don’t notice there’s something wrong with a file for a time, there’s still a good chance I can retrieve it.  I also want things stored space-efficiently, so backing up unchanged files doesn’t take up additional space.  This is accomplished pretty easily with rsync; I do one full backup, and then subsequent backups use rsync’s --link-dest option pointing to the most recent complete backup; any files that haven’t changed are simply hardlinked together, so the two directory entries point to the same physical location on the disk.
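
Here’s a minimal sketch of the idea (the directory numbers and file name are purely illustrative):

# First backup: a full copy.
rsync -a /home/pip/ /backups/1/pip/
# Later backups hardlink unchanged files against the previous one.
rsync -a --link-dest=/backups/1/pip /home/pip/ /backups/2/pip/
# An unchanged file shows the same inode number in both backups.
ls -li /backups/1/pip/some-file /backups/2/pip/some-file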

For the backup history, I decided to go with a variant of the Towers of Hanoi backup rotation.  Instead of using a preset number of pegs, as I would have to do if I were using physical tapes, I can just calculate the numbers on the fly, effectively behaving as if I had an infinite number of tapes.  This rotation gives me roughly exponential backoff for my history; I can look at backups from yesterday, two days ago, four days ago, eight days ago, and so on.

Finally, I decided to encrypt the drives.  This lets me cart them around with confidence that if they get lost or stolen, anyone who ends up with them won’t have my personal files.  I used LUKS to encrypt the disks, and used both a file on my computer and a password as keys.  The file makes it easier to mount the drives without manual intervention (important if my desktop reboots), while the password lets me get at the data if the key file isn’t available (important if the main disk dies and I need my backups).

§ Backup Procedure

# Set up fstab.

The first thing I did was to set up /etc/fstab for the disks.  Since I only have one enclosure, I have to swap disks, so only one will ever be connected at a time.  Thus, I mount whichever one is in at the moment on /backups.  Likewise, I associate each disk with the dm-crypt name of “backups”, so the device will be /dev/mapper/backups.  With that, I added the following line to /etc/fstab:

/dev/mapper/backups  /backups  auto  defaults  0  0

# Set up rsync filters.

I’m using rsync to do the backups, but in some cases I don’t want everything in the source directories to be backed up.  Thus, I tell rsync to look at a filter file for each directory so it knows what to skip and what to keep.  rsync goes through the filters for each file or directory it considers, and takes the first action that matches.  If nothing matches, the file is copied.  If a directory is excluded, none of the files in that directory are considered at all, so I had to include a few explicit directory chains.

In /var, I only want to back up a few things, so the final pattern ignores anything that isn’t explicitly included.

+ /backups
+ /backups/**
+ /lib
+ /lib/bzr
+ /lib/bzr/**
+ /lib/svn
+ /lib/svn/**
- *

For my home directory, I include everything, with a few exceptions.  For instance, most of my music directory can be reripped from CD if needed, so I don’t need to take up space backing up those files.  On the other hand, I have some files that I either purchased online or downloaded and wouldn’t be able to easily replace if they were lost, so I do back them up.  Here’s an excerpt from my home filter file:

+ /movies/Star_Wars_Holiday_Special
+ /movies/Star_Wars_Holiday_Special/**
- /movies

+ /music
+ /music/Bonerama
+ /music/Bonerama/Bringing_It_Home
+ /music/Bonerama/Bringing_It_Home/**
+ /music/Jonathan_Coulton
+ /music/Jonathan_Coulton/Thing_a_Week_Three
+ /music/Jonathan_Coulton/Thing_a_Week_Three/03-Code_Monkey*
+ /music/Nine_Inch_Nails
+ /music/Nine_Inch_Nails/Ghosts_I-IV
+ /music/Nine_Inch_Nails/Ghosts_I-IV/**
+ /music/Nine_Inch_Nails/The_Slip
+ /music/Nine_Inch_Nails/The_Slip/**
+ /music/Obelix
+ /music/Obelix/**
+ /music/Solo_String_Project
+ /music/Solo_String_Project/**
- /music/**

- /tmp

# Initialize disks.

I wrote a script to initialize the disks for me: init-backup-disk.  It takes two parameters: the name of the device for the backup disk, and the file to use as a key for the partition.  If the key file doesn’t exist, it will be created.

After a few sanity checks, the script starts doing things.  It starts by checking that the disk is good with badblocks.  If it encounters any errors, it stops there and the drive needs to be sent in for warranty replacement.  Following that, it goes into the most time-consuming part of the initialization: writing random data to the disk.  (badblocks already wrote random data, but its PRNG is somewhat simplistic; /dev/urandom is a much better source of pseudo-random numbers.)  Without this step, it would be obvious which bits of the disk had encrypted data on them.  I use pv to give a progress meter and time estimate.  On my computer, badblocks took a little over two hours and /dev/urandom took about eight hours for each 80GB disk.

# Check and randomize disk. 
badblocks -b 512 -s -w -t random -v $disk || exit 2
</dev/urandom pv -s $(fdisk -l $disk |
    perl -nle 'm{^Disk '${disk}': [0-9.]+ [KMGT]B, (\d+) bytes$} and print $1') |
  dd bs=512 conv=sync,noerror of=$disk

The next step is to format the encrypted partition.  I use sfdisk to create a single partition that spans the entire drive, followed by cryptsetup to do the format.  I explicitly specify the cipher in order to use ESSIV, which makes certain attacks more difficult.  The --batch-mode option keeps it from asking for confirmation before writing.  The second call to sfdisk just tells the kernel to reread the disk’s partitions so it will pick up the UUID that cryptsetup created.

# Add and format the LUKS partition. 
echo , | sfdisk --Linux $disk
cryptsetup luksFormat --cipher aes-cbc-essiv:sha256 --batch-mode ${disk}1 $keyfile
sfdisk -R $disk; sleep 5

Next, I open, format, and mount the partition.  JFS is the filesystem that’s been nicest to me, of all the journaling filesystems I’ve tried.  (In the future, it might be nice to use ZFS for the backup disks—I’d get better detection of disk errors, at least—but I don’t think it would play entirely well with running over an encrypted volume, and they haven’t integrated encryption into ZFS yet, as far as I can tell.)

# Open LUKS partition, format and mount the encrypted volume. 
cryptsetup --key-file $keyfile luksOpen ${disk}1 backups
mkfs -t jfs -q /dev/mapper/backups
mount /backups

Now I run the initial backup.  Each backup version is just a separate numbered directory in the partition, so the first one is ‘1’.  I’m backing up /etc, some of /var, my and my wife’s home directories, and any databases I have.  My wife’s stuff isn’t directly backed up here because she’s on a different computer; I have to initiate her backup from there.  The script, in its first requirement for user interaction, will wait until I tell it that her backup is done.

# Do the initial backup. 
mkdir -vp /backups/1/{etc,var,pip,postgresql,mysql,rivana}
chmod a-r /backups/1
chown rivana /backups/1/rivana
chown postgres /backups/1/postgresql
rsync -avP --filter 'merge /etc/backups/etc-filter' /etc/ /backups/1/etc/
rsync -avP --filter 'merge /etc/backups/var-filter' /var/ /backups/1/var/
rsync -avP --filter 'merge /etc/backups/pip-filter' /home/pip/ /backups/1/pip/
su -c 'pg_dumpall -v >/backups/1/postgresql/dump' postgres
mysqldump -v --all-databases >/backups/1/mysql/dump
echo -n "Hit enter when rivana is backed up.  "
read foo

Now that the backup is done, the script unmounts and deactivates the partition.

# Deactivate the encrypted volume. 
umount /backups
cryptsetup luksClose backups

And I get prompted for the password that will unlock the partition if the key file isn’t available.

# Add user password. 
cryptsetup --key-file $keyfile --verify-passphrase luksAddKey ${disk}1

Finally, the script displays the UUID for the partition, which is needed for later use.

# Display the partition's UUID. 
echo -n 'UUID: '
cryptsetup luksUUID ${disk}1

# Set up crypttab.

Debian, at least, has an /etc/crypttab file that lists encrypted partitions to be enabled at boot time.  I put the onsite backup disk in there so it’ll be automatically mounted if the computer reboots.  This plus a backup cronjob make the backup process completely automated.

backups  /dev/disk/by-uuid/<onsite UUID>  <key file>  luks

# Do local backups.

I have a simple script to do the daily backups: perform-backup.  It’s basically the same as the initial backup, but with the --link-dest option as I mentioned previously.

last_num=$(ls -t /backups | head -1)
((num=$last_num+1))

mkdir -p /backups/$num/{etc,var,pip,postgresql,mysql,rivana}
chown rivana /backups/$num/rivana
chmod a-r /backups/$num
rsync -a --filter 'merge /etc/backups/etc-filter' --link-dest=/backups/$last_num/etc /etc/ /backups/$num/etc/
rsync -a --filter 'merge /etc/backups/var-filter' --link-dest=/backups/$last_num/var /var/ /backups/$num/var/
rsync -a --filter 'merge /etc/backups/pip-filter' --link-dest=/backups/$last_num/pip /home/pip/ /backups/$num/pip/
chown postgres /backups/$num/postgresql
su -c "pg_dumpall >/backups/$num/postgresql/dump" postgres
mysqldump --all-databases >/backups/$num/mysql/dump

# Do Becca’s backup.

My wife has her own computer but, fortunately, simpler backup requirements.  I have ssh public key authentication set up so she can ssh to my computer without a password, which makes the backups work properly in an automated fashion.
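
Setting that up is the usual ssh key exchange; a quick sketch, run on her computer (mithrandir is my desktop, as in the scripts below):

# Generate a passphrase-less key and install it on mithrandir.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id mithrandir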

The initial backup is a simple rsync one-liner.

rsync -avP ~/ mithrandir:/backups/1/rivana/

Subsequent backups are a simple script:

#!/bin/sh

num=$(ssh mithrandir ls -t /backups | head -1)
last_num=$(ssh mithrandir ls -t /backups | head -2 | tail -1)

rsync -a --link-dest=/backups/${last_num}/rivana ${HOME}/ mithrandir:/backups/${num}/rivana/

# Backup rotation.

The Towers of Hanoi rotation is effected by a script that goes through and deletes any directories that don’t need to be there: purge-backups.  I won’t quote it here because I don’t think it’s all that interesting.  It just finds the largest power of two less than or equal to the current number and then works its way down from that to enumerate all of the directories, deleting everything else.
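
For the curious, here’s a rough sketch of that kind of rotation (not the actual script): it keeps the newest backup plus backups at power-of-two ages, which gives the exponential backoff described earlier.

#!/bin/sh
# Sketch: keep the newest backup plus backups at ages 1, 2, 4, 8, ...; delete the rest.
num=$(ls /backups | sort -n | tail -1)
keep=$num
age=1
while [ $age -lt $num ]; do
    keep="$keep $((num - age))"
    age=$((age * 2))
done
for dir in $(ls /backups); do
    echo "$keep" | tr ' ' '\n' | grep -qx "$dir" || rm -rf "/backups/$dir"
done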

# Offsite backups.

Every so often I’ll bring the offsite disk home and back up to it.  The script for that is insert-offsite-backup.  It unmounts the onsite disk, waits for me to insert the offsite disk, runs a backup, unmounts the offsite disk, waits for me to reinsert the onsite disk, then remounts that disk.  It needs to be told what my offsite UUID is, but it picks up all the other settings from /etc/crypttab.
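
In outline, it does something like this (a sketch based on the description above rather than the real script; it assumes the daily perform-backup script is on root’s PATH):

#!/bin/sh
# Sketch of the offsite backup flow; takes the offsite partition's UUID as its argument.
offsite_uuid=$1

# The onsite device and key file come from the crypttab entry set up earlier.
onsite_dev=$(awk '$1 == "backups" { print $2 }' /etc/crypttab)
keyfile=$(awk '$1 == "backups" { print $3 }' /etc/crypttab)

# Swap out the onsite disk.
umount /backups
cryptsetup luksClose backups
echo -n "Insert the offsite disk and hit enter.  "
read foo

# Back up to the offsite disk.
cryptsetup --key-file $keyfile luksOpen /dev/disk/by-uuid/$offsite_uuid backups
mount /backups
perform-backup
umount /backups
cryptsetup luksClose backups

# Swap the onsite disk back in.
echo -n "Reinsert the onsite disk and hit enter.  "
read foo
cryptsetup --key-file $keyfile luksOpen $onsite_dev backups
mount /backups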

The backup takes about half an hour, so I have ample time to manually run the backup script on Becca’s computer.


Backups

I had a dream last night that the apartment beneath ours caught on fire, we had to rush out of the building, and my computer and all of its data were destroyed.

I’ve been pondering a formal backup system for a while now.  (My current system involves making sure important files are in a version control system and exist on at least my laptop and desktop.  This is pretty ad-hoc, inconsistently updated, and not entirely comprehensive.)  I’m taking my dream as impetus to actually set something up.  This post is to help me organize my thoughts and see if anyone has any comments or suggestions.

§ My Requirements

I want to have a full rotating versioned backup system, where I have complete daily backups for a recent time span (say a week or so) and more sporadic backups back to as much as a year in the past.  Ideally, the backups should be stored in a space-efficient manner so unchanged files don’t take up more space than a single copy would require.  The backups should have off-site redundancy.  They should be relatively easy to use; they should be fully automated on a day-to-day basis, with notification when things go wrong.  Ease of setup would be nice but not necessary.

§ My Data

I currently have about 720 GB of data in my home directory, plus a few hundred MB elsewhere on the computer that I’d want to back up.  I also have about 11GB in a bzr repository, but all of that should remain duplicated in my home directory.  Most of the data in my home directory is in media files that I can either replace (rerip CDs, etc.) or live without; only 25 GB of it is stuff that I must back up.  (A further 130 GB is stuff that would be nice to back up, but I can just burn it to DVD and consider those my backups; the data is essentially static.)

§ JWZ Backups

The easiest approach is the JWZ backup solution.  For all of my data, that would be two 1 TB external hard drives, for about $220.  If I restrict myself to the “must backup” data, I could make do with two 60 GB external hard drives for about $80.  In either case, I’d keep one drive at the office and swap them periodically.

The advantage of this approach is that I control everything.  I can put encrypted volumes on the drives, so if they get lost or stolen, my data isn’t usable to other people.  I can use rsync with hardlinks between datestamped directories to get versioned backups with efficient disk usage.  The drawbacks are a modest initial monetary outlay and the need to coordinate shuttling drives back and forth.

§ Amazon S3

Another approach is to use Amazon S3 to store my data.  It’s offsite by definition (and stored among multiple data centers; if I write data to it, I can reasonably trust that I’ll get that data back).  It’s not too expensive: at $0.15/GB-month for storage, my minimal backup will cost about $3.85/month.  Throw in transfer costs and churn, and I doubt I’d exceed $6/month.  (The initial upload would be $2.56.  A full restore would cost me $4.36.)  With S3, I would only back up the minimal data; the 130 GB of optional backups would cost an additional $20/month, which would exceed the cost of the full do-it-myself hard drive backups within a year.

The complication to S3 is that it’s just a web-based data storage service; you need additional software to make a reasonable backup solution.

# Jungle Disk

From everything I’ve read, Jungle Disk is currently the best software for storing filesystem data on S3.  It runs on Windows, Mac OS X, and Linux, and exports your S3 buckets as a WebDAV disk, which you can then mount and treat like an ordinary (unlimited-capacity) disk drive.  All data is encrypted before it’s sent to S3.

I like this approach.  Since it looks like a disk, I can use the same rsync setup I would with my own disks, and since the data is encrypted, I don’t need to worry too much about it being transported over the Internet and stored on someone else’s servers.  The main drawback is that it’s proprietary software.  In addition to my principled preference for open source software over proprietary software, there’s also the issue that, especially because the data’s encrypted, this software would be my only means of access to my backups.  If something went wrong and I couldn’t get support from the company (e.g. they went out of business), I’d be out of luck.

The software costs $20.  Assuming $5/month on S3, it would take one year for this approach to cost more than the minimal get-my-own-disks approach.

# Other S3 software

I haven’t seen anything else that will let me back up to S3 and keep versioned backups in a space-efficient manner.  Most of the S3 backup software I’ve seen doesn’t do versions, and the few that do don’t appear to do it space-efficiently.  As always, I have the option of writing my own, but that would take a fair amount of time and effort, and I’d be likely to give up partway through, continuing to leave myself without good backups.

§ Conclusion

Barring any better suggestions from others, I’m leaning towards the two smallish hard drives.  They’d pay for themselves after a year of use, and I get complete control of my data (for better or worse).  I like the idea of using S3, but it’s more expensive in the long run, and I’m not completely happy with any of the software I’ve found to use with it.


Truffle-Topped Amaretto Brownies

I made this for Thanksgiving this year, and have already been asked for the recipe, even though I haven’t had any yet.  So here goes.

  • Brownie Layer
    • 1 cup sugar
    • 1 cup butter
    • 4 Tablespoons water
    • 2 cups semi-sweet chocolate morsels (~300g)
    • 1/2 cup amaretto
    • 2 teaspoons vanilla extract
    • 4 eggs
    • 1 1/2 cups all-purpose flour (~190g)
    • 1/2 teaspoon baking soda
    • 1/2 teaspoon salt
    • 1 cup chopped or sliced almonds (optionally toasted)
    • 1/2 cup chopped maraschino cherries
  • Truffle Layer
    • 8 oz cream cheese, softened
    • 1/4 cup powdered sugar (30g)
    • 1 cup semi-sweet chocolate morsels (~150g)
    • 2-3 Tablespoons amaretto
  • Topping
    • 1 cup semi-sweet chocolate morsels (~150g)
    • 1/2 cup whipping cream
    • 1 cup sliced almonds, lightly toasted
    • maraschino cherries for garnish

§ Brownie Layer

Preheat oven to 325°F.  Prepare a 9×13 baking dish.  (I line it with a sling of parchment paper and then spray it with Baker’s Joy.)

In a saucepan, bring the sugar, butter, and water to a boil.  Remove from heat.  Add chocolate, amaretto, and vanilla extract, stirring until chocolate is melted.  Add eggs, one at a time, stirring until blended.

Whisk together flour, baking soda, and salt.  Add to chocolate mixture, stirring well.  Stir in almonds and cherries.

Pour mixture into baking dish and bake for 42-48 minutes.

Leave them in the dish to cool.

§ Truffle Layer

While the brownies are cooling, beat the cream cheese and powdered sugar in a stand mixer on medium speed until the mixture is smooth.  Melt chocolate and add with the amaretto to the cream cheese, mixing until well-blended.

Spread over brownies and refrigerate until firm, at least 1 hour.

§ Topping

In a saucepan, melt the chocolate in the whipping cream.  Spread evenly over the brownies.  Sprinkle with almonds and cherries.  Refrigerate until set, at least 1 hour.

Cut into bars and serve.