Exporting and Importing Bigtable Data

I’ve found several articles that rank highly on Google and go into detail about exporting from or importing into Bigtable, but none that cover both within the same scenario. What I’m talking about is exporting a table from one Bigtable instance and importing that data into another Bigtable instance.

Experiment One - SequenceFile

This trial ended in failure. I ran an export job using the SequenceFile template provided by Google, created a new Bigtable instance, and then ran an import job. I received a slew of hadoop.hbase errors, all of which led me to think: “Let’s hit it again with something else before I spend any mental energy trying to figure out what the errors mean.” That led me to my second experiment.

Experiment Two - Avro

This trial also ended in failure. I ran an export job using the Avro job template provided by Google, reused the Bigtable instance I had created in the first experiment, and then ran an import job. I received errors again. What could be wrong?

The Obvious Isn’t Always So

My next thought was that maybe the table had to exist, and be set up properly, before running the import (spoiler alert: this is correct!). So first I tried using cbt to create a table with the same name as the table I was going to import. (If you’re unfamiliar with the cbt tool, please see this link and become familiar. It’s a straightforward way to interact with Bigtable instances via the command line: https://cloud.google.com/bigtable/docs/quickstart-cbt).

I created just the table, with no column families, and retried the second experiment. It failed: the proper column families didn’t exist in the table I was trying to import into. So my next step was to ensure that the table had the same structure as the table I exported from, and then run the import job.
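As a rough illustration of that step, here is a minimal Python sketch that builds the cbt command lines for creating a table and adding its column families. The project, instance, table, and family names are hypothetical placeholders, and the helper function is mine, not something Google provides. It only prints the commands so you can review them before running anything.

```python
import shlex


def cbt_create_commands(project, instance, table, families):
    """Build cbt argument lists that create `table` and each column family."""
    # -project and -instance are cbt's global flags for choosing the target.
    base = ["cbt", "-project", project, "-instance", instance]
    commands = [base + ["createtable", table]]
    for family in families:
        # One createfamily call per column family the exported table had.
        commands.append(base + ["createfamily", table, family])
    return commands


if __name__ == "__main__":
    # All identifiers below are made up for illustration.
    for argv in cbt_create_commands(
        "my-project", "target-instance", "my-table", ["cf1", "cf2"]
    ):
        print(shlex.join(argv))
```

The family names should be copied from the source table; running cbt ls with the table name against the source instance lists them.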

Experiment Three - Ensuring Proper Table Formation

Each organization has its own way of ensuring that tables exist and have the proper column families. After doing that, the SequenceFile and Avro import jobs both succeeded with no data duplication and no errors.

Lessons Learned

  • Ensure the table you exported already exists in the target Bigtable instance before you run the import
  • Ensure the table you’re importing into has the same structure (column families) as the table you exported from
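
Both lessons boil down to a check you can run before starting the import job. The sketch below is one way to do it in Python, assuming you have already collected the column family names of the source and target tables into two lists (for example by hand from cbt output). The function name and the sample family names are hypothetical.

```python
def family_diff(source_families, target_families):
    """Compare column family names between the exported and target tables."""
    source = set(source_families)
    target = set(target_families)
    return {
        # Families the import will need that the target table lacks.
        "missing_in_target": sorted(source - target),
        # Families only the target has; listed for information.
        "extra_in_target": sorted(target - source),
    }


if __name__ == "__main__":
    # Made-up family names for illustration.
    diff = family_diff(["cf1", "cf2"], ["cf1"])
    if diff["missing_in_target"]:
        print("Create these families first:", diff["missing_in_target"])
    else:
        print("Target table has every source column family.")
```

If missing_in_target is empty, the target table at least has every column family the exported data refers to, which was the condition my imports needed.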

Additional Thoughts

Here are some additional thoughts on the methods that Google provides for importing and exporting data to and from Bigtable.

Cleanest Method

This is based simply on the number of files produced by the two export processes that Google provides. The Avro method produces more files than the SequenceFile method, so by that measure SequenceFile is the cleaner of the two.

Cover and thumbnail photo was taken in Pilanesberg National Park, North West, South Africa by José Luis Bonilla on an iPhone Xs Max. September 2019.