(*
QubitPartialTrace[]
by C. Jess Riedel
University of California, Santa Barbara
jessriedel@gmail.com
I give up all rights to this work and release it into the public domain. It's not required, but it would be nice of you to drop me a line if you find this useful

Assumptions: '\[Rho]' is a 2^N x 2^N array (list of lists) of complex numbers, where N is the number of bits. 'traced' is a list of integers between 1 and N, inclusive, denoting which spins are to be traced out.
Comments: Spin 1 is the 'most sigificant bit', in the sense that the largest blocks of \[Rho] correspond to particular states of spin 1. Spin N is the 'least sigificant bit', in the sense that the columns and rows of \[Rho] alernate which state of spin N they correspond to.
All of the significant computation time takes place in the Sum[] on the last line of QubitPartialTrace[]. Since this line is simple and is written using Mathematica functions optimized for matrix manipulation, the computation time likely cannot be improved. In particular, constructing the list manually using the Table[] function is significantly slower, as is using For[] loops. Mathematica compilation of any part of this function never improved the computation time.
*)
QubitPartialTrace[\[Rho]_, traced_] :=
Module[{untraced, numBit, tracedSetIndex, untracedSetIndex},
numBit = Log[2, Length[\[Rho]]];
If[Length[traced] == 0, Return[\[Rho]]];
If[Length[traced] == numBit, Return[{{Tr[\[Rho]]}}]];
untraced = Complement[Range[numBit], traced];
tracedSetIndex = MakeSetIndexU[traced, numBit];
untracedSetIndex = MakeSetIndexU[untraced, numBit];
Return[Sum[
\[Rho][[1 + it + untracedSetIndex,
1 + it + untracedSetIndex]], {it, tracedSetIndex}
]];
];
MakeSetIndex[spins_, numBit_] :=
Flatten[Outer @@
Join[{Plus}, Table[{0, 1}*2^(numBit  n), {n, spins}]]];