Hi, I’m having a little trouble comprehending the data structure here.

In the get_item function in dataset.py, my understanding is that “pose” is an array with shape (4, 4), which I took to mean an (N, 4, 4) ndarray, matching the description in the PDF.

In the forward function in DexGraspDetector, however, “pose” becomes a (B, 4, 4) tensor. My understanding is that “B” stands for batch, so I expected “pose” to be (B, N, 4, 4). I’m confused right now and would appreciate any help.

Also, N differs from object to object, so the “asarray” call in dataset.py would not work as expected. I was thinking of padding the gaps with zeros, but that seems a little risky, especially since I’m still unclear about the training part.

I would appreciate any help or guidance. Thanks a lot!

Here are some pics of the functions I’ve mentioned.

Hi, enter-port.

The array with shape (N, 4, 4) mentioned in the PDF indicates that when you load the ‘pose.npy’ file using the ‘numpy.load’ function, you will obtain an array comprising N poses, with each pose being a transformation matrix with shape (4, 4).
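For illustration, here is a minimal sketch of what that load looks like, assuming `pose.npy` stores a float array of shape (N, 4, 4). The file below is a synthetic stand-in written to a temporary directory, not the project's real data. The rotation/translation split at the end assumes the usual homogeneous-transform convention, which you should check against the PDF.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for one object's pose.npy: N = 3 identity transforms.
N = 3
fake_poses = np.tile(np.eye(4, dtype=np.float32), (N, 1, 1))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pose.npy")
    np.save(path, fake_poses)
    poses = np.load(path)  # fully loaded into memory, safe after tmp is removed

print(poses.shape)     # (3, 4, 4): N poses
print(poses[0].shape)  # (4, 4): one transformation matrix

# Assumed homogeneous convention: upper-left 3x3 is rotation,
# first three entries of the last column are translation.
rotation = poses[0][:3, :3]
translation = poses[0][:3, 3]
```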

During training, a neural network usually takes batched data as input. Consequently, a single batch of “pose” data has the shape (B, 4, 4), where B is the batch size.
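A small sketch of how B individual (4, 4) poses turn into one (B, 4, 4) batch, assuming each dataset item is a single pose. In PyTorch, the default collate function used by `torch.utils.data.DataLoader` stacks per-sample arrays along a new leading dimension; `np.stack` below mimics that step with NumPy so the shapes can be checked without a framework.

```python
import numpy as np

B = 8  # batch size, arbitrary for this example

# Each item, as __getitem__ would return it, is a single (4, 4) pose.
items = [np.eye(4, dtype=np.float32) for _ in range(B)]

# Stacking along a new axis 0 gives the batched shape seen in forward().
batch = np.stack(items, axis=0)
print(batch.shape)  # (8, 4, 4)
```

Under that assumption, B counts poses rather than objects, which is why no separate N axis appears in the batched tensor.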

In practice, it is common for different objects to have different numbers of samples. We have already processed the data so that the sample counts of different objects do not differ much. Handling this issue is part of the project. For ‘dataset.py’, the only requirement is that the object returned from `__getitem__` meets the format given in the code comment.
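One possible way to handle varying N without zero-padding is to build a flat index over (object, pose) pairs, so each item is one pose and the ragged per-object arrays are never merged with `asarray`. This is only a sketch: `FlatPoseDataset` is a hypothetical name, it assumes each object has a poses array of shape (N_k, 4, 4) and a labels array of length N_k aligned by index, and the actual return format must follow the code comment in dataset.py. In PyTorch you would subclass `torch.utils.data.Dataset`; that is left out so the sketch runs with NumPy alone.

```python
import numpy as np


class FlatPoseDataset:
    """Hypothetical map-style dataset: one item per (object, pose) pair.

    Assumes poses_per_object[k] has shape (N_k, 4, 4) and
    labels_per_object[k] has length N_k; N_k may differ between objects.
    """

    def __init__(self, poses_per_object, labels_per_object):
        for p, l in zip(poses_per_object, labels_per_object):
            if len(p) != len(l):
                raise ValueError("poses and labels must align per object")
        self.poses = poses_per_object
        self.labels = labels_per_object
        # Flat index: entry i -> (object index, pose index within that object).
        self.index = [
            (obj, k)
            for obj, poses in enumerate(poses_per_object)
            for k in range(len(poses))
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        obj, k = self.index[i]
        return self.poses[obj][k], self.labels[obj][k]


# Two objects with different numbers of poses (3 and 5); no padding needed.
poses = [np.zeros((3, 4, 4)), np.zeros((5, 4, 4))]
labels = [np.zeros(3), np.ones(5)]
ds = FlatPoseDataset(poses, labels)
print(len(ds))            # 8
pose, label = ds[4]       # second pose of the second object
print(pose.shape, label)  # (4, 4) 1.0
```

Because every item has the same (4, 4) shape, default batching then yields (B, 4, 4) directly.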

OK, I think I got it. So the ‘pose.npy’ file actually contains N poses, and each pose corresponds to a result in ‘label.npy’, so during neural network training each (4, 4) matrix is one input, together with the other parameters. Is that correct?