It's hard to say why it's in the tutorial. Usually the input to your network has shape (batch_number, **your_data), and for classification the output usually has shape (batch_number, number_of_classes). You're right that in that case you should use dim=1 (or, as the recommended approach, dim=-1, because you can have a more complicated output, for example (batch_number, some_more_data ...
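
To make the dim choice concrete, here is a minimal sketch using PyTorch's `torch.nn.functional.softmax`. The shapes and the names `batch_size`, `num_classes`, and `extra` are made up for illustration, and the 3-D case assumes the class dimension comes last, which is my assumption rather than something stated above.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
batch_size, num_classes, extra = 4, 3, 5

# Typical classification output: (batch_number, number_of_classes).
logits = torch.randn(batch_size, num_classes)
probs_dim1 = F.softmax(logits, dim=1)
probs_last = F.softmax(logits, dim=-1)

# For a 2-D output, dim=1 and dim=-1 refer to the same (class) dimension.
same_for_2d = torch.allclose(probs_dim1, probs_last)
rows_sum_to_one = torch.allclose(probs_last.sum(dim=-1), torch.ones(batch_size))

# Assumed more complicated output: (batch, extra, classes), classes last.
logits_3d = torch.randn(batch_size, extra, num_classes)
probs_3d_last = F.softmax(logits_3d, dim=-1)  # normalises over classes
probs_3d_dim1 = F.softmax(logits_3d, dim=1)   # normalises over the extra dimension

last_sums = probs_3d_last.sum(dim=-1)  # one sum per (batch, extra) position
dim1_sums = probs_3d_dim1.sum(dim=1)   # one sum per (batch, class) position
```

With the 2-D output the two calls agree, but once an extra dimension appears, dim=1 no longer points at the classes, while dim=-1 still does as long as the classes stay last.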