Train/Test splits for the embedding itself #208
-
Hi @wulfdewolf, you'd have to tell us a bit more about your scenario for me to understand the images; test and train embeddings should look similar if the data is similarly IID. Your test embedding here isn't collapsed, but I don't know what the coloring means. You'd also first have to establish the consistency of your training runs - it should be very high, without overfitting.
-
Hi, apologies, I could've given a lot more information. I used the following call:
I added some weight decay because I thought it might help improve the projection of the unseen data, but to no avail.
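The call was along these lines (the values below are illustrative placeholders rather than my exact parameters; in the scikit-learn style API, weight decay goes through the optimizer keyword arguments, assuming your version exposes optimizer_kwargs):

```python
# Illustrative sketch only -- placeholder values, not the exact call used here.
# Weight decay is added through the optimizer keyword arguments (this assumes
# the sklearn-style cebra.CEBRA API with an optimizer_kwargs parameter).
import cebra

model = cebra.CEBRA(
    model_architecture="offset10-model",
    batch_size=512,
    learning_rate=3e-4,
    output_dimension=3,
    max_iterations=20000,              # long training schedule (see reply below)
    optimizer_kwargs=(
        ("betas", (0.9, 0.999)),
        ("eps", 1e-8),
        ("weight_decay", 1e-4),        # small weight decay as an attempted regularizer
        ("amsgrad", False),
    ),
    verbose=True,
)
# model.fit(neural_train)              # neural_train: (time, channels) array
```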
-
Hi @wulfdewolf, sorry for my delay. To be clear, you should pick parameters that maximize consistency without overfitting; Step 3 of our user guide is our recommendation for doing train/validation splits, and the nature of your data (trial-based) makes this easy to do. If you are over-parameterizing your model on small data (as you suggest might be the case - training longer than typical, i.e., 20K iterations is a lot), that will become apparent in the train/val setup. Generally we recommend finding parameters that give high consistency across runs on the train/val split, and only then using those parameters on the full data (much as one uses tSNE or UMAP on the full dataset). I hope that clarifies. You could also consider checkpointing the model at regular intervals (e.g., every 1k steps) and plotting the consistency (on both train and test) along with the loss curves over time. As a side comment, you could set the temperature mode to ‘constant’ and set the temperature to 0.1 (for a smoother embedding) or 1.0 (for a more “clustered” embedding).
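A rough sketch of that consistency-across-runs check (array names, the split, and the parameter values are placeholders; this assumes the consistency_score helper under cebra.sklearn.metrics in your installed version):

```python
# Rough sketch, not a drop-in recipe: fit the same configuration a few times on
# the training split, embed both splits with each run, and compare consistency
# across runs for train vs. validation. Array names and values are placeholders.
import numpy as np
import cebra

# Stand-ins for your trial-based data: (time, channels) arrays.
neural_train = np.random.randn(5000, 32).astype("float32")
neural_val = np.random.randn(1000, 32).astype("float32")

train_embeddings, val_embeddings = [], []
for run in range(3):  # identical parameters, different random initializations
    model = cebra.CEBRA(
        model_architecture="offset10-model",
        conditional="time",
        temperature_mode="constant",
        temperature=1.0,          # try 0.1 for a smoother embedding
        batch_size=512,
        output_dimension=3,
        max_iterations=2000,      # keep modest on small data
    )
    model.fit(neural_train)
    train_embeddings.append(model.transform(neural_train))
    val_embeddings.append(model.transform(neural_val))

# Pairwise consistency across runs, separately for train and validation
# (assumes cebra.sklearn.metrics.consistency_score is available).
train_scores, _, _ = cebra.sklearn.metrics.consistency_score(
    embeddings=train_embeddings, between="runs")
val_scores, _, _ = cebra.sklearn.metrics.consistency_score(
    embeddings=val_embeddings, between="runs")
print("train consistency:", np.mean(train_scores))
print("val consistency:", np.mean(val_scores))
```

If train consistency is high but validation consistency drops sharply, that points to over-parameterization or over-training for the amount of data.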
-
Are train/test splits not important for evaluating the learnt embedding?
From the documentation and the paper, I understand that all shown embeddings are for the training data.
From my own experiments, even though the embedding for my training data looks very nice, when I project some hold-out data from the end of the session into the same space (without adapt=True), the embedding practically collapses. Does that mean my original embedding was overfitted?
I might be missing something simple here. I understand that we are interested in structure in the embedding space, and if we can find an embedding that has structure for the training data, that is very nice.
But if it doesn't transfer to unseen data from a not-so-different distribution, should I trust it?
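Concretely, the workflow I mean is along these lines (a minimal sketch with placeholder data and an assumed 80/20 split within one session, not my actual code):

```python
# Minimal sketch with placeholder data and an assumed 80/20 split within one
# session -- not the actual code, just the workflow being described.
import numpy as np
import cebra

neural = np.random.randn(6000, 32).astype("float32")   # stand-in for one session

split = int(0.8 * len(neural))
neural_train, neural_holdout = neural[:split], neural[split:]

model = cebra.CEBRA(model_architecture="offset10-model",
                    conditional="time",
                    batch_size=512,
                    output_dimension=3,
                    max_iterations=2000)
model.fit(neural_train)

train_embedding = model.transform(neural_train)      # looks well structured
holdout_embedding = model.transform(neural_holdout)  # practically collapses for me

# For comparison, adapting to the new data first (the adapt=True route):
# model.fit(neural_holdout, adapt=True)
# adapted_embedding = model.transform(neural_holdout)
```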
This is one example, but I've tried it for multiple sessions and other variables.
Train: [figure: embedding of the training data]
Test: [figure: embedding of the held-out data]