We are aware of the current limitations in the API:

- infer_auto_device_map() (or device_map="auto" in load_checkpoint_and_dispatch()) tries to maximize the GPU and CPU RAM it sees available when you execute it. While PyTorch is very good at managing GPU RAM efficiently (and giving it back when not needed), it's not entirely true with Python and CPU RAM. Therefore, an automatically computed device map might be too intense on the CPU. Move a few modules to the disk device if you get crashes due to a lack of RAM.
- infer_auto_device_map() (or device_map="auto" in load_checkpoint_and_dispatch()) attributes devices sequentially (to avoid moving things back and forth), so if your first layer is bigger than the size of the GPU you have, it will end up with everything on the CPU/disk.
- To be the most efficient, make sure your device map puts the parameters on the GPUs in a sequential manner (e.g. don't put one of the first weights on GPU 0, then weights on GPU 1, and the last weight back to GPU 0) to avoid making many transfers of data between the GPUs.
- While this could theoretically work on just one CPU with potential disk offload, you need at least one GPU to run this API. This will be fixed in further development.
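As a sketch of the sequential-placement advice above: a device map is just a dict mapping submodule names to devices, and you can sanity-check that GPU indices never decrease in module order before passing it to load_checkpoint_and_dispatch(). The module names below ("embed", "layers.N", "head") are hypothetical, not from any particular model.

```python
# A device map is a plain dict: submodule name -> device.
# Integer values are GPU indices; "cpu" and "disk" request offload.
# Module names here are illustrative placeholders.
device_map = {
    "embed": 0,
    "layers.0": 0,
    "layers.1": 0,
    "layers.2": 1,   # sequential: GPU 0 filled first, then GPU 1
    "layers.3": 1,
    "head": "disk",  # offload what doesn't fit to CPU/disk
}

def is_sequential(dm):
    """Return True if GPU indices never decrease in module order,
    i.e. weights flow GPU 0 -> GPU 1 -> ... without bouncing back."""
    gpu_ids = [d for d in dm.values() if isinstance(d, int)]
    return gpu_ids == sorted(gpu_ids)

print(is_sequential(device_map))                    # True
print(is_sequential({"a": 0, "b": 1, "c": 0}))      # False: bounces back to GPU 0
```

A map like the second one would force activations to travel GPU 0 -> GPU 1 -> GPU 0 on every forward pass, which is exactly the transfer pattern the note above warns against.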