AI (Artificial Intelligence) is a subfield of computer science that was founded in the 1950s and is concerned with solving tasks that are easy for humans but hard for computers. A so-called Strong AI, in particular, would be a system that can do anything a human can do (perhaps excluding purely physical tasks). This is fairly generic and includes all kinds of tasks, such as planning, moving around in the world, recognizing objects and sounds, speaking, translating, performing social or business transactions, and creative work (making art or poetry).
NLP (Natural Language Processing) is simply the part of AI that deals with language, usually written language.
Machine learning is concerned with one aspect of this: given some AI problem that can be described in discrete terms (e.g. out of a specific set of actions, which one is the right one), and given a lot of information about the world, figure out what the “correct” action is, without the programmer having to program it in. Typically some external process is needed to judge whether or not the action was correct. In mathematical terms, it’s a function: you feed in some input and you want it to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI: if I can write a very clever program that has human-like behaviour, it can be AI, but unless its parameters are automatically learned from data, it is not machine learning.
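To make that distinction concrete, here is a minimal sketch in Python. The task (labelling a message as long or short from its length), the function names, and the example data are all made up for illustration. The first function has its threshold written in by the programmer; the second picks its threshold from labelled examples, with the labels playing the role of the external judge.

```python
# Hypothetical toy task: label a message as "long" (1) or "short" (0)
# based only on its length.

def hand_written_rule(length):
    # Hand-coded behaviour: the programmer chose the threshold 50.
    return 1 if length > 50 else 0

def predict(length, threshold):
    # Same kind of rule, but the threshold is a parameter.
    return 1 if length > threshold else 0

def learn_threshold(examples):
    # Try every observed length as a threshold and keep the first one
    # that makes the fewest mistakes on the labelled examples.
    best_threshold, best_errors = None, None
    for candidate, _ in sorted(examples):
        errors = sum(predict(length, candidate) != label
                     for length, label in examples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

# Labelled data: (length, correct label). The labels act as the judge.
examples = [(10, 0), (20, 0), (30, 0), (40, 1), (60, 1), (80, 1)]

threshold = learn_threshold(examples)
print(threshold)                                      # 30
print(hand_written_rule(40), predict(40, threshold))  # 0 1
```

On this data the hand-written rule gets the length-40 message wrong, while the learned threshold matches every example. Real machine learning models have far more parameters, but the idea is the same.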
Deep learning is a type of machine learning that is now very popular. It involves a particular kind of mathematical model that can be thought of as a composition of simple blocks of a certain type (function composition), where some of these blocks can be adjusted to better predict the final outcome.
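As a rough illustration of "a composition of simple blocks", here is a small Python sketch. The block names (`linear`, `relu`, `compose`) and the numbers are assumptions chosen for the example, not any particular library's API. The `w` and `b` values are the adjustable parts.

```python
def linear(w, b):
    # An adjustable block: w and b are the parameters that training would tune.
    return lambda x: w * x + b

def relu(x):
    # A fixed block with no parameters: negative values become zero.
    return max(0.0, x)

def compose(*blocks):
    # Chain the blocks so that the output of one feeds the next.
    def model(x):
        for block in blocks:
            x = block(x)
        return x
    return model

# Three blocks stacked: linear -> relu -> linear.
model = compose(linear(2.0, -1.0), relu, linear(0.5, 3.0))

print(model(2.0))  # 4.5
print(model(0.0))  # 3.0
```

Each block is trivial on its own; the expressive power comes from stacking them and tuning the parameters.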
The word “deep” means that many of these blocks are stacked on top of each other, and the tricky bit is how to adjust the blocks that are far from the output, since a small change there can have very indirect effects on the output. This is done through something called backpropagation, which works out how much each parameter contributed to the error, inside a larger process called gradient descent, which changes the parameters step by step to improve your model.
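Here is a minimal sketch of that process in Python, assuming a toy model of just two stacked blocks with one parameter each (`w1` far from the output, `w2` next to it) and a single made-up training example. The chain rule is written out by hand so the backward pass is visible; real frameworks automate this step.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # block 1: far from the output
    y = w2 * h             # block 2: produces the output
    return h, y

def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return (y - target) ** 2

def backward(w1, w2, x, target):
    # Backpropagation: apply the chain rule from the output backwards.
    h, y = forward(w1, w2, x)
    dloss_dy = 2.0 * (y - target)
    dloss_dw2 = dloss_dy * h                  # direct effect of w2
    dloss_dh = dloss_dy * w2                  # pass the error back through block 2
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x  # indirect effect of w1 via tanh
    return dloss_dw1, dloss_dw2

# Hypothetical training example and starting parameters.
x, target = 1.0, 0.5
w1, w2 = 0.5, 0.5
learning_rate = 0.1

initial_loss = loss(w1, w2, x, target)
for _ in range(200):
    # Gradient descent: nudge each parameter against its gradient.
    g1, g2 = backward(w1, w2, x, target)
    w1 -= learning_rate * g1
    w2 -= learning_rate * g2
final_loss = loss(w1, w2, x, target)

print(initial_loss, final_loss)
```

The loss shrinks towards zero over the 200 steps. Note that the gradient for `w1` has to pass through `w2` and the tanh, which is exactly the "indirect effect" that makes the far-away blocks harder to adjust.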