EDUBERROCAL.NET

Artificial Intelligence A-Z

24-July-2025

It has been a while since I wrote AI: Crawl, Walk, Run; a follow up article was due. The last 12 months have been ridiculous. I have not had time to write much between work, family, and personal matters. However, I have not abandoned the blog. I intend to pick it up again and keep writing about my AI journey.

In this article, I plan to wrap up what I did in the Udemy's Artificial Intelligence course I completed last year, to then move on to other (more fun) AI projects I have in mind. I will not go into too much detail this time.

Deep Convolutional Q-Learning and Eligibility Trace

In the previous article, I wrote about Deep Q-Learning and showed the lunar lander game. The next topic in the course involved Deep Convolutional Q-Learning (DCQN) and eligibility trace (i.e., n-step Q-Learning). In essence, DCQN uses Convolutional Neural Networks (CNNs), adding convolutional and pooling layers in front of the neural network (NN) to "see" the environment, as opposed to having the state fed directly to the network. The state, in this case, is an image, hence making the AI more realistic: the AI has "eyes" now.

(image source)

Eligibility trace, or n-step Q-Learning, is a learning process where multiple steps are taken at a time before the reward is calculated and the NN is adjusted. The key aspect of this technique is that it is possible to know which steps were critical for the good (or bad) outcome from the combination of steps taken. Steps that, by isolation (without the trace), may look good in the local context, could be bad choices when considering the outcome down the road.

The following video shows a Pac-Man playing an AI trained using n-step DCQN:

You can find the code here.

A3C

Next was the Asynchronous Advantage Actor-Critic (A3C) algorithm. According to the course, it is the most advanced AI algorithm. A3C is more complex and includes some interesting ideas. The name comes from the 3 pieces that, together, form A3C.

The first is the Asynchronous piece. A3C uses multiple agents exploring different parts of the environment, collaborating to find a better solution. The idea here is to avoid falling into local maxima. If you ask me, I would have called this the PAAC (Parallel Advantage Actor-Critic) algorithm, given that the agents do sync (but explore the environment in parallel).

Advantage refers to a value that is calculated using the Q-values for each agent (the policy) and a common value called Critic. In typical Q-Learning, the Q values output from each training step are used to calculate the Value Loss, which in turn is used in backpropagation to adjust the weights in the NN. The larger the loss, the more the neurons' weights contributing to those Q values must be adjusted. The Critic works here similarly. The Critic represents a type of average of the fitness of an agent with respect to all others. By looking at different parts of the environments in parallel, the agents can avoid local maxima where the model adapts well to one part of the environment but not another. If one agent is "falling behind", it will have a larger weight when adjusting the NN.

Actor-Critic refers to the fact that now we have a Critic in addition to the Actor, as explained in the previous paragraph.

Although not mandatory, A3C models sometimes include hidden extra layers called Long Short-Term Memory (LSTM). These layers allow the model to remember what actions were taken before, adding the capability to "see movement". Imagine, for instance, an AI playing a game of soccer (i.e., football). A frame of the game shows the ball's position and the players, but it can not show in what direction they are moving. Or if they are moving at all! By adding LSTM, the AI is now able to consider its momentum in the game, carrying information across many time steps.

(image source)

For a better explanation of how A3C works, please read Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C), by Arthur Juliani.

The following video shows an AI using a trained A3C model (no LSTM) playing the game of Kungfu:

You can find the code here.

Fine-Tuning LLMs with Hugging Face

The last part of the course involved fine-tuning an open source LLM (Meta's Llama 2) with lingo from a specific field of knowledge and then chatting with it. In our case, the LLM was fine-tuned using medical terms.

It was here where I was introduced to the world of Hugging Face (HF), a massive library of pre-trained models, sample datasets, and even applications. What I love about HF is how easy we can use its resources within Python/PyTorch. All that is required is installing the python package huggingface_hub.

$ pip install huggingface_hub

Importing a model is as easy as:

llama_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path="aboonaji/llama2finetune-v2", ...)

Where aboonaji is the user in HF. I plan to use this resource extensively moving forward!

For the exercise, I created a simple prompt to ask about the medical term Ascariasis, restricting the output to just 300 words:

user_prompt = "Please tell me about Ascariasis"
text_generation_pipeline = pipeline(task="text-generation",
                                    model=llama_model,
                                    tokenizer=llama_tokenizer,
                                    max_length=300)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

The output was:

<s>[INST] Please tell me about Ascariasis [/INST] Ascariasis is a parasitic infection caused by the Ascaris lumbricoides roundworm. Unterscheidung between Ascaris suum and Ascaris lumbricoides is difficult, and the two species are often considered to be the same. The infection is caused by the ingestion of infective larvae, which then migrate through the body and can cause a range of symptoms, including:

1. Abdominal pain
2. Diarrhea
3. Vomiting
4. Weight loss
5. Fatigue
6. Anemia
7. Malnutrition

Ascariasis is a common infection worldwide, particularly in areas with poor sanitation and hygiene. The infection is most commonly found in tropical and subtropical regions, where the parasite is more likely to be present in contaminated soil, water, or food.

The diagnosis of Ascariasis is typically made through a combination of physical examination, medical history, and laboratory tests. Laboratory tests may include:

1. Stool examination: A stool sample can be examined for the presence of Ascaris eggs or larvae.
2. Blood tests: Blood tests can be used to measure the levels of anemia or

You can find the code here.