AI Insights - Emerging Research
A collection of interesting AI papers and notes. Source: Advanced AI, Hungarian AI Institute.

AI systems can contribute to inequality through biased or discriminatory outputs (Turner Lee et al., 2019).
Wayback Machine (archive.org): https://web.archive.org/web/20240124113209/http:/gnss.mcgill.ca/pages/powell.pdf
Give your own AI agent a goal and watch as it thinks, comes up with an execution plan, and takes actions.
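A minimal sketch of that goal -> plan -> act loop, assuming a hypothetical llm() completion function standing in for any real model API:

```python
# Sketch of a goal-driven agent loop. `llm` is a hypothetical stand-in
# for a real LLM API call; real frameworks add tools, memory, and
# stopping criteria on top of this skeleton.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def run_agent(goal: str, max_steps: int = 5) -> list:
    history = []  # running log of (thought, action) pairs
    plan = llm(f"Goal: {goal}\nWrite a short step-by-step plan.")
    for _ in range(max_steps):
        thought = llm(f"Goal: {goal}\nPlan: {plan}\n"
                      f"History: {history}\nWhat should be done next?")
        action = llm(f"Turn this thought into one concrete action: {thought}")
        # A real agent would execute the action against tools or an
        # environment; here we only record it.
        history.append((thought, action))
        if "DONE" in action.upper():
            break
    return history
```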
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned
Generative AI vs. agent AI: https://x.com/elder_plinius
https://www.deepmind.com/blog/specification-gaming-the-flip-side-of-ai-ingenuity
Lego stacking task https://arxiv.org/abs/1704.03073
https://openai.com/index/learning-from-human-preferences/
A simulated robot that was supposed to learn to walk instead figured out how to hook its legs together and slide along the ground.
Potential-based reward shaping (Ng et al., 1999) is the standard fix for such shaping bugs: adding a term of the form gamma * phi(s') - phi(s) provably leaves the optimal policy unchanged.
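A minimal sketch of that rule; the goal position and potential function below are made up for illustration:

```python
# Potential-based reward shaping (Ng et al., 1999):
# shaped reward r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s).
# The shaping term telescopes along any trajectory, so the optimal
# policy under r' is the same as under the base reward r.

def shaped_reward(r, phi, gamma, s, a, s_next):
    """Wrap a base reward r(s, a, s') with a potential function phi(s)."""
    return r(s, a, s_next) + gamma * phi(s_next) - phi(s)

# Illustrative example: 1-D line with the goal at x = 10.
base_r = lambda s, a, s_next: 1.0 if s_next == 10 else 0.0
phi = lambda s: -abs(10 - s)  # higher potential closer to the goal
print(shaped_reward(base_r, phi, 0.99, s=3, a=+1, s_next=4))  # 1.06
```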
Coast Runners game: an RL agent trained on the in-game score learned to loop endlessly, collecting respawning targets instead of finishing the race.
https://www.youtube.com/watch?v=5rzyuY8-Ao8
https://www.youtube.com/watch?v=B4M-54cEduo
Day 3
https://bounded-regret.ghost.io/emergent-deception-optimization/
Large ML systems often exhibit emergent capabilities, and these capabilities could lead to unintended negative consequences.
Future ML Systems will be Qualitatively Different
Wei et al. (2022).
A given context helps predict future tokens (more quantitatively, Olsson et al., 2022).
chain-of-thought reasoning (Chowdhery et al., 2022; Wei et al., 2022)
"clean-up" phase from Nanda et al. (2022), Section 5.2.
Stiennon et al. (2020) train systems to generate highly-rated summaries.
Ouyang et al. (2022) train language models to respond to instructions.
Bai et al. (2022) train systems to be helpful and harmless as judged by human annotators.
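All three approaches rest on the same reward-modeling step: fit a scalar reward to human preference pairs with a pairwise loss of the form -log sigmoid(r_chosen - r_rejected), then optimize the policy against it. A minimal PyTorch sketch, with a toy linear head standing in for the papers' actual reward models:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise loss used in RLHF reward modeling:
    -log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy stand-in reward model: a linear head over fixed response embeddings.
emb_dim = 16
head = torch.nn.Linear(emb_dim, 1)
chosen, rejected = torch.randn(8, emb_dim), torch.randn(8, emb_dim)
loss = preference_loss(head(chosen).squeeze(-1), head(rejected).squeeze(-1))
loss.backward()  # gradients flow into the reward head as usual
```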
Hedging can create false balance in cases where there is a clear right answer.
On the other hand, the RLHF model in Bai et al. (2022) often does provide subjective opinions, and could exacerbate automation bias.
ChatGPT frequently claims, incorrectly, not to know the answers to questions. It can also gaslight users:
https://twitter.com/MovingToTheSun/status/1625156575202537474?ref=bounded-regret.ghost.io
Some basic forms of theory-of-mind do seem to appear emergently at scale (Chen et al., 2022; Sap et al., 2022).
Perez et al. (2022) provide some preliminary evidence for this, showing that models learn to imitate the beliefs of the person they are talking to; this behavior (dubbed sycophancy by Perez et al.; see also Cotra, 2022) appears emergently at scale.
Future systems might interact with the internet (Nakano et al., 2021).
Pan et al. (2022) found that RL agents exhibit emergent reward hacking when given more optimization power, as measured by training time, model size, and action fidelity. Gao et al. (2022) similarly find that more RL training, or choosing from a larger set of candidate outputs, both lead to increased overoptimization of a reward model, and moreover that the amount of reward hacking follows smooth scaling laws.
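Gao et al. (2022) report that the gold reward, as a function of d = sqrt(KL(policy || initial policy)), follows d * (alpha - beta * d) for best-of-n sampling and d * (alpha - beta * log d) for RL. A sketch of those functional forms; the coefficients below are made up for illustration, not the paper's fits:

```python
import math

# Functional forms for gold reward vs. optimization pressure from
# Gao et al. (2022), "Scaling Laws for Reward Model Overoptimization".
# d = sqrt(KL(pi || pi_init)); alpha, beta depend on reward-model size.
def gold_reward_bon(d, alpha, beta):   # best-of-n sampling
    return d * (alpha - beta * d)

def gold_reward_rl(d, alpha, beta):    # RL fine-tuning
    return d * (alpha - beta * math.log(d))

# Illustrative (made-up) coefficients: the gold reward peaks, then
# declines as optimization against the proxy continues -- reward hacking.
for d in [0.5, 1, 2, 4, 8, 16]:
    print(f"d={d:5.1f}  bon={gold_reward_bon(d, 1.0, 0.1):7.3f}  "
          f"rl={gold_reward_rl(d, 1.0, 0.35):7.3f}")
```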
Significant parts of their training distribution contain planning (see Andreas, 2022).
https://arxiv.org/abs/2212.03827?ref=bounded-regret.ghost.io
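That link is Burns et al. (2022), "Discovering Latent Knowledge in Language Models Without Supervision". Its Contrast-Consistent Search trains an unsupervised probe so that the probabilities assigned to a statement and its negation are consistent (sum to one) without collapsing to the degenerate 0.5 answer. A minimal sketch of the objective, with made-up tensor shapes standing in for real model activations:

```python
import torch

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """Contrast-Consistent Search objective (Burns et al., 2022):
    consistency: probe outputs for "x is true" / "x is false" should sum to 1;
    confidence:  penalize the degenerate p_pos = p_neg = 0.5 solution."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Toy probe over precomputed hidden states for contrast pairs
# (shapes are illustrative, not from the paper).
hidden_dim = 32
probe = torch.nn.Sequential(torch.nn.Linear(hidden_dim, 1), torch.nn.Sigmoid())
h_pos, h_neg = torch.randn(64, hidden_dim), torch.randn(64, hidden_dim)
loss = ccs_loss(probe(h_pos).squeeze(-1), probe(h_neg).squeeze(-1))
loss.backward()
```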
https://platform.openai.com/playground/chat?preset=PH6iUjmHcn8LPKIDW3HjZpuGorm
https://platform.openai.com/docs/models/gpt-3?ref=bounded-regret.ghost.io
https://arxiv.org/abs/1906.01820?ref=bounded-regret.ghost.io
https://www.cold-takes.com/ai-safety-seems-hard-to-measure/
https://www.cold-takes.com/why-would-ai-aim-to-defeat-humanity/
https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
https://www.cold-takes.com/why-would-ai-aim-to-defeat-humanity/#what-it-means-for
https://www.cold-takes.com/p/4d63edc6-4be6-4c77-ae5b-c70e730acb58#The-King-Lear-problem
https://www.transformer-circuits.pub/2022/mech-interp-essay/index.html
https://www.cold-takes.com/p/4d63edc6-4be6-4c77-ae5b-c70e730acb58#Box4
But once there is a real-world opportunity to disempower humans for good, that same aim could cause the AI to disempower humans.
https://www.cold-takes.com/transformative-ai-timelines-part-1-of-4-what-kind-of-ai/#making-pasta
https://www.cold-takes.com/p/4d63edc6-4be6-4c77-ae5b-c70e730acb58#DigitalNeuroscience
It is best for such failures to emerge early, while there's relatively little risk of AIs actually defeating humanity: https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
https://www.cold-takes.com/why-would-ai-aim-to-defeat-humanity/#deceiving-and-manipulating
Methods along the lines of AI safety via debate: https://openai.com/blog/debate
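A minimal sketch of the debate protocol from that post: two debaters argue opposite answers over several rounds and a judge decides from the transcript alone. ask_model and ask_judge are hypothetical stand-ins for real model or human calls:

```python
# Sketch of AI safety via debate (Irving et al., 2018): two debaters argue
# opposing answers; a judge sees only the transcript and picks a winner.
# `ask_model` and `ask_judge` are hypothetical stand-ins, not a real API.

def ask_model(side: str, question: str, transcript: list) -> str:
    raise NotImplementedError("plug in a model call here")

def ask_judge(question: str, transcript: list) -> str:
    raise NotImplementedError("human or model judgment goes here")

def debate(question: str, rounds: int = 3) -> str:
    transcript = []
    for _ in range(rounds):
        for side in ("A", "B"):  # debaters alternate arguments
            transcript.append(f"{side}: {ask_model(side, question, transcript)}")
    return ask_judge(question, transcript)  # winner, judged from transcript only
```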