AI Insights - Emerging Research
A collection of interesting AI papers and notes. Source: Advanced AI, Hungarian AI Institute.

AI systems can contribute to inequality through biased or discriminatory outputs (Turner Lee et al., 2019).
Wayback Machine (archive.org): https://web.archive.org/web/20240124113209/http:/gnss.mcgill.ca/pages/powell.pdf
Give your own AI agent a goal and watch as it thinks, comes up with an execution plan, and takes actions.
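A minimal sketch of that goal -> plan -> act loop, assuming a hypothetical llm() completion function standing in for any real model API:

```python
# Sketch of a goal-driven agent loop. `llm` is a hypothetical stand-in
# for a real LLM API call; real frameworks add tools, memory, and
# stopping criteria on top of this skeleton.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def run_agent(goal: str, max_steps: int = 5) -> list:
    history = []  # running log of (thought, action) pairs
    plan = llm(f"Goal: {goal}\nWrite a short step-by-step plan.")
    for _ in range(max_steps):
        thought = llm(f"Goal: {goal}\nPlan: {plan}\n"
                      f"History: {history}\nWhat should be done next?")
        action = llm(f"Turn this thought into one concrete action: {thought}")
        # A real agent would execute the action against tools or an
        # environment; here we only record it.
        history.append((thought, action))
        if "DONE" in action.upper():
            break
    return history
```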
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned
Generative AI vs. agent AI: https://x.com/elder_plinius
https://www.deepmind.com/blog/specification-gaming-the-flip-side-of-ai-ingenuity
Lego stacking task https://arxiv.org/abs/1704.03073
https://openai.com/index/learning-from-human-preferences/
A simulated robot that was supposed to learn to walk instead figured out how to hook its legs together and slide along the ground.
Potential-based reward shaping (Ng et al., 1999) is the standard fix for such shaping bugs: adding a term of the form gamma * phi(s') - phi(s) provably leaves the optimal policy unchanged.
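A minimal sketch of that rule; the goal position and potential function below are made up for illustration:

```python
# Potential-based reward shaping (Ng et al., 1999):
# shaped reward r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s).
# The shaping term telescopes along any trajectory, so the optimal
# policy under r' is the same as under the base reward r.

def shaped_reward(r, phi, gamma, s, a, s_next):
    """Wrap a base reward r(s, a, s') with a potential function phi(s)."""
    return r(s, a, s_next) + gamma * phi(s_next) - phi(s)

# Illustrative example: 1-D line with the goal at x = 10.
base_r = lambda s, a, s_next: 1.0 if s_next == 10 else 0.0
phi = lambda s: -abs(10 - s)  # higher potential closer to the goal
print(shaped_reward(base_r, phi, 0.99, s=3, a=+1, s_next=4))  # 1.06
```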
Coast Runners game: an RL agent trained on the in-game score learned to loop endlessly, collecting respawning targets instead of finishing the race.
https://www.youtube.com/watch?v=5rzyuY8-Ao8
https://www.youtube.com/watch?v=B4M-54cEduo
Day 3
https://bounded-regret.ghost.io/emergent-deception-optimization/
Large ML systems often exhibit emergent capabilities, and these capabilities could lead to unintended negative consequences.
Future ML Systems will be Qualitatively Different
Wei et al. (2022).
A given context helps predict future tokens (more quantitatively, Olsson et al., 2022).
chain-of-thought reasoning (Chowdhery et al., 2022; Wei et al., 2022)
"clean-up" phase from Nanda et al. (2022), Section 5.2.
Stiennon et al. (2020) train systems to generate highly-rated summaries.
Ouyang et al. (2022) train language models to respond to instructions.
Bai et al. (2022) train systems to be helpful and harmless as judged by human annotators.
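All three approaches rest on the same reward-modeling step: fit a scalar reward to human preference pairs with a pairwise loss of the form -log sigmoid(r_chosen - r_rejected), then optimize the policy against it. A minimal PyTorch sketch, with a toy linear head standing in for the papers' actual reward models:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise loss used in RLHF reward modeling:
    -log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy stand-in reward model: a linear head over fixed response embeddings.
emb_dim = 16
head = torch.nn.Linear(emb_dim, 1)
chosen, rejected = torch.randn(8, emb_dim), torch.randn(8, emb_dim)
loss = preference_loss(head(chosen).squeeze(-1), head(rejected).squeeze(-1))
loss.backward()  # gradients flow into the reward head as usual
```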
Hedging can create false balance in cases where there is a clear right answer.
On the other hand, the RLHF model in Bai et al. (2022) often does provide subjective opinions, and could exacerbate automation bias.
ChatGPT frequently claims, incorrectly, not to know the answers to questions. It can also gaslight users:
https://twitter.com/MovingToTheSun/status/1625156575202537474?ref=bounded-regret.ghost.io
Some basic forms of theory-of-mind do seem to appear emergently at scale (Chen et al., 2022; Sap et al., 2022).
Perez et al. (2022) provide some preliminary evidence for this, showing that models learn to imitate the beliefs of the person they are talking to; this behavior (dubbed sycophancy by Perez et al.; see also Cotra, 2022) appears emergently at scale.
Future systems might interact with the internet (Nakano et al., 2021).
Pan et al. (2022) found that RL agents exhibit emergent reward hacking when given more optimization power, as measured by training time, model size, and action fidelity. Gao et al. (2022) similarly find that more RL training, or choosing from a larger set of candidate outputs, both lead to increased overoptimization of a reward model, and moreover that the amount of reward hacking follows smooth scaling laws.
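Gao et al. (2022) report that the gold reward, as a function of d = sqrt(KL(policy || initial policy)), follows d * (alpha - beta * d) for best-of-n sampling and d * (alpha - beta * log d) for RL. A sketch of those functional forms; the coefficients below are made up for illustration, not the paper's fits:

```python
import math

# Functional forms for gold reward vs. optimization pressure from
# Gao et al. (2022), "Scaling Laws for Reward Model Overoptimization".
# d = sqrt(KL(pi || pi_init)); alpha, beta depend on reward-model size.
def gold_reward_bon(d, alpha, beta):   # best-of-n sampling
    return d * (alpha - beta * d)

def gold_reward_rl(d, alpha, beta):    # RL fine-tuning
    return d * (alpha - beta * math.log(d))

# Illustrative (made-up) coefficients: the gold reward peaks, then
# declines as optimization against the proxy continues -- reward hacking.
for d in [0.5, 1, 2, 4, 8, 16]:
    print(f"d={d:5.1f}  bon={gold_reward_bon(d, 1.0, 0.1):7.3f}  "
          f"rl={gold_reward_rl(d, 1.0, 0.35):7.3f}")
```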
Significant parts of their training distribution contain planning (see Andreas, 2022).
https://arxiv.org/abs/2212.03827?ref=bounded-regret.ghost.io
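That link is Burns et al. (2022), "Discovering Latent Knowledge in Language Models Without Supervision". Its Contrast-Consistent Search trains an unsupervised probe so that the probabilities assigned to a statement and its negation are consistent (sum to one) without collapsing to the degenerate 0.5 answer. A minimal sketch of the objective, with made-up tensor shapes standing in for real model activations:

```python
import torch

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """Contrast-Consistent Search objective (Burns et al., 2022):
    consistency: probe outputs for "x is true" / "x is false" should sum to 1;
    confidence:  penalize the degenerate p_pos = p_neg = 0.5 solution."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Toy probe over precomputed hidden states for contrast pairs
# (shapes are illustrative, not from the paper).
hidden_dim = 32
probe = torch.nn.Sequential(torch.nn.Linear(hidden_dim, 1), torch.nn.Sigmoid())
h_pos, h_neg = torch.randn(64, hidden_dim), torch.randn(64, hidden_dim)
loss = ccs_loss(probe(h_pos).squeeze(-1), probe(h_neg).squeeze(-1))
loss.backward()
```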
https://platform.openai.com/playground/chat?preset=PH6iUjmHcn8LPKIDW3HjZpuGorm
https://platform.openai.com/docs/models/gpt-3?ref=bounded-regret.ghost.io
https://arxiv.org/abs/1906.01820?ref=bounded-regret.ghost.io
https://www.cold-takes.com/ai-safety-seems-hard-to-measure/
https://www.cold-takes.com/why-would-ai-aim-to-defeat-humanity/
https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
https://www.cold-takes.com/why-would-ai-aim-to-defeat-humanity/#what-it-means-for
https://www.cold-takes.com/p/4d63edc6-4be6-4c77-ae5b-c70e730acb58#The-King-Lear-problem
https://www.transformer-circuits.pub/2022/mech-interp-essay/index.html
https://www.cold-takes.com/p/4d63edc6-4be6-4c77-ae5b-c70e730acb58#Box4
But once there is a real-world opportunity to disempower humans for good, that same aim could cause the AI to disempower humans.
https://www.cold-takes.com/transformative-ai-timelines-part-1-of-4-what-kind-of-ai/#making-pasta
https://www.cold-takes.com/p/4d63edc6-4be6-4c77-ae5b-c70e730acb58#DigitalNeuroscience
It is best for such failures to emerge early, while there's relatively little risk of AIs actually defeating humanity: https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/
https://www.cold-takes.com/why-would-ai-aim-to-defeat-humanity/#deceiving-and-manipulating
Methods along the lines of AI safety via debate: https://openai.com/blog/debate
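A minimal sketch of the debate protocol from that post: two debaters argue opposite answers over several rounds and a judge decides from the transcript alone. ask_model and ask_judge are hypothetical stand-ins for real model or human calls:

```python
# Sketch of AI safety via debate (Irving et al., 2018): two debaters argue
# opposing answers; a judge sees only the transcript and picks a winner.
# `ask_model` and `ask_judge` are hypothetical stand-ins, not a real API.

def ask_model(side: str, question: str, transcript: list) -> str:
    raise NotImplementedError("plug in a model call here")

def ask_judge(question: str, transcript: list) -> str:
    raise NotImplementedError("human or model judgment goes here")

def debate(question: str, rounds: int = 3) -> str:
    transcript = []
    for _ in range(rounds):
        for side in ("A", "B"):  # debaters alternate arguments
            transcript.append(f"{side}: {ask_model(side, question, transcript)}")
    return ask_judge(question, transcript)  # winner, judged from transcript only
```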