Translation and Dictionary
Words near each other
・ Te Ao Māori
・ Te Ao-kapurangi
・ Te Apiti
・ Te Apiti Wind Farm
・ Te Arai
・ Te Arai River
・ Te Araroa
・ Te Araroa Trail
・ Te Arataura
・ Te Aratauwhāiti
・ TD Waterhouse
・ TD-1 RNA motif
・ TD-1A
・ TD-2 RNA motif
・ TD-3
TD-Gammon
・ TD-SCDMA
・ TD/SMP
・ TD1
・ TD1 Catalog of Stellar Ultraviolet Fluxes
・ TD2000
・ TDA
・ TDAC
・ TDB
・ TDC
・ TDC A/S
・ TDC Binh Duong F.C.
・ TDC Games
・ TDC Northern Stars
・ TDCAA


Dictionary Lists
Translation and Dictionary: dictionary search [ provisional development version ]

TD-Gammon : English Wikipedia edition
TD-Gammon

TD-Gammon was a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's Thomas J. Watson Research Center. Its name reflects the fact that it is an artificial neural net trained by a form of temporal-difference learning, specifically TD(λ).
TD-Gammon achieved a level of play just slightly below that of the top human backgammon players of the time. It explored strategies that humans had not pursued and led to advances in the theory of correct backgammon play.
==Algorithm for play and learning==
On each turn while playing a game, TD-Gammon examines all possible legal moves and all of the opponent's possible responses (two-ply look-ahead), feeds each resulting board position into its evaluation function, and chooses the move that leads to the highest-scoring board position. In this respect, TD-Gammon is no different from almost any other computer board-game program. TD-Gammon's innovation was in how it learned its evaluation function.
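As a rough sketch of this selection step, the loop below picks whichever legal move yields the best-scoring resulting position. It is reduced to a single ply for brevity (the real program also evaluated the opponent's replies), and the callables legal_moves, apply_move, and evaluate are hypothetical stand-ins for the game rules and the trained net, not anything from Tesauro's actual program.

def choose_move(board, legal_moves, apply_move, evaluate):
    """Pick the legal move whose resulting board position scores highest."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board):          # every legal move this turn
        position = apply_move(board, move)   # board after making the move
        score = evaluate(position)           # neural net's estimate for this position
        if score > best_score:
            best_move, best_score = move, score
    return best_move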
TD-Gammon's learning algorithm consists of updating the weights in its neural net after each turn to reduce the difference between its evaluation of previous turns' board positions and its evaluation of the present turn's board position—hence "temporal-difference learning". The score of any board position is a set of four numbers reflecting the program's estimate of the likelihood of each possible game result: White wins normally, Black wins normally, White wins a gammon, Black wins a gammon. For the final board position of the game, the algorithm compares with the actual result of the game rather than its own evaluation of the board position.
After each turn, each weight in the neural net gets updated according to the following rule:
: w_{t+1} - w_t = \alpha(Y_{t+1} - Y_t)\sum_{k=1}^{t}\lambda^{t-k} \nabla_w Y_k
where:
{|
|- valign="top"
| Y_{t+1} - Y_t
| is the difference between the current turn's and the previous turn's board evaluations.
|- valign="top"
| \alpha
| is a "learning rate" parameter.
|- valign="top"
| \lambda
| is a parameter that affects how much the present difference in board evaluations should feed back to previous estimates. \lambda = 0 makes the program correct only the previous turn's estimate; \lambda = 1 makes the program attempt to correct the estimates on all previous turns; and values of \lambda between 0 and 1 specify different rates at which the importance of older estimates should "decay" with time.
|- valign="top"
| \nabla_w Y_k
| is the gradient of neural-network output with respect to the weights: that is, how much changing a weight affects the output.
|}
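The sum over past gradients in the rule above is usually maintained incrementally as an "eligibility trace" e_t = \lambda e_{t-1} + \nabla_w Y_t, so that nothing needs to be recomputed each turn. The sketch below applies the update that way over one game. To keep the gradient trivial it evaluates positions with a single sigmoid unit producing one output, not TD-Gammon's multilayer net with four outputs; the learning rate, \lambda value, and feature encoding are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def td_lambda_episode(positions, final_result, w, alpha=0.1, lam=0.7):
    """Apply one game's worth of TD(lambda) updates to the weight vector w.

    positions:    one feature vector per turn, each a float array like w
    final_result: the actual game outcome in [0, 1], used at the final step
    """
    trace = np.zeros_like(w)        # e_t: decayed sum of past gradients
    y_prev = None
    for x in positions:
        y = sigmoid(w @ x)          # Y_t: this turn's board evaluation
        grad = y * (1.0 - y) * x    # gradient of the sigmoid output w.r.t. w
        if y_prev is not None:
            w += alpha * (y - y_prev) * trace   # alpha * (Y_{t+1} - Y_t) * e_t
        trace = lam * trace + grad  # fold this turn's gradient into the trace
        y_prev = y
    # at the end of the game, compare against the real result, not an estimate
    w += alpha * (final_result - y_prev) * trace
    return w

After each game one would call td_lambda_episode with that game's per-turn feature vectors and the realized outcome; the updated weights simply carry over into the next game.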

Excerpt source: Wikipedia, the free encyclopedia
Read the full text of "TD-Gammon" on Wikipedia



Translation and Dictionary : Internet resources for translation

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.