
0 votes • 0 answers • 152 views

Interpretation of convexity lemma

Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs

updated 16 months ago by Admin User 1 • written 16 months ago by Dustin

0 votes • 0 answers • 137 views

Generalizing attention length beyond training data length

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

updated 16 months ago by Admin User 1 • written 16 months ago by Dustin


Recent Replies

Answer: Implementing an algorithm, by Andy (46)

Use a distribution that's symmetric around the origin and normalize the results so they lie on the sphere. E.g. you can use a Gaussian. He…
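
The suggestion in this reply can be sketched in a few lines. This is a minimal illustration, not the answerer's own code, and it assumes the goal is to draw points uniformly on the unit sphere. The function name `sample_on_sphere` and its parameters are hypothetical. A standard Gaussian is used because it is rotationally symmetric, so the normalized vector has no preferred direction.

```python
import math
import random


def sample_on_sphere(dim, rng=random):
    """Return one point on the unit sphere in `dim` dimensions.

    Hypothetical helper: draws independent standard Gaussian
    coordinates and rescales the vector to unit length.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        # A zero vector is essentially impossible, but retry rather
        # than divide by zero.
        if norm > 0.0:
            return [x / norm for x in v]


if __name__ == "__main__":
    point = sample_on_sphere(3)
    print(point, sum(x * x for x in point))
```

Only the Python standard library is used here. For many samples at once, a vectorized version with an array library would likely be preferable.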

Answer: Application of Property RD, by Dustin (125)

It is used to prove the bound in proposition 4.6 which is then used crucially in proposition 4.7 to prove the uniform boundedness of the op…
