I. Google Papers
1. Translator's preface to the Google paper series
2. The Anatomy of a Large-Scale Hypertextual Web Search Engine (translation, reposted)
3. Web Search for a Planet: The Google Cluster Architecture (translation)
5. MapReduce: Simplified Data Processing on Large Clusters (translation)
6. Bigtable: A Distributed Storage System for Structured Data (translation)
7. Chubby: The Chubby lock service for loosely-coupled distributed systems (translation)
8. Sawzall: Interpreting the Data: Parallel Analysis with Sawzall (translation, reposted)
9. Pregel: A System for Large-Scale Graph Processing (translation)
10. Dremel: Interactive Analysis of Web-Scale Datasets (translation, reposted)
11. Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications (translation, reposted)
12. Megastore: Providing Scalable, Highly Available Storage for Interactive Services (translation, reposted)
13. Case Study GFS: Evolution on Fast-forward (translation)
14. Google File System II: Dawn of the Multiplying Master Nodes
15. Tenzing: A SQL Implementation on the MapReduce Framework (translation)
16. F1: The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
17. Elmo: Building a Globally Distributed, Highly Available Database
18. PowerDrill: Processing a Trillion Cells per Mouse Click
19. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers
20. Spanner: Google’s Globally-Distributed Database (translation, reposted)
21. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (notes)
22. Omega: flexible, scalable schedulers for large compute clusters
23. CPI2: CPU performance isolation for shared compute clusters
24. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams (translation)
25. F1: A Distributed SQL Database That Scales
26. MillWheel: Fault-Tolerant Stream Processing at Internet Scale (translation)
27. B4: Experience with a Globally-Deployed Software Defined WAN
28. The Datacenter as a Computer
29. Google Brain: Building High-level Features Using Large Scale Unsupervised Learning
30. Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (translation, reposted)
31. Large-scale cluster management at Google with Borg
II. Distributed Systems Theory
00. Appraising Two Decades of Distributed Computing Theory Research
0. Translator's preface to the distributed systems theory series
1. A Brief History of Consensus, 2PC and Transaction Commit (translation)
2. The Byzantine Generals Problem (translation) --Leslie Lamport
3. Impossibility of Distributed Consensus with One Faulty Process (translation)
5. Time, Clocks, and the Ordering of Events in a Distributed System (translation) --Leslie Lamport
6. On the history of Paxos
7. The Part-Time Parliament (translation, reposted) --Leslie Lamport
8. How to Build a Highly Available System Using Consensus (translation)
9. Paxos Made Simple (translation) --Leslie Lamport
10. Paxos Made Live: An Engineering Perspective (translation)
15. Single-Message Communication (translation)
17. Solved Problems, Unsolved Problems and Non-Problems in Concurrency
18. Hints for Computer System Design
20. Wait-Free Synchronization
21. White Paper Introduction to IEEE 1588 & Transparent Clocks
23. Life beyond Distributed Transactions: an Apostate’s Opinion (translation, reposted)
III. Database Theory
0. A Relational Model of Data for Large Shared Data Banks --E.F.Codd 1970
1. SEQUEL: A Structured English Query Language 1974
2. Implementation of a Structured English Query Language 1975
3. System R: Relational Approach to Database Management 1976
4. Granularity of Locks and Degrees of Consistency in a Shared Data Base --Jim Gray 1976
5. Access Path Selection in a RDBMS 1979
6. The Transaction Concept: Virtues and Limitations --Jim Gray
7. 2PC (two-phase commit): Notes on Data Base Operating Systems --Jim Gray
8. 3PC (three-phase commit): Nonblocking Commit Protocols
9. MVCC: Multiversion Concurrency Control: Theory and Algorithms --1983
10. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging --1992
15. Architecture of a Database System (translation, reposted) --Joseph M. Hellerstein, Michael Stonebraker, James Hamilton
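IV. Large-Scale Storage and Computation (NoSQL Theory)
0. Towards Robust Distributed Systems: Brewer's 2000 PODC keynote
1. The CAP theorem
2. Harvest, Yield, and Scalable Tolerant Systems
3. On CAP
4. The BASE model (BASE: An Acid Alternative)
5. Eventual consistency
6. Design patterns for scalability
7. Principles of scalability
8. The NoSQL ecosystem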
9. Scalability, Availability & Stability Patterns
10. The 5 Minute Rule and the 10 Byte Rule (translation)
11. The Five-Minute Rule Ten Years Later and Other Computer Storage Rules of Thumb
12. The Five-Minute Rule 20 Years Later (and How Flash Memory Changes the Rules)
13. The MapReduce debate
16. MapReduce and parallel databases: friends or foes? (reposted)
17. MapReduce and Parallel DBMSs: Friends or Foes? (translation)
18. MapReduce: A Flexible Data Processing Tool (translation)
19. A Comparison of Approaches to Large-Scale Data Analysis (translation)
22. Map-Reduce-Merge: simplified relational data processing on large clusters
23. MapReduce Online
24. Graph Twiddling in a MapReduce World
25. Spark: Cluster Computing with Working Sets
26. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
27. Big Data Lambda Architecture
28. The 8 Requirements of Real-Time Stream Processing
29. The Log: What every software engineer should know about real-time data's unifying abstraction
30. Lessons from Giant-Scale Services
V. Fundamental Algorithms and Data Structures
2. A summary of methods for processing massive volumes of data (continued)
3. Consistent Hashing And Random Trees
4. Merkle Trees
5. Scalable Bloom Filters
6. Introduction to Distributed Hash Tables
7. B-Trees and Relational Database Systems
8. The Log-Structured Merge-Tree (translation)
10. Data Structures for Spatial Database
11. Gossip
13. The Graph Traversal Pattern
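VI. Foundational Systems and Practical Experience
2. Dynamo: Amazon’s Highly Available Key-value Store (translation, reposted)
3. Cassandra: A Decentralized Structured Storage System (translation, reposted)
4. PNUTS: Yahoo!’s Hosted Data Serving Platform (translation, reposted)
5. PNUTS, Yahoo!'s distributed data platform: an introduction and reflections (reposted)
6. LevelDB: a fast, lightweight key-value storage library (translation)
7. The theoretical foundations of LevelDB
11. Sawzall: principles and applications
12. Storm: principles and implementation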
13. Designs, Lessons and Advice from Building Large Distributed Systems --Jeff Dean
14. Challenges in Building Large-Scale Information Retrieval Systems --Jeff Dean
15. Experiences with MapReduce, an Abstraction for Large-Scale Computation --Jeff Dean
16. Taming Service Variability, Building Worldwide Systems, and Scaling Deep Learning --Jeff Dean
17. Large-Scale Data and Computation: Challenges and Opportunities --Jeff Dean
18. Achieving Rapid Response Times in Large Online Services --Jeff Dean
19. The Tail at Scale (translation) --Jeff Dean & Luiz André Barroso
20. How To Design A Good API and Why it Matters
21. Event-Based Systems: Architect's Dream or Developer's Nightmare?
22. Autopilot: Automatic Data Center Management
VII. Other Supporting Systems
1. The ganglia distributed monitoring system:design, implementation, and experience
2. Chukwa: A large-scale monitoring system
3. Scribe: a way to aggregate data and why not, to directly fill the HDFS?
4. Benchmarking Cloud Serving Systems with YCSB
5. A brief overview of Dynamo, Dremel, ZooKeeper and Hive
VIII. Hadoop
1. The Hadoop Distributed File System (translation)
2. HDFS Scalability: The Limits to Growth (translation)
3. Name-node memory size estimates and optimization proposal.
5. HFile:A Block-Indexed File Format to Store Sorted Key-Value Pairs
6. HFile V2
7. Hive - A Warehousing Solution Over a Map-Reduce Framework
8. Hive – A Petabyte Scale Data Warehouse Using Hadoop
10. ZooKeeper: Wait-free coordination for Internet-scale systems
11. The life and times of a zookeeper
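13. Apache Hadoop Goes Realtime at Facebook (translation)
14. A survey of Hadoop platform optimization (reposted)
15. The Anatomy of Hadoop I/O Pipeline (translation)
17. The next generation of Apache Hadoop MapReduce
IX. Understanding Computer Systems in Depth
X. Miscellaneous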
On Computable Numbers, with an Application to the Entscheidungsproblem (1936.5.28) --A.M.Turing
First Draft of a Report on the EDVAC (1945.6.30) --John von Neumann
Reflections on Trusting Trust --Ken Thompson
Who Needs an Architect?
Go To Statement Considered Harmful --Edsger W. Dijkstra
No Silver Bullet: Essence and Accidents of Software Engineering --Frederick P. Brooks
References:
1 http://duanple.blog.163.com/blog/static/709717672011330101333271/
2 http://blog.nosqlfan.com/html/1647.html
3 https://github.com/mmcgrana/services-engineering
4 http://blog.jobbole.com/84575/